Flux is so frustrating to me. Really good prompt adherence, a strong ability to keep track of multiple parts of a scene; it's technically very impressive. However, it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance. And I can't even fine-tune a painterly art style of any sort into Flux dev. I get that there was backlash from working, living artists at SD, so I can imagine the BFL team decided not to train on art, but it's a real loss: both in terms of human knowledge of, say, composition, emotion, and so on, and in terms of style diversity.
For goodness' sake, the Met in New York has a massive trove of openly licensed (CC0-type) art. Dear BFL, please ease up a bit on this and add some art-art to your models; they will be better as a result.
I like those, and there's an electroshock LoRA out there that's just awesome. That said, Tarot and others like it are "illustrator"-type styles with extra juice. I have not successfully trained a LoRA for any painting style; Flux does not seem to know about painting.
I'm curious to give this a go. I've been training a lot of LoRAs for FLUX dev recently (purely for fun). I'm sure there must be a way to get this working.
With fal, you can train a concept in around 2 minutes and only pay $2. Incredibly cheap. (You could also use it for training a style if you wanted to. I just found I seem to get slightly better results using Replicate's trainer for a style.)
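For reference, here's a minimal sketch of what kicking off such a run looks like via fal's Python client. It assumes the fal-client package and the fal-ai/flux-lora-fast-training endpoint with images_data_url / trigger_word / steps arguments; check fal's current docs, since names and defaults may have changed.

    # Requires `pip install fal-client` and FAL_KEY in the environment.
    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux-lora-fast-training",
        arguments={
            # Zip of ~10-20 training images (placeholder URL)
            "images_data_url": "https://example.com/my-training-images.zip",
            # The token you'll later use in prompts to invoke the concept
            "trigger_word": "MYFACE",
            "steps": 1000,
        },
    )

    # The response should include a URL to the trained LoRA weights (safetensors).
    print(result)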
$2 for 2 minutes? Can't you get a GPU machine for less than $2 an hour from providers like Runpod or AirGPU? I found Replicate and fal a bit expensive after 10 minutes of prompting.
I have not used Runpod or AirGPU, and I'm not affiliated with them.
Yes, renting raw compute via Runpod and friends will generally be much cheaper than renting a higher-level service that uses that compute, e.g. fal.ai or Replicate. For example, an A6000 on fal.ai is a little over $2/hr (they only show you the price per second, perhaps to make it more difficult to compare with ordinary GPU providers); on Runpod an A6000 is less than half that, $0.76/hr in their managed "Secure Cloud." If you're willing to take some risk of boxes disappearing, and don't need much security, Runpod's "Community Cloud" is even cheaper at $0.49/hr.
Similar deal with Replicate: an A100 there is over $5/hr, whereas on Runpod it's $1.64/hr.
And if you use the "serverless" services, the pricing becomes even more astronomical; as you note, $1/minute is unreasonably expensive: that's over 20x the cost of renting 8xH100s on Runpod's "Secure Cloud" (and 8xH100s are extreme overkill for finetuning image generators: even 1xH100 would be sufficient, meaning it's actually 160x markup).
Happy to help! It's a lot of fun. And it becomes even more fun when you combine LoRAs. So you could train one on your face, and then use that with a style LoRA, giving you a stylised version of your face.
If you do end up training one on yourself with fal, it should ultimately take you here (https://fal.ai/models/fal-ai/flux-lora) with your new LoRA pre-filled.
Then:
1. Click 'Add item' to add another LoRA slot
2. Paste the URL of a style LoRA's safetensors file as the second LoRA (on Civitai, go to any style you like and copy the URL from the download button; you can also find LoRAs on Hugging Face). Remember to include both the trigger word for yourself (you set this when you start the training) and the trigger word for the style (the Civitai page tells you what it is) in your prompt
3. Play with the strength of each LoRA if you want the result to look more like you or more like the style, etc. (there's a rough API equivalent of these steps sketched below)
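If you'd rather do the same thing from code, here's a rough sketch against fal's API, assuming the fal-ai/flux-lora endpoint takes a loras list of {path, scale} entries; the LoRA URLs and trigger words below are placeholders.

    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux-lora",
        arguments={
            # Both trigger words go in the prompt (hypothetical examples)
            "prompt": "a portrait of MYFACE in TAROTSTYLE style",
            "loras": [
                # Your personal LoRA (fal gives you a weights URL after training)
                {"path": "https://example.com/my-face-lora.safetensors", "scale": 1.0},
                # The style LoRA's safetensors URL from Civitai or Hugging Face
                {"path": "https://example.com/tarot-style-lora.safetensors", "scale": 0.8},
            ],
        },
    )
    print(result)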
I want to make a LoRA of Prokudin-Gorskii photographs from the Library of Congress collection, and they have thousands of photos, so I'm curious whether auto-generating captions for the images works well.
It's funny you should ask. I recently released a plugin (https://community-en.eagle.cool/plugin/4B56113D-EB3E-4020-A8...) for Eagle (an asset library management app) that allows you to write rules to caption/tag images and videos using various AI models.
I have a preset in there that I sometimes use to generate captions using GPT-4o.
If you use Replicate, they'll also generate captions for you automatically if you wish. (I think they use LLaVA behind the scenes.) I typically use this just because it's easier, and seems to work well enough.
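If you want something self-contained, here's a minimal captioning sketch using GPT-4o via the OpenAI API. The model choice, the prompt, and the image.jpg/image.txt pairing convention are my own assumptions, not what Replicate or the plugin does.

    import base64
    import pathlib
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def caption(path: pathlib.Path) -> str:
        b64 = base64.b64encode(path.read_bytes()).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Write a one-sentence caption for LoRA training. "
                             "Describe subject, composition and medium; no preamble."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        return resp.choices[0].message.content.strip()

    for img in pathlib.Path("training_images").glob("*.jpg"):
        # Many trainers pair image.jpg with image.txt containing the caption.
        img.with_suffix(".txt").write_text(caption(img))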
Here are a few in Degas style I made after training for 2,500 steps. I'd love to hear what you think of them. To my (untrained) eye, they seem a little too defined, perhaps?
Yep, absolutely nothing like Degas. Well, I take that back: I think it picked up some favourite colors/tones, but it has no concept of the materials, poses, or composition. So plasticky! Compare to https://images.app.goo.gl/JiDRYNNKUP9tczkQ7
I suspect it really needs more training examples. The problem I found when I looked for images to use was that 60% were of dancers, and from past experience, it will end up trying to fit a dancer into every image you create. But of course, there are only a (small) finite number of Degas images that you can train with.
A possible solution may be to incorporate artificial images in the training data. So, create an initial LoRA with the original Degas images and generate 500 images. From those generated images, pick the ones that most resemble Degas. Add those to the training set and train again. Repeat until (hopefully) it learns the correct style.
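In code, the loop would look roughly like this; train_lora, generate_images and curate are hypothetical stand-ins for your trainer, your generator, and a manual (or CLIP-scored) selection pass.

    def bootstrap_style(seed_images, train_lora, generate_images, curate,
                        rounds=3, per_round=500):
        """Iteratively grow a style dataset with curated synthetic images.

        train_lora/generate_images/curate are placeholders for whatever
        trainer, generator, and selection step you actually use.
        """
        dataset = list(seed_images)
        lora = None
        for _ in range(rounds):
            lora = train_lora(dataset)                      # e.g. a fal/Replicate/kohya run
            candidates = generate_images(lora, n=per_round)
            dataset += curate(candidates)                   # keep only the most Degas-like
        return lora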
No, they absolutely did not just do that in this case, although that was the SD plan. If you prompt Flux with "painterly, oil painting, thick brush strokes, impressionistic oil painting style", you will get ... anime-ish renderings.
That's not what I'm talking about. With SDXL you can literally prompt a famous artist's entire style and mix and match them, even conceptual artists and sculptors.
I’ve had the same problem with photography styles, even though the photographer I’m going for is Prokudin-Gorskii who used emulsion plates in the 1910s and the entire Library of Congress collection is in the public domain. I’m curious how they even managed to remove them from the training data since the entire LoC is such an easy dataset to access.
Yes, exactly. I think they purposely did not train on stuff like this. I'd bet that you could do a LoRA of Prokudin-Gorskii, though; there's a lot of photographic content in Flux's training set.
And I can't imagine there's a real copyright (or ethical) issue with including artwork in the public domain because the artist died over a century ago.
I think that's part of what makes FLUX.1 so good: the content it's trained on is very consistent.
Diversity is a double-edged sword. It's a desirable feature where you want it, and an undesirable feature everywhere else. If you want an impressionist painting, then it's good to have Monet and Degas in the training corpus. On the other hand, if you want a photograph of water lilies, then it's good to keep Monet out of the training data.
Nonsense. FLUX.1-dev is famous for its consistency, prompt adherence, etc., and it fits on a consumer GPU. That has to come with compromises. You can call any optimization a weakness: that's the nature of compromise.
I wonder if part of the reason it's good is that it's been trained for a more specific task. I can only imagine that if your concept of a "house" ranges from a stately home to "a pineapple under the sea", you're going to end up with a very generalised concept. It then takes specific prompting to remove the influences you're not interested in.
I suspect the same goes for art styles. There's such huge variety that they'd really be better served by separate models.
That's what a refiner is for in auto1111: taking an image over the last ~10% of the steps and touching it up with an alternative model.
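Outside auto1111, the same idea looks roughly like this with diffusers, using the stock SDXL refiner at low strength; the model ID, file names and prompt are just examples.

    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    from PIL import Image

    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("flux_output.png").convert("RGB")
    refined = refiner(
        prompt="oil painting of ballet dancers, loose visible brush strokes",
        image=image,
        strength=0.1,  # only re-run the final ~10% of denoising, so composition is preserved
    ).images[0]
    refined.save("refined.png")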
I actually use Flux to generate the image for the sake of prompt adherence, then pull it in as a canny/depth ControlNet input for more established models like RealVis, UnstableXL, etc.
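A rough diffusers version of that workflow, assuming the canny SDXL ControlNet and a RealVisXL checkpoint from Hugging Face (swap in whatever model you actually use):

    import cv2
    import numpy as np
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
    from PIL import Image

    # Take the Flux render and derive a canny edge map from it
    flux_image = Image.open("flux_output.png").convert("RGB")
    edges = cv2.Canny(np.array(flux_image), 100, 200)
    canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "SG161222/RealVisXL_V4.0",  # assumed checkpoint; substitute your own
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    result = pipe(
        prompt="impressionist oil painting, thick brush strokes",
        image=canny_image,
        controlnet_conditioning_scale=0.7,  # how strictly to follow the Flux composition
    ).images[0]
    result.save("restyled.png")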
We may differ in our takes on the usefulness of diffusion models, but I'd say it's a loss in that many of the visuals humans see in the next ten years are going to be generated by these models, and I for one wish they weren't just trained on weeb shit.