
Flux is so frustrating to me. Really good prompt adherence, strong ability to keep track of multiple parts of a scene, it's technically very impressive. However it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance. And I can't even fine-tune a painterly art style of any sort into Flux dev. I get that there was backlash from working, living artists against SD, and I can therefore imagine that the BFL team has decided not to train on art, but, it's a real loss. Both in terms of human knowledge of, say composition, emotion, and so on, but also for style diversity.

For goodness' sake, the Met in New York has a massive trove of openly licensed, CC0-type art. Dear BFL, please ease up a bit on this and add some art-art to your models; they will be better as a result.



I've had a similar experience: incredible at generating a very specific style of image, but not great at generating anything with a specific style.

I suspect we'll see the answer to this is LoRAs. Two examples that stick out are:

- Flux Tarot v1 [0]

- Flux Amateur Photography [1]

Both of these do a great job of combining all the benefits of Flux with custom styles that seem to work quite well.

[0] https://huggingface.co/multimodalart/flux-tarot-v1

[1] https://civitai.com/models/652699?modelVersionId=756149


I like those, and there's an electroshock LoRA out there that's just awesome. That said, Tarot and others like it are "illustrator"-type styles with extra juice. I have not successfully trained a LoRA for any painting style; Flux does not seem to know about painting.


I'm curious to give this a go. I've been training a lot of LoRAs for FLUX dev recently (purely for fun). I'm sure there must be a way to get this working.

Here are a few I've recently trained: https://civitai.com/user/dvyio


This looks really good! What's your process for getting such high-quality LoRAs?


Thank you!

A reasonable number of training images (50 or so), and then I train for around 2,000 steps for a new style.

Many of them work well with Flux, particularly if they're illustration-based. Some don't seem to work at all, so I didn't upload those!


How long does this take, and on what equipment? It's amazing to me that you can do this from just 50 images, I would have thought tens of thousands.


It's very impressive. I aim for around 50 images if I'm training a style, but only 10 to 20 if training a concept (like an object or a face).

I have a MacBook Air so I train using the various API providers.

For training a style, I use Replicate: https://replicate.com/ostris/flux-dev-lora-trainer/train

For training a concept/person, I use fal: https://fal.ai/models/fal-ai/flux-lora-fast-training

With fal, you can train a concept in around 2 minutes and only pay $2. Incredibly cheap. (You could also use it for training a style if you wanted to. I just found I seem to get slightly better results using Replicate's trainer for a style.)
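
If you'd rather kick the Replicate training off from code instead of the web form, a rough sketch with their Python client looks something like this. (The version hash is a placeholder and the input field names are from memory, so double-check them against the trainer page above.)

    # Minimal sketch of starting a style LoRA training run on Replicate from Python.
    # Assumes `pip install replicate` and REPLICATE_API_TOKEN set in the environment.
    # Version hash, field names, and URLs below are illustrative placeholders.
    import replicate

    training = replicate.trainings.create(
        version="ostris/flux-dev-lora-trainer:<version-hash>",  # copy the hash from the trainer page
        input={
            "input_images": "https://example.com/style-images.zip",  # ~50 images for a style
            "steps": 2000,               # the ~2,000 steps mentioned above
            "trigger_word": "MYSTYLE",   # token you'll use in prompts later
        },
        destination="your-username/flux-style-lora",  # model the finished LoRA gets pushed to
    )
    print(training.id, training.status)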


$2 for 2 minutes? Can't you get a GPU for less than $2 an hour from providers like RunPod or AirGPU? I found Replicate and fal a bit expensive after 10 minutes of prompting.

I haven't used RunPod or AirGPU, and I'm not affiliated with them.


Yes, renting raw compute via Runpod and friends will generally be much cheaper than renting a higher level service that uses that compute e.g. fal.ai or Replicate. For example, an A6000 on fal.ai is a little over $2/hr (they only show you the price in seconds, perhaps to make it more difficult to compare with ordinary GPU providers); on Runpod an A6000 is less than half that, $0.76/hr in their managed "Secure Cloud." If you're willing to take some risk of boxes disappearing, and don't need much security, Runpod's "Community Cloud" is even cheaper at $0.49/hr.

Similar deal with Replicate: an A100 there is over $5/hr, whereas on Runpod it's $1.64/hr.

And if you use the "serverless" services, the pricing becomes even more astronomical; as you note, $1/minute is unreasonably expensive: that's over 20x the cost of renting 8xH100s on Runpod's "Secure Cloud" (and 8xH100s are extreme overkill for finetuning image generators: even 1xH100 would be sufficient, meaning it's actually 160x markup).
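
For reference, here's the arithmetic on the prices quoted above in a few lines of Python:

    # Back-of-envelope comparison using the per-hour prices quoted above.
    fal_a6000 = 2.00        # $/hr, fal.ai A6000 (approx.)
    runpod_a6000 = 0.76     # $/hr, Runpod Secure Cloud A6000
    runpod_a6000_cc = 0.49  # $/hr, Runpod Community Cloud A6000

    replicate_a100 = 5.00   # $/hr, Replicate A100 (approx.)
    runpod_a100 = 1.64      # $/hr, Runpod A100

    serverless_per_hr = 1.00 * 60  # $1/minute works out to $60/hr

    print(f"A6000: fal is {fal_a6000 / runpod_a6000:.1f}x Runpod Secure Cloud")
    print(f"A6000: fal is {fal_a6000 / runpod_a6000_cc:.1f}x Runpod Community Cloud")
    print(f"A100: Replicate is {replicate_a100 / runpod_a100:.1f}x Runpod")
    print(f"Serverless at $1/min is ${serverless_per_hr:.0f}/hr")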


Wow, fantastic, thanks! I thought it would be much, much more expensive than this. Thanks for the info!


Happy to help! It's a lot of fun. And it becomes even more fun when you combine LoRAs. So you could train one on your face, and then use that with a style LoRA, giving you a stylised version of your face.

If you do end up training one on yourself with fal, it should ultimately take you here (https://fal.ai/models/fal-ai/flux-lora) with your new LoRA pre-filled.

Then:

1. Click 'Add item' to add another LoRA and enter the URL of a style LoRA's SafeTensor file (with Civitai, go to any style you like and copy the URL from the download button) (you can also find LoRAs on Hugging Face)

2. Paste that SafeTensor URL as the second LoRA, remembering to include the trigger word for yourself (you set this when you start the training) and the trigger word for the style (it tells you on the Civitai page)

3. Play with the strength for the LoRAs if you want it to look more like you or more like the style, etc.
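
If you'd rather script the combination than click through the playground, a rough sketch with fal's Python client looks like this. (The `loras` field names are from memory of the playground schema and the URLs are placeholders, so verify against the endpoint docs.)

    # Rough sketch of combining a personal LoRA with a style LoRA via fal's Python client.
    # Assumes `pip install fal-client` and FAL_KEY set; the loras/path/scale field names
    # and both URLs below are illustrative placeholders, not exact values.
    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux-lora",
        arguments={
            # Include both trigger words: yours (set at training time) and the style's
            # (listed on its Civitai page).
            "prompt": "MYFACE person in the style of STYLETRIGGER, portrait",
            "loras": [
                {"path": "https://your-fal-output/your-face-lora.safetensors", "scale": 1.0},
                {"path": "https://civitai.com/api/download/models/<style-model-id>", "scale": 0.8},
            ],
        },
    )
    print(result["images"][0]["url"])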

-----

If you want a style LoRA to try, this one of SNL title cards I trained actually makes some great photographic images. https://civitai.com/models/773477/flux-lora-snl-portrait (the download link would be https://civitai.com/api/download/models/865105?type=Model&fo...)

-----

There's a lot of trial and error to get the best combinations. Have fun!


Have you tried img2text when training a style?

I want to make a LoRA of Prokudin-Gorskii photographs from the Library of Congress collection, and they have thousands of photos, so I'm curious whether that's effective for auto-generating captions for the images.


It's funny you should ask. I recently released a plugin (https://community-en.eagle.cool/plugin/4B56113D-EB3E-4020-A8...) for Eagle (an asset library management app) that allows you to write rules to caption/tag images and videos using various AI models.

I have a preset in there that I sometimes use to generate captions using GPT-4o.

If you use Replicate, they'll also generate captions for you automatically if you wish. (I think they use LLaVA behind the scenes.) I typically use this just because it's easier, and seems to work well enough.
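
For what it's worth, a bare-bones version of that captioning step looks roughly like this (the caption prompt is just an example, not the plugin's actual preset):

    # Sketch of auto-captioning training images with GPT-4o.
    # Assumes `pip install openai` and OPENAI_API_KEY in the environment.
    import base64
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    def caption(image_path: Path) -> str:
        data = base64.b64encode(image_path.read_bytes()).decode()
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Write a one-sentence caption for this image, for LoRA training."},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{data}"}},
                ],
            }],
        )
        return response.choices[0].message.content.strip()

    # Write a .txt caption next to each training image, as most trainers expect.
    for img in Path("training_images").glob("*.jpg"):
        img.with_suffix(".txt").write_text(caption(img))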


That’s awesome! Thank you for the Replicate link too. I didn’t know they also did LoRA training. They’ve been kind of hitting it out of the park lately.


Thanks for all this! I had created a SD LoRA of my face back in the day, time for another one!


Awesome! :)


@davidbarker -- please do, that sounds awesome! I did not have good results.


It's trickier than I thought it would be.

Here are a few in Degas style I made after training for 2,500 steps. I'd love to hear what you think of them. To my (untrained) eye, they seem a little too defined, perhaps?

https://imgur.com/a/sqsQLPg


Yep, absolutely nothing like Degas. Well, I take that back: I think it picked up some favorite colors/tones. But it has no concept of the materials, poses, or composition. So plasticky! Compare to https://images.app.goo.gl/JiDRYNNKUP9tczkQ7


I suspect it really needs more training examples. The problem I found when I looked for images to use was that 60% were of dancers, and from past experience, it will end up trying to fit a dancer into every image you create. But of course, there are only a (small) finite number of Degas images that you can train with.

A possible solution may be to incorporate artificial images in the training data. So, create an initial LoRA with the original Degas images and generate 500 images. From those generated images, pick the ones that most resemble Degas. Add those to the training set and train again. Repeat until (hopefully) it learns the correct style.
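
In pseudocode, the loop I have in mind is roughly this (train_lora, generate_images, and looks_like_degas are hypothetical placeholders for a trainer call, a generation call, and a manual or CLIP-based "how Degas-like is this?" score):

    # Sketch of the bootstrapping idea above: train on real Degas images, generate,
    # keep the most Degas-like outputs, and retrain on the expanded set.
    def bootstrap_style_lora(real_images, rounds=3, per_round=500, keep=50):
        training_set = list(real_images)
        lora = None
        for _ in range(rounds):
            lora = train_lora(training_set, steps=2500)      # retrain on the expanded set
            candidates = generate_images(lora, n=per_round)  # generate ~500 images
            best = sorted(candidates, key=looks_like_degas, reverse=True)[:keep]
            training_set.extend(best)                        # keep only the most Degas-like
        return lora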


Out of curiosity, what do you think of these? https://imgur.com/a/8p7RlMe


Significantly better. They feel more like watercolor than Degas, but if that's Flux, I'm impressed!


Unfortunately, not Flux. They're from Midjourney, using a few Degas as a style reference.

Whatever they're doing at Midjourney is still impressive. No training needed and a better result.


>However it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance

It feels like they just removed names from the datasets to make it worse at recreating famous people and artists.


No, they absolutely did not just do that in this case, although that was the SD plan. If you prompt Flux for "painterly, oil painting, thick brush strokes, impressionistic oil painting style", you will get ... anime-ish renderings.


That's not what I'm talking about. With SDXL you can literally prompt for a famous artist's entire style and mix and match them, even conceptual artists and sculptors.


I’ve had the same problem with photography styles, even though the photographer I’m going for is Prokudin-Gorskii who used emulsion plates in the 1910s and the entire Library of Congress collection is in the public domain. I’m curious how they even managed to remove them from the training data since the entire LoC is such an easy dataset to access.


Yes, exactly. I think they purposely did not train on stuff like this. I'd bet that you could do a LoRA of Prokudin-Gorskii, though; there's a lot of photographic content in Flux's training set.


I'm fairly confident they did a broad FirstName LastName removal.


And I can't imagine there's a real copyright (or ethical) issue with including artwork in the public domain because the artist died over a century ago.


I think that's part of what makes FLUX.1 so good: the content it's trained on is all very similar in style.

Diversity is a double-edged sword. It's a desirable feature where you want it, and an undesirable feature everywhere else. If you want an impressionist painting, then it's good to have Monet and Degas in the training corpus. On the other hand, if you want a photograph of water lilies, then it's good to keep Monet out of the training data.


DALL-E 3 doesn't struggle with this. It's just a matter of choices; there's no technical limitation. They chose to weaken the model in this regard.


Nonsense. FLUX.1-dev is famous for its consistency, prompt adherence, etc.; and it fits on a consumer GPU. That has to come with compromises. You can call any optimization a weakness: that's the nature of compromise.


I wonder if part of the reason it's good is that it's been trained for a more specific task. I can only imagine that if your concept of a "house" ranges from a stately home to "a pineapple under the sea", you're going to end up with a very generalised concept. It then takes specific prompting to remove the influences you're not interested in.

I suspect the same goes for art styles. There's such huge variety that they'd really be better served by separate models.


There are people who have un-distilled Flux so it can be further fine-tuned, so adding art training won't be an issue.

https://huggingface.co/nyanko7/flux-dev-de-distill


I wonder if you can use Flux to generate the base image, then use img2img with SD 1.4 to impart an artistic style?


That's what a refiner is for in auto1111: taking an image the last 10% of the way and touching it up with an alternative model.

I actually use Flux to generate the image for the sake of prompt adherence, then pull it in as a canny/depth ControlNet with more established models like RealVis, UnstableXL, etc.
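
For anyone who wants to try the img2img variant from the parent comment outside auto1111, a rough diffusers sketch looks like this. (Model IDs, step counts, and strength are illustrative; my own pipeline uses canny/depth ControlNets instead.)

    # Two-stage workflow with diffusers: Flux for composition and prompt adherence,
    # then an SD 1.4 img2img pass for style. Needs a GPU with enough VRAM for both.
    import torch
    from diffusers import FluxPipeline, StableDiffusionImg2ImgPipeline

    prompt = "ballet dancers rehearsing in a sunlit studio"

    # Stage 1: Flux handles the layout and sticks to the prompt.
    flux = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    base = flux(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
    del flux
    torch.cuda.empty_cache()

    # Stage 2: img2img with SD 1.4 to impart a painterly style while keeping the layout.
    sd = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    styled = sd(
        prompt=prompt + ", impressionist oil painting, loose brush strokes",
        image=base.resize((512, 512)),
        strength=0.45,  # lower strength preserves more of Flux's composition
    ).images[0]
    styled.save("flux_then_sd14.png")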


That is an interesting idea, I somehow hadn't thought of using flux in a chain like that, thanks!


Yes, that is my current workflow as well.


>but, it's a real loss. Both in terms of human knowledge of, say composition, emotion, and so on, but also for style diversity

But that real art still exists, and can still be found, so what exactly is the loss here?


We may differ on our take about the usefulness of diffusion models, but I'd say it's a loss in that many of the visuals humans will see in the next ten years are going to be generated by these models, and I for one wish they weren't just trained on weeb shit.


Just think: before 1995 (and in reality, well after that), most of the world would never have had access to 99% of the world's art.

And between 1995 and 2022, the amount of art produced surpassed the cumulative output of all other periods of human history.


... And between 2022 and 2025, the amount of generated imagery will drive human-created art down to roughly 0% of all imagery.


You'll still be able to ask a person to create art in a specific style if you'd like.


Unfortunately, we will have a generation of young artists who learn to draw based on models like Flux, unless they get classical training.



