Flex.1 Alpha LoRA/Finetuning #3056
My initial attempt with an LR of 1e-5 overtrained rapidly. A second attempt with an LR of 2e-6 seems to be more stable so far.
Before anyone else tries this: it seems to break the guidance module that Ostris created. Some more work will be needed to explicitly exclude that module from training.
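As a stopgap until that work lands, one way to explicitly exclude the module from training would be to freeze its parameters before handing the model to the optimizer. A minimal sketch, assuming the module is exposed under Flux-style "guidance_in.*" parameter names (an assumption; the actual names may differ):

```python
import torch.nn as nn

def freeze_guidance_module(model: nn.Module) -> None:
    # Disable gradients for the guidance embedder so the optimizer
    # never touches it; everything else keeps training normally.
    # "guidance_in." is an assumed prefix, not verified against sd-scripts.
    for name, param in model.named_parameters():
        if name.startswith("guidance_in."):
            param.requires_grad = False
```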
I'm just using standard Flux settings with block swapping and so on, at a 1.8e-5 LR. I'm at 24k steps already and samples are decent. The dataset is all 1024x1024; no multi-resolution stuff. My 3090 does 7.89 s/it. I will try your method later when I get to 50k steps.
@CodeAlexx Are you saying 7.89 seconds per iteration?
That sounds about right given it's all 1024 resolution images. I'm using a 512/768/1024 blend on a 4090 for the 2.5s/it times.
Yes, near 8 seconds per step. How are your samples with your hack?
The samples look decent, but if you try to use it in Comfy, you'll find that guidance no longer works. The training process is likely training that part of the network as well, not realizing that it should be ignoring it.
kohya-ss/sd-scripts#1891 (comment)

If anyone wants to play with this, I've created a minimal working example here: https://github.com/stepfunction83/sd-scripts/tree/sd3

With this commit, which just brute-forces in the relevant code snippets from ai-toolkit, I was able to quickly train a 1000-step finetune of Flex and test it in Comfy to validate that the training does take and that the guidance module is not destroyed in the process. Additionally, the sampling was corrected and now works as expected.

You can replace the default sd-scripts installation that comes with Kohya with this one and replace the Flux model file with the Flex version. Make sure to do this while the server is already running; Kohya_ss pulls the latest version of the official sd-scripts repo when it first starts up, which would overwrite the replacement. (You can probably tell I don't have much experience with this...)
THANK YOU!! I am new to git and how to use it. Can I just download the three changed files and replace them?
Make sure to pass the --bypass_flux_guidance parameter with the latest commit, and yes, you can just replace the respective files with the ones from the forked version.
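For the curious, here is a rough sketch of what a guidance bypass in the spirit of those ai-toolkit snippets could look like: swap the guidance embedder's forward for a zero stub during training so no gradients reach it, then restore it before saving or sampling. The attribute names (guidance_in, hidden_size) are assumptions, not the actual --bypass_flux_guidance implementation:

```python
import torch

def bypass_flux_guidance(model):
    # Replace the guidance embedder's forward with a zero stub; since
    # the real module is never called, its weights receive no gradients.
    gi = model.guidance_in
    gi._orig_forward = gi.forward
    gi.forward = lambda emb: torch.zeros(
        emb.shape[0], model.hidden_size, device=emb.device, dtype=emb.dtype
    )

def restore_flux_guidance(model):
    # Put the original forward back before saving or sampling.
    gi = model.guidance_in
    gi.forward = gi._orig_forward
    del gi._orig_forward
```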
thank you sooooo much!
Yep, let me know how your experience goes. I'll submit a PR once I get it in a slightly better state.
I will. I won't use it for LoRA, but for finetuning with no block swaps.
That's currently the only way I've tested it, so ensuring it works for LoRA too is still needed. I'll probably try that tomorrow after this finetune run finishes.
Created a PR to add the functionality to sd-scripts: kohya-ss/sd-scripts#1893
From my experimentation with finetuning so far, I've found that lower learning rates are needed than with Flux Dev. 5e-6, Cosine, 5000 steps destroyed hands and general composition, while 1e-6, Cosine, 10000 steps seems to be more stable so far; it may be worth going even lower than that.

When sampling, I would recommend a guidance of 5. The guidance module is not the same as base Flux's, and the sweet spot seems to be roughly 4.5-5.5.

For reference, I'm using 300 medium-quality real images and 200 synthetic images to train a concept model. It's also very quick to train vs. finetuning Flux Dev: I'm getting 2.25 s/it with a 50/50 512/768 resolution mixed training set on a 4090, using only 19GB of VRAM. With a purely 512 dataset and a couple of blocks offloaded, I could definitely see this being trained on a 16GB card.
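If you want to zero in on that guidance sweet spot yourself, one quick way is to sweep a few values with a fixed seed. A minimal sketch, assuming Flex.1-alpha loads through diffusers' FluxPipeline (the prompt and step count are arbitrary placeholders):

```python
import torch
from diffusers import FluxPipeline

# Assumes the Hugging Face repo is diffusers-compatible; adjust if not.
pipe = FluxPipeline.from_pretrained(
    "ostris/Flex.1-alpha", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable

# Same prompt and seed at several guidance values around the 4.5-5.5 range.
for g in (4.0, 4.5, 5.0, 5.5):
    image = pipe(
        "a photo of a hand holding a ceramic mug",
        guidance_scale=g,
        num_inference_steps=28,
        generator=torch.Generator("cpu").manual_seed(0),
    ).images[0]
    image.save(f"guidance_{g:.1f}.png")
```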
Today I am off work and running it. Mine is at a 1.8e-6 LR and at 8000 steps it's still holding up; hands and composition are still good. I was using a CFG of 2 and took it to 5, and it works very well. My dataset is 6k real high-quality pics, 512 and 1024, both square.
I think an LR even lower than 1e-6 may be better. Even with that, it trains quickly and reaches approximately the same place as a 5e-6 LR in 5000 steps, with fewer artifacts and quality loss. In my next run, I'll go down further to 1e-7 to see how that goes.
I tried to do a new session. It seems your instructions to delete the sd-scripts dir and clone your version are gone. Can you repost so the command line will work again?
/home/alex/kohya_ss/venv/bin/python3.10: can't open file '/home/alex/kohya_ss/sd-scripts/flux_train.py': [Errno 2] No such file or directory

It is missing flux_train.py.
Okay, I've updated the functionality and I think it should be working now. Apparently sd-scripts already has a convenient toggle for turning flux guidance on and off in the Flux model parameters. Feel free to go ahead and try again. So the steps to run are: delete the existing sd-scripts directory, clone the fork in its place, and start the run again.
I'm doing a training run which is comparable to a previous one I've done, so I'll see how the results compare once it gets a little further in.
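The "convenient toggle" mentioned above is presumably the guidance_embed flag in the Flux model parameters: in the Black Forest Labs reference implementation, which sd-scripts' Flux model follows, that flag gates whether the guidance embedder is built at all. A self-contained paraphrase of the idea, with illustrative names and shapes rather than the exact sd-scripts code:

```python
from dataclasses import dataclass
import torch.nn as nn

@dataclass
class FluxParams:
    hidden_size: int = 3072
    guidance_embed: bool = True  # the toggle: False skips the guidance path

def build_guidance_in(params: FluxParams) -> nn.Module:
    # With guidance_embed off, guidance inputs hit an Identity and there
    # is nothing for training to corrupt. (This MLP stands in for the
    # real embedder; layer sizes here are illustrative assumptions.)
    if params.guidance_embed:
        return nn.Sequential(
            nn.Linear(256, params.hidden_size),
            nn.SiLU(),
            nn.Linear(params.hidden_size, params.hidden_size),
        )
    return nn.Identity()
```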
Thank you. In about 2 hours my current training will be over and then I will use it. Thank you for your hard work! What settings did you find best? I use cosine, no warmup.
I am using Cosine with 1% of steps warmup. I've tried a bunch of other configurations over time, but Cosine always seems to get the job done!
500 steps in with the same settings as a previous run and I'm already seeing different results. Seems like it is in fact making a difference this time.
Okay, now we're working! Results on the full finetune (4e-6 LR, 5000 steps, Cosine) in Comfy are dramatically better! Both scene composition and hand quality are back to normal. I'm going to attempt a longer run with a lower initial LR of 1e-6, 10000 steps, Cosine, with 1024 images introduced into the mix, and see how that goes.
I am at 3000 and they are very good. It is planned for 20k steps. I am getting 5.79 s/it (a 3090) with a mixture of 512, 1024, and some 1300-res pics. 1e-6 LR, same as before: cosine. I am using Adafactor.
Tested at 3500 steps, loving the output! Thank you.
Hello there, is Flex 100% supported by kohya_ss (GUI)? Is it possible to have an example config? Thanks for your work.
See my comment above: #3056 (comment)
Hi there! I'm getting File "/workspace/kohya_ss/sd-scripts/library/train_util.py", line 2080, in __init__ when running the fork. Is the meta_lat.json mandatory?
Sounds like you're on the Finetune tab instead of the Dreambooth one.
Ah yes, correct. Is there a difference between these two in terms of results, or is it just about how the dataset is set up?
I believe you just need to specify the metadata file for the Finetune tab. Beyond that, there's not a substantial difference as far as I'm aware.
I think this would be a good place to discuss finetuning the new Flex.1 Alpha model created by Ostris: https://huggingface.co/ostris/Flex.1-alpha
Initial tests I've tried on training LoRAs using ai-toolkit are extremely promising, with LoRAs training much more smoothly than with Flux.1 Dev.
Currently, I believe we can train this in Kohya in a similar way to how the un-distilled versions of Flux have been trained, by treating them as Flux Schnell to bypass the guidance mechanism. Until this is built in, though, you can force it by temporarily changing line 62 of library/flux_utils.py from:

```python
is_schnell = not ("guidance_in.in_layer.bias" in keys or "time_text_embed.guidance_embedder.linear_1.bias" in keys)
```

to:

```python
is_schnell = True
```
This lets the finetuning/LoRA process begin a training run. I'm currently doing a test run of this and will post about how it goes. Obviously, it hasn't been out particularly long, but so far I have been able to start a finetuning run and the loss seems to be decreasing.
Due to the model's smaller size, it can fit entirely on a 24GB card with the fused backward pass and no block swap, resulting in faster training iterations (average of 2.54s/it on my 4090 when using an even mix of 512/768/1024 resolution images). I used exactly the same config that I use for a normal Flux run, only swapping out the model file.
Samples are garbled, as they were with the undistilled versions, so I expect there will need to be some fixes there, but they're not beyond recognition.