Something wrong with img2img samplers #889
Comments
I just confirmed that img2img with the plms and k* samplers is working as expected on a CUDA system. This may be an M1-specific issue, although I'd be surprised if this was the case. Could you post the original images and the exact prompts you used? |
I'm getting the same weird washed-out and blurry images at lower img2img strengths, also on M1. @netsvetaev's original images are the last two images they posted, btw. And here's k_lms, k_euler, and ddim, each at 32 steps and -f 0.25, 0.5, 0.75.
|
I noticed that the sampler only runs for one step (regardless of input), which is probably the reason for the blurry results. |
This seems to be an MPS-related bug. I don’t have a Mac to test on, but I’ll investigate any suspicious “if MPS” statements. Maybe @mh-dm or @Any-Winter-4079 could have a look? |
I don't have an MPS device to test on. |
@lstein I'm not convinced it is MPS-related. I have the exact same behavior after forcing my torch device to be `cpu` in `choose_torch_device()`:

```python
def choose_torch_device() -> str:
    '''Convenience routine for guessing which GPU device to run model on'''
    # if torch.cuda.is_available():
    #     return 'cuda'
    # if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    #     return 'mps'
    return 'cpu'
```

and in `fix_func()`:

```python
def fix_func(orig):
    # if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    #     def new_func(*args, **kw):
    #         device = kw.get("device", "mps")
    #         kw["device"] = "cpu"
    #         return orig(*args, **kw).to(device)
    #     return new_func
    return orig
```

Each step now takes about 4x as long, and there are no messages about mismatched torch devices, so I think I've forced everything onto the CPU. |
I have been writing some documentation for img2img, which involved visualizing the latent space, and I noticed that when the step count does not scale with |
Ok, there is something wrong on a deeper level. I have reverted the force-cpu changes to my local branch and run some tests with strength 0.75 and 0.85, each running from steps 15 to ~40. Sampler DDIM. My REPL command:
Strength 0.75 images (the zip is named f7.5 but it's f0.75): Strength 0.85 images (the zip is named f8.5 but it's f0.85): Images are prefixed by the step count entered in the REPL (not the actual steps).

Have a scroll through those and you will notice some interesting patterns. There seem to be two types of result, which I'll call the "common" type (very close to the init image) and the "occasional" type (which diverges more). You get 4 or 5 "common" types, then a couple of "occasional" types, then 4 or 5 "common" types, then a couple of "occasional" types... It's far more obvious on the f0.85 images which are "common" and which are "occasional".

Also, it seems like the more steps, the closer the "common" images are to the init image. I would have expected the images to converge, but not on something very much like the original... That ain't right! |
@psychedelicious you might want to try my branch |
This is actually a subtle thing that I haven't seen any of the SD GUIs communicate well. Because of the way the SD algorithm works (actually, diffusion algorithms more generally), doing steps=50 is not equivalent to doing steps=49 and then feeding the 49th image in for "one more" step. If the step count is different, then the amount of denoising that happens at each step is different, all the way back to the first step, so the differences you're seeing are expected. The best way to get a handle on this, I found, was to try low step counts and look at the latents at each step. If there's interest I can clean up my intermediate writer and submit it as a feature..? |
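To make that concrete, here's a tiny illustration using k-diffusion's Karras schedule helper (purely illustrative; the sigma_min/sigma_max values are just typical SD-ish numbers, and the repo's actual schedule may be built differently):

```python
import k_diffusion as K

# The noise schedule is recomputed from the *total* step count, so the per-step
# sigmas shift whenever that count changes: 49 steps plus "one more" is not the
# same trajectory as 50 steps from scratch.
s49 = K.sampling.get_sigmas_karras(n=49, sigma_min=0.03, sigma_max=14.6, rho=7.0)
s50 = K.sampling.get_sigmas_karras(n=50, sigma_min=0.03, sigma_max=14.6, rho=7.0)

print(s49[:6])
print(s50[:6])  # same endpoints, but every intermediate noise level differs
```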
@damian0815 Thanks, I didn't understand what was really going on internally, so thanks for explaining that. Do I understand you correctly that the patterns I am seeing are to be expected? I checked out your branch, but it seems to be missing something. I discovered by trial and error that, in the step_callback, calling |
Yep, that's the problem: see how we've already diverged enough to see with our eyes alone after about the 3rd or 4th step. The top image is 20 steps.

@psychedelicious I've fixed the issue on my branch. If you fetch again you can try it, and I'd love to hear your comments on my updated img2img docs! :) |
@psychedelicious @lstein However, with higher strength the results are a lot closer / identical. |
@damian0815 Awesome guide! It would be great if you decide to merge it. |
Thanks @Any-Winter-4079, I closed @hipsterusername's issue without noticing it was on a different architecture. I think something got misplaced in the shuffle during @lstein's recent changes... |
This also produces the same result after #925 (btw, not sure how this affected ddim?) So it might be fixed now? |
@netsvetaev @damian0815 I leave it to you to test / check if it works on your end, or if you see some issue |
Oh, also about @psychedelicious's finding, which may be an additional issue.
I can still reproduce this. |
Using my CUDA system I've just compared the output of the img2img ddim sampler between the current code and a version from September 15, long before I made any changes to the samplers. The results are identical. I don't see any blurriness or color degradation:
I'm not expecting the images to be the same between M1 and CUDA. However, I find it alarming that I'm not getting anything that looks like what the prompt is asking for. This is something that I've noticed in passing with smaller init images on a couple of occasions.

To check this out, I rescaled the init image to 512x512, applied the same prompt and other parameters, and voila! Upping the strength and CFG to 0.8 and 15.0 respectively gives me this:

Then tweaking the prompt a bit gives something more photorealistic:
So in summary, we've got multiple bugs:
|
Just to confirm, the washed-out issue is affecting the DDIM sampler? I did refactor a large amount of common code shared by ddim and plms, so it's possible I broke something in a way that only manifests on M1. Has anyone tried a bisect to track down the offending commit? |
Now that's very odd. The e601163 commit was from before I added support for the k* samplers. So presumably you generated a DDIM image here. I'd proposed the test in order to see if there was a regression on ddim. I'm away from my system now, but as soon as I'm back I'll see if I can reproduce the color distortion on CUDA. |
Oh, well, DDIM seems fine now in terms of washed-out effects (see #889 (comment)). Update: yes, |
I'm just about at my depth of understanding here (over my depth, to be honest), but my understanding is that each of the sigmas represents the amount of noise to inject and denoise at each step. We can get away with removing the 0.0 at the end; this is just a placeholder that prevents an index error at the very last step. My current hypothesis is that there is something amiss with the step that occurs just before the noising/denoising loop starts. In this step, the latent image is noised with one of the sigma values.
If the wrong sigma index is being applied at this step, that would explain the behavior we're seeing. I briefly experimented with varying this step, but haven't explored it exhaustively. What bugs me is that this works fine in the plms and ddim samplers, so why should it change? |
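For reference, a minimal sketch of the kind of pre-loop noising being described, in the style of a k-diffusion img2img entry point (`model_wrap`, `init_latent`, and `sample_fn` are illustrative names, not the repo's exact API):

```python
import torch

def img2img_start(model_wrap, init_latent, steps, strength, sample_fn):
    # Descending noise levels for the full run; k-diffusion model wrappers expose
    # a get_sigmas(n) helper, but the exact wiring here is an assumption.
    sigmas = model_wrap.get_sigmas(steps)
    t_enc = int(strength * steps)   # number of denoising steps that will actually run
    start = steps - t_enc           # index of the first sigma we use
    noise = torch.randn_like(init_latent)
    x = init_latent + noise * sigmas[start]           # noise the init image to that level...
    return sample_fn(model_wrap, x, sigmas[start:])   # ...then denoise over the rest of the schedule
```

Picking the wrong `start` index here is exactly the kind of mistake that would bake in too much or too little noise before the loop ever runs.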
@lstein Yes, img2img works fine with k-diffusion in my fork: Birch-san/stable-diffusion@7b42d40. Low strengths work (they don't go blurry like yours), but so many high-sigma denoising steps are employed that very little of the original latents survives. Strength 0.3 is the lowest I've ever gotten coherent results from: |
@Birch-san THANK YOU! When I reviewed your code I found the place I was going wrong. As I suspected, it was the step of adding noise to the init_latent. I don't know what part of the world you live in, but if you're ever in Toronto swing by and I'll buy you a beer or three. |
I just committed the fix to |
Ah, I may have spoken too soon. There's too much noise being added at higher strengths, and the image is replaced completely past a certain point. I've reverted and will continue to explore the problem. |
I'm in the UK. always up for a beer 🍻 |
I'll be in Cambridge this February. Are you in London? |
Another example. All of this is on top of having the changes from 2c27e75. |
And here's another example; I'm close to calling it a success for me. The only thing is, there is probably a better formula than leaving out the first 9 sigmas (which might be a bit too creative in terms of resulting outputs). I'll try to compare this too with @Birch-san's behavior for the same original image. @lstein let me know if you experience the same on CUDA and whether the washed-out effect is gone with this (and if this is a bug, I hope the experiments help you figure it out). |
@lstein can discuss on Discord -- are you on the LAION server? |
@Birch-san - Invoke's Discord >> https://discord.gg/ZmtBAhwWhy |
I got it working. The key was to remove the stochastic_encode() call. I will be committing in a minute. |
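For context, a rough sketch of the two approaches (the `stochastic_encode()` call is the CompVis DDIMSampler method mentioned above; the sigma-based branch is my reading of the replacement, not the exact committed code):

```python
import torch

def encode_init_latent(sampler, init_latent, sigmas, steps, t_enc, use_k_sigmas=True):
    """Noise the init latent before img2img denoising begins (illustrative sketch)."""
    if not use_k_sigmas:
        # DDIM-style: stochastic_encode() works off the DDPM alphas, which don't
        # line up with the noise levels the k-samplers expect.
        t = torch.tensor([t_enc], device=init_latent.device)
        return sampler.stochastic_encode(init_latent, t)
    # k-sampler style: noise directly to the sigma the sampler will start from, so
    # the starting noise level matches its own schedule. Getting this index right
    # is the subtlety the thread has been circling around.
    return init_latent + torch.randn_like(init_latent) * sigmas[steps - t_enc]
```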
I am just removing debugging code and will commit in a sec. @Any-Winter-4079 , please compare with your solution and let me know which one is working better. UPDATE: pushed. The revised code is now in |
I've got discord open at this link, but I'm such a noob I don't know how to rendezvous with you folk. What are your discord names? |
have made contact |
@lstein 7541c7c looks good in general. The washed-out effect is completely removed. Playing with an old commit (2c27e75), I noticed that by playing with the sigmas you can increase/decrease the creativity (the more sigmas you remove from the start, the less creativity it has). But my problem was that I was removing a fixed number of sigmas, regardless of steps and strength. So by removing a dynamic number of sigmas (like in 7541c7c) to fix this problem, we can also obtain nice results:
I'm not saying one option should be preferred over the other (how much variation we should have at a given strength is debatable). Also, something I've seen is that increasing the number of steps by, for example, 20% in 7541c7c won't produce the same results as the original step count.
|
To get to the image obtained with |
I guess it's a matter of preference. For example, with 7541c7c we can't obtain this particular image. So we're basically shifting the strength to start at more or less creativity, which affects how far into creativity we can go when we reach the top of the strength range. |
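To make the two variants being compared concrete, a small sketch (assuming a k-diffusion style `get_sigmas(steps)` helper; names and indexing are illustrative, not the exact code in either commit):

```python
def trimmed_sigmas(model_wrap, steps, strength, fixed_trim=None):
    """Return the portion of the sigma schedule img2img actually runs over."""
    sigmas = model_wrap.get_sigmas(steps)
    if fixed_trim is not None:
        # Fixed trim (the 2c27e75-era experiments): always drop the same number of
        # high-noise sigmas, regardless of steps and strength.
        return sigmas[fixed_trim:]
    # Dynamic trim (my reading of 7541c7c): drop a strength-proportional amount, so
    # higher strength keeps more of the high-noise, more "creative" steps.
    t_enc = int(strength * steps)
    return sigmas[steps - t_enc:]
```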
There's also a |
I would expose sigma_min, sigma_max and rho (mostly relevant for users trying to get good results at a low number of sampler steps, by tactically choosing the range and curve of their schedule). Rho, the deliberate exclusion of sigmas from the schedule (e.g. an increased sigma_min), and the sigmas on which the model trained are all demonstrated in the linked examples. And yes, exposing churn is a good idea. I saw in k-diffusion's CLIP-guided diffusion example that a typical value for churn is 50. One problem with churn at the moment is that k-diffusion doesn't yet provide any way to discretize the sigma_hats that arise after applying churn, so the model is asked to denoise sigmas on which it never trained. I think it'd probably still look relatively good though. |
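For anyone following along, here's a hedged sketch of what exposing those knobs could look like with k-diffusion's existing API; `get_sigmas_karras` and the `s_churn` argument are real, but the toy denoiser is a stand-in so the snippet runs on its own:

```python
import torch
import k_diffusion as K

# Karras-style schedule with the knobs under discussion exposed.
sigmas = K.sampling.get_sigmas_karras(
    n=20,             # sampler steps
    sigma_min=0.03,   # lowest noise level to visit
    sigma_max=14.6,   # highest noise level to visit
    rho=7.0,          # curvature: how steps are distributed between min and max
)

# Stand-in denoiser; in the app this would be the CFG-wrapped CompVisDenoiser
# around the SD model.
def toy_denoiser(x, sigma, **kwargs):
    return x / (1 + sigma[:, None, None, None])

x = torch.randn(1, 4, 64, 64) * sigmas[0]
# s_churn re-injects noise at each step; ~50 is the value mentioned above from
# k-diffusion's CLIP-guided example.
samples = K.sampling.sample_euler(toy_denoiser, x, sigmas, s_churn=50.0)
```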
personally I find the idea of |
Tweaking |
Describe your environment
Describe the bug
I'm trying the new samplers with img2img (klms & keuler), but I always get strange results.
I use the classic command from https://github.com/invoke-ai/InvokeAI/blob/main/docs/features/IMG2IMG.md, just with "-A klms" added, for example.
strength 0.5
strength 0.3
Originals: