
img2img will soon work with all samplers #790

Closed
lstein opened this issue Sep 24, 2022 · 15 comments

Comments

@lstein
Collaborator

lstein commented Sep 24, 2022

Just a note to update everyone who has been waiting on this feature: I've added img2img support for all the samplers to my local repository. This includes all the k* samplers. I will be putting up a PR sometime Sunday after I have thoroughly tested the changes.

Unfortunately I had to make minor code changes to the k-diffusion library in order for this to work, so we will be back to using my fork of the k-diffusion code. However, I will also make a PR against @Birch-san's k-diffusion, which is what we have been using as the upstream.

After confirming the samplers are all working, I'll add the awaited enhancements to inpainting and outpainting, and hopefully deal with the recent new bug reports and pull requests.

@Birch-san

nice! what changes did you need to make? I've been using img2img with k-diffusion samplers just fine, without any changes (except for general changes related to Mac support).

@Birch-san

Birch-san commented Sep 24, 2022

adding img2img support to txt2img, including k-diffusion support, just looked like this (copied from how Deforum Diffusion did it):
Birch-san/stable-diffusion@7b42d40
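For context, that pattern (a sketch reconstructed from the general approach, not the linked commit verbatim; `model`, `init_image`, `cond`, `steps`, and `strength` are assumed inputs) boils down to noising the encoded init image to an intermediate sigma and handing the sampler only the tail of the schedule:

```python
import torch
import k_diffusion as K

model_wrap = K.external.CompVisDenoiser(model)   # k-diffusion wrapper around the LDM
sigmas = model_wrap.get_sigmas(steps)            # full noise schedule, high -> low

t_enc = int(strength * steps)                    # how much of the schedule to run
sigma_sched = sigmas[len(sigmas) - t_enc - 1:]   # tail of the schedule only

init_latent = model.get_first_stage_encoding(model.encode_first_stage(init_image))
x = init_latent + torch.randn_like(init_latent) * sigma_sched[0]  # noise to the start sigma

samples = K.sampling.sample_lms(model_wrap, x, sigma_sched, extra_args={'cond': cond})
```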

@lstein
Collaborator Author

lstein commented Sep 25, 2022

I basically did the same thing, except that I refactored the samplers to inherit from a common base (reducing the code length by about 50%) and got rid of decode(), so that both img2img and txt2img use sample() uniformly.
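The shape of that refactor, as a minimal sketch with hypothetical names (not the actual classes): the base class owns the sampling loop, and each sampler supplies a single update step.

```python
import torch

class Sampler:
    """Hypothetical common base: the loop lives here, once."""
    def sample(self, x, sigmas, cond):
        for i in range(len(sigmas) - 1):
            x = self.step(x, sigmas[i], sigmas[i + 1], cond)
        return x

    def step(self, x, sigma, sigma_next, cond):
        raise NotImplementedError

class EulerSampler(Sampler):
    def __init__(self, denoiser):
        self.denoiser = denoiser

    def step(self, x, sigma, sigma_next, cond):
        denoised = self.denoiser(x, sigma, cond=cond)
        d = (x - denoised) / sigma           # ODE derivative (Karras form)
        return x + d * (sigma_next - sigma)  # explicit Euler step
```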

I also changed the k-diffusion sample_* functions so that I had access to the sample update function (the inner loop), rather than relying on their built-in iteration. This is needed to support inpainting, where the mask is applied to the sample at every iteration. I futzed around with the callback, but it didn't seem to provide a way to alter the sample within the loop.
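The mask-at-every-step requirement looks roughly like this (a sketch, not the actual patch; `p_sample` stands in for the exposed per-step update, and `mask`, `orig_latent`, and `noise` are assumed inputs):

```python
import torch

def masked_sample(p_sample, x, sigmas, cond, mask=None, orig_latent=None, noise=None):
    for i in range(len(sigmas) - 1):
        x = p_sample(x, sigmas[i], sigmas[i + 1], cond)  # one exposed inner-loop update
        if mask is not None:
            # Re-noise the original latent to the current level and blend it
            # back in under the mask, so only the unmasked region evolves.
            noised_orig = orig_latent + noise * sigmas[i + 1]
            x = mask * noised_orig + (1 - mask) * x
    return x
```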

@Birch-san

Birch-san commented Sep 25, 2022

it's worth mentioning this use-case to @crowsonkb; Katherine is considering what methods need to be exposed by the model base classes, and whether there's a hook you need in sampling: that might be relevant input into how the API should be designed.

the tensor decorate() callback parameter I proposed here might be a generalized pattern useful for hooking into and fiddling with particular steps of the sampling algorithms.
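As a rough illustration of that decorate() idea (hypothetical signature, not the actual proposal's API): the sampler threads the latent through a user-supplied function at each step and continues with whatever comes back.

```python
import torch

def sample_euler(denoiser, x, sigmas, cond, decorate=lambda x, i: x):
    # Plain Euler sampler with a hypothetical decorate() hook at each step.
    for i in range(len(sigmas) - 1):
        denoised = denoiser(x, sigmas[i], cond=cond)
        d = (x - denoised) / sigmas[i]           # ODE derivative (Karras form)
        x = x + d * (sigmas[i + 1] - sigmas[i])  # Euler step
        x = decorate(x, i)  # caller may return a modified latent (e.g. apply a mask)
    return x
```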

@lstein
Collaborator Author

lstein commented Sep 25, 2022

I've been considering how to use the callbacks so I don't have to modify Katherine's code. Maybe I'm not understanding it properly. The issue is that the inpainting code I'm using needs to insert itself into the middle of the sampling loop in order to update the image with the mask. So a typical piece of sampling code looks like:

def k_sampler(img, callback, args):
    for i in range(steps):
        # ...lots of math...
        callback({'image': img, 'args': args})
        # ...lots more math...

The sample is in img and it is updated each time through the loop. The loop executes steps times, and the mask code wants to get in there to modify img and pass it back to the next run through the loop. However, I don't think that the callback can modify the variables in the function.
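For what it's worth, k-diffusion's callbacks do receive the live tensor (the payload is a dict along the lines of {'x': x, 'i': i, 'sigma': ..., 'denoised': ...}), and PyTorch tensors are passed by reference, so an in-place write inside the callback does reach the loop. The catch is that the callback fires mid-step, after that step's denoised prediction has already been computed from the unmodified latent. A sketch of the in-place idea (`mask`, `orig_latent`, and `noise` are assumed inputs):

```python
import torch

def make_mask_callback(mask, orig_latent, noise):
    def callback(info):
        x, sigma = info['x'], info['sigma']
        noised_orig = orig_latent + noise * sigma
        # copy_() mutates the sampler's own tensor in place, so the change
        # survives into the remainder of the step.
        x.copy_(mask * noised_orig + (1 - mask) * x)
    return callback
```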

@lstein
Collaborator Author

lstein commented Sep 25, 2022

I've also been having a hard time puzzling out how your k-sampler/img2img integration works. You're making a call to a sampler object's decode() method to run the sampling. I'd like to see how this code talks to the k-sampler functions, but when I look for sampler modules in your distribution's directories, all I can find are ddpm, ddim, and plms.

Meanwhile there's something wrong with my own implementation. The k_* samplers require a large number of steps or a high f value to produce satisfactory results. I've been searching, but I haven't found what I'm doing wrong.

@Birch-san

Birch-san commented Sep 25, 2022

The k-diffusion sampling code is just this:
Birch-san/stable-diffusion@6a74a6a
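For readers not chasing the commit: the integration in these forks is typically a thin module that wraps the LDM in k-diffusion's CompVisDenoiser and folds classifier-free guidance into a single denoiser call. A sketch of that shape (hypothetical names, not the commit verbatim):

```python
import torch
import torch.nn as nn

class CFGDenoiser(nn.Module):
    """Folds classifier-free guidance into one k-diffusion-compatible call."""
    def __init__(self, model):
        super().__init__()
        self.inner_model = model  # e.g. a CompVisDenoiser-wrapped LDM

    def forward(self, x, sigma, uncond, cond, cond_scale):
        x_in = torch.cat([x] * 2)          # batch cond and uncond together
        sigma_in = torch.cat([sigma] * 2)
        cond_in = torch.cat([uncond, cond])
        uncond_out, cond_out = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
        return uncond_out + (cond_out - uncond_out) * cond_scale
```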

@Birch-san

Birch-san commented Sep 25, 2022

Heun can get good results in a few steps (e.g. 7):
https://twitter.com/Birchlabs/status/1564792349221330944

The key is to use the noise schedule that Karras et al. proposed in the same paper (k-diffusion's get_sigmas_karras() function), discretize the sigmas, and set the sigma_min slightly above the minimum supported by the model:
crowsonkb/k-diffusion#23

Here's an exploration of the impact of which sigmas you sample from:
https://twitter.com/Birchlabs/status/1565114066548527104?s=20&t=axLeSaZV28s5m104_vMdxQ
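Putting those recommendations together might look like this (a sketch under stated assumptions: `model` is the LDM, `cond` the conditioning, and the 1.1 nudge factor is purely illustrative):

```python
import torch
import k_diffusion as K

model_wrap = K.external.CompVisDenoiser(model, quantize=True)  # snap to trained sigmas
sigma_min = model_wrap.sigmas[0].item()    # smallest sigma the model was trained on
sigma_max = model_wrap.sigmas[-1].item()

# Karras et al. schedule; nudge sigma_min slightly above the model's floor.
sigmas = K.sampling.get_sigmas_karras(n=7, sigma_min=sigma_min * 1.1,
                                      sigma_max=sigma_max, rho=7., device='cuda')
x = torch.randn([1, 4, 64, 64], device='cuda') * sigmas[0]
samples = K.sampling.sample_heun(model_wrap, x, sigmas, extra_args={'cond': cond})
```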

@lstein
Collaborator Author

lstein commented Sep 27, 2022

This is hugely helpful. Just to be clear: the CompVisDenoiser() object in k-diffusion's external.py module comes with a get_sigmas() method. To use the Karras sigmas, I should ignore what get_sigmas() gives me and use the discretized sigmas from get_sigmas_karras() instead?

Given that I'm motivated to do this for img2img, will making this change have a bad effect on txt2img?

@tildebyte
Contributor

I'd definitely like to be able to test it out with txt2img (I've been following Birch-san's discussion of the Karras noise schedule with great interest).

@Birch-san

Birch-san commented Sep 28, 2022

@lstein yes, a key finding of Karras et al. (2022) was that we can decouple the noise schedule from the model. that's why we stop using a method of the model wrapper, and use a schedule that is constructed more or less without knowledge of the model at all.

neither approach will inherently give you discretized sigmas. but if you construct your CompVisDenoiser with quantize=True -- and are sure not to change the sampler's default of churn=0 -- then you'll get discretized sigmas, yes.

if you want to play around with increasing churn (i.e. to inject noise, to make it more "creative" / varied -- similar to the effect you observe with Euler ancestral), then you would need to wait for crowsonkb/k-diffusion#23 to be sure the sigmas are still discretized.

there would be no bad effect for txt2img, only positive effects. this basically just implements the state-of-the-art sampling recommendations from that Karras paper.
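The churn knob mentioned above is the s_churn parameter on k-diffusion's Karras samplers. Reusing the names from the earlier sketch (and an illustrative churn value, not a recommendation):

```python
import k_diffusion as K

# s_churn=0. (the default) keeps every sampled sigma on the trained grid when
# the denoiser was built with quantize=True; a positive value injects fresh
# noise each step (more varied output), but the perturbed sigmas are then no
# longer guaranteed to be discretized until the linked fix lands.
samples = K.sampling.sample_heun(model_wrap, x, sigmas,
                                 extra_args={'cond': cond}, s_churn=40.)
```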

@tildebyte
Contributor

> if you want to play around with increasing churn (i.e. to inject noise, to make it more "creative" / varied

Yes, please 🤣

@lstein
Collaborator Author

lstein commented Sep 28, 2022 via email

@lstein
Collaborator Author

lstein commented Oct 1, 2022

Well, the Karras noise schedule seems to work very well. The k-samplers are doing great for img2img. @Birch-san thank you for the help! I found your code particularly helpful.

One thing, however, that I found puzzling: in your repository's call to K.sampling.get_sigmas_karras(), you include a keyword argument of concat_zero. However, I don't see this keyword in the method signature in either your or @crowsonkb's k-diffusion repositories. Does this append an extra zero to the Karras noise array, and if so, why is that necessary?

@Birch-san

concat_zero=True is equivalent to the default k-diffusion behaviour; I exposed that to provide a way to not end the sigma ramp with a zero.

See the first post in this thread for what concat_zero did and why it appeared to help:
crowsonkb/k-diffusion#23

I thought "if discretization means we're only allowed to sample from sigmas the model was trained on: we shouldn't include 0 in our schedule, since the model never trained on that".

So I ramped down to sigma_min and stopped there instead of continuing to 0. It gave way better results (for a small number of sample steps), but mostly because it has the effect of making the ramp longer, which means sampling from more small sigmas.
Katherine pointed out that it was mathematically wrong to exclude zero (it means you don't fully denoise the image), and that the significant part of what I did was equivalent to picking a higher sigma_min (instead of going down to the smallest sigma the model supports).
I also checked the paper more closely, and zero is a special case (never actually sampled from), so it doesn't matter that it's not a sigma the model trained on.

So yeah, I removed concat_zero=False and instead provided a function to compute the same sigma_min that it gave (effectively the penultimate sigma from a noise ramp one longer than your current one, going down to the model's smallest sigma).
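Reconstructed from that description (a sketch, not the actual function; the name is hypothetical):

```python
import k_diffusion as K

def equivalent_sigma_min(n, sigma_min_model, sigma_max, rho=7.):
    """Penultimate sigma of a Karras ramp one step longer than yours that
    descends all the way to the model's smallest trained sigma."""
    # get_sigmas_karras returns n ramp values plus a trailing zero.
    sigmas = K.sampling.get_sigmas_karras(n + 1, sigma_min_model, sigma_max, rho=rho)
    return sigmas[-3].item()  # [-1] is the appended 0, [-2] is sigma_min_model
```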

lstein closed this as completed on Oct 12, 2022