
Something wrong with img2img samplers #889

Closed
netsvetaev opened this issue Oct 2, 2022 · 73 comments
Labels
enhancement New feature or request

Comments

@netsvetaev
Contributor

netsvetaev commented Oct 2, 2022

Describe your environment

  • mbp m1pro
  • Branch: development

Describe the bug
I’m trying the new samplers with img2img (klms & keuler), but I always get strange results.
I use a classic command from https://github.com/invoke-ai/InvokeAI/blob/main/docs/features/IMG2IMG.md, just with «-A klms» for example.

strength 0.5
000024 983454061

strength 0.3
000025 313777670
000026 2372658446

Originals:
1
2

@lstein
Collaborator

lstein commented Oct 2, 2022

I just confirmed that img2img with the plms and k* samplers is working as expected on a CUDA system. This may be an M1-specific issue, although I'd be surprised if this was the case. Could you post the original images and the exact prompts you used?

lstein added the bug (Something isn't working) label Oct 3, 2022
@psychedelicious
Collaborator

psychedelicious commented Oct 3, 2022

I'm getting the same weird washed out and blurry images at lower img2img strengths. Also on M1.

@netsvetaev 's original images are the last two images they posted btw.

Here's my init image:
initimage

And here's k_lms, k_euler, and ddim each at 32 steps and -f 0.25, 0.5, 0.75.

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_euler -f 0.25
000196 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_euler -f 0.5
000197 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_euler -f 0.75
000195 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_lms -f 0.25
000200 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_lms -f 0.5
000199 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_lms -f 0.75
000198 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A ddim -f 0.25
000201 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A ddim -f 0.5
000202 12345

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A ddim -f 0.75
000203 12345

@ranhalprin

ranhalprin commented Oct 3, 2022

I notice that the sampler only runs for one step (regardless of input); this is probably the reason for the blurry results.

@psychedelicious
Collaborator

psychedelicious commented Oct 3, 2022

I notice that the sampler only runs for one step (regardless of input); this is probably the reason for the blurry results.

I'm not sure that's accurate - I get very different results when I specify 1 step vs any other number:

1 step "photograph of a tree on a hill with a river" -s 1 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_euler -f 0.25
000212 12345

32 steps "photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A k_euler -f 0.25
000211 12345

It's possible you are using a low number of steps - with img2img, the number of actual steps is a function of strength. So if you specify 32 steps at strength 0.25, you get floor((32 - 1) * 0.25) = 7 steps. The result is that for certain combinations of low step count and strength, it only does one step.
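
As a minimal sketch of that calculation (assuming the floor((steps - 1) * strength) formula described above; the exact expression in the codebase may differ slightly):

import math

def actual_img2img_steps(steps: int, strength: float) -> int:
    # Number of denoising steps img2img actually runs for a given -s / -f pair.
    return math.floor((steps - 1) * strength)

print(actual_img2img_steps(32, 0.25))  # 7
print(actual_img2img_steps(32, 0.75))  # 23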

@lstein
Collaborator

lstein commented Oct 4, 2022

This seems to be an MPS-related bug. I don’t have a Mac to test on, but I’ll investigate any suspicious “if MPS” statements. Maybe @mh-dm or @Any-Winter-4079 could have a look?

@mh-dm
Contributor

mh-dm commented Oct 4, 2022

I don't have an MPS device to test on.

@psychedelicious
Collaborator

psychedelicious commented Oct 4, 2022

@lstein I'm not convinced it is MPS-related. I have the exact same behavior after forcing my torch device to be cpu:

in ldm/dream/devices.py:

def choose_torch_device() -> str:
    '''Convenience routine for guessing which GPU device to run model on'''
    # if torch.cuda.is_available():
    #     return 'cuda'
    # if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    #     return 'mps'
    return 'cpu'

in ldm/generate.py:

def fix_func(orig):
    # if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    #     def new_func(*args, **kw):
    #         device = kw.get("device", "mps")
    #         kw["device"]="cpu"
    #         return orig(*args, **kw).to(device)
    #     return new_func
    return orig

Each step now takes about 4x as long, and there are no messages about mismatched torch devices, so I think I've forced cpu successfully...

@damian0815
Contributor

I have been writing some documentation for img2img, which involved visualizing the latent space, and I noticed that the actual step count scales with strength: if you request 100 steps with a strength of 0.3, SD will only actually do 30 steps of inference; with a strength of 0.6 it will do 60. Not sure if this is intended behaviour, a bug, or an oversight in the UI design, but since it seems to be surprising I'd suggest adjusting the step count to compensate for strength when doing img2img.

@psychedelicious
Collaborator

Ok, there is something wrong on a deeper level. I have reverted the force-cpu changes to my local branch and run some tests with strength 0.75 and 0.85, each running from steps 15 to ~40. Sampler DDIM.

My REPL command:

"photograph of a tree on a hill with a river" -S 12345 -W 512 -H 512 -C 7.5 -I /Users/spencer/Documents/Code/initimage.png -A ddim -f {strength} -s {steps}

Using this same init image:
initimage

Strength 0.75 images (named the zip f7.5 but it's f0.75):
f7.5.zip

Strength 0.85 images (named the zip f8.5 but it's f0.85):
f8.5.zip

Images are prefixed by the steps count entered in the REPL (not the actual steps).

Have a scroll through those and you will notice that there are some interesting patterns. There seem to be 2 types of result which I'll call the "common" type - very close to the init image - and the "occasional" type - which diverges more. You get 4 or 5 "common" types, then a couple "occasional" types, then 4 or 5 "common" types, then a couple "occasional" types...

It's far more obvious on the f0.85 images which are "common" and which are "occasional".

Also, it seems like the more steps, the closer the "common" images are to the init image. I would have expected the images to converge but not on something very much like the original...

Here's f0.85 s200:
000220 12345

And here's f0.85 s201:
000221 12345

That ain't right!

@damian0815
Contributor

@psychedelicious you might want to try my branch document-img2img which spits out pngs for each intermediate step. Check the outputs/img_samples/intermediates folder; it may help you understand what's going on (this is what I'm documenting!)

@damian0815
Contributor

damian0815 commented Oct 4, 2022

This is actually a subtle thing that I haven't seen any of the SD GUIs communicate well. Because of the way the SD algorithm works (actually diffusion algorithms more generally), doing steps=50 is not equivalent to doing steps=49 and then feeding the 49th image in for "one more" step. If the step count is different, then the amount of denoising that happens at each step is different, all the way back to the first step. So your difference between s200 and s201 could actually be emerging from a change that happened on step 2 or 3 - two pixels get interpreted slightly differently, and then 200 iterations later your spindly tree has become a bushy tree somewhere else.

The best way to get a handle on this, I found, was to try low step counts and look at the latents at each step. If there's interest I can clean up my intermediate writer and submit it as a feature...?
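
A hedged illustration of the point above: the noise schedule is re-spaced over the whole run, so changing the total step count changes the noise level at every step (a generic linear spacing is shown here, not the exact schedule of any particular sampler):

import numpy as np

def timesteps(total_steps: int, t_max: int = 1000) -> np.ndarray:
    # Evenly spaced noise levels from t_max down to 0 for a given run length.
    return np.linspace(t_max, 0, total_steps)

print(timesteps(5))  # [1000.  750.  500.  250.    0.]
print(timesteps(6))  # [1000.  800.  600.  400.  200.    0.]
# Apart from the endpoints, no step of the 5-step run reappears in the 6-step run,
# so the two trajectories diverge from the earliest iterations.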

@psychedelicious
Collaborator

@damian0815 I don't really understand what is going on internally - thanks for explaining that. Do I understand you correctly that the patterns I am seeing are to be expected?

I checked out your branch, but the import from scripts.modules.preview_decoder import ApproximateDecoder fails - that module seems to be missing.

I discovered by trial and error that, in the step_callback, calling sample_to_image(sample) doesn't give you the intermediate image. Would love to have access to your intermediate writer. Thanks.

@damian0815
Contributor

damian0815 commented Oct 4, 2022

Yep, that's the problem - see how we've already diverged enough to notice with our eyes alone after about the 3rd or 4th step. Top is 20 steps (-s 30 -f 0.7), bottom is 19 (-s 29 -f 0.7).
000046 steps gravity
000045 steps gravity

@psychedelicious I've fixed the issue on my branch - if you fetch again you can add --write_intermediates to the end of your dream> prompt and it will spit out all the latent steps to an intermediates folder under outputs/img_samples. Please let me know how you get on with it!

And I'd love to hear your comments on my updated img2img docs! :)

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 4, 2022

@psychedelicious @lstein
I can reproduce this
Input image
002109 514644559
"a painting of * in the style of van gogh" -s 50 -S 514644559 -W 512 -H 512 -C 7.5 -I outputs/img-samples/002109.514644559.png -A ddim -f 0.05
Before:
Screenshot 2022-10-04 at 19 47 45
Now:
Screenshot 2022-10-04 at 19 49 32

However, with higher strength:
"a painting of * :1.8 in the style of van gogh" -s 50 -S 514644559 -W 512 -H 512 -C 7.5 -I outputs/img-samples/002109.514644559.png -A ddim -f 0.75
Before:
Screenshot 2022-10-04 at 19 57 01

Now:
Screenshot 2022-10-04 at 19 56 16

A lot closer / identical

@Any-Winter-4079
Contributor

@damian0815 awesome guide! It would be great to see it merged.

@Any-Winter-4079
Contributor

@lstein a similar problem has been reported on CUDA, it seems: #898
It may not be M1-specific.

@psychedelicious
Collaborator

Thanks @Any-Winter-4079 , I closed @hipsterusername's issue without noticing it was on a different architecture.

I think something got misplaced in the shuffle during @lstein 's recent changes...

@Any-Winter-4079
Contributor

Original:
Screenshot 2022-10-05 at 00 29 46

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f0.25
Now:
Screenshot 2022-10-05 at 00 29 22
Before:
Screenshot 2022-10-05 at 00 30 32

The color is a bit washed-out as you say, but I think it's gotten much better.

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 4, 2022

This also produces the same result after #925 (btw, not sure how this affected ddim?)
"a painting of * in the style of van gogh" -s 50 -S 514644559 -W 512 -H 512 -C 7.5 -I outputs/img-samples/002109.514644559.png -A ddim -f 0.05
Now:
Screenshot 2022-10-05 at 00 53 39

Before:
Screenshot 2022-10-05 at 00 52 41

So it might be fixed now?

@Any-Winter-4079
Contributor

@netsvetaev @damian0815 I leave it to you to test / check whether it works on your end, or whether you still see some issue.

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 4, 2022

Oh, also, about @psychedelicious 's finding, which may be an additional issue:

"photograph of a tree on a hill with a river" -s 200 -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A ddim -f 0.85
Screenshot 2022-10-05 at 00 58 48

"photograph of a tree on a hill with a river" -s 201 -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A ddim -f 0.85
Screenshot 2022-10-05 at 01 00 56

I can still reproduce this.

@lstein
Collaborator

lstein commented Oct 5, 2022

Using my CUDA system I've just compared the output of the img2img ddim sampler between the current code and a version from September 15, long before I made any changes to the samplers. The results are identical. I don't see any blurriness or color degradation:

"photograph of a tree on a hill with a river" -S 12345 -W 512 -H 512 -C 7.5 -I./test-pictures/193946000-c42a96d8-5a74-4f8a-b4c3-5213e6cadcce.png -Addim -f 0.85
000025 12345

I'm not expecting the images to be the same between M1 and CUDA. However, I find it alarming that I'm not getting anything that looks like what the prompt is asking for. This is something that I've noticed in passing with smaller init images on a couple of occasions. To check this out, I rescaled the init image to 512x512, applied the same prompt and other parameters, and voila!

000031 2369689915

Upping the strength and CFG to 0.8 and 15.0 respectively gives me this:
000034 3522133100

Then tweaking the prompt a bit gives something more photorealistic:

"tree on a hill with a river, nature photograph, national geographic" -I ./test-pictures/tree-and-river.png -A ddim -f 0.8 -C15
000035 1695676082

So in summary, we've got multiple bugs:

  1. On M1 systems, the images are getting washed out. Would you please check out e601163 and run the generations again to see if this is a regression that has happened recently?
  2. On all systems, img2img is not working on images smaller than 512x512.

@lstein
Collaborator

lstein commented Oct 5, 2022

This also produces the same result after #925 (btw, not sure how this affected ddim?) "a painting of * in the style of van gogh" -s 50 -S 514644559 -W 512 -H 512 -C 7.5 -I outputs/img-samples/002109.514644559.png -A ddim -f 0.05 Now: Screenshot 2022-10-05 at 00 53 39

Before: Screenshot 2022-10-05 at 00 52 41

So it might be fixed now?

Just to confirm, the washed-out issue is affecting the DDIM sampler? I did refactor a large amount of common code shared by ddim and plms, so it's possible I broke something in a way that only manifests on M1. Has anyone tried a bisect to track down the offending commit?

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 5, 2022

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A ddim -f0.25
DDIM
Screenshot 2022-10-05 at 21 17 46

"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f0.25
K samplers
Screenshot 2022-10-05 at 21 17 22

DDIM is (now) fine.

@Any-Winter-4079
Contributor

@lstein e601163
"photograph of a tree on a hill with a river" -s 32 -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f0.25
K samplers
Screenshot 2022-10-05 at 21 39 41

@lstein
Collaborator

lstein commented Oct 5, 2022

Now that's very odd. The e601163 commit was from before I added support for the k* samplers. So presumably you generated a DDIM image here. I'd proposed the test in order to see if there was a regression on ddim.

I'm away from my system now, but as soon as I'm back I'll see if I can reproduce the color distortion on CUDA.

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 5, 2022

Oh, well DDIM seems fine now in terms of washed-out effects (see #889 (comment) at -f0.25). No need to compare with old commits. For the K sampler, it may very well have defaulted to DDIM when I used the e601163 commit and I may have missed it. Let me check again.

Update: yes, it reports "sampler 'KSampler' is not yet supported. Using DDIM sampler".

@lstein
Collaborator

lstein commented Oct 5, 2022

It seems to be an M1 problem. On CUDA, here is what I get with k_lms, k_euler_a, k_euler, and k_heun respectively. I don't have any earlier k* img2img images to compare to, but they look pretty bright to me.

000038 1285602007

000039 4135135148

000041 2044935010
000040 703542737

For comparison, here is the plms image:

000042 49222529

@lstein
Collaborator

lstein commented Oct 6, 2022

I'm just about at the limit of my understanding here (out of my depth, to be honest), but my understanding is that each of the sigmas represents the amount of noise to inject and denoise at each step. We can get away with removing the 0.0 at the end; this is just a placeholder that prevents an index error at the very last step (there is a call to sigmas[i+1] in each of the samplers). However, truncating sigmas more deeply is not a good option because it will leave us with a noisy image.

My current hypothesis is that there is something amiss with the step that occurs just before the noising/denoising loop starts. In this step, the latent image is noised with the value of sigmas at the first step:

if x_T is not None:
    x = x_T * sigmas[0]

(Here x_T is the latent image, and x is the noised latent image that will get passed to the denoising loop. sigmas[0] is the first element of the truncated sigma schedule, which starts at the strength-specified intermediate step.)

If the wrong sigma index is being applied at this step, this would explain the behavior we're seeing. I briefly experimented with varying this step, but haven't explored it exhaustively. What bugs me is that this works fine in the plms and ddim samplers, so why should it change?
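
For reference, a minimal sketch (not the actual InvokeAI code; names are illustrative) contrasting the suspect pre-loop step above with the add-noise initialisation used in the commit referenced later in this thread (2c27e75), where fresh noise scaled by the first truncated sigma is added to the init latent:

import torch

def noise_init_latent(init_latent: torch.Tensor, sigmas: torch.Tensor) -> torch.Tensor:
    # Start img2img by adding noise at the first sigma of the truncated schedule.
    noise = torch.randn_like(init_latent)
    return init_latent + noise * sigmas[0]

# The variant under suspicion above instead scales the latent itself:
#     x = x_T * sigmas[0]
# which changes the init image's contribution rather than adding noise to it.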

@Birch-san

Birch-san commented Oct 6, 2022

@lstein yes, img2img works fine with k-diffusion in my fork.
https://twitter.com/Birchlabs/status/1566557708089712641

Birch-san/stable-diffusion@7b42d40

low strengths work (they don't go blurry like yours), but so many high-sigma denoising steps are employed that very little of the original latents survive.

strength 0.3 is the lowest I've ever gotten coherent results from:
grid-0216 s3947343483_crystal maiden__str0 3_sca10 0heun10_kns_dcrt_nz

@lstein
Collaborator

lstein commented Oct 6, 2022

@Birch-san THANK YOU! When I reviewed your code I found the place I was going wrong. As I suspected, it was the step of adding noise to the init_latent.

I don't know what part of the world you live in, but if you're ever in Toronto swing by and I'll buy you a beer or three.

@lstein
Collaborator

lstein commented Oct 6, 2022

I just committed the fix to development. This uses the model's sigmas. I will see what results I get with Karras now.

@lstein
Collaborator

lstein commented Oct 6, 2022

Ah, I may have spoken too soon. There's too much noise being added at higher strengths and the image is replaced completely after about -f0.5. Anyway, it's on the right track.

I've reverted and will continue to explore the problem.

@Birch-san

I'm in the UK. always up for a beer 🍻

@lstein
Collaborator

lstein commented Oct 6, 2022

I'm in the UK. always up for a beer 🍻

I'll be in Cambridge this February. Are you in London?

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 6, 2022

On top of

x = torch.randn([batch_size, *shape], device=self.device) * sigmas[0]
if x_T is not None:
    x = x_T + x

I tried using the full sigmas (doing as many steps as -s)
"photograph of a tree on a hill with a river" -s 50 -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f 0.25
but then the image
Screenshot 2022-10-06 at 22 53 16
differs a bit too much from the original.
Screenshot 2022-10-06 at 22 54 33
Note we are at -f0.25

Then, reading

sigmas is a full steps in length, but t_enc might
be less. We start in the middle of the sigma array
and work our way to the end after t_enc steps.

in p_sample made me think about sigmas in sample.
What if we only took the last -f * -s sigmas?
"photograph of a tree on a hill with a river" -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f0.25 -s40
Screenshot 2022-10-06 at 23 01 30
"photograph of a tree on a hill with a river" -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f0.8 -s40
Screenshot 2022-10-06 at 22 59 18

Not saying this is the answer, but it's the first time I'm getting somewhat good results (it's the first time that I recall on Mac that the output is not noisy or washed out).

Edit: actually, double-checking, I was leaving out the same sigmas in the last 2 images (removing the first 9 sigmas), not keeping the last -s * -f sigmas.
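
A minimal sketch of the tail-slicing experiment described in this comment (full_sigmas stands in for the sampler's precomputed schedule; names are illustrative, not the actual InvokeAI code):

import torch

def truncated_sigmas(full_sigmas: torch.Tensor, steps: int, strength: float) -> torch.Tensor:
    # Keep only the last strength * steps sigmas plus the trailing 0.0,
    # so img2img starts part-way through the denoising schedule.
    t_enc = int(steps * strength)
    return full_sigmas[-(t_enc + 1):]

full_sigmas = torch.linspace(14.6, 0.0, 41)  # placeholder 40-step schedule
print(truncated_sigmas(full_sigmas, steps=40, strength=0.25))  # 11 values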

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 6, 2022

Another example, at -f0.5
"photograph of a tree on a hill with a river" -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f0.5 -s40
With sigmas = self.sigmas[-S-1:]
Screenshot 2022-10-06 at 23 18 54

With sigmas = self.sigmas[9:]
Screenshot 2022-10-06 at 23 21 38

All of this on top of having

x = torch.randn([batch_size, *shape], device=self.device) * sigmas[0]
if x_T is not None:
    x = x_T + x

from 2c27e75

Without that code:
Screenshot 2022-10-06 at 23 23 53
it's back to washed-out.

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 6, 2022

And here's another example, -f0.35
Screenshot 2022-10-06 at 23 30 51

I'm close to calling it a success for me. The only thing is, there is probably a better formula than leaving out the first 9 sigmas (which might be a bit too creative in terms of resulting outputs).

I'll try to compare this too with @Birch-san 's behavior for the same original image.
193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png.zip

@lstein let me know if you experience the same on CUDA and whether the washed-out effect is gone with this (and if this is a bug, I hope the experiments help you figure it out).

@Birch-san

@lstein we can discuss on Discord -- are you on the LAION server?

@hipsterusername
Member

@Birch-san - Invoke's Discord >> https://discord.gg/ZmtBAhwWhy

@lstein
Collaborator

lstein commented Oct 6, 2022

I got it working. The key was to remove the stochastic_encode() call. I will be committing in a minute.

@lstein
Collaborator

lstein commented Oct 6, 2022

"photograph of a tree on a hill with a river" -s 40 -S 12345 -W 512 -H 512 -C 7.5 -I ./test-pictures/river-and-mountain.png -A k_euler
-f0.3
000232 12345

-f0.75
000230 12345

-f0.9
000231 12345

@lstein
Collaborator

lstein commented Oct 6, 2022

I am just removing debugging code and will commit in a sec. @Any-Winter-4079 , please compare with your solution and let me know which one is working better.

UPDATE: pushed. The revised code is now in development and release-candidate-2

@lstein
Collaborator

lstein commented Oct 6, 2022

@Birch-san - Invokes discord >> https://discord.gg/ZmtBAhwWhy

I've got Discord open at this link, but I'm such a noob I don't know how to rendezvous with you folks. What are your Discord names?

@Birch-san

have made contact

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 7, 2022

@lstein 7541c7c looks good in general. The washed-out effect is completely removed.

While experimenting with an old commit (2c27e75) I noticed that by playing with sigmas you can increase/decrease the creativity (the more sigmas you remove from the start, the less creativity it has).
image

But my problem was that I was removing a fixed number of sigmas, regardless of steps and strength. So, by removing a dynamic number of sigmas (as in 7541c7c) to fix this problem, we can also obtain nice results:

sigmas = self.karras_sigmas[-S-1:] from 7541c7c vs. sigmas = self.karras_sigmas[-int(1.1*S)-1:] (basically removing a few fewer sigmas) vs. sigmas = self.karras_sigmas[-int(1.2*S)-1:]
image

I'm not saying one option should be preferred over the other (how much variation we should have at -f0.6, -f0.75... I'd say is a bit of a personal choice/preference).

Also, something I've seen: increasing the number of steps by, for example, 20% in 7541c7c won't produce the same results as
sigmas = self.karras_sigmas[-int(1.2*S)-1:], despite both doing 28 true steps, because the sigmas are not the same.

sigmas = self.karras_sigmas[-S-1:] from 7541c7c -> 29 sigmas

tensor([1.9426, 1.7265, 1.5313, 1.3554, 1.1971, 1.0549, 0.9274, 0.8134, 0.7116,
        0.6209, 0.5403, 0.4689, 0.4057, 0.3499, 0.3009, 0.2578, 0.2202, 0.1874,
        0.1588, 0.1341, 0.1127, 0.0944, 0.0786, 0.0652, 0.0538, 0.0441, 0.0360,
        0.0292, 0.0000], device='mps:0')

sigmas = self.karras_sigmas[-int(1.2*S)-1:] -> 29 sigmas

tensor([3.6092, 3.1686, 2.7749, 2.4239, 2.1117, 1.8346, 1.5892, 1.3726, 1.1817,
        1.0141, 0.8673, 0.7391, 0.6275, 0.5307, 0.4469, 0.3748, 0.3129, 0.2599,
        0.2148, 0.1767, 0.1444, 0.1174, 0.0948, 0.0761, 0.0606, 0.0479, 0.0375,
        0.0292, 0.0000], device='mps:0')

image
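
To make the comparison above concrete, a small sketch of the three slicing variants (karras_sigmas stands in for self.karras_sigmas; S is the number of true steps; names are illustrative):

def sliced(karras_sigmas, S, factor=1.0):
    # Larger factors keep more of the schedule, so denoising starts at a higher sigma.
    return karras_sigmas[-int(factor * S) - 1:]

# sliced(karras_sigmas, S)        -> starts around sigma 1.94 (per the first tensor above)
# sliced(karras_sigmas, S, 1.2)   -> starts around sigma 3.61, i.e. more "creativity"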

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 7, 2022

To get to the image obtained with sigmas = self.karras_sigmas[-int(1.2*S)-1:] using 7541c7c you need to increase strength by 20% (not steps).
"photograph of a tree on a hill with a river" -S 12345 -W 512 -H 512 -C 7.5 -I 193515615-921b5f80-a2f3-4351-9459-b8e447ea765d.png -A k_euler -f0.72 -s40
Screenshot 2022-10-07 at 14 09 42

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 7, 2022

I guess it's a matter of preference. For example, with 7541c7c we can't obtain this image from sigmas = self.karras_sigmas[-int(1.2*S)-1:] at -f0.9
Screenshot 2022-10-07 at 14 14 49
Because we'd need 0.9 strength * 1.2 = 1.08, which exceeds the maximum strength of 1.
But for example we can obtain this image from sigmas = self.karras_sigmas[-int(1.1*S)-1:] at -f0.9
Screenshot 2022-10-07 at 14 23 38

by setting strength to 0.99
Screenshot 2022-10-07 at 14 25 17

So we're basically shifting where strength starts on the creativity scale, which affects how far into creativity we can go when we reach -f1.

@lstein
Collaborator

lstein commented Oct 7, 2022

There's also a churn variable in the sampling algorithms which increases stochasticity at each step. It's been set to zero, but I think you get more "creativity" for positive values. The major question is what options do we expose to the user?

Any-Winter-4079 added the enhancement (New feature or request) label and removed the bug (Something isn't working) label Oct 7, 2022
@Birch-san

I would expose sigma_min, sigma_max and rho (mostly relevant for users trying to get good results at a low number of sampler steps, by tactically choosing the range and curve of their schedule).

rho explained here:
https://twitter.com/Birchlabs/status/1576705558177935361

deliberate exclusion of sigmas from the schedule (e.g. increased sigma_min) demonstrated here:
crowsonkb/k-diffusion#23

The sigmas on which the model trained (model.sigmas) are known. Not sure what UI element would be appropriate for restricting your choice to these:
https://gist.github.com/Birch-san/6cd1574e51871a5e2b88d59f0f3d4fd3

and yes, exposing churn is a good idea. I saw in k-diffusion's clip-guided diffusion example that a typical value for churn is 50.

one problem with churn at the moment is that k-diffusion doesn't yet provide any way to discretize the sigma_hats that arise after applying churn, so the model is asked to denoise sigmas on which it never trained. I think it'd probably still look relatively good though.
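
For a concrete sense of those knobs, here is a small sketch using k-diffusion's Karras schedule helper (get_sigmas_karras is part of crowsonkb/k-diffusion; the specific sigma_min/sigma_max values and how this would be wired into InvokeAI's options are assumptions):

from k_diffusion.sampling import get_sigmas_karras

# A 20-step schedule; raising sigma_min or rho reshapes where the steps land.
sigmas = get_sigmas_karras(n=20, sigma_min=0.03, sigma_max=14.6, rho=7.0, device='cpu')
print(sigmas)  # 21 values descending from ~14.6 to sigma_min, plus a trailing 0.0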

@Birch-san

Personally I find the idea of strength a little hard to predict; I'd prefer something like a sigma cutoff. So you say "make a 20-step noise schedule, keep only the sigmas higher than 4.0300". You'd pick sigmas from here:
https://gist.github.com/Birch-san/6cd1574e51871a5e2b88d59f0f3d4fd3
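
As a literal sketch of that suggestion (assuming a precomputed descending schedule; the samplers expect a trailing 0.0, which is re-appended here; names are illustrative):

import torch

def sigmas_above(sigmas: torch.Tensor, cutoff: float) -> torch.Tensor:
    # Keep only the sigmas higher than the user-chosen cutoff, then re-append the trailing 0.0.
    kept = sigmas[sigmas > cutoff]
    return torch.cat([kept, sigmas.new_zeros(1)])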

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Oct 8, 2022

Tweaking sigma_min was interesting and definitely seemed to make a difference. https://github.com/invoke-ai/InvokeAI/discussions/914#discussioncomment-3800884
Which reminds me, I still need to test sigma_min for more prompts. Maybe I'll do another document similar to https://github.com/invoke-ai/InvokeAI/blob/development/docs/help/SAMPLER_CONVERGENCE.md to share the results.

lstein closed this as completed in 2c27e75 Oct 10, 2022
austinbrown34 pushed a commit to cognidesign/InvokeAI that referenced this issue Dec 30, 2022