@mchaker
Last active July 28, 2024 23:57
stable-diffusion-links: useful optimizations
@mchaker
Author

mchaker commented Sep 6, 2022

Maybe you want to add basujindal/stable-diffusion#122 I made a pull request to improve upon neonsecret's

Adding that to the list, thanks! :)

@ilikenwf

ilikenwf commented Sep 7, 2022

Maybe you want to add basujindal/stable-diffusion#122 I made a pull request to improve upon neonsecret's

Do you think you could come up with a version of it that works with Doggettx's optimizations?

@ilikenwf

ilikenwf commented Sep 7, 2022

This one is also interesting CompVis/stable-diffusion#142

@ilikenwf

ilikenwf commented Sep 7, 2022

In addition, it may make sense with both @mchaker and the Doggettx mods to have some kind of memory threshold or image dimension size based on GPU VRAM be used to determine if "slow and steady" mode kicks in or not?

@y0himba

y0himba commented Sep 7, 2022

I am new to using git. How do I implement the Doggettx optimizations or the basujindal?

@extesy

extesy commented Sep 7, 2022

CompVis/stable-diffusion#177 worked very well for me

@mchaker
Author

mchaker commented Sep 7, 2022

This one is also interesting CompVis/stable-diffusion#142

This seems to use the CPU -- what stands out about it to you, @ilikenwf?

@mchaker
Author

mchaker commented Sep 7, 2022

In addition, it may make sense with both @mchaker and the Doggettx mods to have some kind of memory threshold or image dimension size based on GPU VRAM be used to determine if "slow and steady" mode kicks in or not?

I think that is how the Doggettx optimizations work
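To make the threshold idea concrete, here is a rough sketch of how one might decide, from image size and free VRAM, whether attention needs to be split. This is not the Doggettx code. The function names are hypothetical, and the constants (8 attention heads, an 8x latent downscale, fp32 values, a batch of 2) are assumptions for illustration only.

```python
# Hypothetical sketch: pick how many attention slices are needed so that
# the largest attention score matrix fits in the free VRAM budget.
# All default values below are assumptions, not taken from any linked PR.

def attention_bytes(width, height, heads=8, batch=2, bytes_per_el=4, downscale=8):
    """Estimate bytes for the largest self-attention score matrix."""
    tokens = (width // downscale) * (height // downscale)
    return batch * heads * tokens * tokens * bytes_per_el


def pick_slices(width, height, free_bytes, safety=0.8):
    """Return 1 for the fast path, or more for a 'slow and steady' split."""
    budget = int(free_bytes * safety)  # leave headroom for everything else
    if budget <= 0:
        raise ValueError("no free memory to work with")
    need = attention_bytes(width, height)
    return max(1, -(-need // budget))  # integer ceiling division


if __name__ == "__main__":
    free = 8 * 2**30  # pretend 8 GiB is free; a real script would query the GPU
    for size in [(512, 512), (1024, 640), (1024, 1024)]:
        print(size, "->", pick_slices(*size, free), "slice(s)")
```

With one slice nothing changes and generation runs at full speed; a higher count trades speed for lower peak memory.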

I am new to using git. How do I implement the Doggettx optimizations or the basujindal?

For the Doggettx optimizations, replace files in your local copy of stable-diffusion with the files that were changed in the linked PR.

I've added some instructions above to clarify.
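For anyone who would rather script the file replacement than copy by hand, here is a minimal sketch. It assumes you already have the PR's version of the repository in a separate folder. The function name, both folder paths, and the file list are placeholders, so substitute whatever files the PR actually changed.

```python
# Hypothetical helper: copy the files a PR changed over a local checkout,
# keeping a backup of each original so the change is easy to undo.
import shutil
from pathlib import Path


def apply_changed_files(pr_checkout, local_repo, changed_files, backup_suffix=".orig"):
    """Copy each relative path from pr_checkout into local_repo."""
    pr_checkout, local_repo = Path(pr_checkout), Path(local_repo)
    replaced = []
    for rel in changed_files:
        src, dst = pr_checkout / rel, local_repo / rel
        if not src.is_file():
            raise FileNotFoundError(f"not found in PR checkout: {src}")
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            backup = dst.with_name(dst.name + backup_suffix)
            if not backup.exists():  # never overwrite the first backup
                shutil.copy2(dst, backup)
        shutil.copy2(src, dst)
        replaced.append(dst)
    return replaced


if __name__ == "__main__":
    import sys
    # Usage: python apply_pr.py <pr_checkout> <local_repo> <file> [<file> ...]
    if len(sys.argv) >= 4:
        for path in apply_changed_files(sys.argv[1], sys.argv[2], sys.argv[3:]):
            print("replaced", path)
```

To undo, rename each `.orig` backup to its original file name.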

CompVis/stable-diffusion#177 worked very well for me

This looks exactly like the changes in item (2) in my list, except that they are in PR format. I will add that as an alternate link. Thank you.

@ryudrigo

ryudrigo commented Sep 7, 2022

Maybe you want to add basujindal/stable-diffusion#122 I made a pull request to improve upon neonsecret's

Do you think you could come up with a version of it that works with Doggettx's optimizations?

Currently, the PR I mentioned uses less memory (VRAM) than Doggettx's for the same generation time (at least for a 1024 image), so I'd just use that. If you can point me to a specific aspect of their optimization that I should include, I'll consider implementing it. Otherwise, it's too much work to comb through the details and compare all the changes.

@andrewginns

andrewginns commented Sep 7, 2022

Thanks for this. I was previously using the tweak from neonsecret and could generate images up to 1024x640 on 8GB. That came at the cost of speed, though: each iteration took multiple seconds because of the attention splitting.
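For readers unfamiliar with "attention splitting", here is a toy sketch of the general idea in plain Python. It is not the code from neonsecret's tweak or from any linked PR, and the function names are made up. Queries are processed a slice at a time, so only a slice-sized block of scores exists at once. The result matches full attention, but the work is done in more, smaller steps.

```python
# Toy illustration of sliced attention using plain lists (no GPU, no torch).
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention over all of q at once."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def sliced_attention(q, k, v, slice_size):
    """Same result as attention(), computed one query slice at a time."""
    out = []
    for start in range(0, len(q), slice_size):
        out.extend(attention(q[start:start + slice_size], k, v))
    return out


if __name__ == "__main__":
    q = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(sliced_attention(q, k, v, slice_size=2) == attention(q, k, v))
```

On a real GPU, smaller slices mean a lower peak but more sequential kernel launches, which is consistent with the slowdown described above.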

Results for 512x512 with default parameters

Baseline code, set up by following the guide at JoshuaKimsey/Linux-StableDiffusion-Script@120a13b:

  • 6921 MB peak
  • 5.54 it/s

Using attention.py from https://github.com/basujindal/stable-diffusion/pull/122/files:

  • 5992 MB peak
  • 5.01 it/s
  • Can generate 1024x640 using 8132 MB peak
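To put the two 512x512 results side by side, a quick calculation of the relative change (only the numbers come from the measurements above; the helper name is arbitrary):

```python
# Relative change between the baseline and the patched attention.py,
# using the 512x512 figures reported above.

def percent_change(before, after):
    return (after - before) / before * 100.0


mem = percent_change(6921, 5992)    # peak VRAM in MB
speed = percent_change(5.54, 5.01)  # iterations per second

print(f"peak memory: {mem:.1f}%")  # about -13.4%
print(f"speed:       {speed:.1f}%")  # about -9.6%
```

So on this machine the patched file saves roughly 13% of peak memory for a slowdown of just under 10%.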

Will update this once I add in Doggettx tweaks

System: Windows 10 with WSL2 (Ubuntu 22.04), i7-11800H, 16GB of RAM, RTX 3070 mobile 8GB

@mrpixelgrapher

Quick question!

Can we apply the speed mod and the lower-VRAM mod at the same time?

@mchaker
Author

mchaker commented Sep 10, 2022

@mrpixelgrapher I have not tried that yet, but it looks like some of the changes overlap. I'm not sure it's possible to combine both approaches -- but perhaps it is and I just don't know enough math to do it 😅
