@madebyollin
Last active April 23, 2024 06:21
notes_on_sd_vae

Notes / Links about Stable Diffusion VAE

Stable Diffusion's VAE is a neural network that encodes images into a compressed "latent" format and decodes latents back into images. The encoder performs 48x lossy compression, and the decoder generates new detail to fill in the gaps.

(Calling this model a "VAE" is sort of a misnomer - it's an encoder with some very slight KL regularization, and a conditional GAN decoder)
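For reference, here is a quick sketch of where the 48x figure comes from. It assumes the commonly cited SD-VAE layout (8x spatial downsampling per side, 4 latent channels, RGB input); those numbers are not stated in the notes above, and the function name is just illustrative. The ratio counts values, not bits.

```python
def compression_ratio(height, width, image_channels=3,
                      latent_channels=4, downsample=8):
    """Ratio of image values to latent values (element counts, not bits).

    Defaults are assumptions: 8x downsampling per side, 4 latent channels.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsample factor")
    image_values = height * width * image_channels
    latent_values = (height // downsample) * (width // downsample) * latent_channels
    return image_values / latent_values


# 512x512x3 image -> 64x64x4 latent
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the ratio is the same at any resolution, since it reduces to (3 * 8 * 8) / 4.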

This document is a big pile of various links with more info.

VAE Versions & Lineage

Other SD-VAE-related Codebases

Other Info

@madebyollin
Author

Diagram of VAE

Animation of how VAE (decoder) is used during SD generation

@madebyollin
Author

sd_vae_modification_chart

sdxl_vae_modification_chart

@madebyollin
Author

Additive Gaussian noise
image
image

@madebyollin
Author

effect of input image resolution on the scale of the encoded latents
image

effect of scaling the "artifact" (brightest spot) of the SD latents up / down

anim-3.mp4

@madebyollin
Author

effect of flips

image
image

@madebyollin
Author

better latent-max chart with gaussian baseline
image
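The chart itself doesn't survive in text form, so here is a rough sketch of what a Gaussian baseline for a latent-max measurement could look like. This is one plausible reading, not necessarily how the chart was produced: it draws i.i.d. standard-normal "latents" with the same number of values as the encoding of a square image and reports the average max |z|. The 8x downsampling and 4 latent channels are assumptions, and `gaussian_max_baseline` is a hypothetical helper name.

```python
import random


def gaussian_max_baseline(image_size, downsample=8, latent_channels=4,
                          trials=5, seed=0):
    """Average max |z| over `trials` draws of i.i.d. N(0, 1) values.

    Each draw has as many values as a latent for a square image of side
    `image_size` (assumed shape: (size/8) x (size/8) x 4).
    """
    rng = random.Random(seed)
    side = image_size // downsample
    n_values = side * side * latent_channels
    maxima = []
    for _ in range(trials):
        maxima.append(max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_values)))
    return sum(maxima) / trials


# The baseline grows slowly with resolution, since more samples means a
# larger expected extreme value.
for size in (256, 512, 1024):
    print(size, round(gaussian_max_baseline(size), 2))
```

A latent max that sits far above this kind of baseline at a given resolution would suggest an outlier in the encoder output rather than ordinary Gaussian tails.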

animated visualization of the artifact that shows up in SD-VAE for larger input images

scale_check_sd_vae-3.mp4

same test with the SDXL-VAE, which doesn't have the artifact

scale_check_sdxl_vae-3.mp4

@madebyollin
Author

Adding a quick grid for the Wuerstchen (Stable Cascade) Stage A f=4 VQGAN

image
