@drdaxxy
Last active July 29, 2023 20:07

Typical approaches to training, and sampling from, denoising diffusion models yield results whose per-item means match those of the initial input, i.e. zero when the input is i.i.d. samples from a standard normal distribution. This has major implications for what outputs popular text-to-image generative models can produce; see e.g. https://twitter.com/apeoffire/status/1624884816851206145 and https://www.crosslabs.org/blog/diffusion-with-offset-noise.
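For context, the "offset noise" tweak from the second link works around this during fine-tuning by giving the training noise a nonzero per-channel mean. Roughly (a sketch, assuming latents is a [batch, channels, height, width] tensor inside the training loop):

# Offset noise: the extra low-frequency, per-channel term gives the training
# noise a nonzero mean, so the model can learn outputs whose means differ from zero.
noise = torch.randn_like(latents) + 0.1 * torch.randn(
    latents.shape[0], latents.shape[1], 1, 1, device=latents.device
)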

It also means we can reliably produce dark, bright, or tinted images by shifting the sampler's initial input toward a desired color, as in the sketch below.
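For instance, starting from a dark bluish color rather than zero-mean noise biases samples toward that tint. A sketch in the same shorthand as the snippet further below, where vae_encode, sigma_max, and randn stand in for the real VAE encoder, the sampler's highest noise level, and Gaussian noise; the color (0.1, 0.1, 0.3) is just an illustrative choice:

init_latent = vae_encode(tensor([0.1, 0.1, 0.3])[None,:,None,None].tile(1,1,512,512)) + sigma_max * randn(1,4,64,64)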

Now, I was curious what would happen if I made Stable Diffusion denoise an "impossible" image whose mean color exceeds the valid [0,1] RGB range:

init_latent = vae_encode(tensor([1.5, 1.5, 1.5])[None,:,None,None].tile(1,1,512,512)) + sigma_max * randn(1,4,64,64)

I got... this.

[Image: the decoded samples, saved as image_temporarily_unavailable.png by the script below]

# pytorch 1.12.1-cu116, stablediffusion@fc1488421a2761937b9d54784194157882cbc3b1
# works with lots of seeds
import torch
from omegaconf import OmegaConf
from torch import autocast
from einops import rearrange
from PIL import Image
from pytorch_lightning import seed_everything
from ldm.util import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler

config_path = "../stable-diffusion/configs/stable-diffusion/v1-inference.yaml"
ckpt_path = "../stable-diffusion/models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"
opts = {"bs": 2, "prompt": "A dark alleyway in a rainstorm", "uc_prompt": "", "cs": 7., "seed": 1024, "steps": 50}

# An "impossible" uniform color: RGB (1.5, 1.5, 1.5) lies outside the valid [0, 1] range.
rgb = torch.tensor([1.5, 1.5, 1.5], device="cuda", dtype=torch.float16)[None,:,None,None].tile(opts["bs"],1,512,512)

config = OmegaConf.load(config_path)
model = instantiate_from_config(config.model)
model.load_state_dict(torch.load(ckpt_path, map_location="cpu")["state_dict"], strict=False)
model.eval().half().cuda()
sampler = DDIMSampler(model)

with autocast("cuda"):
    # Map [0, 1] RGB to the VAE's expected [-1, 1] range (so 1.5 becomes 2.0) and encode to latents.
    init_latent = model.get_first_stage_encoding(model.encode_first_stage((rgb - 0.5) * 2))
    sampler.make_schedule(ddim_num_steps=opts["steps"], ddim_eta=0, verbose=False)
    seed_everything(opts["seed"])
    # Noise the latents up to the final DDIM timestep, then denoise with classifier-free guidance.
    x_T = sampler.stochastic_encode(init_latent, torch.tensor([opts["steps"]-1] * opts["bs"], device="cuda"))
    x_0 = sampler.decode(
        x_T,
        model.get_learned_conditioning([opts["prompt"]] * opts["bs"]),
        opts["steps"]-1,
        unconditional_guidance_scale=opts["cs"],
        unconditional_conditioning=model.get_learned_conditioning([opts["uc_prompt"]] * opts["bs"])
    )
    # Decode latents back to RGB, convert to uint8, and tile the batch side by side.
    img = model.decode_first_stage(x_0)
    img = ((img + 1.) * 127.5).clamp(0, 255).to(device="cpu", dtype=torch.uint8).numpy()
    img = Image.fromarray(rearrange(img, 'b c h w -> h (b w) c'))
    img.save("image_temporarily_unavailable.png")
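To see what per-image means the samples actually end up with, one could append a check like this inside the autocast block, before the uint8 conversion (a sketch, not part of the original script):

# Per-image channel means of the decoded batch, mapped from [-1, 1] back to
# [0, 1] RGB for comparison against the 1.5 target.
means = model.decode_first_stage(x_0).float().mean(dim=(2, 3))  # shape [bs, 3]
print(means / 2 + 0.5)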