Ollin Boer Bohan madebyollin

## notes_on_sd_vae.md

      
madebyollin / notes_on_sd_vae.md (last active April 23, 2024 06:21)

### Notes / Links about Stable Diffusion VAE

Stable Diffusion's VAE is a neural network that encodes images into a compressed "latent" format and decodes those latents back into images. The encoder performs 48x lossy compression, and the decoder generates new detail to fill in the gaps.
(Calling this model a "VAE" is sort of a misnomer: it's an encoder with some very slight KL regularization, plus a conditional GAN decoder.)
This document is a big pile of various links with more info.
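
As a rough check of the 48x figure, here is a minimal sketch of the arithmetic. It assumes the commonly cited Stable Diffusion VAE shapes (8x spatial downsampling per side and 4 latent channels), which are not stated in the text above, and it counts values rather than bits. The function name `compression_ratio` is made up for illustration.

```python
def compression_ratio(height, width, image_channels=3, downsample=8, latent_channels=4):
    """Ratio of image values to latent values for an H x W RGB image.

    Assumption: the encoder shrinks each spatial side by `downsample`
    and emits `latent_channels` channels per latent pixel.
    """
    image_values = height * width * image_channels
    latent_values = (height // downsample) * (width // downsample) * latent_channels
    return image_values / latent_values


# 512 * 512 * 3 = 786432 image values, 64 * 64 * 4 = 16384 latent values
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the ratio does not depend on resolution, since every 8x8x3 patch of pixels (192 values) maps to 4 latent values.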
#### VAE Versions & Lineage


CompVis


## Mamba_Diffusion_IADB_Colab.ipynb

      
madebyollin / Mamba_Diffusion_IADB_Colab.ipynb (created December 6, 2023 04:47)

Mamba Diffusion (IADB)

## 2024_04_07_Space_Filling_VAE_Animation.ipynb

      
madebyollin / 2024_04_07_Space_Filling_VAE_Animation.ipynb (created April 7, 2024 22:58)

## variational_autoencoders_will_never_work.md

      
madebyollin / variational_autoencoders_will_never_work.md (created June 22, 2023 15:36)

### Variational Autoencoders Will Never Work

So you want to generate images with neural networks. You're in luck! VAEs are here to save the day. They're simple to implement, they generate images in one inference step (unlike those awful slow autoregressive models), and (most importantly) VAEs are 🚀🎉🎂🥳 theoretically grounded 🚀🎉🎂🥳 (unlike those scary GANs - don't look at the GANs)!

#### The idea

The idea of a VAE is so simple, even an AI chatbot could explain it:

Your goal is to train a "decoder" neural network that consumes blobs of random noise from a fixed distribution (like torch.randn(1024)), interprets that noise as decisions about what to generate, and produces corresponding real-looking images. You want to train this network with nice simple image-space MSE loss against your dataset of real images.
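
The setup in that paragraph can be sketched without any deep learning framework. This is only an illustration of the interface (noise in, image out, MSE against a real image), and it does not train anything. `random.gauss` stands in for `torch.randn`, and the one-layer linear "decoder", the tiny `IMAGE_DIM`, and all function names are hypothetical.

```python
import random

NOISE_DIM = 1024  # matches the torch.randn(1024) example in the text
IMAGE_DIM = 16    # hypothetical: a tiny flattened "image" to keep the sketch fast


def sample_noise(rng, dim=NOISE_DIM):
    """Draw one noise vector from a fixed standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def make_linear_decoder(rng, noise_dim=NOISE_DIM, image_dim=IMAGE_DIM):
    """Hypothetical stand-in for the decoder network: a single linear layer."""
    scale = noise_dim ** -0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(noise_dim)]
               for _ in range(image_dim)]

    def decode(z):
        # Each output pixel is a weighted sum of the noise entries.
        return [sum(w * x for w, x in zip(row, z)) for row in weights]

    return decode


def mse_loss(generated, real):
    """Image-space mean squared error between two flattened images."""
    return sum((g - r) ** 2 for g, r in zip(generated, real)) / len(real)


rng = random.Random(0)
decode = make_linear_decoder(rng)
real_image = [rng.random() for _ in range(IMAGE_DIM)]  # placeholder for a dataset image
loss = mse_loss(decode(sample_noise(rng)), real_image)
print(f"MSE loss for one (noise, image) pair: {loss:.3f}")
```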


## README.md

      
madebyollin / README.md (last active April 2, 2024 13:50; forked from mrsteyk/README.md)

dalle_runner_api.model_infra.modules.public_diff_vae

### Consistency Decoder PyTorch Model Code

Cleaned up version of https://gist.github.com/mrsteyk/74ad3ec2f6f823111ae4c90e168505ac,
which is in turn based on the public_diff_vae.ConvUNetVAE from https://github.com/openai/consistencydecoder.

#### Example Usage

Install the consistency decoder code (for the inference logic) and download the extracted weights:


## make_audiobook.py
#!/usr/bin/env python3
"""
To use:
1. install/set-up the google cloud api and dependencies listed on https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/texttospeech/cloud-client
2. install pandoc and pypandoc, also tqdm
3. create and download a service_account.json ("Service account key") from https://console.cloud.google.com/apis/credentials
4. run GOOGLE_APPLICATION_CREDENTIALS=service_account.json python make_audiobook.py book_name.epub
"""
import re
import sys

## automatic_profiling_markers.py
def add_profiling_markers(model):
    """Monkey-patch profiling markers into an nn.Module.

    Args:
        model: an nn.Module

    Effect:
        every module in model.named_modules() gets its forward call wrapped in
        its own profiling scope, making traces easier to understand.
    """

## simple_convolution.py
#!/usr/bin/env python
import numpy as np
import cv2
from scipy.signal import convolve2d
from skimage import color, data, restoration
import console

# read input
frame = cv2.imread("input.jpg").astype(np.float32) / 255.0

## stable_diffusion_m1.py
# ------------------------------------------------------------------
# EDIT: I eventually found a faster way to run SD on macOS, via MPSGraph (~0.8s / step on M1 Pro):
#   https://github.com/madebyollin/maple-diffusion
# The original CoreML-related code & discussion is preserved below :)
# ------------------------------------------------------------------

# you too can run stable diffusion on the apple silicon GPU (no ANE sadly)
#
# quick test portraits (each took 50 steps x 2s / step ~= 100s on my M1 Pro):
# * https://i.imgur.com/5ywISvm.png

## list_of_good_image_generator_training_logs.md

      
madebyollin / list_of_good_image_generator_training_logs.md (last active February 10, 2024 02:25)

### List of good image generator training logs

A list of public training logs from neural network image generation models, since I think they're interesting.

#### The Criteria

- Publicly accessible link
- Losses plotted every so often
- Samples generated every so often
- Nontrivial dataset (i.e. not MNIST - 64x64 output RGB or better)