Summary for Conditional Image Generation with PixelCNN Decoders


  • New image density model based on PixelCNN
  • Can generate a variety of images conditioned on a vector, e.g. text/class embeddings or feature activations from a CNN
  • Can serve as the decoder in an image autoencoder
  • Gated PixelCNN: matches PixelRNN accuracy
    • PixelRNN generates images pixel by pixel
    • Slow, as it is hard to parallelise
    • The previous PixelCNN was faster but did not give as good results
  • Returns an explicit probability density, unlike GANs, so it is easy to apply to compression
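Because the model assigns an explicit density, the exact log-likelihood of an image is just the sum of per-pixel log-probabilities, which maps directly to a code length for compression. A minimal numpy sketch, where the `logits` array is a hypothetical stand-in for the network's 256-way softmax output at each pixel:

```python
import numpy as np

def image_log_likelihood(logits, image):
    """Exact log p(image) under an autoregressive per-pixel model.

    logits: (H, W, 256) unnormalised scores over each pixel's 256 intensity values
    image:  (H, W) integer pixel intensities in [0, 255]
    """
    # log-softmax over the 256 possible values at every pixel
    log_probs = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    h, w = image.shape
    # pick the log-probability of the observed value at each pixel and sum
    return log_probs[np.arange(h)[:, None], np.arange(w)[None, :], image].sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 256))       # stand-in for network outputs
image = rng.integers(0, 256, size=(4, 4))   # stand-in for an observed image
ll = image_log_likelihood(logits, image)
# negative log-likelihood per pixel in bits: the usual compression metric
bits_per_pixel = -ll / (np.log(2) * image.size)
```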


  • PixelCNN and PixelRNN model the joint distribution of an image x as a product of conditional distributions, each pixel conditioned on the pixels above and to its left: P(x) = ∏_{i=1}^{n²} P(x_i | x_1, x_2, …, x_{i−1})
  • The 3 colour channels are conditioned successively on each other.
  • Gated CNN
    • A gated, LSTM-like mechanism for modelling interactions between previous pixel values
    • y = tanh(W_f * x) ⊙ σ(W_g * x), where ⊙ is element-wise multiplication
  • Blind spot: the masked horizontal stack cannot see pixels above and to the right of the current pixel. To avoid this, a vertical stack (which needs no mask because it only sees the rows above) is fed into the horizontal stack along with the output of the previous layer.
  • Conditional PixelCNN: a latent vector h (a high-level latent image description, e.g. mid-layer CNN activations) is used to generate similar images. Conditioning terms are added inside the gated unit: y = tanh(W_f * x + V_f h) ⊙ σ(W_g * x + V_g h)
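The masking and gating above can be sketched in numpy. This is an illustrative single-channel version, not the paper's implementation: `causal_mask` builds the type-A (first layer) and type-B (later layers) masks, and `gated_unit` applies y = tanh(W_f * x + V_f h) ⊙ σ(W_g * x + V_g h), with the V h conditioning terms reduced to scalars for brevity:

```python
import numpy as np

def causal_mask(k, mask_type):
    """k x k mask for a masked convolution: zero out 'future' pixels.

    Type 'A' (first layer) also hides the centre pixel; type 'B' keeps it.
    """
    m = np.ones((k, k))
    m[k // 2, k // 2 + (mask_type == 'B'):] = 0  # centre row: right of centre (and centre for 'A')
    m[k // 2 + 1:, :] = 0                        # all rows below the centre
    return m

def masked_conv(x, w, mask):
    """'Same'-padded 2-D convolution with the mask applied to the weights."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)
    wm = w * mask
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * wm)
    return out

def gated_unit(x, w_f, w_g, mask, v_f=0.0, v_g=0.0):
    """y = tanh(W_f*x + V_f h) ⊙ sigmoid(W_g*x + V_g h).

    v_f, v_g stand in for the V h conditioning terms (scalars here for brevity).
    """
    f = masked_conv(x, w_f, mask) + v_f
    g = masked_conv(x, w_g, mask) + v_g
    return np.tanh(f) * (1.0 / (1.0 + np.exp(-g)))

rng = np.random.default_rng(0)
x = rng.random((5, 5))
w_f, w_g = rng.normal(size=(2, 3, 3))
y = gated_unit(x, w_f, w_g, causal_mask(3, 'A'))
```

With the type-A mask, the output at the top-left pixel depends on no real pixels at all (only zero padding), which is the causality property the mask is meant to enforce.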


  • Unconditional modelling: accuracy almost the same as PixelRNN
  • Conditional modelling:
    • ImageNet: faster and better than PixelRNN
    • Portraits: trained with a triplet loss function
  • Autoencoder:
    • Maps images to a low-dimensional representation
    • Architecture as described in the original paper
    • Loss function: MSE
    • Latent representation size: 10 (MNIST) or 100 (CIFAR)
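The autoencoder's reconstruction objective is plain mean-squared error between the input image and the decoder's output. A trivial numpy sketch with hypothetical MNIST-sized shapes:

```python
import numpy as np

def mse_loss(x, x_recon):
    """Mean-squared-error reconstruction loss used to train the autoencoder."""
    return np.mean((x - x_recon) ** 2)

# hypothetical shapes: a 28x28 MNIST image and a 10-dimensional latent code
x = np.zeros((28, 28))
z = np.zeros(10)                      # latent representation (100 for CIFAR)
x_recon = np.ones((28, 28)) * 0.1     # stand-in decoder output
loss = mse_loss(x, x_recon)           # mean of 0.1**2 everywhere, i.e. 0.01
```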