Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Summary for Conditional Image Generation with PixelCNN Decoder

Conditional Image Generation with PixelCNN Decoder

Implemenetation:

What

  • New image density model based on PixelCNN
  • Can generate variety of images from text embeddings or CNN layer weights
  • Serves as decoder in image autoencoder
  • Gated PixelCNN: Matches PixelRNN accuracy
    • PixelRNN generates images pixel by pixel.
    • Slow, as hard to parallelise
    • Previous PixelCNN did not give good results
  • Returns probability density unlike GAN - easy to apply on compression

How

  • PixelCNN and PixelRNN is modelled on joint distribution of image x as product of conditional distribution of pixels on top & left: P(X) = (product from i to n^2) P(xi|x1,x2…xi-1)
  • 3 color channels are conditioned successively on each other.
  • Gated CNN
    • A gated (LSTM) like architecture to remember previous pixel values
    • y = tanh(weights_1 * image) sigmoid(weights_2 * image)
  • Blind Spot: To avoid blind spot, another vertical stack(without mask) is given as input to horizontal stack along with output of previous layer.
  • Conditional PixelCNN: a latent factor h(high level latent image description: mid-layer weights) is used to generate similar images. This term is added in the gated unit equation.

Experiments

  • Unconditioned modelling: accuracy almost as same as PixelRNN
  • Conditioned modelling:
    • ImageNet: Faster and better than PixelRNN
    • Portraits: Uses triplet loss function
  • Auto encoder:
    • Maps to low-dimensional representation
    • Architecture as mentioned in the original paper
    • Loss function: MSE
    • Latent reprenenstation size: 10(mnist) or 100(cifar)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment