Summary of "Generative Adversarial Nets" paper

Generative Adversarial Nets

Introduction

  • The paper proposes an adversarial approach for estimating generative models where one model (generative model) tries to learn a data distribution and another model (discriminative model) tries to distinguish between samples from the generative model and original data distribution.
  • Link to the paper: https://arxiv.org/abs/1406.2661

Adversarial Net

  • Two models: a Generative Model (G) and a Discriminative Model (D).
  • Both are multi-layer perceptrons.
  • G takes as input a noise variable z and outputs a data sample x (= G(z)).
  • D takes as input a data sample x and predicts whether it came from true data or from G.
  • G tries to minimise log(1 - D(G(z))) while D tries to maximise the probability of assigning the correct label to both real samples and samples from G.
  • Think of it as a minimax game between two players; the global optimum is reached when G generates samples matching the true data distribution and D cannot distinguish them from real data (thereby always returning 0.5 as the probability of a sample coming from the true data).
  • Alternate between k steps of training D and 1 step of training G so that D is maintained near its optimal solution (see the training-loop sketch after this list).
  • Early in training, when G is weak, the loss log(1 - D(G(z))) saturates because D can reject generated samples with high confidence. G is therefore trained to maximise log(D(G(z))) instead, which provides much stronger gradients early on.
  • The paper contains a theoretical proof that the global optimum of the minimax game is reached when the generator's distribution matches the data distribution.
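
A minimal sketch of this alternating training procedure is given below. It assumes PyTorch, small MLPs, and illustrative sizes and hyperparameters; none of these specifics (layer widths, learning rate, optimiser) are taken from the paper.

```python
# Minimax objective: min_G max_D  E_x[log D(x)] + E_z[log(1 - D(G(z)))]
import torch
import torch.nn as nn

noise_dim, data_dim = 100, 784   # e.g. flattened 28x28 images (illustrative choice)

# Both G and D are multi-layer perceptrons, as in the paper.
G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Sigmoid())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_G = torch.optim.SGD(G.parameters(), lr=1e-3)
opt_D = torch.optim.SGD(D.parameters(), lr=1e-3)
bce = nn.BCELoss()
k = 1  # number of D updates per G update (the paper uses k = 1 in its experiments)

def train_step(real_batch):
    batch_size = real_batch.size(0)
    ones = torch.ones(batch_size, 1)
    zeros = torch.zeros(batch_size, 1)

    # k steps of training D: maximise log D(x) + log(1 - D(G(z)))
    for _ in range(k):
        z = torch.randn(batch_size, noise_dim)
        fake_batch = G(z).detach()               # do not backprop into G here
        loss_D = bce(D(real_batch), ones) + bce(D(fake_batch), zeros)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 1 step of training G: maximise log D(G(z)) (the non-saturating variant)
    z = torch.randn(batch_size, noise_dim)
    loss_G = bce(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

Each call to train_step performs one iteration of the alternating optimisation; at the global optimum, D outputs 0.5 for every sample.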

Experiments

  • Datasets
    • MNIST, Toronto Face Database, CIFAR-10
  • Generator model uses ReLU and sigmoid activations.
  • Discriminator model uses maxout and dropout.
  • Evaluation Metric
    • Fit a Gaussian Parzen window (kernel density estimate) to samples obtained from G and compare the log-likelihood of held-out data under this distribution against other generative models (see the sketch after this list).
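
A rough sketch of this evaluation metric is given below, assuming NumPy arrays of flattened samples; the function name is hypothetical. In the paper the bandwidth sigma is chosen by cross-validation on a validation set.

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_likelihood(test_x, gen_samples, sigma):
    """Mean log-likelihood of test points under a Gaussian Parzen window
    (kernel density estimate) fitted to samples drawn from G."""
    n, d = gen_samples.shape
    log_likelihoods = []
    for x in test_x:
        # log N(x; mu_i, sigma^2 I) for every generated sample mu_i
        diffs = (x - gen_samples) / sigma
        log_kernels = (-0.5 * np.sum(diffs ** 2, axis=1)
                       - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi))
        # p(x) = (1/n) * sum_i N(x; mu_i, sigma^2 I), computed in log space
        log_likelihoods.append(logsumexp(log_kernels) - np.log(n))
    return float(np.mean(log_likelihoods))
```

As the paper itself notes, this estimator has high variance and performs poorly in high-dimensional spaces, but it was the best method available for comparing models that only provide samples.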

Strengths

  • Computational advantages
    • Backprop is sufficient for training with no need for Markov chains or performing inference.
    • A variety of functions can be used in the model.
  • Since G is updated only using gradients flowing back through D and never sees data samples directly, there is less chance of it directly copying features of the training data.
  • Can represent sharp (even degenerate) distributions.

Weaknesses

  • D must be kept well synchronised with G during training; if G is trained too much without updating D, it can collapse too many values of z to the same output x.
  • While G may learn to generate samples that are indistinguishable from true data, it provides no explicit representation of the learned distribution.

Possible Extensions

  • Conditional generative models.
  • Inference network to predict z given x.
  • Implement a stochastic extension of the deterministic Multi-Prediction Deep Boltzmann Machines.
  • Using discriminator net or inference net for feature selection.
  • Accelerating training by ensuring better coordination between G and D or by determining better distributions to sample z from during training.