shagunsodhani/DCGAN.md

## DCGAN.md

      
Deep Convolutional Generative Adversarial Nets

Introduction


The paper presents Deep Convolutional Generative Adversarial Nets (DCGAN), a GAN variant in which the generator and discriminator are CNNs subject to a set of architectural constraints.
Link to the paper

Benefits


Stable to train.
Useful for learning unsupervised image representations.

Model


GANs are difficult to scale using CNNs.
The paper proposes the following changes to GANs:

Replace any pooling layers with strided convolutions (in the discriminator) and fractionally strided convolutions (in the generator).
Remove fully connected hidden layers.
Use batch normalisation in both the generator (all layers except the output layer) and the discriminator (all layers except the input layer).
Use LeakyReLU in all layers of the discriminator.
Use ReLU activation in all layers of the generator (except the output layer, which uses Tanh).
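
To illustrate how strided and fractionally strided convolutions replace pooling, the sketch below computes the spatial size after each layer using the standard output-size formulas. The kernel size, stride, padding, layer count and the 4-to-64 resolution range are assumptions chosen for illustration; they are not taken from these notes.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial size after a strided convolution (used for downsampling)."""
    return (size + 2 * pad - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, pad):
    """Spatial size after a fractionally strided (transposed) convolution."""
    return (size - 1) * stride - 2 * pad + kernel


# Assumed layer settings (not from the notes): kernel 4, stride 2, padding 1.
KERNEL, STRIDE, PAD = 4, 2, 1


def generator_sizes(start=4, layers=4):
    """Each generator layer doubles the spatial resolution."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_transpose_out(sizes[-1], KERNEL, STRIDE, PAD))
    return sizes


def discriminator_sizes(start=64, layers=4):
    """Each discriminator layer halves the spatial resolution."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_out(sizes[-1], KERNEL, STRIDE, PAD))
    return sizes


print(generator_sizes())      # [4, 8, 16, 32, 64]
print(discriminator_sizes())  # [64, 32, 16, 8, 4]
```

With these settings every layer changes the resolution by exactly a factor of two, so the network learns its own up- and downsampling instead of relying on fixed pooling.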


Datasets


Large-Scale Scene Understanding (LSUN).
ImageNet-1K.
Faces dataset.

Hyperparameters


Minibatch SGD with a minibatch size of 128.
Weights initialized from a zero-centered Normal distribution with standard deviation = 0.02.
Slope of leak = 0.2 for LeakyReLU.
Adam optimizer with learning rate = 0.0002 and β1 = 0.5.
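
The hyperparameters above can be collected into a small, framework-free sketch. It shows the weight initialization, the LeakyReLU activation and a single scalar Adam update. `BETA2` and `EPS` are not given in these notes; the values below are the commonly used Adam defaults and are an assumption, as are all function names.

```python
import math
import random

BATCH_SIZE = 128
INIT_STD = 0.02         # weights ~ Normal(0, 0.02)
LEAK_SLOPE = 0.2        # LeakyReLU negative slope
LEARNING_RATE = 0.0002
BETA1 = 0.5
# Assumptions (not stated in the notes): usual Adam defaults.
BETA2 = 0.999
EPS = 1e-8


def init_weights(n, rng):
    """Draw n weights from a zero-centered Normal distribution."""
    return [rng.gauss(0.0, INIT_STD) for _ in range(n)]


def leaky_relu(x):
    """Identity for positive inputs, scaled by LEAK_SLOPE otherwise."""
    return x if x > 0 else LEAK_SLOPE * x


def adam_step(theta, grad, m, v, t):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad * grad
    m_hat = m / (1 - BETA1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)   # bias-corrected second moment
    theta = theta - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPS)
    return theta, m, v


weights = init_weights(1000, random.Random(0))
theta, m, v = adam_step(0.0, 1.0, 0.0, 0.0, 1)
```

On the first step the bias-corrected moments cancel the gradient scale, so the parameter moves by roughly the learning rate in the direction opposite to the gradient.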

Observations


Large-Scale Scene Understanding data

Demonstrates that the model scales with more data and higher-resolution generation.
The model is unlikely to have memorized training images, given the small learning rate and minibatch SGD.


Classifying CIFAR-10 dataset

Features

Train on ImageNet-1K and test on CIFAR-10.
Max-pool the discriminator's convolutional features (from all layers) to get 4x4 spatial grids.
Flatten and concatenate to get a 28672-dimensional vector.
Train a linear L2-SVM classifier on the feature vector.


82.8% accuracy, outperforming the K-means baseline (80.6%).
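
The feature pipeline above can be sketched in plain Python: each channel of each layer is max-pooled to a 4x4 grid, and the grids are flattened and concatenated. The sketch assumes feature-map heights and widths are divisible by 4. The channel counts passed to `feature_dim` are hypothetical; they are simply one combination that reproduces the 28672 figure (28672 = 1792 x 16) and are not stated in these notes.

```python
def max_pool_to_grid(fmap, grid=4):
    """Max-pool one channel (H x W nested list) down to grid x grid."""
    h, w = len(fmap), len(fmap[0])
    bh, bw = h // grid, w // grid  # assumes H and W are divisible by grid
    return [
        [
            max(
                fmap[r][c]
                for r in range(i * bh, (i + 1) * bh)
                for c in range(j * bw, (j + 1) * bw)
            )
            for j in range(grid)
        ]
        for i in range(grid)
    ]


def extract_features(layers, grid=4):
    """layers: list of layers, each a list of channels (H x W nested lists)."""
    vec = []
    for channels in layers:
        for channel in channels:
            pooled = max_pool_to_grid(channel, grid)
            vec.extend(value for row in pooled for value in row)
    return vec


def feature_dim(channel_counts, grid=4):
    """Length of the concatenated vector for the given per-layer channels."""
    return sum(channel_counts) * grid * grid


# Hypothetical channel counts, chosen only because they sum to 1792.
print(feature_dim([256, 512, 1024]))  # 28672
```

The resulting vector would then be passed to a linear classifier; the SVM itself is left out to keep the sketch dependency-free.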


Street View House Number Classifier

Similar pipeline to CIFAR-10.
22.48% test error.


The paper contains many examples of images generated by final and intermediate layers of the network.
Images generated while moving through the latent space do not show sharp transitions, indicating that the network did not memorize images.
DCGAN can learn an interesting hierarchy of features.
The network seems to have some success in disentangling scene representation from object representation.
Vector arithmetic can be performed on the Z vectors corresponding to face samples to get visually meaningful results such as: smiling woman - normal woman + normal man = smiling man.
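
The latent-space observations above can be made concrete with a small sketch of the two operations on Z vectors: element-wise arithmetic and linear interpolation. The two-dimensional toy vectors and the idea of representing each concept by the mean of a few samples are assumptions for illustration; real Z vectors are higher-dimensional, and the result would still need to be passed through the trained generator to obtain an image.

```python
def mean_vector(vectors):
    """Average several Z vectors that share one visual concept."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def analogy(a, b, c):
    """Element-wise a - b + c."""
    return [x - y + z for x, y, z in zip(a, b, c)]


def interpolate(z0, z1, steps):
    """Linear path from z0 to z1, endpoints included (steps >= 2)."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z0, z1)])
    return path


# Toy 2-D latent space: axis 0 ~ "smile", axis 1 ~ "woman (+) / man (-)".
smiling_woman = mean_vector([[1.0, 1.0], [1.0, 1.0]])
normal_woman = mean_vector([[0.0, 1.0], [0.0, 1.0]])
normal_man = mean_vector([[0.0, -1.0], [0.0, -1.0]])

smiling_man = analogy(smiling_woman, normal_woman, normal_man)
print(smiling_man)  # [1.0, -1.0]

print(interpolate([0.0, 0.0], [1.0, 2.0], 3))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]
```

Feeding each point of an interpolation path to the generator is how the smoothness of transitions would be inspected.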