StyleGAN & StyleGAN2 on Google Cloud Compute

These instructions are for StyleGAN2 but may work for the original version of StyleGAN.

Essential Reading:

StyleGAN2

gwern.net: "Making Anime Faces With StyleGAN"

5agado: "StyleGAN v2: notes on training and latent space exploration"

Before Starting

Many steps in this install take a long time to run, not to mention StyleGAN2 training itself. Familiarize yourself with screen so that your processes are not killed if your SSH session drops; a minimal workflow is sketched below.
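
A minimal screen workflow (the session name "train" is just an example):

screen -S train        # start a named session and run long commands inside it
# detach with Ctrl-a d, then log out safely; later reattach with:
screen -r train
screen -ls             # list sessions if you forget the name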

Set up VM

Before setting up a new VM, you may need to request a quota increase if you plan to use more than one GPU.

  1. IAM & Admin -> Quotas
  2. Under Service select Compute Engine API
  3. Under Metric search for "GPU"
  4. Select both "GPUs (all regions)" and "NVIDIA V100 GPUs" (or whatever GPU you need). Make sure you select the correct region for the GPUs.
  5. Click "Edit Quotas" and request the total number you require.
  6. Within an hour you should receive a confirmation email to your gmail inbox.
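
If you prefer the command line, current per-region quotas (including GPU quotas) can also be inspected with gcloud; the region below is only an example:

gcloud compute regions describe us-west1 --format="value(quotas)"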

Your VM must have CUDA 10.0 installed. A number of the Marketplace VMs have other CUDA versions.

  1. Compute Engine -> VM instances -> Create Instance
  2. Select "Marketplace" and search for "CUDA"
  3. There are many options, but the only one I could find with the right environment is "New Deep Learning VM deployment"
  4. For the machine type: 8 vCPUs, 30 GB memory
  5. Set GPU type & number
  6. IMPORTANT: Framework = TensorFlow Enterprise 1.15
  7. Check: Install NVIDIA GPU driver automatically on first startup.
  8. Boot Disk: SSD, 200 GB. You will need a lot of space.

Once deployed, ssh in and check configuration:

  1. Run nvidia-smi to ensure the GPUs are detected and the CUDA version is 10.0
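
nvidia-smi reports the driver-level CUDA version; to double-check the toolkit on disk (paths below are the usual defaults and may differ on your image):

nvcc --version                      # if the toolkit binaries are on your PATH
cat /usr/local/cuda/version.txt     # prints e.g. "CUDA Version 10.0.130"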

Python version

Use python --version and python3 --version to determine the installed versions. StyleGAN2 requires Python 3.6.x.

Install Python 3.6

sudo apt update
sudo apt-get install -y libssl-dev
wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
tar xf Python-3.6.8.tgz
cd Python-3.6.8
./configure --enable-optimizations
make -j 8
sudo make altinstall
python3.6 --version

Install StyleGAN2 Python dependencies

sudo pip3.6 install numpy scipy requests Pillow;

StyleGAN2 requires TensorFlow 1.14 or 1.15 with GPU support:

sudo pip3.6 install tensorflow-gpu==1.14

For generating videos:

sudo pip3.6 install moviepy;
sudo apt install ffmpeg;

Check that TensorFlow is installed and the GPUs are visible (run these lines in a python3.6 interpreter):

import tensorflow as tf; print(tf.__version__)
from tensorflow.python.client import device_lib

device_lib.list_local_devices()
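
The same check can also be run non-interactively from the shell:

python3.6 -c "import tensorflow as tf; from tensorflow.python.client import device_lib; print(tf.__version__); print(device_lib.list_local_devices())"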

Paths

Make sure the CUDA paths are set correctly in your environment; if not:

export PATH=/usr/local/cuda/bin:$PATH
export CPATH=/usr/local/cuda-10.0/include:$CPATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
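
These exports only last for the current shell. To make them persist across SSH sessions, append them to ~/.bashrc (paths assume the default CUDA 10.0 install location):

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export CPATH=/usr/local/cuda-10.0/include:$CPATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc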

Set Up Cloud Storage

StyleGAN requires large datasets and produces large data files. You might need to use Cloud Storage to transfer files between VMs and other cloud providers.

  1. In Google Cloud Platform navigate to Storage and enable service.
  2. Create new cloud storage bucket.
  3. Edit the bucket permissions and add your user account to the bucket's permissions
  4. On VM:
    1. Stop VM instance
    2. Open VM instance details
    3. Press "Edit"
    4. Change Cloud API access scopes -> "Allow full access to all cloud APIs"
    5. Start VM instance
    6. gcloud config set account yourstorageaccount@gmail.com
    7. gcloud auth login
    8. Use gsutil commands to transfer files from VM to cloud storage
    • Example: gsutil cp -r datasets gs://mycloudstoragebucket
    • Example: gsutil cp -r gs://mycloudstoragebucket/datasets .
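
If you prefer the command line, the bucket in step 2 can also be created with gsutil (the bucket name is a placeholder):

gsutil mb gs://mycloudstoragebucket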

Transferring between Google Cloud Storage and AWS s3

  1. aws configure to set your AWS access key
  2. gsutil -m rsync -r gs://storagename s3://bucketname
  3. Files over 5 GB are too big for this transfer; split large files and transfer the pieces instead, e.g. split --bytes=2G vases-r09.tfrecords (see the sketch below)

https://cloud.google.com/storage/docs/gsutil/commands/rsync
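
A sketch of the split-and-reassemble round trip (file names are placeholders); the pieces are joined back together with cat on the destination machine:

split --bytes=2G vases-r09.tfrecords vases-r09.tfrecords.part_
# ...transfer the .part_ files with gsutil/aws as above, then on the destination:
cat vases-r09.tfrecords.part_* > vases-r09.tfrecords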

Install StyleGAN2

wget https://github.com/NVlabs/stylegan2/archive/master.zip;
unzip master.zip;
cd stylegan2-master/

Your datasets can be moved into stylegan2-master/datasets. Results will be stored in unique directories under stylegan2-master/results.

Preparing the Images

Use ImageMagick to crop all images in a directory to 512x512 pixels.

for file in *.jpg; do convert "$file" -resize 512x512 -gravity center -extent 512x512 -background white -density 72 -set colorspace sRGB -quality 80 "sq_$(basename "$file" .jpg).jpg"; done

From gwern.net, find and delete any images that are not exactly 512x512 and in the correct colorspace:

find . -type f | xargs --max-procs=16 -n 9000 identify | \
    # remember the warning: images must be identical, square, and sRGB/grayscale:
    fgrep -v " JPEG 512x512 512x512+0+0 8-bit sRGB"| cut -d ' ' -f 1 | \
    xargs --max-procs=16 -n 10000 rm

Create the Multi-resolution TFRecords

python3.6 dataset_tool.py create_from_images datasets/nameOfDataset yourDirectory/yourImageDirectory
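
dataset_tool.py writes one .tfrecords file per resolution; a quick sanity check that they were all created (dataset name is a placeholder):

ls -lh datasets/nameOfDataset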

If Resuming Training from a Previous pkl

You must tell StyleGAN2 that you are resuming by editing a file, otherwise training will start from scratch!

Edit the file training/training_loop.py, with XXXX being your .pkl's Kimg value:

resume_pkl = "/path/to/pkl/00001-stylegan2-project-1gpu-config-e/network-snapshot-00XXXX.pkl",   # Network pickle to resume training from, None = train from scratch.
resume_kimg             = XXXX.0,      # Assumed training progress at the beginning. Affects reporting and training schedule.
resume_time             = 0.0,      # Assumed wallclock time at the beginning. Affects reporting.
resume_with_new_nets    = False):   # Construct new networks according to G_args and D_args before resuming training?

Run StyleGAN2

One GPU, config-e, 10000 Kimg, mirroring: python3.6 run_training.py --num-gpus=1 --data-dir=datasets --config=config-e --mirror-augment=true --dataset=vases --total-kimg=10000

Two GPUs, config-e, 10000 Kimg, mirroring: python3.6 run_training.py --num-gpus=2 --data-dir=datasets --config=config-e --mirror-augment=true --dataset=vases --total-kimg=10000

Memory Error

Every once in a while, StyleGAN2 crashes on my VM due to a memory error (output below). I have not had time to investigate, and am simply stopping & restarting StyleGAN2 every 8 hours (logs show that the crash occurs 8-14hrs into training). Make sure you update the resume* vars in training/training_loop.py with each restart.

A few other times my screen session simply crashed with no warning. Ideally one would have a monitoring script that checks not just the output log file but also whether the python process is still running, or uses nvidia-smi to confirm the GPUs are busy; a rough sketch follows the traceback below.

Traceback (most recent call last):
  File "run_training.py", line 192, in <module>
    main()
  File "run_training.py", line 187, in main
    run(**vars(args))
  File "run_training.py", line 120, in run
    dnnlib.submit_run(**kwargs)
  File "/home/myuser/stylegan2-master/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/myuser/stylegan2-master/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/myuser/stylegan2-master/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/myuser/stylegan2-master/training/training_loop.py", line 341, in training_loop
    metrics.run(pkl, run_dir=dnnlib.make_run_dir_path(), data_dir=dnnlib.convert_path(data_dir), num_gpus=num_gpus, tf_config=tf_config)
  File "/home/myuser/stylegan2-master/metrics/metric_base.py", line 151, in run
    metric.run(*args, **kwargs)
  File "/home/myuser/stylegan2-master/metrics/metric_base.py", line 67, in run
    self._evaluate(Gs, Gs_kwargs=Gs_kwargs, num_gpus=num_gpus)
  File "/home/myuser/stylegan2-master/metrics/frechet_inception_distance.py", line 54, in _evaluate
    labels = self._get_random_labels_tf(self.minibatch_per_gpu)
  File "/home/myuser/stylegan2-master/metrics/metric_base.py", line 140, in _get_random_labels_tf
    return self._get_dataset_obj().get_random_labels_tf(minibatch_size)
  File "/home/myuser/stylegan2-master/metrics/metric_base.py", line 121, in _get_dataset_obj
    self._dataset_obj = dataset.load_dataset(data_dir=self._data_dir, **self._dataset_args)
  File "/home/myuser/stylegan2-master/training/dataset.py", line 192, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**kwargs)
  File "/home/myuser/stylegan2-master/training/dataset.py", line 86, in __init__
    self._np_labels = np.zeros([1<<30, 0], dtype=np.float32)
MemoryError: Unable to allocate 0 bytes for an array with shape (1073741824, 0) and data type float32
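
A minimal sketch of the monitoring script mentioned above, assuming training was launched with python3.6 run_training.py and that you restart it manually after updating the resume* variables (the check interval and process match are assumptions):

#!/bin/bash
# Log a warning if the training process disappears, and record GPU utilization.
while true; do
    if ! pgrep -f "run_training.py" > /dev/null; then
        echo "$(date): training process not found" >> monitor.log
    fi
    # GPU utilization as a bare number, e.g. "97"
    util=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -n 1)
    echo "$(date): GPU utilization ${util}%" >> monitor.log
    sleep 300
done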

Generate Images

The scripts for generating images and interpolation videos from gwern.net: "Making Anime Faces With StyleGAN" still work with StyleGAN2, although they may need some tweaking. For example, config.py no longer exists in v2, but that file is only used to determine the cache, results, and datasets directories, so you can just hard-code them in instead. These minor changes can be found in my colab notebook here:

https://colab.research.google.com/drive/1sINSGdVlkepc_0vFH54E7yIY3zzLmNvu

Uncurated images: python3.6 run_generator.py generate-images --network=results/00001-stylegan2-glaze-1gpu-config-e/network-snapshot-000782.pkl --seeds=6626-7626 --truncation-psi=0.5

Curated images: python3.6 run_generator.py generate-images --network=results/00001-stylegan2-glaze-1gpu-config-e/network-snapshot-000782.pkl --seeds=6626-7626 --truncation-psi=1.0

Generate style mixing example (matches style mixing video clip)

python3.6 run_generator.py style-mixing-example --network=results/00001-stylegan2-glaze-1gpu-config-e/network-snapshot-000782.pkl --row-seeds=85,100,75,458,1500 --col-seeds=55,821,1789,293 --truncation-psi=1.0;

python3.6 run_generator.py style-mixing-example --network=results/00001-stylegan2-glaze-1gpu-config-e/network-snapshot-000782.pkl --row-seeds=65,110,75,468,1520 --col-seeds=95,831,1749,393 --truncation-psi=1.0;

python3.6 run_generator.py style-mixing-example --network=results/00001-stylegan2-glaze-1gpu-config-e/network-snapshot-000782.pkl --row-seeds=95,120,85,478,1540 --col-seeds=150,841,1759,793 --truncation-psi=1.0;

python3.6 run_generator.py style-mixing-example --network=results/00001-stylegan2-glaze-1gpu-config-e/network-snapshot-000782.pkl --row-seeds=55,130,35,488,1560 --col-seeds=170,851,1769,593 --truncation-psi=1.0;

python3.6 run_generator.py style-mixing-example --network=results/00001-stylegan2-glaze-1gpu-config-e/network-snapshot-000782.pkl --row-seeds=45,140,25,498,1570 --col-seeds=180,861,1779,693 --truncation-psi=1.0;

Here is my updated script for generating interpolation videos, from Cyril Diagne (@kikko_fr) via gwern. I have added variables for the truncation psi and random seeds to make editing easier.

import os
import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
import scipy.ndimage
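
# This script writes three videos under results/:
#   1. random_grid_<seed>.mp4  - a 2x2 grid of smoothed random walks through latent space
#   2. interpolate_<seed>.mp4  - style mixing of a moving source latent into fixed destination seeds
#   3. fine_<seed>.mp4         - style mixing over the higher (finer) layers only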

def main():

    tflib.init_tf()

    # Load pre-trained network.
    # url = 'https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ'
    # with dnnlib.util.open_url(url, cache_dir=cache) as f:
    ## NOTE: insert model here:
    _G, _D, Gs = pickle.load(open("../00001-stylegan2-vases-1gpu-config-e_network-snapshot-000240.pkl", "rb"))
    # _G = Instantaneous snapshot of the generator. Mainly useful for resuming a previous training run.
    # _D = Instantaneous snapshot of the discriminator. Mainly useful for resuming a previous training run.
    # Gs = Long-term average of the generator. Yields higher-quality results than the instantaneous snapshot.

    my_truncation_psi = 1.0
    random_grid_seed = 382
    random_interpolate_seed = 413
    random_fine_seed = 533

    grid_size = [2,2]
    image_shrink = 1
    image_zoom = 1
    duration_sec = 60.0
    smoothing_sec = 1.0
    mp4_fps = 20
    mp4_codec = 'libx264'
    mp4_bitrate = '5M'
    mp4_file = 'results/random_grid_%s.mp4' % random_grid_seed
    minibatch_size = 8

    num_frames = int(np.rint(duration_sec * mp4_fps))
    random_state = np.random.RandomState(random_grid_seed)

    # Generate latent vectors
    shape = [num_frames, np.prod(grid_size)] + Gs.input_shape[1:] # [frame, image, channel, component]
    all_latents = random_state.randn(*shape).astype(np.float32)
    import scipy
    all_latents = scipy.ndimage.gaussian_filter(all_latents,
                   [smoothing_sec * mp4_fps] + [0] * len(Gs.input_shape), mode='wrap')
    all_latents /= np.sqrt(np.mean(np.square(all_latents)))


    def create_image_grid(images, grid_size=None):
        assert images.ndim == 3 or images.ndim == 4
        num, img_h, img_w, channels = images.shape

        if grid_size is not None:
            grid_w, grid_h = tuple(grid_size)
        else:
            grid_w = max(int(np.ceil(np.sqrt(num))), 1)
            grid_h = max((num - 1) // grid_w + 1, 1)

        grid = np.zeros([grid_h * img_h, grid_w * img_w, channels], dtype=images.dtype)
        for idx in range(num):
            x = (idx % grid_w) * img_w
            y = (idx // grid_w) * img_h
            grid[y : y + img_h, x : x + img_w] = images[idx]
        return grid

    # Frame generation func for moviepy.
    def make_frame(t):
        frame_idx = int(np.clip(np.round(t * mp4_fps), 0, num_frames - 1))
        latents = all_latents[frame_idx]
        fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
        images = Gs.run(latents, None, truncation_psi=my_truncation_psi,
                              randomize_noise=False, output_transform=fmt)

        grid = create_image_grid(images, grid_size)
        if image_zoom > 1:
            grid = scipy.ndimage.zoom(grid, [image_zoom, image_zoom, 1], order=0)
        if grid.shape[2] == 1:
            grid = grid.repeat(3, 2) # grayscale => RGB
        return grid

    # Generate video.
    import moviepy.editor
    video_clip = moviepy.editor.VideoClip(make_frame, duration=duration_sec)
    video_clip.write_videofile(mp4_file, fps=mp4_fps, codec=mp4_codec, bitrate=mp4_bitrate)

    # import scipy
    # coarse
    duration_sec = 60.0
    smoothing_sec = 1.0
    mp4_fps = 20

    num_frames = int(np.rint(duration_sec * mp4_fps))
    random_state = np.random.RandomState(random_interpolate_seed)


    w = 512
    h = 512
    #src_seeds = [601]
    dst_seeds = [700]
    style_ranges = ([0] * 7 + [range(8,16)]) * len(dst_seeds)

    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    synthesis_kwargs = dict(output_transform=fmt, truncation_psi=my_truncation_psi, minibatch_size=8)

    shape = [num_frames] + Gs.input_shape[1:] # [frame, image, channel, component]
    src_latents = random_state.randn(*shape).astype(np.float32)
    src_latents = scipy.ndimage.gaussian_filter(src_latents,
                                                smoothing_sec * mp4_fps,
                                                mode='wrap')
    src_latents /= np.sqrt(np.mean(np.square(src_latents)))

    dst_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in dst_seeds])


    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    dst_dlatents = Gs.components.mapping.run(dst_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    dst_images = Gs.components.synthesis.run(dst_dlatents, randomize_noise=False, **synthesis_kwargs)


    canvas = PIL.Image.new('RGB', (w * (len(dst_seeds) + 1), h * 2), 'white')

    for col, dst_image in enumerate(list(dst_images)):
        canvas.paste(PIL.Image.fromarray(dst_image, 'RGB'), ((col + 1) * w, 0))

    def make_frame(t):
        frame_idx = int(np.clip(np.round(t * mp4_fps), 0, num_frames - 1))
        src_image = src_images[frame_idx]
        canvas.paste(PIL.Image.fromarray(src_image, 'RGB'), (0, h))

        for col, dst_image in enumerate(list(dst_images)):
            col_dlatents = np.stack([dst_dlatents[col]])
            col_dlatents[:, style_ranges[col]] = src_dlatents[frame_idx, style_ranges[col]]
            col_images = Gs.components.synthesis.run(col_dlatents, randomize_noise=False, **synthesis_kwargs)
            for row, image in enumerate(list(col_images)):
                canvas.paste(PIL.Image.fromarray(image, 'RGB'), ((col + 1) * w, (row + 1) * h))
        return np.array(canvas)

    # Generate video.
    import moviepy.editor
    mp4_file = 'results/interpolate_%s.mp4' % (random_interpolate_seed)
    mp4_codec = 'libx264'
    mp4_bitrate = '5M'

    video_clip = moviepy.editor.VideoClip(make_frame, duration=duration_sec)
    video_clip.write_videofile(mp4_file, fps=mp4_fps, codec=mp4_codec, bitrate=mp4_bitrate)

    import scipy

    duration_sec = 60.0
    smoothing_sec = 1.0
    mp4_fps = 20

    num_frames = int(np.rint(duration_sec * mp4_fps))
    random_state = np.random.RandomState(random_fine_seed)


    w = 512
    h = 512
    style_ranges = [range(6,16)]

    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    synthesis_kwargs = dict(output_transform=fmt, truncation_psi=my_truncation_psi, minibatch_size=8)

    shape = [num_frames] + Gs.input_shape[1:] # [frame, image, channel, component]
    src_latents = random_state.randn(*shape).astype(np.float32)
    src_latents = scipy.ndimage.gaussian_filter(src_latents,
                                                smoothing_sec * mp4_fps,
                                                mode='wrap')
    src_latents /= np.sqrt(np.mean(np.square(src_latents)))

    dst_latents = np.stack([random_state.randn(Gs.input_shape[1])])


    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    dst_dlatents = Gs.components.mapping.run(dst_latents, None) # [seed, layer, component]


    def make_frame(t):
        frame_idx = int(np.clip(np.round(t * mp4_fps), 0, num_frames - 1))
        col_dlatents = np.stack([dst_dlatents[0]])
        col_dlatents[:, style_ranges[0]] = src_dlatents[frame_idx, style_ranges[0]]
        col_images = Gs.components.synthesis.run(col_dlatents, randomize_noise=False, **synthesis_kwargs)
        return col_images[0]

    # Generate video.
    import moviepy.editor
    mp4_file = 'results/fine_%s.mp4' % (random_fine_seed)
    mp4_codec = 'libx264'
    mp4_bitrate = '5M'

    video_clip = moviepy.editor.VideoClip(make_frame, duration=duration_sec)
    video_clip.write_videofile(mp4_file, fps=mp4_fps, codec=mp4_codec, bitrate=mp4_bitrate)

if __name__ == "__main__":
    main()

Other Resources

Notebooks

Interpolating between two models

https://twitter.com/arfafax/status/1234627216098484225

https://github.com/arfafax/StyleGAN2_experiments/blob/master/StyleGAN2%20Network%20Interpolation.ipynb

Interpolation Video

https://colab.research.google.com/drive/1KE1BtDgqaDTK2JaDH9th0zl4iQ-tR0te

Datasets

ArtGAN Dataset refined

https://github.com/cs-chan/ArtGAN/tree/master/WikiArt%20Dataset

StyleGAN2 Dataset: https://archive.org/details/wikiart-dataset

Beetle Photos

https://www.flickr.com/photos/coleoptera-us/albums/72157607363771409

StyleGAN2 Models

Wildlife (256x256)

https://twitter.com/MichaelFriese10

https://mega.nz/#!rewlECYI!YxVxdCKoeauEbiPKt92otVVHZBOiI-KkZMr0cvKHBdg

Microscopic (1024x1024)

https://twitter.com/MichaelFriese10

https://mega.nz/#!PbgzWTZT!JbVpqgMU7AOg-sQUoG1BDepuwKtgAsLgjd4YwlTXlpc

WikiArt (1024x1024)

Deep learning conditional StyleGAN2 model for generating art trained on WikiArt images; includes the model, a ResNet based encoder into the model's latent space, and source code (mirror of the pbaylies/stylegan2 repo on github as of 2020-01-25)

https://archive.org/details/wikiart-stylegan2-conditional-model

https://archive.org/details/wikiart-dataset

https://archive.org/download/wikiart-stylegan2-conditional-model/pbaylies-stylegan2-master.zip

https://archive.org/download/wikiart-stylegan2-conditional-model/network-snapshot-012052.pkl

Various Notes

Difficulties with separating styles

https://www.reddit.com/r/MachineLearning/comments/ewmwsh/d_issues_with_stylemixing_in_stylegan2/

Just spitballing here:

StyleGAN2 cuts one (?) layer at the start of the model, which changes the indexing. According to the StyleGAN2 paper, the high-resolution styles in StyleGAN were mostly inactive, doing only minor sharpening, whereas in StyleGAN2 the high-resolution styles have increased capacity and effect, which might also affect the proper indexing.

I don't think there's anything special about coarse, middle and fine styles: the groupings seem to be for illustration purposes. You should be able to test the significance of the different style layers and choose which you want to mix depending on what sort of effect you want. (Though higher resolution should correspond to finer details, overall.)

It might very well be that disentanglement between layers is poorer than in StyleGAN1. The generator regularization seems to be causing problems for me.

Thank you so much for your advice.

After some testing, I found out that range(10,18) controls color scheme, and range(6,10) would control facial features.

You were right about the first layer of the model being inactive.

Thank you again for the help. Greatly appreciated.


I too tried style mixing. 0-6 for facial features and 6-18 for color scheme. See this for some examples: https://github.com/nikhiltiru/stylegan2

Upscaling an existing model

StyleGAN-Resolution-Convertor

https://twitter.com/xsteenbrugge/status/1205108736349614081

https://github.com/tr1pzz/StyleGAN-Resolution-Convertor

After upscaling, the suggestion is: "New 1024 model saved to models/new_1024_model_with_0512_base.pkl. When finetuning this model, you should set "resume_kimg" to 9600 (I think :p)"
