Exploring FreezeG - style transfer on StyleGAN2?

Hi, this is Allen from Zeals.

Recently I found a great piece of work posted by bryandlee on GitHub: FreezeG. The results are absolutely stunning, one of the most intuitive transformation models I have ever seen.

https://github.com/bryandlee/FreezeG/blob/master/imgs/cat2wild/1.gif?raw=true

https://github.com/bryandlee/FreezeG/blob/master/imgs/cat2wild/2.gif?raw=true

How awesome! I immediately wanted to play with it, and thankfully bryandlee provides pre-trained models here, so I could download them and try it myself.

Exploring the latent space

I downloaded the cat and cat2wild models from the issue above, and since the device I am working on doesn't have a GPU, I ran everything on Colab.

The repo already comes with an interactive Gradio demo app; we simply need to replace the argparse handling with code that runs on Colab:

  
import gradio
from types import SimpleNamespace

from gradio_app import Sampler

args = SimpleNamespace(
    device="cuda",
    size=256,
    truncation=0.7,
    n_factors=10,
    finetune_loc=3,
    source_ckpt="./cat.pt",
    target_ckpt="./cat2wild_30k.pt",
)

args.factors = list(range(args.n_factors))

sampler = Sampler(args)

gradio_inputs = [gradio.inputs.Slider(minimum=0, maximum=99999, step=1, default=sampler.seed, label='seed')]
for i in range(args.n_factors):
    gradio_inputs.append(gradio.inputs.Slider(minimum=-5, maximum=5, step=0.2, default=0, label=str(i+1)))

gradio.Interface(fn=sampler.create_sample, inputs=gradio_inputs, outputs=gradio.outputs.Image(), live=True, title='FreezeG').launch()

I launched the interactive demo on Google Colab and the model works well!

[Screenshot: the FreezeG Gradio demo running on Colab]

You can change the random seed and generate different kitties.

[Screenshot: a different kitty generated from another random seed]

How does it work?

From the repo README we know that the idea of this work is to fine-tune only the later layers of a pre-trained generator: the mapping network and the early layers, which handle coarse feature mapping, stay frozen, and only the later, style-generating layers are enabled for training.

Diving into the code to check exactly which layers bryandlee freezes, and comparing it with the original repo, we can see that this part was added to the generator training step:

# update G
if args.finetune_loc <= 0:
    requires_grad(generator, True)
else:
    for loc in range(args.finetune_loc):
        requires_grad(generator, True, target_layer=f'convs.{generator.num_layers-2-2*loc}')
        requires_grad(generator, True, target_layer=f'convs.{generator.num_layers-3-2*loc}')
        requires_grad(generator, True, target_layer=f'to_rgbs.{generator.log_size-3-loc}')
requires_grad(discriminator, False)

We can see that only the convs and to_rgbs layers in the last finetune_loc resolution blocks are enabled for training; everything else stays frozen.
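For reference, this is roughly how such a requires_grad helper could be implemented. This is just my own sketch of freezing parameters by name; the actual helper in the FreezeG repo may differ in details.

def requires_grad(model, flag=True, target_layer=None):
    # Toggle requires_grad on all parameters, or only on parameters whose
    # name contains target_layer (e.g. 'convs.10' or 'to_rgbs.5').
    for name, param in model.named_parameters():
        if target_layer is None or target_layer in name:
            param.requires_grad = flag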

[Figure: summary of the StyleGAN generator architecture, with the mapping network on the left and the synthesis network on the right]

https://arxiv.org/abs/1812.04948

We can confirm that the mapping network is completely frozen during training, so given the same z, the source and target models will generate the same w.

For StyleGAN2, the right side of the model (the synthesis network) is changed to the structure below.

[Figure: the revised StyleGAN2 synthesis block design]

https://arxiv.org/abs/1912.04958

So here, too, only some layers of the synthesis network are enabled for training: the Conv layers shown in diagram (c) and, referring to StyleGAN2's official source code, the to_rgbs layer in each block.

Since both the source and target models receive the same w, they share the same feature mapping; in the context of generating an animal image, that means the positions of the eyes, mouth, ears, and so on. That's why the generated images look almost identical and differ only in style.
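To make this concrete, here is a minimal sketch of generating an aligned pair from the same latent. I am assuming the rosinality stylegan2-pytorch Generator interface (model.py) and that the checkpoints store their EMA weights under a 'g_ema' key.

import torch
from model import Generator  # rosinality's stylegan2-pytorch generator

device = "cuda"
source = Generator(256, 512, 8).to(device)
target = Generator(256, 512, 8).to(device)
source.load_state_dict(torch.load("cat.pt")["g_ema"])           # assumed checkpoint layout
target.load_state_dict(torch.load("cat2wild_30k.pt")["g_ema"])  # assumed checkpoint layout

with torch.no_grad():
    z = torch.randn(1, 512, device=device)
    w = source.get_latent(z)  # same as target.get_latent(z), since the mapping network is frozen
    cat_img, _ = source([w], input_is_latent=True)
    wild_img, _ = target([w], input_is_latent=True)  # same structure, transferred style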

Adjusting latent directions

Thanks to the original implementation by rosinality, we can use the closed_form_factorization.py script to generate latent directions for editing the generated image.

Simply run:

!python closed_form_factorization.py --out "./cat_directions.pt" "./cat.pt"

This script extracts eigenvectors from the model's weights and turns them into meaningful directions we can use to adjust the image. These factors are unlabeled, though, so if you are looking for well-defined adjustable directions, you can check out my previous article.
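If you want to apply one of these directions by hand instead of through the demo sliders, it looks roughly like the sketch below. I am assuming the factorization output stores the eigenvectors under an 'eigvec' key, following rosinality's apply_factor.py.

import torch

eigvec = torch.load("cat_directions.pt")["eigvec"]  # assumed key; each column is one direction

index, degree = 0, 3.0                               # which factor to move along, and how far
direction = degree * eigvec[:, index].unsqueeze(0)

with torch.no_grad():
    # reusing the source generator and latent w from the earlier sketch
    edited_img, _ = source([w + direction], input_is_latent=True)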

[Screenshot: Gradio demo after moving one of the latent direction sliders]

Comparing with the image in the previous section, we can see the generated cat now has a much lighter fur color.

Style Transfer...?

The results look quite promising, but can it actually be used for end-to-end image style transfer? The quick answer: only in a rather limited way.

Since we need a latent code z as input for both models, we have to project real images into the latent space first.

The original repo provides a projector based on a VGG perceptual loss, so let's try it out and see how it looks.

# 550000.pt is FFHQ pre-trained model downloaded from the original repo:
# https://github.com/rosinality/stylegan2-pytorch/tree/d8cdab7ade2f094afb2c30f56fe3d9974a40e1c3#pretrained-checkpoints
!python projector.py --ckpt "550000.pt" --step 1000 "./takakuda.png"

Thanks to my colleague Takakuda-san, who publishes his photos inside our company under a CC license, I can use one as the input image.

[Photo: Takakuda-san, used as the input image]

After 1k iterations, the closest latent vector it found generates the image below.

[Image: the projected result after 1,000 iterations]

Well, they do share a similar color tone, but it's hard to call the projection successful: the latent space is limited, so it cannot faithfully represent arbitrary real images.

bryandlee recently published a new model here: an FFHQ model transferred to the Naver Webtoon style.

Here is a transfer result using the FFHQ-to-webtoon model:

[Screenshot: FFHQ faces and their webtoon-style counterparts]

For fun, let's also feed the latent vector we just projected from Takakuda-san into this model.
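Roughly, that looks like the sketch below. I am assuming the projector saves its result as a .pt file keyed by the input filename, with the optimized latent under a 'latent' key, and ffhq2webtoon.pt is a placeholder name for the webtoon checkpoint; the exact layout and filenames may differ.

import torch

result = torch.load("takakuda.pt")                      # assumed projector output file
latent = result["takakuda.png"]["latent"].unsqueeze(0)  # assumed result layout

webtoon = Generator(256, 512, 8).to(device)             # reusing Generator/device from the earlier sketch
webtoon.load_state_dict(torch.load("ffhq2webtoon.pt")["g_ema"])  # hypothetical filename

with torch.no_grad():
    toon_img, _ = webtoon([latent], input_is_latent=True)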

[Image: webtoon-style output generated from the projected latent]

Hmm, at least it still looks like a human after the conversion!

What's next

This work is amazing and I am looking forward to more insights. I also want to try training it on my own custom dataset. If I manage to make something work, I will write again!
