
@ohjho
Last active July 23, 2021 03:18
Fast Neural Style Transfer with PyTorch Example

Clone from PyTorch Example

git clone --depth 1 https://github.com/pytorch/examples.git pytorch_examples
cd pytorch_examples/fast_neural_style/

Install requirements

Check the PyTorch official website for the right command for your setup; as of July 2021, on a g4dn.xlarge running CUDA 11.1 I used:

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

Evaluate a Pre-trained Model

Download the pre-trained models:

python download_saved_models.py

Evaluate the udnie.pth model with:

python neural_style/neural_style.py eval --content-image path/to/your/image.jpg --model saved_models/udnie.pth --output-image udnie_example.jpg --cuda 1
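As a sketch, the same eval call can be scripted over all four pre-trained models that download_saved_models.py fetches (candy, mosaic, rain_princess, udnie). The helper below is hypothetical and only assembles the argv list; you could hand each list to subprocess.run from the repo root:

```python
# Hypothetical helper: build the eval invocation for each pre-trained model.
def build_eval_cmd(content_image, model_path, output_image, cuda=1):
    """Assemble the neural_style.py eval command as an argv list."""
    return [
        "python", "neural_style/neural_style.py", "eval",
        "--content-image", str(content_image),
        "--model", str(model_path),
        "--output-image", str(output_image),
        "--cuda", str(cuda),
    ]

# The four model files downloaded by download_saved_models.py
models = ["candy", "mosaic", "rain_princess", "udnie"]
cmds = [
    build_eval_cmd("image.jpg", f"saved_models/{m}.pth", f"{m}_example.jpg")
    for m in models
]
```

Each entry in `cmds` is ready to pass to `subprocess.run(cmd, check=True)`.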

Train a new model

The training script expects a directory containing class sub-directories that hold the images. The docs mention the COCO dataset, but grabbing mini-ImageNet was easier (you will need the Kaggle API):

kaggle datasets download -d ifigotin/imagenetmini-1000
unzip imagenetmini-1000.zip 
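The expected layout is the ImageFolder convention (root/class_name/image.jpg). A minimal sketch, using a temporary mock directory with made-up synset names rather than the real download:

```python
# Sketch of the ImageFolder-style layout the training script expects:
#   root/
#     class_a/img1.jpg
#     class_b/img2.jpg
import tempfile
from pathlib import Path

def class_dirs(root):
    """Return the class sub-directories directly under a dataset root."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())

# Build a tiny mock dataset to illustrate the structure (names are illustrative)
root = Path(tempfile.mkdtemp())
for cls in ("n01440764", "n01443537"):  # ImageNet-style synset folders
    (root / cls).mkdir()
    (root / cls / "example.JPEG").touch()

print(class_dirs(root))  # each sub-directory is treated as one class
```

After unzipping mini-ImageNet, `imagenet-mini/train/` already follows this layout, which is why it can be passed straight to `--dataset`.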

Download a style image (for example, The Great Wave off Kanagawa):

wget https://uploads5.wikiart.org/00129/images/katsushika-hokusai/the-great-wave-off-kanagawa.jpg

Other places to find artworks to train on include WikiArt and the Kaggle dataset Best Artworks of All Time; to understand how the training works, this Kaggle notebook and this YouTuber do a pretty good job.

Then, to train, I ran the following:

python neural_style/neural_style.py train --dataset imagenet-mini/train/ \
  --style-image the-great-wave-off-kanagawa.jpg \
  --save-model-dir saved_models --epochs 2 --cuda 1 --batch-size 2 --style-weight 1e11
  • batch-size: the default of 4 uses more CUDA memory than I had available, so I took it down to 2. CUDA memory usage also depends on the size of the style image: a 2000x1584 image (about 1.3MB) at batch-size 4 uses about 9.5GB of CUDA memory.
  • style-weight: keeping the content-weight constant (1e5), a higher style weight pushes the optimizer to shrink the Gram-matrix loss on the style's feature maps further, making the output image look more and more like the style image. It's best adjusted in powers of 10 (e.g. 1e10, 1e11, 1e12).
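The style loss compares Gram matrices of feature maps, where each entry measures how strongly two channels co-activate. A minimal pure-Python sketch, mirroring the normalization by C*H*W used in the PyTorch example's gram_matrix:

```python
# Sketch of the normalized Gram matrix behind the style loss.
# features: list of C rows, each holding H*W activations of one channel.
def gram(features):
    c = len(features)       # number of channels
    hw = len(features[0])   # flattened spatial size H*W
    norm = c * hw           # normalization used in the PyTorch example
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / norm for row_j in features]
        for row_i in features
    ]

# A toy 2-channel, 2-pixel feature map
F = [[1.0, 2.0],
     [3.0, 4.0]]
G = gram(F)  # G[i][j]: correlation between channels i and j
# → [[1.25, 2.75], [2.75, 6.25]]
```

The style-weight multiplies the mean-squared difference between these Gram matrices for the generated and style images, which is why raising it makes texture dominate over content.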

Alternative

This repo I built is a modified version of the PyTorch example above with a Streamlit app attached.
