@victorchall
Created October 11, 2022 18:50
Quick start on caption training with kanewallmann repo
Using this repo:
https://github.com/kanewallmann/Dreambooth-Stable-Diffusion
Folder structure, using "ff7r" as an example project name (you can name it however you want):
/reg/man/ (all your regularization images of men)
/training_samples/ff7r/man (all your images of men to train)
/reg/woman/ (all your regularization images of women)
/training_samples/ff7r/woman (all your images of women to train)
/reg/group/ (all your regularization images of groups of people)
/training_samples/ff7r/group (all your images of multiple characters in one frame)
/reg/city/ (all your regularization images of city stuff, like "aerial photo of a city at night" or "photo of a city street")
/training_samples/ff7r/city (all your images of city styles to train)
etc. Add as many pairings as you want: /indoors, /building, whatever. Each training subfolder under /training_samples/projectname needs an identically named subfolder of regularization images under /reg.
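A quick way to sanity-check the pairing before training is a small script like the one below. This is only a sketch: the function names, the list of image extensions, and the non-recursive counting are my own assumptions and are not part of the kanewallmann repo.

```python
from pathlib import Path

# Assumption: these are the image types in use; adjust as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images(folder: Path) -> int:
    """Count image files directly inside `folder` (non-recursive)."""
    return sum(
        1 for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def check_pairing(data_root: Path, reg_root: Path) -> dict:
    """Map each training subfolder name to (train_count, reg_count).

    reg_count is None when no identically named subfolder exists
    under reg_root, i.e. the pairing is broken for that class.
    """
    report = {}
    for sub in sorted(p for p in data_root.iterdir() if p.is_dir()):
        reg_sub = reg_root / sub.name
        reg_count = count_images(reg_sub) if reg_sub.is_dir() else None
        report[sub.name] = (count_images(sub), reg_count)
    return report
```

Calling check_pairing(Path("training_samples/ff7r"), Path("reg")) returns one entry per training subfolder; any entry whose second value is None is missing its reg folder.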
Python run command to kick off training:
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume last.ckpt -n ff7r --gpus 0, --data_root training_samples\ff7r --reg_data_root reg
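If you kick off runs for several projects, it can help to build the same command from Python. The sketch below only reassembles the flags shown in the command above; the helper name and its parameters are hypothetical, and it does not launch anything unless you uncomment the last line.

```python
import subprocess


def build_train_command(project, data_root, reg_root,
                        resume_ckpt="last.ckpt",
                        config="configs/stable-diffusion/v1-finetune_unfrozen.yaml",
                        gpus="0,"):
    """Return the training command as an argument list.

    Only flags that appear in the command above are used. The trailing
    comma in the --gpus value is copied from that command as-is.
    """
    return [
        "python", "main.py",
        "--base", config,
        "-t",
        "--actual_resume", resume_ckpt,
        "-n", project,
        "--gpus", gpus,
        "--data_root", data_root,
        "--reg_data_root", reg_root,
    ]


cmd = build_train_command("ff7r", r"training_samples\ff7r", "reg")
# To actually start training, run this from the repo root:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.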
Last successful run:
Training images are run through the BLIP interrogator (16 beams), and each file is renamed to the caption it spits out
Generic subjects like "a man" and "a woman" are then changed to "cloud strife" or "barret wallace", using the correct name for the character shown in the image
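The name swap can be partly scripted once you know which character is in the image. This is a rough sketch, not what was used for the run above: the function name and the list of generic subjects are assumptions, it only replaces the first generic subject it finds, and you still have to supply the right character name per image yourself.

```python
import re


def personalize_caption(caption, name, generic_subjects=("a man", "a woman")):
    """Replace the first generic subject in a BLIP-style caption with `name`.

    Whole-phrase matching is used so "a man" does not match inside
    words such as "a manager". The caption is returned unchanged when
    none of the generic subjects appear.
    """
    for subject in generic_subjects:
        pattern = r"\b" + re.escape(subject) + r"\b"
        new_caption, n_subs = re.subn(pattern, name, caption, count=1,
                                      flags=re.IGNORECASE)
        if n_subs:
            return new_caption
    return caption


print(personalize_caption(
    "a man sitting on a motorcycle with a sword in his hand",
    "cloud strife",
))
```

Captions with more than one person (the group shots) need manual editing, since the script cannot tell which "a man" is which character.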
Every single training image has a custom caption such as "
120-140 images each of Cloud Strife and Barret Wallace in /training_samples/ff7r/man
120-140 images each of Aerith Gainsborough and Tifa Lockhart in /training_samples/ff7r/woman
80 images of Jessie Rasberry in /training_samples/ff7r/woman
60 group photos (various combinations of characters) in /training_samples/ff7r/group
30 images of Wedge and Biggs in /training_samples/ff7r/man
10 images of red xiii in /training_samples/ff7r/dog
10 images of aerial screenshots of midgar city in /training_samples/ff7r/city
10 images of city streets and concept art in /training_samples/ff7r/city
etc.
Results: Cloud, Barret, Aerith, Tifa, and Jessie all look very good.
Biggs/Wedge look like PS2-era renders and kinda smoothed over, but are at least there; more training samples will fix this
Style transfer for city of midgar works fairly well given the limited set
Tom Cruise still looks like Tom Cruise, Emma Watson still looks like Emma Watson, etc.
"photo of city streets" does not turn into midgar unless "midgar city" or "midgar" is in the prompt
There is some degradation, but context mashups still work VERY well, such as Cloud Strife as Captain America or Robert Downey Jr. as Cloud Strife
Future:
1400 images in next training set, more wedge/biggs, etc
Adding "slums district" and "business district" in next model, fairly certain it will do extremely well
Adding more training images for wedge/biggs, sephiroth, president shinra, heidegger, rufus shinra, etc.
@victorchall
Author

victorchall commented Oct 11, 2022

You can monitor training by looking at the logs folder, for example: logs[ff7r2022-10-11T04-19-59_ff7rv4]\images\train
It will spit out test images every so many steps, based on the ImageLogger settings in the finetune yaml.

@victorchall
Author

The caption is per-image in kanewallmann's repo and comes from the filename. An underscore marks the end of the caption, so you can have multiple images with the same caption without filename collisions.

Ex.
"zack fair in a black outfit holding a broadsword.jpg"
"cloud strife sitting on a motorcycle with his buster sword in his hand.png"
"cloud strife standing in a burning alleyway_1.jpg"
"cloud strife standing in a burning alleyway_2.jpg"
"cloud strife standing in a burning alleyway_3.jpg"
"a food truck in the slums distrct of midgar city_1.png"
"a food truck in the slums distrct of midgar city_2.png"
"ruined streets of midgar city with a fallen building in the background and people standing around.png"
"ruined streets of midgar city with a fallen building in the background and people standing around_1.png"

Same goes for reg images!!
"a small a 2-story apartment building_ (1).png"
"a small a 2-story apartment building_ (11).png"
"an interior photo of a small hometown bar with a cash register on the counter_ (1).png"
"an interior photo of a small hometown bar with a cash register on the counter_ (2).png"
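Going by the naming rule described above, recovering the caption from a filename could look something like this. It is a sketch of the convention as described in this comment, not the repo's actual parsing code, and the function name is made up.

```python
from pathlib import Path


def caption_from_filename(filename):
    """Return the caption encoded in a training or reg image filename.

    Assumption: the caption is everything before the first underscore,
    after the file extension has been dropped. Files without an
    underscore use the whole name (minus extension) as the caption.
    """
    stem = Path(filename).stem           # drop ".jpg" / ".png"
    return stem.split("_", 1)[0].strip()


for example in (
    "zack fair in a black outfit holding a broadsword.jpg",
    "cloud strife standing in a burning alleyway_2.jpg",
    "a small a 2-story apartment building_ (11).png",
):
    print(caption_from_filename(example))
```

One consequence of this convention is that a caption itself cannot contain an underscore, since everything after the first one is treated as the uniqueness suffix.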

@victorchall
Author

victorchall commented Oct 12, 2022

New ckpt with another epoch (~4080 steps) added to the above at LR 5e-7, about 14k training steps total (~18k with validation?):
https://drive.google.com/file/d/1BpaJi9JtOoekd0cjXBHni9-R9UnpItWk/view?usp=sharing

@victorchall
Author

Samples from v4 (not 4.1 posted above): https://imgur.com/a/hVOyRmZ#8xoSy4i

Please look at the comments on each image.

@victorchall
Author

New 4.1 samples: https://imgur.com/a/J8lJYrQ

@victorchall
Author

victorchall commented Oct 17, 2022

If you're interested in discussing fine-tuning techniques that move past Dreambooth, I'll be sharing more on Discord as well as here: https://discord.gg/UwM6T5Jp

@IdiotSandwichTheThird

Discord invite seems invalid.

@victorchall
Author

Discord invite seems invalid.

https://discord.gg/uheqxU6sXN

Hopefully a permanent link. A lot has happened in the last week! Now using LAION ground truth data for model preservation and up to 2200+ training images.
