Created
October 11, 2022 18:50
Quick start on caption training with kanewallmann repo
Using this repo:
https://github.com/kanewallmann/Dreambooth-Stable-Diffusion
Folder structure, using a project name of "ff7r" as an example (you can name it whatever you want):
/reg/man/ (all your regularization images of men)
/training_samples/ff7r/man (all your images of men to train)
/reg/woman/ (all your regularization images of women)
/training_samples/ff7r/woman (all your images of women to train)
/reg/group/ (all your regularization images of groups of people)
/training_samples/ff7r/group (all your images of multiple characters in one frame)
/reg/city/ (all your regularization images of city stuff, like "aerial photo of a city at night" or "photo of a city street")
/training_samples/ff7r/city (all your images of city styles to train)
etc. Add as many pairings as you want: /indoors, /building, whatever. Pair each training set with its regularization set by using identical subfolder names under /reg and /training_samples/projectname
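Since the pairing depends on the subfolder names matching exactly, a quick check before training can catch a typo or a missing folder. The following is a minimal sketch, not part of the repo: the function names are hypothetical, and it assumes the layout above, with the script run from the folder that contains reg and training_samples.

```python
from pathlib import Path


def subfolder_names(root):
    """Return the names of the immediate subfolders of root (empty set if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return set()
    return {p.name for p in root.iterdir() if p.is_dir()}


def find_unpaired_folders(training_root, reg_root):
    """Return (missing_in_reg, missing_in_training) as sorted lists of subfolder names."""
    training = subfolder_names(training_root)
    reg = subfolder_names(reg_root)
    return sorted(training - reg), sorted(reg - training)


if __name__ == "__main__":
    # "ff7r" is the example project name used in these notes.
    missing_in_reg, missing_in_training = find_unpaired_folders("training_samples/ff7r", "reg")
    for name in missing_in_reg:
        print(f"no regularization folder for training set: {name}")
    for name in missing_in_training:
        print(f"no training folder for regularization set: {name}")
```

If both lists come back empty, every training subfolder has a same-named regularization subfolder and vice versa.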
Python run command to kick off training:
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume last.ckpt -n ff7r --gpus 0, --data_root training_samples\ff7r --reg_data_root reg
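The command above uses a Windows-style backslash in the data path. If you launch training from a script, or on another OS, one option is to assemble the argument list in Python so the path separator matches the platform. This is only a sketch: build_training_command is a hypothetical helper, and it uses nothing beyond the flags and values already shown in the command above.

```python
from pathlib import Path


def build_training_command(project, resume_ckpt="last.ckpt",
                           training_root="training_samples", reg_root="reg"):
    """Assemble the main.py argument list using only the flags from the notes."""
    return [
        "python", "main.py",
        "--base", "configs/stable-diffusion/v1-finetune_unfrozen.yaml",
        "-t",
        "--actual_resume", resume_ckpt,
        "-n", project,
        "--gpus", "0,",  # trailing comma kept exactly as in the original command
        "--data_root", str(Path(training_root) / project),  # OS-appropriate separator
        "--reg_data_root", str(Path(reg_root)),
    ]


if __name__ == "__main__":
    # Print the command for the example project instead of running it.
    print(" ".join(build_training_command("ff7r")))
```

To actually start training, the list can be passed to subprocess.run(cmd, check=True) from the repo folder.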
Last successful run:
Training images are run through the BLIP interrogator with 16 beams, and each file is renamed to the caption it spits out
"a man", "a woman", and so forth are changed to "cloud strife" or "barret wallace", i.e. to the name of the character actually shown in the image
Every single training image has a custom caption such as "
120-140 images each of Cloud Strife and Barret Wallace in /training_samples/ff7r/man
120-140 images each of Aerith Gainsborough and Tifa Lockhart in /training_samples/ff7r/woman
80 images of Jessie Rasberry in /training_samples/ff7r/woman
60 group photos (various combinations of characters) in /training_samples/ff7r/group
30 images of Wedge and Biggs in /training_samples/ff7r/man
10 images of Red XIII in /training_samples/ff7r/dog
10 images of aerial screenshots of Midgar city in /training_samples/ff7r/city
10 images of city streets and concept art in /training_samples/ff7r/city
etc.
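The caption step described at the top of this list (rename each file to its caption, then swap "a man" / "a woman" for the character name) can be sketched roughly as follows. This is not the script used for the run: the function names, the example captions in the tests, and the " (1)" suffix for duplicate captions are all assumptions, and the captions themselves are expected to come from whatever interrogator you use. How the training repo treats a suffix like " (1)" is not covered in these notes, so check that before relying on it.

```python
import re
from pathlib import Path


def personalize_caption(caption, character_name, generic_terms=("a man", "a woman")):
    """Replace the first generic subject phrase found in caption with character_name."""
    for term in generic_terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        if pattern.search(caption):
            return pattern.sub(character_name, caption, count=1)
    return caption  # no generic phrase found, leave the caption alone


def rename_to_caption(image_path, caption):
    """Rename image_path to '<caption><ext>' in the same folder and return the new path."""
    image_path = Path(image_path)
    # Drop characters that are not allowed in Windows file names.
    safe = re.sub(r'[<>:"/\\|?*]', "", caption).strip()
    if not safe:
        raise ValueError("caption is empty after removing invalid characters")
    target = image_path.with_name(safe + image_path.suffix)
    counter = 1
    # Assumption: duplicate captions get a numeric suffix so no file is overwritten.
    while target.exists() and target != image_path:
        target = image_path.with_name(f"{safe} ({counter}){image_path.suffix}")
        counter += 1
    image_path.rename(target)
    return target
```

Doing the name substitution by hand, or at least reviewing it per image, is still needed for the man and woman folders, since each holds more than one character.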
Results: Cloud, Barret, Aerith, Tifa, and Jessie all look very good.
Biggs/Wedge look like PS2-era renders and kinda smoothed over, but are at least there; more training samples should fix this
Style transfer for the city of Midgar works fairly well given the limited set
Tom Cruise still looks like Tom Cruise, Emma Watson still looks like Emma Watson, etc.
"photo of city streets" does not turn into Midgar unless "midgar city" or "midgar" is in the prompt
There is some degradation, but context mashups still work VERY well, e.g. Cloud Strife as Captain America or Robert Downey Jr as Cloud Strife
Future:
1400 images in the next training set, more Wedge/Biggs, etc.
Adding "slums district" and "business district" in the next model; fairly certain it will do extremely well
Adding more training images for Wedge/Biggs, Sephiroth, President Shinra, Heidegger, Rufus Shinra, etc.
Discord invite seems invalid.
hopefully permanent link. A lot has happened in the last week! Now using Laion ground truth data for model preservation and up to 2200+ training images.
If you're interested in discussing fine tuning techniques moving past the Dreambooth techniques, I'll be sharing more here as well on discord: https://discord.gg/UwM6T5Jp