@mcmonkey4eva
Last active March 30, 2024 13:43
SDXL 0.9-ish LoRA Training Fit On an RTX 2070

These are the precise settings and commands I used (minus filenames) for my Reddit post: https://www.reddit.com/r/StableDiffusion/comments/14jck90/finetuning_sdxl_on_an_rtx_2070_consumer_tier_gpu/

At the time, I was running this (now very outdated) version of Kohya's trainer: https://github.com/kohya-ss/sd-scripts/tree/747af145ed32eb85205dca144a4e49f25032d130
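
To reproduce that exact environment, the pinned commit can be checked out directly (plain git; the hash is the one from the link above), then install the repo's requirements as usual:

git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
git checkout 747af145ed32eb85205dca144a4e49f25032d130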

Some edits were made between runs; these are the last settings I ended up with.

This was run on an RTX 2070 within 8 GiB of VRAM, with the latest NVIDIA drivers. The VRAM limit was exceeded slightly during the initial VAE pass that builds the latent cache (there have been improvements since, such that this should no longer be an issue, e.g. the bf16 or fp16 VAE variants, or tiled VAE).
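
As a hedged sketch of that mitigation (not something this run used): later sd-scripts builds accept a --vae argument that swaps in a different VAE for the latent-caching pass, so an fp16-fix VAE could be substituted; the filename below is a placeholder for whichever fixed VAE you actually have.

accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 sdxl_train_network.py --vae "sdxl_vae_fp16_fix.safetensors" (...remaining arguments as in the launch command below)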

This was run on Windows, so a bit of VRAM was taken up by the OS; you'd have more headroom on Linux.

Launch command (a single command, wrapped here for readability):

accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 sdxl_train_network.py
  --network_module=networks.lora --config_file settings.toml --dataset_config dataset.toml
  --pretrained_model_name_or_path "my-sdxl-09-ish.safetensors" --output_dir outputs/
  --output_name sdxl_arcane_lo2 --logging_dir logs --log_prefix sdxl_arcane_lo-
dataset.toml (the file passed to --dataset_config):

[general]
flip_aug = true
color_aug = false # true
[[datasets]]
enable_bucket = true
resolution = [1024,1024]
# resolution = [384,384]
max_bucket_reso = 1536
min_bucket_reso = 128
batch_size = 1
random_crop = false # true
shuffle_caption = false
[[datasets.subsets]]
image_dir = "(path-here)/arcane"
num_repeats = 1
caption_extension = ".txt"
keep_tokens = 1 # only matters when shuffle_caption is on; keeps the first token fixed
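
For reference, that subset just points at a flat folder of images with same-named .txt caption files. A hypothetical layout (filenames and captions made up for illustration):

(path-here)/arcane/
  0001.png
  0001.txt   (contains e.g. "arcane style, woman with blue hair, portrait")
  0002.png
  0002.txt   (contains e.g. "arcane style, city street at night")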
settings.toml (the file passed to --config_file):

vae_batch_size = 4
cache_latents = true
cache_latents_to_disk = true
save_model_as = "safetensors"
max_train_epochs = 5
save_every_n_epochs = 1
xformers = true
max_data_loader_n_workers = 4
persistent_data_loader_workers = true
seed = 42
gradient_checkpointing = true
mixed_precision = "fp16"
save_precision = "fp16"
optimizer_type = "adamw"
#optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
# lr_scheduler = "constant_with_warmup"
lr_scheduler = "cosine"
lr_warmup_steps = 100
learning_rate = 1e-3
network_train_unet_only = true # only the U-Net LoRA is trained; the text encoders are left untouched (needed for the cached text encoder outputs below)
max_grad_norm = 0.0 # disables gradient clipping
network_dim = 16 # LoRA rank
network_args = [ "conv_dim=16",] # also apply LoRA to the conv layers, rank 16
cache_text_encoder_outputs = true # precompute and cache text encoder outputs to save VRAM
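
With max_train_epochs = 5 and save_every_n_epochs = 1 this leaves outputs/sdxl_arcane_lo2.safetensors plus per-epoch snapshots. As a rough sketch, a follow-up run could continue training from that LoRA via the trainer's --network_weights option (same launch command as above, with one argument added):

accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 sdxl_train_network.py --network_weights "outputs/sdxl_arcane_lo2.safetensors" (...remaining arguments as in the launch command above)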