@chavinlo
Last active November 26, 2022 20:20

November 26th Hivemind Test Run

Instructions:

For Windows Users:

Hivemind only works on Linux. Thankfully you can still use WSL to run the training.

Follow this guide: https://learn.microsoft.com/es-es/windows/wsl/install

Basically, open a CMD window (the black window with white text) and type:

wsl --install -d ubuntu

Once you run it, Ubuntu will be installed under WSL on your computer. It will ask you to create an account (username and password) and then drop you into a terminal. From there you can follow the rest of this guide. If you need help, ask me on the Discord.
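
If you close the window later, you can get back into the same Ubuntu environment from CMD or PowerShell. A minimal sketch, assuming the default install from the command above:

wsl
# or, naming the distribution explicitly:
wsl -d Ubuntu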

1.- Setup

Install the following packages corresponding to your distribution: htop screen psmisc python3-pip unzip wget gcc g++ nano

On Ubuntu: apt-get install htop screen psmisc python3-pip unzip wget gcc g++ nano -y
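
On a fresh WSL Ubuntu you will likely need sudo and an updated package index first; the same step under that assumption:

sudo apt-get update
sudo apt-get install htop screen psmisc python3-pip unzip wget gcc g++ nano -y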

Then, install the Python Packages:

pip install "diffusers>=0.5.1" numpy==1.23.4 wandb==0.13.4 torch torchvision "transformers>=4.21.0" "huggingface-hub>=0.10.0" Pillow==9.2.0 tqdm==4.64.1 ftfy==6.1.1 bitsandbytes pynvml~=11.4.1 psutil~=5.9.0 accelerate==0.13.1 scipy==1.9.3 hivemind triton==2.0.0.dev20221120

(The version specifiers containing ">" are quoted so the shell does not treat them as output redirection.)
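
To confirm that PyTorch was installed with CUDA support and can see your GPU (assuming an up-to-date NVIDIA driver on the host), a quick check:

python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"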

Optional: install xformers for a larger batch size: conda install xformers -c xformers/label/dev
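
If you installed xformers, it is worth confirming the package actually imports before starting a run; a minimal check:

python3 -c "import xformers; print('xformers OK')"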

2.- Download Trainer

The trainer is available here: https://gist.github.com/chavinlo/7b03320b1a519c47edd365835366aee5

To download it directly into your instance, you can use wget:

wget https://gist.githubusercontent.com/chavinlo/7b03320b1a519c47edd365835366aee5/raw/f394b89b01a423d4d0a6cb5ad61e6ec49c2e9358/trainer.py

If your system does not have wget, you can use curl instead, which is included in most distributions:

curl https://gist.githubusercontent.com/chavinlo/7b03320b1a519c47edd365835366aee5/raw/f394b89b01a423d4d0a6cb5ad61e6ec49c2e9358/trainer.py -o trainer.py

3.- Configuration

For now, the trainer only supports CLI flags. I will add a YAML config soon.

Write the following into a text file named "run.sh":

torchrun --nproc_per_node=1 \
	trainer.py \
	--workingdirectory hivemindtemp \
	--wantedimages 500 \
	--datasetserver="DATASET_SERVER_IP" \
	--node="true" \
	--o_port1=LOCAL_TCP_PORT \
	--o_port2=LOCAL_UDP_PORT \
	--ip_is_different="true" \
	--p_ip="PUBLIC_IP" \
	--p_port1=PUBLIC_TCP_PORT \
	--p_port2=PUBLIC_UDP_PORT \
	--batch_size 2 \
	--use_xformers="true" \
	--save_steps 1000 \
	--image_log_steps 400 \
	--hf_token="YOUR HUGGINGFACE TOKEN" \
	--model runwayml/stable-diffusion-v1-5 \
	--run_name testrun1 \
	--gradient_checkpointing="true" \
	--use_8bit_adam="false" \
	--fp16="true" \
	--resize="true" \
	--wandb="false" \
	--no_migration="true"

OR

Download the following file:

Via Wget:

wget https://gist.githubusercontent.com/chavinlo/35e304fc0015dc746d270caa1e327111/raw/efadd14db24aef14cf3143f5bf4456014cdc0e36/run.sh

Via curl:

curl https://gist.githubusercontent.com/chavinlo/35e304fc0015dc746d270caa1e327111/raw/efadd14db24aef14cf3143f5bf4456014cdc0e36/run.sh -o run.sh

Once you have it on your computer, run: chmod +x run.sh
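
Before running anything, you can have bash syntax-check the script without executing it, which catches a missing quote or a stray backslash left over from editing:

bash -n run.sh
# no output means the syntax is fine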

Now, go to https://huggingface.co/runwayml/stable-diffusion-v1-5 and accept the terms with your Hugging Face account.

3.1.- Configuration Vars

This is the most important part. In the file you just created, you need to change the following values:

Required

DATASET_SERVER_IP --> Provided Dataset Server IP (check discord)

LOCAL_TCP_PORT --> Local port to get TCP requests

LOCAL_UDP_PORT --> Local port to get UDP requests

YOUR HUGGINGFACE TOKEN --> Your Hugging Face token; you can create or find one here: https://huggingface.co/settings/tokens

If you don't want to extend the network (see the example after this list):

change ip_is_different="true" to ip_is_different="false"

change PUBLIC_IP to 127.0.0.1

change PUBLIC_TCP_PORT to 0

change PUBLIC_UDP_PORT to 0
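
For this common single-machine case, the relevant lines of run.sh would end up looking roughly like the sketch below. The dataset server address and the local ports are made-up placeholders only; use the address provided on the Discord and whatever free ports you like:

	--datasetserver="203.0.113.10:8080" \
	--o_port1=41952 \
	--o_port2=41953 \
	--ip_is_different="false" \
	--p_ip="127.0.0.1" \
	--p_port1=0 \
	--p_port2=0 \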

Please: if you don't know what port forwarding is, or you are not sure how your network is configured, DO NOT attempt the optional steps below!!!!!

Optional

ONLY if you would like to extend the network (NOT REQUIRED!!!!), and your instance is behind a firewall or NAT with port forwarding set up (see the check after this list):

leave ip_is_different="true" as it is

change PUBLIC_IP to your public IP

change PUBLIC_TCP_PORT to the PUBLIC port that forwards to your LOCAL_TCP_PORT

change PUBLIC_UDP_PORT to the PUBLIC port that forwards to your LOCAL_UDP_PORT
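
If you do extend the network, check that the forwarded TCP port actually reaches your machine before starting a run. A rough check using the same placeholder names (assumes netcat and iproute2 are installed):

# on the training machine, once the trainer is running, confirm it is listening locally:
ss -tlnp | grep LOCAL_TCP_PORT
# from a machine outside your network, test the forwarded port:
nc -zv PUBLIC_IP PUBLIC_TCP_PORT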

Batch size, Images per round

If you want to increase the batch size:

change the "2" next to "batch_size" to a higher number; each increase of 1 usually needs roughly 2 GB of additional VRAM

For RTX 3090 users, 2 is the maximum batch size with xformers enabled. If you don't have xformers enabled, set it to 1 AND change "use_xformers="true"" to "use_xformers="false""

If you want to process more images per round (and spend less time downloading files), change "--wantedimages 500" to "--wantedimages 1000" or some higher number (see the excerpt below)
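
For example, on a card without xformers the relevant lines would change to something like this (an illustration only, not a required configuration):

	--wantedimages 1000 \
	--batch_size 1 \
	--use_xformers="false" \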

DO NOT CHANGE THE REST!!!

Do not change anything else; the remaining flags are already set for you.

I will release a MUCH simpler setup soon, along with PROPER documentation.

SUPPORT

If you need help, go to the private channel at https://discord.gg/NPQsdPeA

@mchaker

mchaker commented Nov 26, 2022

DATASET_SERVER_IP is in IP:PORT format
A minimum of 24 GB of VRAM is required

Running is as simple as ./run.sh (if you did chmod +x run.sh; if not, use bash ./run.sh)
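
Since step 1 installs screen, a simple way to keep training alive if your terminal or SSH session drops is to run it inside a screen session; a sketch:

screen -S hivemind    # start a named session
./run.sh              # start training inside it
# detach with Ctrl+A then D; reattach later with:
screen -r hivemind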
