@harishanand95
Last active March 8, 2024 03:19
Stable Diffusion on AMD GPUs on Windows using DirectML

UPDATE: A faster (20x) approach for running Stable Diffusion using MLIR/Vulkan/IREE is available on Windows:

https://github.com/nod-ai/SHARK/blob/main/shark/examples/shark_inference/stable_diffusion/stable_diffusion_amd.md

Install 🤗 diffusers

conda create --name sd39 python=3.9 -y
conda activate sd39
pip install diffusers==0.3.0
pip install transformers
pip install onnxruntime
pip install onnx

Install the latest onnxruntime-directml release

You can download the nightly onnxruntime-directml release from the link below.

Run python --version to find out which .whl file to download.

  • If you are on Python 3.7, download the file that ends with -cp37-cp37m-win_amd64.whl.
  • If you are on Python 3.8, download the file that ends with -cp38-cp38-win_amd64.whl (the "m" ABI suffix was dropped after Python 3.7).
  • And so on for other Python versions.
pip install ort_nightly_directml-1.13.0.dev20220908001-cp39-cp39-win_amd64.whl --force-reinstall
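The wheel-tag rule above can be sketched as a small helper that maps an interpreter version to the matching CPython tag. This is a hypothetical convenience function (not part of onnxruntime), assuming only that the "m" ABI suffix applies up to Python 3.7:

```python
import sys

def wheel_tag_for(version_info=sys.version_info):
    """Return the CPython wheel tag (e.g. 'cp39-cp39') for a version.

    Hypothetical helper: CPython 3.7 and earlier wheels carry an
    extra 'm' ABI suffix; from 3.8 onward the tag is cpXY-cpXY.
    """
    major, minor = version_info[0], version_info[1]
    abi = "m" if (major, minor) <= (3, 7) else ""
    return f"cp{major}{minor}-cp{major}{minor}{abi}"
```

For example, on Python 3.9 this returns "cp39-cp39", which matches the ort_nightly_directml wheel name above.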

Convert Stable Diffusion model to ONNX format

This approach is faster than downloading the ONNX model files.

wget https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_stable_diffusion_checkpoint_to_onnx.py
  • Run huggingface-cli.exe login and provide your Hugging Face access token.
  • Convert the model using the command below. The converted model is stored in the stable_diffusion_onnx folder.
python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"
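After conversion, it can be worth sanity-checking the output folder before loading it. The expected entries below are an assumption based on how the diffusers export script lays out its output; adjust the list to whatever the script actually writes on your machine:

```python
from pathlib import Path

# Assumed layout of the exported folder; verify against your own output.
EXPECTED = ["model_index.json", "text_encoder", "unet", "vae_decoder"]

def missing_parts(model_dir, expected=EXPECTED):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).exists()]
```

If missing_parts("./stable_diffusion_onnx") returns a non-empty list, the conversion likely failed partway and should be rerun.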

Run Stable Diffusion on AMD GPUs

Here is example Python code for a Stable Diffusion pipeline using Hugging Face diffusers.

from diffusers import StableDiffusionOnnxPipeline

# Load the converted ONNX model and run it on the DirectML execution provider.
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
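If you generate several images from different prompts, deriving the output filename from the prompt keeps results easy to identify. A minimal sketch; the helper is ours, not part of diffusers:

```python
import re

def prompt_to_filename(prompt, ext="png", max_len=60):
    """Turn a free-text prompt into a filesystem-safe filename.

    Hypothetical helper (not part of diffusers): keeps letters,
    digits, and spaces, collapses whitespace to underscores, and
    truncates to max_len characters before adding the extension.
    """
    safe = re.sub(r"[^A-Za-z0-9 ]+", "", prompt)
    safe = re.sub(r"\s+", "_", safe.strip())[:max_len]
    return f"{safe or 'image'}.{ext}"
```

For the prompt above this yields "a_photo_of_an_astronaut_riding_a_horse_on_mars.png", which could be passed straight to image.save().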
averad commented Dec 1, 2022

@harishanand95 I will give it a try and update the Instructions.

averad commented Dec 2, 2022

@harishanand95 I wasn't able to test the process as IREE doesn't have support for RX 500 series cards - GCNv3

I've suggested adding def VK_TTA_RGCNv3 : I32EnumAttrCase<"AMD_RGCNv3", 103, "rgcn3">; and am working on compiling IREE with my suggested changes for testing.

cpietsch commented Dec 4, 2022

I am getting 3.85 it/s on my 6900xt on SHARK (vulkan), that is 13 seconds for 50 iterations

phreeware commented
hi, the exe doesn't work for me following your little guide (using the MLIR driver on a 6900XT), I'm getting errors:
[screenshot of errors]

I'll try the manual guide.

cpietsch commented Dec 4, 2022

For me, the Advanced Installation worked.

Dwakener commented Mar 2, 2023

What's the generation time?
