@harishanand95
Last active March 8, 2024 03:19

Stable Diffusion for AMD GPUs on Windows using DirectML

UPDATE: A faster (20x) approach for running Stable Diffusion using MLIR/Vulkan/IREE is available on Windows:

https://github.com/nod-ai/SHARK/blob/main/shark/examples/shark_inference/stable_diffusion/stable_diffusion_amd.md

Install 🤗 diffusers

conda create --name sd39 python=3.9 -y
conda activate sd39
pip install diffusers==0.3.0
pip install transformers
pip install onnxruntime
pip install onnx

Install DirectML latest release

You can download a nightly onnxruntime-directml release from the onnxruntime nightly package feed.

Run python --version to find out which .whl file to download.

  • If you are on Python 3.7, download the file that ends with **-cp37-cp37m-win_amd64.whl.
  • If you are on Python 3.8, download the file that ends with **-cp38-cp38-win_amd64.whl.
  • And so on; the command below is for Python 3.9.
pip install ort_nightly_directml-1.13.0.dev20220908001-cp39-cp39-win_amd64.whl --force-reinstall
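
Once installed, a quick sanity check (a minimal sketch) is to confirm that onnxruntime reports the DirectML provider:

import onnxruntime as ort

# "DmlExecutionProvider" should appear in this list if the DirectML build is active
print(ort.get_available_providers())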

Convert Stable Diffusion model to ONNX format

This approach is faster than downloading the ONNX model files.

wget https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_stable_diffusion_checkpoint_to_onnx.py
  • Run huggingface-cli.exe login and provide your Hugging Face access token.
  • Convert the model using the command below. The models are stored in the stable_diffusion_onnx folder.
python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"

Run Stable Diffusion on AMD GPUs

Here is example Python code for the Stable Diffusion pipeline using Hugging Face diffusers.

from diffusers import StableDiffusionOnnxPipeline
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0] 
image.save("astronaut_rides_horse.png")
@phreeware

Hi, I love that you've made me finally able to use Stable Diffusion on Windows.

I wanted to try out the seed option explained here:

https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb

and I just can't get it to work. The error is:

(1, pipe.unet.in_channels, height // 8, width // 8),
AttributeError: 'OnnxRuntimeModel' object has no attribute 'in_channels'

I've tried different things, but unfortunately to no avail. Do you have any idea how this could work?

cheers!
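
For reference, one workaround is to hardcode the latent shape instead of reading pipe.unet.in_channels, on the assumption that SD v1.x UNets use 4 latent channels (the ONNX wrapper does not expose that attribute):

import torch

# replace (1, pipe.unet.in_channels, height // 8, width // 8) with a literal 4
latents = torch.randn((1, 4, height // 8, width // 8), generator=generator)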

@19wolf

19wolf commented Sep 16, 2022

How do you make it work with other schedulers?

@Stable777

Hey Harisha, I still can't copy-paste the token into my Miniconda window. I can't hardcode it this time, can I?

@Stable777

Stable777 commented Sep 22, 2022

You forgot pip install wget; in any case, "wget" does not work for me (Windows).

@harishanand95
Author

Hey Harisha, I still can't copy-paste the token into my Miniconda window. I can't hardcode it this time, can I?

Try this approach for the Hugging Face token:

  1. Run huggingface-cli.exe login in a command prompt and provide your Hugging Face access token.
  2. Replace use_auth_token=True in the file convert_stable_diffusion_checkpoint_to_onnx.py with use_auth_token="YOUR_TOKEN"
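
For example, step 2 amounts to changing this line in the conversion script (the token string is a placeholder):

# in convert_stable_diffusion_checkpoint_to_onnx.py
pipeline = StableDiffusionPipeline.from_pretrained(model_path, use_auth_token="YOUR_TOKEN")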

@MilanTodorovic

Using this method yields a performance degradation. My RX6600 does about 12s/it compared to the earlier method of using save_onnx.py and dml_onnx.py which yielded around 5s/it. Any ideas what might cause it?

@Saturnix

Saturnix commented Sep 25, 2022

On a PC with 8GB of RAM and an RX580 I get this error:

Traceback (most recent call last):
File "", line 1, in
File "C:\Users...\Anaconda3\envs\sd39\lib\site-packages\diffusers\pipeline_utils.py", line 383, in from_pretrained
loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
File "C:\Users...\Anaconda3\envs\sd39\lib\site-packages\diffusers\onnx_utils.py", line 182, in from_pretrained
return cls._from_pretrained(
File "C:\Users...\Anaconda3\envs\sd39\lib\site-packages\diffusers\onnx_utils.py", line 151, in _from_pretrained
model = OnnxRuntimeModel.load_model(os.path.join(model_id, model_file_name), provider=provider)
File "C:\Users...\Anaconda3\envs\sd39\lib\site-packages\diffusers\onnx_utils.py", line 68, in load_model
return ort.InferenceSession(path, providers=[provider])
File "C:\Users...\Anaconda3\envs\sd39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 347, in init
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "C:\Users...\Anaconda3\envs\sd39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 395, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(563)\onnxruntime_pybind11_state.pyd!00007FFE741F6051: (caller: 00007FFE741F5DF2) Exception(2) tid(690) 8007000E Not enough memory resources are available to complete this operation.

Do you think this is because of not enough RAM? Can I fix it anyway?

@Stable777

Stable777 commented Sep 25, 2022

2. use_auth_token

Hey Harisha, thanks for your answer.
I have to create a TOKEN different from the first one, right? (I still have your first method running in the background.)
Anyway, I hardcoded my new token into the "True" value, but it did nothing.
I tried both tokens! This time it's different. It's as if I can't even write anything in the Miniconda window, and when I press enter it says wrong token, even if I try to hardcode a token into my convert_stable_diffusion_checkpoint_to_onnx.py file.

@Stable777

Found a solution: click on the top of the conda window -> modify -> paste your NEW TOKEN. It will work. If it does not, it will ask you to run git config --global credential.helper store; then restart the login and it should work.

@Stable777

For anyone having the error (ModuleNotFoundError: No module named 'onnx'): you must run pip install onnx just before the step where you launch the script (convert_stable_diffusion_checkpoint_to_onnx).

@Stable777

Stable777 commented Sep 25, 2022

OK, I made it work. I made a script with this:
from diffusers import StableDiffusionOnnxPipeline

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0] 
image.save("astronaut_rides_horse.png")

Then, run that script.
After that, I changed the resolution with:
image = pipe(prompt, height=1024, width=1024, num_inference_steps=2, guidance_scale=5, eta=0.0, execution_provider="DmlExecutionProvider")["sample"][0]
and tried to obtain a 1024x1024 image; it did not work.

..niconda3\envs\sd39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException

Do I have to modify something else somewhere to make the script work?

Edit: The results are different, though. I tried the same prompts with your first and second methods and got different results. Interesting.

@mrtolkien

Thanks a lot for this, works perfectly for me. Very unfortunate that it's almost an order of magnitude slower than Nvidia GPUs though.

@cpietsch

Instead of "Convert Stable Diffusion model to ONNX format", you can now download the ONNX version directly from https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx
This got me a 1.5x speed increase.

@nielsvdp

nielsvdp commented Sep 30, 2022

instead of "Convert Stable Diffusion model to ONNX format" you can now download the onnx version directly from https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx This got me a 1.5x speed increase

Stupid question probably, but how do you download that? I assume it's just git clone https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx, but authentication keeps failing for me. I assume you need to use the User Access Token in some way.

Edit: figured it out, I just had to get the repo and change the branch to onnx. My git is rusty.
Edit 2: Now stuck at changing to that onnx tree, because that's apparently not the same as a branch.

@cpietsch

cpietsch commented Oct 1, 2022

You need to enable LFS:
git lfs install
Also, you can directly clone that branch: git clone <url> --branch <branch> --single-branch [<folder>]

@maikelsz

maikelsz commented Oct 3, 2022

How do I run "convert_stable_diffusion_checkpoint_to_onnx.py" (or any converter, I've seen two) with an already-downloaded model checkpoint?

@keughai

keughai commented Oct 4, 2022

Whenever I try to run the huggingface-cli.exe login command, it just stops.
[screenshot]
When I close it out to retry, it says there's something running. So is the command just really slow for me, or am I doing something wrong? I've tried it with and without the .exe part and it still doesn't do anything. I tried putting my token after login as well and still no luck, haha. I plan to keep it running throughout the night and see if anything changes.

@s-show

s-show commented Oct 5, 2022

Version
OS Windows11 pro 22H2 (build 22621.608)
CPU AMD Ryzen 5 5600X 6-Core Processor
GPU AMD Radeon RX 6600
Python Python 3.9.12
conda conda 4.12.0
pip pip 21.2.4 from C:\ProgramData\Anaconda3\lib\site-packages\pip (python 3.9)

Following the instructions on this page, I was able to run Stable Diffusion on Windows 11 & AMD GPUs. Thanks a lot for writing this article.
One of the images I want to generate is a Japanese anime-style image, so I want to change the model from CompVis/stable-diffusion-v1-4 to naclbit/trinart_stable_diffusion_v2 on Hugging Face.
I therefore ran python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="naclbit/trinart_stable_diffusion_v2" --output_path="./naclbit_trinart_diffusers_v2_onnx", and the following error occurred:

C:\ProgramData\Anaconda3\lib\site-packages\scipy\__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.3
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Fetching 5 files: 100%|██████████████████████████████████████████████████████████████████████████| 5/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\hoge\StableDiffusions\convert_stable_diffusion_checkpoint_to_onnx.py", line 215, in <module>
    convert_models(args.model_path, args.output_path, args.opset)
  File "C:\Users\hoge\AppData\Roaming\Python\Python39\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\hoge\StableDiffusions\convert_stable_diffusion_checkpoint_to_onnx.py", line 73, in convert_models
    pipeline = StableDiffusionPipeline.from_pretrained(model_path, use_auth_token=True)
  File "C:\Users\hoge\AppData\Roaming\Python\Python39\site-packages\diffusers\pipeline_utils.py", line 300, in from_pretrained
    config_dict = cls.get_config_dict(cached_folder)
  File "C:\Users\hoge\AppData\Roaming\Python\Python39\site-packages\diffusers\configuration_utils.py", line 201, in get_config_dict
    raise EnvironmentError(
OSError: Error no file named model_index.json found in directory C:\Users\hoge/.cache\huggingface\diffusers\models--naclbit--trinart_stable_diffusion_v2\snapshots\8c916a773e4795927ab986fcec351ae112c2b3c7.

I then tried ayan4m1/trinart_diffusers_v2 at main (the original model of naclbit/trinart_stable_diffusion_v2) with the python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="ayan4m1/trinart_diffusers_v2" --output_path="./ayan4m1_trinart_diffusers_v2_onnx" command. No error here.

Then I ran pipe = StableDiffusionOnnxPipeline.from_pretrained("./ayan4m1_trinart_diffusers_v2_onnx", provider="DmlExecutionProvider") followed by image = pipe(prompt, height=512, width=512)["sample"][0], and the following error occurred:

Traceback (most recent call last):
  File "C:\Users\hoge\StableDiffusions\test_stable_diffusions.py", line 30, in <module>
    image = pipe(prompt, height=512, width=512)["sample"][0]
  File "C:\Users\hoge\AppData\Roaming\Python\Python39\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_onnx.py", line 132, in __call__
    noise_pred = self.unet(
  File "C:\Users\hoge\AppData\Roaming\Python\Python39\site-packages\diffusers\onnx_utils.py", line 51, in __call__
    return self.model.run(None, inputs)
  File "C:\Users\hoge\AppData\Roaming\Python\Python39\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int32)) , expected: (tensor(int64))

Is it possible to change the model from CompVis/stable-diffusion-v1-4 when using Stable Diffusion following this procedure?

Note that the Python script I am running is as follows:

from datetime import datetime
from diffusers import StableDiffusionOnnxPipeline

dir_name = "."  # output directory; assumed here, set elsewhere in the original script

pipe = StableDiffusionOnnxPipeline.from_pretrained("./ayan4m1_trinart_diffusers_v2_onnx", provider="DmlExecutionProvider")

prompt = "japanese anime of a beaultiful girl,high school uniform,beautiful composition,cinematic lighting,glasses,detailed blond hair,detailed human eyes,detailed mouth,detailed arms,detailed bust,pixiv,light novel,digital painting,extremely detailed,sharp focus,ray tracing,4k,cinematic postprocessing"

for i in range(3):
  image = pipe(prompt, height=512, width=512)["sample"][0]
  date = datetime.now().strftime("%Y%m%d_%H%M%S")
  path = dir_name + "/" +date + ".png"
  image.save(path)

@cstueckrath

cstueckrath commented Oct 6, 2022

Change the scheduler.
There is an astype(np.int64) in scheduling_pndm.py (line 168) but not in the other schedulers.
That's why changing to PNDMScheduler can fix this. Or modify the other schedulers yourself.
Have a look here for a more verbose explanation: https://www.travelneil.com/stable-diffusion-updates.html#the-first-thing
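
A sketch of that scheduler swap, following the pattern used elsewhere in this thread (the exact PNDMScheduler arguments are assumptions based on the SD v1.x defaults):

from diffusers import StableDiffusionOnnxPipeline, PNDMScheduler

# PNDM casts timesteps to int64 internally, avoiding the int32/int64 mismatch
scheduler = PNDMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, skip_prk_steps=True, tensor_format="np")
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", scheduler=scheduler)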

@AssasinatorCzar

instead of "Convert Stable Diffusion model to ONNX format" you can now download the onnx version directly from https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx This got me a 1.5x speed increase

Did it work??

@s-show

s-show commented Oct 8, 2022

@cstueckrath
I was able to get ayan4m1/trinart_diffusers_v2 at main working with the information you provided.
Here is what I did.

  1. I used the pip show diffusers command to see where the diffusers package is stored (c:\users\hoge\appdata\roaming\python\python39\site-packages).
  2. I edited \diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_onnx.py as follows:
  # predict the noise residual
  noise_pred = self.unet(
-     sample=latent_model_input, timestep=np.array([t]), encoder_hidden_states=text_embeddings
+     sample=latent_model_input, timestep=np.array([t], dtype=np.int64), encoder_hidden_states=text_embeddings
  )
  3. I edited my Python script as follows:
+ scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, tensor_format="np")
- pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
+ pipe = StableDiffusionOnnxPipeline.from_pretrained("./ayan4m1_trinart_diffusers_v2_onnx", scheduler=scheduler)

@if-ai

if-ai commented Oct 8, 2022

Whenever I try to run the huggingface-cli.exe login command, it just stops. When I close it out to retry, it says there's something running. So is the command just really slow for me, or am I doing something wrong? I've tried it with and without the .exe part and it still doesn't do anything. I tried putting my token after login as well and still no luck, haha. I plan to keep it running throughout the night and see if anything changes.

I did this, I guess:

pip install huggingface_hub
python -c "from huggingface_hub.hf_api import HfFolder; HfFolder.save_token('MY_HUGGINGFACE_TOKEN_HERE')"

@Stable777

Stable777 commented Oct 9, 2022

I am running into this problem:

  from diffusers import StableDiffusionOnnxPipeline
ModuleNotFoundError: No module named 'diffusers'

I hate to create a new sd39 env to solve it; the old one became inefficient.

@Stable777

Question to all:
Does anyone know how to select the "seed"?
I want to make my results more DETERMINISTIC by selecting, each time, the seed I like best for a specific type of image.

Thanks

@SpandexWizard

instead of "Convert Stable Diffusion model to ONNX format" you can now download the onnx version directly from https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx This got me a 1.5x speed increase

Honestly, I'm going to feel like a brick for having to ask this but... HOW do I download it?

@cpietsch

@SpandexWizard

git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 --branch onnx --single-branch sd-onnx

@SpandexWizard

@SpandexWizard

git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 --branch onnx --single-branch sd-onnx

Well, that worked a treat. Went from generating images in 1:50 to 1 minute flat.

Is it possible to use this thing with img2img?

@cpietsch

there was a pull request for img2img: huggingface/diffusers#552
If you have Linux, just go with https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs, which is much faster.

@SpandexWizard

SpandexWizard commented Oct 10, 2022

Huh... neat. Last I knew, the preferred way was to try and run it through ROCm and plain Stable Diffusion (IIRC), and that was a disaster for me. For some reason it just did NOT want to work with my GPU. I was actually thrilled to get it to work at all, let alone on Windows. Maybe I'll give this a go.

@SpandexWizard

While I'm still messing with this version, has anyone got any scale other than 512 to work? IIRC, I read on their personal website, where this tutorial is hosted, that it doesn't work yet?

@Stable777

@SpandexWizard

git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 --branch onnx --single-branch sd-onnx

well that worked a treat. went from generating images in 1:50 to 1 minute flat.

is it possible to use this thing with img2img?

Hello,
I am so lost. Could you redescribe the whole tutorial, including all the changes you made? Like from step 0 (from this: conda create --name sd39 python=3.9 -y..)
to you generating images in 1 minute.
(You are using AMD, right?)

I have no idea how to use img2img with this GitHub version, by the way; how do you do that as well?
I would be grateful if you wrote that down.

@Stable777

Stable777 commented Oct 11, 2022

While I'm still messing with this version, has anyone got any scale other than 512 to work? IIRC, I read on their personal website, where this tutorial is hosted, that it doesn't work yet?

I have been searching all over the internet; it seems there is some limitation to it. 512x768 can be done, even a bit higher, but not much more.

Quick question about downloading sd-onnx: what am I supposed to do with this new sd-onnx folder?
Thanks

@Stable777

New:
I replaced it in this line:
pipe = StableDiffusionOnnxPipeline.from_pretrained("./sd-onnx", provider="DmlExecutionProvider")
I do, however, get this:
[W:onnxruntime:, inference_session.cc:490 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.

Don't know what that means, or whether it's a problem or not.

@Stable777

Question @harishanand95: is there a way to use the image-to-image (img2img) feature with this AMD method?

@lordzerg

I think for now there is no way to use img2img with AMD; I hope we can use it soon. Also, if I'm wrong I want to know too :)

@Stable777

OK thanks. Someone said it is possible on Windows, something about Torch...
I have ANOTHER QUESTION:

What line of code should I add to get the SEED of an image I am generating?

Thanks

@averad

averad commented Oct 14, 2022

@Stable777 @harishanand95 To use a seed, it needs to be turned into latents using torch.randn before the pipe is called to generate the image.

import torch
import numpy as np
from diffusers import StableDiffusionOnnxPipeline
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", torch_dtype=torch.float16,)
prompt = "a photo of an astronaut riding a horse on mars"
height = 512
width = 512
num_inference_steps = 75
guidance_scale = 7.5
eta = 0.0
seed = 239571688563800
generator = torch.Generator()
#seed = generator.seed()
generator = generator.manual_seed(seed)
latents = torch.randn((1, 4, height // 8, width // 8),generator = generator)
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, eta, latents = latents, execution_provider="DmlExecutionProvider").images[0]
image.save("astronaut_rides_horse.png")

Prompt: a photo of an astronaut riding a horse on mars - Seed: 239571688563800
[generated image: 10152022-041206.png]

Below is an example script for generating an image using a random seed + some logging and getting the prompt via console user input.

Stable Diffusion Onnx DirectML Text to Img:

  • User is prompted in console for "Prompt Text"
  • Image is generated using a random seed + Prompt Text
  • Date/Time, Prompt Text, Seed & Completion Time is logged in a Txt File "prompts.txt"
  • Image is saved, named date-time.png (date-time = time image generation was started)
  • User is asked for another prompt or q to quit.
import os
import sys
import time
import torch
import numpy as np
from diffusers import StableDiffusionOnnxPipeline

height = 512
width = 512
num_inference_steps = 75
guidance_scale = 7.5
eta = 0.0
prompt=""
variations=""
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", torch_dtype=torch.float16,)

def txt_to_img(prompt):
    generator = torch.Generator()
    seed = generator.seed()
    generator = generator.manual_seed(seed)
    latents = torch.randn(
        (1, 4, height // 8, width // 8),
        generator = generator
    )
    gen_time = time.strftime("%m%d%Y-%H%M%S")
    start_time = time.time()
    image = pipe(prompt, height, width, num_inference_steps, guidance_scale, eta, latents = latents, execution_provider="DmlExecutionProvider").images[0] 
    image.save("./" + gen_time + ".png")
    log_info = "\n" + gen_time + " - Prompt: " + prompt + " - Seed: " + str(seed) + " (" + str(time.time() - start_time) + "s)"
    with open('./prompts.txt', 'a+', encoding="utf-8") as f:
        f.write(log_info)
    image = None

os.system('cls')
print('Stable Diffusion Onnx DirectML\nText to Img\n')
while prompt != "q":
    while prompt == "":
        prompt = input('Please Enter Prompt (or q to quit): ')
    if prompt != "q":
        while variations == "":
            variations = input('How many image variations?: ')
            if variations.isnumeric() == False:
                variations = ""
        for i in range(int(variations)):
            txt_to_img(prompt)
        prompt = ""
        variations = ""
pipe = None
os.system('cls')
sys.exit("Quit Called, Script Ended")

Prompt: a photo of an astronaut riding a horse on mars - Seed: 239155044343300
[generated image: 10152022-040509.png]

@adammehaney

Are negative prompts possible at all with this?

@averad

averad commented Oct 21, 2022

The Stable Diffusion 1.5 weights and ONNX files have been released. You will need to go to https://huggingface.co/runwayml/stable-diffusion-v1-5 and review and accept the usage/download agreement.

You can download the SD v1.5 ONNX files using the following command:
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 --branch onnx --single-branch stable_diffusion_onnx

I've put up some updated instructions for the install process:
https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674

@averad

averad commented Oct 21, 2022

Are negative prompts possible at all with this?

Yes

from diffusers import StableDiffusionOnnxPipeline
height=512
width=512
num_inference_steps=50
guidance_scale=7.5
eta=0.0
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt="purple"
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt=negative_prompt, eta=eta).images[0]
image.save("astronaut_rides_horse.png")

@averad

averad commented Oct 21, 2022

@harishanand95 looks like the process works up to diffusers==0.5.0; after that, StableDiffusionOnnxPipeline is renamed to OnnxStableDiffusionPipeline.
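
A small compatibility shim (sketch) that works on both sides of the rename:

try:
    from diffusers import OnnxStableDiffusionPipeline  # diffusers >= 0.6.0
except ImportError:
    from diffusers import StableDiffusionOnnxPipeline as OnnxStableDiffusionPipeline  # diffusers <= 0.5.0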

@averad

averad commented Oct 23, 2022

I think for now there is no way to use img2img with AMD, I hope soon we can use it. Also if I'm wrong I want to know too :)

@lordzerg @Stable777 An Onnx Img2Img Pipeline has been added in Diffusers 0.6.0
huggingface/diffusers#552
https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion
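
A rough usage sketch (the init_image and strength parameter names are assumptions based on the pipeline added in that PR):

from PIL import Image
from diffusers import OnnxStableDiffusionImg2ImgPipeline

pipe = OnnxStableDiffusionImg2ImgPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
init = Image.open("input.png").convert("RGB").resize((512, 512))  # starting image
image = pipe(prompt="a fantasy landscape", init_image=init, strength=0.75, guidance_scale=7.5).images[0]
image.save("img2img.png")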

@averad

averad commented Oct 23, 2022

change scheduler. There is an astype(np.int64) in scheduling_pndm.py (line 168) but not in the other schedulers. That's why changing to PNDMScheduler can fix this. Or modify the other schedulers yourself. Have a look here for a more verbose explanation: https://www.travelneil.com/stable-diffusion-updates.html#the-first-thing

If anyone is wondering how to change to PNDMScheduler for a specific model that is not working (such as the trinart or waifu models): open the model_index.json file (located in the model folder you are trying to use) and edit the scheduler option, for instance as sketched below.
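
For instance, something along these lines (the ["diffusers", "PNDMScheduler"] entry format is an assumption; check your own model_index.json first):

import json

path = "./stable_diffusion_onnx/model_index.json"  # your model folder
with open(path) as f:
    cfg = json.load(f)
cfg["scheduler"] = ["diffusers", "PNDMScheduler"]  # swap in the PNDM scheduler
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)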

@exaltedb

[screenshot]
I'm having a bit of an issue trying to convert the model. Every time I try to run the command under Python 3.10.8, it fails, referring to line 24 of the .py file. Anything I could be doing wrong?

@SpandexWizard

I'm now trying to convert other models I've already downloaded, and the conversion script is yelling at me about invalid repo IDs. But I'm not trying to use a repo? Does anyone know how to point convert_stable_diffusion_checkpoint_to_onnx.py at a downloaded model?
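
For what it's worth, the conversion script loads models with StableDiffusionPipeline.from_pretrained, which also accepts a local diffusers-format folder (one that contains model_index.json); a bare .ckpt file will not work. A hypothetical invocation (folder names are placeholders):

python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="./my_local_model" --output_path="./my_local_model_onnx"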

@harishanand95
Author

Unfortunately I don't have time to update the instructions; please follow @averad's instructions for diffusers>=0.6.0. Thanks! https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674

@averad

averad commented Nov 3, 2022

Unfortunately I don't have time to update the instructions, please follow @averad 's instructions for diffusers>=0.6.0 Thanks! https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674

Thank you @harishanand95 for all you and your team at AMD are doing!

@claforte
Copy link

FYI, @harishanand95 is documenting how to use IREE (https://iree-org.github.io/iree/) through the Vulkan API to run Stable Diffusion text->image. We expect to release the instructions next week. In our tests, this alternative toolchain runs >10X faster than ONNX RT->DirectML for text->image, and Nod.ai is also working to support img->img soon. We think the performance difference is partly explained by MLIR and IREE being a compiler toolchain, compared to ORT, which is more of an interpreter. If you're interested in learning more and supporting this new code path, please email me at claforte at my employer's domain, or send me a Discord friend invite at claforte (my number is #7115). BTW, I'm also trying to get authorization to reward the most helpful open-source developers with a few Navi2 and Navi3 GPUs (soon after they are officially released). :-)

@nomanHasan

Thank you @claforte @harishanand95 for your efforts at making Stable Diffusion more accessible. I run an RX 580 (GFX803), which seems to have lost AMD ROCm support long ago. Still, the internet is full of workarounds that do not work, in my experience. Looking forward to you guys' hard work getting us onto the open-source API method.

@cpietsch

The main issue here is the Windows route. If you use Linux, you can even use the go-to Stable Diffusion UI: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
Still, I would love to see Windows support through the Vulkan API.
If I understand it correctly, we need to convert the SD model to SPIR-V using iree-compiler?
There is an example using SHARK: https://github.com/nod-ai/SHARK/blob/b448770ec26d8b8b0cf332f752915ac39b02d935/shark/examples/shark_inference/stable_diff.py

@nomanHasan

@cpietsch It doesn't work very well on Linux either. The Linux-exclusive ROCm only properly supports their workstation GPUs, and support for consumer GPUs is lagging. You'd have to follow weird workarounds to get them working on recent cards. And for slightly older cards like GFX803, it turns out to be impossible.

@cpietsch

Oh, sorry about that. It worked out of the box for my Radeon VII, and I thought that this was the same for the rest.

@harishanand95
Author

Hello everyone. As Christian mentioned, we have added a new pipeline for AMD GPUs using MLIR/IREE. This approach significantly boosts the performance of running Stable Diffusion on Windows and avoids the current ONNX/DirectML approach.

Instructions: https://github.com/nod-ai/SHARK/blob/main/shark/examples/shark_inference/stable_diffusion/stable_diffusion_amd.md

Please reach out to us on the discord link on the instructions page or create GitHub issues if something does not work for you.

Thanks!

@averad, could you please give it a try and update your instructions too? You can reach us on the Discord channel if you have any questions. Thanks!

@averad

averad commented Dec 1, 2022

@harishanand95 I will give it a try and update the Instructions.

@averad

averad commented Dec 2, 2022

@harishanand95 I wasn't able to test the process, as IREE doesn't have support for RX 500 series cards (GCNv3).

I've suggested adding def VK_TTA_RGCNv3 : I32EnumAttrCase<"AMD_RGCNv3", 103, "rgcn3">; and am working on compiling IREE with my suggested changes for testing.

@cpietsch

cpietsch commented Dec 4, 2022

I am getting 3.85 it/s on my 6900 XT with SHARK (Vulkan); that is 13 seconds for 50 iterations.

@phreeware

Hi, the exe doesn't work for me following your little guide (using the MLIR driver on a 6900 XT); I'm getting errors:
[screenshot]

I'll try the manual guide.

@cpietsch

cpietsch commented Dec 4, 2022

For me the Advanced Installation worked

@Dwakener

Dwakener commented Mar 2, 2023

Generation time?
