@averad
Forked from harishanand95/Stable_Diffusion.md
Last active March 6, 2024 17:05

🤗 Stable Diffusion for AMD GPUs on Windows using DirectML

Requirements

Installation

Create a Folder to Store Stable Diffusion Related Files

  • Open File Explorer and navigate to your preferred storage location.
  • Create a new folder named "Stable Diffusion" and open it.
  • In the File Explorer address bar, highlight the folder path, type cmd, and press Enter.

Install 🤗 diffusers

The following commands create a virtual environment (using venv) named sd_env in the folder your cmd window is opened to, then install diffusers (latest from the main branch), transformers, onnxruntime, onnx, onnxruntime-directml and protobuf:

pip install virtualenv
python -m venv sd_env
.\sd_env\Scripts\activate
python -m pip install --upgrade pip
pip install git+https://github.com/huggingface/diffusers.git
pip install git+https://github.com/huggingface/transformers.git
pip install onnxruntime onnx torch ftfy spacy scipy
pip install onnxruntime-directml --force-reinstall
pip install protobuf==3.20.1

To exit the virtual environment, close the command prompt. To start the virtual environment again, open a command prompt in the sd_env\Scripts folder and type activate.
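
For example, reopening a command prompt later and reactivating the environment looks roughly like this (the folder path is a placeholder for wherever you created your Stable Diffusion folder):

cd "C:\path\to\Stable Diffusion"
.\sd_env\Scripts\activate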

Download the Stable Diffusion ONNX model


You will need to go to: https://huggingface.co/runwayml/stable-diffusion-v1-5 and https://huggingface.co/runwayml/stable-diffusion-inpainting. Review and accept the usage/download agreements before completing the following steps.

  • stable-diffusion-v1-5 uses 5.10 GB
  • stable-diffusion-inpainting uses 5.10 GB

If your model folders are larger than this, open stable_diffusion_onnx and stable_diffusion_onnx_inpainting and delete the .git folders.

git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 --branch onnx --single-branch stable_diffusion_onnx
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting --branch onnx --single-branch stable_diffusion_onnx_inpainting

Enter your Hugging Face credentials when prompted and the download will start. Once it completes, you are ready to start using Stable Diffusion.

Scripts / Examples

Copy one of the examples below and save it as a .py file, then run python name_of_the_file.py in a cmd window with the virtual environment activated.

Stable Diffusion Txt 2 Img on AMD GPUs

Here is example Python code for the ONNX Stable Diffusion pipeline using Hugging Face diffusers.

from diffusers import OnnxStableDiffusionPipeline
height=512
width=512
num_inference_steps=50
guidance_scale=7.5
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt="bad hands, blurry"
pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0] 
image.save("astronaut_rides_horse.png")

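If you want reproducible results, you can also pass a seeded NumPy random state as the generator, which is the same approach the longer example script further down uses. A minimal sketch (the seed value is arbitrary):

import numpy as np
from diffusers import OnnxStableDiffusionPipeline

pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)

# Seeding the NumPy random state makes the same prompt + seed reproduce the same image
seed = 123456
rng = np.random.RandomState(seed)

prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt = "bad hands, blurry"
image = pipe(prompt, height=512, width=512, num_inference_steps=50, guidance_scale=7.5, negative_prompt=negative_prompt, generator=rng).images[0]
image.save("astronaut_rides_horse_seeded.png")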

Stable Diffusion Img 2 Img on AMD GPUs

Here is example Python code for the ONNX Stable Diffusion Img2Img pipeline using Hugging Face diffusers.

import time
import torch
from PIL import Image
from diffusers import OnnxStableDiffusionImg2ImgPipeline

init_image = Image.open("test.png")
prompt = "A fantasy landscape, trending on artstation"

pipe = OnnxStableDiffusionImg2ImgPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", revision="onnx", safety_checker=None)
image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images[0] 
image.save("test-output.png")

Stable Diffusion Inpainting on AMD GPUs

Here is example Python code for the ONNX Stable Diffusion Inpaint pipeline using Hugging Face diffusers.

import torch
from PIL import Image
from diffusers import OnnxStableDiffusionInpaintPipeline

pipe = OnnxStableDiffusionInpaintPipeline.from_pretrained("./stable_diffusion_onnx_inpainting", provider="DmlExecutionProvider", revision="onnx", safety_checker=None)

init_image = Image.open("test.png")
init_image = init_image.resize((512, 512))
mask_image = Image.open("mask.png")
mask_image = mask_image.resize((512, 512))
prompt = "Face of a yellow cat, high resolution"

image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5).images[0] 
image.save("test-output.png")

Inpainting images need to be 512 pixels wide by 512 pixels high.

You can make an image mask using Photopea.
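
If you would rather build a mask in code than in an editor, a simple black and white mask (white marks the area to repaint) can be drawn with PIL. This is only a sketch; the rectangle coordinates are placeholders to adjust for your image:

from PIL import Image, ImageDraw

# Start from an all-black 512x512 mask (black = keep the original pixels)
mask_image = Image.new("RGB", (512, 512), "black")
draw = ImageDraw.Draw(mask_image)

# Fill the region to be regenerated with white (coordinates are placeholders)
draw.rectangle((128, 128, 384, 384), fill="white")
mask_image.save("mask.png")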

Example Txt2Img Script With More Features

  • User is prompted in console for Image Parameters
  • Date/Time, Image Parameters & Completion Time are logged in a text file "prompts.txt"
  • Image is saved, named date-time.png (date-time = time image generation was started)
  • User is asked for another prompt or q to quit.
import os
import gc
import sys
import time
import traceback
import numpy as np
from diffusers import OnnxStableDiffusionPipeline
from diffusers import (
    DDPMScheduler,
    DDIMScheduler,
    PNDMScheduler,
    LMSDiscreteScheduler,
    EulerDiscreteScheduler,
    EulerAncestralDiscreteScheduler,
    DPMSolverMultistepScheduler,
)

output_folder = "complete"
log_folder = "complete"
models_folder = "model"

def choose_model():
    model=None
    while model == None:
        os.system('cls')
        print('Stable Diffusion Onnx DirectML\nText to Img\n')
        model_root = os.path.realpath(os.path.dirname(__file__))+"\\"+models_folder+"\\"
        model_list = [ item for item in os.listdir(model_root) if os.path.isdir(os.path.join(model_root, item)) ]
        if not model_list:
            call_quit(
                "No models found.\nPlease place your model folders in "+str(model_root)+" or update the\
'models_folder' variable in this script")
        model_choices = "Avalible Models\n"
        x = 1
        for i in model_list:
            model_choices += str(x) + " (" + str(i) + ")\n"
            x += 1
        model_choices += "Please Choose a Model#: (or q to quit): "
        user_input_model = input(model_choices)
        if user_input_model == "q":
            call_quit("Quit Called, Script Ended")
        if user_input_model.isnumeric():
            if int(user_input_model) >= 1 and int(user_input_model) <= len(model_list):
                model = str(model_root)+str(model_list[int(user_input_model)-1])
        else:
            model = None
    return model

def choose_scheduler(model):
    sched=None
    scheduler_list = [
        [1,DDPMScheduler.from_pretrained(model, subfolder="scheduler"),"DDPMScheduler"],
        [2,DDIMScheduler.from_pretrained(model, subfolder="scheduler"),"DDIMScheduler"],
        [3,PNDMScheduler.from_pretrained(model, subfolder="scheduler"),"PNDMScheduler"],
        [4,LMSDiscreteScheduler.from_pretrained(model, subfolder="scheduler"),"LMSDiscreteScheduler"],
        [5,EulerAncestralDiscreteScheduler.from_pretrained(model, subfolder="scheduler"),
        "EulerAncestralDiscreteScheduler"],
        [6,EulerDiscreteScheduler.from_pretrained(model, subfolder="scheduler"),"EulerDiscreteScheduler"],
        [7,DPMSolverMultistepScheduler.from_pretrained(model, subfolder="scheduler"),
        "DPMSolverMultistepScheduler"],
    ]
    os.system('cls')
    scheduler_choices = "Avalible Schedulers\n"
    for i in scheduler_list:
        scheduler_choices += str(i[0]) + " (" + str(i[2]) + ")\n"
    scheduler_choices += "Please Choose a Scheduler#: (or q to quit): "
    while sched == None:
        os.system('cls')
        print('Stable Diffusion Onnx DirectML\nText to Img\n')
        user_input_sched = input(scheduler_choices)
        if user_input_sched == "q":
            call_quit("Quit Called, Script Ended")
        for i in scheduler_list:
            if user_input_sched == str(i[0]):
                sched = i[1]
                sched_txt = str(i[2])
    return sched_txt, sched

def loadPipe(model=None, sched=None, sched_txt=None, provider="DmlExecutionProvider"):
    pipe = None
    if model == None:
        model = choose_model()
    if sched_txt == None and sched == None:
        sched_txt, sched = choose_scheduler(model)
    os.system('cls')
    pipe = OnnxStableDiffusionPipeline.from_pretrained(
        model,
        revision="onnx",
        provider=provider, 
        safety_checker=None,
        scheduler=sched,
    )
    return model, pipe, sched_txt, sched

def txt_to_img(prompt, negative_prompt, num_inference_steps, guidance_scale, width, height, seed):
    gen_time = time.strftime("%m%d%Y-%H%M%S")
    rng = np.random.RandomState(seed)
    start_time = time.time()
    image = pipe(
        prompt,
        height,
        width,
        num_inference_steps,
        guidance_scale,
        negative_prompt,
        generator = rng,
        ).images[0]
    image.save("./complete/" + gen_time + ".png")
    del image
    del rng
    gc.collect()
    log_info = "\n" + gen_time + " - Seed: " + str(seed) + " - Gen Time: "+ str(time.time() - start_time) + "s"
    with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
        f.write(log_info)

def check_folders(output_folder, log_folder, models_folder):
    output_folder_check = os.path.isdir(output_folder)
    if not output_folder_check: 
        os.makedirs(output_folder)
    log_folder_check = os.path.isdir(log_folder)
    if not log_folder_check:
        os.makedirs(log_folder)
    models_folder_check = os.path.isdir(models_folder)
    if not models_folder_check:
        os.makedirs(models_folder)

def call_quit(msg):
    try:
        del pipe
    except:
        pass
    gc.collect()
    os.system('cls')
    sys.exit(str(msg))

check_folders(output_folder, log_folder, models_folder)
user_input = None
prev_height = None
prev_width = None
reload = False
error = ["", False]
model, pipe, sched_txt, sched = loadPipe()
while user_input == None:
    os.system('cls')
    print('Stable Diffusion Onnx DirectML (' + model + ' - ' + sched_txt + ')\nText to Img\n')
    prompt=None
    while prompt == "" or prompt == None:
        prompt = input('Please Enter Prompt (or q to quit): ')
    if prompt != "q":
        negative_prompt = input('Please Enter Negative Prompt (Optional): ')
        variations=None
        while variations == None:
            variations = input('How Many Images? (Optional): ')
            if variations.isnumeric() == False:
                variations = None
            if variations == "0" or variations == "" or variations == None:
                variations = "1"
        num_inference_steps = input('Please Enter # of Inference Steps (Optional): ')
        if num_inference_steps.isnumeric() == False:
            num_inference_steps = 50
        guidance_scale =  input('Please Enter Guidance Scale (Optional): ')
        if guidance_scale.isnumeric() == False:
            guidance_scale = 7.5
        width = input('Please Enter Width 512 576 640 704 768 832 896 960 (Optional): ')
        if width.isnumeric() == False:
            width = 512
        if prev_width != None:
            if prev_width != width:
                prev_width = width
                reload = True
        else:
            prev_width = width
        height = input('Please Enter Height 512 576 640 704 768 832 896 960 (Optional): ')
        if height.isnumeric() == False:
            height = 512
        if prev_height != None:
            if prev_height != height:
                prev_height = height
                reload = True
        else:
            prev_height = height
        seed = input('Please Enter Seed (Optional): ')
        if seed.isnumeric() == False:
            seed = None
        gen_time = time.strftime("%m%d%Y-%H%M%S")
        log_info = "\n" + gen_time + " - Model: " + model + " Scheduler: " + sched_txt
        log_info += "\n" + gen_time + " - Prompt: " + prompt
        log_info += "\n" + gen_time + " - Neg_Prompt: " + negative_prompt
        log_info += "\n" + gen_time + " - Inference Steps: " + str(num_inference_steps) + " Guidance Scale: " \
+ str(guidance_scale) + " Width: " + str(width) + " Height: " + str(height)
        with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
            f.write(log_info)
        if seed == "" or seed == None:
            rng = np.random.default_rng()
            seed = rng.integers(np.iinfo(np.uint32).max)
        else:
            try:
                seed = int(seed) & np.iinfo(np.uint32).max
            except ValueError:
                seed = hash(seed) & np.iinfo(np.uint32).max
        seeds = np.array([seed], dtype=np.uint32)
        if int(variations) > 1:
            seed_seq = np.random.SeedSequence(seed)
            seeds = np.concatenate((seeds, seed_seq.generate_state(int(variations) - 1)))
        if reload == True:
            del pipe
            gc.collect()
            model, pipe, sched_txt, sched = loadPipe(model, sched, sched_txt)
            reload = False
        os.system('cls')
        print('Stable Diffusion Onnx DirectML (' + model + ' - ' + sched_txt + ')\nText to Img\n')
        for i in range(int(variations)):
            print(str(i+1) + "/" + str(variations))
            try:
                txt_to_img(str(prompt), str(negative_prompt), int(num_inference_steps), int(guidance_scale), int(width), int(height), int(seeds[i]))
            except KeyboardInterrupt:
                gen_time = time.strftime("%m%d%Y-%H%M%S")
                log_info = "\n" + gen_time + " - Error: Keyboard Interrupt"
                log_info += "\n--------------------------------------------------"
                with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
                    f.write(log_info)
                call_quit("CTRL+C Pressed, Script Ended")
            except Exception as e:
                error = [str(e),True]
                gen_time = time.strftime("%m%d%Y-%H%M%S")
                log_info = "\n" + gen_time + " - Error: " + str(e)
                log_info += "\n" + gen_time + " - " + traceback.format_exc()
                with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
                    f.write(log_info)
                break
        log_info = "\n--------------------------------------------------"
        with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
            f.write(log_info)
        prompt = None
        variations = None
        os.system('cls')
        print('Stable Diffusion Onnx DirectML (' + model + ' - ' + sched_txt + ')\nText to Img\n')
        if error[1] == True:
            print("Image Generation Failed\nError: " + error[0] + "\nSee './" + log_folder + "/prompts.txt' for more info\n")
            error = ["", False]
        change_model = ""
        while change_model == "":
            change_model = input('Change Model? (y/n) or (q to quit): ')
            if change_model == "y" or change_model == "Y":
                del pipe
                model, pipe, sched_txt, sched = loadPipe()
            elif change_model == "q":
                call_quit("Quit Called, Script Ended")
            elif change_model == "n" or change_model == "N":
                change_sched = ""
                while change_sched == "":
                    change_sched = input('Change Scheduler? (y/n) or (q to quit): ')
                    if change_sched == "y" or change_sched == "Y":
                        sched_txt, sched = choose_scheduler(model)
                        if type(pipe.scheduler) is not type(sched):
                            pipe.scheduler = sched
                    elif change_sched == "q":
                        call_quit("Quit Called, Script Ended")
            else:
                change_model = ""
    else:
        call_quit("Quit Called, Script Ended")

Output

prompts.txt

10232022-233730 - Model: ./stable_diffusion_onnx
10232022-233730 - Prompt: cat
10232022-233730 - Neg_Prompt: dog
10232022-233730 - Inference Steps: 50 Guidance Scale: 7.5 Width: 512 Height: 512
10232022-233730 - Seed: 22220167420300 - Gen Time: 250.15623688697815s


Convert Stable Diffusion model to ONNX format

Some models are not available in ONNX format and will need to be converted.

Install wget for Windows

  1. Download wget for Windows and install the package.
  2. Copy the wget.exe file into your C:\Windows\System32 folder.
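
wget is used to fetch the two conversion scripts from the huggingface/diffusers repository. The raw URLs below are my assumption of the current script locations; check https://github.com/huggingface/diffusers/tree/main/scripts if they have moved:

wget https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_original_stable_diffusion_to_diffusers.py
wget https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_stable_diffusion_checkpoint_to_onnx.py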

Convert Original Stable Diffusion to Diffusers (Ckpt File)

Notes:

  • Change --checkpoint_path="./model.ckpt" to match the ckpt file to convert
  • Change --dump_path="./model_diffusers" to the output folder location to use
  • You will need to run Convert Stable Diffusion Checkpoint to Onnx (see below) to use the model
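
As a sketch only (the script name comes from the diffusers repository and the flags follow the notes above; adjust the paths to your files):

python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path="./model.ckpt" --dump_path="./model_diffusers"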

Convert Stable Diffusion Checkpoint to Onnx
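
Again as a sketch only: the convert_stable_diffusion_checkpoint_to_onnx.py script from the diffusers repository takes the Diffusers folder produced above and writes out an ONNX model folder. The --model_path and --output_path flag names are my assumption; check the script's --help if they differ:

python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="./model_diffusers" --output_path="./model_onnx"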

Additional Tools (Optional)

Upscaling

Real-ESRGAN

https://github.com/xinntao/Real-ESRGAN#portable-executable-files-ncnn

Usage: realesrgan-ncnn-vulkan.exe -i infile -o outfile [options]...

  -h                   show this help
  -i input-path        input image path (jpg/png/webp) or directory
  -o output-path       output image path (jpg/png/webp) or directory
  -s scale             upscale ratio (can be 2, 3, 4. default=4)
  -t tile-size         tile size (>=32/0=auto, default=0) can be 0,0,0 for multi-gpu
  -m model-path        folder path to the pre-trained models. default=models
  -n model-name        model name (default=realesr-animevideov3, can be realesr-animevideov3 | realesrgan-x4plus | realesrgan-x4plus-anime | realesrnet-x4plus)
  -g gpu-id            gpu device to use (default=auto) can be 0,1,2 for multi-gpu
  -j load:proc:save    thread count for load/proc/save (default=1:2:2) can be 1:2,2,2:2 for multi-gpu
  -x                   enable tta mode"
  -f format            output image format (jpg/png/webp, default=ext/png)
  -v                   verbose output

RealSR ncnn Vulkan

https://github.com/nihui/realsr-ncnn-vulkan

Usage: realsr-ncnn-vulkan -i infile -o outfile [options]...

  -h                   show this help
  -v                   verbose output
  -i input-path        input image path (jpg/png/webp) or directory
  -o output-path       output image path (jpg/png/webp) or directory
  -s scale             upscale ratio (4, default=4)
  -t tile-size         tile size (>=32/0=auto, default=0) can be 0,0,0 for multi-gpu
  -m model-path        realsr model path (default=models-DF2K_JPEG)
  -g gpu-id            gpu device to use (-1=cpu, default=0) can be 0,1,2 for multi-gpu
  -j load:proc:save    thread count for load/proc/save (default=1:2:2) can be 1:2,2,2:2 for multi-gpu
  -x                   enable tta mode
  -f format            output image format (jpg/png/webp, default=ext/png)

SRMD ncnn Vulkan

https://github.com/nihui/srmd-ncnn-vulkan

Usage: srmd-ncnn-vulkan -i infile -o outfile [options]...

  -h                   show this help
  -v                   verbose output
  -i input-path        input image path (jpg/png/webp) or directory
  -o output-path       output image path (jpg/png/webp) or directory
  -n noise-level       denoise level (-1/0/1/2/3/4/5/6/7/8/9/10, default=3)
  -s scale             upscale ratio (2/3/4, default=2)
  -t tile-size         tile size (>=32/0=auto, default=0) can be 0,0,0 for multi-gpu
  -m model-path        srmd model path (default=models-srmd)
  -g gpu-id            gpu device to use (default=0) can be 0,1,2 for multi-gpu
  -j load:proc:save    thread count for load/proc/save (default=1:2:2) can be 1:2,2,2:2 for multi-gpu
  -x                   enable tta mode
  -f format            output image format (jpg/png/webp, default=ext/png)

Image Editing

ImageMagick

Use ImageMagick® to create, edit, compose, or convert digital images. It can read and write images in a variety of formats (over 200) including PNG, JPEG, GIF, WebP, HEIC, SVG, PDF, DPX, EXR and TIFF. ImageMagick can resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.

https://imagemagick.org/script/index.php

Photopea

Photopea is a web-based photo and graphics editor by Ivan Kuckir. It is used for image editing, making illustrations, web design or converting between different image formats. Photopea is advertising-supported software. It is compatible with all modern web browsers, including Opera, Edge, Chrome, and Firefox. The app is compatible with raster and vector graphics, such as Photoshop’s PSD as well as JPEG, PNG, DNG, GIF, SVG, PDF and other image file formats. While browser-based, Photopea stores all files locally, and does not upload any data to a server.

https://www.photopea.com/

FAQs

How can I clear cached models?

huggingface-cli scan-cache --dir ~/.cache/huggingface/diffusers
huggingface-cli delete-cache --dir ~/.cache/huggingface/diffusers

Can I download and install ort-nightly-directml instead of onnxruntime-directml?

Yes, and it can provide better image generation times.

You can download the nightly onnxruntime-directml release from the link below

https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-directml/versions/

Run python --version to find out which whl file to download.

Which file should I download?

  • If you are on Python 3.7, download the file that ends with -cp37-cp37m-win_amd64.whl
  • If you are on Python 3.8, download the file that ends with -cp38-cp38-win_amd64.whl
  • If you are on Python 3.9, download the file that ends with -cp39-cp39-win_amd64.whl
  • etc. etc.
pip install replace_with_the_file_you_downloaded.whl --force-reinstall
pip install protobuf==3.20.1

How do you install or use different models?

Instructions for converting models to the Onnx format are available at: https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674#convert-stable-diffusion-model-to-onnx-format

If the model you want to use is already in the Onnx format, you need to adjust the pipe to call the model you want to use:

Example:

pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)

In the above pipe example, you would change ./stable_diffusion_onnx to match the model folder you want to use.

If you want to load an Onnx Model directly from the Huggingface website and cache it in your virtual environment (sd_env), adjust the pipe as follows:

Example:

pipe = OnnxStableDiffusionPipeline.from_pretrained("lambdalabs/sd-pokemon-diffusers", revision="onnx", provider="DmlExecutionProvider", safety_checker=None)

Note: Loading models directly from the Hugging Face website requires running huggingface-cli login and entering the requested token information.

averad commented Nov 19, 2022

@johnmwalker you are 100% correct. Using the nightly manual download of the onnxruntime-directml files does often result in better (lower) image generation times.

I didn't include it in the install workflow as it could:

  • Confuse some users
  • Result in users reporting errors related to a specific nightly build

I will update the install information to include the manual download of the onnxruntime-directml files as an optional step or a "suggested" advanced install step.

CheeseBrownie commented Nov 19, 2022

@CheeseBrownie place the "model_onnx" folder in D:\Stable Diffusion\ then update your script to use the model you converted.

from diffusers import OnnxStableDiffusionPipeline
height=512
width=512
num_inference_steps=50
guidance_scale=7.5
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt="bad hands, blurry"
pipe = OnnxStableDiffusionPipeline.from_pretrained("../../model_onnx", provider="DmlExecutionProvider", safety_checker=None)
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0] 
image.save("astronaut_rides_horse.png")

Note: You can rename the "model_onnx" folder to whatever you like.

@averad I got my model working with txt2img and img2img, but is it possible to get the model working with inpainting? This is my error:

Traceback (most recent call last):
  File "D:\Stable Diffusion\sd_env\Scripts\inpainting.py", line 14, in <module>
    image = pipe(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5).images[0]
  File "D:\Stable Diffusion\sd_env\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\Stable Diffusion\sd_env\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion_inpaint.py", line 417, in __call__
    noise_pred = self.unet(
  File "D:\Stable Diffusion\sd_env\lib\site-packages\diffusers\onnx_utils.py", line 61, in __call__
    return self.model.run(None, inputs)
  File "D:\Stable Diffusion\sd_env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/conv_in/Conv' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1866)\onnxruntime_pybind11_state.pyd!00007FFF7C68D0CA: (caller: 00007FFF7C68E6CF) Exception(3) tid(26e4) 80070057 The parameter is incorrect.

averad commented Nov 19, 2022

@CheeseBrownie for models that are not trained for inpainting, use OnnxStableDiffusionInpaintPipelineLegacy

More information #1237

@CheeseBrownie

@averad Dang. I got this error:

File "D:\Stable Diffusion\sd_env\Scripts\inpainting.py", line 15, in <module>
    image = pipe(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5).images[0]
TypeError: OnnxStableDiffusionInpaintPipelineLegacy.__call__() missing 1 required positional argument: 'init_image'

So I changed image=init_image to init_image=init_image and now I have this:

return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int64)) , expected: (tensor(float))

averad commented Nov 20, 2022

@CheeseBrownie sorry for the slow response.

Try the following code:

#Import time to collect systime for timekeeping/logging
import time
#Import torch machine learning library
import torch
#Import Image from PIL for image loading and resizing
from PIL import Image
#Import Legacy Inpaint pipeline
from diffusers import OnnxStableDiffusionInpaintPipelineLegacy
#Import schedulers
from diffusers import (
    DDPMScheduler,
    DDIMScheduler,
    PNDMScheduler,
    LMSDiscreteScheduler,
    EulerDiscreteScheduler,
    EulerAncestralDiscreteScheduler,
    DPMSolverMultistepScheduler,
)

#Open initial image
init_image = Image.open("test.png")
#Resize initial image to 512 by 512
init_image = init_image.resize((512, 512))

#Open mask image
mask_image = Image.open("mask.png")
#Resize mask image to 512 by 512
mask_image = mask_image.resize((512, 512))

#Set prompt string
prompt = "cartoon shapes, vibrant colors, alien creatures"

#Set Model folder location
model = "../../model_onnx"

#Set scheduler variables
ddpm = DDPMScheduler.from_pretrained(model, subfolder="scheduler")
ddim = DDIMScheduler.from_pretrained(model, subfolder="scheduler")
pndm = PNDMScheduler.from_pretrained(model, subfolder="scheduler")
lms = LMSDiscreteScheduler.from_pretrained(model, subfolder="scheduler")
euler_anc = EulerAncestralDiscreteScheduler.from_pretrained(model, subfolder="scheduler")
euler = EulerDiscreteScheduler.from_pretrained(model, subfolder="scheduler")
dpm = DPMSolverMultistepScheduler.from_pretrained(model, subfolder="scheduler")

#Initiate the pipe
pipe = OnnxStableDiffusionInpaintPipelineLegacy.from_pretrained(
    #Model is set in the above variables
    model,
    #Revision is used when downloading from Huggingface, variable not needed when using local model file
    revision="onnx",
    #Choose either "DmlExecutionProvider" (GPUs) or "CPUExecutionProvider" (CPU)
    provider="DmlExecutionProvider",
    #Set the scheduler
    scheduler=pndm,
    #Disable the safety checker
    safety_checker=None
)

#Generate 4 images
for i in range(4):
    #Set the image generation start time for logging
    gen_time = time.strftime("%m%d%Y-%H%M%S")
    #Start generating an image from init_image in the masked area defined by mask_image
    image = pipe(prompt, init_image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5).images[0] 
    #Save the generated image
    image.save("./" + gen_time + ".png")
    #Clear previous image before generating a new image
    image = ""

test.png (input image attachment)

mask.png (mask image attachment)

output: 11202022-114127 (generated image attachment)

@sharphandjoseph

how do you install other models in this? or is it not possible?

averad commented Nov 20, 2022

@sharphandjoseph depends on the model you want to use and how you want to use it.

Instructions for converting models to the Onnx format are available at:
https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674#convert-stable-diffusion-model-to-onnx-format

If the model you want to use is already in the Onnx format, you just need to adjust the pipe to call the model you want to use:

Example:
pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)

In the above pipe example, you would change ./stable_diffusion_onnx to match the model folder you want to use.

If you want to load an Onnx Model directly from the Huggingface website and cache it in your virtual environment (sd_env), adjust the pipe as follows:

Example:
pipe = OnnxStableDiffusionPipeline.from_pretrained("lambdalabs/sd-pokemon-diffusers", revision="onnx", provider="DmlExecutionProvider", safety_checker=None)

Note: Loading models directly from the Hugging Face website requires running huggingface-cli login and entering the requested token information.

@Amblyopius

For the ORT Nightly I changed pip.ini so that it knows where to fetch it rather than me having to download it myself. Probably also an idea to word "it can provide better image generation times" a bit stronger. A speed up of 2 to 3 times faster is quite something. Clearly the standard version was leaving a lot of performance on the table. Now waiting for the AMD team to tell us how IREE gives us another 3x to 5x (10x promised in comments here but let's assume that was not using ORT Nightly)

kadrim commented Nov 27, 2022

Can i somehow select which GPU (i have 2 AMDs in my system) will be used? Currently only the slow APU (Vega 10) is used ...

EDIT: Nevermind, found the solution myself by looking at the unit-tests ;-)

For anyone searching for a solution, this script selects the 2nd GPU:

from diffusers import OnnxStableDiffusionPipeline
height=512
width=512
num_inference_steps=50
guidance_scale=7.5
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt="bad hands, blurry"

gpu_provider = ('DmlExecutionProvider', {
	'device_id': 1,
})

pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider=gpu_provider, safety_checker=None)
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0] 
image.save("astronaut_rides_horse.png")

just change device_id to 0 for the first GPU and to 1 for the second.

averad commented Nov 28, 2022

👻 IREE - Getting started - Building from Source (Windows)
https://gist.github.com/averad/b0c020eaf9e0a480660b0476954f600a

@claforte @harishanand95 - [Document] Basic workflow for building IREE for Windows

@jamiecropley

(Quoting @kadrim's GPU-selection example above.)

Can you do a for loop or something to cycle between two GPU's like this?

JStrbg commented Dec 1, 2022

I'm having an issue converting the Stable Diffusion 2 checkpoint to ONNX with this process.

The console is spammed with:
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).

Is there a step I am missing?
It seems to work fine with the https://huggingface.co/runwayml/stable-diffusion-v1-5 checkpoint.

EDIT:
The latest update of the script (updated 4 days ago) seems to have fixed this issue for me. After also updating transformers to ==4.22 I got it completely working :)

averad commented Dec 1, 2022

@JStrbg

Make sure you are using the latest version of the conversion scripts. (Updated 6 Days Ago)

https://github.com/huggingface/diffusers/tree/main/scripts

@crazyfox55

(Quoting @kadrim's GPU-selection example above.)

Your fix worked great for me on a 6800M GPU with a 6900HS CPU.

This is critical information for anyone on laptops. I struggled for hours trying to force my dedicated GPU to have priority over the integrated one. I searched for gpu_id, force gpu, choose gpu, torch.device("cuda:1"), pipe.to("cuda:1"), "cuda", "gpu" but nothing was helping me fix the problem. My last ditch hope was to read every comment on this page hoping for another way to use stable diffusion with AMD gpu.

@kadrim double plus good work

@AkshaySapra

How would I apply different styles to the images, like you see people doing on Youtube with I guess the "normal" installation with a NVidia GPU?

@Shanesan

Has the OnnxStableDiffusionPipeline changed? I cannot add width and height to my pipe without getting an error. I have the following variables and the following pipe:

tall = 600
wide = 600
inference_steps = 10
guidance_multiplier = 10
image = pipe(prompt=prompt_text, guidance_scale=guidance_multiplier, num_inference_steps=inference_steps).images[0]

The above runs fine and gives the default 512x512 image at the multipliers and inference steps I want.

If I try adding width and height like so:
image = pipe(prompt=prompt_text, guidance_scale=guidance_multiplier, num_inference_steps=inference_steps, width=wide, height=tall).images[0]

I get the following error:

Traceback (most recent call last):
  File ".\test.py", line 61, in <module>
    image = pipe(prompt=prompt_text, guidance_scale=guidance_multiplier, num_inference_steps=inference_steps, width=wide, height=tall).images[0]
  File ".\virtualenv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion.py", line 274, in __call__
    noise_pred = self.unet(sample=latent_model_input, timestep=timestep, encoder_hidden_states=text_embeddings)
  File ".\virtualenv\lib\site-packages\diffusers\onnx_utils.py", line 61, in __call__
    return self.model.run(None, inputs)
  File ".\virtualenv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Concat node. Name:'/up_blocks.1/Concat' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1878)\onnxruntime_pybind11_state.pyd!00007FFB146DF72D: (caller: 00007FFB146E0AEF) Exception(3) tid(49b0) 80070057 The parameter is incorrect.

@Amblyopius

It might be tripping over the 600x600, which is not really an ideal format. Does it also fault if you use multiples of 64 (which should generally give the best results)? Note that you'll generally get a lot more artefacts if neither width nor height is 512 when the model is trained for 512x512. So for example 704x512 and 512x704 generally still work, at times with some weird duplication.

@Shanesan

@Amblyopius thanks for the advice. Before you posted I was trying a --force-reinstall of the latest nightly of ort-nightly-directml, then I tried it at 64x64 just to see and that worked, and now I can push it up to 704x704 (but boy does that slow it down). I can't guarantee that the previous ort-nightly-directml was bugged or something, but this seems to work now! Thank you!

@Amblyopius

If speed is an issue, shameless plug for my instruction on how to do FP16: https://github.com/Amblyopius/AMD-Stable-Diffusion-ONNX-FP16

catwiesel commented Dec 19, 2022

This is a great post and might be the best way to get it to work with AMD.

I've been playing around with it for a few days now.

I have a few questions and comments:

  • I am using the (modified) txt2img script and have multiple models to try from and test. The 1st scheduler (DDPMScheduler) seems to be completely useless: even when going up to >200 iterations and using very low, medium, or very high guidance scales, it's mostly garbage. At around 250 iterations I can sometimes kind of see where it is going...
    Does the DDPMScheduler just need 10-100x more iterations than the others? Is some part of the script, or my understanding, flawed? I did not really find any resources online indicating that what I observe is normal behaviour, but seeing how the same script works for everything else, and how it behaves the same with different models, seeds and prompts, while the same settings work fine with any other scheduler, I am really uncertain what is going on here...
    What are your experiences?

  • Convert Stable Diffusion model to ONNX format
    The way it is written now, it's hard to understand that you have to run both scripts to go from a ckpt file to a working ONNX model.

  • It would be really nice to have some more insight/script examples showing how it works with inpainting, outpainting, etc.

m8ax commented Jan 15, 2023

Is there a way to convert ckpt to ONNX? I have tried several ways with no luck. If someone knows, please tell me.

kadrim commented Jan 15, 2023

there a way to convert ckpt to onnx? I have tried several ways and there is no way if someone knows tell me please

RTFM https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674#convert-stable-diffusion-model-to-onnx-format

catwiesel commented Jan 15, 2023 via email

KrakaD commented Jan 24, 2023

Is there an in-browser version of this that runs locally?

m8ax commented Jan 30, 2023

Yeah, it's a two-step process, which is described in the original text but was not really well explained as being two steps (which is my second point in the comment you replied to): Convert Original Stable Diffusion to Diffusers (Ckpt File), then Convert Stable Diffusion Checkpoint to Onnx. You need to follow both to get from ckpt to ONNX, and pay attention to the script/command you run; they are very similar but not exactly the same.


Nothing... I cannot convert the ckpt model with my face to ONNX. If anybody wants to convert the ckpt for me, my Telegram is @mviiiax.

KangbingZhao commented Feb 20, 2023

does anyone have the same error below?

(sdonnx) C:\Data\Code\AI\sd-onnx\stable-difussion>python scripts/t2i.py
2023-02-20 11:12:30.3635554 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:31.2427573 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:31.2481745 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-02-20 11:12:36.1750254 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:36.2239146 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:36.2288181 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-02-20 11:12:36.8986406 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:37.0327712 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:37.0380411 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
2023-02-20 11:12:37.6270735 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:37.6803258 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:37.6853572 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_onnx_stable_diffusion.OnnxStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
  2%|█▋                                                                                 | 1/51 [00:03<03:07,  3.75s/it]
Traceback (most recent call last):
  File "C:\Data\Code\AI\sd-onnx\stable-difussion\scripts\t2i.py", line 10, in <module>
    image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0]
  File "C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion.py", line 273, in __call__
    noise_pred = self.unet(sample=latent_model_input, timestep=timestep, encoder_hidden_states=prompt_embeds)
  File "C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 60, in __call__
    return self.model.run(None, inputs)
  File "C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail

The script failed if I set the width and height to 512, but it's OK if the resolution is 256x256. And if I change DmlExecutionProvider to CPU, 512x512 works. Very weird, because I allocate 16 GB of VRAM to my iGPU (I am using a 5700G with 32 GB of RAM in total).

my script:

from diffusers import OnnxStableDiffusionPipeline
import traceback
height=512
width=512
num_inference_steps=50
guidance_scale=7.5
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt="bad hands, blurry"
pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0] 
image.save("astronaut_rides_horse.png")

@Amblyopius

Probably better to use an up-to-date guide. Try this: https://github.com/Amblyopius/Stable-Diffusion-ONNX-FP16

@wchao1115

onnxruntime-directml v1.15 with DirectML v1.12 came out 2 weeks ago on nuget.org. All the up-to-date SD optimizations are there, so there is no need to install ort-nightly-directml anymore. In fact, I would caution against using the in-between nightly drops from main at any time, because they can be very unstable.

For more details on how to optimize SD for ONNX and DirectML, check out this official sample. And if you're already seeking the best performance, why stop there? Go download the latest driver update for SD for NVIDIA or AMD.

fdwr commented Jul 11, 2023

PyTorch 2 did not work for me. So for anyone else seeing a bewildering torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 14 is not supported, use PyTorch 1.13, like OLive does in its requirements.txt.

Failed for me:

torch==2.0.1
diffusers==0.18.1
onnx==1.14.0
onnxruntime==1.15.0
onnxruntime-directml==1.15.0
numpy==1.24.3

Worked for me:

torch==1.13.1
diffusers==0.17.1   # 0.18.1 would probably work too, as the PyTorch version is the bigger factor
onnx==1.14.0
onnxruntime==1.15.0
onnxruntime-directml==1.15.0
numpy==1.21.6
