reid3333/1111_windows_amd_directml.md

## 1111_windows_amd_directml.md

      
CAUTION!

If the VRAM allocated to the AMD iGPU is small, such as 512MB (shown as "Dedicated GPU Memory" in Task Manager), the following procedure may cause a BSOD.

Increase the VRAM allocation to a larger value, such as 2GB, in the BIOS before you start.

Install Instructions


Install the latest GPU driver
Install Python
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
Open cmd and go to the stable-diffusion-webui folder

> cd path\to\stable-diffusion-webui

Create venv

> python -m venv venv
> venv\Scripts\activate.bat
> python -m pip install -U pip

Install PyTorch for CPU (not ROCm) and torch_directml

(DirectML does not currently support PyTorch 2.0. microsoft/DirectML#415)

> pip install torch==1.13.1 torchvision==0.14.1 torch_directml
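Before moving on, it can help to confirm that the venv really contains the pinned versions. The following is a minimal sketch using only the standard library; the helper names (`installed_version`, `check_pins`) are made up for illustration, and the distribution name `torch-directml` is assumed to be the name pip records for `torch_directml`.

```python
from importlib import metadata

# Versions pinned by the pip command above; torch_directml is left unpinned there.
PINNED = {"torch": "1.13.1", "torchvision": "0.14.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINNED, lookup=installed_version):
    """Map each package name to a tuple (expected, found, ok)."""
    report = {}
    for name, expected in pins.items():
        found = lookup(name)
        # A local build tag such as "+cpu" is ignored when comparing.
        ok = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, ok)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {found} [{status}]")
    # Assumed distribution name; prints None if pip recorded a different one.
    print("torch-directml:", installed_version("torch-directml"))
```

Run it with the venv activated so that it inspects the same environment the webui will use.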


Add --skip-torch-cuda-test --precision full --no-half to COMMANDLINE_ARGS= in webui-user.bat

(DirectML does not currently support automatic mixed precision. microsoft/DirectML#192)
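Editing webui-user.bat by hand is enough, but the change can also be scripted. The sketch below is one possible way to do it, assuming the file contains a line of the form `set COMMANDLINE_ARGS=...`; the function name `add_flags` is hypothetical, and flags already present are left alone.

```python
PREFIX = "set COMMANDLINE_ARGS="
# Flags named in the step above; "--precision full" is a flag plus its value.
REQUIRED = ["--skip-torch-cuda-test", "--precision full", "--no-half"]


def add_flags(bat_text, flags=REQUIRED):
    """Return bat_text with any missing flags appended to COMMANDLINE_ARGS."""
    out = []
    for line in bat_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(PREFIX.lower()):
            current = stripped[len(PREFIX):].strip()
            tokens = current.split()
            # Compare whole tokens so "--no-half-vae" does not count as "--no-half".
            missing = [f for f in flags if f.split()[0] not in tokens]
            parts = ([current] if current else []) + missing
            line = PREFIX + " ".join(parts)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "@echo off\n\nset PYTHON=\nset COMMANDLINE_ARGS=\n\ncall webui.bat\n"
    print(add_flags(sample))
```

To apply it to the real file, read webui-user.bat, pass its text through `add_flags`, and write the result back. Keep a backup copy first.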


Run webui-user.bat to download the external repositories


Modify modules/devices.py to use DirectML


diff --git a/modules/devices.py b/modules/devices.py
--- a/modules/devices.py	(revision 2217331cd1245d0bdda786a5dcaf4f7b843bd7e4)
+++ b/modules/devices.py	(date 1675333882247)
@@ -42,7 +42,11 @@
     if has_mps():
         return "mps"
 
-    return "cpu"
+    try:
+        import torch_directml
+        return torch_directml.device()
+    except ImportError as e:
+        return "cpu"
 
 
 def get_optimal_device():
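The fallback that this patch adds can be tried outside the webui. The snippet below is a standalone sketch of the same try/except pattern, not the webui's own code; `pick_fallback_device` and its `module_name` parameter are hypothetical names, and `torch_directml.device()` is the only DirectML call it relies on (the same one used in the patch).

```python
import importlib


def pick_fallback_device(module_name="torch_directml"):
    """Return a DirectML device if the module imports, else the string "cpu"."""
    try:
        dml = importlib.import_module(module_name)
    except ImportError:
        # torch_directml is not installed in this environment.
        return "cpu"
    return dml.device()


if __name__ == "__main__":
    print("fallback device:", pick_fallback_device())
```

If this prints `cpu` inside the venv, torch_directml did not import and the webui would fall back to the CPU as well.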

Modify repositories\k-diffusion\k_diffusion\external.py to work around a DirectML bug (microsoft/DirectML#368)

This bug has been fixed in torch-directml 0.1.13.1.dev230413, so this step is not necessary with that version.
diff --git a/k_diffusion/external.py b/k_diffusion/external.py
index 79b51ce..c7ba36f 100644
--- a/k_diffusion/external.py
+++ b/k_diffusion/external.py
@@ -69,7 +69,8 @@ class DiscreteSchedule(nn.Module):
         dists = log_sigma - self.log_sigmas[:, None]
         if quantize:
             return dists.abs().argmin(dim=0).view(sigma.shape)
-        low_idx = dists.ge(0).cumsum(dim=0).argmax(dim=0).clamp(max=self.log_sigmas.shape[0] - 2)
+        # low_idx = dists.ge(0).cumsum(dim=0).argmax(dim=0).clamp(max=self.log_sigmas.shape[0] - 2)
+        low_idx = dists.ge(0).to(torch.int32).cumsum(dim=0).argmax(dim=0).clamp(max=self.log_sigmas.shape[0] - 2)
         high_idx = low_idx + 1
         low, high = self.log_sigmas[low_idx], self.log_sigmas[high_idx]
         w = (low - log_sigma) / (low - high)
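To see what the patched line computes, here is a pure-Python stand-in for a single sigma value. It is an illustration only: it uses lists instead of tensors, the name `low_index` and the sample schedule are made up, and it assumes `argmax` returns the first maximum, as the PyTorch documentation describes. The explicit `int(...)` plays the role of the `.to(torch.int32)` cast in the patch.

```python
from itertools import accumulate


def low_index(log_sigma, log_sigmas):
    """Pure-Python stand-in for the patched low_idx expression (one sigma)."""
    # dists.ge(0), cast to integers explicitly as the patch does with int32
    mask = [int(log_sigma - s >= 0) for s in log_sigmas]
    # cumsum(dim=0): running count of schedule entries at or below log_sigma
    running = list(accumulate(mask))
    # argmax(dim=0): position of the first maximum, i.e. the last 1 in mask
    first_max = running.index(max(running))
    # clamp(max=n - 2): leave room for high_idx = low_idx + 1
    return min(first_max, len(log_sigmas) - 2)


schedule = [-2.0, -1.0, 0.0, 1.0, 2.0]  # made-up ascending log-sigmas
print(low_index(0.5, schedule))   # 2
print(low_index(5.0, schedule))   # 3 (clamped from 4)
```

The result is the index of the largest schedule entry that does not exceed the input, which is why `low_idx` and `low_idx + 1` bracket the value used for interpolation.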

Stop webui (Ctrl+C)
Run webui-user.bat
Enjoy!

Note: Not all features are confirmed to work. Even those that do work may run without using the GPU.
Note 2: The following samplers are currently not working: DPM++ SDE, DPM fast, DPM adaptive, DPM++ SDE Karras, DDIM, PLMS
Note 3: Compared to the Linux + ROCm environment, performance is poor (about 1/3 to 1/4) and memory usage is high. If the webui crashes because it runs out of memory, use an option that reduces memory usage, such as --medvram or --lowvram.