If the VRAM allocated to the AMD iGPU is small, such as 512MB ("Dedicated GPU Memory" in Task Manager), the following procedure may cause a BSOD.
Please increase the VRAM to a larger value, such as 2GB, from the BIOS in advance.
- Install Latest GPU Driver
- Install Python
- git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Open cmd and go to the stable-diffusion-webui folder
> cd path\to\stable-diffusion-webui
- Create venv
> python -m venv venv
> venv\Scripts\activate.bat
> python -m pip install -U pip
- Install PyTorch for CPU (not ROCm) and torch_directml
(DirectML does not currently support PyTorch 2.0. microsoft/DirectML#415)
> pip install torch==1.13.1 torchvision==0.14.1 torch_directml
- Add --skip-torch-cuda-test --precision full --no-half to COMMANDLINE_ARGS= in webui-user.bat
(DirectML does not currently support automatic mixed precision. microsoft/DirectML#192)
- Run webui-user.bat to download the external repositories
- Modify modules/devices.py to use DirectML
diff --git a/modules/devices.py b/modules/devices.py
--- a/modules/devices.py (revision 2217331cd1245d0bdda786a5dcaf4f7b843bd7e4)
+++ b/modules/devices.py (date 1675333882247)
@@ -42,7 +42,11 @@
     if has_mps():
         return "mps"
 
-    return "cpu"
+    try:
+        import torch_directml
+        return torch_directml.device()
+    except ImportError as e:
+        return "cpu"
 
 
 def get_optimal_device():
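The fallback in the patch can be tried outside the webui. The following is a minimal sketch, not the webui's own code: `pick_device` and its `module_name` parameter are hypothetical names introduced here, and the only torch_directml call used is `device()`, the same one the patch relies on.

```python
import importlib


def pick_device(module_name="torch_directml"):
    """Return module.device() if the backend can be imported, else "cpu".

    Hypothetical helper mirroring the try/except added to modules/devices.py.
    """
    try:
        backend = importlib.import_module(module_name)
    except ImportError:
        # Backend not installed in this venv: fall back to the CPU.
        return "cpu"
    # torch_directml.device() is the call used in the patch above.
    return backend.device()


if __name__ == "__main__":
    print("selected device:", pick_device())
```

Run it inside the activated venv. If it prints cpu, torch_directml could not be imported and the webui would fall back to the CPU as well.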
- Modify repositories\k-diffusion\k_diffusion\external.py to work around a DirectML bug (microsoft/DirectML#368)
This bug has been fixed in torch-directml 0.1.13.1.dev230413, so this step is no longer necessary.
diff --git a/k_diffusion/external.py b/k_diffusion/external.py
index 79b51ce..c7ba36f 100644
--- a/k_diffusion/external.py
+++ b/k_diffusion/external.py
@@ -69,7 +69,8 @@ class DiscreteSchedule(nn.Module):
         dists = log_sigma - self.log_sigmas[:, None]
         if quantize:
             return dists.abs().argmin(dim=0).view(sigma.shape)
-        low_idx = dists.ge(0).cumsum(dim=0).argmax(dim=0).clamp(max=self.log_sigmas.shape[0] - 2)
+        # low_idx = dists.ge(0).cumsum(dim=0).argmax(dim=0).clamp(max=self.log_sigmas.shape[0] - 2)
+        low_idx = dists.ge(0).to(torch.int32).cumsum(dim=0).argmax(dim=0).clamp(max=self.log_sigmas.shape[0] - 2)
         high_idx = low_idx + 1
         low, high = self.log_sigmas[low_idx], self.log_sigmas[high_idx]
         w = (low - log_sigma) / (low - high)
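For reference, this is what the patched line computes, written as a pure-Python sketch for a single sigma (no torch required). `low_index` is a hypothetical name, and the sketch assumes `log_sigmas` is sorted in ascending order. The explicit `int(...)` stands in for the `.to(torch.int32)` cast that the workaround adds before the cumulative sum.

```python
from itertools import accumulate


def low_index(log_sigma, log_sigmas):
    """Index of the last entry in ascending log_sigmas that is <= log_sigma,
    clamped to len(log_sigmas) - 2 so that low_index + 1 stays in range."""
    dists = [log_sigma - s for s in log_sigmas]
    mask = [d >= 0 for d in dists]                    # dists.ge(0)
    counts = list(accumulate(int(m) for m in mask))   # cast to int, then cumsum
    first_max = counts.index(max(counts))             # argmax: first maximal value
    return min(first_max, len(log_sigmas) - 2)        # clamp(max=n - 2)


if __name__ == "__main__":
    print(low_index(0.5, [-2.0, -1.0, 0.0, 1.0]))
```

The running count stops growing after the last True entry, so the first position of the maximum is the index of the largest log_sigmas value not exceeding log_sigma.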
- Stop webui (Ctrl+C)
- Run webui-user.bat
- Enjoy!
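If something does not work, it may help to confirm that the venv contains the packages from the pip step. The following is a rough sketch using only the standard library: `check_install` and `EXPECTED` are hypothetical names, the pins are copied from the pip command above, and the distribution name torch-directml is an assumption about how pip records the package.

```python
from importlib import metadata

# Pins copied from the pip command above; torch_directml is not pinned there.
# "torch-directml" is the assumed distribution name of torch_directml.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1", "torch-directml": None}


def check_install(expected=EXPECTED):
    """Return a {distribution: status} report for the current environment."""
    report = {}
    for dist, pin in expected.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "missing"
            continue
        # Ignore a local version suffix such as "+cpu" when comparing.
        if pin is not None and found.split("+")[0] != pin:
            report[dist] = f"version mismatch: found {found}, expected {pin}"
        else:
            report[dist] = f"ok ({found})"
    return report


if __name__ == "__main__":
    for name, status in check_install().items():
        print(f"{name}: {status}")
```

Run it with the venv activated (venv\Scripts\activate.bat), otherwise it reports on the system Python instead.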
Note: Not all features are confirmed to work. Those that do work may still run on the CPU instead of the GPU.
Note 2: The following samplers are currently not working: DPM++ SDE, DPM fast, DPM adaptive, DPM++ SDE Karras, DDIM, PLMS
Note 3: Compared to the Linux + ROCm environment, performance is poor (about 1/3 to 1/4) and memory usage is high. If the webui crashes due to lack of memory, use options that reduce memory usage, such as --medvram or --lowvram.
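As an illustration of how the flags from this procedure combine, the sketch below assembles the COMMANDLINE_ARGS line for webui-user.bat. `commandline_args_line` and `memory_mode` are hypothetical names. The flags themselves are the ones listed in the steps and in Note 3.

```python
# Flags required by the DirectML setup described in the steps above.
BASE_FLAGS = ["--skip-torch-cuda-test", "--precision", "full", "--no-half"]

# Optional memory-saving flags from Note 3.
MEMORY_FLAGS = {None: [], "medvram": ["--medvram"], "lowvram": ["--lowvram"]}


def commandline_args_line(memory_mode=None):
    """Build the `set COMMANDLINE_ARGS=...` line for webui-user.bat."""
    if memory_mode not in MEMORY_FLAGS:
        raise ValueError(f"unknown memory mode: {memory_mode!r}")
    flags = BASE_FLAGS + MEMORY_FLAGS[memory_mode]
    return "set COMMANDLINE_ARGS=" + " ".join(flags)


if __name__ == "__main__":
    print(commandline_args_line("medvram"))
```

Trying --medvram first and moving to --lowvram only if crashes continue is a reasonable order, since --lowvram generally trades more speed for memory.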