Model: any 1.5 base model (2.1 performs the same, but the embeddings don't work there, so the prompt would need changing)
Size: 512x512, Steps: 100, CFG scale: 7, Seed: 2583996050
Prompt: a picture of a house
with 2 negative embeddings (embeddings don't affect performance anyway):
// 75 tokens:
NG_DeepNegative, EasyNegative, ,
// 76 tokens
NG_DeepNegative, EasyNegative, , ,
Upcast to float32 OFF below 75 tokens
REFERENCE: a 512x512 image at ~33 it/s takes ~3 seconds (100 steps)
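The reference figure is just the step count divided by the reported throughput. A minimal sketch of that arithmetic, useful for turning any it/s value in the tables below into a rough per-image time (the function name is illustrative, and model loading and VAE decode time are not included):

```python
def seconds_per_image(steps: int, its_per_second: float) -> float:
    """Estimate sampling time from the step count and the reported it/s."""
    if its_per_second <= 0:
        raise ValueError("its_per_second must be positive")
    return steps / its_per_second


# Settings from this benchmark: 100 steps at roughly 33 it/s.
print(f"{seconds_per_image(100, 33.0):.1f} s")
```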
| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Euler a | 33.14it/s | 33.47it/s | 33.36it/s | 24.46it/s | 17.93it/s | 16.43it/s | 24.49it/s |
| Euler | 33.25it/s | 33.60it/s | 33.39it/s | 24.55it/s | 17.98it/s | 16.52it/s | 24.58it/s |
| LMS | 32.97it/s | 33.46it/s | 33.23it/s | 24.41it/s | 17.90it/s | 16.46it/s | 24.37it/s |
| Heun | 16.78it/s | 17.06it/s | 16.87it/s | 12.38it/s | 9.05it/s | 8.28it/s | 12.41it/s |
| DPM2 | 16.75it/s | 17.05it/s | 16.87it/s | 12.38it/s | 9.04it/s | 8.30it/s | 12.40it/s |
| DPM2 a | 16.72it/s | 17.02it/s | 16.84it/s | 12.39it/s | 9.03it/s | 8.30it/s | 12.37it/s |
| DPM++ 2S a | 16.71it/s | 17.02it/s | 16.84it/s | 12.37it/s | 8.97it/s | 8.28it/s | 12.37it/s |
| DPM++ 2M | 33.18it/s | 33.83it/s | 33.36it/s | 24.56it/s | 17.94it/s | 16.50it/s | 24.52it/s |
| DPM++ SDE | 15.56it/s | 15.83it/s | 15.65it/s | 11.74it/s | 8.64it/s | 7.94it/s | 11.74it/s |
| DPM++ 2M SDE | 30.93it/s | 31.31it/s | 31.01it/s | 23.25it/s | 17.17it/s | 15.90it/s | 23.23it/s |
| DPM fast | 33.22it/s | 33.73it/s | 33.36it/s | 24.54it/s | 17.98it/s | 16.54it/s | 24.55it/s |
| DPM adaptive | 33.34it/s | 33.76it/s | 33.43it/s | 24.54it/s | 17.98it/s | 16.51it/s | 24.55it/s |
| LMS Karras | 33.00it/s | 33.57it/s | 33.14it/s | 24.39it/s | 17.89it/s | 16.48it/s | 24.41it/s |
| DPM2 Karras | 16.71it/s | 17.03it/s | 16.86it/s | 12.40it/s | 9.05it/s | 8.31it/s | 12.38it/s |
| DPM2 a Karras | 16.76it/s | 17.04it/s | 16.82it/s | 12.39it/s | 9.01it/s | 8.31it/s | 12.37it/s |
| DPM++ 2S a Karras | 16.71it/s | 17.00it/s | 16.71it/s | 12.34it/s | 9.03it/s | 8.28it/s | 12.36it/s |
| DPM++ 2M Karras | 33.18it/s | 33.71it/s | 33.34it/s | 23.90it/s | 17.93it/s | 16.50it/s | 24.54it/s |
| DPM++ SDE Karras | 15.63it/s | 15.86it/s | 15.74it/s | 11.78it/s | 8.68it/s | 8.04it/s | 11.75it/s |
| DPM++ 2M SDE Karras | 30.79it/s | 31.30it/s | 31.04it/s | 23.23it/s | 17.18it/s | 15.88it/s | 23.20it/s |
| DDIM | 33.62it/s | 34.17it/s | 33.76it/s | 24.81it/s | 18.10it/s | 16.61it/s | 24.74it/s |
| PLMS | 33.32it/s | 33.82it/s | 33.54it/s | 24.53it/s | 17.87it/s | 16.46it/s | 24.48it/s |
| UniPC | 30.21it/s | 30.63it/s | 30.37it/s | 22.94it/s | 17.11it/s | 15.75it/s | 22.96it/s |
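To compare methods at a glance, each row can be normalized against one baseline. The sketch below does that for the Euler a row of the table above (float32 upcast OFF, below 75 tokens); the helper name and the choice of xformers as baseline are illustrative, not part of the original benchmark:

```python
# Euler a row, float32 upcast OFF, below 75 tokens (values in it/s).
euler_a = {
    "xformers": 33.14,
    "sdp-no-mem": 33.47,
    "sdp": 33.36,
    "Doggettx": 24.46,
    "sub-quadratic": 17.93,
    "V1 - original v1": 16.43,
    "InvokeAI": 24.49,
}


def relative_to(baseline: str, row: dict[str, float]) -> dict[str, float]:
    """Return each method's throughput as a fraction of the baseline method."""
    base = row[baseline]
    return {method: speed / base for method, speed in row.items()}


for method, ratio in relative_to("xformers", euler_a).items():
    print(f"{method:<18} {ratio:.2f}x")
```

The same helper works on any other row once its seven values are copied into a dict.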
Upcast to float32 OFF above 75 tokens
| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Euler a | 21.22it/s | 22.04it/s | 22.46it/s | 15.74it/s | 14.47it/s | 12.46it/s | 16.47it/s |
| Euler | 21.42it/s | 22.29it/s | 22.54it/s | 15.92it/s | 14.54it/s | 12.52it/s | 16.54it/s |
| LMS | 21.26it/s | 22.07it/s | 22.53it/s | 15.87it/s | 14.46it/s | 12.73it/s | 16.53it/s |
| Heun | 10.77it/s | 11.20it/s | 11.39it/s | 7.98it/s | 7.32it/s | 6.42it/s | 8.24it/s |
| DPM2 | 10.73it/s | 11.19it/s | 11.39it/s | 7.99it/s | 7.32it/s | 6.41it/s | 8.37it/s |
| DPM2 a | 10.76it/s | 11.20it/s | 11.40it/s | 8.01it/s | 7.33it/s | 6.41it/s | 8.33it/s |
| DPM++ 2S a | 10.74it/s | 11.17it/s | 11.39it/s | 8.00it/s | 7.33it/s | 6.42it/s | 8.30it/s |
| DPM++ 2M | 21.38it/s | 22.19it/s | 22.54it/s | 15.88it/s | 14.56it/s | 12.75it/s | 16.48it/s |
| DPM++ SDE | 10.27it/s | 10.61it/s | 10.82it/s | 7.69it/s | 7.07it/s | 6.26it/s | 8.00it/s |
| DPM++ 2M SDE | 20.35it/s | 21.09it/s | 21.41it/s | 15.35it/s | 14.20it/s | 12.45it/s | 15.95it/s |
| DPM fast | 21.22it/s | 22.24it/s | 22.44it/s | 15.89it/s | 14.64it/s | 12.77it/s | 16.45it/s |
| DPM adaptive | 21.41it/s | 22.21it/s | 22.54it/s | 15.91it/s | 14.63it/s | 12.76it/s | 16.54it/s |
| LMS Karras | 21.20it/s | 22.05it/s | 22.40it/s | 15.84it/s | 14.56it/s | 12.73it/s | 16.45it/s |
| DPM2 Karras | 10.76it/s | 11.27it/s | 11.36it/s | 7.95it/s | 7.36it/s | 6.42it/s | 8.32it/s |
| DPM2 a Karras | 10.74it/s | 11.30it/s | 11.37it/s | 8.07it/s | 7.34it/s | 6.39it/s | 8.32it/s |
| DPM++ 2S a Karras | 10.75it/s | 11.27it/s | 11.34it/s | 8.04it/s | 7.33it/s | 6.39it/s | 8.31it/s |
| DPM++ 2M Karras | 21.28it/s | 22.14it/s | 22.51it/s | 15.95it/s | 14.54it/s | 12.72it/s | 16.49it/s |
| DPM++ SDE Karras | 10.27it/s | 10.74it/s | 10.76it/s | 7.73it/s | 7.07it/s | 6.26it/s | 7.98it/s |
| DPM++ 2M SDE Karras | 20.34it/s | 21.18it/s | 21.50it/s | 15.34it/s | 14.05it/s | 12.43it/s | 15.65it/s |
| DDIM | 35.76it/s | 36.01it/s | 35.35it/s | 25.96it/s | 18.98it/s | 17.15it/s | 26.10it/s |
| PLMS | 35.41it/s | 35.80it/s | 34.99it/s | 25.68it/s | 18.77it/s | 17.02it/s | 25.89it/s |
| UniPC | 32.08it/s | 32.40it/s | 31.80it/s | 24.15it/s | 18.02it/s | 16.43it/s | 24.31it/s |
Upcast to float32 ON below 75 tokens
| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Euler a | 23.91it/s | 34.41it/s | 33.80it/s | 18.67it/s | 18.42it/s | 14.70it/s | 18.77it/s |
| Euler | 24.01it/s | 34.58it/s | 33.35it/s | 18.70it/s | 18.46it/s | 14.71it/s | 18.82it/s |
| LMS | 23.88it/s | 34.27it/s | 33.69it/s | 18.61it/s | 18.41it/s | 14.57it/s | 18.74it/s |
| Heun | 12.11it/s | 17.42it/s | 17.09it/s | 9.45it/s | 9.30it/s | 7.21it/s | 9.50it/s |
| DPM2 | 12.11it/s | 17.37it/s | 17.09it/s | 9.45it/s | 9.31it/s | 7.21it/s | 9.50it/s |
| DPM2 a | 12.10it/s | 17.38it/s | 17.07it/s | 9.44it/s | 9.29it/s | 7.26it/s | 9.50it/s |
| DPM++ 2S a | 12.08it/s | 17.38it/s | 17.04it/s | 9.43it/s | 9.30it/s | 7.19it/s | 9.49it/s |
| DPM++ 2M | 24.03it/s | 34.52it/s | 33.92it/s | 18.68it/s | 18.47it/s | 14.27it/s | 18.81it/s |
| DPM++ SDE | 11.48it/s | 16.19it/s | 15.93it/s | 9.05it/s | 8.94it/s | 7.15it/s | 9.11it/s |
| DPM++ 2M SDE | 22.75it/s | 32.05it/s | 31.54it/s | 17.55it/s | 17.75it/s | 14.17it/s | 18.05it/s |
| DPM fast | 23.99it/s | 34.48it/s | 33.88it/s | 18.50it/s | 18.45it/s | 14.61it/s | 18.79it/s |
| DPM adaptive | 24.03it/s | 34.51it/s | 33.90it/s | 18.65it/s | 18.47it/s | 14.40it/s | 18.83it/s |
| LMS Karras | 23.89it/s | 34.27it/s | 33.62it/s | 18.62it/s | 18.39it/s | 14.47it/s | 18.73it/s |
| DPM2 Karras | 12.11it/s | 17.39it/s | 17.10it/s | 9.43it/s | 9.31it/s | 7.24it/s | 9.50it/s |
| DPM2 a Karras | 12.10it/s | 17.38it/s | 17.06it/s | 9.43it/s | 9.30it/s | 7.23it/s | 9.49it/s |
| DPM++ 2S a Karras | 12.09it/s | 17.36it/s | 16.96it/s | 9.42it/s | 9.30it/s | 7.25it/s | 9.49it/s |
| DPM++ 2M Karras | 23.98it/s | 34.41it/s | 33.81it/s | 18.66it/s | 18.45it/s | 14.38it/s | 18.76it/s |
| DPM++ SDE Karras | 11.50it/s | 16.24it/s | 15.95it/s | 9.08it/s | 8.86it/s | 7.13it/s | 9.14it/s |
| DPM++ 2M SDE Karras | 22.73it/s | 32.09it/s | 31.57it/s | 17.94it/s | 17.69it/s | 14.15it/s | 18.06it/s |
| DDIM | 24.21it/s | 34.89it/s | 34.25it/s | 18.78it/s | 18.62it/s | 14.60it/s | 18.55it/s |
| PLMS | 23.94it/s | 34.58it/s | 33.87it/s | 18.61it/s | 18.44it/s | 14.50it/s | 18.73it/s |
| UniPC | 22.57it/s | 31.42it/s | 30.98it/s | 17.96it/s | 17.67it/s | 14.19it/s | 18.07it/s |
Upcast to float32 ON above 75 tokens
| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Euler a | 18.21it/s | 21.21it/s | 21.14it/s | 13.64it/s | 14.10it/s | 11.58it/s | 14.13it/s |
| Euler | 18.32it/s | 21.23it/s | 21.24it/s | 13.77it/s | 14.28it/s | 11.65it/s | 14.18it/s |
| LMS | 18.26it/s | 21.13it/s | 21.36it/s | 13.64it/s | 14.24it/s | 11.78it/s | 14.13it/s |
| Heun | 9.24it/s | 10.68it/s | 10.75it/s | 6.90it/s | 7.19it/s | 5.91it/s | 7.16it/s |
| DPM2 | 9.25it/s | 10.69it/s | 10.74it/s | 6.90it/s | 7.19it/s | 5.92it/s | 7.15it/s |
| DPM2 a | 9.28it/s | 10.71it/s | 10.74it/s | 6.92it/s | 7.21it/s | 5.92it/s | 7.14it/s |
| DPM++ 2S a | 9.22it/s | 10.69it/s | 10.74it/s | 6.91it/s | 7.17it/s | 5.92it/s | 7.14it/s |
| DPM++ 2M | 18.32it/s | 21.22it/s | 21.41it/s | 13.74it/s | 14.36it/s | 11.76it/s | 14.20it/s |
| DPM++ SDE | 8.98it/s | 10.34it/s | 10.40it/s | 6.69it/s | 7.00it/s | 5.82it/s | 6.95it/s |
| DPM++ 2M SDE | 17.83it/s | 20.49it/s | 20.57it/s | 13.17it/s | 13.85it/s | 11.56it/s | 13.65it/s |
| DPM fast | 18.56it/s | 21.46it/s | 21.42it/s | 13.77it/s | 14.30it/s | 11.83it/s | 14.23it/s |
| DPM adaptive | 18.39it/s | 21.23it/s | 21.30it/s | 13.83it/s | 14.30it/s | 11.76it/s | 14.17it/s |
| LMS Karras | 18.36it/s | 21.14it/s | 21.23it/s | 13.75it/s | 14.23it/s | 11.75it/s | 14.18it/s |
| DPM2 Karras | 9.34it/s | 10.70it/s | 10.74it/s | 6.93it/s | 7.16it/s | 5.96it/s | 7.17it/s |
| DPM2 a Karras | 9.33it/s | 10.68it/s | 10.76it/s | 6.91it/s | 7.19it/s | 5.96it/s | 7.19it/s |
| DPM++ 2S a Karras | 9.30it/s | 10.67it/s | 10.72it/s | 6.90it/s | 7.18it/s | 5.96it/s | 7.14it/s |
| DPM++ 2M Karras | 18.58it/s | 21.19it/s | 21.25it/s | 13.71it/s | 14.27it/s | 11.78it/s | 14.29it/s |
| DPM++ SDE Karras | 8.97it/s | 10.36it/s | 10.42it/s | 6.67it/s | 6.94it/s | 5.83it/s | 7.01it/s |
| DPM++ 2M SDE Karras | 17.76it/s | 20.56it/s | 20.47it/s | 13.25it/s | 13.78it/s | 11.46it/s | 13.86it/s |
| DDIM | 24.17it/s | 34.88it/s | 34.29it/s | 18.80it/s | 18.62it/s | 14.54it/s | 18.96it/s |
| PLMS | 23.92it/s | 34.55it/s | 33.90it/s | 18.62it/s | 18.41it/s | 14.31it/s | 18.73it/s |
| UniPC | 22.56it/s | 31.46it/s | 30.95it/s | 17.94it/s | 17.70it/s | 14.11it/s | 18.10it/s |
GPU VRAM usage per 512x512 image (about 4x, and up to 8x, more for 2048x2048)
| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| <75 tokens, no f32 upcast | ~300MB | ~300MB | ~300MB | ~1200MB | ~500MB | ~300MB | ~1200MB |
| <75 tokens, f32 upcast | ~300MB | ~300MB | ~300MB | ~2200MB | ~700MB | ~300MB | ~2200MB |
| >75 tokens, no f32 upcast | ~300MB | ~300MB | ~300MB | ~700MB (?) | ~200MB | ~200MB | ~700MB |
| >75 tokens, f32 upcast | ~300MB | ~300MB | ~300MB | ~1200MB | ~300MB | ~300MB | ~1200MB |