Settings

Model: any 1.5 base model (performance is no different on 2.1, but the embeddings don't work there, so the prompt would have to change)

Size: 512x512, Steps: 100, CFG scale: 7, Seed: 2583996050

Prompt

a picture of a house, plus 2 negative embeddings (the embeddings don't affect performance anyway):

// 75 tokens:
NG_DeepNegative, EasyNegative, ,

// 76 tokens
NG_DeepNegative, EasyNegative, , ,

Upcast to float32 OFF below 75 tokens

REFERENCE: a 512x512 image at ~33 it/s with 100 steps takes ~3 seconds
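As a rough sketch of where the ~3 seconds comes from: sampling time is approximately the step count divided by the it/s reading. The helper name `seconds_per_image` is made up for illustration, and the estimate ignores model loading, VAE decode, and any other overhead.

```python
def seconds_per_image(steps: int, its_per_second: float) -> float:
    """Approximate sampling time only; ignores model load, VAE decode, etc."""
    if its_per_second <= 0:
        raise ValueError("its_per_second must be positive")
    return steps / its_per_second


# Settings from this test: 100 steps at roughly 33 it/s
reference = seconds_per_image(steps=100, its_per_second=33.0)
print(f"{reference:.2f} s")
```

The same division works for any cell in the tables, for example 100 steps at ~16.7 it/s comes out to about 6 seconds.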

| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
|---|---|---|---|---|---|---|---|
| Euler a | 33.14it/s | 33.47it/s | 33.36it/s | 24.46it/s | 17.93it/s | 16.43it/s | 24.49it/s |
| Euler | 33.25it/s | 33.60it/s | 33.39it/s | 24.55it/s | 17.98it/s | 16.52it/s | 24.58it/s |
| LMS | 32.97it/s | 33.46it/s | 33.23it/s | 24.41it/s | 17.90it/s | 16.46it/s | 24.37it/s |
| Heun | 16.78it/s | 17.06it/s | 16.87it/s | 12.38it/s | 9.05it/s | 8.28it/s | 12.41it/s |
| DPM2 | 16.75it/s | 17.05it/s | 16.87it/s | 12.38it/s | 9.04it/s | 8.30it/s | 12.40it/s |
| DPM2 a | 16.72it/s | 17.02it/s | 16.84it/s | 12.39it/s | 9.03it/s | 8.30it/s | 12.37it/s |
| DPM++ 2S a | 16.71it/s | 17.02it/s | 16.84it/s | 12.37it/s | 8.97it/s | 8.28it/s | 12.37it/s |
| DPM++ 2M | 33.18it/s | 33.83it/s | 33.36it/s | 24.56it/s | 17.94it/s | 16.50it/s | 24.52it/s |
| DPM++ SDE | 15.56it/s | 15.83it/s | 15.65it/s | 11.74it/s | 8.64it/s | 7.94it/s | 11.74it/s |
| DPM++ 2M SDE | 30.93it/s | 31.31it/s | 31.01it/s | 23.25it/s | 17.17it/s | 15.90it/s | 23.23it/s |
| DPM fast | 33.22it/s | 33.73it/s | 33.36it/s | 24.54it/s | 17.98it/s | 16.54it/s | 24.55it/s |
| DPM adaptive | 33.34it/s | 33.76it/s | 33.43it/s | 24.54it/s | 17.98it/s | 16.51it/s | 24.55it/s |
| LMS Karras | 33.00it/s | 33.57it/s | 33.14it/s | 24.39it/s | 17.89it/s | 16.48it/s | 24.41it/s |
| DPM2 Karras | 16.71it/s | 17.03it/s | 16.86it/s | 12.40it/s | 9.05it/s | 8.31it/s | 12.38it/s |
| DPM2 a Karras | 16.76it/s | 17.04it/s | 16.82it/s | 12.39it/s | 9.01it/s | 8.31it/s | 12.37it/s |
| DPM++ 2S a Karras | 16.71it/s | 17.00it/s | 16.71it/s | 12.34it/s | 9.03it/s | 8.28it/s | 12.36it/s |
| DPM++ 2M Karras | 33.18it/s | 33.71it/s | 33.34it/s | 23.90it/s | 17.93it/s | 16.50it/s | 24.54it/s |
| DPM++ SDE Karras | 15.63it/s | 15.86it/s | 15.74it/s | 11.78it/s | 8.68it/s | 8.04it/s | 11.75it/s |
| DPM++ 2M SDE Karras | 30.79it/s | 31.30it/s | 31.04it/s | 23.23it/s | 17.18it/s | 15.88it/s | 23.20it/s |
| DDIM | 33.62it/s | 34.17it/s | 33.76it/s | 24.81it/s | 18.10it/s | 16.61it/s | 24.74it/s |
| PLMS | 33.32it/s | 33.82it/s | 33.54it/s | 24.53it/s | 17.87it/s | 16.46it/s | 24.48it/s |
| UniPC | 30.21it/s | 30.63it/s | 30.37it/s | 22.94it/s | 17.11it/s | 15.75it/s | 22.96it/s |

Upcast to float32 OFF above 75 tokens

| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
|---|---|---|---|---|---|---|---|
| Euler a | 21.22it/s | 22.04it/s | 22.46it/s | 15.74it/s | 14.47it/s | 12.46it/s | 16.47it/s |
| Euler | 21.42it/s | 22.29it/s | 22.54it/s | 15.92it/s | 14.54it/s | 12.52it/s | 16.54it/s |
| LMS | 21.26it/s | 22.07it/s | 22.53it/s | 15.87it/s | 14.46it/s | 12.73it/s | 16.53it/s |
| Heun | 10.77it/s | 11.20it/s | 11.39it/s | 7.98it/s | 7.32it/s | 6.42it/s | 8.24it/s |
| DPM2 | 10.73it/s | 11.19it/s | 11.39it/s | 7.99it/s | 7.32it/s | 6.41it/s | 8.37it/s |
| DPM2 a | 10.76it/s | 11.20it/s | 11.40it/s | 8.01it/s | 7.33it/s | 6.41it/s | 8.33it/s |
| DPM++ 2S a | 10.74it/s | 11.17it/s | 11.39it/s | 8.00it/s | 7.33it/s | 6.42it/s | 8.30it/s |
| DPM++ 2M | 21.38it/s | 22.19it/s | 22.54it/s | 15.88it/s | 14.56it/s | 12.75it/s | 16.48it/s |
| DPM++ SDE | 10.27it/s | 10.61it/s | 10.82it/s | 7.69it/s | 7.07it/s | 6.26it/s | 8.00it/s |
| DPM++ 2M SDE | 20.35it/s | 21.09it/s | 21.41it/s | 15.35it/s | 14.20it/s | 12.45it/s | 15.95it/s |
| DPM fast | 21.22it/s | 22.24it/s | 22.44it/s | 15.89it/s | 14.64it/s | 12.77it/s | 16.45it/s |
| DPM adaptive | 21.41it/s | 22.21it/s | 22.54it/s | 15.91it/s | 14.63it/s | 12.76it/s | 16.54it/s |
| LMS Karras | 21.20it/s | 22.05it/s | 22.40it/s | 15.84it/s | 14.56it/s | 12.73it/s | 16.45it/s |
| DPM2 Karras | 10.76it/s | 11.27it/s | 11.36it/s | 7.95it/s | 7.36it/s | 6.42it/s | 8.32it/s |
| DPM2 a Karras | 10.74it/s | 11.30it/s | 11.37it/s | 8.07it/s | 7.34it/s | 6.39it/s | 8.32it/s |
| DPM++ 2S a Karras | 10.75it/s | 11.27it/s | 11.34it/s | 8.04it/s | 7.33it/s | 6.39it/s | 8.31it/s |
| DPM++ 2M Karras | 21.28it/s | 22.14it/s | 22.51it/s | 15.95it/s | 14.54it/s | 12.72it/s | 16.49it/s |
| DPM++ SDE Karras | 10.27it/s | 10.74it/s | 10.76it/s | 7.73it/s | 7.07it/s | 6.26it/s | 7.98it/s |
| DPM++ 2M SDE Karras | 20.34it/s | 21.18it/s | 21.50it/s | 15.34it/s | 14.05it/s | 12.43it/s | 15.65it/s |
| DDIM | 35.76it/s | 36.01it/s | 35.35it/s | 25.96it/s | 18.98it/s | 17.15it/s | 26.10it/s |
| PLMS | 35.41it/s | 35.80it/s | 34.99it/s | 25.68it/s | 18.77it/s | 17.02it/s | 25.89it/s |
| UniPC | 32.08it/s | 32.40it/s | 31.80it/s | 24.15it/s | 18.02it/s | 16.43it/s | 24.31it/s |
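To put a number on the cost of crossing the 75-token boundary, here is a minimal sketch that compares the Euler a row of the two float32-OFF tables above. The function name `slowdown_percent` is hypothetical, and only three of the seven methods are copied in to keep it short.

```python
def slowdown_percent(before_its: float, after_its: float) -> float:
    """Percentage drop in it/s when going from `before_its` to `after_its`."""
    return (1.0 - after_its / before_its) * 100.0


# Euler a, float32 upcast OFF: (below 75 tokens, above 75 tokens),
# values copied from the two tables above
euler_a = {
    "xformers": (33.14, 21.22),
    "Doggettx": (24.46, 15.74),
    "sub-quadratic": (17.93, 14.47),
}

for method, (below, above) in euler_a.items():
    print(f"{method}: {slowdown_percent(below, above):.1f}% slower")
```

A negative result would mean the second configuration is faster, which is the case for the DDIM, PLMS and UniPC rows.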

Upcast to float32 ON below 75 tokens

| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
|---|---|---|---|---|---|---|---|
| Euler a | 23.91it/s | 34.41it/s | 33.80it/s | 18.67it/s | 18.42it/s | 14.70it/s | 18.77it/s |
| Euler | 24.01it/s | 34.58it/s | 33.35it/s | 18.70it/s | 18.46it/s | 14.71it/s | 18.82it/s |
| LMS | 23.88it/s | 34.27it/s | 33.69it/s | 18.61it/s | 18.41it/s | 14.57it/s | 18.74it/s |
| Heun | 12.11it/s | 17.42it/s | 17.09it/s | 9.45it/s | 9.30it/s | 7.21it/s | 9.50it/s |
| DPM2 | 12.11it/s | 17.37it/s | 17.09it/s | 9.45it/s | 9.31it/s | 7.21it/s | 9.50it/s |
| DPM2 a | 12.10it/s | 17.38it/s | 17.07it/s | 9.44it/s | 9.29it/s | 7.26it/s | 9.50it/s |
| DPM++ 2S a | 12.08it/s | 17.38it/s | 17.04it/s | 9.43it/s | 9.30it/s | 7.19it/s | 9.49it/s |
| DPM++ 2M | 24.03it/s | 34.52it/s | 33.92it/s | 18.68it/s | 18.47it/s | 14.27it/s | 18.81it/s |
| DPM++ SDE | 11.48it/s | 16.19it/s | 15.93it/s | 9.05it/s | 8.94it/s | 7.15it/s | 9.11it/s |
| DPM++ 2M SDE | 22.75it/s | 32.05it/s | 31.54it/s | 17.55it/s | 17.75it/s | 14.17it/s | 18.05it/s |
| DPM fast | 23.99it/s | 34.48it/s | 33.88it/s | 18.50it/s | 18.45it/s | 14.61it/s | 18.79it/s |
| DPM adaptive | 24.03it/s | 34.51it/s | 33.90it/s | 18.65it/s | 18.47it/s | 14.40it/s | 18.83it/s |
| LMS Karras | 23.89it/s | 34.27it/s | 33.62it/s | 18.62it/s | 18.39it/s | 14.47it/s | 18.73it/s |
| DPM2 Karras | 12.11it/s | 17.39it/s | 17.10it/s | 9.43it/s | 9.31it/s | 7.24it/s | 9.50it/s |
| DPM2 a Karras | 12.10it/s | 17.38it/s | 17.06it/s | 9.43it/s | 9.30it/s | 7.23it/s | 9.49it/s |
| DPM++ 2S a Karras | 12.09it/s | 17.36it/s | 16.96it/s | 9.42it/s | 9.30it/s | 7.25it/s | 9.49it/s |
| DPM++ 2M Karras | 23.98it/s | 34.41it/s | 33.81it/s | 18.66it/s | 18.45it/s | 14.38it/s | 18.76it/s |
| DPM++ SDE Karras | 11.50it/s | 16.24it/s | 15.95it/s | 9.08it/s | 8.86it/s | 7.13it/s | 9.14it/s |
| DPM++ 2M SDE Karras | 22.73it/s | 32.09it/s | 31.57it/s | 17.94it/s | 17.69it/s | 14.15it/s | 18.06it/s |
| DDIM | 24.21it/s | 34.89it/s | 34.25it/s | 18.78it/s | 18.62it/s | 14.60it/s | 18.55it/s |
| PLMS | 23.94it/s | 34.58it/s | 33.87it/s | 18.61it/s | 18.44it/s | 14.50it/s | 18.73it/s |
| UniPC | 22.57it/s | 31.42it/s | 30.98it/s | 17.96it/s | 17.67it/s | 14.19it/s | 18.07it/s |

Upcast to float32 ON above 75 tokens

| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
|---|---|---|---|---|---|---|---|
| Euler a | 18.21it/s | 21.21it/s | 21.14it/s | 13.64it/s | 14.10it/s | 11.58it/s | 14.13it/s |
| Euler | 18.32it/s | 21.23it/s | 21.24it/s | 13.77it/s | 14.28it/s | 11.65it/s | 14.18it/s |
| LMS | 18.26it/s | 21.13it/s | 21.36it/s | 13.64it/s | 14.24it/s | 11.78it/s | 14.13it/s |
| Heun | 9.24it/s | 10.68it/s | 10.75it/s | 6.90it/s | 7.19it/s | 5.91it/s | 7.16it/s |
| DPM2 | 9.25it/s | 10.69it/s | 10.74it/s | 6.90it/s | 7.19it/s | 5.92it/s | 7.15it/s |
| DPM2 a | 9.28it/s | 10.71it/s | 10.74it/s | 6.92it/s | 7.21it/s | 5.92it/s | 7.14it/s |
| DPM++ 2S a | 9.22it/s | 10.69it/s | 10.74it/s | 6.91it/s | 7.17it/s | 5.92it/s | 7.14it/s |
| DPM++ 2M | 18.32it/s | 21.22it/s | 21.41it/s | 13.74it/s | 14.36it/s | 11.76it/s | 14.20it/s |
| DPM++ SDE | 8.98it/s | 10.34it/s | 10.40it/s | 6.69it/s | 7.00it/s | 5.82it/s | 6.95it/s |
| DPM++ 2M SDE | 17.83it/s | 20.49it/s | 20.57it/s | 13.17it/s | 13.85it/s | 11.56it/s | 13.65it/s |
| DPM fast | 18.56it/s | 21.46it/s | 21.42it/s | 13.77it/s | 14.30it/s | 11.83it/s | 14.23it/s |
| DPM adaptive | 18.39it/s | 21.23it/s | 21.30it/s | 13.83it/s | 14.30it/s | 11.76it/s | 14.17it/s |
| LMS Karras | 18.36it/s | 21.14it/s | 21.23it/s | 13.75it/s | 14.23it/s | 11.75it/s | 14.18it/s |
| DPM2 Karras | 9.34it/s | 10.70it/s | 10.74it/s | 6.93it/s | 7.16it/s | 5.96it/s | 7.17it/s |
| DPM2 a Karras | 9.33it/s | 10.68it/s | 10.76it/s | 6.91it/s | 7.19it/s | 5.96it/s | 7.19it/s |
| DPM++ 2S a Karras | 9.30it/s | 10.67it/s | 10.72it/s | 6.90it/s | 7.18it/s | 5.96it/s | 7.14it/s |
| DPM++ 2M Karras | 18.58it/s | 21.19it/s | 21.25it/s | 13.71it/s | 14.27it/s | 11.78it/s | 14.29it/s |
| DPM++ SDE Karras | 8.97it/s | 10.36it/s | 10.42it/s | 6.67it/s | 6.94it/s | 5.83it/s | 7.01it/s |
| DPM++ 2M SDE Karras | 17.76it/s | 20.56it/s | 20.47it/s | 13.25it/s | 13.78it/s | 11.46it/s | 13.86it/s |
| DDIM | 24.17it/s | 34.88it/s | 34.29it/s | 18.80it/s | 18.62it/s | 14.54it/s | 18.96it/s |
| PLMS | 23.92it/s | 34.55it/s | 33.90it/s | 18.62it/s | 18.41it/s | 14.31it/s | 18.73it/s |
| UniPC | 22.56it/s | 31.46it/s | 30.95it/s | 17.94it/s | 17.70it/s | 14.11it/s | 18.10it/s |
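For anyone who wants to crunch these numbers rather than read them, here is a minimal sketch of a row parser. It assumes every row ends with exactly seven it/s values in the column order of the header. The names `METHODS` and `parse_row` are made up for illustration, and "V1" is shorthand for the "V1 - original v1" column. Sampler names contain spaces, so the values are taken from the right-hand side of the row.

```python
# Column order as in the table headers; "V1" stands for "V1 - original v1"
METHODS = ["xformers", "sdp-no-mem", "sdp", "Doggettx",
           "sub-quadratic", "V1", "InvokeAI"]


def parse_row(row: str):
    """Split one benchmark row into (sampler name, {method: it/s})."""
    # Treat table pipes as plain separators so both row styles parse
    tokens = row.replace("|", " ").split()
    if len(tokens) <= len(METHODS):
        raise ValueError(f"row has too few fields: {row!r}")
    values = tokens[-len(METHODS):]            # last seven tokens are speeds
    name = " ".join(tokens[:-len(METHODS)])    # everything before is the name
    speeds = {m: float(v.replace("it/s", "")) for m, v in zip(METHODS, values)}
    return name, speeds


name, speeds = parse_row(
    "DPM++ 2M SDE Karras 17.76it/s 20.56it/s 20.47it/s 13.25it/s "
    "13.78it/s 11.46it/s 13.86it/s"
)
fastest = max(speeds, key=speeds.get)
print(name, "->", fastest, speeds[fastest])
```

Header lines and separator lines are not handled here and would need to be skipped before calling the parser.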

Other notes

GPU VRAM usage per 512x512 image (4x, up to 8x more for 2048x2048)

| Test \ Method | xformers | sdp-no-mem | sdp | Doggettx | sub-quadratic | V1 - original v1 | InvokeAI |
|---|---|---|---|---|---|---|---|
| <75 tokens, no f32 upcast | ~300MB | ~300MB | ~300MB | ~1200MB | ~500MB | ~300MB | ~1200MB |
| <75 tokens, f32 upcast | ~300MB | ~300MB | ~300MB | ~2200MB | ~700MB | ~300MB | ~2200MB |
| >75 tokens, no f32 upcast | ~300MB | ~300MB | ~300MB | ~700MB (?) | ~200MB | ~200MB | ~700MB |
| >75 tokens, f32 upcast | ~300MB | ~300MB | ~300MB | ~1200MB | ~300MB | ~300MB | ~1200MB |