@sayakpaul
Created August 4, 2023 08:59
"""
Examples:
(1) python benchmark_distilled_sd.py --pipeline_id CompVis/stable-diffusion-v1-4
(2) python benchmark_distilled_sd.py --pipeline_id CompVis/stable-diffusion-v1-4 --vae_path sayakpaul/taesd-diffusers
(3) python benchmark_distilled_sd.py --pipeline_id nota-ai/bk-sdm-small
(4) python benchmark_distilled_sd.py --pipeline_id nota-ai/bk-sdm-small --vae_path sayakpaul/taesd-diffusers
"""
import argparse
import time
import torch
from diffusers import AutoencoderTiny, DiffusionPipeline
NUM_ITERS_TO_RUN = 3
NUM_INFERENCE_STEPS = 25
NUM_IMAGES_PER_PROMPT = 4
PROMPT = "a golden vase with different flowers"
SEED = 0
def load_pipeline(pipeline_id, vae_path=None):
pipe = DiffusionPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
if vae_path is not None:
pipe.vae = AutoencoderTiny.from_pretrained(
vae_path, torch_dtype=torch.float16
).to("cuda")
return pipe
def run_inference(args):
torch.cuda.reset_peak_memory_stats()
pipe = load_pipeline(args.pipeline_id, args.vae_path)
start = time.time_ns()
for _ in range(NUM_ITERS_TO_RUN):
images = pipe(
PROMPT,
num_inference_steps=NUM_INFERENCE_STEPS,
generator=torch.manual_seed(SEED),
num_images_per_prompt=NUM_IMAGES_PER_PROMPT,
).images
end = time.time_ns()
mem_bytes = torch.cuda.max_memory_allocated()
mem_MB = int(mem_bytes / (10**6))
total_time = f"{(end - start) / 1e6:.1f}"
results = {
"pipeline_id": args.pipeline_id,
"total_time (ms)": total_time,
"memory (mb)": mem_MB,
}
if args.vae_path is not None:
results.update({"vae_path": args.vae_path})
return results
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
"--pipeline_id",
type=str,
default="CompVis/stable-diffusion-v1-4",
required=True,
)
parser.add_argument("--vae_path", type=str, default=None)
args = parser.parse_args()
return args
if __name__ == "__main__":
args = parse_args()
results = run_inference(args)
print(results)
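
A caveat on the timing above: the first `pipe()` call also pays one-time CUDA/cuDNN initialization, which inflates the measured total. A minimal variant with an untimed warmup call (a suggestion, not part of the original script; `run_inference_with_warmup` is a hypothetical name):

```python
# Hypothetical variant (not in the original gist): warm up once before timing
# so one-time CUDA/cuDNN setup doesn't skew the steady-state numbers.
def run_inference_with_warmup(args):
    pipe = load_pipeline(args.pipeline_id, args.vae_path)
    pipe(
        PROMPT,
        num_inference_steps=NUM_INFERENCE_STEPS,
        num_images_per_prompt=NUM_IMAGES_PER_PROMPT,
    )  # untimed warmup call
    torch.cuda.synchronize()  # make sure warmup work has finished
    start = time.time_ns()
    for _ in range(NUM_ITERS_TO_RUN):
        pipe(
            PROMPT,
            num_inference_steps=NUM_INFERENCE_STEPS,
            generator=torch.manual_seed(SEED),
            num_images_per_prompt=NUM_IMAGES_PER_PROMPT,
        )
    end = time.time_ns()
    return f"{(end - start) / 1e6:.1f} ms"
```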
@sayakpaul

Results

{'pipeline_id': 'CompVis/stable-diffusion-v1-4', 'total_time (ms)': '14715.5', 'memory (mb)': 13786}

{'pipeline_id': 'CompVis/stable-diffusion-v1-4', 'total_time (ms)': '13604.6', 'memory (mb)': 3663, 'vae_path': 'sayakpaul/taesd-diffusers'}

{'pipeline_id': 'nota-ai/bk-sdm-small', 'total_time (ms)': '10060.5', 'memory (mb)': 13026}

{'pipeline_id': 'nota-ai/bk-sdm-small', 'total_time (ms)': '8990.1', 'memory (mb)': 2881, 'vae_path': 'sayakpaul/taesd-diffusers'}
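
For context, `total_time (ms)` covers all NUM_ITERS_TO_RUN = 3 pipeline calls, each producing 4 images, so dividing by 12 gives a per-image latency. A quick back-of-the-envelope check on the figures above:

```python
# Per-image latency derived from the reported totals (3 calls x 4 images each).
totals_ms = {
    "sd-v1-4": 14715.5,
    "sd-v1-4 + taesd": 13604.6,
    "bk-sdm-small": 10060.5,
    "bk-sdm-small + taesd": 8990.1,
}
for name, total in totals_ms.items():
    print(f"{name}: {total / 12:.1f} ms/image")
# bk-sdm-small + taesd works out to ~749 ms/image vs ~1226 ms/image for
# vanilla SD v1.4, i.e. roughly 39% faster end to end.
```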

@sayakpaul

With torch.compile

{'pipeline_id': 'nota-ai/bk-sdm-small', 'total_time (ms)': '41485.8', 'memory (mb)': 3214, 'vae_path': 'sayakpaul/taesd-diffusers'}
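
The gist doesn't show how `torch.compile` was applied; a common pattern with diffusers on PyTorch 2.x is to compile the UNet (a sketch, not necessarily what was run here):

```python
# Sketch (assumed setup, not shown in the gist): compile the UNet, the usual
# torch.compile target for diffusers pipelines on PyTorch 2.x.
pipe = load_pipeline("nota-ai/bk-sdm-small", vae_path="sayakpaul/taesd-diffusers")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

Compilation happens lazily on the first call, so with only 3 timed iterations the one-time compile cost dominates, which would be consistent with the much higher total above.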

@sayakpaul

Benchmark conducted on Tesla P8 (24GB VRAM).

Environment:

- `diffusers` version: 0.20.0.dev0
- Platform: Linux-5.15.0-76-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.31.0
- Accelerate version: 0.21.0
- xFormers version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

diffusers installed from this commit: 801a5e2199bf0043c02b2df060aa2d28c6c61d86
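
That commit can be installed directly from GitHub, e.g.:

```
pip install git+https://github.com/huggingface/diffusers.git@801a5e2199bf0043c02b2df060aa2d28c6c61d86
```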
