@bjacob
Last active March 18, 2024 13:48
Download, compile and run OPT-1.3b on CPU with IREE

Download

Download the MLIR model code without parameters (so it is lightweight and compiles quickly): https://storage.googleapis.com/shark_tank/elias/facebook_opt_1.3b.mlir

Download the parameters: https://storage.googleapis.com/shark_tank/elias/facebook_opt_1.3b_weights.irpa

The command lines below assume that both files have been downloaded to $HOME/testing.
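
For convenience, here is a minimal sketch of a download helper. It assumes `curl` is installed; the function name `fetch_opt_files` and the `DEST`/`BASE_URL` variables are made up for this example, and only the two URLs come from the links above.

```shell
# Destination directory used by the later commands (override by exporting DEST).
DEST="${DEST:-$HOME/testing}"
# Common prefix of the two download links above.
BASE_URL="https://storage.googleapis.com/shark_tank/elias"

# Hypothetical helper: fetch the MLIR file and the parameter archive into $DEST.
fetch_opt_files() {
  mkdir -p "$DEST"
  for f in facebook_opt_1.3b.mlir facebook_opt_1.3b_weights.irpa; do
    # -L follows redirects; --fail makes curl return non-zero on HTTP errors.
    curl -L --fail -o "$DEST/$f" "$BASE_URL/$f" || return 1
  done
}
```

Running `fetch_opt_files` afterwards performs the actual download. The parameter file holds the weights of a 1.3B-parameter model, so expect it to be much larger than the `.mlir` file.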

Compile

iree-compile \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-cpu=znver4 \
  --iree-llvmcpu-enable-ukernels=all \
  ~/testing/facebook_opt_1.3b.mlir -o /tmp/facebook_opt_1.3b.vmfb

On my AMD 7950X3D PC, this takes about 5 seconds. Note: without the --iree-llvmcpu-enable-ukernels=all flag, it takes 3x longer (about 15 seconds).

Note that the --iree-llvmcpu-target-cpu= flag is really important: it is what allows iree-compile to generate code for anything more recent than baseline x86-64 (i.e. SSE2). In addition, --iree-llvmcpu-enable-ukernels=all ensures that you benefit from the latest arithmetic optimizations.
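
If you are not sure which target CPU to pass, it can help to check which SIMD extensions your machine advertises. The sketch below is Linux-only and reads /proc/cpuinfo; the helper name `has_cpu_flag` is made up for this example, and the three flags checked are just an illustrative selection (znver4 is a Zen 4 target, which includes AVX-512).

```shell
# Hypothetical helper: succeed if the CPU flag $1 appears in the "flags" line
# of a cpuinfo file ($2, defaulting to /proc/cpuinfo on Linux).
has_cpu_flag() {
  grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw "$1"
}

# Report a few x86 SIMD extensions, from baseline (sse2) to AVX-512.
for flag in sse2 avx2 avx512f; do
  if has_cpu_flag "$flag"; then
    echo "$flag: yes"
  else
    echo "$flag: no"
  fi
done
```

If `avx512f` shows "no", a binary compiled for znver4 will likely not run on that machine, and a different value for --iree-llvmcpu-target-cpu= is needed.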

Benchmark

iree-benchmark-module \
  --parameters=model=$HOME/testing/facebook_opt_1.3b_weights.irpa \
  --module=/tmp/facebook_opt_1.3b.vmfb \
  --function=run --input=1x1xi64

On my AMD 7950X3D, this prints:

----------------------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------
BM_run/process_time/real_time        514 ms         8144 ms            1 items_per_second=1.94693/s

"Time" is the wall-clock latency, 514 ms here. Without the above ukernels flag to iree-compile, it's about 560 ms.

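To make the numbers above easier to compare, here is a small arithmetic sketch. The helper name `summarize` is made up for this example; the inputs are the wall-clock time, the CPU time, and the no-ukernels time quoted above. The interpretation of the CPU column as process CPU time summed over all threads is an assumption based on the `process_time` label in the benchmark name.

```shell
# Hypothetical helper: derive a few ratios from benchmark timings.
# $1: wall-clock ms, $2: CPU ms, $3: wall-clock ms without ukernels.
summarize() {
  awk -v w="$1" -v c="$2" -v n="$3" 'BEGIN {
    printf "runs per second: %.2f\n", 1000 / w
    printf "avg busy threads: %.1f\n", c / w
    printf "latency saved by ukernels: %.1f%%\n", 100 * (n - w) / n
  }'
}

summarize 514 8144 560
# runs per second: 1.95
# avg busy threads: 15.8
# latency saved by ukernels: 8.2%
```

The runs-per-second figure matches the reported items_per_second up to rounding of the 514 ms value, and a CPU/wall ratio near 16 is consistent with all 16 cores of the 7950X3D being kept busy.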