Download the MLIR model code without parameters (so it is lightweight and compiles fast): https://storage.googleapis.com/shark_tank/elias/facebook_opt_1.3b.mlir
Download the parameters: https://storage.googleapis.com/shark_tank/elias/facebook_opt_1.3b_weights.irpa
The command lines below assume that these have been downloaded under $HOME/testing.
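The two downloads above can be scripted. This is a minimal sketch, assuming `curl` is installed; the function name `fetch_opt_files` and the `DEST_DIR` variable are made up for illustration, and only the URLs and the $HOME/testing location come from the text above.

```shell
#!/bin/sh
# Base URL and file names are taken from the two download links above.
BASE_URL="https://storage.googleapis.com/shark_tank/elias"
# Hypothetical override variable; defaults to the directory the later commands expect.
DEST_DIR="${DEST_DIR:-$HOME/testing}"

# fetch_opt_files: download the model code and the parameter file into DEST_DIR.
fetch_opt_files() {
  mkdir -p "$DEST_DIR" || return 1
  for f in facebook_opt_1.3b.mlir facebook_opt_1.3b_weights.irpa; do
    # -f: fail on HTTP errors instead of saving an error page; -L: follow redirects.
    curl -fL -o "$DEST_DIR/$f" "$BASE_URL/$f" || return 1
  done
}
```

Nothing is downloaded until `fetch_opt_files` is called explicitly, so the parameter file is only fetched when you ask for it.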
iree-compile \
--iree-hal-target-backends=llvm-cpu \
--iree-llvmcpu-target-cpu=znver4 \
--iree-llvmcpu-enable-ukernels=all \
~/testing/facebook_opt_1.3b.mlir -o /tmp/facebook_opt_1.3b.vmfb
On my AMD 7950X3D PC, this takes about 5 seconds. Note: without --iree-llvmcpu-enable-ukernels=all, it takes 3x longer (about 15 seconds).
Note that --iree-llvmcpu-target-cpu= is really important: it allows iree-compile to generate code for anything more recent than baseline x86-64 (i.e. SSE2). In addition, --iree-llvmcpu-enable-ukernels=all ensures that you benefit from the latest arithmetic optimizations.
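Since znver4 code may use AVX-512 instructions, it can be worth checking that the machine running the module supports them before reusing the flag as-is. The following is a rough sketch, assuming a Linux host that exposes /proc/cpuinfo; the helper name `has_cpu_flag` is hypothetical and not part of IREE.

```shell
#!/bin/sh
# has_cpu_flag FLAG: reads /proc/cpuinfo-style text on stdin and succeeds
# if FLAG appears as a whole word on a "flags" line.
has_cpu_flag() {
  awk -v want="$1" '
    /^flags/ { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
    END { exit !found }
  '
}

# Only meaningful on Linux; skipped silently elsewhere.
if [ -r /proc/cpuinfo ]; then
  if has_cpu_flag avx512f < /proc/cpuinfo; then
    echo "avx512f is available on this host"
  else
    echo "avx512f is missing: pick a --iree-llvmcpu-target-cpu= value matching this CPU"
  fi
fi
```

If the flag is missing, a module compiled for znver4 is likely to fail with an illegal-instruction error at run time, so a target CPU matching the actual host is the safer choice.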
iree-benchmark-module \
--parameters=model=$HOME/testing/facebook_opt_1.3b_weights.irpa \
--module=/tmp/facebook_opt_1.3b.vmfb \
--function=run --input=1x1xi64
On my AMD 7950X3D, this prints:
----------------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------
BM_run/process_time/real_time    514 ms         8144 ms            1 items_per_second=1.94693/s
"Time" is the wall-clock latency, 514 ms here. Without the above ukernels flag to iree-compile, it's about 560 ms.
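To put the numbers quoted above side by side, here is a small arithmetic sketch. The `ratio` helper is hypothetical, and all inputs are the rounded figures from this post, so the results are approximate.

```shell
#!/bin/sh
# ratio A B: prints A/B rounded to two decimal places.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "compile time, without/with ukernels: $(ratio 15 5)x"
echo "latency, without/with ukernels:      $(ratio 560 514)x"
echo "CPU time / wall-clock time:          $(ratio 8144 514)"
echo "1000 ms / wall-clock time:           $(ratio 1000 514) items/s"
```

The last figure comes out at about 1.95 items/s, in line with the reported items_per_second=1.94693/s; the small gap is expected because 514 ms is itself rounded. The CPU-to-wall-clock ratio of roughly 15.8 suggests that, on average, close to 16 cores' worth of threads were busy during the run, although that reading is an inference from the two columns and not something the benchmark reports directly.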