The following benchmark was run on an AORUS RTX 3090 eGPU connected to an HP Z8 G4 with a dual-port HP Thunderbolt 3 card:
env CUDA_VISIBLE_DEVICES=0 llama-bench \
--model "${LLAMA_CACHE}/gpt-oss-120b-mxfp4.gguf" \
--threads 1 \
--flash-attn 1 \