pip install -r torchao/prototype/autoround/requirements.txt

- Benchmark llama2/llama3 with the lightweight config. This depends on a small fix in pytorch/ao#769.
 
# auto-round w/ quant_lm_head
python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --quantization autoround
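
The command above expects `$CHECKPOINT_PATH` and `$MODEL_REPO` to be set and `model.pth` to exist at the resulting path. A minimal pre-flight sketch follows. The default values (`checkpoints` and `meta-llama/Llama-2-7b-chat-hf`) are assumptions chosen for illustration, not values taken from this PR.

```shell
# Assumed example values; substitute your own checkpoint location and model.
CHECKPOINT_PATH="${CHECKPOINT_PATH:-checkpoints}"
MODEL_REPO="${MODEL_REPO:-meta-llama/Llama-2-7b-chat-hf}"
MODEL_FILE="$CHECKPOINT_PATH/$MODEL_REPO/model.pth"

# Report whether the checkpoint file is present before launching the benchmark.
if [ -f "$MODEL_FILE" ]; then
    echo "Found checkpoint: $MODEL_FILE"
else
    echo "Checkpoint not found: $MODEL_FILE" >&2
fi
```

Checking the path first is cheaper than finding out about a missing checkpoint only after the benchmark script has started.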