| Method | Memory | Compression | Quality Loss | Works On |
|---|---|---|---|---|
| FP16 (baseline) | 19.3 GB | 1× | 0% | All GPUs |
| AWQ | 5.2 GB | 3.7× | ~2% | L4, A100 |
| FP8 | ~9.7 GB | 2× | ~1% | H100 (native FP8 needs Hopper- or Ada-class GPUs) |
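The Compression column is the FP16 baseline footprint divided by the quantized footprint. The sketch below recomputes it from the table's memory figures. It also compares the result with the ideal ratio of 16 bits divided by the stored weight width. Two assumptions go beyond the table: AWQ stores 4-bit weights and FP8 stores 8-bit weights. The helper names are hypothetical.

```python
# Memory figures copied from the table (GB); the FP8 figure is approximate.
BASELINE_GB = 19.3
QUANTIZED_GB = {"AWQ": 5.2, "FP8": 9.7}

# Assumed stored weight widths (not stated in the table).
WEIGHT_BITS = {"AWQ": 4, "FP8": 8}


def compression(baseline_gb: float, quantized_gb: float) -> float:
    """Compression ratio = baseline footprint / quantized footprint."""
    return baseline_gb / quantized_gb


def ideal_ratio(bits: int, baseline_bits: int = 16) -> float:
    """Upper bound if every weight shrank from baseline_bits to bits."""
    return baseline_bits / bits


for method, gb in QUANTIZED_GB.items():
    measured = compression(BASELINE_GB, gb)
    ideal = ideal_ratio(WEIGHT_BITS[method])
    print(f"{method}: measured {measured:.1f}x, ideal {ideal:.1f}x")
```

Under the 4-bit assumption, AWQ lands somewhat below the ideal 4×. A plausible but unverified explanation is overhead such as per-group scales or layers left in higher precision.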