Benchmark code and full details: https://github.com/EvilFreelancer/benchmarking-llms
- Graphics card: RTX 4090 (24 GB)
- CUDA version: 11.7 (for the ruGPT3 family) and 11.8 (for all other models)
- Python version: 3.11.4
Generation parameters used for all runs:

```python
max_new_tokens=1024,
top_k=20,
top_p=0.9,
repetition_penalty=1.1,
do_sample=True,
use_cache=False
```
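For context, a minimal sketch of how parameters like these are typically bundled into keyword arguments for a Hugging Face `generate()`-style call, together with a simple timer that yields the per-run tokens/second figures reported below. The `timed_generate` helper and the stub generator are illustrative, not part of the benchmark code.

```python
import time

# Generation parameters from the benchmark (same values as above).
GENERATION_KWARGS = dict(
    max_new_tokens=1024,
    top_k=20,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    use_cache=False,
)

def timed_generate(generate_fn, prompt):
    """Run one generation and return (tokens, elapsed seconds, tokens/sec).

    `generate_fn` stands in for something like `model.generate` wrapped with
    a tokenizer; here it just needs to accept a prompt plus the kwargs above
    and return a sequence of generated tokens.
    """
    start = time.perf_counter()
    tokens = generate_fn(prompt, **GENERATION_KWARGS)
    elapsed = time.perf_counter() - start
    return tokens, elapsed, len(tokens) / elapsed
```

Averaging `elapsed` and `len(tokens)` over repeated runs gives the "Avg gen time", "Avg tokens", and "Avg t/s" columns of the table below.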
Name | Size | Context | VRAM (GB) | Max init RAM (GB) | Avg gen time (s) | Avg tokens | Avg t/s |
---|---|---|---|---|---|---|---|
StableBeluga 7b | 7b | 4096 | ~22.5 | ~22.7 | ~31.25 | ~529.7 | ~16.9 |
LLaMA 7b | 7b | 4096 | ~22.47 | ~22.7 | ~34.52 | ~545.5 | ~15.8 |
LLaMA 2 7b | 7b | 4096 | | | | | |
LLaMA 2 7b 32k | 7b-32k | 32768 | ~21.5 | ~22.7 | ~56.63 | ~868.5 | ~15.3 |
MosaicML 7b | 7b | 8192 | ~22.6 (~13.7) | ~9.8 | ~87.27 | ~1046.2 | ~12.0 |
MosaicML 7b-storywriter | 7b-storywriter | 65536 | ~22.9 | ~10.4 | ~109.12 | ~1048.2 | ~9.6 |
MosaicML 7b-instruct | 7b-instruct | 4096 | ~22.93 | ~9.8 | ~110.47 | ~1045.2 | ~9.5 |
MosaicML 7b-instruct-8k | 7b-instruct-8k | 8192 | ~22.66 | ~10.5 | ~84.32 | ~1045.5 | ~12.4 |
ruGPT 3 small | 125m | 2048 | ~6.18 | ~1.3 | ~6.4 | ~1041.8 | ~162.7 |
ruGPT 3 medium | 410m | 2048 | ~6.66 | ~2.6 | ~12.74 | ~1044.3 | ~82.0 |
ruGPT 3 large | 750m | 2048 | ~7.48 | ~5.2 | ~15.19 | ~1045.5 | ~68.8 |
ruGPT 3 xl | 1.3b | 2048 | ~13.76 | ~4.7 | ~13.38 | ~567.1 | ~42.4 |
ruGPT 3.5 13b | 13b | 2048 | | | | | |
mGPT | 1.3b | 2048 | ~22.96 (~4.11) | ~7.01 | ~24.72 | ~1046.8 | ~42.3 |
mGPT 13b | 13b | 2048 | | | | | |
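The throughput column is simply average tokens divided by average generation time, which can be verified against a few rows of the table (values copied from the table above):

```python
# Sanity check: Avg t/s should equal Avg tokens / Avg gen time.
# (model name: (avg tokens, avg gen time in s, reported avg t/s))
rows = {
    "StableBeluga 7b": (529.7, 31.25, 16.9),
    "LLaMA 7b": (545.5, 34.52, 15.8),
    "ruGPT 3 small": (1041.8, 6.4, 162.7),
}

for name, (tokens, seconds, reported_tps) in rows.items():
    computed = tokens / seconds
    print(f"{name}: {computed:.1f} t/s (reported ~{reported_tps})")
```

The computed values match the reported column to within rounding.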