Leaderboard made with 🧐 LLM AutoEval (https://github.com/mlabonne/llm-autoeval) using the Nous benchmark suite (AGIEval, GPT4All, TruthfulQA, Bigbench).
| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|---|
| mlabonne/OmniTruthyBeagle-7B-v0 | 57.8 | 45.72 | 77.49 | 76.16 | 50.18 |
| mlabonne/NeuralOmniBeagle-7B-v2 | 57.75 | 45.86 | 77.31 | 75.34 | 50.09 |
| mlabonne/OmniBeagle-7B | 57.72 | 45.64 | 77.48 | 75.03 | 50.03 |
| mlabonne/NeuralOmniBeagle-7B | 57.71 | 45.85 | 77.26 | 76.06 | 50.03 |
| mlabonne/NeuralOmni-7B | 57.7 | 45.8 | 77.5 | 74.51 | 49.8 |
| mlabonne/OmniTruthyBeagle-7B | 57.69 | 45.65 | 77.22 | 75.77 | 50.21 |
| mlabonne/Omnarch-7B | 57.64 | 45.88 | 77.28 | 74.07 | 49.76 |
| mlabonne/BeagleB-7B | 57.61 | 45.19 | 77.75 | 73.19 | 49.88 |
| mlabonne/Monarch-7B | 57.56 | 45.48 | 77.07 | 78.04 | 50.14 |
| mlabonne/Beyonder-4x7B-v3 | 57.55 | 45.85 | 76.67 | 74.98 | 50.12 |
| abideen/AlphaMonarch-daser | 57.55 | 45.48 | 76.95 | 78.46 | 50.21 |
| mlabonne/AlphaMonarch-7B | 57.53 | 45.37 | 77.01 | 78.39 | 50.2 |
| mlabonne/NeuralMonarch-7B | 57.53 | 45.31 | 76.99 | 78.35 | 50.28 |
| mlabonne/Beagle4 | 57.53 | 45.5 | 77.38 | 73.84 | 49.7 |
| shadowml/BeagSake-7B | 57.53 | 45.9 | 77.36 | 72.82 | 49.32 |
| shadowml/WestBeagle-7B | 57.52 | 46.19 | 77.23 | 72.25 | 49.15 |
| abideen/AlphaMonarch-dora | 57.51 | 45.42 | 76.93 | 78.48 | 50.18 |
| abideen/AlphaMonarch-laser | 57.51 | 45.39 | 77.0 | 78.4 | 50.15 |
| mlabonne/Monarch-7B-dare | 57.44 | 45.16 | 77.22 | 77.98 | 49.95 |
| shadowml/WestBeagle-7B-gen3 | 57.42 | 45.74 | 77.28 | 72.29 | 49.23 |
| mlabonne/ArchBeagle-7B | 57.41 | 45.56 | 77.32 | 73.36 | 49.36 |
| shadowml/OmnixBeagle-7B | 57.38 | 45.3 | 77.64 | 75.24 | 49.2 |
| shadowml/BeagleSempra-7B | 57.38 | 45.56 | 77.44 | 73.35 | 49.15 |
| mlabonne/Monarch-7B-slerp | 57.26 | 45.13 | 77.09 | 78.63 | 49.56 |
| shadowml/FoxBeagle-7B | 57.26 | 45.46 | 77.42 | 72.08 | 48.91 |
| shadowml/BeagleX-7B | 57.18 | 45.39 | 77.52 | 72.91 | 48.63 |
| flemmingmiguel/MBX-7B-v3 | 57.14 | 45.08 | 77.72 | 72.61 | 48.63 |
| mlabonne/Zebrafish-7B | 57.13 | 44.92 | 77.18 | 78.25 | 49.28 |
| shadowml/Beaglake-7B | 57.1 | 45.03 | 77.8 | 72.58 | 48.48 |
| shadowml/TurdusBeagle-7B-gen3 | 57.1 | 45.08 | 77.52 | 70.36 | 48.69 |
| shadowml/TurdusBeagle-7B | 57.1 | 45.08 | 77.52 | 70.36 | 48.69 |
| shadowml/MBTrix-7B | 57.08 | 44.92 | 77.14 | 77.26 | 49.18 |
| mlabonne/Zebrafish-slerp-7B | 57.07 | 44.83 | 77.13 | 78.27 | 49.25 |
| shadowml/Beagwake-7B | 57.04 | 45.03 | 77.54 | 72.37 | 48.56 |
| shadowml/MBeagleX-7B | 57.02 | 45.02 | 76.87 | 78.04 | 49.18 |
| mlabonne/Zebrafish-linear-7B | 56.98 | 44.58 | 77.12 | 78.25 | 49.24 |
| mlabonne/Zebrafish-dare-7B | 56.96 | 44.68 | 77.0 | 78.28 | 49.21 |
| mlabonne/UltraMerge-7B | 56.95 | 44.36 | 77.15 | 78.47 | 49.35 |
| mlabonne/NeuralBeagle14-7B | 56.9 | 46.06 | 76.77 | 70.32 | 47.86 |
| mlabonne/FrankenMonarch-11b | 56.89 | 44.01 | 76.45 | 76.7 | 50.22 |
| yam-peleg/Experiment26-7B | 56.85 | 44.49 | 77.06 | 78.58 | 49.0 |
| mlabonne/NeuBeagle-7B | 56.81 | 44.43 | 76.62 | 79.13 | 49.38 |
| bardsai/jaskier-7b-dpo-v3.3 | 56.77 | 44.57 | 76.53 | 80.0 | 49.22 |
| mlabonne/DareBeagle-7B-v2 | 56.75 | 45.6 | 76.58 | 69.48 | 48.07 |
| CultriX/NeuralTrix-7B-dpo | 56.73 | 44.61 | 76.33 | 79.8 | 49.24 |
| shadowml/DareBeagle-7B | 56.72 | 45.47 | 76.63 | 69.48 | 48.05 |
| CultriX/NeuralTrix-bf16 | 56.7 | 44.43 | 76.43 | 80.18 | 49.23 |
| mlabonne/UltraMerge-v2-7B | 56.69 | 44.16 | 76.72 | 79.58 | 49.2 |
| argilla/distilabeled-Marcoro14-7B-slerp | 56.68 | 45.38 | 76.48 | 65.68 | 48.18 |
| flemmingmiguel/MBX-7B-v2 | 56.66 | 44.23 | 77.27 | 71.04 | 48.47 |
| mlabonne/NeuralDaredevil-7B | 56.65 | 45.23 | 76.2 | 67.61 | 48.52 |
| shadowml/DareBeagel-2x7B | 56.63 | 45.51 | 76.56 | 69.45 | 47.82 |
| mlabonne/FrakenBeagle14-11B | 56.58 | 45.08 | 76.08 | 70.93 | 48.58 |
| occultml/CatMarcoro14-7B-slerp | 56.14 | 45.21 | 75.91 | 63.81 | 47.31 |
| shadowml/mibe-7B | 56.13 | 44.22 | 76.9 | 71.25 | 47.27 |
| mlabonne/NeuralDarewin-7B | 56.08 | 45.6 | 74.29 | 63.15 | 48.35 |
| mlabonne/Beagle14-7B | 56.05 | 44.38 | 76.53 | 69.44 | 47.25 |
| shadowml/Daredevil-7B | 56.0 | 44.85 | 76.07 | 64.89 | 47.07 |
| mlabonne/Darewin-7B | 55.96 | 45.08 | 75.36 | 60.94 | 47.44 |
| mlabonne/NeuralMarcoro14-7B | 55.89 | 44.59 | 76.17 | 65.94 | 46.9 |
| mlabonne/Beyonder-4x7B-v2 | 55.88 | 45.29 | 75.95 | 60.86 | 46.4 |
| OpenPipe/mistral-ft-optimized-1218 | 55.84 | 44.74 | 75.6 | 59.89 | 47.17 |
| SanjiWatsuki/Kunoichi-DPO-v2-7B | 55.83 | 44.79 | 75.05 | 65.68 | 47.65 |
| mlabonne/FrankenMonarch-7B | 55.81 | 45.1 | 75.53 | 73.86 | 46.79 |
| mlabonne/Marcoro14-7B-slerp | 55.51 | 44.66 | 76.24 | 64.15 | 45.64 |
| Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp | 55.29 | 43.5 | 74.88 | 63.22 | 47.5 |
| fblgit/una-cybertron-7b-v2-bf16 | 55.24 | 43.29 | 74.98 | 65.32 | 47.45 |
| mlabonne/Daredevil-8B | 54.81 | 44.13 | 73.52 | 59.05 | 46.77 |
| openchat/openchat-3.6-8b-20240522 | 54.73 | 44.03 | 73.67 | 49.78 | 46.48 |
| mlabonne/NeuralDaredevil-8B-abliterated | 54.71 | 43.73 | 73.6 | 59.36 | 46.8 |
| mlabonne/Daredevil-8B-abliterated | 54.26 | 43.29 | 73.33 | 57.47 | 46.17 |
| Nexusflow/Starling-LM-7B-beta | 54.17 | 44.21 | 73.7 | 56.45 | 44.6 |
| openchat/openchat-3.5-0106 | 54.1 | 44.17 | 73.72 | 52.53 | 44.4 |
| NousResearch/Hermes-2-Theta-Llama-3-8B | 53.58 | 43.9 | 72.62 | 56.36 | 44.23 |
| openchat/openchat-3.5-1210 | 53.11 | 42.62 | 72.84 | 53.21 | 43.88 |
| mlabonne/NeuralHermes-2.5-Mistral-7B-laser | 53.07 | 43.54 | 73.44 | 55.26 | 42.24 |
| NousResearch/Hermes-2-Pro-Llama-3-8B | 52.9 | 42.52 | 72.64 | 57.8 | 43.53 |
| mlabonne/NeuralHermes-2.5-Mistral-7B | 52.89 | 43.67 | 73.24 | 55.37 | 41.76 |
| microsoft/Phi-3-mini-4k-instruct | 52.74 | 44.44 | 71.88 | 57.77 | 41.9 |
| openchat/openchat_3.5 | 52.7 | 42.67 | 72.92 | 47.27 | 42.51 |
| NousResearch/Hermes-2-Pro-Mistral-7B | 52.55 | 44.54 | 71.2 | 59.12 | 41.9 |
| mlabonne/ChimeraLlama-3-8B-v3 | 52.49 | 42.11 | 71.48 | 55.03 | 43.87 |
| berkeley-nest/Starling-LM-7B-alpha | 52.44 | 42.06 | 72.72 | 47.33 | 42.53 |
| FuseAI/FuseChat-7B-VaRM | 52.3 | 41.91 | 72.02 | 46.76 | 42.96 |
| teknium/OpenHermes-2.5-Mistral-7B | 52.23 | 42.75 | 72.99 | 52.99 | 40.94 |
| FuseAI/OpenChat-3.5-7B-Mixtral | 52.22 | 41.97 | 71.95 | 46.81 | 42.73 |
| FuseAI/OpenChat-3.5-7B-Solar | 52.2 | 41.61 | 71.99 | 46.7 | 43.01 |
| FuseAI/FuseChat-7B-Slerp | 52.16 | 41.73 | 72.03 | 46.72 | 42.71 |
| mlabonne/ChimeraLlama-3-8B-v2 | 52.13 | 41.01 | 71.11 | 55.48 | 44.26 |
| mlabonne/NeuralLlama-3-8B-Instruct-abliterated | 51.6 | 41.6 | 69.95 | 54.22 | 43.26 |
| cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser | 51.56 | 38.32 | 73.77 | 61.03 | 42.58 |
| mlabonne/ChimeraLlama-3-8B | 51.3 | 39.12 | 71.81 | 52.4 | 42.98 |
| meta-llama/Meta-Llama-3-8B-Instruct | 51.24 | 41.22 | 69.86 | 51.65 | 42.64 |
| Open-Orca/Mistral-7B-OpenOrca | 51.09 | 39.24 | 72.39 | 52.27 | 41.65 |
| beowolx/CodeNinja-1.0-OpenChat-7B | 50.89 | 39.98 | 71.77 | 48.73 | 40.92 |
| failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 | 50.81 | 40.23 | 69.5 | 52.44 | 42.69 |
| mistralai/Mistral-7B-Instruct-v0.2 | 50.81 | 38.5 | 71.64 | 66.82 | 42.29 |
| cognitivecomputations/dolphin-2.9-llama3-8b | 50.8 | 39.59 | 69.96 | 55.66 | 42.84 |
| mlabonne/Darewin-7B-v2 | 50.61 | 37.67 | 73.16 | 49.5 | 41.01 |
| cognitivecomputations/dolphin-2.8-mistral-7b-v02 | 50.54 | 38.99 | 72.22 | 51.96 | 40.41 |
| mlabonne/Llama-3-DARE-8B | 50.42 | 38.2 | 71.38 | 50.15 | 41.69 |
| HuggingFaceH4/zephyr-7b-alpha | 50.27 | 38.0 | 72.24 | 56.06 | 40.57 |
| mlabonne/zephusion-2x7b | 50.22 | 37.82 | 72.14 | 55.96 | 40.71 |
| dreamgen/opus-v1.2-llama-3-8b | 50.2 | 37.9 | 70.55 | 50.45 | 42.16 |
| Weyaxi/Einstein-v6.1-Llama3-8B | 50.17 | 36.33 | 73.08 | 55.07 | 41.11 |
| cognitivecomputations/dolphin-2.2.1-mistral-7b | 50.03 | 38.64 | 72.24 | 54.09 | 39.22 |
| mlabonne/Meta-Llama-3-12B-Instruct | 50.0 | 41.7 | 67.71 | 52.75 | 40.58 |
| mlabonne/Llama-3-SLERP-8B | 49.91 | 36.82 | 72.03 | 49.52 | 40.88 |
| HuggingFaceH4/zephyr-7b-beta | 49.62 | 37.33 | 71.83 | 55.1 | 39.7 |
| abacusai/Llama-3-Smaug-8B | 48.98 | 37.15 | 69.12 | 51.66 | 40.67 |
| mlabonne/Llama-3-linear-8B | 48.41 | 39.89 | 70.88 | 43.01 | 34.46 |
| AetherResearch/Cerebrum-1.0-7b | 48.2 | 35.25 | 71.93 | 46.99 | 37.43 |
| Weyaxi/Einstein-v4-7B | 48.04 | 37.83 | 67.52 | 55.56 | 38.78 |
| shadowml/phixtral-4x2_8odd | 48.03 | 34.46 | 72.34 | 49.56 | 37.3 |
| Venkman42/Phiter | 48.01 | 34.62 | 71.23 | 48.93 | 38.18 |
| mlabonne/Llama-3-12B-Instruct | 48.0 | 36.04 | 67.53 | 51.36 | 40.44 |
| Venkman42/PhiPhiter | 47.93 | 34.65 | 71.14 | 48.54 | 37.99 |
| rhysjones/phi-2-orange-v2 | 47.89 | 34.55 | 70.96 | 54.87 | 38.17 |
| Venkman42/ReversePhiter | 47.87 | 35.0 | 70.64 | 48.31 | 37.97 |
| shadowml/phixtral-4x2_8odo | 47.87 | 33.74 | 71.93 | 48.68 | 37.95 |
| mlabonne/phixtral-3x2_8 | 47.78 | 33.58 | 72.1 | 49.59 | 37.67 |
| Muhammad2003/OrpoLlama3-8B | 47.59 | 34.26 | 70.91 | 55.4 | 37.59 |
| Lumpen1/Orpo-Mad-Max-Mistral-7B-v0.3 | 47.58 | 35.4 | 71.26 | 50.74 | 36.07 |
| microsoft/WizardLM-2-7B | 47.52 | 35.76 | 68.56 | 56.46 | 38.24 |
| mlabonne/phixtral-2x2_8 | 47.45 | 34.1 | 70.44 | 48.78 | 37.82 |
| mlabonne/OrpoLlama-3-8B | 47.37 | 34.17 | 70.59 | 52.39 | 37.36 |
| mlabonne/phixtral-4x2_8 | 47.34 | 33.91 | 70.44 | 48.78 | 37.68 |
| rhysjones/phi-2-orange | 47.33 | 33.37 | 71.33 | 49.87 | 37.3 |
| meta-math/MetaMath-Mistral-7B | 47.25 | 33.91 | 70.12 | 44.83 | 37.71 |
| cognitivecomputations/dolphin-phi-2-kensho | 47.14 | 34.05 | 69.25 | 50.2 | 38.11 |
| Locutusque/Llama-3-Orca-1.0-8B | 47.13 | 34.37 | 69.34 | 49.95 | 37.69 |
| mlabonne/Mistralpaca-7B | 47.08 | 33.48 | 70.71 | 52.89 | 37.06 |
| mistralai/Mistral-7B-Instruct-v0.1 | 46.9 | 33.36 | 67.87 | 55.89 | 39.48 |
| meetkai/functionary-small-v2.2 | 46.82 | 33.15 | 70.35 | 51.5 | 36.97 |
| abacaj/phi-2-super | 46.75 | 31.95 | 70.81 | 48.39 | 37.49 |
| cognitivecomputations/dolphin-2_6-phi-2 | 46.72 | 33.12 | 69.85 | 47.39 | 37.2 |
| marcel/phixtral-4x2_8-gates-poc | 46.34 | 31.78 | 70.22 | 47.01 | 37.02 |
| macadeliccc/Mistral-7B-v0.2-OpenHermes | 46.33 | 35.57 | 67.15 | 42.06 | 36.27 |
| Lumpen1/MadWizardOrpoMistral-7b-v0.3 | 46.18 | 32.47 | 71.75 | 47.45 | 34.33 |
| meta-llama/Meta-Llama-3-8B | 45.92 | 31.1 | 69.95 | 43.91 | 36.7 |
| g-ronimo/phi-2-OpenHermes-2.5 | 45.78 | 30.27 | 71.18 | 43.87 | 35.9 |
| Yhyu13/phi-2-sft-dpo-gpt4_en-ep1 | 45.66 | 30.61 | 71.13 | 48.74 | 35.23 |
| lxuechen/phi-2-dpo | 45.66 | 30.39 | 71.68 | 50.75 | 34.9 |
| deepseek-ai/deepseek-moe-16b-chat | 44.72 | 30.42 | 68.72 | 48.73 | 35.02 |
| microsoft/phi-2 | 44.66 | 27.98 | 70.8 | 44.43 | 35.21 |
| mlabonne/Meta-Llama-3-12B | 44.35 | 29.46 | 68.01 | 41.02 | 35.57 |
| stabilityai/stablelm-zephyr-3b | 43.74 | 34.04 | 62.07 | 46.46 | 35.11 |
| mlabonne/Llama-3-12B | 43.73 | 28.11 | 68.75 | 43.02 | 34.34 |
| venkycs/phi-2-instruct | 43.54 | 25.8 | 67.93 | 44.82 | 36.88 |
| Qwen/CodeQwen1.5-7B-Chat | 38.28 | 27.42 | 53.72 | 44.71 | 33.71 |
| Qwen/CodeQwen1.5-7B | 37.72 | 24.84 | 54.76 | 42.36 | 33.55 |
| mlabonne/Gemmalpaca-2B | 35.52 | 24.48 | 51.22 | 47.02 | 30.85 |
| google/gemma-2b | 32.36 | 22.7 | 43.35 | 39.96 | 31.03 |
| google/gemma-2b-it | 32.26 | 23.76 | 43.6 | 47.64 | 29.41 |
| mlabonne/OrcaGemma-2B | 32.23 | 24.44 | 42.49 | 45.84 | 29.76 |
| mlabonne/OrcaGemma-2B-v2 | 31.79 | 24.22 | 42.24 | 44.51 | 28.9 |
| mlabonne/Gemmalpaca-7B | 31.0 | 21.68 | 40.93 | 44.76 | 30.38 |
| google/gemma-7b-it | 30.81 | 21.33 | 40.84 | 41.7 | 30.25 |
| VAGOsolutions/SauerkrautLM-Gemma-7b | 29.64 | 20.75 | 39.29 | 46.2 | 28.88 |
| alpindale/gemma-7b | 29.24 | 20.67 | 38.48 | 46.66 | 28.58 |
| google/gemma-7b | 29.21 | 20.64 | 38.49 | 46.61 | 28.51 |
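
A note on the Average column: recomputing it from the rows above suggests it is the unweighted mean of the AGIEval, GPT4All, and Bigbench scores, with TruthfulQA reported but excluded (e.g., for mlabonne/AlphaMonarch-7B, (45.37 + 77.01 + 50.2) / 3 ≈ 57.53). The sketch below is a minimal Python check of that assumption against a few rows copied from the table; it is not part of LLM AutoEval itself.

```python
# Minimal sketch (assumption, not official LLM AutoEval code): verify that
# the Average column equals the unweighted mean of AGIEval, GPT4All, and
# Bigbench, with TruthfulQA excluded. Rows are copied from the table above.

rows = [
    # (model, reported_average, agieval, gpt4all, truthfulqa, bigbench)
    ("mlabonne/AlphaMonarch-7B",   57.53, 45.37, 77.01, 78.39, 50.2),
    ("openchat/openchat-3.5-0106", 54.1,  44.17, 73.72, 52.53, 44.4),
    ("microsoft/phi-2",            44.66, 27.98, 70.8,  44.43, 35.21),
    ("google/gemma-7b",            29.21, 20.64, 38.49, 46.61, 28.51),
]

for model, reported, agieval, gpt4all, truthfulqa, bigbench in rows:
    # Mean of the three columns that appear to enter the average.
    recomputed = round((agieval + gpt4all + bigbench) / 3, 2)
    status = "OK" if abs(recomputed - reported) < 0.01 else "MISMATCH"
    print(f"{status}  {model}: reported {reported}, recomputed {recomputed}")
```

Running this prints `OK` for each sampled row, which is consistent with the three-benchmark average described above.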