Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
---|---|---|---|---|---|
FelixChao/WestSeverus-7B-DPO-v2 | 60.98 | 45.29 | 77.2 | 72.72 | 48.71 |
CultriX/CombinaTrix-7B | 60.58 | 45.52 | 77.42 | 71.12 | 48.24 |
CultriX/OmniTrixAI | 60.35 | 44.94 | 77.31 | 70.62 | 48.52 |
mlabonne/NeuralBeagle14-7B | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
jsfs11/TurdusTrixBeagle-DARETIES-7B | 59.99 | 44.46 | 77.81 | 69.15 | 48.54 |
CultriX/SevereNeuralBeagleTrix-7B | 59.82 | 44.37 | 77.38 | 69.59 | 47.95 |
CultriX/MergeTrix-7B-v2 | 59.53 | 44.7 | 77.66 | 67.52 | 48.23 |
senseable/WestLake-7B-v2 | 59.42 | 44.27 | 77.86 | 67.46 | 48.09 |
fblgit/UNA-TheBeagle-7b-v1 | 59.17 | 42.73 | 77.12 | 70.82 | 46.01 |
CultriX/MergeTrix-7B | 58.88 | 44.93 | 76.85 | 66.56 | 47.18 |
mlabonne/Marcoro14-7B-slerp | 57.67 | 44.66 | 76.24 | 64.15 | 45.64 |
microsoft/phi-2 | 44.61 | 27.96 | 70.84 | 44.46 | 35.17 |
TheBloke/guanaco-7B-HF | 40.38 | 23.12 | 66.85 | 38.92 | 32.64 |
TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 36.32 | 20.77 | 54.28 | 37.84 | 32.4 |
Leaderboard made with LLM AutoEval (https://github.com/mlabonne/llm-autoeval) using the Nous benchmark suite.
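
For reference, the Average column appears to be the plain arithmetic mean of the four benchmark scores (AGIEval, GPT4All, TruthfulQA, Bigbench), rounded to two decimals; the values check out against the rows above. A minimal Python sketch reproducing it, using two rows copied from the table:

```python
# Assumption: "Average" = mean of the four Nous benchmark scores, rounded to 2 decimals.
# Scores below are copied from the table (AGIEval, GPT4All, TruthfulQA, Bigbench).
scores = {
    "FelixChao/WestSeverus-7B-DPO-v2": [45.29, 77.20, 72.72, 48.71],
    "mlabonne/NeuralBeagle14-7B": [46.06, 76.77, 70.32, 47.86],
}

for model, s in scores.items():
    avg = round(sum(s) / len(s), 2)
    print(f"{model}: {avg}")  # prints 60.98 and 60.25, matching the Average column
```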