CultriX-Github / NeuralBeagle14-7B-Nous.md
Created January 25, 2024 11:14
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|------------------|------:|------:|---------:|-------:|------:|
| NeuralBeagle14-7B | 46.06 | 76.77 | 70.32 | 47.86 | 60.25 |
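The Average column is consistent with the unweighted mean of the four benchmark scores. A minimal sketch verifying this for the row above (the averaging rule is inferred from the numbers on this page, not taken from the LLM AutoEval source):

```python
# Assumption: Average = unweighted mean of AGIEval, GPT4All, TruthfulQA, Bigbench.
scores = [46.06, 76.77, 70.32, 47.86]  # NeuralBeagle14-7B, from the table above

average = sum(scores) / len(scores)
print(f"{average:.2f}")  # -> 60.25, matching the Average column
```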

### AGIEval
| Task | Version | Metric | Value | | Stderr |
|------------------|------:|--------|----:|---|-----:|
| agieval_aqua_rat | 0 | acc | 26.38 | ± | 2.77 |
| | | acc_norm | 25.98 | ± | 2.76 |
| agieval_logiqa_en | 0 | acc | 38.56 | ± | 1.91 |
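The Stderr values are consistent with the standard error of a binomial proportion, sqrt(p(1 - p)/n). A minimal sketch, assuming the commonly reported test-set sizes for these tasks (254 items for agieval_aqua_rat, 651 for agieval_logiqa_en; neither size appears on this page):

```python
import math

# Assumed test-set sizes; not stated on this page.
TEST_SIZES = {"agieval_aqua_rat": 254, "agieval_logiqa_en": 651}

def binomial_stderr(acc_percent: float, n: int) -> float:
    """Standard error of a binomial proportion, in percentage points."""
    p = acc_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

print(round(binomial_stderr(26.38, TEST_SIZES["agieval_aqua_rat"]), 2))   # -> 2.77
print(round(binomial_stderr(38.56, TEST_SIZES["agieval_logiqa_en"]), 2))  # -> 1.91
```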
CultriX-Github / MergeTrix-7B-v2-Nous.md
Created January 25, 2024 11:15
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|------------------|------:|------:|---------:|-------:|------:|
| MergeTrix-7B-v2 | 44.7 | 77.66 | 67.52 | 48.23 | 59.53 |

### AGIEval
| Task | Version | Metric | Value | | Stderr |
|------------------|------:|--------|----:|---|-----:|
| agieval_aqua_rat | 0 | acc | 25.20 | ± | 2.73 |
| | | acc_norm | 24.80 | ± | 2.72 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± | 1.92 |
CultriX-Github / YALL - Yet Another LLM Leaderboard.md
Last active January 26, 2024 03:46 — forked from mlabonne/YALL - Yet Another LLM Leaderboard.md
Leaderboard made with 🧐 LLM AutoEval (https://github.com/mlabonne/llm-autoeval) using the Nous benchmark suite.
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Daredevil-7B](https://huggingface.co/mlabonne/Daredevil-7B)| 44.85| 76.07| 64.89| 47.07| 58.22|
### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |26.38|± | 2.77|
| | |acc_norm|25.20|± | 2.73|
|agieval_logiqa_en | 0|acc |38.86|± | 1.91|
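The leaderboard ordering can be reproduced by sorting the per-model averages reported in the summary tables on this page; a minimal sketch:

```python
# Averages copied from the summary tables on this page.
averages = {
    "NeuralBeagle14-7B": 60.25,
    "MistralTrix-v1": 60.05,
    "MergeTrix-7B-v2": 59.53,
    "distilabeled-Marcoro14-7B-slerp": 58.93,
    "Daredevil-7B": 58.22,
    "openchat-3.5-1210": 53.14,
    "openchat_3.5": 51.34,
    "zephyr-7b-beta": 50.99,
}

# Sort descending by Average to recover the leaderboard row order.
for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<32} {avg:6.2f}")
```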
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|------------------|------:|------:|---------:|-------:|------:|
| distilabeled-Marcoro14-7B-slerp | 45.38 | 76.48 | 65.68 | 48.18 | 58.93 |

### AGIEval
| Task | Version | Metric | Value | | Stderr |
|------------------|------:|--------|----:|---|-----:|
| agieval_aqua_rat | 0 | acc | 27.56 | ± | 2.81 |
| | | acc_norm | 25.98 | ± | 2.76 |
| agieval_logiqa_en | 0 | acc | 39.17 | ± | 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|------------------|------:|------:|---------:|-------:|------:|
| openchat-3.5-1210 | 42.62 | 72.84 | 53.21 | 43.88 | 53.14 |

### AGIEval
| Task | Version | Metric | Value | | Stderr |
|------------------|------:|--------|----:|---|-----:|
| agieval_aqua_rat | 0 | acc | 22.44 | ± | 2.62 |
| | | acc_norm | 24.41 | ± | 2.70 |
| agieval_logiqa_en | 0 | acc | 41.17 | ± | 1.93 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|------------------|------:|------:|---------:|-------:|------:|
| openchat_3.5 | 42.67 | 72.92 | 47.27 | 42.51 | 51.34 |

### AGIEval
| Task | Version | Metric | Value | | Stderr |
|------------------|------:|--------|----:|---|-----:|
| agieval_aqua_rat | 0 | acc | 24.02 | ± | 2.69 |
| | | acc_norm | 24.80 | ± | 2.72 |
| agieval_logiqa_en | 0 | acc | 38.86 | ± | 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|------------------|------:|------:|---------:|-------:|------:|
| zephyr-7b-beta | 37.33 | 71.83 | 55.1 | 39.7 | 50.99 |

### AGIEval
| Task | Version | Metric | Value | | Stderr |
|------------------|------:|--------|----:|---|-----:|
| agieval_aqua_rat | 0 | acc | 21.26 | ± | 2.57 |
| | | acc_norm | 20.47 | ± | 2.54 |
| agieval_logiqa_en | 0 | acc | 33.33 | ± | 1.85 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|------------------|------:|------:|---------:|-------:|------:|
| MistralTrix-v1 | 44.98 | 76.62 | 71.44 | 47.17 | 60.05 |

### AGIEval
| Task | Version | Metric | Value | | Stderr |
|------------------|------:|--------|----:|---|-----:|
| agieval_aqua_rat | 0 | acc | 25.59 | ± | 2.74 |
| | | acc_norm | 24.80 | ± | 2.72 |
| agieval_logiqa_en | 0 | acc | 37.48 | ± | 1.90 |