Maxime Labonne mlabonne

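| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| dolphin-2.8-mistral-7b-v02 | 38.99 | 72.22 | 51.96 | 40.41 | 50.9 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± 2.59 |
| | | acc_norm | 20.47 | ± 2.54 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± 1.88 |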
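
| Model | EQ-Bench | Average |
|---|---|---|
| AlphaMonarch-7B | 73.62 | 73.62 |

EQ-Bench

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| eq_bench | 2.1 | eqbench,none | 73.62 | |
| | | eqbench_stderr,none | 2 | |
| | | percent_parseable,none | 97.66 | |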
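
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Hermes-2-Pro-Mistral-7B | 44.54 | 71.2 | 59.12 | 41.9 | 54.19 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.23 | ± 2.65 |
| | | acc_norm | 22.83 | ± 2.64 |
| agieval_logiqa_en | 0 | acc | 38.40 | ± 1.91 |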
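
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| CodeNinja-1.0-OpenChat-7B | 39.98 | 71.77 | 48.73 | 40.92 | 50.35 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± 1.90 |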
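
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Kunoichi-DPO-v2-7B | 44.79 | 75.05 | 65.68 | 47.65 | 58.29 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 26.38 | ± 2.77 |
| | | acc_norm | 24.02 | ± 2.69 |
| agieval_logiqa_en | 0 | acc | 38.71 | ± 1.91 |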
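
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Beyonder-4x7B-v3 | 45.85 | 76.67 | 74.98 | 50.12 | 61.91 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 26.38 | ± 2.77 |
| | | acc_norm | 24.02 | ± 2.69 |
| agieval_logiqa_en | 0 | acc | 39.48 | ± 1.92 |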
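
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| UltraMerge-7B | 44.36 | 77.15 | 78.47 | 49.35 | 62.33 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.56 | ± 2.81 |
| | | acc_norm | 23.23 | ± 2.65 |
| agieval_logiqa_en | 0 | acc | 39.48 | ± 1.92 |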
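
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| FrankenMonarch-7B | 45.1 | 75.53 | 73.86 | 46.79 | 60.32 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 25.59 | ± 2.74 |
| | | acc_norm | 25.98 | ± 2.76 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± 1.91 |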
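
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| AlphaMonarch-dora | 45.42 | 76.93 | 78.48 | 50.18 | 62.75 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 28.35 | ± 2.83 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 38.71 | ± 1.91 |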
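
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| AlphaMonarch-daser | 45.48 | 76.95 | 78.46 | 50.21 | 62.77 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 28.35 | ± 2.83 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 38.71 | ± 1.91 |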
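
The summary rows do not say how the Average column is derived. The numbers look consistent with an unweighted mean of the four suite scores (AGIEval, GPT4All, TruthfulQA, Bigbench), and the sketch below checks that reading against the reported values. The helper name `average_matches` and the 0.01 rounding tolerance are assumptions made for this illustration, not part of the original evaluation tooling.

```python
from statistics import mean

# Scores copied from the summary rows above, in the order
# (AGIEval, GPT4All, TruthfulQA, Bigbench, reported Average).
rows = {
    "dolphin-2.8-mistral-7b-v02": (38.99, 72.22, 51.96, 40.41, 50.9),
    "Hermes-2-Pro-Mistral-7B": (44.54, 71.2, 59.12, 41.9, 54.19),
    "CodeNinja-1.0-OpenChat-7B": (39.98, 71.77, 48.73, 40.92, 50.35),
    "Kunoichi-DPO-v2-7B": (44.79, 75.05, 65.68, 47.65, 58.29),
    "Beyonder-4x7B-v3": (45.85, 76.67, 74.98, 50.12, 61.91),
    "UltraMerge-7B": (44.36, 77.15, 78.47, 49.35, 62.33),
    "FrankenMonarch-7B": (45.1, 75.53, 73.86, 46.79, 60.32),
    "AlphaMonarch-dora": (45.42, 76.93, 78.48, 50.18, 62.75),
    "AlphaMonarch-daser": (45.48, 76.95, 78.46, 50.21, 62.77),
}


def average_matches(scores, reported, tol=0.01):
    """Return True if the unweighted mean of `scores` is within `tol` of `reported`.

    `tol` is an assumed allowance for rounding in the published figures.
    """
    return abs(mean(scores) - reported) <= tol


for model, (*scores, reported) in rows.items():
    status = "consistent" if average_matches(scores, reported) else "differs"
    print(f"{model}: mean={mean(scores):.3f} reported={reported} ({status})")
```

The AlphaMonarch-7B row is left out because it reports a single EQ-Bench score, so its Average is that score by definition.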