Thomas Schranz (tosh)
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| dolphin-2.8-mistral-7b-v02 | 38.99 | 72.22 | 51.96 | 40.41 | 50.9 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± 2.59 |
| | | acc_norm | 20.47 | ± 2.54 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± 1.88 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-3-7B-32k | 40.85 | 73.57 | 56.3 | 42.17 | 53.22 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 20.47 | ± 2.54 |
| | | acc_norm | 20.87 | ± 2.55 |
| agieval_logiqa_en | 0 | acc | 34.10 | ± 1.86 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-2-7b-32k | 40.8 | 73.35 | 57.46 | 42.69 | 53.57 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 22.05 | ± 2.61 |
| | | acc_norm | 19.69 | ± 2.50 |
| agieval_logiqa_en | 0 | acc | 35.94 | ± 1.88 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.2 | 38.5 | 71.64 | 66.82 | 42.29 | 54.81 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.62 | ± 2.67 |
| | | acc_norm | 22.05 | ± 2.61 |
| agieval_logiqa_en | 0 | acc | 36.10 | ± 1.88 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| HeatherSpellGen3 | 44.88 | 76.87 | 78.3 | 49.89 | 62.48 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.56 | ± 2.81 |
| | | acc_norm | 25.20 | ± 2.73 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-dt-7b | 45.24 | 77.19 | 78.41 | 49.76 | 62.65 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± 2.82 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| HeatherSpellGen2 | 40.73 | 75.43 | 72.75 | 47.12 | 59.01 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± 2.59 |
| | | acc_norm | 20.47 | ± 2.54 |
| agieval_logiqa_en | 0 | acc | 36.41 | ± 1.89 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| HeatherSpell-7b | 45.65 | 77.24 | 75.75 | 50 | 62.16 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 28.74 | ± 2.85 |
| | | acc_norm | 25.98 | ± 2.76 |
| agieval_logiqa_en | 0 | acc | 39.63 | ± 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-7b | 40 | 74.23 | 53.22 | 40.51 | 51.99 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± 2.59 |
| | | acc_norm | 21.65 | ± 2.59 |
| agieval_logiqa_en | 0 | acc | 34.10 | ± 1.86 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| MonarchPipe-7B-slerp | 46.12 | 74.89 | 66.59 | 47.49 | 58.77 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 27.17 | ± 2.80 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± 1.92 |
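The Average column in each summary table appears to be the plain arithmetic mean of the four benchmark scores, rounded to two decimals. A minimal sketch verifying this against a few rows above (the rounding convention is an assumption; scores are copied verbatim from the tables):

```python
# Recompute the Average column as the mean of the four benchmark
# scores (AGIEval, GPT4All, TruthfulQA, Bigbench) for a few models.
scores = {
    "pandafish-dt-7b": (45.24, 77.19, 78.41, 49.76),
    "MonarchPipe-7B-slerp": (46.12, 74.89, 66.59, 47.49),
    "pandafish-3-7B-32k": (40.85, 73.57, 56.3, 42.17),
}

def average(vals):
    """Arithmetic mean, rounded to two decimals (assumed convention)."""
    return round(sum(vals) / len(vals), 2)

for name, vals in scores.items():
    print(f"{name}: {average(vals)}")
# pandafish-dt-7b: 62.65
# MonarchPipe-7B-slerp: 58.77
# pandafish-3-7B-32k: 53.22
```

These match the Average values reported in the corresponding tables.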