| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| AlphaMonarch-daser | 45.48 | 76.95 | 78.46 | 50.21 | 62.77 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 28.35 | ± 2.83 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 38.71 | ± 1.91 |
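For reference, the Average column in each table appears to be the unweighted arithmetic mean of the four suite scores (AGIEval, GPT4All, TruthfulQA, Bigbench), rounded to two decimals. A minimal sketch of that assumption follows; the helper name `nous_average` is illustrative and not part of the evaluation harness.

```python
def nous_average(agieval: float, gpt4all: float, truthfulqa: float, bigbench: float) -> float:
    """Unweighted mean of the four benchmark-suite scores (assumed formula)."""
    return round((agieval + gpt4all + truthfulqa + bigbench) / 4, 2)

# AlphaMonarch-daser from the table above:
# (45.48 + 76.95 + 78.46 + 50.21) / 4 = 62.775,
# i.e. ~62.77/62.78 depending on how the reported values were rounded.
print(nous_average(45.48, 76.95, 78.46, 50.21))
```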
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| FrankenMonarch-7B | 45.1 | 75.53 | 73.86 | 46.79 | 60.32 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 25.59 | ± 2.74 |
| | | acc_norm | 25.98 | ± 2.76 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| UltraMerge-7B | 44.36 | 77.15 | 78.47 | 49.35 | 62.33 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 27.56 | ± 2.81 |
| | | acc_norm | 23.23 | ± 2.65 |
| agieval_logiqa_en | 0 | acc | 39.48 | ± 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Beyonder-4x7B-v3 | 45.85 | 76.67 | 74.98 | 50.12 | 61.91 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 26.38 | ± 2.77 |
| | | acc_norm | 24.02 | ± 2.69 |
| agieval_logiqa_en | 0 | acc | 39.48 | ± 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Kunoichi-DPO-v2-7B | 44.79 | 75.05 | 65.68 | 47.65 | 58.29 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 26.38 | ± 2.77 |
| | | acc_norm | 24.02 | ± 2.69 |
| agieval_logiqa_en | 0 | acc | 38.71 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| CodeNinja-1.0-OpenChat-7B | 39.98 | 71.77 | 48.73 | 40.92 | 50.35 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± 1.90 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Monatrix-v4-dpo | 45.4 | 76.33 | 78.44 | 49.59 | 62.44 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 29.13 | ± 2.86 |
| | | acc_norm | 27.17 | ± 2.80 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| NeuralCeptrix-7B-SLERPv3 | 45.28 | 77.03 | 78.84 | 49.75 | 62.73 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 25.98 | ± 2.76 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± 1.90 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| T3Q-Mistral-Orca-Math-DPO | 44.41 | 76.83 | 78.78 | 49.43 | 62.36 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 26.38 | ± 2.77 |
| | | acc_norm | 23.62 | ± 2.67 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| M7-7b | 44.84 | 77.01 | 78.4 | 49.1 | 62.34 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 27.56 | ± 2.81 |
| | | acc_norm | 25.20 | ± 2.73 |
| agieval_logiqa_en | 0 | acc | 39.78 | ± 1.92 |