CultriX CultriX-Github

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| CultMerge-7B-v1 | 45.2 | 77.1 | 78.22 | 49.87 | 62.6 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 25.59 | ± 2.74 |
| agieval_logiqa_en | 0 | acc | 39.48 | ± 1.92 |
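In every table here, the Average column matches the simple arithmetic mean of the four benchmark scores, rounded to two decimals. A minimal sketch of that computation; the helper name is illustrative, not from the source:

```python
def benchmark_average(agieval: float, gpt4all: float,
                      truthfulqa: float, bigbench: float) -> float:
    """Mean of the four benchmark columns, rounded to 2 decimals.

    Illustrative helper; the gist itself only reports the final numbers.
    """
    return round((agieval + gpt4all + truthfulqa + bigbench) / 4, 2)

# CultMerge-7B-v1 row above: (45.2 + 77.1 + 78.22 + 49.87) / 4 = 62.5975
print(benchmark_average(45.2, 77.1, 78.22, 49.87))  # 62.6
```

The same check reproduces the other rows, e.g. YamshadowExperiment28-7B yields 62.65.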
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Phi-3-Goru | 38.59 | 70.54 | 59.44 | 38.23 | 51.7 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.02 | ± 2.69 |
| | | acc_norm | 24.41 | ± 2.70 |
| agieval_logiqa_en | 0 | acc | 33.18 | ± 1.85 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| MonaTrix-7B-DPOv2 | 45.63 | 76.98 | 78.63 | 50.18 | 62.86 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 29.92 | ± 2.88 |
| | | acc_norm | 27.17 | ± 2.80 |
| agieval_logiqa_en | 0 | acc | 39.94 | ± 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| YamshadowExperiment28-7B | 44.73 | 77.28 | 78.85 | 49.73 | 62.65 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.56 | ± 2.81 |
| | | acc_norm | 24.80 | ± 2.72 |
| agieval_logiqa_en | 0 | acc | 39.17 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| NeuralShadow-7B-v2 | 44.95 | 77.17 | 78.53 | 49.44 | 62.52 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± 2.82 |
| | | acc_norm | 25.20 | ± 2.73 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| NeuralShadow-7B | 44.58 | 77.21 | 78.91 | 49.6 | 62.57 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 26.77 | ± 2.78 |
| | | acc_norm | 23.62 | ± 2.67 |
| agieval_logiqa_en | 0 | acc | 38.56 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| ACultriX-7B | 45.23 | 77 | 78.95 | 49.82 | 62.75 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± 2.82 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 39.63 | ± 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| dolphin-2.8-mistral-7b-v02 | 38.99 | 72.22 | 51.96 | 40.41 | 50.9 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± 2.59 |
| | | acc_norm | 20.47 | ± 2.54 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± 1.88 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Hermes-2-Pro-Mistral-7B | 44.54 | 71.2 | 59.12 | 41.9 | 54.19 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.23 | ± 2.65 |
| | | acc_norm | 22.83 | ± 2.64 |
| agieval_logiqa_en | 0 | acc | 38.40 | ± 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Einstein-v4-7B | 37.83 | 67.52 | 55.56 | 38.78 | 49.92 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.62 | ± 2.67 |
| | | acc_norm | 22.83 | ± 2.64 |
| agieval_logiqa_en | 0 | acc | 37.33 | ± 1.90 |