Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
Phi-3-mini-4k-instruct | 44.44 | 71.88 | 57.77 | 41.9 | 54 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 29.13 | ± | 2.86 |
| | | acc_norm | 28.74 | ± | 2.85 |
| agieval_logiqa_en | 0 | acc | 42.86 | ± | 1.94 |