Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
pandafish-3-7B-32k | 40.85 | 73.57 | 56.3 | 42.17 | 53.22 |
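
The Average column appears to be the unweighted mean of the four suite scores. A minimal sketch to check this, assuming equal weighting (the `scores` dictionary is only an illustrative container for the values in the row above):

```python
# Suite scores copied from the pandafish-3-7B-32k row above.
scores = {
    "AGIEval": 40.85,
    "GPT4All": 73.57,
    "TruthfulQA": 56.3,
    "Bigbench": 42.17,
}

# Assumption: every suite is weighted equally.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 53.22
```

The result matches the reported 53.22, which is consistent with a plain arithmetic mean.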

Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 20.47 | ± | 2.54 |
agieval_aqua_rat | 0 | acc_norm | 20.87 | ± | 2.55 |
agieval_logiqa_en | 0 | acc | 34.10 | ± | 1.86 |