Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
LLAMA_Harsha_8_B_ORDP_10k | 35.54 | 71.15 | 55.39 | 37.96 | 50.01 |

Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 26.77 | ± | 2.78 |
| | | acc_norm | 27.17 | ± | 2.80 |
agieval_logiqa_en | 0 | acc | 31.34 | ± | 1.82 |