| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[gemma-2b-sft-telugu](https://huggingface.co/bharadwajswarna/gemma-2b-sft-telugu)| 21.53| 55.56| 48.33| 30.56| 38.99|
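
These are lm-evaluation-harness style results over the Nous benchmark suite (AGIEval, GPT4All, TruthfulQA, Bigbench). As a rough pointer for reproduction, the sketch below shows how a comparable run could be launched through the harness's Python API; the harness version, the `dtype` setting, and the exact task list are assumptions, not details confirmed by this gist (the AGIEval/Bigbench task names in the tables below follow the older Nous-style fork of the harness and are omitted from the sketch).

```python
# Hypothetical reproduction sketch using EleutherAI's lm-evaluation-harness (0.4.x Python API).
# The model ID comes from the summary table; the task names mirror the GPT4All and
# TruthfulQA sections below. AGIEval/Bigbench task names differ between harness forks,
# so they are left out here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=bharadwajswarna/gemma-2b-sft-telugu,dtype=float16",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag",
        "openbookqa", "piqa", "winogrande", "truthfulqa_mc2",
    ],
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```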

### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |19.29|± | 2.48|
| | |acc_norm|23.23|± | 2.65|
|agieval_logiqa_en | 0|acc |22.43|± | 1.64|
| | |acc_norm|25.35|± | 1.71|
|agieval_lsat_ar | 0|acc |17.39|± | 2.50|
| | |acc_norm|16.09|± | 2.43|
|agieval_lsat_lr | 0|acc |20.39|± | 1.79|
| | |acc_norm|21.96|± | 1.83|
|agieval_lsat_rc | 0|acc |18.59|± | 2.38|
| | |acc_norm|17.47|± | 2.32|
|agieval_sat_en | 0|acc |18.45|± | 2.71|
| | |acc_norm|22.33|± | 2.91|
|agieval_sat_en_without_passage| 0|acc |18.93|± | 2.74|
| | |acc_norm|19.42|± | 2.76|
|agieval_sat_math | 0|acc |29.09|± | 3.07|
| | |acc_norm|26.36|± | 2.98|

Average: 21.53%
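
The reported AGIEval average appears to be the unweighted mean of the eight `acc_norm` values above; a quick arithmetic check (values transcribed from the table):

```python
# AGIEval acc_norm values, in table order.
acc_norm = [23.23, 25.35, 16.09, 21.96, 17.47, 22.33, 19.42, 26.36]
print(f"{sum(acc_norm) / len(acc_norm):.2f}")  # 21.53 -- matches the reported average
```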

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |33.96|± | 1.38|
| | |acc_norm|36.35|± | 1.41|
|arc_easy | 0|acc |61.07|± | 1.00|
| | |acc_norm|55.09|± | 1.02|
|boolq | 1|acc |66.88|± | 0.82|
|hellaswag | 0|acc |48.89|± | 0.50|
| | |acc_norm|63.67|± | 0.48|
|openbookqa | 0|acc |26.40|± | 1.97|
| | |acc_norm|34.00|± | 2.12|
|piqa | 0|acc |74.43|± | 1.02|
| | |acc_norm|74.86|± | 1.01|
|winogrande | 0|acc |58.09|± | 1.39|

Average: 55.56%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |30.84|± | 1.62|
| | |mc2 |48.33|± | 1.52|

Average: 48.33%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|52.63|± | 3.63|
|bigbench_date_understanding | 0|multiple_choice_grade|40.92|± | 2.56|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|30.23|± | 2.86|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.70|± | 1.70|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|22.40|± | 1.87|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|14.29|± | 1.32|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|34.33|± | 2.75|
|bigbench_movie_recommendation | 0|multiple_choice_grade|27.00|± | 1.99|
|bigbench_navigate | 0|multiple_choice_grade|47.60|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|23.80|± | 0.95|
|bigbench_ruin_names | 0|multiple_choice_grade|28.35|± | 2.13|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|18.34|± | 1.23|
|bigbench_snarks | 0|multiple_choice_grade|54.14|± | 3.71|
|bigbench_sports_understanding | 0|multiple_choice_grade|49.70|± | 1.59|
|bigbench_temporal_sequences | 0|multiple_choice_grade|27.00|± | 1.40|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|19.28|± | 1.12|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|14.00|± | 0.83|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|34.33|± | 2.75|

Average: 30.56%

Average score: 38.99%
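
The other suites follow the same pattern (`acc_norm` where reported, otherwise `acc`, for GPT4All; `mc2` for TruthfulQA; `multiple_choice_grade` for Bigbench), and the overall score is the unweighted mean of the four suite averages. A minimal check from the rounded suite scores:

```python
# Suite averages as reported in the sections above.
suite_averages = {"AGIEval": 21.53, "GPT4All": 55.56, "TruthfulQA": 48.33, "Bigbench": 30.56}
overall = sum(suite_averages.values()) / len(suite_averages)
print(f"{overall:.3f}")  # 38.995; the reported 38.99 presumably averages unrounded per-task scores
```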

Elapsed time: 05:39:41