
@relyt0925
Created July 7, 2024 02:54
mt_bench_branch.log (ilab model evaluate --benchmark mt_bench_branch --model /instructlab/models/tuned-0701-1954/samples_4992 --judge-model /instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1 --taxonomy-path /instructlab/taxonomy --output-dir /instructlab/mtbench --base-model /instructlab/models/ibm/granite-7b-base --branch main --base-bran…
INFO 2024-07-05 19:34:55,630 utils.py:145: _init_num_threads Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-05 19:34:55,630 utils.py:148: _init_num_threads Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-05 19:34:55,630 utils.py:161: _init_num_threads NumExpr defaulting to 16 threads.
INFO 2024-07-05 19:34:55,789 config.py:58: <module> PyTorch version 2.3.1 available.
Generating questions and reference answers from qna files for branch main...
INFO 2024-07-05 19:35:02,464 vllm.py:148: run_vllm vLLM starting up on pid 212 at http://127.0.0.1:48895/v1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 169.53it/s]
generated 416 questions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [01:26<00:00, 4.83it/s]
Generating questions and reference answers from qna files for branch main...
INFO 2024-07-05 19:38:12,969 vllm.py:148: run_vllm vLLM starting up on pid 258 at http://127.0.0.1:47763/v1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 170.16it/s]
generated 416 questions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [01:26<00:00, 4.82it/s]
INFO 2024-07-05 19:42:41,432 vllm.py:148: run_vllm vLLM starting up on pid 301 at http://127.0.0.1:47965/v1
Evaluating answers for branch main...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 832/832 [02:57<00:00, 4.69it/s]
Evaluating answers for branch main...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 832/832 [02:42<00:00, 5.12it/s]
# SKILL EVALUATION REPORT
## BASE MODEL
/instructlab/models/ibm/granite-7b-base
## MODEL
/instructlab/models/tuned-0701-1954/samples_4992
### IMPROVEMENTS:
1. compositional_skills/extraction/abstractive/abstract/qna.yaml (+4.5)
2. compositional_skills/extraction/inference/qualitative/sentiment/qna.yaml (+3.0)
3. compositional_skills/extraction/information/named_entities/dates_and_events/qna.yaml (+2.0)
4. compositional_skills/extraction/annual_report/csv/qna.yaml (+1.17)
5. compositional_skills/extraction/information/named_entities/places/qna.yaml (+1.0)
6. compositional_skills/extraction/receipt/plain_text/qna.yaml (+0.83)
7. compositional_skills/extraction/fda_filing/markdown/qna.yaml (+0.83)
8. foundational_skills/reasoning/theory_of_mind/qna.yaml (+0.53)
9. compositional_skills/extraction/services_agreement/bullet_points/qna.yaml (+0.5)
10. compositional_skills/writing/freeform/technical/user_manual/qna.yaml (+0.5)
11. compositional_skills/extraction/abstractive/outline/qna.yaml (+0.5)
12. compositional_skills/writing/grounded/editing/spelling/qna.yaml (+0.5)
13. compositional_skills/extraction/commercial_lease_agreement/bullet_points/qna.yaml (+0.5)
14. compositional_skills/extraction/annual_report/reasoning/qna.yaml (+0.5)
15. compositional_skills/writing/freeform/brainstorming/refute_claim/qna.yaml (+0.5)
16. compositional_skills/extraction/invoice/bullet_points/qna.yaml (+0.5)
17. compositional_skills/extraction/abstractive/main_takeaway/qna.yaml (+0.5)
18. compositional_skills/extraction/services_agreement/plain_text/qna.yaml (+0.42)
19. foundational_skills/reasoning/logical_reasoning/causal/qna.yaml (+0.33)
20. foundational_skills/reasoning/linguistics_reasoning/object_identification/qna.yaml (+0.33)
21. compositional_skills/extraction/annual_report/plain_text/qna.yaml (+0.33)
22. compositional_skills/linguistics/summarization/list_of_sentences/qna.yaml (+0.33)
23. compositional_skills/extraction/email/reasoning/qna.yaml (+0.33)
24. compositional_skills/extraction/fda_filing/reasoning/qna.yaml (+0.33)
25. compositional_skills/extraction/technical_paper/equations/csv/qna.yaml (+0.25)
26. compositional_skills/linguistics/bullet_lists/qna.yaml (+0.2)
27. foundational_skills/reasoning/mathematical_reasoning/qna.yaml (+0.17)
28. compositional_skills/extraction/technical_paper/equations/plain_text/qna.yaml (+0.17)
29. compositional_skills/STEM/math/mensurational/qna.yaml (+0.17)
30. compositional_skills/STEM/math/arithmetic_reasoning/qna.yaml (+0.17)
31. compositional_skills/extraction/annual_report/bullet_points/qna.yaml (+0.17)
32. compositional_skills/linguistics/rhyming_words/qna.yaml (+0.17)
33. compositional_skills/linguistics/organize_lists/qna.yaml (+0.17)
34. compositional_skills/roleplay/explain_like_you_are/non_fictional/popular_personalities/qna.yaml (+0.06)
### REGRESSIONS:
1. compositional_skills/writing/grounded/summarization/wiki_insights/five_point/qna.yaml (-2.5)
2. compositional_skills/extraction/email/markdown/qna.yaml (-1.33)
3. compositional_skills/extraction/annual_report/markdown/qna.yaml (-1.0)
4. compositional_skills/STEM/science/geography/qna.yaml (-1.0)
5. compositional_skills/extraction/invoice/reasoning/qna.yaml (-0.83)
6. compositional_skills/extraction/commercial_lease_agreement/csv/qna.yaml (-0.83)
7. compositional_skills/extraction/invoice/csv/qna.yaml (-0.8)
8. compositional_skills/general/tables/editing/combining_altering/qna.yaml (-0.67)
9. compositional_skills/extraction/fda_filing/plain_text/qna.yaml (-0.67)
10. compositional_skills/extraction/receipt/bullet_points/qna.yaml (-0.6)
11. compositional_skills/extraction/commercial_lease_agreement/reasoning/qna.yaml (-0.5)
12. compositional_skills/STEM/math/time_series/qna.yaml (-0.37)
13. compositional_skills/general/tables/empty/qna.yaml (-0.33)
14. compositional_skills/extraction/technical_paper/tables/plain_text/qna.yaml (-0.33)
15. compositional_skills/extraction/technical_paper/abstract/reasoning/qna.yaml (-0.33)
16. compositional_skills/writing/freeform/jokes/puns/general/qna.yaml (-0.31)
17. compositional_skills/extraction/commercial_lease_agreement/plain_text/qna.yaml (-0.27)
18. foundational_skills/reasoning/logical_reasoning/general/qna.yaml (-0.23)
19. knowledge/textbook/history/ibm_history/qna.yaml (-0.17)
20. compositional_skills/extraction/receipt/reasoning/qna.yaml (-0.17)
21. compositional_skills/extraction/technical_paper/equations/reasoning/qna.yaml (-0.17)
22. compositional_skills/STEM/math/area/qna.yaml (-0.17)
23. compositional_skills/writing/freeform/grammar/basic_grammer_tests/qna.yaml (-0.17)
24. compositional_skills/linguistics/summarization/ignore_pii/qna.yaml (-0.17)
25. compositional_skills/extraction/technical_paper/abstract/markdown/qna.yaml (-0.17)
26. compositional_skills/extraction/technical_paper/abstract/plain_text/qna.yaml (-0.17)
27. compositional_skills/extraction/fda_filing/bullet_points/qna.yaml (-0.17)
28. compositional_skills/STEM/math/reasoning/qna.yaml (-0.17)
29. compositional_skills/roleplay/explain_like_you_are/abstract/qna.yaml (-0.1)
30. compositional_skills/STEM/math/distance_conversion/qna.yaml (-0.08)
31. compositional_skills/extraction/technical_paper/equations/markdown/qna.yaml (-0.08)
### NO CHANGE:
1. compositional_skills/extraction/invoice/plain_text/qna.yaml
2. compositional_skills/extraction/receipt/csv/qna.yaml
3. compositional_skills/extraction/technical_paper/tables/reasoning/qna.yaml
4. compositional_skills/roleplay/explain_like_you_are/non_fictional/historical_figures/qna.yaml
5. compositional_skills/linguistics/classification/agent_classification/qna.yaml
6. compositional_skills/writing/freeform/poetry/ballad/qna.yaml
7. compositional_skills/writing/freeform/emoji/qna.yaml
8. compositional_skills/linguistics/reversing_string/qna.yaml
9. compositional_skills/writing/freeform/technical/proposal/qna.yaml
10. compositional_skills/writing/freeform/poetry/ode/qna.yaml
11. compositional_skills/extraction/technical_paper/equations/bullet_points/qna.yaml
12. compositional_skills/linguistics/complete_common_expressions/qna.yaml
13. compositional_skills/extraction/abstractive/title/qna.yaml
14. compositional_skills/linguistics/word_gen/qna.yaml
15. compositional_skills/roleplay/explain_like_i_am/primary_schooler/qna.yaml
16. compositional_skills/STEM/math/pattern_recognition/qna.yaml
17. compositional_skills/STEM/science/units_conversion/temperature_conversion/qna.yaml
18. compositional_skills/extraction/technical_paper/abstract/csv/qna.yaml
19. compositional_skills/writing/freeform/debate/qna.yaml
20. compositional_skills/writing/freeform/riddles/qna.yaml
21. compositional_skills/linguistics/jumbled_sentences/qna.yaml
22. compositional_skills/STEM/math/arithmetic_w_grammar/qna.yaml
23. compositional_skills/writing/grounded/summarization/wiki_insights/high_level_outline/qna.yaml
24. compositional_skills/writing/grounded/meeting_insights/action_items/qna.yaml
25. compositional_skills/roleplay/explain_like_you_are/fictional/movies/qna.yaml
26. compositional_skills/extraction/technical_paper/abstract/bullet_points/qna.yaml
27. foundational_skills/reasoning/temporal_reasoning/qna.yaml
28. knowledge/technical_manual/ibm_redbooks/qna.yaml
29. compositional_skills/extraction/email/bullet_points/qna.yaml
30. compositional_skills/STEM/science/units_conversion/distance_conversion/qna.yaml
31. compositional_skills/extraction/services_agreement/reasoning/qna.yaml
32. compositional_skills/linguistics/pattern_recognition/qna.yaml
33. compositional_skills/writing/freeform/poetry/freeverse/qna.yaml
34. compositional_skills/general/tables/editing/add_remove/qna.yaml
35. compositional_skills/extraction/inference/quantitative/asciidoc/tables/qna.yaml
36. compositional_skills/writing/grounded/meeting_insights/corporate_email/qna.yaml
37. compositional_skills/extraction/fda_filing/csv/qna.yaml
38. foundational_skills/reasoning/unconventional_reasoning/lower_score_wins/qna.yaml
39. compositional_skills/writing/freeform/poetry/narrative_poetry/qna.yaml
40. foundational_skills/reasoning/common_sense_reasoning/qna.yaml
41. compositional_skills/extraction/email/plain_text/qna.yaml
42. compositional_skills/writing/grounded/summarization/wiki_insights/concise/qna.yaml
43. compositional_skills/writing/freeform/social_media/linkedin/qna.yaml
44. compositional_skills/writing/freeform/poetry/epic_poetry/qna.yaml
45. compositional_skills/extraction/commercial_lease_agreement/markdown/qna.yaml
46. compositional_skills/roleplay/explain_like_i_am/graduate/qna.yaml
47. compositional_skills/roleplay/explain_like_you_are/fictional/video_games/qna.yaml
48. compositional_skills/roleplay/explain_like_you_are/fictional/tv_shows/qna.yaml
49. compositional_skills/writing/freeform/technical/product_description/qna.yaml
50. foundational_skills/reasoning/linguistics_reasoning/logical_sequence_of_words/qna.yaml
51. compositional_skills/extraction/technical_paper/tables/bullet_points/qna.yaml
52. compositional_skills/writing/freeform/technical/report/qna.yaml
53. compositional_skills/writing/freeform/legal/agreement/qna.yaml
54. compositional_skills/writing/freeform/social_media/instagram/qna.yaml
55. foundational_skills/reasoning/logical_reasoning/tabular/qna.yaml
56. compositional_skills/extraction/invoice/markdown/qna.yaml
57. foundational_skills/reasoning/linguistics_reasoning/odd_one_out/qna.yaml
58. compositional_skills/extraction/receipt/markdown/qna.yaml
59. compositional_skills/extraction/technical_paper/tables/csv/qna.yaml
60. compositional_skills/writing/grounded/meeting_insights/executive_summaries/qna.yaml
61. compositional_skills/writing/freeform/social_media/twitter/qna.yaml
62. compositional_skills/writing/freeform/poetry/haiku/qna.yaml
63. compositional_skills/writing/freeform/poetry/sonnet/qna.yaml
64. compositional_skills/writing/freeform/prose/articles/qna.yaml
65. compositional_skills/writing/grounded/summarization/wiki_insights/detailed/qna.yaml
66. compositional_skills/writing/freeform/prose/screenplay/qna.yaml
67. compositional_skills/extraction/abstractive/key_points/qna.yaml
68. compositional_skills/writing/freeform/brainstorming/idea_generation/qna.yaml
69. compositional_skills/writing/grounded/summarization/wiki_insights/one_line/qna.yaml
70. compositional_skills/extraction/information/named_entities/person_names/qna.yaml
71. compositional_skills/writing/freeform/poetry/limerick/qna.yaml
72. compositional_skills/writing/grounded/meeting_insights/minutes_of_meeting/qna.yaml
73. compositional_skills/writing/grounded/editing/grammar/qna.yaml
74. compositional_skills/writing/freeform/brainstorming/support_claim/qna.yaml
75. compositional_skills/writing/grounded/editing/punctuation/qna.yaml
76. compositional_skills/extraction/inference/quantitative/table_analaysis/qna.yaml
77. compositional_skills/writing/freeform/technical/guide/qna.yaml
78. compositional_skills/writing/freeform/legal/contracts/qna.yaml
79. compositional_skills/writing/freeform/prose/stories/qna.yaml
80. compositional_skills/writing/freeform/technical/specification/qna.yaml
81. compositional_skills/writing/freeform/prose/emails/formal/qna.yaml
82. compositional_skills/writing/freeform/social_media/facebook/qna.yaml
83. compositional_skills/writing/freeform/prose/emails/informal/qna.yaml
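
The report places every taxonomy `qna.yaml` under exactly one of IMPROVEMENTS, REGRESSIONS, or NO CHANGE, with a signed score delta for the first two. The following is a minimal sketch for tallying those sections. It assumes the report text has been saved exactly in the format above. `parse_report` is a hypothetical helper, not part of `ilab`.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches entries such as "12. compositional_skills/.../qna.yaml (+0.5)".
# Entries under NO CHANGE carry no delta, so the score group is optional.
ENTRY = re.compile(r"^\d+\.\s+(\S+)(?:\s+\(([+-]\d+(?:\.\d+)?)\))?\s*$")


def parse_report(text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Group (qna path, score delta) pairs by their '### SECTION:' heading."""
    sections: Dict[str, List[Tuple[str, float]]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            # "### NO CHANGE:" -> "NO CHANGE"
            current = line[4:].rstrip(":")
            sections[current] = []
            continue
        match = ENTRY.match(line)
        if match and current is not None:
            # Missing delta (NO CHANGE section) is recorded as 0.0.
            delta = float(match.group(2)) if match.group(2) else 0.0
            sections[current].append((match.group(1), delta))
    return sections
```

For example, `{name: len(items) for name, items in parse_report(text).items()}` counts the entries per section. On this report it should give 34 improvements, 31 regressions, and 83 unchanged skills. Log lines, progress bars, and the `#`/`##` headings are skipped because they do not match `ENTRY` or fall outside a `###` section.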