Tyler Lisowski relyt0925

{"document": "The XYZ Corporation, founded in 1985, specializes in manufacturing widgets. Their production process involves three main stages: sourcing raw materials, assembly, and quality control. The company has a strong focus on sustainability, implementing eco-friendly practices throughout their supply chain. XYZ Corporation has a global presence with factories in Asia, Europe, and North America, employing over 10,000 people worldwide.", "question": "What are the main stages of XYZ Corporation's production process?", "Ground Truth": "The main stages of XYZ Corporation's production process are sourcing raw materials, assembly, and quality control."}
{"document": "The XYZ Corporation, founded in 1985, specializes in manufacturing widgets. Their production process involves three main stages: sourcing raw materials, assembly, and quality control. The company has a strong focus on sustainability, implementing eco-friendly practices throughout their supply chain. XYZ Corporation has a global presence with factories in Asia, Europe, and North America, employing over 10,000 people worldwide.", "question": "What are the main stages of XYZ Corporation's production process?", "Ground Truth": "The main stages of XYZ Corporation's production process are sourcing raw materials, assembly, and quality control.", "agent_rag_granite": "The main stages of XYZ Corporation's production process are:\n\n1. Accessing TLM and selecting the DB-SUSP-2ND-DAY-BAL category.\n2. Double-clicking the HOLDOVER-CC-ITEMS-C account set for a specific region.\n3. Collapsing all line items and filtering the Items by Status grid to view all exception line items with a value greater than 0.\n4
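The second record pairs the ground truth with an `agent_rag_granite` answer that restates the question and then lists unrelated steps. A naive lexical check makes that gap visible. The sketch below is hypothetical: the helper names and the metric are illustrative, not an InstructLab evaluation API. It counts how many ground-truth tokens that are not already in the question also appear in the answer. The answer text used is only the beginning of the truncated `agent_rag_granite` value.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_token_recall(question: str, ground_truth: str, answer: str) -> float:
    """Share of ground-truth tokens (ignoring tokens already in the question)
    that also occur in the answer. A crude lexical proxy, not a quality score."""
    expected = tokens(ground_truth) - tokens(question)
    if not expected:
        return 0.0
    return len(expected & tokens(answer)) / len(expected)


question = "What are the main stages of XYZ Corporation's production process?"
ground_truth = (
    "The main stages of XYZ Corporation's production process are "
    "sourcing raw materials, assembly, and quality control."
)
# Beginning of the (truncated) agent_rag_granite answer from the record above.
agent_answer = (
    "The main stages of XYZ Corporation's production process are:\n\n"
    "1. Accessing TLM and selecting the DB-SUSP-2ND-DAY-BAL category."
)

print(round(answer_token_recall(question, ground_truth, ground_truth), 2))  # 1.0
print(round(answer_token_recall(question, ground_truth, agent_answer), 2))  # 0.14
```

Because tokens shared with the question are discarded, restating the question earns no credit; only `and` overlaps here. Lexical overlap is still a crude proxy and cannot recognise a correct paraphrase.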
relyt0925 / gist:57f9f8f3ff0f8ee4051afbd672180273
Created October 3, 2024 03:26
instructlab checkpoint restart training journal file
current_phase: train2
ended_at_utc: null
eval_1: null
eval_2: null
final_output: null
run_id: 4aec5660-3a1f-4323-b936-aa418f8a5a20
started_at_utc: 2024-10-03 02:21:56.341951+00:00
train_1:
  checkpoints: /var/mnt/instg1/instructlab/.local/share/instructlab/phased/phase1/checkpoints
  ended_at_utc: '2024-10-03 02:55:34.691335+00:00'
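The journal stores its timestamps as ISO 8601 strings with a UTC offset, which Python's `datetime.fromisoformat` can read directly. A minimal sketch, assuming the journal has already been loaded into a dict with the timestamps kept as strings (the values are copied from the excerpt; loading the YAML file itself is not shown). The helper names are hypothetical. It lists the phases that record an end time and how long after the run started `train_1` finished.

```python
from datetime import datetime

# Values copied from the journal excerpt above, as a hypothetical in-memory form.
journal = {
    "current_phase": "train2",
    "run_id": "4aec5660-3a1f-4323-b936-aa418f8a5a20",
    "started_at_utc": "2024-10-03 02:21:56.341951+00:00",
    "train_1": {
        "checkpoints": "/var/mnt/instg1/instructlab/.local/share/instructlab/phased/phase1/checkpoints",
        "ended_at_utc": "2024-10-03 02:55:34.691335+00:00",
    },
    "eval_1": None,
    "eval_2": None,
}


def completed_phases(journal: dict) -> list:
    """Names of phase entries that record an end time."""
    return [
        name
        for name, entry in journal.items()
        if isinstance(entry, dict) and entry.get("ended_at_utc")
    ]


def elapsed_since_run_start(journal: dict, phase: str):
    """Time between the run's start and the given phase's recorded end."""
    start = datetime.fromisoformat(journal["started_at_utc"])
    end = datetime.fromisoformat(journal[phase]["ended_at_utc"])
    return end - start


print(completed_phases(journal))                    # ['train_1']
print(elapsed_since_run_start(journal, "train_1"))  # 0:33:38.349384
```

The excerpt does not show a start time for `train_1` itself, so the duration is measured from the run's `started_at_utc`.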
[root@tyler-fsdp-testing root]# ls -lh /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/checkpoint-11128/
total 101G
-rw-r--r--. 1 root root 789 Sep 4 04:36 config.json
-rw-r--r--. 1 root root 144 Sep 4 04:36 generation_config.json
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00001-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00002-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00003-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00004-of-00006.safetensors
-rw-r--r--. 1 root root 4.6G Sep 4 04:36 model-00005-of-00006.safetensors
-rw-r--r--. 1 root root 2.6G Sep 4 04:37 model-00006-of-00006.safetensors
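The files shown add up to far less than the `total 101G` that `ls` reports, which suggests the checkpoint directory holds further files not visible in this excerpt. A small sketch that sums the `ls -lh` sizes, assuming binary units (K = 1024). Since `ls -h` rounds the displayed values, the result is approximate. `parse_human_size` is a hypothetical helper.

```python
import re

# Sizes as printed by `ls -lh` in the listing above.
LISTED_SIZES = ["789", "144", "4.6G", "4.6G", "4.6G", "4.6G", "4.6G", "2.6G"]

# Binary multipliers used by `ls -h`.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_human_size(size: str) -> float:
    """Convert an `ls -h` style size such as '4.6G' into bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)", size)
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


total_gib = sum(parse_human_size(s) for s in LISTED_SIZES) / 1024**3
print(f"{total_gib:.1f}G in the listed files")  # 25.6G in the listed files
```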
[root@tyler-a100-newimage-val root]# /root/bin/ilab.sh --config /var/mnt/inststg1/instructlab/config.yaml model evaluate --model /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/hf_format/samples_25376/ --base-model /var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/ --benchmark mmlu_branch --tasks-dir /var/mnt/inststg1/instructlab/generated/node_datasets_2024-08-18T15_57_14/
Using local safetensors found at '/var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/hf_format/samples_25376/' for '--model'
INFO 2024-08-18 22:00:17,135 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-08-18 22:00:17,135 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-08-18 22:00:17,135 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2024-08-18 22:00:17,797 datasets:58: PyTorch version 2.3.1 available.
INFO 2024-08-18 22
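The NumExpr messages mean `NUMEXPR_MAX_THREADS` was unset, so NumExpr limited itself to 16 threads on a host with 80 detected cores. NumExpr reads that variable when it is imported, so it has to be in the environment beforehand, for example exported in the shell before running `ilab.sh`. Whether it reaches the evaluation process depends on how `/root/bin/ilab.sh` launches it, which is not shown. A Python sketch of the same setting. The helper name is hypothetical, and the cap of 64 simply mirrors the maximum reported in the log.

```python
import os


def numexpr_thread_env(detected_cores: int, cap: int = 64) -> dict:
    """Hypothetical helper: NumExpr settings that lift its default safe
    limit of 16 threads, capped at the maximum reported in the log (64)."""
    return {"NUMEXPR_MAX_THREADS": str(min(detected_cores, cap))}


# NumExpr reads this at import time, so set it before anything imports numexpr.
# setdefault keeps any value that was already exported.
for key, value in numexpr_thread_env(os.cpu_count() or 1).items():
    os.environ.setdefault(key, value)

print(os.environ["NUMEXPR_MAX_THREADS"])
```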
[root@tyler-a100-newimage-val instructlab]# nohup /root/bin/ilab.sh train --strategy lab-multiphase --phased-phase1-data /var/mnt/inststg1/instructlab/generated/knowledge_train_msgs_2024-08-18T15_57_14.jsonl --phased-phase2-data /var/mnt/inststg1/instructlab/generated/skills_train_msgs_2024-08-18T15_57_14.jsonl --phased-base-dir /var/mnt/inststg1/instructlab/phasedbasedir --phased-phase1-num-epochs 2 --phased-phase2-num-epochs 2 --phased-mt-bench-judge /var/mnt/inststg1/instructlab/models/prometheus-eval/prometheus-8x7b-v2.0/ --max-batch-len 10000 --max-seq-len 4096 --phased-phase1-effective-batch-size 128 --phased-phase2-effective-batch-size 3840 --enable-serving-output --gpus 8 --skip-user-confirm --model-path /var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/ &
[root@tyler-a100-newimage-val instructlab]# cat nohup.out
time="2024-08-18T20:04:24Z" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
You are using an aliased command, this wi
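For reuse, the training invocation can be kept as an argument list rather than one long shell line. A minimal sketch, assuming Python 3.8+ for `shlex.join`. The flags and paths are copied from the command above, while `build_train_argv` and `launch_detached` are hypothetical helpers. `launch_detached` only approximates `nohup ... &`: it appends output to a log file and starts the child in a new session so a terminal hangup does not reach it.

```python
import shlex
import subprocess

ILAB = "/root/bin/ilab.sh"
BASE = "/var/mnt/inststg1/instructlab"

# (flag, value) pairs copied from the command above; None marks a bare switch.
TRAIN_FLAGS = [
    ("--strategy", "lab-multiphase"),
    ("--phased-phase1-data", f"{BASE}/generated/knowledge_train_msgs_2024-08-18T15_57_14.jsonl"),
    ("--phased-phase2-data", f"{BASE}/generated/skills_train_msgs_2024-08-18T15_57_14.jsonl"),
    ("--phased-base-dir", f"{BASE}/phasedbasedir"),
    ("--phased-phase1-num-epochs", "2"),
    ("--phased-phase2-num-epochs", "2"),
    ("--phased-mt-bench-judge", f"{BASE}/models/prometheus-eval/prometheus-8x7b-v2.0/"),
    ("--max-batch-len", "10000"),
    ("--max-seq-len", "4096"),
    ("--phased-phase1-effective-batch-size", "128"),
    ("--phased-phase2-effective-batch-size", "3840"),
    ("--enable-serving-output", None),
    ("--gpus", "8"),
    ("--skip-user-confirm", None),
    ("--model-path", f"{BASE}/models/granite-7b-starter1.1/"),
]


def build_train_argv(flags) -> list:
    """Turn (flag, value) pairs into an argv list for `ilab.sh train`."""
    argv = [ILAB, "train"]
    for flag, value in flags:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    return argv


def launch_detached(argv, log_path="nohup.out"):
    """Rough analogue of `nohup ... &`: append stdout/stderr to log_path and
    start the child in its own session so a terminal hangup does not reach it."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )


print(shlex.join(build_train_argv(TRAIN_FLAGS)))
```

Calling `launch_detached(build_train_argv(TRAIN_FLAGS))` would start the run; as written, the sketch only prints the command.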
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/checkpoints/compositional_skills_extraction_information_named_entities_places/data_checkpoint_34ae9efe032748c294999e937f44b437.jsonl
{"task_description":"","seed_context":"'Brian Patrick Kennedy( born 5 November 1961) is an Irish- born art museum director who has worked in Ireland and Australia, and now lives and works in the United States.\\n\\nHe is currently the director of the Peabody Essex Museum.\\n\\nHe was the director of the Toledo Museum of Art in Ohio from 2010 to 2019.\\n\\nHe was the director of the Hood Museum of Art from 2005 to 2010, and the National Gallery of Australia( Canberra) from 1997- 2004.\\nIan Barry is an Australian director of film and TV.\\nSaltwater is a 2000 Irish drama film written and directed by Conor McPherson.\\n\\nThe film stars Peter McDonald, Brian Cox, Conor Mullen, Laurence Kinlan, Brendan Gleeson and Eva Birthistle.\\n\\nThe film was released on September 29, 2000, by Buena Vista International
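The SDG data checkpoint files are JSON Lines: one JSON object per line. A minimal sketch for checking which top-level keys appear and how often. `summarize_jsonl` is a hypothetical helper; it reads any iterable of lines, so an open file handle works as well as a list.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_jsonl(lines: Iterable[str]) -> Counter:
    """Count how many JSON Lines records contain each top-level key."""
    key_counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        key_counts.update(record.keys())
    return key_counts


# Hypothetical usage against the checkpoint file shown above:
# with open("/var/mnt/inststg1/instructlab/generated/checkpoints/"
#           "compositional_skills_extraction_information_named_entities_places/"
#           "data_checkpoint_34ae9efe032748c294999e937f44b437.jsonl",
#           encoding="utf-8") as fh:
#     print(summarize_jsonl(fh))
```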
relyt0925 / gist:fafbc33e9c8d0d77cdb8f74a3ef27ebe
Created August 18, 2024 19:30
knowledge checkpoint example
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/checkpoints/knowledge_compliance_personally-identifiable-information/data_checkpoint_0b9687e0abdd41f688fd204d84698410.jsonl
{"icl_document":"hii","raw_document":"# What is PII?\n\n## Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.\n\nAs people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.\n\nSharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers, such as serving up more relevant sear
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/knowledge_recipe_2024-08-17T15_42_00.yaml
datasets:
- path: node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_p07.jsonl
  sampling_size: 1.0
metadata:
  sys_prompt: "I am, Red Hat\xAE Instruct Model based on Granite 7B, an AI language\
    \ model developed by Red Hat and IBM Research, based on the Granite-7b-base language\
    \ model. My primary function is to be a chat assistant."
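In the double-quoted `sys_prompt` scalar, `\xAE` is the escape for U+00AE (®), a backslash at the end of a line joins the next line once that line's indentation is stripped, and `\ ` is an escaped space that survives the join. The sketch below applies only those three rules, which is enough to show the single-line string a YAML parser should produce here. It is not a general YAML unescaper, and `fold_yaml_escapes` is a hypothetical name.

```python
import re


def fold_yaml_escapes(raw: str) -> str:
    """Apply only the escapes used in this sys_prompt (not a general YAML parser)."""
    # 1. Backslash + line break: drop both, plus the next line's indentation.
    text = re.sub(r"\\\n[ \t]*", "", raw)
    # 2. "\ " is an escaped space.
    text = text.replace("\\ ", " ")
    # 3. "\xNN" is an 8-bit Unicode escape.
    return re.sub(r"\\x([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


# Contents of the quoted scalar above, without the surrounding quotes.
RAW_SYS_PROMPT = r"""I am, Red Hat\xAE Instruct Model based on Granite 7B, an AI language\
    \ model developed by Red Hat and IBM Research, based on the Granite-7b-base language\
    \ model. My primary function is to be a chat assistant."""

SYS_PROMPT = fold_yaml_escapes(RAW_SYS_PROMPT)
```

The resulting `SYS_PROMPT` is one line that begins "I am, Red Hat® Instruct Model based on Granite 7B, an AI language model developed by".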
[root@tyler-a100 instructlab]# cat /var/mnt/inststg1/instructlab/generated/skills_recipe_2024-08-17T15_42_00.yaml
datasets:
- path: /usr/share/instructlab/sdg/datasets/skills.jsonl
  sampling_size: 1.0
- path: node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_p10.jsonl
  sampling_size: 1.0
- path: node_datasets_2024-08-17T15_42_00/compositional_skills_general_tables_editing_add_remove.jsonl
  sampling_size: 30
- path: node_datasets_2024-08-17T15_42_00/compositional_skills_general_tables_editing_combining_altering.jsonl
  sampling_size: 30
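The skills recipe mixes `sampling_size: 1.0` with `sampling_size: 30`. One plausible reading, assumed here rather than taken from the InstructLab SDG code, is that a float is a fraction of the dataset and an integer is an absolute sample count. A sketch of that interpretation, where `resolve_sample_count` is hypothetical. It does not address a count larger than the dataset, which a real implementation would need to handle somehow.

```python
def resolve_sample_count(sampling_size, dataset_len: int) -> int:
    """Hypothetical interpretation of a recipe's sampling_size:
    a float is a fraction of the dataset, an int is an absolute count."""
    if isinstance(sampling_size, bool) or not isinstance(sampling_size, (int, float)):
        raise TypeError(f"unsupported sampling_size: {sampling_size!r}")
    if isinstance(sampling_size, int):
        return sampling_size
    return int(dataset_len * sampling_size)


# sampling_size values from the skills recipe above, each applied to an
# assumed (hypothetical) dataset of 500 rows.
for size in (1.0, 30):
    print(size, "->", resolve_sample_count(size, 500))
```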