@kevinmelodi
Created September 13, 2023 01:47
[prodigy-container] [2023-09-13 01:45:28] OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[prodigy-container] [2023-09-13 01:45:29] Start labeling: python scripts/3_start_prodigy_UI.py --dataset your_dataset_name --input_file your_input_file_path.jsonl
[prodigy-container] [2023-09-13 01:45:29] Prep new PDFs for labeling, run: python scripts/1_format_PDFs_to_label_format-DocAI.py --input_file customer-PDFs
[prodigy-container] [2023-09-13 01:45:29] Pull most recently formatted labeling data into prodigy, run: python scripts/2_load_data_to_prodigy.py
[prodigy-container] [2023-09-13 01:45:29] Export labels to GCS with python scripts/4_export_prodigy_to_gcs.py dataset_name
[prodigy-container] [2023-09-13 01:45:29] Create contexts with python scripts/5_create_contexts_from_label-OpenAI.py dataset_name.jsonl
[prodigy-container] [2023-09-13 01:45:30] 01:45:30: INIT: Setting all logging levels to 40
[prodigy-container] [2023-09-13 01:45:30] OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[prodigy-container] [2023-09-13 01:45:34] 01:45:34: CLI: limiting user sessions to list: user3
[prodigy-container] [2023-09-13 01:45:34] 01:45:34: RECIPE: Calling recipe 'stats'
[prodigy-container] [2023-09-13 01:45:35]
[prodigy-container] [2023-09-13 01:45:35] ============================== ✨ Prodigy Stats ==============================
[prodigy-container] [2023-09-13 01:45:35]
[prodigy-container] [2023-09-13 01:45:35] Version 1.13.2
[prodigy-container] [2023-09-13 01:45:35] Location /usr/local/lib/python3.9/site-packages/prodigy
[prodigy-container] [2023-09-13 01:45:35] Prodigy Home /root/.prodigy
[prodigy-container] [2023-09-13 01:45:35] Platform Linux-4.4.0-x86_64-with-glibc2.36
[prodigy-container] [2023-09-13 01:45:35] Python Version 3.9.18
[prodigy-container] [2023-09-13 01:45:35] Spacy Version 3.6.1
[prodigy-container] [2023-09-13 01:45:35] Database Name PostgreSQL
[prodigy-container] [2023-09-13 01:45:35] Database Id postgresql
[prodigy-container] [2023-09-13 01:45:35] Total Datasets 1
[prodigy-container] [2023-09-13 01:45:35] Total Sessions 150
[prodigy-container] [2023-09-13 01:45:35]
[prodigy-container] [2023-09-13 01:45:35]
[prodigy-container] [2023-09-13 01:45:35] ================================ ✨ Datasets ================================
[prodigy-container] [2023-09-13 01:45:35]
[prodigy-container] [2023-09-13 01:45:35] dummy-data_ds
[prodigy-container] [2023-09-13 01:45:35]
[prodigy-container] [2023-09-13 01:45:41] The most recent file is customer-PDFs/dummy-data/prodigy-json/paragraph-data-for-prodigy_20230913012432.jsonl and it was last modified on 2023-09-13 01:26:18.360000+00:00
[prodigy-container] [2023-09-13 01:45:41] 2_load_data_to_prodigy.py finished
[prodigy-container] [2023-09-13 01:45:42] 01:45:42: INIT: Setting all logging levels to 40
[prodigy-container] [2023-09-13 01:45:42] OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: CLI: limiting user sessions to list: user3
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: CLI: Importing file scripts/custom_task_template.py
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: RECIPE: Calling recipe 'custom-text-classification'
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: get_stream: Loading .jsonl file
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: get_stream: Rehashing stream
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: get_stream: Removing duplicates
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: VALIDATE: Validating components returned by recipe
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: CONTROLLER: Initialising from recipe
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: VALIDATE: Creating validator for view ID 'blocks'
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: VALIDATE: Validating Prodigy and recipe config
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: FILTER: Filtering duplicates from stream
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: FILTER: Filtering duplicates from stream
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: DB: Creating unstructured dataset '2023-09-13_01-45-46'
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: STREAM: Created queue for dummy-data_ds-user3.
[prodigy-container] [2023-09-13 01:45:46] 01:45:46: CORS: initialized with wildcard "*" CORS origins
[prodigy-container] [2023-09-13 01:46:08] OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[prodigy-container] [2023-09-13 01:46:08] Start labeling: python scripts/3_start_prodigy_UI.py --dataset your_dataset_name --input_file your_input_file_path.jsonl
[prodigy-container] [2023-09-13 01:46:08] Prep new PDFs for labeling, run: python scripts/1_format_PDFs_to_label_format-DocAI.py --input_file customer-PDFs
[prodigy-container] [2023-09-13 01:46:08] Pull most recently formatted labeling data into prodigy, run: python scripts/2_load_data_to_prodigy.py
[prodigy-container] [2023-09-13 01:46:08] Export labels to GCS with python scripts/4_export_prodigy_to_gcs.py dataset_name
[prodigy-container] [2023-09-13 01:46:08] Create contexts with python scripts/5_create_contexts_from_label-OpenAI.py dataset_name.jsonl
[prodigy-container] [2023-09-13 01:46:09] 01:46:09: INIT: Setting all logging levels to 40
[prodigy-container] [2023-09-13 01:46:09] OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[prodigy-container] [2023-09-13 01:46:13] 01:46:13: CLI: limiting user sessions to list: user3
[prodigy-container] [2023-09-13 01:46:13] 01:46:13: RECIPE: Calling recipe 'stats'
[prodigy-container] [2023-09-13 01:46:13]
[prodigy-container] [2023-09-13 01:46:13] ============================== ✨ Prodigy Stats ==============================
[prodigy-container] [2023-09-13 01:46:13]
[prodigy-container] [2023-09-13 01:46:13] Version 1.13.2
[prodigy-container] [2023-09-13 01:46:13] Location /usr/local/lib/python3.9/site-packages/prodigy
[prodigy-container] [2023-09-13 01:46:13] Prodigy Home /root/.prodigy
[prodigy-container] [2023-09-13 01:46:13] Platform Linux-4.4.0-x86_64-with-glibc2.36
[prodigy-container] [2023-09-13 01:46:13] Python Version 3.9.18
[prodigy-container] [2023-09-13 01:46:13] Spacy Version 3.6.1
[prodigy-container] [2023-09-13 01:46:13] Database Name PostgreSQL
[prodigy-container] [2023-09-13 01:46:13] Database Id postgresql
[prodigy-container] [2023-09-13 01:46:13] Total Datasets 1
[prodigy-container] [2023-09-13 01:46:13] Total Sessions 151
[prodigy-container] [2023-09-13 01:46:13]
[prodigy-container] [2023-09-13 01:46:13]
[prodigy-container] [2023-09-13 01:46:13] ================================ ✨ Datasets ================================
[prodigy-container] [2023-09-13 01:46:13]
[prodigy-container] [2023-09-13 01:46:13] dummy-data_ds
[prodigy-container] [2023-09-13 01:46:13]
[prodigy-container] [2023-09-13 01:46:19] The most recent file is customer-PDFs/dummy-data/prodigy-json/paragraph-data-for-prodigy_20230913012432.jsonl and it was last modified on 2023-09-13 01:26:18.360000+00:00
[prodigy-container] [2023-09-13 01:46:19] 2_load_data_to_prodigy.py finished
[prodigy-container] [2023-09-13 01:46:19] 01:46:19: INIT: Setting all logging levels to 40
[prodigy-container] [2023-09-13 01:46:20] OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: CLI: limiting user sessions to list: user3
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: CLI: Importing file scripts/custom_task_template.py
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: RECIPE: Calling recipe 'custom-text-classification'
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: get_stream: Loading .jsonl file
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: get_stream: Rehashing stream
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: get_stream: Removing duplicates
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: VALIDATE: Validating components returned by recipe
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: CONTROLLER: Initialising from recipe
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: VALIDATE: Creating validator for view ID 'blocks'
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: VALIDATE: Validating Prodigy and recipe config
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: FILTER: Filtering duplicates from stream
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: FILTER: Filtering duplicates from stream
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: DB: Creating unstructured dataset '2023-09-13_01-46-23'
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: STREAM: Created queue for dummy-data_ds-user3.
[prodigy-container] [2023-09-13 01:46:23] 01:46:23: CO