@xrsrke
Created February 23, 2024 11:15
s3 streaming logs
02/23/2024 11:12:35 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1051776
02/23/2024 11:12:35 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=5
02/23/2024 11:12:35 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=0
02/23/2024 11:12:35 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=3
02/23/2024 11:12:35 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=40
02/23/2024 11:12:35 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=20
02/23/2024 11:12:35 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=0
02/23/2024 11:12:35 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243925
02/23/2024 11:12:35 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=2
02/23/2024 11:12:35 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=1
02/23/2024 11:12:35 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=0
02/23/2024 11:12:35 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=1
02/23/2024 11:12:35 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=3
02/23/2024 11:12:35 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=0
02/23/2024 11:12:35 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=854568
02/23/2024 11:12:35 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=3
02/23/2024 11:12:35 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=0
02/23/2024 11:12:35 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=7
02/23/2024 11:12:35 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=2
02/23/2024 11:12:36 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=4
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=0
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=2
02/23/2024 11:12:36 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=569712
02/23/2024 11:12:36 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5407409
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=0
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=5
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]: ################# epoch = 0
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]:
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]:
02/23/2024 11:12:36 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=481068
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=70
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=8
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]: ################# epoch = 0
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]:
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]:
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=6
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=50
02/23/2024 11:12:36 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=396408
02/23/2024 11:12:36 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=493386
02/23/2024 11:12:36 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=3
02/23/2024 11:12:36 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=12
02/23/2024 11:12:41 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=4
02/23/2024 11:12:41 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=7
02/23/2024 11:12:41 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=343300
02/23/2024 11:12:41 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=10
02/23/2024 11:12:41 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=4
02/23/2024 11:12:41 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=415426
02/23/2024 11:12:41 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=5
02/23/2024 11:12:41 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=910867
02/23/2024 11:12:46 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=120
02/23/2024 11:12:46 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=3
02/23/2024 11:12:46 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7154691
02/23/2024 11:12:46 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=20
02/23/2024 11:12:46 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5900469
02/23/2024 11:12:46 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=5
02/23/2024 11:12:46 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=9
02/23/2024 11:12:46 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=6
02/23/2024 11:12:46 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=979219
02/23/2024 11:12:46 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=538374
02/23/2024 11:12:51 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=100
02/23/2024 11:12:51 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=11
02/23/2024 11:12:52 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=9
02/23/2024 11:12:52 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1205193
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=454606
02/23/2024 11:12:52 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=17
02/23/2024 11:12:52 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=652813
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243921
02/23/2024 11:12:52 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=7
02/23/2024 11:12:52 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=17
02/23/2024 11:12:52 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7894782
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=90
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]: ################# epoch = 0
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]:
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]:
02/23/2024 11:12:52 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=8
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=15
02/23/2024 11:12:52 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=8
02/23/2024 11:12:52 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=14
02/23/2024 11:12:52 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=1080511
02/23/2024 11:12:57 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=5
02/23/2024 11:12:57 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=8
02/23/2024 11:12:57 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=8209122
02/23/2024 11:12:57 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=6
02/23/2024 11:12:58 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=476650
02/23/2024 11:12:58 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=7
02/23/2024 11:12:58 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=632478
02/23/2024 11:12:58 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=13
02/23/2024 11:12:58 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5407413
02/23/2024 11:12:58 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=416618
02/23/2024 11:12:58 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1051777
02/23/2024 11:12:58 [INFO|DP=3|PP=0|TP=0]: ################# epoch = 0
02/23/2024 11:12:58 [INFO|DP=3|PP=0|TP=0]:
02/23/2024 11:12:58 [INFO|DP=3|PP=0|TP=0]:
02/23/2024 11:12:58 [INFO|DP=0|PP=0|TP=0]: ################# epoch = 1
02/23/2024 11:12:58 [INFO|DP=0|PP=0|TP=0]:
02/23/2024 11:12:58 [INFO|DP=0|PP=0|TP=0]:
################# epoch = 1
02/23/2024 11:12:58 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=20
02/23/2024 11:12:58 [INFO|DP=1|PP=0|TP=0]: ################# epoch = 1
02/23/2024 11:12:58 [INFO|DP=1|PP=0|TP=0]:
02/23/2024 11:12:58 [INFO|DP=1|PP=0|TP=0]:
################# epoch = 1
02/23/2024 11:12:58 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=170
02/23/2024 11:12:58 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243933
02/23/2024 11:12:58 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=23
02/23/2024 11:12:58 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=150
02/23/2024 11:12:58 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=313973
02/23/2024 11:13:03 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=13
02/23/2024 11:13:03 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=25
02/23/2024 11:13:03 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=854569
02/23/2024 11:13:03 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=14
02/23/2024 11:13:03 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=29
02/23/2024 11:13:03 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5900472
02/23/2024 11:13:08 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=10
02/23/2024 11:13:08 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=12
02/23/2024 11:13:08 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=993922
02/23/2024 11:13:08 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=569713
02/23/2024 11:13:08 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=11
02/23/2024 11:13:08 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=9
02/23/2024 11:13:09 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=16
02/23/2024 11:13:09 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=454230
02/23/2024 11:13:09 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=10
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=374603
02/23/2024 11:13:09 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7154698
02/23/2024 11:13:09 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=551239
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243930
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=362544
02/23/2024 11:13:09 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=979220
02/23/2024 11:13:09 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1205194
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=140
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]: ################# epoch = 1
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]:
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]:
################# epoch = 1
02/23/2024 11:13:09 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1329860
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=19
02/23/2024 11:13:09 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=8
02/23/2024 11:13:10 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7894786
02/23/2024 11:13:10 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=24
02/23/2024 11:13:10 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=501216
02/23/2024 11:13:10 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=13
02/23/2024 11:13:10 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=9
02/23/2024 11:13:10 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=11
02/23/2024 11:13:10 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=8209126
02/23/2024 11:13:15 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=1123533
02/23/2024 11:13:15 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=740080
02/23/2024 11:13:15 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5407418
02/23/2024 11:13:15 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1382810
02/23/2024 11:13:15 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=910868
02/23/2024 11:13:15 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=493387
02/23/2024 11:13:15 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5900475
02/23/2024 11:13:15 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=749022
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=807562
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=481069
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: ################# epoch = 1
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]:
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]:
################# epoch = 1
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=200
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: ################# epoch = 2
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]:
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]:
################# epoch = 2
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=220
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=342602
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: ################# epoch = 2
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]:
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]:
################# epoch = 2
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243942
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=27
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=37
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=18
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=20
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1051779
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243939
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=34
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=17
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=396409
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=854570
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=14
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=362545
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=13
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=190
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: ################# epoch = 2
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]:
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]:
################# epoch = 2
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=31
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1205195
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=16
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=32
02/23/2024 11:13:16 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=13
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7154705
02/23/2024 11:13:16 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=12
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=26
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=979221
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=12
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=652814
02/23/2024 11:13:16 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=11
02/23/2024 11:13:16 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=415427
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=15
02/23/2024 11:13:17 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=551240
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5407422
02/23/2024 11:13:17 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7894789
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=740081
02/23/2024 11:13:17 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=720341
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=910869
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=416619
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5900478
02/23/2024 11:13:17 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=608260
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=993923
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=538375
02/23/2024 11:13:17 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=8209129
02/23/2024 11:13:17 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1382811
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243948
02/23/2024 11:13:17 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=1123534
02/23/2024 11:13:17 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=854571
02/23/2024 11:13:22 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243950
02/23/2024 11:13:23 [INFO|DP=3|PP=0|TP=0]: ################# epoch = 2
02/23/2024 11:13:23 [INFO|DP=3|PP=0|TP=0]:
02/23/2024 11:13:23 [INFO|DP=3|PP=0|TP=0]:
################# epoch = 2
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1051780
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]: ################# epoch = 3
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]:
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]:
################# epoch = 3
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=270
02/23/2024 11:13:23 [INFO|DP=0|PP=0|TP=0]: ################# epoch = 3
02/23/2024 11:13:23 [INFO|DP=0|PP=0|TP=0]:
02/23/2024 11:13:23 [INFO|DP=0|PP=0|TP=0]:
################# epoch = 3
02/23/2024 11:13:23 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=250
02/23/2024 11:13:23 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=569714
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=240
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]: ################# epoch = 3
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]:
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]:
################# epoch = 3
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=17
02/23/2024 11:13:23 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=34
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=15
02/23/2024 11:13:23 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=481070
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=37
02/23/2024 11:13:23 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=42
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=33
02/23/2024 11:13:23 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1051781
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=46
02/23/2024 11:13:23 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=22
02/23/2024 11:13:23 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=23
02/23/2024 11:13:23 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7154711
02/23/2024 11:13:23 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=21
02/23/2024 11:13:28 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=16
02/23/2024 11:13:28 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=41
02/23/2024 11:13:28 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1205196
02/23/2024 11:13:28 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=15
02/23/2024 11:13:28 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=14
02/23/2024 11:13:28 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=979222
02/23/2024 11:13:28 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=19
02/23/2024 11:13:28 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-ultrachat/standard/000_stories-ultrachat.ds at index=25
02/23/2024 11:13:28 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=454231
02/23/2024 11:13:28 [INFO|DP=0|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=20
02/23/2024 11:13:28 [INFO|DP=1|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-python/standard/000_amt-python.ds at index=16
02/23/2024 11:13:28 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=7894792
02/23/2024 11:13:28 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5407426
02/23/2024 11:13:28 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=1329861
02/23/2024 11:13:28 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=1080512
02/23/2024 11:13:28 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=343301
02/23/2024 11:13:34 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=8209133
02/23/2024 11:13:34 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/jupyter/standard/000_jupyter.ds at index=910870
02/23/2024 11:13:34 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/stories-openhermes/standard/000_stories-openhermes.ds at index=521173
02/23/2024 11:13:34 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=5900481
02/23/2024 11:13:34 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=454607
02/23/2024 11:13:34 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=807563
02/23/2024 11:13:34 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-khan/standard/000_amt-khan.ds at index=632479
02/23/2024 11:13:34 [INFO|DP=2|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/amt-web/standard/000_amt-web.ds at index=854572
02/23/2024 11:13:34 [INFO|DP=3|PP=0|TP=0]: requesting new stream from self.s3_path=s3://huggingface-llm-datasets/synthetic-data/tokenization_per_source_v1/textbooks/standard/000_textbooks.ds at index=6243959