@nousr · Created September 22, 2022 21:59
Loading intelmpi version 2021.4.0
hosts gpu-st-p4d-24xlarge-340
go 1
/opt/slurm/bin/srun: line 27: [: too many arguments
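(The "[: too many arguments" message above typically comes from an unquoted, multi-word variable inside a [ ... ] test in the srun wrapper script; quoting the expansion, e.g. [ "$hosts" = ... ], prevents the word splitting. The variable name is an assumption here; only line 27 of the wrapper is referenced in the log.)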
cpu-bind=MASK - gpu-st-p4d-24xlarge-340, task 0 0 [40118]: mask 0xffffffffffff set
/usr/lib64/python3.7/runpy.py:125: RuntimeWarning: 'clip_retrieval.clip_inference.worker' found in sys.modules after import of package 'clip_retrieval.clip_inference', but prior to execution of 'clip_retrieval.clip_inference.worker'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
/usr/lib64/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
wandb:
wandb: Run history:
wandb: average_inference_duration_per_sample ▁
wandb: average_read_duration_per_sample ▁
wandb: average_total_duration_per_sample ▁
wandb: average_write_duration_per_sample ▁
wandb: sample_count ▁
wandb: sample_per_sec ▁
wandb: total_job_duration ▁
wandb:
wandb: Run summary:
wandb: average_inference_duration_per_sample 0.0007
wandb: average_read_duration_per_sample 0.00194
wandb: average_total_duration_per_sample 0.00265
wandb: average_write_duration_per_sample 0.0
wandb: sample_count 19118
wandb: sample_per_sec 1809.67095
wandb: total_job_duration 10.56435
wandb:
wandb: Synced glad-aardvark-16: https://wandb.ai/nousr_laion/clip_retrieval/runs/2j8sv9xm
wandb: Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20220922_213855-2j8sv9xm/logs
ERROR: Could not consume arg: s3
Usage: worker.py --input_dataset='['"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000000.tar '-'"'"',' ''"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000001.tar '-'"'"']' --output_folder=s3://s-laion/clip-h-embeddings-test --input_format=webdataset --cache_path=/fsx/nousr/.cache --batch_size=64 --num_prepro_workers=6 --enable_text=True --enable_image=True
For detailed information on this command, run:
worker.py --input_dataset='['"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000000.tar '-'"'"',' ''"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000001.tar '-'"'"']' --output_folder=s3://s-laion/clip-h-embeddings-test --input_format=webdataset --cache_path=/fsx/nousr/.cache --batch_size=64 --num_prepro_workers=6 --enable_text=True --enable_image=True --help
sample_per_sec 1809 ; sample_count 19118 srun: error: gpu-st-p4d-24xlarge-337: task 0: Exited with exit code 2
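The "Could not consume arg: s3" failure above is a shell-quoting problem: the --input_dataset list literal gets split at the spaces inside each "pipe:aws s3 cp ..." entry, so python-fire sees a stray positional argument "s3". A minimal sketch of one way to build the argument as a single shell token, assuming the launcher assembles the worker command in Python (the tars list and build_worker_arg helper are hypothetical, not part of clip-retrieval):

    import shlex

    # Hypothetical shard list, taken from the log above.
    tars = [
        "pipe:aws s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000000.tar -",
        "pipe:aws s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000001.tar -",
    ]

    def build_worker_arg(tars):
        # Render the list literal that fire expects, then quote the WHOLE
        # thing as one shell token so the spaces inside each pipe: command
        # cannot split it into extra arguments.
        list_literal = "[" + ",".join('"%s"' % t for t in tars) + "]"
        return "--input_dataset=" + shlex.quote(list_literal)

    print(build_worker_arg(tars))

Passing the resulting single token through srun keeps the list intact through the extra layer of shell expansion that mangled it here.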
[I debug.cpp:47] [c10d] The debug level is set to INFO.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
There was a problem when trying to write in your cache folder (/home/zion/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
/usr/lib64/python3.7/runpy.py:125: RuntimeWarning: 'clip_retrieval.clip_inference.worker' found in sys.modules after import of package 'clip_retrieval.clip_inference', but prior to execution of 'clip_retrieval.clip_inference.worker'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
[I debug.cpp:47] [c10d] The debug level is set to INFO.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
There was a problem when trying to write in your cache folder (/home/zion/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
/usr/lib64/python3.7/runpy.py:125: RuntimeWarning: 'clip_retrieval.clip_inference.worker' found in sys.modules after import of package 'clip_retrieval.clip_inference', but prior to execution of 'clip_retrieval.clip_inference.worker'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
wandb: Currently logged in as: nousr_laion. Use `wandb login --relogin` to force relogin
[I debug.cpp:47] [c10d] The debug level is set to INFO.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
There was a problem when trying to write in your cache folder (/home/zion/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
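The repeated TRANSFORMERS_CACHE warning means the default Hugging Face cache (/home/zion/.cache/huggingface/hub) is not writable from the compute nodes. One way to follow the warning's advice, sketched under the assumption that a writable shared path such as the FSx mount already used for --cache_path is available (the exact path is illustrative):

    import os

    # Point the Hugging Face cache at a writable shared filesystem
    # BEFORE transformers is imported; the FSx path is illustrative.
    os.environ.setdefault("TRANSFORMERS_CACHE", "/fsx/nousr/.cache/huggingface")

    import transformers  # the cache now resolves under the writable path

Exporting TRANSFORMERS_CACHE in the sbatch script before srun launches the workers would achieve the same thing.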
wandb: wandb version 0.13.3 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.21
wandb: Run data is saved locally in /fsx/nousr/clip-retrieval/wandb/run-20220922_213925-35qnxpnk
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run eager-lake-17
wandb: ⭐️ View project at https://wandb.ai/nousr_laion/clip_retrieval
wandb: 🚀 View run at https://wandb.ai/nousr_laion/clip_retrieval/runs/35qnxpnk
wandb: Waiting for W&B process to finish... (success).
wandb: 0.004 MB of 0.004 MB uploaded (0.000 MB deduped)
wandb:
/usr/lib64/python3.7/runpy.py:125: RuntimeWarning: 'clip_retrieval.clip_inference.worker' found in sys.modules after import of package 'clip_retrieval.clip_inference', but prior to execution of 'clip_retrieval.clip_inference.worker'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
/usr/lib64/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
wandb:
wandb: Run history:
wandb: average_inference_duration_per_sample ▁
wandb: average_read_duration_per_sample ▁
wandb: average_total_duration_per_sample ▁
wandb: average_write_duration_per_sample ▁
wandb: sample_count ▁
wandb: sample_per_sec ▁
wandb: total_job_duration ▁
wandb:
wandb: Run summary:
wandb: average_inference_duration_per_sample 0.0007
wandb: average_read_duration_per_sample 0.00194
wandb: average_total_duration_per_sample 0.00265
wandb: average_write_duration_per_sample 0.0
wandb: sample_count 19118
wandb: sample_per_sec 1854.90355
wandb: total_job_duration 10.30674
wandb:
wandb: Synced eager-lake-17: https://wandb.ai/nousr_laion/clip_retrieval/runs/35qnxpnk
wandb: Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20220922_213925-35qnxpnk/logs
ERROR: Could not consume arg: s3
Usage: worker.py --input_dataset='['"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000000.tar '-'"'"',' ''"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000001.tar '-'"'"']' --output_folder=s3://s-laion/clip-h-embeddings-test --input_format=webdataset --cache_path=/fsx/nousr/.cache --batch_size=64 --num_prepro_workers=6 --enable_text=True --enable_image=True
For detailed information on this command, run:
worker.py --input_dataset='['"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000000.tar '-'"'"',' ''"'"'pipe:aws' s3 cp --quiet s3://s-datasets/laion5b/laion2B-data/000001.tar '-'"'"']' --output_folder=s3://s-laion/clip-h-embeddings-test --input_format=webdataset --cache_path=/fsx/nousr/.cache --batch_size=64 --num_prepro_workers=6 --enable_text=True --enable_image=True --help
sample_per_sec 1854 ; sample_count 19118 srun: error: gpu-st-p4d-24xlarge-340: task 0: Exited with exit code 2