dandelin / convert_pyarrow.py (Wonjae Kim)
Created April 6, 2022 09:09, forked from csarron/convert_pyarrow.py
# pip install pyarrow fire tqdm
"""
crawl images:
pip install img2dataset==1.11.0
img2dataset --url_list cc3m.tsv \
    --output_folder cc3m-img --input_format "tsv" \
    --url_col "url" --caption_col "caption" \
    --output_format files --resize_mode=no \
    --processes_count 10 --thread_count 64 --number_sample_per_shard 2000 \
    --enable_wandb True --save_metadata False
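
Before launching the crawl, it can help to confirm that cc3m.tsv has a header row naming the "url" and "caption" columns that --url_col and --caption_col refer to. The sketch below is a hypothetical, standard-library-only check (missing_tsv_columns is an illustrative name, not an img2dataset API); it assumes the file is tab-separated with the column names on its first line.

```python
import csv
import io
from typing import Iterable, List


def missing_tsv_columns(lines: Iterable[str], required: List[str]) -> List[str]:
    # Hypothetical helper, not part of img2dataset: read only the header row
    # of a tab-separated file and report which required column names are absent.
    reader = csv.reader(lines, dialect="excel-tab")
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in required if name not in header]


# Build a tiny in-memory TSV (header + one row) instead of touching cc3m.tsv.
buf = io.StringIO()
writer = csv.writer(buf, dialect="excel-tab")
writer.writerow(["caption", "url"])
writer.writerow(["a dog on a beach", "http://example.com/dog.jpg"])
buf.seek(0)

print(missing_tsv_columns(buf, ["url", "caption"]))  # prints []
```

Passing a real file object (opened with newline="", as the csv module recommends) in place of buf runs the same check against cc3m.tsv; a non-empty result means the header needs fixing before img2dataset can find the columns.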