@gvanhorn38
Last active July 20, 2018
Download Many Images Using wget and parallel

Assume you have a file urls.txt where each row contains the name of the file to save and the URL to fetch, separated by a space. You can then download the URLs with:
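For concreteness, urls.txt might look like the following; the filenames and URLs here are hypothetical, for illustration only:

```shell
# Create a sample urls.txt: "<output filename> <url>" per line (example values)
cat > urls.txt <<'EOF'
img_0001.jpg http://example.com/images/0001.jpg
img_0002.jpg http://example.com/images/0002.jpg
EOF
```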

# Minimal version: 8 parallel jobs, quiet wget
parallel -j8 --colsep " " "wget -q -O {1} {2}" < urls.txt

# With a progress estimate, timestamping, 1 retry, and a 10 second timeout
parallel --eta -j4 --colsep " " "wget -q -N -t 1 -T 10 -O {1} {2}" < urls.txt

For wget: -q for quiet, -N for timestamping, -t for the number of retries, -T for the timeout limit in seconds, and -O for the output filename. For parallel: --eta prints a progress estimate, -j sets the number of simultaneous jobs, and --colsep " " splits each input line on spaces into {1} (the filename) and {2} (the URL).
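After a large batch download, some fetches may fail and leave empty files behind (wget -O creates the output file before the transfer starts). One way to spot them, sketched here in a scratch directory with hypothetical filenames:

```shell
# Simulate a download directory: one good file, one failed (empty) download
mkdir -p demo_downloads
printf 'fake image bytes' > demo_downloads/img_0001.jpg
: > demo_downloads/img_0002.jpg   # zero-byte file, as left by a failed wget -O

# List zero-byte files so the corresponding URLs can be retried
find demo_downloads -type f -empty
```

Feeding that list back through the parallel command above (after matching the names to rows of urls.txt) re-fetches only the failures.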

See the wget info page for full documentation of these options.

See here for notes on how to prep these images for model training.
