@vdalv
Created April 29, 2018 08:30

Bulk Downloading With The Ability to Specify Individual Filenames

After a few hours of searching, I've finally found a convenient way to download a large number of files in a bulk/multi-threaded/parallel manner, while still being able to specify each saved file's name.

Many thanks to Diego Torres Milano

The input file (dl_data.txt):

bird_4345_543.jpg https://example.com/pictures/5351/image.jpg
bird_4345_544.jpg https://example.com/5352/pictures/image.jpg
bird_12950_3912.jpg https://example.com/6593/pictures/image.jpg
...
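Before kicking off a big batch, it can be worth sanity-checking that every line of the input really has the "filename URL" shape. A quick awk pass (my addition, using a hypothetical dl_data_check.txt with one deliberately malformed line) flags anything that isn't exactly two space-separated fields:

```shell
#!/bin/bash
# Hypothetical sample input with one malformed line (missing its URL).
printf '%s\n' \
  'bird_4345_543.jpg https://example.com/pictures/5351/image.jpg' \
  'bird_4345_544.jpg' > dl_data_check.txt

# Print the line number and content of any line that is not
# exactly two space-separated fields.
awk 'NF != 2 { print NR ": " $0 }' dl_data_check.txt
```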

The bash script:

# Download one file: "$1" is a line of the form "filename URL".
function mywget()
{
    # Split the line on spaces into an array: [0]=filename, [1]=URL.
    IFS=' ' read -a myarray <<< "$1"
    wget -O "Birds/${myarray[0]}" "${myarray[1]}"
}
# Make the function visible to the bash instances xargs spawns.
export -f mywget
# Run up to 5 downloads in parallel, one input line per invocation.
xargs -P 5 -n 1 -I {} bash -c "mywget '{}'" < "dl_data.txt"
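One caution about the `bash -c "mywget '{}'"` pattern: it splices each input line into the command text, which can misbehave if a filename or URL ever contains shell-special characters. A variant that passes the line as a positional argument instead avoids that; the sketch below (my addition) uses echo as a dry-run stand-in for wget so the plumbing can be checked offline before committing to a real batch:

```shell
#!/bin/bash
# Dry-run variant: same parsing as the original script, but it prints
# the wget command it would run instead of downloading.
mywget() {
    IFS=' ' read -r -a myarray <<< "$1"
    echo wget -O "Birds/${myarray[0]}" "${myarray[1]}"
}
export -f mywget

# Hypothetical two-line input file for the dry run.
printf '%s\n' \
  'bird_1.jpg https://example.com/a/image.jpg' \
  'bird_2.jpg https://example.com/b/image.jpg' > dl_data_test.txt

# "_" fills $0; xargs substitutes each input line for {}, which arrives
# inside the function as "$1" -- never interpolated into the command text.
xargs -P 5 -I {} bash -c 'mywget "$1"' _ {} < dl_data_test.txt
```

Once the printed commands look right, swapping `echo wget` back to `wget` runs the real downloads.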

Info:

IFS=' ' read -a myarray <<< "$1" - Splits the line ($1) on the space delimiter, storing the two fields in the array myarray
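The splitting step is easy to try in isolation; here it is on a sample line from the input format above (the -r flag, my addition, keeps any backslashes literal):

```shell
#!/bin/bash
line='bird_4345_543.jpg https://example.com/pictures/5351/image.jpg'

# Split on spaces into an array: -a fills myarray, -r disables
# backslash escape processing.
IFS=' ' read -r -a myarray <<< "$line"

echo "filename: ${myarray[0]}"
echo "url:      ${myarray[1]}"
```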

wget -O "Birds/${myarray[0]}" "${myarray[1]}" - Download statement, with our specified save location and the file's URL
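One caveat worth noting: wget -O writes to an exact path and will fail if the target directory doesn't exist, so the Birds/ directory has to be created before the first download (this mkdir step is my addition, not part of the original script):

```shell
#!/bin/bash
# wget -O "Birds/name.jpg" fails with "No such file or directory"
# unless Birds/ already exists, so create it up front. -p makes this
# a no-op if the directory is already there.
mkdir -p Birds
```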

xargs -P 5 -n 1 -I {} bash -c "mywget '{}'" < "dl_data.txt":

-P 5 - This specifies that we want 5 wget processes, maximum, running simultaneously

-n 1 - Read 1 line of the input file at a time (note that with -I {} in effect, GNU xargs already consumes one line per command, so this flag is effectively redundant there)

-I {} - Replace every occurrence of {} in the command with the current input line

"dl_data.txt" - The input file
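The effect of -P is easy to observe with a toy command standing in for the downloads. With -P 5, the three one-second sleeps below run concurrently, so the whole batch finishes in roughly one second rather than three (the completion order is not guaranteed):

```shell
#!/bin/bash
# Each input line becomes one command; -P 5 lets up to five run at once.
printf '%s\n' one two three |
    xargs -P 5 -I {} bash -c 'sleep 1; echo "done: {}"'
```

Dropping -P 5 (or using -P 1) serializes the commands again, which is a handy way to compare wall-clock times.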
