@Willshaw
Last active April 12, 2016 11:31
Download a load of files from the web and push them to an S3 bucket
#!/bin/bash
#
# DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
# Version 2, December 2004
# Copyright (C) 2004 Sam Hocevar <sam@hocevar.net>
# Everyone is permitted to copy and distribute verbatim or modified
# copies of this license document, and changing it is allowed as long
# as the name is changed.
# DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
# TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
# 0. You just DO WHAT THE FUCK YOU WANT TO.
#
# USAGE
#
# You need the s3cmd CLI tool for this script to work - http://s3tools.org/s3cmd
# make sure it's set up first: type `s3cmd ls` and if you can see your buckets, you're ok
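#
# a rough sketch of that setup (the bucket name in the output is just an example):
#   s3cmd --configure   # prompts for your AWS access key and secret key
#   s3cmd ls            # should print something like: 2016-04-12 11:31  s3://my-bucket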
#
# the "img.dat" file that the while loop reads needs to just contain a list of http:// links to images
# e.g.
# http://example.com/imageOfCat.png
# http://example.com/photoOfASlipper.jpg
# http://example.com/pictureOfCatEnjoyingBeingInASlipper.pdf
#
# Holy Batman's spectacles - there's actually no reason why this has to be limited to images,
# but I've named it now, so I'm not changing it. Should work for any file type
#
# oh actually sure, I've renamed it, that didn't take long at all.
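#
# to run it, pass the destination bucket path as the only argument - the script name
# below is just an example, use whatever you saved this gist as:
#   ./pull_to_s3.sh s3://my-bucket/uploads/
# (keep the trailing slash - the file name gets stuck straight onto the end of the path)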
#
s3path=$1
[[ -z "$s3path" ]] && { echo "Where the hell am I supposed to put your stuff? I need a bucket path man, e.g. s3://my-bucket/uploads/" ; exit 1; }
# read the file line by line (a file of lines like http://place/where/some/images/be/at.png)
while read -r line
do
# reverse the line, take the 1st chunk (delimiter as /, so you get gnp.ta) then re-reverse it to get the file name
file="$(echo "$line" | rev | cut -d'/' -f 1 | rev)"
echo "sorting file: $file"
# download the file, overwriting and saving under the file name from the URL (to stop at.png.1, at.png.2 etc)
wget "$line" -O "$file"
# push to s3
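# e.g. with s3path=s3://my-bucket/uploads/ and file=at.png this runs:
#   s3cmd put at.png s3://my-bucket/uploads/at.png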
s3cmd put "$file" "$s3path$file"
# get rid of the local copy, just to be clean and tidy
echo "clean up and remove $file"
rm "$file"
done < img.dat