@slavakurilyak
Last active April 18, 2024 03:11
Download images stored as URLs from a CSV file

Dealing with an image dataset? Dealing with CSVs instead of JPGs? Use these scripts to download the images whose URLs are stored in a CSV file.

Usage

To download full resolution images, type the following, giving the CSV file name without its .csv extension (the script appends it):

$ python download-images-from-csv.py <csv_filename>

To download thumbnail images, type:

$ python download-thumbnails-from-csv.py <csv_filename>

Examples

$ python download-images-from-csv.py images

Assuming images.csv has the following columns:

  • Image Name (ImageID) in column 1
  • Full Resolution URL (OriginalURL) in column 3

$ python download-thumbnails-from-csv.py images

Assuming images.csv has the following columns:

  • Image Name (ImageID) in column 1
  • Thumbnail URL (Thumbnail300KURL) in column 11
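Because the scripts rely on fixed column positions, a CSV with a different layout needs the indices edited by hand. As a hedged alternative, the sketch below looks columns up by header name with `csv.DictReader`. The column names `ImageID` and `OriginalURL` come from the list above; the helper name `iter_image_urls`, the `Title` column, and the example.com URL are assumptions made for illustration.

```python
import csv
import io


def iter_image_urls(csv_file, id_column="ImageID", url_column="OriginalURL"):
    """Yield (image_id, url) pairs, skipping rows with a missing ID or URL."""
    # DictReader consumes the header row itself, so it is never yielded as data.
    for row in csv.DictReader(csv_file):
        image_id = (row.get(id_column) or "").strip()
        url = (row.get(url_column) or "").strip()
        if image_id and url:
            yield image_id, url


# In-memory sample standing in for images.csv (hypothetical data).
sample = io.StringIO(
    "ImageID,Title,OriginalURL\n"
    "abc123,First,https://example.com/abc123.jpg\n"
    "def456,Second,\n"
)
pairs = list(iter_image_urls(sample))
print(pairs)
```

Passing `url_column="Thumbnail300KURL"` would select the thumbnail column instead.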

Results

Full resolution images are saved to the fullres folder as <ImageID>.jpg.

Thumbnail images are saved to the thumbnails folder as <ImageID>.jpg.
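Note that the scripts write into these folders without creating them, so a download fails if the folder is missing. A minimal Python 3 sketch of building the target path while making sure the folder exists; the helper name `output_path` is hypothetical, and a temporary directory is used here only to keep the example self-contained:

```python
import os
import tempfile


def output_path(folder, image_id):
    """Return <folder>/<image_id>.jpg, creating the folder if needed."""
    # exist_ok=True makes repeated runs safe when the folder is already there.
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, image_id + ".jpg")


with tempfile.TemporaryDirectory() as base:
    path = output_path(os.path.join(base, "fullres"), "abc123")
    folder_exists = os.path.isdir(os.path.dirname(path))
    print(os.path.basename(path), folder_exists)
```

In the scripts, the same call could replace the hand-built `"fullres/" + line[0] + ".jpg"` string.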

Inspired By

## download-images-from-csv.py (Python 2: print statements, urllib.urlretrieve)
## Assuming a csv file has:
## Image Name (ImageID) in column 1 (line[0])
## Full Resolution URL (OriginalURL) in column 3 (line[2])
import sys
import urllib
from csv import reader
import os.path

csv_filename = sys.argv[1]

# The .csv extension is appended here, so pass the name without it.
with open(csv_filename + ".csv", 'r') as csv_file:
    for line in reader(csv_file):
        # Skip images already downloaded on a previous run.
        if os.path.isfile("fullres/" + line[0] + ".jpg"):
            print "Image skipped for {0}".format(line[0])
        else:
            # Ignore the header row and rows without a URL.
            if line[2] != '' and line[0] != "ImageID":
                urllib.urlretrieve(line[2], "fullres/" + line[0] + ".jpg")
                print "Image saved for {0}".format(line[0])
            else:
                print "No result for {0}".format(line[0])
## download-thumbnails-from-csv.py (Python 2: print statements, urllib.urlretrieve)
## Assuming a csv file has:
## Image Name (ImageID) in column 1 (line[0])
## Thumbnail URL (Thumbnail300KURL) in column 11 (line[10])
import sys
import urllib
from csv import reader
import os.path

csv_filename = sys.argv[1]

# The .csv extension is appended here, so pass the name without it.
with open(csv_filename + ".csv", 'r') as csv_file:
    for line in reader(csv_file):
        # Skip thumbnails already downloaded on a previous run.
        if os.path.isfile("thumbnails/" + line[0] + ".jpg"):
            print "Image skipped for {0}".format(line[0])
        else:
            # Ignore the header row and rows without a URL.
            if line[10] != '' and line[0] != "ImageID":
                urllib.urlretrieve(line[10], "thumbnails/" + line[0] + ".jpg")
                print "Image saved for {0}".format(line[0])
            else:
                print "No result for {0}".format(line[0])
@polkunus

Traceback (most recent call last):
  File "download-images-from-csv.py", line 18, in <module>
    urllib.urlretrieve(line[2], "fullres/" + line[0] + ".jpg")
  File "/usr/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 248, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 216, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 472, in open_file
    return self.open_local_file(url)
  File "/usr/lib/python2.7/urllib.py", line 486, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'Handle ID'
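
The traceback shows urllib falling through to open_local_file with 'Handle ID' as the path. That suggests a non-URL cell, possibly a header from a CSV with a different layout, was read from column 3. A minimal Python 3 sketch of a guard that could run before each download; the helper name `is_http_url` and the example.com URL are illustrative and not part of the gist:

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True only for values that look like http(s) URLs."""
    parts = urlparse(value.strip())
    # A header cell such as "Handle ID" has no scheme and no host.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_http_url("Handle ID"))
print(is_http_url("https://example.com/a.jpg"))
```

Rows that fail the check can be reported with the existing "No result for" message instead of being passed to the downloader.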

@mritterhoff

Thanks for this! I got it working for python3 but had to change instances of urllib to urllib.request, fwiw.

@jordanboston

Hi, I set this up, and it runs with python3, but I never get any files. 😃
Any thoughts as to what I'm missing? The printed output shows it having worked, but no files end up in any dir here.

## Image (URL) in column 1 (line[0])

import sys
import urllib.request
from csv import reader
import os

csv_filename = sys.argv[1]

with open(csv_filename+".csv".format(csv_filename), 'r') as csv_file:
    os.mkdir(csv_filename + '/')
    os.chdir(csv_filename)
    for line in reader(csv_file):
        if line[0] != "URL" and line[1] != 'Place':
            print('Attempting to download: ' + line[0])
            urllib.request.urlretrieve(line[0])
            print('Image saved for '+ line[0])
        else:
            print("No result for " + line[0])
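
A likely cause of the missing files in the snippet above: urllib.request.urlretrieve is called with only the URL. Without a second argument it saves to a temporary file with a generated name and returns that path, so nothing lands in the current directory. A hedged sketch of deriving a local file name from the URL; the helper name `filename_from_url`, the `download.jpg` fallback, and the example.com URLs are assumptions:

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="download.jpg"):
    """Use the last path segment of the URL as the local file name."""
    # urlparse drops the query string; unquote turns %20 back into a space.
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default


print(filename_from_url("https://example.com/photos/cat%20one.jpg?size=large"))
print(filename_from_url("https://example.com/"))
```

The download call would then become `urllib.request.urlretrieve(line[0], filename_from_url(line[0]))`. URLs that share a last path segment would overwrite each other under this scheme.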

@freunddd

freunddd commented Jul 1, 2022

For python3 use:

import sys
from csv import reader
import os.path
import urllib.request

csv_filename = sys.argv[1]

with open(csv_filename + ".csv", 'r') as csv_file:
    for line in reader(csv_file):
        if os.path.isfile("fullres/" + line[0] + ".jpg"):
            print("Image skipped for {0}".format(line[0]))
        else:
            if line[2] != '' and line[0] != "ImageID":
                urllib.request.urlretrieve(line[2], "fullres/" + line[0] + ".jpg")
                print("Image saved for {0}".format(line[0]))
            else:
                print("No result for {0}".format(line[0]))

@jordanboston

Cool, thanks
