Grab image URLs from Google

Execute the following JavaScript snippet in the console on a Google Images results page and you'll get the image URLs in a file named urls.txt. Note that the selectors below target Google's older results markup (.rg_di / .rg_meta), so they may need updating if Google has changed the page structure:

// pull jQuery into the JavaScript console
var script = document.createElement('script');
script.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";
document.getElementsByTagName('head')[0].appendChild(script);

// the script loads asynchronously, so give it a moment to load
// before running the lines below

// grab the original-image URL ("ou") from each result's metadata blob
var urls = $('.rg_di .rg_meta').map(function() {
  return JSON.parse($(this).text()).ou;
});

// write the URLs to a file (one per line) by clicking a hidden
// download link; encodeURIComponent keeps characters like '#' and
// '&' in the URLs from breaking the data URI
var textToSave = urls.toArray().join('\n');
var hiddenElement = document.createElement('a');
hiddenElement.href = 'data:text/plain;charset=utf-8,' + encodeURIComponent(textToSave);
hiddenElement.target = '_blank';
hiddenElement.download = 'urls.txt';
document.body.appendChild(hiddenElement); // some browsers require the link to be in the DOM
hiddenElement.click();

This Python script can then download each of the individual images:

# download_images.py
# import the necessary packages
import argparse
import os

import requests

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-u", "--urls", required=True, help="path to file containing image URLs")
ap.add_argument("-o", "--output", required=True, help="path to output directory of images")
args = vars(ap.parse_args())

# grab the list of URLs from the input file, then initialize the
# total number of images downloaded thus far
rows = open(args["urls"]).read().strip().split("\n")
total = 0

# loop over the URLs
for url in rows:
    try:
        # try to download the image
        r = requests.get(url, timeout=60)

        # save the image to disk
        p = os.path.sep.join([args["output"], "{}.jpg".format(str(total).zfill(8))])
        with open(p, "wb") as f:
            f.write(r.content)

        # update the counter
        print("[INFO] downloaded: {} -- {}".format(p, url))
        total += 1

    # handle if any exceptions are thrown during the download process
    except Exception:
        print("[INFO] error downloading {}...skipping".format(url))
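Some of the downloads will inevitably be HTML error pages or truncated files rather than valid images. A cleanup pass like the minimal sketch below, which assumes OpenCV (cv2) and imutils are installed and that the downloads live in the directory you passed as --output, tries to load each file and deletes anything that can't be read as an image:

# prune_images.py -- a cleanup sketch, not part of the original script
from imutils import paths
import argparse
import cv2
import os

ap = argparse.ArgumentParser()
ap.add_argument("-o", "--output", required=True, help="path to directory of downloaded images")
args = vars(ap.parse_args())

# loop over the downloaded image paths
for imagePath in paths.list_images(args["output"]):
    # assume the file is fine until proven otherwise
    delete = False

    try:
        # cv2.imread returns None (rather than raising) when the
        # file on disk is not a readable image
        image = cv2.imread(imagePath)
        if image is None:
            delete = True
    except Exception:
        # a badly corrupt file can still make OpenCV throw; treat it the same
        delete = True

    if delete:
        print("[INFO] deleting {}".format(imagePath))
        os.remove(imagePath)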