

@windsting
Last active June 29, 2020 06:52
Grab image URLs from Google Images

Execute this JavaScript snippet on a Google Images search results page and you'll get the URLs in a file named urls.txt:

// pull down jQuery into the JavaScript console
var script = document.createElement('script');
script.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";
document.getElementsByTagName('head')[0].appendChild(script);

// grab the URLs
var urls = $('.rg_di .rg_meta').map(function(){return JSON.parse($(this).text()).ou;});

// write the URLs to file (one per line)
var textToSave = urls.toArray().join('\n');
var hiddenElement = document.createElement('a');
hiddenElement.href = 'data:attachment/text,' + encodeURI(textToSave);
hiddenElement.target = '_blank';
hiddenElement.download = 'urls.txt';
hiddenElement.click();
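
Some browsers block script-initiated downloads from data: URIs, so the last block above may silently do nothing. If that happens, a Blob URL is a drop-in alternative for the save step; this is just a sketch that reuses the textToSave string built above:

// alternative save step: use a Blob URL instead of a data: URI
var blob = new Blob([textToSave], { type: 'text/plain' });
var link = document.createElement('a');
link.href = URL.createObjectURL(blob);
link.download = 'urls.txt';
document.body.appendChild(link);
link.click();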

This Python script can help you download each of the individual images:

# download_images.py
# import the necessary packages
from imutils import paths
import argparse
import requests
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-u", "--urls", required=True, help="path to file containg image URLs")
ap.add_argument("-o", "--output", required=True, help="path to ooutput directory of images")
args = vars(ap.parse_args())

# grab the list of URLs from the input file, then initialize the
# total number of images downloaded thus far
rows = open(args["urls"]).read().strip().split("\n")
total = 0

# loop the URLs
for url in rows:
    try:
        # try to download the image
        r = requests.get(url, timeout=60)
        
        # save the image to disk
        p = os.path.sep.join([args["output"], "{}.jpg".format(str(total).zfill(8))])
        with open(p, "wb") as f:
            f.write(r.content)
        
        # update the counter and report progress
        print("[INFO] downloaded: {} -- {}".format(p, url))
        total += 1
    
    # handle if any exceptions are thrown during the download process
    except Exception:
        print("[INFO] error downloading {}...skipping".format(url))
@lisawebcoder

hello
what do you mean google image result page?
also are the links clickable?

Lisa

@windsting (Author)

> hello
> what do you mean google image result page?
> also are the links clickable?
>
> Lisa

Sorry for the late reply. The code above is obsolete now, because Google changed their webpage.
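
If you just need something that still works, one selector-independent fallback is to collect the src of every <img> element on the results page. This is only a rough sketch, not a replacement for the original snippet: it captures the thumbnail URLs Google actually renders, not the full-resolution originals the old .rg_meta data exposed, and it needs no jQuery:

// fallback sketch: collect every <img> src on the page, skipping inline data: URIs
var urls = Array.from(document.images)
    .map(function (img) { return img.src; })
    .filter(function (src) { return src && src.indexOf('data:') !== 0; });

// save one URL per line via a Blob URL
var blob = new Blob([urls.join('\n')], { type: 'text/plain' });
var link = document.createElement('a');
link.href = URL.createObjectURL(blob);
link.download = 'urls.txt';
document.body.appendChild(link);
link.click();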
