@ParkerSaint
Created January 27, 2019 03:08
URL Scraper | I use this to download images from sites that have a "random image" feature but whose image directory returns 403 Forbidden. The array of image filenames is usually easy to find in the site's JS file.
import os
import shutil

import requests

# Collect the filename array, the base URL, and the output directory.
names = input('enter array without brackets (example: "item1","item2","item3"): ')
root = input('enter root url (example: https://example.com/example): ')
path = input('enter the path of the directory to save the files in (example: /path/to/file/): ')
names = names.replace('"', '').split(",")

# Record the run parameters for later reference.
with open("log.txt", "a+") as log:
    log.write("\n\nroot = " + root + "\r\n" + "list = " + str(names) + "\r\n" + "path = " + path + "\r\n")

# Download each file, sending a browser-like User-Agent to avoid simple bot blocks.
for name in names:
    url = root + name
    print(url)
    r = requests.get(url, stream=True, headers={'User-Agent': 'Mozilla/5.0'})
    if r.status_code == 200:
        with open(os.path.join(path, name), 'wb') as f:
            r.raw.decode_content = True  # decompress gzip/deflate transfer encoding
            shutil.copyfileobj(r.raw, f)
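
For example, suppose a site's JS file contains a hypothetical array such as var images = ["img001.jpg","img002.jpg","img003.jpg"] and the images are served from https://example.com/img/ (both values are invented for illustration). A run would then look like the transcript below; the output directory must already exist, since the script does not create it.

enter array without brackets (example: "item1","item2","item3"): "img001.jpg","img002.jpg","img003.jpg"
enter root url (example: https://example.com/example): https://example.com/img/
enter the path of the directory to save the files in (example: /path/to/file/): ./downloads/
https://example.com/img/img001.jpg
https://example.com/img/img002.jpg
https://example.com/img/img003.jpg

Each printed line is a URL the script is fetching; any file that returns 200 is written into ./downloads/ under its original name.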