@philshem
Created April 16, 2019 16:27
Download top500 favicons from csv
import requests
import pandas as pd
import os
from io import StringIO
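def request_function(domain):
    # Strip slashes so the domain is safe to use as a file name
    domain = domain.replace('/', '')
    url = 'https://www.google.com/s2/favicons?domain=' + domain
    fav = requests.get(url).content
    # Create the output directory if needed; open() fails when it is missing
    os.makedirs('images', exist_ok=True)
    with open(os.path.join('images', domain + '.png'), 'wb') as handler:
        handler.write(fav)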
# Top 500 websites from Moz: https://moz.com/top500
url = "https://moz.com:443/top500/domains/csv"
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"}
req = requests.get(url, headers=headers)
data = StringIO(req.text)
df = pd.read_csv(data)
df.URL.apply(request_function)
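
The download step above builds a favicon URL and a file path from each domain, then writes the response bytes to disk. A minimal sketch of that step, split so the URL and path logic can be checked without network access. The names `favicon_target`, `save_favicon`, and the `fetch` parameter are hypothetical and not part of the original script; `fetch` is assumed to be any callable that takes a URL and returns bytes.

```python
import os
from urllib.parse import urlencode

# Same Google favicon endpoint the script uses
FAVICON_ENDPOINT = "https://www.google.com/s2/favicons"


def favicon_target(domain, out_dir="images"):
    """Return (url, path) for a domain without touching the network."""
    # Drop surrounding whitespace and slashes so the name is file-safe
    clean = domain.strip().replace("/", "")
    url = FAVICON_ENDPOINT + "?" + urlencode({"domain": clean})
    path = os.path.join(out_dir, clean + ".png")
    return url, path


def save_favicon(domain, fetch, out_dir="images"):
    """Download one favicon via fetch(url) -> bytes and write it to disk."""
    url, path = favicon_target(domain, out_dir)
    os.makedirs(out_dir, exist_ok=True)  # create the folder on first use
    with open(path, "wb") as handler:
        handler.write(fetch(url))
    return path
```

With `requests` installed, one possible call is `save_favicon("example.com", lambda u: requests.get(u, timeout=10).content)`. Passing the fetcher in keeps the file handling testable and makes it easy to add a timeout or a status check in one place.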