@philshem philshem

View play_.ipynb
@philshem
philshem / scrape_bee.py
Last active Nov 4, 2019
NYTimes Spelling Bee scraper 🐝☠️
View scrape_bee.py
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup
import json

def main():
    # the answers are stored as a JSON object inside the page source
    url = 'https://www.nytimes.com/puzzles/spelling-bee'
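The preview stops before the extraction step. As a minimal sketch of pulling a JSON object out of page source, the snippet below uses only the standard library. The marker string `window.gameData = ` and the keys `today` and `answers` are assumptions for illustration, not taken from the gist; check the actual page source before relying on them.

```python
import json

# Assumption: the page assigns the puzzle data to a JavaScript variable.
# The marker string below is hypothetical; inspect the page source to confirm it.
MARKER = 'window.gameData = '

def extract_embedded_json(html, marker=MARKER):
    """Return the first JSON value that follows `marker` in `html`."""
    start = html.find(marker)
    if start == -1:
        raise ValueError('marker not found in page source')
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever trails it (for example the closing </script> tag).
    obj, _end = json.JSONDecoder().raw_decode(html, start + len(marker))
    return obj

# Hypothetical page fragment standing in for the downloaded HTML.
sample = '<script>window.gameData = {"today": {"answers": ["bee", "beet"]}}</script>'
data = extract_embedded_json(sample)
print(data['today']['answers'])
```

With a live page, `html` would be the text of the `requests.get(url)` response. Using `raw_decode` avoids having to find where the JSON object ends.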
@philshem
philshem / swiss_housing_dataviz.ipynb
Created Nov 3, 2019
swiss_housing_dataviz.ipynb
View swiss_housing_dataviz.ipynb
@philshem
philshem / get_nobel_prize_years_until_death.py
Last active Oct 8, 2019
Nobel prize winners and their years until death.
View get_nobel_prize_years_until_death.py
#!/usr/bin/python3
# gets demographics for nobel prize winners
# calculates yearly average of how many years between prize and death
import pandas as pd
import numpy as np
# API endpoint for all Nobel winners: https://nobelprize.readme.io/
url = 'http://api.nobelprize.org/v1/laureate.csv'
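The calculation described in the comments (years between prize and death, averaged per prize year) can be sketched as follows. This version uses the standard library instead of pandas so it runs without extra installs. The column names `year` and `died`, the `0000-00-00` placeholder for living laureates, and the sample rows are all assumptions for illustration; verify them against the real CSV.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows in the assumed column layout (year = prize year,
# died = date of death as YYYY-MM-DD, 0000-00-00 if still alive).
SAMPLE_CSV = """surname,year,died
A,1950,1960-05-01
B,1950,1970-01-15
C,1951,0000-00-00
D,1951,1956-12-31
"""

def mean_years_until_death(csv_text):
    """Map each prize year to the mean number of years until death."""
    gaps = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        death_year = int(row['died'][:4])
        if death_year == 0:  # no recorded death date: skip this laureate
            continue
        prize_year = int(row['year'])
        gaps[prize_year].append(death_year - prize_year)
    # One averaged value per prize year, in chronological order.
    return {year: mean(values) for year, values in sorted(gaps.items())}

print(mean_years_until_death(SAMPLE_CSV))
```

Recent prize years are biased toward short gaps, because only laureates who have already died are counted.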
@philshem
philshem / cadima_clean_metadata.py
Last active May 21, 2019
Python3 script to clean non-ascii characters from the PDF "Title" metadata field.
View cadima_clean_metadata.py
# requires python3.x and one non-standard module `pip install pdfrw`
# pdfs should be in folder relative to this code, named `pdfs`
import os
from pdfrw import PdfReader, PdfWriter
from glob import glob
import unicodedata
def edit_title_metadata(inpdf):
View Anadon-2011-Scientific Opinion on the safety e.pdf
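The body of `edit_title_metadata` is not shown in the preview. One common way to reduce a string to ASCII with `unicodedata` is sketched below; whether the gist does exactly this is an assumption, and reading or writing the title through `pdfrw` is left out on purpose.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented characters, then drop anything outside ASCII."""
    # NFKD splits a character such as 'ó' into 'o' plus a combining accent.
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding with errors='ignore' discards the accents and any other
    # character that has no ASCII equivalent.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Anadón 2011: Scientific Opinion'))  # prints "Anadon 2011: Scientific Opinion"
```

Characters with no decomposition (for example currency symbols) are removed outright, not transliterated, so titles in non-Latin scripts would come out empty.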
@philshem
philshem / get_top500_favicons.py
Created Apr 16, 2019
Download top500 favicons from csv
View get_top500_favicons.py
import requests
import pandas as pd
import os
from io import StringIO
def request_function(domain):
    domain = domain.replace('/','')
    url = 'https://www.google.com/s2/favicons?domain=' + domain
    fav = requests.get(url).content
    with open('images'+os.sep+domain+'.png', 'wb') as handler:
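The preview cuts off inside the `with` block. A self-contained sketch of the same download step is given below, under two assumptions: the remaining step is writing the fetched bytes to the file, and the output folder may not exist yet. The helper names `favicon_url` and `save_favicon` and the injectable `fetch` callable are hypothetical, introduced so the logic can be exercised without network access.

```python
import os
from urllib.parse import urlencode

FAVICON_ENDPOINT = 'https://www.google.com/s2/favicons'

def favicon_url(domain):
    """Build the request URL, percent-encoding the domain."""
    return FAVICON_ENDPOINT + '?' + urlencode({'domain': domain.replace('/', '')})

def save_favicon(domain, fetch, out_dir='images'):
    """Fetch one favicon via `fetch(url) -> bytes` and write it to disk."""
    domain = domain.replace('/', '')
    os.makedirs(out_dir, exist_ok=True)  # open() fails if the folder is missing
    path = os.path.join(out_dir, domain + '.png')
    with open(path, 'wb') as handler:
        handler.write(fetch(favicon_url(domain)))
    return path

print(favicon_url('example.com/'))
```

For a real download, `fetch` could be `lambda url: requests.get(url, timeout=10).content`; passing it in keeps the file handling separate from the HTTP call.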
@philshem
philshem / clean_AcronymFile.csv
Last active Apr 13, 2019
cleanup script and csv file (needs some cleaning) based on https://github.com/krishnakt031990/Crawl-Wiki-For-Acronyms
View clean_AcronymFile.csv
acronym  definition
0D       Zero-dimensional
1AM      Air mechanic 1st class
1D       One-dimensional
2AM      Air mechanic 2nd class
2D       Two-dimensional
2G       Second-generation mobile (cellular, wireless) telephone system
2LA      Two letter acronym
2Lt      2nd lieutenant
3AM      Air mechanic 3rd class
View clean_AcronymsCSV.py
#!/usr/bin/env python
# coding=utf-8
ignore_list = ('Search-Navigation','Tools-What links','Top-','Contents','Magyar')
with open('AcronymsFile.csv','r') as inp:
    data = inp.read().split('\n')
with open('clean_AcronymsFile.csv','w') as out:
    out.write('acronym'+'\t'+'definition'+'\n')
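The filtering loop itself is not visible in the preview. The sketch below shows one way the ignore list could be applied, assuming that residue lines start with one of the listed prefixes and that each raw line has the form `acronym - definition`. Both the prefix matching and the `' - '` separator are guesses for illustration, as are the function name `clean_rows` and the sample lines.

```python
IGNORE_PREFIXES = ('Search-Navigation', 'Tools-What links', 'Top-', 'Contents', 'Magyar')

def clean_rows(lines, sep=' - ', ignore=IGNORE_PREFIXES):
    """Yield (acronym, definition) pairs, skipping navigation residue."""
    for line in lines:
        line = line.strip()
        # str.startswith accepts a tuple and matches any of its entries.
        if not line or line.startswith(ignore):
            continue
        acronym, found, definition = line.partition(sep)
        if found:  # keep only lines that actually contain the separator
            yield acronym.strip(), definition.strip()

# Hypothetical raw lines in the assumed input format.
raw = ['Contents', '0D - Zero-dimensional', '', 'Top- 0-9 A B C', '2LA - Two letter acronym']
for acronym, definition in clean_rows(raw):
    print(acronym + '\t' + definition)
```

Splitting on the first separator only keeps hyphens inside the definition, as in "Zero-dimensional".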
@philshem
philshem / follower_count_201904.csv
Last active Apr 7, 2019
Twitter follower counts for Swiss media (04.2019)
View follower_count_201904.csv
handle          count
@20min          394346
@nzz            393044
@blickch        249295
@blickamabend   183293
@tagesanzeiger  178033
@watson_news    118861
@Lematinch      102817
@tdgch          88920
@24heuresch     66644