@philshem
philshem / cadima_clean_metadata.py
Last active May 21, 2019
Python3 script to clean non-ascii characters from the PDF "Title" metadata field.
View cadima_clean_metadata.py
# requires python3.x and one non-standard module `pip install pdfrw`
# pdfs should be in folder relative to this code, named `pdfs`
import os
from pdfrw import PdfReader, PdfWriter
from glob import glob
import unicodedata
def edit_title_metadata(inpdf):
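The preview stops at the function signature, so the cleaning step itself is not visible. A minimal sketch of what the description calls cleaning non-ASCII characters from a title, assuming NFKD normalization followed by dropping whatever still falls outside ASCII. The helper name `to_ascii` is hypothetical and not taken from the gist, and reading or writing the PDF with pdfrw is left out here.

```python
import unicodedata


def to_ascii(title):
    """Return `title` with accents decomposed and remaining non-ASCII characters dropped.

    Hypothetical helper: the gist's own implementation is not shown in the preview.
    """
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', title)
    # Encoding with errors='ignore' discards anything that is not plain ASCII.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


if __name__ == '__main__':
    print(to_ascii('Anadón: Scientific Opinion'))
```

Characters with no ASCII decomposition, such as an en dash, are removed outright rather than replaced, which may or may not match what the original script does.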
View Anadon-2011-Scientific Opinion on the safety e.pdf
@philshem
philshem / get_top500_favicons.py
Created Apr 16, 2019
Download top500 favicons from csv
View get_top500_favicons.py
import requests
import pandas as pd
import os
from io import StringIO
def request_function(domain):
    domain = domain.replace('/','')
    url = 'https://www.google.com/s2/favicons?domain=' + domain
    fav = requests.get(url).content
    with open('images'+os.sep+domain+'.png', 'wb') as handler:
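The preview ends at the `with open(...)` line, so how the downloaded bytes are written is not shown. A hedged sketch of one way to finish the step, assuming the response body is written straight to the file. The names `favicon_target` and `save_favicon`, the `out_dir` parameter, the timeout, and the status check are illustrative additions, not part of the gist.

```python
import os

FAVICON_ENDPOINT = 'https://www.google.com/s2/favicons?domain='


def favicon_target(domain, out_dir='images'):
    """Return (url, local_path) for a domain. Hypothetical helper."""
    # Same normalization as the gist: drop slashes from the domain string.
    domain = domain.replace('/', '')
    return FAVICON_ENDPOINT + domain, os.path.join(out_dir, domain + '.png')


def save_favicon(domain, out_dir='images'):
    """Download one favicon and write it to disk (assumed behavior)."""
    import requests  # third-party; imported here so favicon_target works without it

    url, path = favicon_target(domain, out_dir)
    os.makedirs(out_dir, exist_ok=True)  # assumption: create the folder if missing
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # assumption: fail loudly on HTTP errors
    with open(path, 'wb') as handler:
        handler.write(response.content)
    return path
```

Keeping the URL and path construction in its own function makes that part checkable without any network access.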
@philshem
philshem / clean_AcronymFile.csv
Last active Apr 13, 2019
cleanup script and csv file (needs some cleaning) based on https://github.com/krishnakt031990/Crawl-Wiki-For-Acronyms
View clean_AcronymFile.csv
acronym definition
0D Zero-dimensional
1AM Air mechanic 1st class
1D One-dimensional
2AM Air mechanic 2nd class
2D Two-dimensional
2G Second-generation mobile (cellular, wireless) telephone system
2LA Two letter acronym
2Lt 2nd lieutenant
3AM Air mechanic 3rd class
View clean_AcronymsCSV.py
#!/usr/bin/env python
# coding=utf-8
ignore_list = ('Search-Navigation','Tools-What links','Top-','Contents','Magyar')
with open('AcronymsFile.csv','r') as inp:
    data = inp.read().split('\n')
with open('clean_AcronymsFile.csv','w') as out:
    out.write('acronym'+'\t'+'definition'+'\n')
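The preview does not show how `ignore_list` is applied or how each input line is split. A minimal sketch under two loudly stated assumptions: lines that start with any entry of `ignore_list` are skipped (a tuple is what `str.startswith` accepts), and the acronym is separated from its definition by the first `-`. The function name `clean_rows` and the `sep` parameter are hypothetical.

```python
ignore_list = ('Search-Navigation', 'Tools-What links', 'Top-', 'Contents', 'Magyar')


def clean_rows(lines, sep='-'):
    """Return (acronym, definition) pairs, skipping blank and ignored lines.

    Assumptions, not confirmed by the gist: ignore_list is matched as a
    line prefix, and `sep` is the delimiter used in AcronymsFile.csv.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(ignore_list):
            continue
        # partition splits on the first separator only, so hyphens
        # inside the definition are preserved.
        acronym, found, definition = line.partition(sep)
        if not found:
            continue  # no separator: not an acronym row
        rows.append((acronym.strip(), definition.strip()))
    return rows


if __name__ == '__main__':
    sample = ['Contents', '0D-Zero-dimensional', 'Top-A B C', '', '2Lt-2nd lieutenant']
    for acronym, definition in clean_rows(sample):
        print(acronym + '\t' + definition)
```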
@philshem
philshem / follower_count_201904.csv
Last active Apr 7, 2019
twitter follower counts for swiss media (04.2019)
View follower_count_201904.csv
handle count
@20min 394346
@nzz 393044
@blickch 249295
@blickamabend 183293
@tagesanzeiger 178033
@watson_news 118861
@Lematinch 102817
@tdgch 88920
@24heuresch 66644
@philshem
philshem / stackexchange_tag_usage.csv
Created Mar 28, 2019
tag counts for all stackexchange network sites
View stackexchange_tag_usage.csv
url,tagname,tagcount
"http://3dprinting.StackExchange.com/tags/101-hero|3dprinting","101-hero","1"
"http://3dprinting.StackExchange.com/tags/123d-catch|3dprinting","123d-catch","2"
"http://3dprinting.StackExchange.com/tags/2d|3dprinting","2d","4"
"http://3dprinting.StackExchange.com/tags/3d-design|3dprinting","3d-design","131"
"http://3dprinting.StackExchange.com/tags/3d-models|3dprinting","3d-models","152"
"http://3dprinting.StackExchange.com/tags/3d-pen|3dprinting","3d-pen","1"
"http://3dprinting.StackExchange.com/tags/3d-printerworks|3dprinting","3d-printerworks","1"
"http://3dprinting.StackExchange.com/tags/3dtouch|3dprinting","3dtouch","2"
@philshem
philshem / upwork_skill_tests.csv
Last active Oct 4, 2018
Summary of Upwork Skill Tests (collected 2018-10-04 from https://www.upwork.com/ab/tests/)
View upwork_skill_tests.csv
Category | Title | Qualified Freelancers | Tests Taken | Success Ratio
English Language | English Spelling Test (U.S. Version) | 901778 | 1394513 | 0.647
Office Skills | Office Skills Test | 242375 | 416129 | 0.582
Computer Skills | Windows XP Test | 158157 | 294835 | 0.536
Upwork | Upwork Readiness Test | 131245 | 258871 | 0.507
Web Development | HTML5 Test | 105085 | 222286 | 0.473
English Language | English Spelling Test (UK Version) | 134264 | 207299 | 0.648
Office Skills | Email Etiquette Certification | 135632 | 194920 | 0.696
Web Development | CSS Test | 79855 | 156032 | 0.512
Web Development | PHP Test | 73244 | 149854 | 0.489
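The rows above are consistent with Success Ratio being Qualified Freelancers divided by Tests Taken, rounded to three decimals. That relationship is inferred from the numbers rather than stated in the gist; a short check on three of the rows, with `success_ratio` as a hypothetical helper name:

```python
def success_ratio(qualified, tests_taken):
    """Qualified freelancers as a fraction of tests taken, to three decimals."""
    return round(qualified / tests_taken, 3)


# (title, qualified freelancers, tests taken, ratio listed in the CSV)
rows = [
    ('English Spelling Test (U.S. Version)', 901778, 1394513, 0.647),
    ('Office Skills Test', 242375, 416129, 0.582),
    ('Email Etiquette Certification', 135632, 194920, 0.696),
]

if __name__ == '__main__':
    for title, qualified, taken, listed in rows:
        print(title, success_ratio(qualified, taken), listed)
```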
@philshem
philshem / play_neiss.py
Last active Dec 26, 2017
python script to parse NEISS tsv files
View play_neiss.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import glob
import csv
# export options - default is just CSV
csv_tf = True #.csv
@philshem
philshem / swiss_bio_religion.csv
Last active Jan 9, 2016
Supporting text for Swiss Bio Religion dataviz
View swiss_bio_religion.csv
Kanton short | Kanton | Percent bio 2015 | Römisch-katholisch | Evangelisch-reformiert | Konfessionslos
ZH | Zurich | 20.2 | 27.4494002999 | 32.146057653 | 24.2100824588
BE | Berne | 18.3 | 15.5754304486 | 55.4993347313 | 16.209321905
LU | Lucerne | 17.5 | 64.7856993061 | 11.0363370592 | 13.8023857444
UR | Uri | 16.6 | 81.6174974568 | 4.6998982706 | 8.0535774839
SZ | Schwyz | 17.6 | 63.7955490732 | 11.2731702235 | 14.3969931802
OW | Obwalden | 16.3 | 73.9953826078 | 7.2506440927 | 12.0989058788
NW | Nidwalden | 15.6 | 68.8285171426 | 10.7109434818 | 13.9352862774
GL | Glarus | 18.7 | 34.8267730818 | 35.4797750771 | 15.0311385211
ZG | Zug | 17.5 | 54.3763370849 | 14.2177988611 | 19.6095453653