Ed Summers edsu

View pinboard-ask.txt
Hi there,
My name is Maciej, I run the bookmarking site Pinboard, and I’m writing to ask
for your help.
You joined the site back when there was a one-time signup fee. Back then,
charging for bookmarking online was unheard of, and the fee was more of an
anti-spam measure than a revenue model.
In 2015, I changed Pinboard over to a subscription site, where even “basic”
edsu / CC-MAIN-2021-04-hosts-sizes-top100.csv
Last active Feb 7, 2021
The top 100 hosts by total WARC record size (bytes) in Common Crawl CC-MAIN-2021-04.
url_host_name length
d2y1pz2y630308.cloudfront.net 35553494127
photos.google.com 22829413806
www.download.p4c.philips.com 19523224128
quod.lib.umich.edu 18400799789
s3.amazonaws.com 17043193709
support.google.com 16945185389
www.wmagazine.com 15723224197
api.whatsapp.com 15241728762
www.thecut.com 15017634948
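The raw byte counts above are easier to compare in larger units. The following is a minimal sketch, assuming the rows are whitespace-separated `url_host_name length` pairs exactly as displayed in this preview (the underlying gist file may use a different delimiter). The helper names `human_size` and `parse_host_sizes` are hypothetical and not part of the gist.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 GiB = 1024**3 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def parse_host_sizes(text: str) -> list[tuple[str, int]]:
    """Parse 'host length' lines, skipping the header row."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        host, length = line.split()
        rows.append((host, int(length)))
    return rows


# Sample rows copied from the preview above (assumed whitespace-separated).
SAMPLE = """url_host_name length
d2y1pz2y630308.cloudfront.net 35553494127
photos.google.com 22829413806
www.download.p4c.philips.com 19523224128
"""

if __name__ == "__main__":
    for host, length in parse_host_sizes(SAMPLE):
        print(f"{host:35} {human_size(length)}")
```

Binary units are used here as an arbitrary choice; dividing by 1000 instead of 1024 would give decimal gigabytes.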
edsu / CC-MAIN-2021-04-host-counts-top100.csv
Last active Feb 7, 2021
The top 100 host counts in Common Crawl CC-MAIN-2021-04.
url_host_name total
getpocket.com 1640422
auth.webnode.com 1056353
telegram.me 543797
plus.google.com 472899
www.ncbi.nlm.nih.gov 433041
api.whatsapp.com 394818
web.skype.com 338835
www.amazon.com 296540
dx.doi.org 290895
View gist:50a9ac542b2ade5a60dd833c9c31f878
name,twitter
Susan A. Davis,https://twitter.com/repsusandavis
Martha Roby,https://twitter.com/repmartharoby
"F. James Sensenbrenner, Jr.",https://twitter.com/jimpressoffice
Joseph P. Kennedy III,https://twitter.com/repjoekennedy
Denny Heck,https://twitter.com/repdennyheck
Donna E. Shalala,https://twitter.com/repshalala
Steve Watkins,https://twitter.com/rep_watkins
Kendra S. Horn,https://twitter.com/repkendrahorn
Ben McAdams,https://twitter.com/repbenmcadams
View most.yaml
- id:
    bioguide: B001288
    lis: S370
    thomas: '02194'
    govtrack: 412598
    opensecrets: N00035267
    votesmart: 76151
    wikipedia: Cory Booker
    ballotpedia: Cory Booker
    fec:
View outgoing.csv
name twitter
Susan A. Davis https://twitter.com/repsusandavis
Martha Roby https://twitter.com/repmartharoby
F. James Sensenbrenner, Jr. https://twitter.com/jimpressoffice
Joseph P. Kennedy III https://twitter.com/repjoekennedy
Denny Heck https://twitter.com/repdennyheck
Donna E. Shalala https://twitter.com/repshalala
Steve Watkins https://twitter.com/rep_watkins
Kendra S. Horn https://twitter.com/repkendrahorn
Ben McAdams https://twitter.com/repbenmcadams
View outgoing.csv
name,twitter,twitter_ok,youtube,youtube_ok,facebook,facebook_ok,instagram,instagram_ok
Lamar Alexander,https://twitter.com/senalexander,True,https://www.youtube.com/user/lamaralexander,True,,,https://www.instagram.com/senlamaralexander,True
Michael B. Enzi,https://twitter.com/senatorenzi,True,https://www.youtube.com/user/senatorenzi,True,https://www.facebook.com/mikeenzi,True,https://www.instagram.com/senatorenzi,True
Pat Roberts,https://twitter.com/senpatroberts,True,https://www.youtube.com/user/senpatroberts,True,https://www.facebook.com/senpatroberts,True,,
Tom Udall,https://twitter.com/senatortomudall,True,https://www.youtube.com/user/senatortomudall,True,https://www.facebook.com/senatortomudall,True,https://www.instagram.com/senatortomudall,True
Justin Amash,,,,,https://www.facebook.com/repjustinamash,True,,
Rob Bishop,https://twitter.com/reprobbishop,True,https://www.youtube.com/user/congressmanbishop,True,https://www.facebook.com/reprobbishop,True,,
Wm. Lacy Clay,,,,,https://www.facebook.com/1091354058
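The last row shown above has 6 fields while the header declares 9, which is the kind of mismatch that stops a CSV from rendering as a table. The following is a minimal sketch of how such rows could be detected and padded with Python's standard `csv` module. The function names `find_ragged_rows` and `pad_rows` are hypothetical, and padding on the right assumes the missing fields are the trailing ones, which may not hold for every file.

```python
import csv
import io


def find_ragged_rows(text: str) -> list[tuple[int, int]]:
    """Return (data_row_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    problems = []
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            problems.append((row_number, len(row)))
    return problems


def pad_rows(text: str) -> str:
    """Pad short rows with empty trailing fields (assumption: only trailing fields are missing)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        # Rows that are already wide enough are written unchanged.
        writer.writerow(row + [""] * (len(header) - len(row)))
    return out.getvalue()


# Header and two rows copied from the preview above.
SAMPLE = (
    "name,twitter,twitter_ok,youtube,youtube_ok,facebook,facebook_ok,instagram,instagram_ok\n"
    "Justin Amash,,,,,https://www.facebook.com/repjustinamash,True,,\n"
    "Wm. Lacy Clay,,,,,https://www.facebook.com/1091354058\n"
)

if __name__ == "__main__":
    print(find_ragged_rows(SAMPLE))
    print(pad_rows(SAMPLE))
```

Row numbers here count data rows starting at 1 and exclude the header, so they will not necessarily match the numbering a CSV viewer reports.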
View most.yaml
id:
  bioguide: P000598
  thomas: '01910'
  govtrack: 412308
  opensecrets: N00029127
  votesmart: 106220
  fec:
  - H8CO02137
  cspan: 1031300
  wikipedia: Jared Polis
edsu / search_tweets_all.sh
Last active Jan 31, 2021
Use Twitter's search client to retrieve all possible data for a tweet from the v2 API.
search_tweets.py \
--query obama \
--results-per-call 100 \
--tweet-fields id,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,possibly_sensitive,public_metrics,referenced_tweets,reply_settings,source,withheld \
--user-fields id,name,username,created_at,description,entities,location,pinned_tweet_id,profile_image_url,protected,public_metrics,url,verified,withheld \
--media-fields media_key,type,duration_ms,height,preview_image_url,public_metrics,width \
--poll-fields id,options,duration_minutes,end_datetime,voting_status \
--place-fields full_name,id,contained_within,country,country_code,geo,name,place_type \
--expansions author_id,referenced_tweets.id,in_reply_to_user_id,attachments.media_keys,attachments.poll_ids,geo.place_id,entities.mentions.username,referenced_tweets.id.author_id \
--filename-prefix obama \