Dan Nguyen dannguyen

@dannguyen
dannguyen / _README-twitter-purge.md
Last active July 13, 2018 23:18
List of characteristics for my followers as of 2018-07-11 who were no longer my followers on 2018-07-13, sorted by follower count

My Twitter followers who were "purged" between July 11 and July 13, 2018

tl;dr: The file below, purged-users.csv, contains a data table of stats on my Twitter followers who appear to have been "purged" from my follower count on 2018-07-12. As Twitter said in its announcement, only follower counts were adjusted: the user accounts that make up the purged follower counts were not deleted, nor do their lists of "followings" reflect the change. This makes sense because the "purged" accounts aren't necessarily fake; rather, they've been "locked" for suspicious behavior and have been inactive since.
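The comparison described above reduces to a set difference between two snapshots of follower IDs. A minimal sketch of that logic (the IDs below are made up; the real gist works from purged-users.csv):

```python
# Sketch: derive "purged" followers as the set difference between two
# snapshots of follower IDs. These IDs are illustrative placeholders.
followers_0711 = {"1001", "1002", "1003", "1004"}  # followers on 2018-07-11
followers_0713 = {"1001", "1004"}                  # followers on 2018-07-13

# Accounts present in the first snapshot but absent from the second
purged = followers_0711 - followers_0713
print(sorted(purged))
```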

Background

Sometime on July 12, 2018, Twitter conducted a mass purge of user accounts suspected to be fake. Via the Twitter blog, Confidence in follower counts (emphasis added):

@dannguyen
dannguyen / README.md
Last active September 26, 2021 04:20
Using just pure SQLite, create a tidy and normalized table from a recordset in which some columns contain multiple delimited values. Kudos to Samuel Bosch for this solution http://www.samuelbosch.com/2018/02/split-into-rows-sqlite.html

Pure SQLite solution to creating a tidy/normalized data table from a column of delimited values

The problem: we have a data table in which one of the columns contains a text string that is meant to be multiple values separated by a delimiter (e.g. a comma). For example, the LAPD crime incidents data has a column named MO Codes (short for modus operandi). Every incident may have several MOs -- for example, a particular RESISTING ARREST incident may have an MO Codes value of 1212 0416, which corresponds, respectively, to: LA Police Officer and Hit-Hit w/ weapon:

![image](https://user-ima
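The linked solution relies on SQLite's recursive CTEs to walk the delimited string. A minimal sketch of the technique, run through Python's sqlite3 (the table and column names here -- incidents, mo_codes -- are illustrative, not the actual LAPD schema):

```python
import sqlite3

# Sketch: split a space-delimited column into one row per value using a
# recursive CTE, as in the split-into-rows approach linked above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER, mo_codes TEXT)")
conn.executemany("INSERT INTO incidents VALUES (?, ?)",
                 [(1, "1212 0416"), (2, "0329")])

rows = conn.execute("""
    WITH RECURSIVE split(id, code, rest) AS (
        -- seed: empty value, plus the string with a trailing delimiter
        SELECT id, '', mo_codes || ' ' FROM incidents
        UNION ALL
        -- peel off everything up to the next delimiter
        SELECT id,
               substr(rest, 1, instr(rest, ' ') - 1),
               substr(rest, instr(rest, ' ') + 1)
        FROM split
        WHERE rest <> ''
    )
    SELECT id, code FROM split WHERE code <> '' ORDER BY id
""").fetchall()
print(rows)  # one (id, code) row per delimited value
```

The trailing delimiter appended in the seed row is what lets the same `instr`/`substr` step consume the final value without a special case.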

@dannguyen
dannguyen / fetch_house_disbursements.py
Last active June 16, 2018 09:29
Python 3.6 script for downloading House disbursement data (~10 years worth) from ProPublica: https://projects.propublica.org/represent/expenditures
"""
Fetches House disbursement CSV files from
https://projects.propublica.org/represent/expenditures
Saves them to:
data/raw/{year}Q{q}.csv
"""
import requests
from pathlib import Path

DATADIR = Path('data', 'raw')
DATADIR.mkdir(parents=True, exist_ok=True)  # ensure data/raw/ exists before saving
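The preview above cuts off before the download loop, but the `data/raw/{year}Q{q}.csv` pattern is enough to sketch the target paths. A minimal sketch (the 2009-2018 span is an assumption based on the "~10 years worth" in the description; the actual download URLs aren't shown):

```python
from pathlib import Path

# Sketch: enumerate the filenames the script above saves to.
# The year range is a hypothetical stand-in for "~10 years worth".
DATADIR = Path("data", "raw")
targets = [DATADIR / f"{year}Q{q}.csv"
           for year in range(2009, 2019)   # assumed 10-year span
           for q in (1, 2, 3, 4)]
print(len(targets))  # 10 years x 4 quarters
```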
@dannguyen
dannguyen / schemacrawler-sqlite-macos-howto.md
Last active January 21, 2024 15:32
How to use schemacrawler to generate schema diagrams for SQLite from the commandline (Mac OS)
uid,reported_date,victim_last,victim_first,victim_race,victim_age,victim_sex,city,state,lat,lon,disposition
Alb-000001,20100504,GARCIA,JUAN,Hispanic,78,Male,Albuquerque,NM,35.0957885,-106.5385549,Closed without arrest
Alb-000002,20100216,MONTOYA,CAMERON,Hispanic,17,Male,Albuquerque,NM,35.0568104,-106.715321,Closed by arrest
Alb-000003,20100601,SATTERFIELD,VIVIANA,White,15,Female,Albuquerque,NM,35.086092,-106.695568,Closed without arrest
Alb-000004,20100101,MENDIOLA,CARLOS,Hispanic,32,Male,Albuquerque,NM,35.0784929,-106.5560938,Closed by arrest
Alb-000005,20100102,MULA,VIVIAN,White,72,Female,Albuquerque,NM,35.1303568,-106.5809862,Closed without arrest
Alb-000006,20100126,BOOK,GERALDINE,White,91,Female,Albuquerque,NM,35.15111,-106.537797,Open/No arrest
Alb-000007,20100127,MALDONADO,DAVID,Hispanic,52,Male,Albuquerque,NM,35.1117847,-106.7126144,Closed by arrest
Alb-000008,20100127,MALDONADO,CONNIE,Hispanic,52,Female,Albuquerque,NM,35.1117847,-106.7126144,Closed by arrest
Alb-000009,20100130,MARTIN-LEYVA,GUSTAVO,W
{
  "kind": "youtube#videoListResponse",
  "etag": "\"DuHzAJ-eQIiCIp7p4ldoVcVAOeY/oWEizCC8EIhZ_7DOM2iudlxoTvI\"",
  "pageInfo": {
    "totalResults": 3,
    "resultsPerPage": 3
  },
  "items": [
    {
      "kind": "youtube#video",
@dannguyen
dannguyen / fetch-and-extract-facebook-adpdfs.md
Created May 11, 2018 21:48
Some quickie Shell snippets to do the fetching and unzipping of the Facebook-Congress-Russia data files

Context

Facebook has been under scrutiny because of how its ad platform may have been used by foreign actors during the 2016 election. In May 2018, Facebook released ad data to a House committee, which subsequently published the data online.

> As part of that continuing effort to educate the public and seek additional analysis, the Committee Minority is making available all IRA advertisements identified by Facebook. This is an effort to be fully transparent with the public, allow outside experts to analyze the data, and provide the American people a fuller accounting of Russian efforts to sow discord and interfere in our democracy.

You can read more about the events here:

https://democrats-intelligence.house.gov/facebook-ads/

@dannguyen
dannguyen / scrape-ca-dmv.py
Created May 10, 2018 23:32
Quickie scraper for california disengagement reports, gets data to paste into spreadsheet
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from time import sleep
URLS = {
'2017': 'https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/disengagement_report_2017',
'2016': 'https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/disengagement_report_2016',
'2015': 'https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/disengagement_report_2015',
@dannguyen
dannguyen / recent-votes.json
Last active April 26, 2018 20:26
Sample data from ProPublica Represent API, /votes endpoint: https://projects.propublica.org/api-docs/congress-api/votes/
{
  "status": "OK",
  "copyright": "Copyright (c) 2017 Pro Publica Inc. All Rights Reserved.",
  "results": {
    "chamber": "House",
    "offset": 0,
    "num_results": 20,
    "votes": [{
      "congress": 115,
      "chamber": "House",
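The preview above is truncated, but the response shape is enough to sketch how vote records might be pulled out. `resp` below is a trimmed, hand-built stand-in for the parsed JSON, not the full API payload:

```python
# Sketch: extract the chamber and per-vote fields from a /votes-style
# response. `resp` is an illustrative stand-in for the real payload.
resp = {
    "status": "OK",
    "results": {
        "chamber": "House",
        "offset": 0,
        "num_results": 20,
        "votes": [
            {"congress": 115, "chamber": "House"},
        ],
    },
}

results = resp["results"]
votes = [(v["congress"], v["chamber"]) for v in results["votes"]]
print(results["chamber"], votes)
```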