
Dan Nguyen (dannguyen)

@dannguyen
dannguyen / cms-medicare-bulk-downloading.md
Created Apr 14, 2019
Compiling all the Medicare payment data
@dannguyen
dannguyen / census-bulk-downloading-scripts.md
Created Apr 14, 2019
Bulk Census data downloading script
@dannguyen
dannguyen / _house-public-disc-scraper-README.md
Last active Apr 8, 2019
Simple scraper of the ASPX search form for U.S. Congress House financial disclosure results


The following script, given someone's last name, prints a CSV of financial disclosure PDFs (the first 20, for simplicity's sake) as found on the House Financial Disclosure Reports site. It's meant to be a proof-of-concept of how to scrape ASPX (and other "stateful") websites using plain old requests -- without too much inconvenience -- rather than resorting to something heavy like the Selenium WebDriver.

The search page can be found here: http://clerk.house.gov/public_disc/financial-search.aspx

Here's a screenshot of what it does when you search via web browser:

screenshot of disclosure search for "king"
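The core trick with ASPX forms is that every response embeds hidden state fields (e.g. `__VIEWSTATE`, `__EVENTVALIDATION`) that must be echoed back in the POST. A minimal sketch of that step, using only the standard library to collect the hidden fields -- the sample HTML and the form field names in the comments are illustrative, not copied from the House site:

```python
from html.parser import HTMLParser

class HiddenFieldParser(HTMLParser):
    """Collects <input type="hidden"> name/value pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == 'input' and d.get('type') == 'hidden':
            self.fields[d.get('name')] = d.get('value', '')

def hidden_fields(html_text):
    """Return a dict of the hidden form fields found in html_text."""
    parser = HiddenFieldParser()
    parser.feed(html_text)
    return parser.fields

# With requests, a stateful search would then look roughly like:
#   session = requests.Session()
#   page = session.get(SEARCH_URL)
#   payload = hidden_fields(page.text)
#   payload['LastName'] = 'king'     # actual field name is hypothetical
#   results = session.post(SEARCH_URL, data=payload)
```

Keeping the hidden fields in a plain dict makes it easy to merge in the visible search parameters before re-posting within the same `requests.Session`.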

@dannguyen
dannguyen / google-speech-2-text-README.md
Last active Feb 19, 2019
How Google's text-to-speech API performs when reading the New York Times

Demo of Google text-to-speech Wavenet API on a NYT article

Was curious whether Google's text-to-speech API might be good enough for generating audio versions of stories on the fly. Google has offered traditional computer voices for a while, but last year made available its premium WaveNet voices, which are trained on audio recorded from human speakers and are purportedly capable of mimicking natural-sounding inflection and rhythm.

tl;dr results

Pretty good...but I honestly can't tell the difference between the standard voice and the WaveNet version, at least when it comes to intonation and inflection. The first 2 grafs of this NYT story, roughly 85 words/560 characters, took less than 2 seconds to process. The result in both cases is a 37-second audio file.

  • The M
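For reference, the standard and WaveNet voices are both invoked through the same `text:synthesize` REST endpoint (`POST https://texttospeech.googleapis.com/v1/text:synthesize`), differing only in the voice name. A sketch of the request body -- the voice name and text here are illustrative of the v1 API as I understand it, not taken from the gist:

```json
{
  "input": {"text": "First two paragraphs of the article..."},
  "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},
  "audioConfig": {"audioEncoding": "MP3"}
}
```

Swapping `en-US-Wavenet-D` for a standard voice name (e.g. `en-US-Standard-D`) is the only change needed to compare the two tiers.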
@dannguyen
dannguyen / cardib-politics-talk-transcribe.md
Last active Feb 15, 2019
An example of how to use command-line tools to transcribe a viral video of Cardi B

Transcribing Cardi B's political speech with AWS Transcribe and command-line tools

Inspired by the following exchange on Twitter -- in which someone captured and posted a valuable video but didn't have the resources to easily transcribe it for the hearing-impaired -- I thought it'd be fun to try out Amazon's AWS Transcribe service to help with this problem, and to see if I could do it all from the bash command line like a Unix dork.

Screencap of @jordanuhl's video tweet, followed by a request for a transcript

The instructions and code below show how to use command-line tools/scripting and Amazon's Transcribe service to transcribe the audio from online video. tl;dr: AWS Transcribe is a pretty amaz

@dannguyen
dannguyen / README.md
Last active Nov 7, 2018
Sample SQLite problem in which, given a table of records, I want to link each record with its prior version. Seems like recursion is the solution -- putting down this correlated query for now

Using recursive CTE in SQLite instead of a correlated query

Given a table of exams, in which each exam belongs to a "student", we want to derive a new table in which for any given student exam, we see how much the score has changed compared to the student's previous exam.

Here's the table of exams, which is generated by the SQL in [sql-make-exams-table.sql](#file-sql-make-exams-table-sql):

starting exams table

| exam_id | year | student | score |
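To make the problem concrete, here's a self-contained sketch of the correlated-subquery baseline the gist sets out to replace, run through Python's built-in sqlite3 module -- the sample data is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exams (exam_id INTEGER, year INTEGER, student TEXT, score INTEGER);
INSERT INTO exams VALUES
  (1, 2015, 'Alice', 80),
  (2, 2016, 'Alice', 90),
  (3, 2015, 'Bob', 70),
  (4, 2016, 'Bob', 65);
""")

# For each exam, subtract the student's most recent earlier score
# (NULL for a student's first exam) -- the correlated-subquery version.
rows = conn.execute("""
SELECT e.exam_id, e.student, e.score,
       e.score - (SELECT p.score FROM exams p
                  WHERE p.student = e.student AND p.year < e.year
                  ORDER BY p.year DESC LIMIT 1) AS change
FROM exams e
ORDER BY e.student, e.year;
""").fetchall()
print(rows)
```

The inner subquery re-runs once per outer row, which is what makes a recursive CTE (or, in newer SQLite, the `lag()` window function) attractive on larger tables.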

@dannguyen
dannguyen / _README-twitter-purge.md
Last active Jul 13, 2018
List of my follower characteristics, sorted by # of followers -- accounts that were my followers as of 2018-07-11 but no longer were on 2018-07-13

My Twitter followers who were "purged" between July 11 and July 13, 2018

tl;dr: The file below, purged-users.csv, contains a data table of stats on my Twitter followers who appear to have been "purged" from my follower count on 2018-07-12. As Twitter said in its announcement, only follower counts have been adjusted: the user accounts that constitute the purged follower counts were not deleted, nor do their own lists of "followings" reflect the change. This makes sense because the "purged" accounts aren't necessarily fake -- they've been "locked" for suspicious behavior and have been inactive since.

Background

Sometime on July 12, 2018, Twitter conducted a mass purge of user accounts suspected to be fake. Via the Twitter blog, Confidence in follower counts (emphasis added):

@dannguyen
dannguyen / README.md
Last active Feb 9, 2019
Using just pure SQLite, create a tidy and normalized table from a recordset in which some columns contain multiple delimited values. Kudos to Samuel Bosch for this solution http://www.samuelbosch.com/2018/02/split-into-rows-sqlite.html

Pure SQLite solution to creating a tidy/normalized data table from a column of delimited values

The problem: we have a data table in which one of the columns contains a text string that is meant to be multiple values separated by a delimiter (e.g. a comma). For example, the LAPD crime incidents data has a column named MO Codes (short for modus operandi). Every incident may have several MOs -- for example, a particular RESISTING ARREST incident may have an MO Codes value of 1212 0416, which corresponds, respectively, to: LA Police Officer and Hit-Hit w/ weapon:

(screenshot of the MO Codes column in the LAPD data)
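The technique from the linked Samuel Bosch post splits a delimited column into rows with a recursive CTE that peels off one token per recursion step. A minimal self-contained sketch via Python's sqlite3 -- the table, column names, and sample values here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incidents (incident_id INTEGER, mo_codes TEXT);
INSERT INTO incidents VALUES (1, '1212 0416'), (2, '0906');
""")

# Recursive CTE: each step moves the text before the first space into
# `code` and keeps the remainder in `rest`, until `rest` is exhausted.
rows = conn.execute("""
WITH RECURSIVE split(incident_id, code, rest) AS (
  SELECT incident_id, '', mo_codes || ' ' FROM incidents
  UNION ALL
  SELECT incident_id,
         substr(rest, 1, instr(rest, ' ') - 1),
         substr(rest, instr(rest, ' ') + 1)
  FROM split
  WHERE rest <> ''
)
SELECT incident_id, code FROM split WHERE code <> ''
ORDER BY incident_id, code;
""").fetchall()
print(rows)
```

Appending the delimiter to the seed value (`mo_codes || ' '`) guarantees every token, including the last one, is followed by a space, which keeps the `instr()` arithmetic uniform.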

@dannguyen
dannguyen / fetch_house_disbursements.py
Last active Jun 16, 2018
Python 3.6 script for downloading House disbursement data (~10 years worth) from ProPublica: https://projects.propublica.org/represent/expenditures
"""
Fetches House disbursement CSV files from
https://projects.propublica.org/represent/expenditures
Saves them to:
data/raw/{year}Q{q}.csv
"""
import requests
from pathlib import Path
DATADIR = Path('data', 'raw')
@dannguyen
dannguyen / schemacrawler-sqlite-macos-howto.md
Last active Apr 14, 2019
How to use schemacrawler to generate schema diagrams for SQLite from the command line (Mac OS)