Dan Nguyen dannguyen

## bashfoo.yaml
"""
bashfoo.yaml

https://gist.github.com/dannguyen/ad80b9d03f755822d3cc03174bcbef74

Dan Nguyen's personally curated list of bash/command-line commands and snippets
  that are useful but yet he keeps forgetting

"""
# gist: https://gist.github.com/dannguyen/ad80b9d03f755822d3cc03174bcbef74

## aws-textract-sample-readme.md

      
dannguyen / aws-textract-sample-readme.md (5 files, 3 forks, 2 comments, 12 stars; last active October 30, 2023 05:49)

A gist of AWS Textract sample/demo data for easy reference and preview, in case you're curious how well Amazon does when it comes to pdf-to-csv
              
          
    AWS Textract -- sample document image and data from the official demo

AWS Textract is now out of closed beta. You can read the features page here, and you can also read about its limits here (e.g. no handwriting). Basically, if you've ever had to deal with the hell of getting structured data out of a PDF (scanned image or not), Textract is aiming for your business:

This short gist contains some of my brief observations about Textract and its demo, as well as direct links to the most relevant and important files, such as the Textract demo sample image and the resulting data files from Textract's API. If you have an AWS account, I h

  
## aws-textract-demo-readme.md

      
dannguyen / aws-textract-demo-readme.md (5 files, 0 forks, 2 comments, 0 stars; last active May 30, 2019 04:38)

Amazon Textract, i.e. AWS's OCR-as-a-cloud-service, was just released to the public. Here's how well it did with recognizing data tables in a particularly difficult PDF
              
          
    [Ignore this gist, check out the GitHub repo] Testing AWS Textract's ability to correctly extract data tables from a difficult FBI stats report PDF

Update: I've since realized that this writeup would be far easier to do as its own GitHub repo, given the number of files involved. Please ignore this gist, which I'm keeping here as a backup, and instead visit: https://github.com/dannguyen/aws-textract-pdf-to-csv-demo


tl;dr: pretty good table structure overall, given the issues with the original PDF. However, there were inexplicable and critical data errors, as if Textract converted the PDF to an image, OCRed it, and then attempted to extract the data tables.
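
To make "table structure" concrete, here is a minimal sketch of turning Textract-style `CELL` and `WORD` blocks into CSV rows. It assumes the block layout of Textract's JSON output (`BlockType`, `RowIndex`, `ColumnIndex`, `Relationships`); the sample blocks and the `table_rows` helper are hypothetical, and the sketch treats all cells as belonging to a single table.

```python
import csv
import io


def table_rows(blocks):
    """Rebuild a grid of cell text from Textract-style blocks (single table assumed)."""
    by_id = {block["Id"]: block for block in blocks}
    cells = {}
    for block in blocks:
        if block["BlockType"] != "CELL":
            continue
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
        # RowIndex and ColumnIndex are 1-based positions within the table.
        cells[(block["RowIndex"], block["ColumnIndex"])] = " ".join(words)
    if not cells:
        return []
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    return [
        [cells.get((row, col), "") for col in range(1, n_cols + 1)]
        for row in range(1, n_rows + 1)
    ]


def cell(cell_id, row, col, word_ids):
    """Build a hypothetical CELL block for the sample data below."""
    return {
        "Id": cell_id,
        "BlockType": "CELL",
        "RowIndex": row,
        "ColumnIndex": col,
        "Relationships": [{"Type": "CHILD", "Ids": word_ids}],
    }


# Made-up blocks for illustration; not taken from the FBI report.
SAMPLE_BLOCKS = [
    {"Id": "w1", "BlockType": "WORD", "Text": "State"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Total"},
    {"Id": "w3", "BlockType": "WORD", "Text": "New"},
    {"Id": "w4", "BlockType": "WORD", "Text": "York"},
    {"Id": "w5", "BlockType": "WORD", "Text": "42"},
    cell("c1", 1, 1, ["w1"]),
    cell("c2", 1, 2, ["w2"]),
    cell("c3", 2, 1, ["w3", "w4"]),
    cell("c4", 2, 2, ["w5"]),
]

rows = table_rows(SAMPLE_BLOCKS)
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

A real response with several tables would need the cells grouped by their parent `TABLE` block first.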

Amazon Textract was announced about 6 months ago but was made public today (May 29). If you have an AWS account, you can check out Textract's point-and-click demo, which allows you to upload an image or PDF for T

  
## sample-public-tweets.csv
ID,Posted at,Screen name,Text
1123212586919419905,2019-04-30 13:08:43 +0000,shinya1720777,"妙にテンションの高いまーちゃんにあおられた
キビナゴりせ

#今日のりせ活 https://t.co/qabC3PQi7m"
1123212591109672961,2019-04-30 13:08:44 +0000,inesteiixeira,RT @lunaaaaa20: acho que a coisa mais linda é acompanhar o crescimento da pessoa que gostamos e contribuir para isso
1123212591109689348,2019-04-30 13:08:44 +0000,BTS20520283,#BBMAsTopSocial BTS @BTS_twt kalp
1123212591139045381,2019-04-30 13:08:44 +0000,Dipendr80247123,"RT @YL511: #البتكوين
📌اش الحكاية :
-

## cms-medicare-bulk-downloading.md

      
dannguyen / cms-medicare-bulk-downloading.md (1 file, 0 forks, 0 comments, 0 stars; created April 14, 2019 17:14)

Compiling all the Medicare payment data
              
          
    Bulk Census data work

2016 geo summary level code list:
https://factfinder.census.gov/help/en/summary_level_code_list.htm
Landing page for ACS5-2016 Summary File

  
## census-bulk-downloading-scripts.md

      
dannguyen / census-bulk-downloading-scripts.md (1 file, 0 forks, 0 comments, 0 stars; created April 14, 2019 17:12)

Bulk Census data downloading script
              
          
    Bulk Census data work

2016 geo summary level code list:
https://factfinder.census.gov/help/en/summary_level_code_list.htm
Landing page for ACS5-2016 Summary File

  
## _house-public-disc-scraper-README.md

      
dannguyen / _house-public-disc-scraper-README.md (3 files, 0 forks, 0 comments, 4 stars; last active August 23, 2022 05:21)
              
          
    Simple scraper of the ASPX search form for U.S. Congress House financial disclosure results

The following script, given someone's last name, prints a CSV of financial disclosure PDFs (the first 20, for simplicity's sake) as found on the House Financial Disclosure Reports. It's meant to be a proof-of-concept of how to scrape ASPX (and other "stateful" websites) using plain old requests -- without too much inconvenience -- rather than resorting to something heavy like the Selenium webdriver.
The search page can be found here: http://clerk.house.gov/public_disc/financial-search.aspx
Here's a screenshot of what it does when you search via web browser:
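
As for the "stateful" part, a minimal sketch of the general technique (not the gist's actual script): an ASPX page embeds hidden state fields such as `__VIEWSTATE` in its form, and a search POST has to echo them back alongside the visible inputs. The sample HTML and the `txtLastName` field name are hypothetical stand-ins, and only the standard library is used so the sketch runs offline.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, form_values):
    """Merge the page's hidden state fields with the visible form inputs."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(form_values)
    return payload


# Hypothetical stand-in for the HTML that a GET of the search page returns.
SAMPLE_PAGE = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="txtLastName" />
</form>
"""

payload = build_postback(SAMPLE_PAGE, {"txtLastName": "Smith"})
print(payload)
```

With requests, the same idea would presumably be a `requests.Session()` that does a `get()` of the search page, feeds `response.text` to `build_postback`, and then sends the payload with `post(url, data=payload)`.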


## google-speech-2-text-README.md

      
dannguyen / google-speech-2-text-README.md (2 files, 3 forks, 0 comments, 5 stars; last active March 2, 2022 09:31)

How Google's text-to-speech API performs when reading the New York Times
              
          
    Demo of Google text-to-speech Wavenet API on a NYT article

Was curious if Google's text-to-speech API might be good enough for generating audio versions of stories on-the-fly. Google has offered traditional computer voices for a while, but last year made available their premium WaveNet voices, which are trained using audio recorded from human speakers, and are purportedly capable of mimicking natural-sounding inflection and rhythm.
tl;dr results

Pretty good...but I honestly can't tell the difference between the standard voice and the WaveNet version, at least when it comes to intonation and inflection. The first 2 grafs of this NYT story, roughly 85 words/560 characters, took less than 2 seconds to process. The result in both cases is a 37-second audio file.
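
For context, a minimal sketch of what a synthesis request looks like, assuming the REST endpoint `https://texttospeech.googleapis.com/v1/text:synthesize` and its JSON shape (`input`, `voice`, `audioConfig`, with base64 `audioContent` in the response). The helper names, the default voice, and the fake response are illustrative assumptions, not necessarily what was used for this test; no network call is made.

```python
import base64
import json


def build_synthesize_request(text, voice_name="en-US-Wavenet-D", language_code="en-US"):
    """Build the JSON body for a text:synthesize call (the voice name is an assumption)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def decode_audio(response_json):
    """Decode the base64 audio payload from a synthesize response into raw bytes."""
    return base64.b64decode(json.loads(response_json)["audioContent"])


body = build_synthesize_request("Hello from the command line.")
print(json.dumps(body, indent=2))

# Stand-in for a real API response; the bytes are not actual MP3 data.
fake_response = json.dumps(
    {"audioContent": base64.b64encode(b"fake-mp3-bytes").decode("ascii")}
)
audio = decode_audio(fake_response)
print(len(audio), "bytes of audio")
```

Swapping the `voice_name` between a standard voice and a WaveNet voice is all it takes to generate the two versions for comparison.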

The M


## cardib-politics-talk-transcribe.md

      
dannguyen / cardib-politics-talk-transcribe.md (6 files, 4 forks, 11 comments, 31 stars; last active October 26, 2022 15:40)

An example of how to use command-line tools to transcribe a viral video of Cardi B
              
          
    Transcribing Cardi B's political speech with AWS Transcribe and command-line tools

Inspired by the following exchange on Twitter, in which someone captures and posts a valuable video onto Twitter, but doesn't have the resources to easily transcribe it for the hearing-impaired, I thought it'd be fun to try out Amazon's AWS Transcribe service to help with this problem, and to see if I could do it all from the bash command-line like a Unix dork.

The instructions and code below show how to use command-line tools/scripting and Amazon's Transcribe service to transcribe the audio from online video. tl;dr: AWS Transcribe is a pretty amaz
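
For reference, a minimal sketch of pulling the text out of a finished Transcribe job's JSON output. It assumes the output layout that Transcribe writes (`results.transcripts` for the full text, `results.items` for per-word timings); the sample JSON is made up and heavily trimmed, and the helper names are hypothetical.

```python
import json

# Made-up, trimmed example of a Transcribe result file.
SAMPLE_RESULT = json.dumps(
    {
        "jobName": "example-job",
        "status": "COMPLETED",
        "results": {
            "transcripts": [{"transcript": "Hello world."}],
            "items": [
                {
                    "type": "pronunciation",
                    "start_time": "0.0",
                    "end_time": "0.4",
                    "alternatives": [{"confidence": "0.99", "content": "Hello"}],
                },
                {
                    "type": "pronunciation",
                    "start_time": "0.4",
                    "end_time": "0.9",
                    "alternatives": [{"confidence": "0.98", "content": "world"}],
                },
                {
                    "type": "punctuation",
                    "alternatives": [{"confidence": "0.0", "content": "."}],
                },
            ],
        },
    }
)


def transcript_text(raw_json):
    """Return the full transcript as one string."""
    data = json.loads(raw_json)
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


def timed_words(raw_json):
    """Return (start_seconds, word) pairs, skipping punctuation items."""
    data = json.loads(raw_json)
    words = []
    for item in data["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items carry no timestamps
        words.append((float(item["start_time"]), item["alternatives"][0]["content"]))
    return words


text = transcript_text(SAMPLE_RESULT)
words = timed_words(SAMPLE_RESULT)
print(text)
print(words)
```

The per-word timings are what you would need for captions; the plain transcript is enough for a text post.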

  
## README.md

      
dannguyen / README.md (2 files, 0 forks, 0 comments, 0 stars; last active November 7, 2018 06:28)

Sample SQLite problem in which, given a table of records, I want to link each record with its prior version. Seems like recursion is the solution -- putting down this correlated query for ow
              
          
    Using recursive CTE in SQLite instead of a correlated query

Given a table of exams, in which each exam belongs to a "student", we want to derive a new table in which for any given student exam, we see how much the score has changed compared to the student's previous exam.
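
To make the goal concrete, here is a minimal sketch of the correlated-subquery form of the problem (the approach the title contrasts with the recursive CTE, not the CTE itself), run through Python's `sqlite3` module. The column names follow the exams table header; the rows are made up, and the query assumes each student has at most one exam per year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exams ("
    "exam_id INTEGER PRIMARY KEY, year INTEGER, student TEXT, score INTEGER)"
)
# Made-up rows for illustration only; not the gist's data.
conn.executemany(
    "INSERT INTO exams (exam_id, year, student, score) VALUES (?, ?, ?, ?)",
    [
        (1, 2016, "Alice", 70),
        (2, 2017, "Alice", 85),
        (3, 2018, "Alice", 80),
        (4, 2017, "Bob", 60),
        (5, 2018, "Bob", 75),
    ],
)

# For each exam, look up the same student's most recent earlier exam
# and subtract its score; the first exam per student yields NULL.
QUERY = """
SELECT
    e.exam_id,
    e.student,
    e.year,
    e.score,
    e.score - (
        SELECT p.score
        FROM exams AS p
        WHERE p.student = e.student AND p.year < e.year
        ORDER BY p.year DESC
        LIMIT 1
    ) AS score_change
FROM exams AS e
ORDER BY e.student, e.year;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The subquery runs once per outer row, which is the cost that motivates looking for an alternative on larger tables.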
Here's the table of exams, which is generated by the SQL in [sql-make-exams-table.sql](#file-sql-make-exams-table-sql):
starting exams table

| exam_id | year | student | score |