Dan Nguyen dannguyen

## tx-dp-regex-religion.py
"""
Filter Texas executed inmates by whether any of their last words fit in a
list of words commonly associated with religion.


A quick demonstration of the overall patterns in web-scraping, including
  using an HTML parser to navigate the DOM and the use of Regex for
  hand-entered values. Does none of the file-caching/management that you should
  be doing for such a task.
"""

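The regex filtering step described in the docstring could look roughly like the sketch below. The word list and the function names are illustrative assumptions, not the script's actual list or API, and the scraping and HTML parsing are left out entirely.

```python
import re

# Hypothetical word list; the original script's list is not shown here.
RELIGIOUS_WORDS = ["God", "Jesus", "Lord", "Allah", "heaven", "pray", "Christ"]

# \b anchors keep "Lord" from matching inside "landlord"; IGNORECASE
# tolerates inconsistent capitalisation in hand-entered text.
RELIGION_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in RELIGIOUS_WORDS) + r")\b",
    re.IGNORECASE,
)


def mentions_religion(statement):
    """Return True if the statement contains any word from the list."""
    return bool(RELIGION_PATTERN.search(statement or ""))


def filter_statements(statements):
    """Keep only the statements that mention a listed word."""
    return [s for s in statements if mentions_religion(s)]
```

Because of the word-boundary anchors, "pray" will not match "prayer" or "praying"; a real list would need to spell out such variants or loosen the pattern.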
## fetch_ghstars.md

      
dannguyen / fetch_ghstars.md
4 files, 0 forks, 0 comments, 12 stars. Last active April 10, 2024 19:25.

fetch_ghstars.py: quick CLI script to fetch from Github API all of a user's starred repos and save it as raw JSON and wrangled CSV
              
          
    fetch_ghstars.py: quick CLI script to fetch and collate all of a user's starred repos from the GitHub API


Requires Python 3.6+.
Creates a subdir 'ghstars-USERNAME' in the current working directory.
The raw JSON of each page request is saved as 01.json, 02.json, ..., 0n.json.
A flattened, filtered CSV is also created: wrangled.csv
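The behaviour listed above can be sketched as follows. This is not the gist's actual script: the CSV column selection, the function names, and the `fetch` parameter are assumptions made for illustration. Only the GitHub endpoint (`/users/{user}/starred`) and the output file names come from the description.

```python
import csv
import json
from pathlib import Path
from urllib.request import Request, urlopen

API_URL = "https://api.github.com/users/{user}/starred?per_page=100&page={page}"
# Columns kept in the CSV; this selection is an assumption for the sketch.
CSV_FIELDS = ["full_name", "html_url", "description", "language", "stargazers_count"]


def page_filename(page_number):
    """Zero-padded name for a page of raw JSON: 1 -> '01.json'."""
    return f"{page_number:02d}.json"


def flatten_repo(repo):
    """Pick the CSV columns out of one repo object from the API response."""
    return {field: repo.get(field) for field in CSV_FIELDS}


def fetch_page(user, page):
    """Fetch one page of starred repos (unauthenticated network call)."""
    req = Request(
        API_URL.format(user=user, page=page),
        headers={"Accept": "application/vnd.github+json",
                 "User-Agent": "ghstars-sketch"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


def fetch_all(user, dest=None, fetch=fetch_page):
    """Save every page as raw JSON, then write wrangled.csv; return row count."""
    dest = Path(dest or f"ghstars-{user}")
    dest.mkdir(parents=True, exist_ok=True)
    rows, page = [], 1
    while True:
        repos = fetch(user, page)
        if not repos:  # an empty list marks the end of the results
            break
        (dest / page_filename(page)).write_text(json.dumps(repos, indent=2))
        rows.extend(flatten_repo(r) for r in repos)
        page += 1
    with open(dest / "wrangled.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Passing the page fetcher in as a parameter keeps the file-writing logic testable without hitting the network; unauthenticated requests are also subject to GitHub's rate limits.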

Example usage:

  
## pypy-print.py
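import sys


def print_(*args, **kwargs):
    """The new-style print function from py3k."""
    # Destination stream; defaults to stdout, and None means "print nowhere".
    fp = kwargs.pop("file", sys.stdout)
    if fp is None:
        return

    def write(data):
        # Python 2 code: coerce anything that is not already a str/unicode.
        if not isinstance(data, basestring):
            data = str(data)
        fp.write(data)

    want_unicode = False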

## schemacrawler-sqlite-macos-howto.md

      
dannguyen / schemacrawler-sqlite-macos-howto.md
3 files, 10 forks, 8 comments, 31 stars. Last active January 21, 2024 15:32.

How to use schemacrawler to generate schema diagrams for SQLite from the commandline (Mac OS)
              
          
    Installing and using schemacrawler for MacOS

A recipe for generating cool SQLite database diagrams with schemacrawler on MacOS

This was tested on MacOS 10.14.5 on 2019-07-16
schemacrawler is a free and open-source database schema discovery and comprehension tool. Invoked from the command-line, it uses GraphViz to produce schema diagrams, as images or PDFs, from a SQLite (or other database type) file.


## README.md

      
dannguyen / README.md
2 files, 69 forks, 9 comments, 406 stars. Last active December 28, 2023 15:21.

Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data
              
          
    Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.
The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide it at the word or region level, which would be needed to then calculate the data delimiters.
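For orientation, a request of the kind being tested can be sketched as below. This is not the gist's code: it uses the standard library's `urllib` rather than Requests so that it stays self-contained, and the function names are made up for this example. The endpoint and the `TEXT_DETECTION` feature type are those of the Cloud Vision `images:annotate` REST method.

```python
import base64
import json
from urllib.request import Request, urlopen

ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate?key={key}"


def build_ocr_request(image_bytes):
    """Build the JSON body for a single-image TEXT_DETECTION request."""
    return {
        "requests": [{
            # The API expects the image as base64-encoded text.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    }


def extract_full_text(response):
    """Return the whole detected text, or '' if nothing was found."""
    annotations = response["responses"][0].get("textAnnotations", [])
    # The first annotation carries the full text of the image.
    return annotations[0]["description"] if annotations else ""


def ocr_image(path, api_key):
    """Send one image file to the API and return its detected text."""
    with open(path, "rb") as f:
        body = json.dumps(build_ocr_request(f.read())).encode("utf-8")
    req = Request(
        ANNOTATE_URL.format(key=api_key),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return extract_full_text(json.load(resp))
```

Calling `ocr_image` needs a valid API key and a billing-enabled project; the two helper functions can be exercised offline.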
On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:
###### 1. A low-resolution photo of road signs

  
## wget-snapshotpage.md

      
dannguyen / wget-snapshotpage.md
1 file, 20 forks, 3 comments, 98 stars. Last active December 25, 2023 20:57.

Use wget to snapshot a page and its necessary visual dependencies
              
          
    Use wget to mirror a single page and its visible dependencies (images, styles)


Graphic via State of Florida CFO Vendor Payment Search (flair.myfloridacfo.com)
This is a quick command I use to snapshot webpages that have a fun image I want to keep for my own collection of WTFViz. Why not just right-click and save the image? Oftentimes, the webpage in which the image is embedded contains necessary context, such as captions and links to important documentation, just in case you forget what exactly that fun graphic was trying to explain.

  
## ec2-centos-ruby-rvm-nginx-passenger.md

      
dannguyen / ec2-centos-ruby-rvm-nginx-passenger.md
1 file, 1 fork, 0 comments, 7 stars. Last active November 27, 2023 15:43.

Setting up Ruby 1.9.3 stable, RVM, nginx, passenger on Amazon Linux AMI (CentOS)
              
          
    Ruby 1.9.3 stable, RVM, nginx, passenger on Amazon Linux AMI (CentOS, 03-2013)

This combines the instructions on a few different tutorials:

Tabula EC2 AMI quickstart - A nice illustrated How-To for a basic EC2 setup, including permissions setup. The instructions are written for a specific demo app but apply to any EC2 usage
Code by Zack » Getting RVM set up on Amazon EC2 - How to install the latest stable version of Ruby (1.9.3p392) with RVM
Code by Zack » Setting up Passenger with rvm on Ubuntu - Installing Passenger and using it to set up nginx. The instructions at the end of Zack's guide on creating an init script apply to Ubuntu only.
[Slicehost Articles: Cen


## guardian-articles-day-api.md

      
dannguyen / guardian-articles-day-api.md
1 file, 7 forks, 5 comments, 36 stars. Last active November 23, 2023 12:28.

How to use The Guardian's API to download article data for content analysis (in Python 3.x)
              
          
    How to use The Guardian's API to download article data for content analysis (in Python 3.x)

The Guardian offers an API as deep and robust as the New York Times Article API when it comes to content analysis.
The Guardian's API offers more than "1.7 million pieces of content", with published items as far back as 1999. You can register as a developer here, which gets you 5,000 API hits a day and an API key that looks something like this:
zzzyyyyy-9a9z-999z-z999-9e8a83922516

The Guardian has a handy interactive explorer for tweaking the query parameters.
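A day-by-day download against the content search endpoint might be sketched like this. The function names, the page size, and the choice of `show-fields=all` are assumptions for the example rather than the gist's own code; the parameter names (`api-key`, `from-date`, `to-date`, `page`, `page-size`) are the ones the search endpoint accepts.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://content.guardianapis.com/search"


def build_day_query(api_key, day, page=1, page_size=50):
    """URL for one page of articles published on `day` (YYYY-MM-DD)."""
    params = {
        "api-key": api_key,
        "from-date": day,
        "to-date": day,
        "order-by": "newest",
        "show-fields": "all",
        "page-size": page_size,
        "page": page,
    }
    return SEARCH_URL + "?" + urlencode(params)


def fetch_day(api_key, day):
    """Collect every result for one day by walking the result pages."""
    results, page = [], 1
    while True:
        with urlopen(build_day_query(api_key, day, page)) as resp:
            body = json.load(resp)["response"]
        results.extend(body["results"])
        if page >= body["pages"]:  # "pages" is the total page count
            break
        page += 1
    return results
```

Each page counts as one API hit, so a larger `page_size` stretches the daily quota further.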

  
## aws-textract-sample-readme.md

      
dannguyen / aws-textract-sample-readme.md
5 files, 3 forks, 2 comments, 12 stars. Last active October 30, 2023 05:49.

A gist of AWS Textract sample/demo data for easy reference and preview, in case you're curious how well Amazon does when it comes to pdf-to-csv
              
          
    AWS Textract -- sample document image and data from the official demo

AWS Textract is now out of closed beta. You can read the features page here, and you can also read about its limits here (e.g. no handwriting). Basically, if you've ever had to deal with the hell of getting structured data out of a PDF (scanned image or not), Textract is aiming for your business:

This short gist contains some of my brief observations about Textract and its demo, as well as direct links to the most relevant and important files, such as the Textract demo sample image and the resulting data files from Textract's API. If you have an AWS account, I h

  
## catdrawer-youtube-to-gif-README.md

      
dannguyen / catdrawer-youtube-to-gif-README.md
1 file, 4 forks, 5 comments, 15 stars. Last active September 20, 2023 21:02.

Using youtube-dl and gifify from the command-line to make a cat gif
              
          
    Using youtube-dl and gifify from the command-line

Turn this cute YouTube cat video into a briefer-but-still-cute GIF:

Software to download


youtube-dl is a command-line tool for quickly downloading video files from a given YouTube URL