John Resig jeresig

dannguyen /
Last active Aug 3, 2020
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.
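For readers who want to try the same kind of test, here is a minimal sketch of such a request in Python 3. It is not the code from this gist. The endpoint URL, the `TEXT_DETECTION` feature type, and the `textAnnotations` response field reflect the Cloud Vision v1 REST API as I understand it; the helper names `build_ocr_request`, `extract_text`, and `ocr_image` are hypothetical.

```python
import base64

# Assumed endpoint for the Cloud Vision v1 REST API.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes):
    """Build the JSON body for a single TEXT_DETECTION request."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response_json):
    """Return the full detected text, or '' when nothing was found.

    Assumes the first entry of textAnnotations holds the whole text.
    """
    responses = response_json.get("responses", [])
    if not responses:
        return ""
    annotations = responses[0].get("textAnnotations", [])
    return annotations[0].get("description", "") if annotations else ""


def ocr_image(path, api_key):
    """Send one image file to the API and return its detected text."""
    import requests  # third-party; imported here so the helpers above work without it

    with open(path, "rb") as f:
        body = build_ocr_request(f.read())
    resp = requests.post(VISION_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    return extract_text(resp.json())
```

Calling `ocr_image("signs.jpg", "YOUR_API_KEY")` (both arguments are placeholders) would return the recognized text as one string, which is enough for the "identify text anywhere in an image" use case but not for rebuilding table structure.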

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

###### 1. A low-resolution photo of road signs

View Makefile
# Hello, and welcome to makefile basics.
# You will learn why `make` is so great, and why, despite its "weird" syntax,
# it is actually a highly expressive, efficient, and powerful way to build
# programs.
# Once you're done here, go to
# to learn SOOOO much more.
Asparagirl / gist:c2f710724232f76187b3
Last active Nov 25, 2018
Grab a website with wpull and PhantomJS
View gist:c2f710724232f76187b3

Grab a website with wpull and PhantomJS

export USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
# this one can be regex, or you can leave it out, whatever
export THINGS_TO_IGNORE="ignore-this,other-thing-to-ignore"
export WARC_NAME="Example.com_-_2014-10-15"
# these two are needed in case wpull quits or chokes and we need to restart where we left off
sevastos /
Last active Feb 6, 2016
Boot2Docker (VirtualBox) MongoDB volume filesystem issue


I was using Boot2Docker 1.2 (OSX) and wanted to use a volume for MongoDB. At first nothing happened, because 1.2 has no Guest Additions and volumes don't work without them. There is a workaround: build a boot2docker.iso from master, which has Guest Additions.

But then Mongo didn't like putting data on VirtualBox's shared folders:

[initandlisten] 	WARNING: This file system is not supported. For further information see:
View jquery.js
* jQuery JavaScript Library v2.1.1pre
* Includes Sizzle.js
* Copyright 2005, 2014 jQuery Foundation, Inc. and other contributors
* Released under the MIT license
potiuk / libmemcached.rb
Last active Dec 21, 2015
Install libmemcached with brew in preview (build 13A558) version of OSX Mavericks with XCode 5 Developer Preview 6
View libmemcached.rb
require 'formula'
class Libmemcached < Formula
homepage ''
url ''
sha1 '1023bc8c738b1f5b8ea2cd16d709ec6b47c3efa8'
depends_on 'memcached'
def install
Asparagirl / gist:6206247
Last active May 30, 2020
Have a WARC that you would like to upload to the Internet Archive so that it can eventually be included in their Wayback Machine? Here's how to upload it from the command line.
View gist:6206247

Do you have a WARC file of a website all downloaded and ready to be added to the Internet Archive? Great! You can do that with the Internet Archive's web-based uploader, but it's not ideal and it can't handle really big uploads. Here's how you can upload your WARC files to the IA from the command line, and without worrying about a size restriction.

First, you need to get your Access Key and Secret Key from the Internet Archive for the S3-like API. Here's where you can get that for your IA account: Don't share those with other people!

Here's their documentation file about how to use it, if you need some extra help:
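As a rough illustration of what such an upload involves, here is a Python sketch. It is not the script from this gist. The `s3.us.archive.org` endpoint, the `LOW access:secret` authorization format, and the `x-archive-*` headers are my understanding of the Internet Archive's S3-like API; the function names and the `mediatype` value of `web` are assumptions to check against the documentation.

```python
import os
from urllib.parse import quote

# Assumed endpoint for the Internet Archive's S3-like API.
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload(identifier, filename, access_key, secret_key, title):
    """Return the (url, headers) pair for a PUT of one file into an item."""
    url = "{}/{}/{}".format(IA_S3_ENDPOINT, quote(identifier), quote(filename))
    headers = {
        # Keys are sent as "LOW <access>:<secret>"; never commit real keys.
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Ask the API to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # Assumed metadata values for a web crawl; adjust as needed.
        "x-archive-meta-mediatype": "web",
        "x-archive-meta-title": title,
    }
    return url, headers


def upload_warc(path, identifier, access_key, secret_key, title):
    """Stream one WARC file to the item and return the HTTP status code."""
    import requests  # third-party; imported here so build_upload works without it

    url, headers = build_upload(
        identifier, os.path.basename(path), access_key, secret_key, title
    )
    with open(path, "rb") as f:
        # Passing the file object streams it instead of loading it into memory.
        resp = requests.put(url, data=f, headers=headers)
    resp.raise_for_status()
    return resp.status_code
```

Passing the open file object to `requests.put` streams the body, which matters for the very large WARC files that the web-based uploader can't handle.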

Next, you should copy the following lines to a text file and edit them as needed:
