@dannguyen
dannguyen / README.md
Last active July 29, 2025 14:26
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.
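For orientation, here is a minimal sketch of what an OCR request to Cloud Vision's `images:annotate` REST method can look like, and how the recognized text might be pulled out of the reply. It is not the code from this gist: the helper names `build_ocr_request` and `extract_full_text` are hypothetical, and the sketch assumes the response carries a `textAnnotations` list whose first entry holds the full detected text.

```python
import base64

# Public REST endpoint for the Cloud Vision images:annotate method.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes):
    """Build the JSON body for a TEXT_DETECTION request (hypothetical helper)."""
    # The API expects the raw image bytes as a base64-encoded string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_full_text(response_json):
    """Return the full detected text, or "" if nothing was found (hypothetical helper).

    Assumes the first textAnnotations entry carries the whole text block.
    """
    annotations = response_json["responses"][0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""
```

With Requests, the payload could then be sent with something like `requests.post(VISION_ENDPOINT, params={"key": api_key}, json=build_ocr_request(data))`, where `api_key` is your own credential, and the parsed JSON reply passed to `extract_full_text`.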

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

1. A low-resolution photo of road signs

require 'base64'
require 'open-uri'
require 'net/http'
require 'net/https'
require 'json'
class OCR
attr_reader :api_key, :image_url
def self.scan(api_key:, image_url:)
@wvengen
wvengen / README.md
Last active January 5, 2025 05:20
Ruby memory analysis over time

Finding a Ruby memory leak using a time analysis

When developing a program in Ruby, you may sometimes encounter a memory leak. For a while now, Ruby has had a facility to gather information about what objects are lying around: ObjectSpace.

There are several approaches one can take to debug a leak. This discusses a time-based approach, where a full memory dump is generated every, say, 5 minutes, during a time that the memory leak is showing up. Afterwards, one can look at all the objects, and find out which ones are staying around, causing the

@bsweger
bsweger / useful_pandas_snippets.md
Last active October 6, 2025 13:44
Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
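
The snippet itself is cut off in this preview. As a hedged sketch of the idea (not necessarily the author's exact code), `pd.to_numeric` with its default `errors="raise"` converts a Series and raises on values it cannot parse. The wrapper name `to_numeric_strict` and the sample data are made up for illustration.

```python
import pandas as pd


def to_numeric_strict(series):
    """Convert a Series to a numeric dtype (hypothetical wrapper).

    pd.to_numeric defaults to errors="raise", so a non-numeric value
    raises ValueError instead of being silently coerced.
    """
    return pd.to_numeric(series)


# Illustrative data: numbers stored as strings.
prices = pd.Series(["1", "2", "3.5"])
converted = to_numeric_strict(prices)
print(converted.dtype)
```

If you would rather turn bad values into `NaN` than fail, `pd.to_numeric(series, errors="coerce")` is the usual alternative.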

@staltz
staltz / introrx.md
Last active October 26, 2025 03:06
The introduction to Reactive Programming you've been missing
@traviskaufman
traviskaufman / jasmine-this-vars.md
Last active January 4, 2025 16:49
Better Jasmine Tests With `this`

On the Refinery29 Mobile Web Team, codenamed "Bicycle", all of our unit tests are written using Jasmine, an awesome BDD library written by Pivotal Labs. We recently switched how we set up data for tests, from declaring variables and assigning them in closures to assigning properties to each test case's this object, and we've seen some awesome benefits from doing so.

The old way

Up until recently, a typical unit test for us looked something like this:

describe('views.Card', function() {
@lsauer
lsauer / gist:3741940
Created September 18, 2012 08:07
C# Fullscreen Console on Windows via unmanaged kernel32 calls to SetConsoleDisplayMode
//author: lsauer.com 2012, CC-BY-SA
//description: Fullscreen consoles were common until the last few years.
// Currently Windows XP, and Windows Vista, Windows 7’s Safe Mode allow fullscreen console mode.
// The reason is that the current display driver model does not support VGA text mode programs.
using System;
using System.IO;
using System.Collections.Generic; //for dictionary
using System.Runtime.InteropServices; //for P/Invoke DLLImport
class App
@fcingolani
fcingolani / index.html
Created August 9, 2012 02:16
How to render a full PDF using Mozilla's pdf.js
<html>
<body>
<!-- really dirty! this is just a test drive ;) -->
<script type="text/javascript" src="https://raw.github.com/mozilla/pdf.js/gh-pages/build/pdf.js"></script>
<script type="text/javascript">
function renderPDF(url, canvasContainer, options) {
var options = options || { scale: 1 };
@aozturk
aozturk / HashMap.h
Last active April 2, 2022 00:47
Basic Hash Map (Hash Table) implementation in C++
// Hash map class template
template <typename K, typename V, typename F = KeyHash<K>>
class HashMap {
public:
HashMap() {
// construct zero initialized hash table of size
table = new HashNode<K, V> *[TABLE_SIZE]();
}
~HashMap() {
@tkf
tkf / mplonflask.py
Created October 19, 2011 21:04
matplotlib on Flask
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot
import numpy
from flask import Flask, send_file
from cStringIO import StringIO
app = Flask(__name__)