Skip to content

Instantly share code, notes, and snippets.

View jbest's full-sized avatar

Jason Best jbest

View GitHub Profile
@jakeonrails
jakeonrails / Ruby Notepad Bookmarklet
Created January 29, 2013 18:08
This bookmarklet gives you a code editor in your browser with a single click.
data:text/html, <style type="text/css">#e{position:absolute;top:0;right:0;bottom:0;left:0;}</style><div id="e"></div><script src="http://d1n0x3qji82z53.cloudfront.net/src-min-noconflict/ace.js" type="text/javascript" charset="utf-8"></script><script>var e=ace.edit("e");e.setTheme("ace/theme/monokai");e.getSession().setMode("ace/mode/ruby");</script>
@christianroman
christianroman / training.sh
Last active October 26, 2021 06:04
Tesseract OCR training new font
#! /bin/bash
# build the environment
mkdir tessenv; cd tessenv
TROOT=`pwd`
mkdir $TROOT/stockfonts; mkdir $TROOT/build; mkdir $TROOT/build/eng
echo "Environment built"
# Get the stock english fonts from Google (old, but they work)
cd $TROOT/stockfonts
GET http://tesseract-ocr.googlecode.com/files/boxtiff-2.01.eng.tar.gz > boxtiff-2.01.eng.tar.gz
@robmathers
robmathers / readinglisturls.py
Last active November 16, 2023 21:18
Prints out URLs of items in Safari’s Reading List
#!/usr/bin/env python
import plistlib
from shutil import copy
import subprocess
import os
from tempfile import gettempdir
import sys
import atexit
BOOKMARKS_PLIST = '~/Library/Safari/Bookmarks.plist'

Moved

Now located at https://github.com/JeffPaine/beautiful_idiomatic_python.

Why it was moved

Github gists don't support Pull Requests or any notifications, which made it impossible for me to maintain this (surprisingly popular) gist with fixes, respond to comments and so on. In the interest of maintaining the quality of this resource for others, I've moved it to a proper repo. Cheers!

import urllib
import urllib2
try:
# Base url for the TNRS match_names API
url = 'http://api.opentreeoflife.org/v2/tnrs/match_names'
# Encode data value to be looked up as an array of names:
data = '{"names": ["'+value+'"]}'
print "Looking up: " + data
# Set HTTP headers:
@beezly
beezly / gist:9b2de3749d687fdbff3f
Last active April 19, 2023 13:14
Log Nest temperatures into a Google Spreadsheet. Update the username and password values and create a resource trigger to call "getData" at regular intervals.
function performLogin(email, password) {
var payload = {
"username" : email,
"password" : password
};
var options = {
"method" : "post",
"payload" : payload
};
@dannguyen
dannguyen / README.md
Last active July 6, 2024 16:36
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide it at the word or region level, which would be needed to then calculate the data delimiters.

On the other hand, the OCR quality is pretty good, if you just need to identify text anywhere in an image, without regards to its physical coordinates. I've included two examples:

####### 1. A low-resolution photo of road signs

import collections
import math
import os
import cv2
import numpy as np
import time
MAX_LINES = 4000
N_PINS = 36*8
MIN_LOOP = 20 # To avoid getting stuck in a loop
@joelbarmettlerUZH
joelbarmettlerUZH / image_classifier.py
Created October 5, 2019 09:12
Tkinter Image classification GUI
from os import listdir, rename
from os.path import isfile, join
import tkinter as tk
from PIL import ImageTk, Image
IMAGE_FOLDER = "./images/unclassified"
images = [f for f in listdir(IMAGE_FOLDER) if isfile(join(IMAGE_FOLDER, f))]
unclassified_images = filter(lambda image: not (image.startswith("0_") or image.startswith("1_")), images)
current = None
@lwrubel
lwrubel / captions-shift.py
Last active June 26, 2024 13:24
Shift timestamps for captions in VTT files. Usage: python3 captions-shift.py my_captions.vtt my_edited_captions.vtt -2.25
#!/usr/bin/env python3
import webvtt
from datetime import datetime, timedelta
import argparse
parser = argparse.ArgumentParser(description="Shift caption start \
and end times in a .vtt file")
parser.add_argument("inputfile", help="input filename, must be VTT format")
parser.add_argument("outputfile", help="output filename")