Qiang Kou (KK) thirdwing

thirdwing / README.md
Created March 24, 2016 20:19 — forked from dannguyen/README.md
Using Google Cloud Vision API to OCR scanned documents to extract structured data

Using Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which is what you would need to calculate the data delimiters.

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

###### 1. A low-resolution photo of road signs
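For a photo like this one, the request could be sketched roughly as follows. This is a minimal sketch, not the code from the original test: the helper names (`build_ocr_request`, `extract_full_text`, `ocr_image`), the image path, and the API key are all hypothetical. It assumes the public `images:annotate` REST endpoint with a `TEXT_DETECTION` feature, and that the first entry of `textAnnotations` carries the full recognized text.

```python
import base64

# Public REST endpoint for Cloud Vision v1 image annotation.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes, max_results=1):
    """Build the JSON body for a single TEXT_DETECTION request (hypothetical helper)."""
    return {
        "requests": [
            {
                # The API expects the raw image bytes as a base64 string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def extract_full_text(response_json):
    """Return the text of the first annotation, or "" if no text was found."""
    annotations = response_json["responses"][0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""


def ocr_image(path, api_key):
    """Send one image file to the API and return the recognized text (needs network)."""
    import requests  # third-party; imported here so the helpers above work without it

    with open(path, "rb") as f:
        body = build_ocr_request(f.read())
    resp = requests.post(ENDPOINT, params={"key": api_key}, json=body)
    resp.raise_for_status()
    return extract_full_text(resp.json())
```

Calling `ocr_image("road_signs.jpg", "YOUR_API_KEY")` (both values are placeholders) would return whatever text the service finds, as one string with no layout information.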

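#!/bin/python
from collections import OrderedDict

def memo(f, k):
    # Cache of at most k results, kept in insertion order
    cache = OrderedDict()
    def memoized(n):
        if n not in cache:
            cache[n] = f(n)
            if len(cache) > k:
                # Evict the oldest entry; cache.keys()[0] only works on Python 2
                cache.popitem(last=False)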
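The snippet above stops before the wrapper returns anything. A complete version might look like the sketch below. The two `return` statements and the `value` variable are assumptions about how the original ends, and `popitem(last=False)` stands in for the Python 2 only `cache.keys()[0]` lookup.

```python
from collections import OrderedDict


def memo(f, k):
    """Wrap f so that at most k results are cached, evicting the oldest first."""
    cache = OrderedDict()

    def memoized(n):
        if n in cache:
            return cache[n]
        value = f(n)
        cache[n] = value
        if len(cache) > k:
            # Drop the entry that was inserted first (FIFO eviction).
            cache.popitem(last=False)
        # Return the local value so the call still works if n was just evicted.
        return value

    return memoized
```

Usage is `fast = memo(slow_function, 100)`, where `slow_function` is any one-argument function with hashable input. Eviction here is by insertion order; if least-recently-used eviction is wanted instead, the standard library's `functools.lru_cache` covers that case.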
require(RCurl)
require(XML)
webpage <- getURL("https://en.wikipedia.org/wiki/N-gram")
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
pagetree <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)
x <- xpathSApply(pagetree, "//*/p", xmlValue)
x <- unlist(strsplit(x, "\n"))
x <- gsub("\t","",x)
x <- sub("^[[:space:]]*(.*?)[[:space:]]*$", "\\1", x, perl=TRUE)
x <- x[!(x %in% c("", "|"))]
// boost graph serialization example
// g++ boost_graph_serialize.cpp -lboost_serialization -o test
#include <iostream>
#include <string>
#include <fstream>
#include <set>
thirdwing / ls_env.R
Created October 12, 2015 17:49
ls functions
find.funs <- function(pos = 1, ..., exclude.mcache = TRUE, mode = 'function') {
  findo <- function(pos2) {
    o <- named(lsall(pos = pos2, ...))
    if (!length(o))
      return(character(0))
    # keep if exists
    keep <- sapply(o, exists, where = pos2, mode = mode, inherits = FALSE)
    if (!any(keep))
CFLAGS += -O3 -Wall -pipe -pedantic
CXXFLAGS += -O3 -Wall -pipe -Wno-unused -pedantic
VER=-4.8
CCACHE=ccache
CC=$(CCACHE) gcc$(VER)
CXX=$(CCACHE) g++$(VER)
SHLIB_CXXLD=g++$(VER)
FC=$(CCACHE) gfortran$(VER)
F77=$(CCACHE) gfortran$(VER)
MAKE=make -j8
# Make mouse useful
setw -g mouse on
# Allow xterm titles in terminal window, terminal scrolling with scrollbar, and setting overrides of C-Up, C-Down, C-Left, C-Right
# (commented out because it disables cursor navigation in vim)
#set -g terminal-overrides "xterm*:XT:smcup@:rmcup@:kUP5=\eOA:kDN5=\eOB:kLFT5=\eOD:kRIT5=\eOC"
# Scroll History
set -g history-limit 30000
thirdwing / .vimrc
Last active December 29, 2018 23:20
set backspace=indent,eol,start
colorscheme default
set nocompatible " be iMproved, required
filetype off " required
" set the runtime path to include Vundle and initialize
set rtp+=~/.vim/bundle/Vundle.vim
call vundle#begin()
" alternatively, pass a path where Vundle should install plugins
"call vundle#begin('~/some/path/here')

# mlpack on Windows

mlpack, while not designed with Windows as a specific target, can still be built and run on Windows with some configuration. This document details the steps necessary to get mlpack compiled using MinGW64.

# Prerequisites

  • MinGW64: The compiler we use. Add it to your PATH.
  • boost: The boost 1.56 static library, which I compiled with MinGW64. A static library is the better choice on Windows.
  • Armadillo
  • BLAS and LAPACK: Static BLAS and LAPACK libraries, which Armadillo needs. Compiled with MinGW64.
// callback4.cpp - C++11 Lambda Callback
// To build:
// g++ -std=c++11 callback4.cpp
// Situation: A "Caller" class allows another class "Callee"
// to connect to it via callback. How to implement this?
// A C++11 lambda function can be used.