Doychin Atanasov ironsmile

@dannguyen
dannguyen / README.md
Last active May 17, 2024 02:07
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which is what you would need to calculate the data delimiters.
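
For orientation, here is a minimal sketch of what such a request might look like against the Cloud Vision REST endpoint (`images:annotate` with a `TEXT_DETECTION` feature). The helper names (`build_ocr_request`, `extract_annotations`, `ocr_image`) and the `api_key` parameter are illustrative assumptions, not code from the original test script.

```python
import base64

# REST endpoint for Cloud Vision's images:annotate method.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes):
    """Build the JSON body for a single TEXT_DETECTION request."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded bytes.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_annotations(response_json):
    """Return (description, vertices) pairs from a parsed API response."""
    annotations = response_json["responses"][0].get("textAnnotations", [])
    return [(a["description"], a["boundingPoly"]["vertices"]) for a in annotations]


def ocr_image(path, api_key):
    """Hypothetical helper: POST an image file and return its annotations."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only

    with open(path, "rb") as f:
        body = build_ocr_request(f.read())
    resp = requests.post(VISION_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    return extract_annotations(resp.json())
```

Calling `ocr_image("signs.jpg", api_key)` (both arguments are placeholders) would return whatever text annotations the API reports, each paired with its bounding polygon vertices.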

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

###### 1. A low-resolution photo of road signs

@wang-steven
wang-steven / netflix_vector_ubuntu_trusty.md
Last active August 29, 2015 14:19
Netflix’s Vector on Ubuntu 14.04

Taking Netflix’s Vector (Performance Monitoring Tool) For A Spin

install pcp pcp-webapi

curl 'https://bintray.com/user/downloadSubjectPublicKey?username=netflixoss' | sudo apt-key add -
echo "deb https://dl.bintray.com/netflixoss/ubuntu trusty main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install pcp pcp-webapi

start pcp service

@debasishg
debasishg / gist:8172796
Last active May 10, 2024 13:37
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet.
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
@trongthanh
trongthanh / gist:1196596
Created September 6, 2011 04:37
Emulate slow Internet connection speed on localhost with netem (Ubuntu)
#Refer: http://www.linuxfoundation.org/collaborate/workgroups/networking/netem#Delaying_only_some_traffic
#Refer: http://www.bomisofmab.com/blog/?p=100
#Refer: http://drija.com/linux/41983/simulating-a-low-bandwidth-high-latency-network-connection-on-linux/
#Set up the rate control and delay (note: in tc, "kbps" means kilobytes per second; "kbit" is kilobits per second)
sudo tc qdisc add dev lo root handle 1: htb default 12
sudo tc class add dev lo parent 1:1 classid 1:12 htb rate 56kbps ceil 128kbps
sudo tc qdisc add dev lo parent 1:12 netem delay 200ms
#Remove the rate control/delay
sudo tc qdisc del dev lo root