Skip to content

Instantly share code, notes, and snippets.

@dannguyen
dannguyen / README.md
Last active May 17, 2024 02:07
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide it at the word or region level, which would be needed to then calculate the data delimiters.

On the other hand, the OCR quality is pretty good, if you just need to identify text anywhere in an image, without regards to its physical coordinates. I've included two examples:

####### 1. A low-resolution photo of road signs

@FrancesCoronel
FrancesCoronel / sampleREADME.md
Last active March 26, 2024 01:21
A sample README for all your GitHub projects.

Repository Title Goes Here

Frances Coronel

INSERT GRAPHIC HERE (include hyperlink in image)

Subtitle or Short Description Goes Here

ideally one sentence >

@georgiana-gligor
georgiana-gligor / osx-pdf-from-markdown.markdown
Last active March 5, 2024 21:09
Markdown source for the "Create PDF files from Markdown sources in OSX" article

Create PDF files from Markdown sources in OSX

When [Markdown][markdown] appeared more than 10 years ago, it aimed to make it easier to express ideas in an easy-to-write plain text format. It offers a simple syntax that takes the writer focus away from the formatting, thus giving her time to focus on the actual content.

The market abunds of editors to be used for help with markdown. After a few attempts, I settled to Sublime and its browser preview plugin, which work great for me and have a small memory footprint to accomplish that. To pass the results around to other people, less technical, a markdown file and a bunch of images is not the best approach, so converting it to a more robust format like PDF seems like a much better choice.

[Pandoc][pandoc] is the swiss-army knife of converting documents between various formats. While being able to deal with heavy-weight formats like docx and epub, we will need it for the more lightweight markdown. To be able to generate PDF files, we need LaTeX. On OSX, the s

@kachayev
kachayev / concurrency-in-go.md
Last active March 11, 2024 11:27
Channels Are Not Enough or Why Pipelining Is Not That Easy
@madan712
madan712 / WindowsProcessKiller.java
Created October 31, 2013 08:44
Java program to kill a runnning windows process
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class WindowsProcessKiller {
// command used to get list of running task
private static final String TASKLIST = "tasklist";
// command used to kill a task
private static final String KILL = "taskkill /IM ";
@pamelafox
pamelafox / personalize.js
Created October 12, 2011 00:00
Javascript grammatical personalization library
/**
* A function that takes a template and grammatical options ('gender', 'person', 'name') and returns the computed string.
* See below for examples.
*
* See wikipedia for more on grammar:
* http://en.wikipedia.org/wiki/English_personal_pronouns
* http://en.wikipedia.org/wiki/Grammatical_conjugation
*/
function personalize(template, options) {
var GENDERS = ['neutral', 'female', 'male'];
@insin
insin / index.html
Last active February 8, 2024 15:14
Export a <table> to Excel - http://bl.ocks.org/insin/1031969
<!DOCTYPE html>
<html>
<head>
<title>tableToExcel Demo</title>
<script src="tableToExcel.js"></script>
</head>
<body>
<h1>tableToExcel Demo</h1>
<p>Exporting the W3C Example Table</p>