John DeBovis debovis

{
"processorName": "contract-processor1",
"versionName": "pretrained-contract-processor-2022-10-07-104243",
"isFuzzyMatch": true,
"createTime": "2022-10-07T07:18:16.349388Z",
"metrics": {
"All labels": {
"f1Score": 0.7778220981252493,
"precision": 0.7751937984496124,
"recall": 0.7804682809685811,
MEETING OF THE
MARYLAND BOAT ACT ADVISORY COMMITTEE
* * * * * * * *
The above-entitled matter came on for meeting on Tuesday, August 21st, 2013 commencing at 10:00 a.m., at the Kent Island Yacht Club, Chester, Maryland, Coles Marsh, committee chairman,
@debovis
debovis / gist:e9acb3d5d652c1928f09a161f136345b
Created March 2, 2018 15:56 — forked from madis/gist:4650014
Testing CORS OPTIONS request with curl
curl \
--verbose \
--request OPTIONS \
http://localhost:3001/api/configuration/visitor \
--header 'Origin: http://localhost:9292' \
--header 'Access-Control-Request-Headers: Origin, Accept, Content-Type' \
--header 'Access-Control-Request-Method: GET'
# http://nils-blum-oeste.net/cors-api-with-oauth2-authentication-using-rails-and-angularjs/#.UQJeLkp4ZyE
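The same preflight can be sketched in Python with only the standard library. This is an illustration under assumptions, not part of the original gist: the helper names `build_preflight` and `preflight_allows` are made up, and the check is simplified (it ignores credentials, allowed headers, and the CORS rule that some methods never need to be listed).

```python
from urllib.request import Request


def build_preflight(url, origin, method="GET",
                    request_headers=("Origin", "Accept", "Content-Type")):
    """Build (but do not send) an OPTIONS request mirroring the curl flags above."""
    return Request(
        url,
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": method,
            "Access-Control-Request-Headers": ", ".join(request_headers),
        },
    )


def preflight_allows(response_headers, origin, method):
    """Simplified check: does the preflight response permit `method` from `origin`?"""
    headers = {k.lower(): v for k, v in response_headers.items()}
    allowed_origin = headers.get("access-control-allow-origin", "")
    allowed_methods = {
        m.strip().upper()
        for m in headers.get("access-control-allow-methods", "").split(",")
    }
    return allowed_origin in ("*", origin) and method.upper() in allowed_methods


req = build_preflight(
    "http://localhost:3001/api/configuration/visitor",
    origin="http://localhost:9292",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response's headers can then be turned into a dict and handed to `preflight_allows`.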
debovis / package.json
Last active January 16, 2024 14:13
How to debug gatsby and reactjs with webstorm
{
  "name": "project-name",
  "version": "1.0.0",
  "description": "",
  "main": "n/a",
  "scripts": {
    "serve": "gatsby develop -p 5000",
    "dev": "node $NODE_DEBUG_OPTION ./node_modules/.bin/gatsby develop -p 5000"
  }
}
debovis / docker-compose.yml
Created August 9, 2017 20:43
ES docker compose 2.4.1
version: '2'
services:
  elasticsearch:
    image: library/elasticsearch:2.4.1
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "http.host=0.0.0.0"
      - "transport.host=127.0.0.1"
    ports:
      - "9200:9200"
debovis / recover_source_code.md
Created March 12, 2017 12:03 — forked from simonw/recover_source_code.md
How to recover lost Python source code if it's still resident in-memory

How to recover lost Python source code if it's still resident in-memory

I screwed up using git ("git checkout --" on the wrong file) and managed to delete the code I had just written... but it was still running in a process in a docker container. Here's how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container

Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb
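Recovery is possible at all because a running Python process keeps the compiled code object of every loaded function in memory, even after the source file is gone. The sketch below only illustrates that idea with the standard library; it is not the pyrasite or uncompyle6 workflow, and the function `example` and the helper names are made up for illustration.

```python
import marshal
import types


def snapshot_code(func):
    """Serialize a live function's compiled code object to bytes."""
    return marshal.dumps(func.__code__)


def restore_function(blob):
    """Rebuild a callable from bytes produced by snapshot_code."""
    code = marshal.loads(blob)
    # An empty globals dict is enough here because `example` uses no globals.
    return types.FunctionType(code, {})


def example(x):
    return x * 2 + 1


blob = snapshot_code(example)      # bytes that could be written to disk
rebuilt = restore_function(blob)   # works without the original source text
print(rebuilt.__code__.co_name, rebuilt(3))
```

A decompiler works from this kind of compiled code object to reconstruct readable source; `marshal` data is only portable between identical Python versions.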
debovis / README.md
Created September 12, 2016 19:02 — forked from dannguyen/README.md
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.
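For orientation, here is a rough sketch of what a text-detection request body and a bounding polygon look like. It assumes the public `images:annotate` REST endpoint and its `TEXT_DETECTION` feature; the helper names and the sample vertices are hypothetical, and nothing is sent over the network.

```python
import base64
import json

# Public REST endpoint; an API key or OAuth token is also required (not shown).
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes, max_results: int = 1) -> str:
    """Return the JSON body for a single-image TEXT_DETECTION request."""
    payload = {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION", "maxResults": max_results}],
            }
        ]
    }
    return json.dumps(payload)


def polygon_to_box(vertices):
    """Collapse a boundingPoly vertex list into (min_x, min_y, max_x, max_y).

    Missing coordinates are treated as 0 (an assumption about how
    zero-valued fields are serialized).
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


body = build_ocr_request(b"fake image bytes")
# Hypothetical vertices, shaped like a boundingPoly in a response.
box = polygon_to_box([{"x": 10, "y": 5}, {"x": 50, "y": 5},
                      {"x": 50, "y": 20}, {"x": 10, "y": 20}])
print(box)
```

With boxes like this for each cell of a table, column delimiters could be inferred from gaps between the x-ranges, which is why the granularity of the polygons matters.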

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

###### 1. A low-resolution photo of road signs

#!/usr/bin/python
def poorMansConvert(di, inPath, outType, outPath):
    from apiclient.http import MediaFileUpload
    valid_output = [
        'text/html','text/plain','application/rtf','application/vnd.oasis.opendocument.text',\
        'application/pdf','application/vnd.openxmlformats-officedocument.wordprocessingml.document',\
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet','application/x-vnd.oasis.opendocument.spreadsheet',\
        'image/jpeg','image/png','image/svg+xml','application/vnd.openxmlformats-officedocument.presentationml.presentation'
from sklearn.metrics import classification_report
import matplotlib.pylab as plt
import numpy as np

class ts_classifier(object):
    def __init__(self,plotter=False):
        '''
        preds is a list of predictions that will be made.
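# List every route registered on an existing Flask application `app`.
from urllib.parse import unquote  # Python 3 location of urllib.unquote

with app.app_context():
    output = []
    for rule in app.url_map.iter_rules():
        methods = ','.join(rule.methods)
        line = unquote("{:50s} {:20s} {}".format(rule.endpoint, methods, rule))
        output.append(line)
    for line in sorted(output):
        print(line)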