Chris Zubak-Skees (chriszs)

@dannguyen
dannguyen / README.md
Last active May 17, 2024 02:07
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.
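For context, here is a minimal sketch of the kind of request such a test makes with Requests against the Vision API's `images:annotate` REST endpoint. It assumes an API key in a `GOOGLE_API_KEY` environment variable and a local image file; the file name is a placeholder and the gist's actual script may differ.

```python
import base64
import os

import requests

API_URL = "https://vision.googleapis.com/v1/images:annotate"

def ocr_image(path, api_key):
    """Send one image to Cloud Vision's TEXT_DETECTION feature and return the response JSON."""
    with open(path, "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "requests": [
            {
                "image": {"content": content},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }
    resp = requests.post(API_URL, params={"key": api_key}, json=payload)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # "scanned-table.jpg" is a placeholder file name, not from the gist
    result = ocr_image("scanned-table.jpg", os.environ["GOOGLE_API_KEY"])
    # fullTextAnnotation.text is the flattened OCR text for the whole image
    print(result["responses"][0]["fullTextAnnotation"]["text"])
```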

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

#### 1. A low-resolution photo of road signs

@veltman
veltman / README.md
Created October 10, 2016 16:08
Geosupport w/ JS and node-ffi

Geocoding 10,000 addresses a second with NYC's Geosupport library and Node FFI

Following on Chris Whong's excellent writeup of how to make calls directly to NYC's Geosupport client and this first attempt at generalizing it, here's a way that let me geocode about 10,000 addresses a second on Ubuntu using Node FFI.

Note: this assumes Ubuntu - other Linux distributions will probably work but may need adjustments.

First, install the basics:

# Update package lists, then install Node and unzip (if needed)
sudo apt-get update && sudo apt-get install -y nodejs npm unzip
@mbostock
mbostock / .block
Last active November 13, 2016 21:45
U.S. Atlas, Redux [UNLISTED]
license: bsd-3-clause
@duner
duner / README.md
Last active April 28, 2022 19:48
Twitter Archive to JSON

If you download your personal Twitter archive, you don't quite get the data as JSON, but as a series of .js files, one for each month (these are meant to replicate the Twitter API responses for the front-end part of the downloadable archive).

But if you want to use the data in those files (which is far richer than the CSV data) for analysis or an app, just run this script.

Run sh ./twitter-archive-to-json.sh in the same directory as the /tweets folder that comes with the archive download, and you'll get two files:

  • tweets.json — a JSON list of the objects
  • tweets_dict.json — a JSON dictionary where each Tweet's key is its id_str

You'll also get a /json-tweets directory which has the individual JSON files for each month of tweets.
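For comparison, here is a rough Python equivalent of that conversion. This is not the gist's shell script; it assumes the classic archive layout in which each monthly .js file under /tweets contains a JSON array preceded by a single JavaScript assignment on its first line.

```python
import glob
import json

tweets = []
for path in sorted(glob.glob("tweets/*.js")):
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    # Drop the leading "Grailbird.data.tweets_YYYY_MM =" assignment,
    # keeping just the JSON array that follows it.
    tweets.extend(json.loads(raw[raw.index("["):]))

# tweets.json: a flat JSON list of tweet objects
with open("tweets.json", "w", encoding="utf-8") as f:
    json.dump(tweets, f, indent=2)

# tweets_dict.json: a dictionary keyed by each tweet's id_str
with open("tweets_dict.json", "w", encoding="utf-8") as f:
    json.dump({t["id_str"]: t for t in tweets}, f, indent=2)
```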

@thomaswilburn
thomaswilburn / index.js
Last active July 22, 2017 21:23
ASP page scraper with comments
// Built-in modules
var fs = require("fs");
var url = require("url");
// Loaded from NPM
var csv = require("csv"); // CSV parsing and stringifying
var $ = require("cheerio"); // jQuery-like DOM library
var async = require("async"); // Easier concurrency utils
var request = require("request"); // Make HTTP requests simply
@emanuelfeld
emanuelfeld / gi-lf
Last active April 24, 2017 14:26
As a pre-commit script, automatically add files larger than a given size to your repository's .git/info/exclude file
#!/bin/bash
# Save as .git/hooks/pre-commit (and make it executable) to run before every commit.
# set max file size to include (in MB)
max_size_mb=100
max_size_b="$(($max_size_mb * 1000000))c"
git_dir="$(git rev-parse --show-toplevel)"
git_exclude=$git_dir/.git/info/exclude
files="$(find $git_dir -path $git_dir/.git -prune -o -type f -size +$max_size_b -print | sed "s%$git_dir/%%g" | sed "s/\ /\\\ /g")"
# Append any oversized files that aren't already listed in .git/info/exclude
echo "$files" | while read -r f; do
    [ -n "$f" ] && ! grep -qxF "$f" "$git_exclude" && echo "$f" >> "$git_exclude"
done
# Never block the commit itself
exit 0
from collections import Counter

import pandas as pd

df = pd.read_hdf('training.h5')
g = df.groupby('slug')

def get_sample(slug):
    # return all rows for one slug (.loc replaces the long-deprecated .ix)
    return df.loc[g.groups[slug]]
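Usage is then just a lookup by slug (the 'some-slug' value here is a placeholder, not from the original snippet):

```python
sample = get_sample('some-slug')  # all training rows for that slug
print(len(sample), 'rows')
```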
@tmcw
tmcw / optimization.md
Last active February 14, 2021 14:38
Optimization

Correctly prioritizing and targeting performance problems and optimization opportunities is one of the hardest things to master in programming. There are a lot of ways to do it wrong: by prematurely optimizing non-bottlenecks, or preferring fast solutions to clear solutions, or measuring problems incorrectly.

I'll try to summarize what I've learned about doing this right.

First, don't optimize until there's an issue. And issues should be defined as application issues: performance problems that are either detectable by the users (lag) or endanger the platform – i.e. problems that cause downtime, like out-of-memory issues. Until there's an issue, don't think about performance at all: just solve the problem at hand, which is "creating value for the end-user," or some less-corporate translation of the same.

Second, only optimize with instruments. By instruments, I mean technology that lets you decipher which sub-part of the stack is the bottleneck. Let's say you see slowness around fetching…
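To make "optimize with instruments" concrete, here is a small illustration (mine, not part of the gist) using Python's built-in cProfile to see where the time actually goes before touching any code:

```python
import cProfile
import pstats

def slow_parse(lines):
    # Deliberately wasteful: builds the result by repeated string concatenation
    out = ""
    for line in lines:
        out += line.upper()
    return out

def run():
    lines = ["x" * 50 for _ in range(20000)]
    slow_parse(lines)

# Profile the call and print the functions where time was actually spent,
# rather than guessing at the bottleneck.
cProfile.run("run()", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(5)
```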

@sindresorhus
sindresorhus / esm-package.md
Last active June 9, 2024 17:19
Pure ESM package

The package that linked you here is now pure ESM. It cannot be require()'d from CommonJS.

This means you have the following choices:

  1. Use ESM yourself. (preferred)
    Use import foo from 'foo' instead of const foo = require('foo') to import the package. You also need to put "type": "module" in your package.json and more. Follow the guide below.
  2. If the package is used in an async context, you could use await import(…) from CommonJS instead of require(…).
  3. Stay on the existing version of the package until you can move to ESM.