Oliver oliver006

## things-i-believe.md

      
stettix / things-i-believe.md (1 file, 38 forks, 44 comments, 758 stars; last active July 10, 2024 23:00)

Things I believe

This is a collection of the things I believe about software development. I have worked for years building backend and data processing systems, so read the below within that context.
Agree? Disagree? Feel free to let me know at @JanStette. See also my blog at www.janvsmachine.net.
Fundamentals

Keep it simple, stupid. You ain't gonna need it.

  
## attributes.rb
default['sshd']['sshd_config']['AuthenticationMethods'] = 'publickey,keyboard-interactive:pam'
default['sshd']['sshd_config']['ChallengeResponseAuthentication'] = 'yes'
default['sshd']['sshd_config']['PasswordAuthentication'] = 'no'

## gzip.go
package main

import (
        "net/http"
        "compress/gzip"
        "io/ioutil"
        "strings"
        "sync"
        "io"
)

## README.md

      
dannguyen / README.md (2 files, 69 forks, 9 comments, 406 stars; last active July 6, 2024 16:36)

Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.
The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.
On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:
####### 1. A low-resolution photo of road signs
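
A request for a photo like this one could be sketched as below. This is a minimal sketch, not the code from this gist: it uses the standard library's `urllib` in place of Requests so that it has no third-party dependency, and the function names (`build_request_body`, `extract_text`, `ocr_image`) are hypothetical. It assumes the public `images:annotate` REST endpoint called with an API key, and the `TEXT_DETECTION` feature.

```python
import base64
import json
import urllib.request

# Public REST endpoint for Cloud Vision image annotation.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request_body(image_bytes, max_results=1):
    """Wrap raw image bytes in the JSON body that images:annotate expects."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def extract_text(response_json):
    """Return the description of the first text annotation, or "" if none."""
    responses = response_json.get("responses", [])
    if not responses:
        return ""
    annotations = responses[0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""


def ocr_image(path, api_key):
    """POST one image file to the API (needs network access and a valid key)."""
    with open(path, "rb") as f:
        body = json.dumps(build_request_body(f.read())).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT + "?key=" + api_key,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_text(json.load(response))
```

With a valid key, `ocr_image("road-signs.jpg", api_key)` would return the detected text as a single string; both arguments here are placeholders.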

  
## frag32.py
#!/usr/bin/env python

import random
import struct
import sys

# Most of the Fat32 class was cribbed from https://gist.github.com/jonte/4577833

def ppNum(num):
  return "%s (%s)" % (hex(num), num)

## 1-react-websockets-reflux.md

      
danawoodman / 1-react-websockets-reflux.md (5 files, 8 forks, 2 comments, 52 stars; last active September 15, 2021 14:48)

Using WebSockets with Reflux and React

WebSockets + Reflux + React

Using WebSockets, React and Reflux together can be a beautiful thing, but the initial setup can be a bit of a pain. The examples below attempt to offer one (arguably enjoyable) way to use these tools together.
Overview

This trifecta works well if you think of things like so:

Reflux Store: The store fetches, updates and persists data. A store can be a list of items or a single item. Most of the state you would otherwise reach for this.state to hold in React should instead live within stores. Stores can listen to other stores as well as to events being fired.
Reflux Actions: Actions are triggered by components when the component wants to change the state of the store. A store listens to actions and can listen to more than one set of actions.
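
The store/action relationship described above can be illustrated with a small language-neutral sketch, written here in Python. It is not the Reflux API: the `Action` and `ItemStore` classes and the `add_item` / `remove_item` names are hypothetical, and the subscriber callback stands in for a component re-rendering.

```python
class Action:
    """A callable event: components call it, stores listen to it."""

    def __init__(self):
        self._listeners = []

    def listen(self, callback):
        self._listeners.append(callback)

    def __call__(self, *args):
        # Notify every listener that the action was triggered.
        for callback in list(self._listeners):
            callback(*args)


class ItemStore:
    """Owns a list of items and changes it only in response to actions."""

    def __init__(self, add_action, remove_action):
        self.items = []
        self._subscribers = []
        # One store can listen to more than one action.
        add_action.listen(self._on_add)
        remove_action.listen(self._on_remove)

    def subscribe(self, callback):
        """Register a callback (e.g. a component) to receive state updates."""
        self._subscribers.append(callback)

    def _on_add(self, item):
        self.items.append(item)
        self._notify()

    def _on_remove(self, item):
        if item in self.items:
            self.items.remove(item)
            self._notify()

    def _notify(self):
        # Hand subscribers a copy so they cannot mutate store state directly.
        snapshot = list(self.items)
        for callback in self._subscribers:
            callback(snapshot)


add_item = Action()
remove_item = Action()
store = ItemStore(add_item, remove_item)

seen = []                     # stands in for a component's render calls
store.subscribe(seen.append)

add_item("milk")
add_item("eggs")
remove_item("milk")
```

The point of the design is that the component never edits the list itself: it triggers an action, the store applies the change, and the component is handed the new state.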


## cloud-config.yml
#cloud-config

coreos:
  etcd:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/<token>
    # multi-region deployments, multi-cloud deployments, and droplets without
    # private networking need to use $public_ipv4
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001

## gist:5998034874b119fab0e4

      
filipbec / gist:5998034874b119fab0e4 (1 file, 9 forks, 1 comment, 42 stars; created September 5, 2014 12:31)

Scannr - Keys for obtaining US Driver's license data

Keys for obtaining US Driver's license data

The standard for US driver's licenses defines 9 different barcode standards (AAMVA versions), with over 80 different fields encoded inside a barcode. Some fields exist in all barcode standards; others exist only in some. To standardize the API, we have structured the fields into the following sections:

- Determining AAMVA version
- Keys existing on all barcode versions
  - Mandatory values
    - Personal data
    - License data
  - Optional values


## gist:8172796

      
debasishg / gist:8172796 (1 file, 404 forks, 23 comments, 1645 stars; last active July 5, 2024 11:53)

A collection of links for streaming algorithms and data structures

General Background and Overview


Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&amp;rep=rep1&amp;t