Skip to content

Instantly share code, notes, and snippets.

View wanghaisheng's full-sized avatar
🏠
Working from home

HeisenBerg? wanghaisheng

🏠
Working from home
View GitHub Profile
@seanh
seanh / formatFilename.py
Created April 11, 2009 18:30
Turn any string into a valid filename in Python.
def format_filename(s):
"""Take a string and return a valid filename constructed from the string.
Uses a whitelist approach: any characters not present in valid_chars are
removed. Also spaces are replaced with underscores.
Note: this method may produce invalid filenames such as ``, `.` or `..`
When I use this method I prepend a date string like '2009_01_15_19_46_32_'
and append a file extension like '.txt', so I avoid the potential of using
an invalid filename.
@mnot
mnot / uri_validate.py
Last active March 3, 2022 14:32
uri_validate.py: Validation regex for URIs, URI references, and relative URIs
#!/usr/bin/env python
"""
Regex for URIs
These regex are directly derived from the collected ABNF in RFC3986
(except for DIGIT, ALPHA and HEXDIG, defined by RFC2234).
Additional regex are defined to validate the following schemes according to
their respective specifications:
i
me
my
myself
we
our
ours
ourselves
you
your
@entaroadun
entaroadun / gist:1653794
Created January 21, 2012 20:10
Recommendation and Ratings Public Data Sets For Machine Learning

Movies Recommendation:

Music Recommendation:

@jbenet
jbenet / simple-git-branching-model.md
Last active June 17, 2024 14:53
a simple git branching model

a simple git branching model (written in 2013)

This is a very simple git workflow. It (and variants) is in use by many people. I settled on it after using it very effectively at Athena. GitHub does something similar; Zach Holman mentioned it in this talk.

Update: Woah, thanks for all the attention. Didn't expect this simple rant to get popular.

@acdha
acdha / custom-log-filtering-and-formatting.py
Created February 26, 2014 21:19
Example of how to filter or apply custom formatting using Python's logging library
#!/usr/bin/env python
# encoding: utf-8
from pprint import pformat, pprint
import logging
class PasswordMaskingFilter(logging.Filter):
"""Demonstrate how to filter sensitive data:"""
@jmandel
jmandel / design.md
Last active January 8, 2020 18:19
Desiging a framework for triggered notifications in FHIR

Proposal: Triggered Notifications

Use case: subscribing to specific lab observations

FHIR offers a REST API that lets clients search for resources on demand. Separately, there is a Messaging API that allows notifications to be "pushed" from one place to another. But neither API provides a clean solution to a common set of real-world "triggering" or notification-type requirements.

For example, let's say Mt. Auburn Hospital's Mother and Infant Unit wants to

@moshekaplan
moshekaplan / TextHandler.py
Last active February 25, 2024 01:33
A Logging Handler that allows logging to a Tkinter Text Widget
#!/usr/bin/env python
# Built-in modules
import logging
import Tkinter
import threading
class TextHandler(logging.Handler):
"""This class allows you to log to a Tkinter Text or ScrolledText widget"""
def __init__(self, text):
@loderunner
loderunner / 01-mac-profiling.md
Last active July 12, 2024 03:52
Profiling an application in Mac OS X

Profiling an application in Mac OS X

Finding which process to profile

If your system is running slowly, perhaps a process is using too much CPU time and won't let other processes run smoothly. To find out which processes are taking up a lot of CPU time, you can use Apple's Activity Monitor.

The CPU pane shows how processes are affecting CPU (processor) activity:

@dannguyen
dannguyen / README.md
Last active July 6, 2024 16:36
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide it at the word or region level, which would be needed to then calculate the data delimiters.

On the other hand, the OCR quality is pretty good, if you just need to identify text anywhere in an image, without regards to its physical coordinates. I've included two examples:

####### 1. A low-resolution photo of road signs