Tuan Anh Hoang-Vu hvtuananh

hvtuananh / cities.json
Created October 17, 2016 21:21 — forked from Miserlou/cities.json
1000 Largest US Cities By Population With Geographic Coordinates, in JSON
[
  {
    "city": "New York",
    "growth_from_2000_to_2013": "4.8%",
    "latitude": 40.7127837,
    "longitude": -74.0059413,
    "population": "8405837",
    "rank": "1",
    "state": "New York"
  },
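In the record previewed above, `population` and `rank` are stored as strings while the coordinates are numbers. The following is a minimal sketch of normalizing such a record, assuming every entry in the file has the same shape as this one. The `normalize` helper, the `growth_pct` key, and the inline `SAMPLE` are illustrative names, not part of the gist.

```python
import json

# One record copied from the preview; the real file holds many more entries.
SAMPLE = """[
  {
    "city": "New York",
    "growth_from_2000_to_2013": "4.8%",
    "latitude": 40.7127837,
    "longitude": -74.0059413,
    "population": "8405837",
    "rank": "1",
    "state": "New York"
  }
]"""


def normalize(record):
    """Return a copy of the record with numeric fields converted from strings."""
    out = dict(record)
    out["population"] = int(record["population"])
    out["rank"] = int(record["rank"])
    # Assumption: growth is a percentage string such as "4.8%", possibly empty.
    growth = record["growth_from_2000_to_2013"].rstrip("%")
    out["growth_pct"] = float(growth) if growth else None
    return out


cities = [normalize(r) for r in json.loads(SAMPLE)]
```

Converting once at load time keeps later sorting and filtering (for example, by population) free of repeated string parsing.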

Keybase proof

I hereby claim:

  • I am hvtuananh on github.
  • I am hvtuananh (https://keybase.io/hvtuananh) on keybase.
  • I have a public key whose fingerprint is 4D67 56F5 E786 100E 244F B967 6073 1E59 AE10 2257

To claim this, I am signing this object:

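import csv
import pickle
import sys
from multiprocessing import Pool, cpu_count

from utils import Entity

# Command-line arguments: two file paths, each followed by a comma-separated list of integers.
file1 = sys.argv[1]
fields1 = list(map(int, sys.argv[2].split(',')))  # list() keeps the result reusable under Python 3
file2 = sys.argv[3]
fields2 = list(map(int, sys.argv[4].split(',')))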
import re
from difflib import SequenceMatcher

# Matches any character that is neither a word character nor an apostrophe.
pattern = re.compile(r"[^\w']")

def gen_signature(string):
    string = string.lower()
    string = pattern.sub(' ', string)
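The preview cuts off before `gen_signature` returns, so what it ultimately produces is not visible here. As a sketch only, assuming the signature is the lowercased, punctuation-stripped string with whitespace collapsed (an assumption, not the author's confirmed method), two strings could be compared with `difflib.SequenceMatcher`. The `normalize` and `similarity` names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Same pattern as in the preview: anything that is not a word character or an apostrophe.
pattern = re.compile(r"[^\w']")


def normalize(string):
    """Lowercase, replace punctuation with spaces, and collapse runs of whitespace."""
    string = string.lower()
    string = pattern.sub(" ", string)
    return " ".join(string.split())  # assumption: whitespace is collapsed


def similarity(a, b):
    """Return a ratio in [0, 1] describing how alike the normalized strings are."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

For example, `similarity("New-York", "new york")` compares `"new york"` with itself after normalization, so differences in case and punctuation do not lower the score.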
hvtuananh / us_address_abbreviations.txt
Created June 18, 2014 22:45
Common US address abbreviations in CSV format. The first field is the abbreviated form; the second field is the full form. Taken from http://pe.usps.gov/text/pub28/28apc_002.htm
ALLEE,ALLEY
ALLEY,ALLEY
ALLY,ALLEY
ALY,ALLEY
ANEX,ANNEX
ANNEX,ANNEX
ANNX,ANNEX
ANX,ANNEX
ARC,ARCADE
ARCADE,ARCADE
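The two-column layout above maps directly onto a lookup table. A minimal sketch, assuming each row has exactly two fields as in the rows shown; `load_abbreviations`, `expand`, and the inline `SAMPLE` are illustrative names, not part of the gist.

```python
import csv
import io

# The rows shown in the preview; the full file contains more entries.
SAMPLE = """ALLEE,ALLEY
ALLEY,ALLEY
ALLY,ALLEY
ALY,ALLEY
ANEX,ANNEX
ANNEX,ANNEX
ANNX,ANNEX
ANX,ANNEX
ARC,ARCADE
ARCADE,ARCADE
"""


def load_abbreviations(lines):
    """Build {abbreviated form: full form} from an iterable of CSV lines."""
    return {row[0]: row[1] for row in csv.reader(lines) if len(row) == 2}


def expand(address, table):
    """Uppercase the address and replace every token found in the table."""
    return " ".join(table.get(tok, tok) for tok in address.upper().split())


table = load_abbreviations(io.StringIO(SAMPLE))
```

With this table, `expand("12 Oak Aly", table)` yields `"12 OAK ALLEY"`. The sketch replaces any matching token regardless of position, so a street actually named "Arc" would also be expanded; a stricter version might only rewrite the last token.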
hvtuananh / unicode_csv.py
Created February 5, 2014 22:36
Python Unicode CSV Reader/Writer (fix writerow problem in Python docs)
# http://docs.python.org/2.7/library/csv.html
import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)
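The preview stops after `__init__`, and the gist targets Python 2 (`cStringIO`). To illustrate the recoding idea, here is a Python 3 sketch of the same class: it wraps a byte stream in a decoding reader and yields each line re-encoded as UTF-8. The `__iter__` and `__next__` methods are assumptions added for illustration, since they are not visible in the preview.

```python
import codecs
import io


class UTF8Recoder:
    """Iterate over a byte stream in `encoding`, yielding UTF-8 encoded lines."""

    def __init__(self, f, encoding):
        # codecs.getreader returns a StreamReader class for the given codec.
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def __next__(self):
        # The StreamReader yields decoded text lines; re-encode each as UTF-8.
        return next(self.reader).encode("utf-8")


# Usage: a Latin-1 encoded stream comes out as UTF-8 bytes, line by line.
source = io.BytesIO("café,1\nnaïve,2\n".encode("latin-1"))
lines = list(UTF8Recoder(source, "latin-1"))
```

In Python 3 the `csv` module works on text directly, so opening the file with `open(path, encoding=..., newline="")` usually makes a recoder like this unnecessary.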