Sam Zhang samzhang111

from collections import defaultdict


class SuffixTree(object):
    def __init__(self, key=None):
        self.key = key
        # Node payload: a count and a mapping of child nodes.
        self.dict = {
            'count': 0,
            'children': {}
        }
Please tag commit messages with one of the following:
API: an (incompatible) API change
BLD: change related to building numpy
BUG: bug fix
DEP: deprecate something, or remove a deprecated object
DEV: development tool or utility
DOC: documentation
ENH: enhancement
MAINT: maintenance commit (refactoring, typos, etc.)
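The tag list above lends itself to a simple automated check. The following is a minimal sketch, assuming each commit message begins with one of the listed tags followed by a colon, a space, and some text; the helper name `has_valid_tag` and the exact spacing rule are illustrative assumptions, not part of the original guideline.

```python
import re

# Tags copied from the list above.
COMMIT_TAGS = ("API", "BLD", "BUG", "DEP", "DEV", "DOC", "ENH", "MAINT")

# Assumed format: "<TAG>: <text>" on the first line of the message.
_TAG_PATTERN = re.compile(r"^(%s): \S" % "|".join(COMMIT_TAGS))


def has_valid_tag(message):
    """Return True if the first line starts with a known tag, ': ', and some text.

    Hypothetical helper for illustration only.
    """
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return _TAG_PATTERN.match(first_line) is not None
```

A check like this could be called from a `commit-msg` hook, passing in the contents of the message file and rejecting the commit when it returns `False`.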
samzhang111 / nutch-site.xml
Created January 10, 2015 02:00
Deduplication with Nutch
<property>
  <name>db.signature.class</name>
  <value>org.apache.nutch.crawl.TextProfileSignature</value>
  <description>The default implementation of a page signature. Signatures
  created with this implementation will be used for duplicate detection
  and removal.</description>
</property>
<property>
  <name>db.signature.text_profile.min_token_len</name>
samzhang111 / dbus.sh
Last active August 29, 2015 13:56
Dbus Buildpack for Heroku
#!/bin/sh
# forked from https://gist.github.com/ddollar/07d579a6621b3ddd7b6b/
# capture root dir
root=$(pwd)
# change into subdir of archive (quoted so a root path with spaces still works)
cd "$root"/dbus-*
samzhang111 / _.md
Created January 17, 2014 02:30
timeline.js