Sam Zhang samzhang111

View suffix_tree.py, and suffix_tree_test.py
from collections import defaultdict
class SuffixTree(object):
    def __init__(self, key=None):
        self.key = key
        self.dict = {
            'count': 0,
            'children': {}
        }
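The preview above only shows the node constructor. As a rough sketch of how a node shaped like this (a `key` plus a `dict` holding `count` and `children`) could be used to count substrings, the following adds three methods. The names `insert`, `add_suffixes`, and `count` are hypothetical and are not taken from the gist.

```python
class SuffixTree(object):
    """Counting trie over suffixes; the node layout mirrors the preview."""

    def __init__(self, key=None):
        self.key = key
        self.dict = {
            'count': 0,
            'children': {}
        }

    def insert(self, sequence):
        # Hypothetical helper: walk or create one child per item and
        # increment the count of every node visited along the path.
        node = self
        for item in sequence:
            children = node.dict['children']
            if item not in children:
                children[item] = SuffixTree(item)
            node = children[item]
            node.dict['count'] += 1

    def add_suffixes(self, sequence):
        # Hypothetical helper: insert every suffix of the sequence.
        for start in range(len(sequence)):
            self.insert(sequence[start:])

    def count(self, sequence):
        # Number of inserted suffixes that start with `sequence`, which
        # equals its number of occurrences in the inserted text.
        node = self
        for item in sequence:
            node = node.dict['children'].get(item)
            if node is None:
                return 0
        return node.dict['count']


tree = SuffixTree()
tree.add_suffixes("banana")
print(tree.count("ana"))  # 2
```

This is an uncompressed suffix trie, so it uses quadratic space in the worst case. It is meant to illustrate the node layout, not to be efficient.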
View gist:c79767172759b5e3b918
Please tag commit messages with one of the following:
API: an (incompatible) API change
BLD: change related to building numpy
BUG: bug fix
DEP: deprecate something, or remove a deprecated object
DEV: development tool or utility
DOC: documentation
ENH: enhancement
MAINT: maintenance commit (refactoring, typos, etc.)
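A convention like this can be checked mechanically. The sketch below assumes the tag is everything before the first colon and uses only the tags visible in the list above, which may be cut off. The names `KNOWN_TAGS` and `commit_tag` are illustrative and do not come from any existing tool.

```python
# Tags copied from the list above; the listing may be truncated, so this
# set is not necessarily complete.
KNOWN_TAGS = {"API", "BLD", "BUG", "DEP", "DEV", "DOC", "ENH", "MAINT"}


def commit_tag(message):
    """Return the leading tag of a commit message, or None if absent or unknown."""
    prefix, sep, _rest = message.partition(":")
    if sep and prefix in KNOWN_TAGS:
        return prefix
    return None


print(commit_tag("BUG: fix off-by-one in slicing"))  # BUG
print(commit_tag("fix stuff"))                       # None
```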
samzhang111 / nutch-site.xml
Created Jan 10, 2015
Deduplication with Nutch
View nutch-site.xml
<property>
  <name>db.signature.class</name>
  <value>org.apache.nutch.crawl.TextProfileSignature</value>
  <description>The default implementation of a page signature. Signatures
  created with this implementation will be used for duplicate detection
  and removal.</description>
</property>
<property>
  <name>db.signature.text_profile.min_token_len</name>
samzhang111 / dbus.sh
Last active Aug 29, 2015
Dbus Buildpack for Heroku
View dbus.sh
#!/bin/sh
# forked from https://gist.github.com/ddollar/07d579a6621b3ddd7b6b/
# capture root dir
root=$(pwd)
# change into subdir of archive (quoted so a root path with spaces still works)
cd "$root"/dbus-*
View _.md