@greeness
greeness / whirr
Created November 10, 2011 17:12
Install whirr cdh3 release
# launch an ec2 instance with lucid (ubuntu 10.04) e.g. ami-ad36fbc4
# ssh to the machine
################################################################
# install java
# https://ccp.cloudera.com/display/CDHDOC/Java+Development+Kit+Installation
# RELEASE=lucid, which you can find by running lsb_release -c.
################################################################
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
greeness / dumbo example
Created November 16, 2011 04:36
dumbo running command line using cache file in hdfs
dumbo start demo_dumbo.py -hadoop /usr/lib/hadoop -input shares -output video_demos -outputformat text -files hdfs://ec2-xxx-xx-xx-xx.compute-1.amazonaws.com:8020/user/ubuntu/users/part-m-00000
### piece of code in demo_dumbo.py
# the cached HDFS file is available in the task's working directory
for line in open('part-m-00000'):
    print(line)
# ----------------
dumbo start demo_dumbo.py -hadoop /usr/lib/hadoop -input shares -output video_demos -outputformat text -files hdfs://ec2-xxx-xx-xx-xx.compute-1.amazonaws.com:8020/user/ubuntu/users
greeness / gist:1384478
Created November 22, 2011 00:28
create egg file from boto.tar.gz
http://mrtopf.de/blog/en/a-small-introduction-to-python-eggs/
sudo apt-get install python-setuptools
python setup.py bdist_egg
greeness / gist:1386945
Created November 22, 2011 21:01
create egg file for scipy
sudo apt-get install gfortran libblas-dev liblapack-dev
cd scipy-0.10.0
python setupegg.py bdist_egg
# wait about 5 minutes on my machine
greeness / gist:1392524
Created November 24, 2011 23:42
avoid ssh known host prompt
ssh -o "StrictHostKeyChecking no" user@host
greeness / gist:1410425
Created November 30, 2011 19:28
sentiment analysis notes

Challenges

  • query classification: telling whether a text is a review or an opinion

  • determining which documents or portions of documents contain review-like or opinionated material

  • identifying the overall sentiment expressed

  • presenting the sentiment information the system has gathered in a reasonable summary form

  • aggregating "votes" that may be registered on different scales
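
One way to picture the last challenge, aggregating votes registered on different scales, is to rescale every vote to a common [0, 1] range before averaging. The sketch below is only an illustration under that assumption; the function names and the (score, low, high) tuple layout are hypothetical and not taken from any particular sentiment system.

```python
def normalize_vote(score, low, high):
    """Map a score on the scale [low, high] onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (score - low) / float(high - low)


def aggregate_votes(votes):
    """Average votes given as (score, low, high) tuples after rescaling."""
    if not votes:
        raise ValueError("no votes to aggregate")
    normalized = [normalize_vote(s, lo, hi) for s, lo, hi in votes]
    return sum(normalized) / len(normalized)


# Hypothetical votes: 4 stars out of 1..5, 7 out of 0..10, one thumbs-up (0..1)
votes = [(4, 1, 5), (7, 0, 10), (1, 0, 1)]
overall = aggregate_votes(votes)
```

A plain mean treats every source equally; weighting by source reliability or vote count would be a natural refinement, but it is not something these notes specify.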

npm install
nodeunit tests/recommender_tests.coffee
greeness / gist:1483520
Created December 15, 2011 23:42
max number of arguments
http://www.in-ulm.de/~mascheck/various/argmax/
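
For a quick check of the limit on the current machine, the standard library exposes the POSIX ARG_MAX value through os.sysconf. This is a minimal sketch assuming a Unix-like system; the helper name get_arg_max is made up for illustration. Note that ARG_MAX covers the combined size of the arguments and the environment passed to exec, so the usable space for arguments alone is smaller.

```python
import os


def get_arg_max():
    """Return ARG_MAX in bytes, or None if the platform does not report it."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (ValueError, OSError, AttributeError):
        # Name not known, value unavailable, or os.sysconf missing (non-Unix)
        return None
    return value if value > 0 else None


arg_max = get_arg_max()
print("ARG_MAX: %s" % arg_max)
```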
greeness / gist:1498382
Created December 19, 2011 18:53
random projection lsh with random number from a pre-generated pool
import numpy
import math
# LSH signature generation using random projection
def get_signature(user_vector, rand_proj):
    res = 0
    for p in rand_proj:
        res = res << 1
        val = numpy.dot(p, user_vector)
        if val >= 0: