@jwf-zz
jwf-zz / imdb-sentiment-vw.sh
Last active March 5, 2019 00:20
Sentiment analysis on an IMDB dataset using Vowpal Wabbit
#!/bin/bash
# Requires vw (https://github.com/JohnLangford/vowpal_wabbit/wiki/),
# the IMDB dataset (http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz),
# and the perf utility from http://osmot.cs.cornell.edu/kddcup/software.html.
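# Keep only the positive reviews (ratings 7-10), strip the leading rating,
# and emit one VW example per line: 1 'pos_<n> |features <bag-of-words counts>
sed -n 's/^\([7-9]\|10\)\s//p' aclImdb/train/labeledBow.feat | \
awk '{ print "1 '"'"'pos_" (NR-1) " |features " $0}' > train.vw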
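
For readers less familiar with sed and awk, the positive-review conversion above can be sketched in Python. This is only an illustration of the same transformation, not part of the original gist: the function names `convert_positive` and `write_train_file` are hypothetical, and the sketch assumes each line of labeledBow.feat starts with an integer rating followed by a single space and the feature counts.

```python
def convert_positive(lines):
    """Turn labeledBow.feat lines into VW examples for positive reviews.

    Hypothetical helper: keeps lines whose rating is 7-10, drops the
    rating, and tags each kept line as pos_0, pos_1, ... in order.
    """
    examples = []
    for line in lines:
        # Split "rating feature:count feature:count ..." at the first space.
        rating, _, features = line.rstrip("\n").partition(" ")
        if rating in {"7", "8", "9", "10"}:
            # The tag index counts only the lines kept so far (0-based).
            examples.append("1 'pos_%d |features %s" % (len(examples), features))
    return examples


def write_train_file(src="aclImdb/train/labeledBow.feat", dst="train.vw"):
    """Write the converted examples to dst, using the paths from the script."""
    with open(src, "r") as fin, open(dst, "w") as fout:
        for example in convert_positive(fin):
            fout.write(example + "\n")
```

Calling `write_train_file()` from the directory that contains `aclImdb/` should produce a `train.vw` in the same form as the shell pipeline, under the assumptions stated above.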
jwf-zz / print_words.py
Created July 6, 2012 19:36
Print words with largest weights.
#!/usr/bin/env python
import sys
Dict = []
with open('aclImdb/imdb.vocab','r') as f:
    for line in f:
        Dict.append(line.strip())
with open('audit.log','r') as f:
    f.readline()