Filtering
* Simple Filter
* Bloom Filter
* Sampling
* Random Sampling
* Top-N (e.g. Top 10)
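A Bloom filter lets a mapper discard records cheaply. It tests each record's key against a compact bit array built from a set of keys of interest. The test can return false positives but never false negatives. Here is a minimal Python sketch under stated assumptions: the class name BloomFilter, the helper bloom_filter_records, the bit-array size, the number of hashes and the SHA-256 salting scheme are all illustrative choices, not taken from the course code.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: a bit array plus k hash positions per item.

    might_contain() can return false positives but never false negatives.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k bit positions by salting a SHA-256 hash with the hash index
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True only if every one of the k positions is set
        return all(self.bits[pos] for pos in self._positions(item))


def bloom_filter_records(records, key_index, bloom):
    """Hypothetical mapper-side filter: keep records whose key may be in the filter."""
    return [record for record in records if bloom.might_contain(record[key_index])]


if __name__ == "__main__":
    bloom = BloomFilter()
    for user_id in ["100", "205", "317"]:  # hypothetical keys of interest
        bloom.add(user_id)
    rows = [["a", "100"], ["b", "999"], ["c", "317"]]
    # Rows "a" and "c" are always kept; "b" is dropped unless it is a false positive
    print(bloom_filter_records(rows, 1, bloom))
```

In a real job the filter would be built once from the (small) key set and shipped to every mapper, so each mapper filters locally without a join.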
Summarization
* Counting
* Min/Max
* Statistics
* Index
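Counting, Min/Max and Statistics share one shape. The mapper emits (key, value) pairs, and the reducer sees all values for one key together after the sort. Below is a minimal sketch of such a reducer. It assumes the pairs are already sorted by key, as Hadoop's shuffle or a local sort would leave them. The function name summarize and the (count, min, max, mean) output tuple are illustrative, not from the course.

```python
from itertools import groupby
from statistics import mean


def summarize(sorted_pairs):
    """Hypothetical summarization reducer over (key, value) pairs sorted by key.

    Returns {key: (count, min, max, mean)}, i.e. one summary row per key,
    which is what a streaming reducer would emit after the sort phase.
    """
    summary = {}
    for key, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        values = [value for _, value in group]
        summary[key] = (len(values), min(values), max(values), mean(values))
    return summary


if __name__ == "__main__":
    pairs = sorted([("b", 4.0), ("a", 1.0), ("a", 3.0), ("b", 2.0), ("b", 6.0)])
    for key, (count, low, high, avg) in summarize(pairs).items():
        print(f"{key}\t{count}\t{low}\t{high}\t{avg}")
```

Count, min and max can be merged from partial results, so they also work in a combiner. The mean cannot be merged directly; a combiner would have to pass (sum, count) instead.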
Structural
* Combining data sets
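A common way to combine two data sets in MapReduce is a reduce-side join. The mapper tags each record with its source and emits it under the join key. After the sort, the reducer sees each key's records together and pairs them up. Below is a minimal in-process sketch. The record layouts (user_id, name) and (author_id, title), the tags "A"/"B" and the function names are hypothetical. It also assumes one user record per key.

```python
from itertools import groupby


def tag_records(posts, users):
    """Map phase (hypothetical layout): emit (join_key, tag, value).

    Tag "A" sorts before "B", so each key's user row reaches the reducer
    before that key's post rows.
    """
    tagged = [(user_id, "A", name) for user_id, name in users]
    tagged += [(author_id, "B", title) for author_id, title in posts]
    return tagged


def join(tagged):
    """Reduce phase: inner join of posts to their author's name."""
    joined = []
    for key, group in groupby(sorted(tagged), key=lambda rec: rec[0]):
        name = None
        for _, tag, value in group:
            if tag == "A":
                name = value  # the single user row for this key
            elif name is not None:
                joined.append((key, name, value))  # a post by that user
    return joined


if __name__ == "__main__":
    users = [("1", "alice"), ("2", "bob")]
    posts = [("2", "Hi"), ("1", "Q1"), ("3", "orphan"), ("1", "Q2")]
    for row in join(tag_records(posts, users)):
        print("\t".join(row))
```

Posts whose author has no user row (the "orphan" one) are dropped, which is inner-join semantics.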
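import csv
import re
import sys
reader = csv.reader(sys.stdin, delimiter='\t')
writer = csv.writer(sys.stdout, delimiter='\t')
for line in reader:
    body = line[4].strip()  # column 4 holds the post body
    # Keep bodies of at most one sentence: no '.', '!' or '?' except as the last character
    if not re.search(r'[.!?]', body[:-1]):
        writer.writerow(line)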
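import csv
import heapq
import sys

reader = csv.reader(sys.stdin, delimiter='\t')
writer = csv.writer(sys.stdout, delimiter='\t')
# Top 10 mapper: keep only the 10 records with the largest numeric value in column 4.
# heapq.nlargest holds at most 10 records in memory instead of sorting the whole input.
top10 = heapq.nlargest(10, reader, key=lambda line: int(line[4]))
# nlargest returns descending order; reverse it to emit in ascending order
for line in reversed(top10):
    writer.writerow(line)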
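The Top-N mapper emits only a local top 10 per input split. A single reducer then has to merge those local lists into the global top 10. Here is a minimal sketch of that merge step. It assumes, as the mapper does, that column 4 holds an integer key. The function name global_top_n and its parameters are illustrative.

```python
import heapq


def global_top_n(mapper_outputs, n=10, key_index=4):
    """Hypothetical Top-N reducer: merge each mapper's local top-n into a global top-n.

    Each mapper emits at most n records, so the reducer handles only
    (number of mappers) * n records instead of the whole data set.
    Returns records in descending order of the integer key.
    """
    candidates = (record for output in mapper_outputs for record in output)
    return heapq.nlargest(n, candidates, key=lambda record: int(record[key_index]))


if __name__ == "__main__":
    mapper1 = [["a", "", "", "", "5"], ["b", "", "", "", "9"]]
    mapper2 = [["c", "", "", "", "7"], ["d", "", "", "", "1"]]
    for record in global_top_n([mapper1, mapper2], n=2):
        print(record[0], record[4])
```

This is correct because every record in the global top n is also in the top n of the split it came from, so no candidate is lost by filtering in the mappers.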
https://github.com/duf59/ud617/blob/master/mapper5.py
https://github.com/duf59/ud617/blob/master/reducer5.py
cat forum_node.tsv | ./mapper.py | sort -k 1 | ./reducer.py | grep -i fantastic
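The pipeline above is a common way to test a Hadoop Streaming job locally. Here sort stands in for the shuffle, so the reducer receives all lines for one key consecutively. The sketch below reproduces that flow in a single Python process. It assumes, without confirmation from these notes, that mapper.py builds an inverted index (word to node ids) over column 4 of forum_node.tsv, with the node id in column 0. The functions mapper, reducer and run_pipeline and the tokenizing regex are hypothetical.

```python
import re
from itertools import groupby


def mapper(rows):
    """Hypothetical inverted-index mapper: emit (word, node_id) for each word in the body."""
    for row in rows:
        for word in re.findall(r"[a-z0-9']+", row[4].lower()):
            yield word, row[0]


def reducer(sorted_pairs):
    """Group sorted pairs by word; emit (word, occurrences, sorted unique node ids)."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        pairs = list(group)
        yield word, len(pairs), sorted({node_id for _, node_id in pairs})


def run_pipeline(rows):
    """In-process equivalent of: cat file | mapper | sort | reducer."""
    return list(reducer(sorted(mapper(rows))))


if __name__ == "__main__":
    rows = [["1", "", "", "", "Fantastic post!"],
            ["2", "", "", "", "Not so fantastic, fantastic."]]
    # Keeping only words that contain "fantastic" plays the role of grep -i fantastic
    for word, count, nodes in run_pipeline(rows):
        if "fantastic" in word:
            print(word, count, nodes)
```

Because the mapper lowercases words, a plain substring test is enough here. The -i flag in the shell version serves the same purpose on raw reducer output.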