@gregglind
Last active August 29, 2015 14:05
"""
from fileinput or stdin, decorate lines with userid, ts for later sort.
Full usage:
hadoop dfs -text /bagheera/testpilot_contextfeaturerecommender/*/* | python dsu.py | sort -k 1,2 | cut -f 3- | less -S
(decorate-sort-undecorate pattern sorting)
"""
import fileinput
import json
for l in fileinput.input():
try:
payload = l.split("\t",1)[1]
payload = json.loads(json.loads(payload)["dp"])
print "\t".join(map(str,[payload["userid"], payload["ts"], l.rstrip()]))
except:
continue
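
For small inputs, the same decorate-sort-undecorate idea can be sketched entirely in Python, without the `sort | cut` stages. This is only an illustration under assumptions: `decorate`, `dsu_sort`, and `make_record` are hypothetical names, and the sample records are invented to match the shape the script above expects (a key, a tab, then JSON whose "dp" field is a JSON string). Keys are compared as text, as the script's `str` conversion implies, so a ts of 10 sorts before 9.

```python
import json


def decorate(line):
    """Return (userid, ts, line) for one record, or None if it cannot be parsed."""
    try:
        payload = json.loads(json.loads(line.split("\t", 1)[1])["dp"])
        return (str(payload["userid"]), str(payload["ts"]), line.rstrip())
    except (IndexError, KeyError, TypeError, ValueError):
        return None


def dsu_sort(lines):
    """Decorate, sort by (userid, ts) as text, then undecorate."""
    decorated = [d for d in map(decorate, lines) if d is not None]
    decorated.sort()  # orders by userid, then ts, then the line itself
    return [line for _userid, _ts, line in decorated]


def make_record(key, userid, ts):
    """Build a hypothetical input line (assumed shape, not real data)."""
    return key + "\t" + json.dumps({"dp": json.dumps({"userid": userid, "ts": ts})})


if __name__ == "__main__":
    sample = [
        make_record("k1", "bob", 2),
        make_record("k2", "alice", 5),
        "not a valid record",
        make_record("k3", "alice", 1),
    ]
    for line in dsu_sort(sample):
        print(line)
```

Unlike the shell pipeline, this holds every line in memory, so it only suits inputs that fit in RAM; the external `sort` remains the better choice for large Hadoop dumps.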