@archie
Created April 19, 2012 16:30
Playcount.py
import sys
import random
import math

if len(sys.argv) != 3:
    print("Usage: ./playcount.py <file_to_parse> <output_counts_to_file>")
    sys.exit(1)

countmap = {}

# count occurrences per user-video pair
# (the user and video IDs are the 2nd and 3rd whitespace-separated fields of each line)
with open(sys.argv[1], 'r') as handle:
    for line in handle:
        parsedline = line.strip().split()
        key = (int(parsedline[1]), int(parsedline[2]))
        countmap[key] = countmap.get(key, 0) + 1

# create a testing set (25% of the pairs, rounded up) and let the remaining be the training set
testsize = int(math.ceil(len(countmap) * 0.25))
# list() is needed because random.sample requires a sequence, not a dict
samplemap = dict((s, countmap.pop(s)) for s in random.sample(list(countmap), testsize))

# store everything
def writefile(filename, data):
    with open(filename, 'w') as outhandle:
        for (user, video), value in data.items():
            # useridmap and videoidmap were never defined in this script,
            # so the IDs are written exactly as parsed
            outhandle.write("%d %d %d\n" % (user, video, value))

writefile("%s-test" % sys.argv[2], samplemap)
writefile("%s-train" % sys.argv[2], countmap)
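
The count-then-split logic of the script can also be written as two small functions, which makes it possible to try it out without any files. This is only a minimal sketch, not the author's code: the names `count_pairs` and `split_counts`, the `seed` parameter, the skipping of short lines, and the use of `collections.Counter` are assumptions introduced for illustration. The 25% test fraction and the column positions are taken from the script.

```python
import math
import random
from collections import Counter


def count_pairs(lines):
    """Count occurrences of each (user, video) pair.

    Assumes, as in the script, that the user ID and video ID are the second
    and third whitespace-separated fields of every line.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: skip blank or short lines
        counts[(int(fields[1]), int(fields[2]))] += 1
    return counts


def split_counts(counts, test_fraction=0.25, seed=None):
    """Move a random fraction of the pairs (rounded up) into a test set."""
    rng = random.Random(seed)  # hypothetical seed parameter for repeatable splits
    train = dict(counts)
    test_size = int(math.ceil(len(train) * test_fraction))
    # sorted() yields a sequence in a stable order, so a fixed seed
    # always selects the same pairs
    test_keys = rng.sample(sorted(train), test_size)
    test = {key: train.pop(key) for key in test_keys}
    return train, test


# Made-up sample lines: a leading field followed by user ID and video ID
lines = [
    "t0 1 10",
    "t1 1 10",
    "t2 2 10",
    "t3 2 20",
    "t4 3 30",
]
counts = count_pairs(lines)
train, test = split_counts(counts, seed=0)
print(len(train), len(test))
```

The sample input contains four distinct pairs, so one pair is moved to the test set and three remain for training. Passing a seed is optional; without it the split differs from run to run, as it does in the script.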
@uppfinnarjohnny

I did a bit of refactoring just for the fun of it. :-D

@archie
Author

archie commented Apr 30, 2012

Oops, I completely missed your comment. What did you say you refactored?

Edit: ah, I see your fork now. Nice. Thanks for the tip.
