@ukyo
Created August 8, 2011 12:27
Make LDA training data
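#!/usr/bin/python
# coding: utf8
# Convert comma-separated token lines on stdin into "token count" pairs,
# one document per output line.
import sys
import re

# Strip newline characters from each input line.
sub = re.compile('\n').sub


def hoge(text):
    # Split the line into comma-separated tokens.
    fuga = sub('', text).split(',')
    foo = {}
    # Skip documents with fewer than 5 tokens.
    if len(fuga) < 5:
        return
    # Count each distinct token, removing it from the list once counted.
    while len(fuga) > 0:
        s = fuga[0]
        n = fuga.count(s)
        for i in range(n):
            fuga.remove(s)
        foo[s] = n
    # Emit "token count token count ..." for this document.
    # Parenthesized print works under both Python 2 and Python 3.
    print(' '.join([str(k) + ' ' + str(n) for k, n in foo.items()]))


for line in sys.stdin:
    hoge(line)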
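The counting loop in `hoge` repeatedly calls `count` and `remove` on the token list, which is quadratic in the number of tokens. A minimal sketch of the same per-line counting using `collections.Counter` from the standard library is shown below. The function name `count_tokens` and the `min_tokens` parameter are hypothetical, introduced only for illustration; the output format (`token count` pairs separated by spaces) and the five-token threshold are taken from the script above.

```python
from collections import Counter


def count_tokens(line, min_tokens=5):
    """Hypothetical helper: build the 'token count ...' string for one line.

    Returns None when the line has fewer than min_tokens tokens,
    mirroring the early return in hoge().
    """
    # Drop newlines, then split on commas, as the original script does.
    tokens = line.replace('\n', '').split(',')
    if len(tokens) < min_tokens:
        return None
    # Counter tallies every distinct token in a single pass.
    counts = Counter(tokens)
    return ' '.join('{} {}'.format(tok, n) for tok, n in counts.items())


if __name__ == '__main__':
    # Example input line with six comma-separated tokens.
    print(count_tokens('apple,banana,apple,cherry,banana,apple\n'))
```

To process a whole file this way, the function could be called on each line of `sys.stdin` and the result printed whenever it is not `None`. The order of pairs within a line follows the dictionary's iteration order, so it may differ between Python versions, just as with the original `foo.items()`.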