@suma
Created May 11, 2012 11:16
Convert MIST reports for the Jubatus Ruby client. See the slides at http://www.slideshare.net/suma_/jubatus-12892694
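# mist.rb
class MIST
  def initialize(file)
    @bags = []
    File.open(file) { |f|
      # Collect the level-1 instructions of each thread into one bag.
      bag = []
      # The first line is expected to be a thread header; it is labeled '0001'.
      line = f.gets
      thread_num = '0001' if line && line.index('thread')
      while line = f.gets
        pos = line.index('thread')
        if pos
          # New thread header: store the finished bag and start the next one.
          push(thread_num, bag)
          bag = []
          thread_num = line.slice(pos + 7, 4)
        else
          # Keep the first five characters (e.g. "02 01") without the space.
          # String#delete is used because delete! returns nil when nothing is removed.
          bag << line.slice(0, 5).delete(' ')
        end
      end
      push(thread_num, bag)
    }
  end

  def push(thread_num, bag)
    @bags << [thread_num, bag.join(' ')]
  end

  # Convert to feature vectors (currently returns the raw [thread_num, bag] pairs).
  def to_datum
    @bags
  end
  # datum = message datum {
  #   0: list<tuple<string, string> > sv
  #   1: list<tuple<string, double> > nv
  # }
end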
# mist_test.rb
require './mist'

sha1 = '0e947e116aabcf6668143fcdfc9268f3608ff084'
label = 'LDPINCH'
# Reference reports are named <sha1>.<label>.
path = "./malheur/reference/refset/#{sha1}.#{label}"
mist = MIST.new(path)
p mist.to_datum
suma commented May 11, 2012

Store the reference datasets in ./malheur/reference/refset/, then run:
$ ruby mist_test.rb
[["0001", "0201 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0a01 1003 1001 1002 1002 1004 1002 1002 1004 0e03"], ["0001", "0201 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 0202 0202 0e01 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0d01 030a 0902 0905 0202 0202 0303 0301 0a05 0d01 0303 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 0d01 0303 1002 1002 1002 1002 1002 0902 0909 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0702 0a05 0a07 1001 0c01 1001 0c01 1001 1001 1003 1001 0c01 1001 0c01 1001 0c01 1001 0c01 1001 1001 0a04"]]
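
The comment at the end of the MIST class describes a datum as a pair of lists (string features `sv`, numeric features `nv`), while `to_datum` still returns plain `[thread_num, bag]` pairs like the output above. A minimal sketch of how those pairs might be reshaped into that layout follows. The helper name `bags_to_datum` and the feature keys `'thread'`, `'mist'`, and `'length'` are assumptions made for illustration; they do not come from the gist or from the Jubatus client.

```ruby
# Hypothetical helper, not part of the gist: reshape [thread_num, bag] pairs
# into the [sv, nv] layout described in the datum comment.
def bags_to_datum(bags)
  bags.map do |thread_num, bag|
    # sv: list<tuple<string, string>> (keys are illustrative assumptions)
    sv = [['thread', thread_num], ['mist', bag]]
    # nv: list<tuple<string, double>>; here, the number of instructions in the bag
    nv = [['length', bag.split.size.to_f]]
    [sv, nv]
  end
end

# Shortened sample in the same shape as the output above.
sample_bags = [
  ['0001', '0201 0202 0a01'],
  ['0001', '0201 1002']
]

bags_to_datum(sample_bags).each do |sv, nv|
  p sv
  p nv
end
```

Keeping the whole bag as a single space-separated string leaves tokenization to whatever string feature rules the consumer is configured with; that is one possible design, not necessarily the one intended here.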
