Created May 11, 2012 11:16
Converts MIST reports into input for the Jubatus Ruby client. For background, see the slides at http://www.slideshare.net/suma_/jubatus-12892694
# mist.rb
class MIST
  def initialize(path)
    @bags = []
    File.open(path) do |file|
      # Collect the level-1 instructions of each thread into one bag.
      bag = []
      # The first line is expected to be the header of the first thread,
      # which is labeled '0001'.
      line = file.gets
      thread_num = '0001' if line && line.index('thread')
      while (line = file.gets)
        pos = line.index('thread')
        if pos
          # A new thread header: store the finished bag and start a new one.
          push(thread_num, bag)
          bag = []
          # The 4-character thread number follows "thread ".
          thread_num = line[pos + 7, 4]
        else
          # Keep the first 5 characters without spaces ("02 01" becomes "0201").
          # String#delete, unlike delete!, never returns nil.
          bag << line[0, 5].delete(' ')
        end
      end
      push(thread_num, bag)
    end
  end

  def push(thread_num, bag)
    @bags << [thread_num, bag.join(' ')]
  end

  # Convert to feature vectors. For now this returns the raw
  # [thread_num, bag] pairs; the target datum layout is:
  #
  #   message datum {
  #     0: list<tuple<string, string> > sv
  #     1: list<tuple<string, double> > nv
  #   }
  def to_datum
    @bags
  end
end
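The datum comment in the class describes two lists: string features (`sv`) and numeric features (`nv`). Below is a minimal sketch of how one `[thread_num, bag]` pair might be reshaped into that layout using plain Ruby arrays. The helper name `bag_to_datum`, the feature key `'mist'`, and the idea of adding one count per instruction word to `nv` are assumptions made for illustration; none of them come from the gist or from the Jubatus client API.

```ruby
# Hypothetical helper: reshape one [thread_num, bag] pair into [sv, nv].
#   sv: list of [key, string value] pairs
#   nv: list of [key, numeric value] pairs
def bag_to_datum(pair, key = 'mist')
  _thread_num, bag = pair
  # The whole bag of instruction words becomes a single string feature.
  sv = [[key, bag]]
  # Assumption: also expose how often each instruction word occurs.
  counts = Hash.new(0)
  bag.split.each { |word| counts[word] += 1 }
  nv = counts.map { |word, n| ["#{key}:#{word}", n.to_f] }
  [sv, nv]
end

sv, nv = bag_to_datum(['0001', '0201 0202 0202 0a01'])
p sv
p nv
```

Whether the counts belong in `nv` at all depends on how the Jubatus side is configured to extract features from the string value, so treat that part as optional.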
# mist_test.rb
require './mist'

sha1 = '0e947e116aabcf6668143fcdfc9268f3608ff084'
label = 'LDPINCH'
path = "./malheur/reference/refset/#{sha1}.#{label}"
mist = MIST.new(path)
p mist.to_datum
Store the reference datasets in ./malheur/reference/refset/, then run the test script:
$ ruby mist_test.rb
[["0001", "0201 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0a01 1003 1001 1002 1002 1004 1002 1002 1004 0e03"], ["0001", "0201 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 0202 0202 0e01 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0202 0d01 030a 0902 0905 0202 0202 0303 0301 0a05 0d01 0303 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 1002 0d01 0303 1002 1002 1002 1002 1002 0902 0909 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0902 0905 0702 0a05 0a07 1001 0c01 1001 0c01 1001 1001 1003 1001 0c01 1001 0c01 1001 0c01 1001 0c01 1001 1001 0a04"]]
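
To make the link between input and output concrete, the sketch below applies the same grouping logic to a small in-memory sample instead of a file. The sample lines are invented for illustration and assume only what the class itself relies on: header lines contain `thread` followed by a space and a 4-character number, and every other line starts with a 5-character instruction prefix. `parse_mist_lines` is a hypothetical standalone function, not part of the gist; unlike the class, it reads the first thread number from the header rather than fixing it to `'0001'`.

```ruby
# Hypothetical standalone version of the grouping logic in MIST#initialize.
def parse_mist_lines(lines)
  bags = []
  thread_num = nil
  bag = []
  lines.each do |line|
    pos = line.index('thread')
    if pos
      # Store the previous thread's bag before starting a new one.
      bags << [thread_num, bag.join(' ')] if thread_num
      thread_num = line[pos + 7, 4]
      bag = []
    else
      # Keep the 5-character prefix without spaces.
      bag << line[0, 5].delete(' ')
    end
  end
  bags << [thread_num, bag.join(' ')] if thread_num
  bags
end

# Invented sample input; real reports are assumed to follow the same shape.
sample = [
  '# thread 0001 #',
  '02 01 | aaaa',
  '02 02 | bbbb',
  '# thread 0002 #',
  '0a 01 | cccc'
]
p parse_mist_lines(sample)
```

Each header line closes the previous bag, so the result holds one `[thread_num, bag]` pair per thread, in the same shape as the output above.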