@miwarin
Created April 30, 2014 11:15
N-gram
#!/usr/bin/ruby -Ku
# ref. Matching only UTF-8 kanji with a regular expression in Ruby 1.9 - 屑プログラマの憂鬱
# http://d.hatena.ne.jp/Artisan/20120826/1345990754
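
# Returns every gram-character substring of text whose first character is a
# kanji (Unicode Han script). Note that the optional argument comes first.
def ngram(gram = 2, text)
  ngrams = []
  # The last window that fits starts at text.length - gram.
  0.upto(text.length - gram) do |i|
    t = text[i, gram]
    # Keep only n-grams that begin with a kanji.
    ngrams << t if t[0] =~ /\p{Han}/
  end
  ngrams
end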

def main(argv)
  gram = 3
  text = <<-EOS
いいか、忘れんな。
おまえを信じろ。
おれが信じるおまえでもない。
おまえが信じる俺でもない。
おまえが信じる、おまえを信じろ!
  EOS
  puts ngram(gram, text)
end

main(ARGV)
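
The sliding-window loop in the script can also be written with `Enumerable#each_cons`. The following is a minimal sketch, not part of the original gist: `han_ngrams` is a hypothetical name, and its argument order (text first, window size second) is an assumption chosen for readability. It applies the same rule as the script, keeping only windows whose first character matches `\p{Han}`.

```ruby
# Hypothetical helper (not in the original gist): character n-grams whose
# first character is a kanji, built with each_cons instead of manual indexing.
def han_ngrams(text, n = 2)
  text.each_char          # enumerate characters, not bytes
      .each_cons(n)       # every run of n consecutive characters
      .map(&:join)        # turn each character array back into a string
      .select { |g| g[0] =~ /\p{Han}/ }
end

p han_ngrams("おまえを信じろ。", 3)  # prints ["信じろ"]
```

`each_cons` yields nothing when the text is shorter than `n`, so the helper returns an empty array in that case without an explicit length check.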