@nebuta
nebuta / webget.rb
Created November 8, 2011 07:17
Get Google search results
require 'net/http'
require 'cgi'
require 'rubygems'
require 'hpricot'
require 'open-uri'
require "resolv-replace"
require 'timeout'
BASE_URL = "http://www.google.com/search?"
LANG = "ja"
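
The preview stops at the two constants, so how they are combined is not shown. A minimal sketch of one way to build the request URL, assuming the query goes into Google's `q` parameter and `LANG` into `hl` (the helper name `search_url` is hypothetical, not from the gist):

```ruby
require 'cgi'

BASE_URL = "http://www.google.com/search?"
LANG = "ja"

# Hypothetical helper: build a search URL from a query string.
# CGI.escape percent-encodes the value and turns spaces into "+".
def search_url(query, lang = LANG)
  BASE_URL + "q=#{CGI.escape(query)}&hl=#{CGI.escape(lang)}"
end

puts search_url("ruby hpricot")
```

Escaping the query matters most for Japanese search terms, which would otherwise produce an invalid URL.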
@nebuta
nebuta / webget.rb
Created November 8, 2011 07:18
Get Google search results
require 'net/http'
require 'cgi'
require 'rubygems'
require 'hpricot'
require 'open-uri'
require "resolv-replace"
require 'timeout'
BASE_URL = "http://www.google.com/search?"
LANG = "ja"
@nebuta
nebuta / aozoraget.rb
Created November 8, 2011 07:42
Get Aozora Bunko
#aozoraget.rb
require 'rubygems'
require 'hpricot'
require 'open-uri'
for i in 1..13
toc = "http://www.aozora.gr.jp/index_pages/sakuhin_a#{i}.html"
puts "Opening: " + toc
html = URI.parse(toc).read  # IO.read only reads local files; open-uri adds #read to URI objects
@nebuta
nebuta / aozora.rb
Created November 8, 2011 07:47
Make a dictionary for ja encodings
require 'rubygems'
require 'hpricot'
require 'iconv'
$vector = Hash.new
$vector[:utf8] = Array.new(65536).fill(0)
$vector[:shiftjis] = Array.new(65536).fill(0)
$vector[:iso] = Array.new(65536).fill(0)
$vector[:eucjp] = Array.new(65536).fill(0)
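
The 65536-slot arrays suggest one counter per pair of adjacent bytes (256 × 256), although the preview does not show how they are filled. A rough sketch under that assumption, using `String#encode` in place of `iconv` (which was removed from the standard library in Ruby 2.0); `bigram_histogram` is a hypothetical name:

```ruby
# Hypothetical helper: count adjacent byte pairs into a 65536-slot histogram.
# The slot index a * 256 + b is an assumed layout, not confirmed by the gist.
def bigram_histogram(bytes)
  hist = Array.new(65536, 0)
  bytes.each_cons(2) { |a, b| hist[a * 256 + b] += 1 }
  hist
end

text = "\u65e5\u672c\u8a9e"  # the three characters of "Japanese" (nihongo)
vectors = {
  :utf8     => bigram_histogram(text.encode("UTF-8").bytes.to_a),
  :shiftjis => bigram_histogram(text.encode("Shift_JIS").bytes.to_a),
  :eucjp    => bigram_histogram(text.encode("EUC-JP").bytes.to_a)
}

vectors.each { |name, hist| puts "#{name}: #{hist.inject(:+)} byte pairs" }
```

The same text yields a different histogram per encoding, which is what makes such vectors usable as an encoding dictionary.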
@nebuta
nebuta / rfcascii.rb
Created November 8, 2011 07:48
Make a dictionary for ASCII
require 'rubygems'
require 'hpricot'
require 'iconv'
$vectorascii = Array.new(65536).fill(0)
$pwd = ""
def normalize
norm = 65536
@nebuta
nebuta / subtract.rb
Created November 8, 2011 07:50
Make dictionary data after subtraction of ASCII
#subtract.rb
def normalize
norm = 65536
$vector.each_key{|key|
sqsum = $vector[key].inject(0){|sum,e| sum + e*e}
p sqsum
factor = Math.sqrt(sqsum)
$vector[key].map!{|e| e.to_f * norm / factor}
}
@nebuta
nebuta / makecoursevectors.rb
Created November 8, 2011 07:56
Make dictionary data with lower resolution (=smaller size)
# Scale every vector in v so that its Euclidean length equals norm.
def normalize(v, norm)
  ret = Hash.new
  v.each_key{|key|
    sqsum = v[key].inject(0){|sum,e| sum + e*e}
    factor = Math.sqrt(sqsum)
    ret[key] = v[key].map{|e| e.to_f * norm / factor}
  }
  ret
end
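
A small usage sketch of the same scaling, to show what `normalize` returns. The zero-vector guard is an addition of this sketch, not part of the gist: as written above, an all-zero vector would divide by zero and produce `NaN` entries.

```ruby
# Scale each vector so its Euclidean length equals norm.
def normalize(v, norm)
  ret = {}
  v.each do |key, vec|
    factor = Math.sqrt(vec.inject(0) { |sum, e| sum + e * e })
    # Assumption: return an all-zero vector as floats instead of dividing by zero.
    ret[key] = factor.zero? ? vec.map(&:to_f) : vec.map { |e| e.to_f * norm / factor }
  end
  ret
end

scaled = normalize({ :utf8 => [3, 4], :eucjp => [0, 0] }, 10)
p scaled[:utf8]
p scaled[:eucjp]
```

Here `[3, 4]` has length 5, so scaling to length 10 doubles each entry.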
@nebuta
nebuta / encode_test.rb
Created November 8, 2011 08:40
Test algorithm
require 'rubygems'
require 'hpricot'
$asciilist = (0x20..0x7e).to_a | [0x09,0x0a,0x0c,0x0d]
def parse(lines)
arr = Array.new(65536)
start = 0
lines.each{|line|
arr[start,256]=line.chomp.split("\t").map{|e| e.to_f}
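
The preview cuts off inside the loop. One plausible way to finish it, assuming each input line is one tab-separated row and `start` advances by one row per line (`parse_table` and its `width` parameter are hypothetical; the gist uses a fixed 256):

```ruby
# Hypothetical completion: read tab-separated rows into one flat array.
def parse_table(lines, width = 256)
  arr = Array.new(width * width, 0.0)
  start = 0
  lines.each do |line|
    # Overwrite one row of `width` slots with the parsed floats.
    arr[start, width] = line.chomp.split("\t").map { |e| e.to_f }
    start += width  # assumed: move on to the next row
  end
  arr
end

flat = parse_table(["1\t2\n", "3\t4\n"], 2)
p flat
```

Note that a line with fewer than `width` fields would shorten the array, so malformed input is worth checking for before relying on fixed indices.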
@nebuta
nebuta / encode_test_1d.rb
Created November 8, 2011 08:55
Test algorithm with 1d dictionary
require 'rubygems'
require 'hpricot'
$asciilist = (0x20..0x7e).to_a | [0x09,0x0a,0x0c,0x0d]
# True if byte b is printable ASCII or one of tab, LF, FF, CR.
def isAscii?(b)
  $asciilist.include? b
end
def parse1d(lines)
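
To illustrate the ASCII check on its own, here is a small sketch that measures how much of a string consists of those bytes. `ascii_byte?` mirrors `isAscii?` with a constant instead of a global; `ascii_ratio` is a hypothetical helper that does not appear in the gist.

```ruby
# Printable ASCII plus tab, LF, FF and CR, as in $asciilist.
ASCII_BYTES = (0x20..0x7e).to_a | [0x09, 0x0a, 0x0c, 0x0d]

def ascii_byte?(b)
  ASCII_BYTES.include?(b)
end

# Hypothetical helper: fraction of a string's bytes that pass the ASCII check.
def ascii_ratio(str)
  bytes = str.bytes.to_a
  return 0.0 if bytes.empty?
  bytes.count { |b| ascii_byte?(b) }.to_f / bytes.size
end

p ascii_ratio("abc\n")
p ascii_ratio("a\u3042")  # "a" followed by hiragana "a" (3 bytes in UTF-8)
```

For repeated lookups over large inputs, a `Set` or a 256-entry boolean table would avoid the linear scan that `Array#include?` performs.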
@nebuta
nebuta / 2dto1d.rb
Created November 8, 2011 08:57
Make 1D dictionary
#2dto1d.rb
# Scale every vector in $vector in place so that its Euclidean length equals norm.
def normalize(norm)
  $vector.each_key{|key|
    sqsum = $vector[key].inject(0){|sum,e| sum + e*e}
    p sqsum
    factor = Math.sqrt(sqsum)
    $vector[key].map!{|e| e.to_f * norm / factor}
  }
end