@KenshoFujisaki
Forked from zakuroishikuro/printer.rb
Last active August 29, 2015 14:22
Builds a per-month histogram of search terms from the search-history JSON that can be downloaded from Google.
# https://history.google.com/history/
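# Reads the search-history zip that can be downloaded via the gear icon at the link above
# and prints the ten most frequent search words for each month.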
require 'kconv'
require 'json'
require 'cgi'
zip_path = ARGV[0]
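# The pattern matches the archive name Google uses for Japanese-locale accounts ("検索-20....zip").
raise "Please specify the .zip file downloaded from Google." unless /検索-20.*\.zip/ === zip_path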
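require 'shellwords'
# Dump every *.json entry of the archive to stdout and keep only the lines that start a JSON object.
def get_jsons_in_zip(zip_path)
  # Escape the path and quote the glob so the shell passes both to unzip unchanged.
  `unzip -c #{Shellwords.escape(zip_path)} '*.json'`.toutf8.each_line.select { |line| /^\{/ === line }
end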
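# Convert one JSON document into an array of {time:, text:} hashes, one per search query.
def parse(json)
  JSON.parse(json)["event"].map do |item|
    query = item["query"]
    # timestamp_usec is microseconds since the epoch; split it into seconds and microseconds.
    time = Time.at(*query["id"][0]["timestamp_usec"].to_i.divmod(10**6))
    text = CGI.unescapeHTML(query["query_text"])
    { time: time, text: text }
  end
end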
# Right-align string in a field size columns wide, counting each multibyte character as two columns.
def sprintf_m(string, size)
  width = string.each_char.sum { |c| c.bytesize == 1 ? 1 : 2 }
  padding_size = [size - width, 0].max
  ' ' * padding_size + string
end
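get_jsons_in_zip(zip_path).flat_map(&method(:parse)).
  group_by { |q| q[:time].strftime("%Y-%m") }.sort.each do |month, queries|
  bow = Hash.new(0) # bag of words: word => number of occurrences in this month
  queries.each do |q|
    # Split on runs of ASCII or full-width (U+3000) spaces and count case-insensitively.
    q[:text].split(/[ \u3000]+/).map(&:downcase).each { |word| bow[word] += 1 }
  end
  # Print a header for the month, then the ten most frequent words with one "*" per occurrence.
  puts "# #{month} #{queries.size} queries",
       bow.sort_by { |_, count| -count }.first(10).map { |word, count| sprintf_m("#{word}:#{count}", 30) + "*" * count }
end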
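
The script above needs a real export archive and the `unzip` command. To try the parsing and counting steps on their own, here is a minimal sketch that runs on an in-memory sample. The sample data, the constant `SAMPLE_JSON`, and the helper names `parse_events` and `monthly_word_counts` are hypothetical and only mirror the fields the script reads (`event`, `query`, `id[0].timestamp_usec`, `query_text`); the real export may contain more. Times are converted to UTC here so the month buckets do not depend on the local time zone, which differs from the script.

```ruby
require 'json'
require 'cgi'

# Hypothetical sample shaped like the fields the script reads.
SAMPLE_JSON = JSON.generate(
  "event" => [
    { "query" => { "id" => [{ "timestamp_usec" => "1430000000000000" }], "query_text" => "Ruby json parse" } },
    { "query" => { "id" => [{ "timestamp_usec" => "1430000100000000" }], "query_text" => "ruby &amp; rails" } },
    { "query" => { "id" => [{ "timestamp_usec" => "1433000000000000" }], "query_text" => "ruby\u3000正規表現" } }
  ]
)

# Turn one JSON document into an array of {time:, text:} hashes.
def parse_events(json)
  JSON.parse(json)["event"].map do |item|
    query = item["query"]
    # Microseconds since the epoch, split into whole seconds and the remainder.
    sec, usec = query["id"][0]["timestamp_usec"].to_i.divmod(10**6)
    { time: Time.at(sec, usec).utc, text: CGI.unescapeHTML(query["query_text"]) }
  end
end

# Group queries by "YYYY-MM" and count lowercased words within each month.
def monthly_word_counts(queries)
  queries.group_by { |q| q[:time].strftime("%Y-%m") }.sort_by { |month, _| month }.map do |month, qs|
    counts = Hash.new(0)
    qs.each do |q|
      # Split on runs of ASCII or full-width (U+3000) spaces.
      q[:text].downcase.split(/[ \u3000]+/).each { |word| counts[word] += 1 }
    end
    [month, counts]
  end.to_h
end

counts = monthly_word_counts(parse_events(SAMPLE_JSON))
counts.each do |month, words|
  puts "# #{month}"
  words.sort_by { |_, n| -n }.first(10).each { |word, n| puts "#{word}:#{n} " + "*" * n }
end
```

Returning a plain hash of hashes keeps the counting separate from the printing, so the same data could feed a different renderer than the asterisk bars.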