@negipo
Created September 1, 2013 00:09
A thing that generates puns (dajare) from tweets
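require 'mecab'
require 'oj'

# Katakana grouped by vowel (plain kana only, no dakuten or small kana).
RE_A = /[アカサタナハマヤラワ]/
RE_I = /[イキシチニヒミリ]/
RE_U = /[ウクスツヌフムユル]/
RE_E = /[エケセテネヘメレ]/
RE_O = /[オコソトノホモヨロヲ]/

# Matches a reading that is exactly one i-row kana followed by ム.
REGEXP = /\A#{RE_I}ム\z/
# Replacement used wherever a word's reading matches REGEXP.
TARGET = 'ビム'
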
# Runs every tweet archive file through the pun generator.
def main
  @mecab = MeCab::Tagger.new
  files = Dir.glob('/Users/po/Dropbox/Public/t/data/js/tweets/*.js')
  files.each do |file|
    tweets = load_tweet(file)
    reduce_tweets(tweets)
  end
end

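# Prints each tweet that contains a word whose reading matches REGEXP,
# with every such word replaced by TARGET.
def reduce_tweets(tweets)
  tweets.each do |tweet|
    begin
      # Each MeCab output line is "surface\tfeature,feature,...";
      # the last comma-separated field is treated as the reading.
      tags = @mecab.parse(tweet).force_encoding('UTF-8').split(/\n/).map { |tag| tag.split(/,/) }
      next unless tags.any? { |tag| REGEXP === tag.last }

      puts tags.map { |tag|
        REGEXP === tag.last ? TARGET : tag.first.split(/\t/).first
      }.reject { |s| s == 'EOS' }.join
    rescue StandardError => e
      # Skip tweets that cannot be processed, but report them on stderr.
      warn "skipped tweet: #{e.message}"
    end
  end
end
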
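# Loads one archive file and returns the text of every tweet in it.
def load_tweet(file)
  str = File.read(file)
  # Drop the first line (a non-JSON header) so the rest parses as JSON.
  str.sub!(/\A.*\n/, '')
  Oj.load(str).map { |json| json['text'] }
end

main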
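
The substitution step can be tried without MeCab or a tweet archive. What follows is a minimal sketch, not the gist's own code: `punnify`, `PUN_PATTERN`, `PUN_TARGET` and `SAMPLE` are hypothetical names, and the sample is hand-written in the IPA-dictionary layout (surface, a tab, then comma-separated features with the reading last). Real output depends on the installed dictionary.

```ruby
# Pattern and replacement mirroring REGEXP and TARGET in the script (assumed values).
PUN_PATTERN = /\A[イキシチニヒミリ]ム\z/
PUN_TARGET  = 'ビム'

# Hypothetical helper: takes raw MeCab-style output for one sentence and
# returns the sentence with matching words replaced, or nil if nothing matches.
def punnify(mecab_output, pattern = PUN_PATTERN, target = PUN_TARGET)
  lines = mecab_output.each_line(chomp: true).reject { |l| l.empty? || l == 'EOS' }

  # Build [surface, reading] pairs; the reading is the last feature field.
  rows = lines.map do |line|
    surface, features = line.split("\t", 2)
    [surface, features.to_s.split(',').last.to_s]
  end
  return nil unless rows.any? { |_, reading| pattern.match?(reading) }

  rows.map { |surface, reading| pattern.match?(reading) ? target : surface }.join
end

# Hand-written sample for "リムを買う" (an assumption, not captured MeCab output).
SAMPLE = [
  "リム\t名詞,一般,*,*,*,*,リム,リム,リム",
  "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ",
  "買う\t動詞,自立,*,*,五段・ワ行促音便,基本形,買う,カウ,カウ",
  'EOS'
].join("\n")

puts punnify(SAMPLE) # prints ビムを買う
```

Splitting on the first tab before splitting the features on commas keeps a surface form that itself contains a comma intact, which the one-pass `split(/,/)` in the script does not guarantee.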