@sasamijp
Created September 28, 2014 11:40
A class that converts SS (dialogue-format short stories) into a conversation corpus
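# -*- encoding: utf-8 -*-
class SSparser
  # Returns an array of { name:, serif:, in_reply_to: } hashes, one for each
  # line of dialogue that answers a line spoken by a different character.
  def parse(body)
    # When 『』 outnumber 「」, treat 『』 as the dialogue brackets.
    use_double = body.count('『') > body.count('「')
    # Normalize brackets to 「」. Assumption: the half-width forms ｢｣
    # (U+FF62/U+FF63) were the intended targets of the bracket substitutions.
    body = body.gsub("\uFF62", '「').gsub("\uFF63", '」')
    body = body.gsub('『', '「').gsub('』', '」') if use_double

    # Keep only lines that contain a dialogue bracket.
    lines = body.split("\n").select { |v| v.include?('「') || v.include?('」') }

    ss = []
    in_reply_to = nil       # line that the next new speaker is answering
    in_reply_to_char = nil  # speaker of the previous line
    lines.each do |str|
      name, serif = split_name_serif(str)
      next if name.nil?
      if in_reply_to_char == name
        # Same speaker twice in a row: not a reply; the reply target is unchanged.
        reply = nil
      else
        reply = in_reply_to
        in_reply_to = serif
      end
      in_reply_to_char = name
      ss.push(name: name, serif: serif, in_reply_to: reply)
    end
    # Drop lines that do not answer anything.
    ss.reject { |hash| hash[:in_reply_to].nil? }
  end

  # Splits "name「serif」" into [name, serif]; returns nil when the line has
  # no opening bracket or no speaker name in front of it.
  def split_name_serif(str)
    open_at = str.rindex('「')
    return nil if open_at.nil? || open_at.zero?
    # Strip half-width spaces and (assumed) full-width spaces, U+3000.
    name = str[0...open_at].delete(" \u3000")
    # Remove the first 「, the last 」, and the speaker name.
    [name, str.sub('「', '').reverse.sub('」', '').reverse.sub(name, '')]
  end
end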
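The reply-pairing rule used by `parse` can be tried in isolation with a minimal, self-contained sketch. The helper name `pair_replies` and the sample dialogue are hypothetical and not part of the gist, and the name/line split is simplified to a single regular expression, so treat this as an illustration of the rule rather than a replacement for the class.

```ruby
# -*- encoding: utf-8 -*-
# Hypothetical helper (not part of the gist): pairs each line of dialogue with
# the line it answers, following the same rule as SSparser#parse.
def pair_replies(text)
  previous_serif = nil  # line that opened the most recent speaker's turn
  previous_name = nil   # speaker of the line just before the current one
  pairs = []
  text.each_line do |line|
    # Simplified split: everything before the first 「 is the speaker name.
    match = line.strip.match(/\A(.+?)「(.*)」\z/)
    next if match.nil?  # skip narration and blank lines
    name = match[1]
    serif = match[2]
    if name != previous_name
      # A new speaker answers the line that opened the previous turn.
      unless previous_serif.nil?
        pairs << { name: name, serif: serif, in_reply_to: previous_serif }
      end
      previous_serif = serif
    end
    previous_name = name
  end
  pairs
end

sample = <<~SS
  Alice「Hello」
  Bob「Hi Alice」
  Bob「How are you」
  Alice「Fine, thanks」
SS

result = pair_replies(sample)
result.each { |h| puts "#{h[:in_reply_to]} -> #{h[:name]}: #{h[:serif]}" }
```

In this sample, Bob's second consecutive line is skipped, and Alice's answer is paired with the line that opened Bob's turn, which mirrors how `parse` leaves `in_reply_to` unchanged when the same character speaks twice in a row.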