@sasamijp
Created October 25, 2014 08:18
Determines whether an SS parsed by SSparser is suitable for use as a corpus
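# -*- encoding: utf-8 -*-

# Judges whether a parsed SS (an array of hashes with :name and :serif keys)
# is suitable as a dialogue corpus.
class SSAnalyzer
  # True when the longest alternating exchange is large relative to the
  # number of lines, or moderately large with short lines on average.
  def corpus?(ss)
    sla = sentence_length_average(ss)
    ctc = consecutive_talking_count(ss)
    return false if ctc.nil?

    ctc = ctc / ss.length.to_f
    (ctc >= 0.8) || (ctc >= 0.4 && sla <= 20)
  end

  # Mean length, in characters, of the :serif (spoken line) of each entry.
  def sentence_length_average(ss)
    len = ss.map { |v| v[:serif].length }
    len.inject(0.0) { |r, i| r + i } / len.size
  end

  # Size of the longest run in which speakers alternate (A, B, A, B, ...),
  # counted as collected speaker pairs halved. Returns nil when fewer than
  # three runs were found.
  def consecutive_talking_count(ss)
    conv = []
    names = ss.map { |v| v[:name] }
    start = -1
    names.each_with_index do |_name, l|
      next if l < start # already covered by the previous run

      c = []
      (0..10_000).each do |i|
        # The run continues while the speaker two lines ahead is the same.
        unless names[l + i] == names[l + i + 2]
          start = l + i
          break
        end
        c << [names[l + i], names[l + i + 1]]
      end
      conv << c
    end
    # The final two runs are left out of the comparison.
    conv[0..-3].map { |v| v.length / 2 }.max
  end
end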
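
The core idea in `consecutive_talking_count` is detecting stretches where two speakers take turns. The following is a minimal, simplified sketch of that idea, not the gist's exact algorithm: `longest_alternation` and the sample `lines` are hypothetical, the input shape (`:name`, `:serif`) is inferred from the code above, and unlike the original it requires the two speakers to differ and does not halve pair counts or drop the final runs.

```ruby
# Hypothetical helper: length, in lines, of the longest stretch in which
# two different speakers strictly alternate (A, B, A, B, ...).
def longest_alternation(names)
  best = 0
  run = 0
  names.each_cons(3) do |a, b, c|
    if a == c && a != b
      run += 1
      # A run of n matching triples spans n + 2 lines.
      best = [best, run + 2].max
    else
      run = 0
    end
  end
  best
end

# Hypothetical sample input in the shape the analyzer appears to expect.
lines = [
  { name: "A", serif: "Hello" },
  { name: "B", serif: "Hi" },
  { name: "A", serif: "How are you?" },
  { name: "B", serif: "Fine" },
  { name: "C", serif: "..." }
]

names = lines.map { |v| v[:name] }
puts longest_alternation(names) # => 4
```

Dividing such a run length by the total number of lines gives a ratio comparable in spirit to the `0.8` and `0.4` thresholds used in `corpus?`, though the exact values will differ from the original's pair-based count.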