@lekevicius
Created March 14, 2013 14:02
Clean subtitles and merge by year.
# Lines containing any of these substrings (case-insensitive) are dropped.
drop_words = ['subtitle', 'cd1', 'cd2', 'kbps', 'transc', 'subed', 'distrib', 'synched', '.com']

(1962..2012).each do |year|
  # Merge every script file for this year into one string.
  year_string = ''
  Dir.glob("Scripts/#{year}-*") do |file|
    year_string += File.read(file)
    year_string += "\n\n"
  end

  # Keep only non-empty lines that contain none of the drop words.
  clean_year_string = ''
  year_string.each_line do |line|
    line = line.strip
    next if line.empty?
    lowerline = line.downcase
    next if drop_words.any? { |word| lowerline.include?(word) }
    # strip removed the line ending, so add it back to keep lines separate.
    clean_year_string += line + "\n"
  end

  # Write the cleaned text for this year (the years/ directory must already exist).
  File.write("years/#{year}.txt", clean_year_string)
end
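
The line filter can also be tried on its own, without any files on disk. The following is a minimal sketch, not part of the original script: the method name `clean_subtitles`, the constant `DROP_WORDS`, and the sample text are all made up for illustration. It applies the same rule as the script: strip each line, skip empty lines, and skip any line whose lowercase form contains one of the drop words.

```ruby
# Hypothetical helper, not part of the original gist.
DROP_WORDS = ['subtitle', 'cd1', 'cd2', 'kbps', 'transc', 'subed', 'distrib', 'synched', '.com'].freeze

# Returns the text with blank lines and lines containing a drop word removed.
def clean_subtitles(text, drop_words = DROP_WORDS)
  text.each_line
      .map(&:strip)
      .reject(&:empty?)
      .reject { |line| drop_words.any? { |word| line.downcase.include?(word) } }
      .join("\n")
end

# Made-up sample input: two credit lines, one blank line, two dialogue lines.
sample = "Subtitles by Example.COM\n\n  Hello there.  \nTranscribed by someone\nGoodbye.\n"
puts clean_subtitles(sample)
```

Because matching is by substring, a dialogue line that happens to contain one of the drop words (for example "distribute") would be removed as well.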