Forked from lsloan/ Convert video subtitles from YouTube XML format to SubRip (.srt)
Last active June 7, 2018 22:31
Convert YouTube XML subtitles to SubRip (.srt) from a YouTube video ID or an XML file
# Convert YouTube XML subtitles to SubRip (.srt) format.
#
# To download the subtitles as XML, append the ID of the YouTube video
# to the end of this URL:
#   http://video.google.com/timedtext?hl=en&lang=en&v=
#
# Usage:
#
#   $ ruby youtube2srt.rb input_filename [output_filename]
#
# Where input_filename is either the name of your XML file
# (probably timedtext.xml) or the ID of your YouTube video.
# The output filename is optional.
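require 'nokogiri'
require 'uri'
require 'net/http'

BASE_URL = 'http://video.google.com/timedtext?hl=en&lang=en&v='
# SubRip timestamps look like HH:MM:SS,mmm
TIME_FORMAT = '%H:%M:%S,%3N'

source_filename = ARGV[0]
output_filename = ARGV[1]

# The input argument is required; exit with a usage message if it is missing
abort 'Usage: ruby youtube2srt.rb input_filename [output_filename]' if source_filename.nil?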
# Writes every <text> element of the parsed XML as a numbered SRT cue.
def create_srt(output_filename, source_filename, source_file)
  # Default output name: the input name with any .xml extension swapped for .srt
  output_filename ||= source_filename.sub(/\.xml\z/i, '') + '.srt'
  File.open(output_filename, 'w') do |srt_file|
    source_file.css('text').each.with_index(1) do |sub, i|
      # to_r keeps fractional seconds exact, so milliseconds are not lost to float rounding
      start_time = Time.at(sub['start'].to_r).utc
      end_time = start_time + sub['dur'].to_r
      # The trailing blank line separates one cue from the next
      srt_file.write(<<~CAPTION)
        #{i}
        #{start_time.strftime(TIME_FORMAT)} --> #{end_time.strftime(TIME_FORMAT)}
        #{Nokogiri::HTML.parse(sub.text).text}

      CAPTION
    end
  end
end
if source_filename =~ /\.xml\z/i
  # The argument names a local XML file
  source_file = Nokogiri::XML(File.read(source_filename), &:noblanks)
  create_srt(output_filename, source_filename, source_file)
  puts "XML file #{source_filename} converted to srt"
else
  # Otherwise treat the argument as a YouTube video ID and fetch the XML
  url = BASE_URL + source_filename
  response = Net::HTTP.get_response(URI.parse(url))
  if response.is_a?(Net::HTTPSuccess)
    source_file = Nokogiri::XML(response.body, &:noblanks)
    create_srt(output_filename, source_filename, source_file)
    puts 'Google timedtext.xml converted to srt'
  else
    puts "Couldn't find subtitles for #{source_filename} at #{url}"
  end
end
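The core of the conversion is turning a start offset and a duration, both in seconds, into a pair of SubRip timestamps. The sketch below isolates that step so it can be tried without Nokogiri or a network request. The helper names `srt_timestamp` and `srt_cue` are hypothetical and do not appear in the script, and the sample cues are made up for illustration.

```ruby
# Hypothetical helpers for illustration; only the format string mirrors the script.
SRT_TIME_FORMAT = '%H:%M:%S,%3N'

# Convert an offset in seconds (as in the "start" attribute) to HH:MM:SS,mmm.
def srt_timestamp(seconds)
  Time.at(seconds).utc.strftime(SRT_TIME_FORMAT)
end

# Build one numbered cue from a start offset and a duration, both in seconds.
# The trailing blank line separates the cue from the next one.
def srt_cue(index, start, dur, text)
  "#{index}\n#{srt_timestamp(start)} --> #{srt_timestamp(start + dur)}\n#{text}\n\n"
end

# Made-up sample values, not taken from a real video.
cues = [
  { start: 0.5, dur: 2.25, text: 'Hello' },
  { start: 3661.0, dur: 1.5, text: 'One hour in' }
]

srt = cues.each_with_index.map { |c, i| srt_cue(i + 1, c[:start], c[:dur], c[:text]) }.join
puts srt
```

Because the offset is treated as seconds since the Unix epoch in UTC, this approach only yields sensible timestamps for offsets shorter than 24 hours, which should cover ordinary videos.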
Thanks, this helped me today!