Convert XML YouTube subtitles to SubRip (srt) from a YouTube video ID or an XML file
# Convert XML YouTube subtitles to SubRip (srt) format
# To download the subtitles as XML, append the YouTube video ID
# to the following URL:
# http://video.google.com/timedtext?hl=en&lang=en&v=
# Usage:
#
# $ ruby youtube2srt.rb input_filename [output_filename]
#
# where input_filename is either the name of your XML file
# (probably timedtext.xml) or the ID of your YouTube video.
# output_filename is optional; it defaults to the input name with
# its .xml extension (if any) replaced by .srt.
require 'nokogiri'
require 'uri'
require 'net/http'

BASE_URL = 'http://video.google.com/timedtext?hl=en&lang=en&v='
# SRT timestamp: hours:minutes:seconds,milliseconds (e.g. 00:01:02,345)
TIME_FORMAT = '%02H:%02M:%02S,%3N'

source_filename = ARGV[0]
output_filename = ARGV[1]

abort 'Usage: ruby youtube2srt.rb input_filename [output_filename]' if source_filename.nil?

# Write one numbered SRT cue for each <text start="..." dur="..."> element.
def create_srt(output_filename, source_filename, source_file)
  # Default: the input name with a trailing .xml removed, plus .srt
  output_filename ||= source_filename.sub(/\.xml\z/i, '') + '.srt'
  File.open(output_filename, 'w') do |srt_file|
    source_file.css('text').each.with_index(1) do |sub, i|
      # String#to_r parses "1.23" exactly, avoiding float truncation in %3N
      start_time = Time.at(sub['start'].to_r).utc
      end_time = start_time + sub['dur'].to_r
      # Caption text can contain escaped HTML entities; parsing decodes them
      text = Nokogiri::HTML.parse(sub.text).text
      # The trailing blank line separates consecutive SRT cues
      srt_file.write(<<~CAPTION)
        #{i}
        #{start_time.strftime(TIME_FORMAT)} --> #{end_time.strftime(TIME_FORMAT)}
        #{text}

      CAPTION
    end
  end
end

if source_filename =~ /\.xml\z/i
  source_file = Nokogiri::XML(File.read(source_filename), &:noblanks)
  create_srt(output_filename, source_filename, source_file)
  puts "XML file #{source_filename} converted to srt"
else
  url = BASE_URL + source_filename
  response = Net::HTTP.get_response(URI.parse(url))
  if response.is_a?(Net::HTTPSuccess)
    source_file = Nokogiri::XML(response.body, &:noblanks)
    create_srt(output_filename, source_filename, source_file)
    puts 'Google timedtext.xml converted to srt'
  else
    puts "Couldn't find subtitles for #{source_filename} at #{url}"
  end
end
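
The script formats timestamps with `Time#strftime`, which wraps the hour back to 00 after 24 hours. The following is a minimal sketch of the same cue-building step using plain integer arithmetic instead. It assumes each caption has already been parsed into a `[start, dur, text]` triple of strings (mirroring the `start`/`dur` attributes), so it needs no Nokogiri. `srt_timestamp` and `build_srt` are hypothetical helper names, not part of the script above.

```ruby
# Hypothetical helper: format a number of seconds (Integer or Rational)
# as an SRT timestamp HH:MM:SS,mmm. Hours are not wrapped at 24.
def srt_timestamp(seconds)
  total_ms = (seconds * 1000).round          # round to the nearest millisecond
  hours, rem = total_ms.divmod(3_600_000)
  minutes, rem = rem.divmod(60_000)
  secs, ms = rem.divmod(1000)
  format('%02d:%02d:%02d,%03d', hours, minutes, secs, ms)
end

# Hypothetical helper: build an SRT document from [start, dur, text]
# triples, where start and dur are decimal strings such as "1.23".
def build_srt(captions)
  captions.each.with_index(1).map do |(start, dur, text), i|
    from = start.to_r                        # "1.23" becomes exactly 123/100
    to = from + dur.to_r
    # Cue number, time range, text, then a blank line between cues
    "#{i}\n#{srt_timestamp(from)} --> #{srt_timestamp(to)}\n#{text}\n\n"
  end.join
end

puts build_srt([
  ['0.5', '2.25', 'Hello'],
  ['90000', '1.5', 'Past the 24-hour mark']
])
```

The second cue prints as `25:00:00,000 --> 25:00:01,500` rather than wrapping to `01:00:00,000`. Rounding to the nearest millisecond, instead of truncating as `%3N` does, is a choice made for this sketch.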

benlieb commented Apr 30, 2018

Thanks, this helped me today!
