A Ruby program to convert video subtitles from YouTube's XML format to the SubRip format.
Gist title: "Convert video subtitles from YouTube XML format to SubRip (.srt)"
# Convert YouTube XML subtitles to SubRip (.srt) format.
#
# To download the subtitles as XML, append the ID of the YouTube video
# to this URL:
#
#   http://video.google.com/timedtext?hl=en&lang=en&v=__youtube_video_ID__
#
# Usage:
#
#   $ ruby youtube2srt.rb input_filename [output_filename]
#
# input_filename is either the name of your XML file (probably
# timedtext.xml) or the ID of your YouTube video.
# output_filename is optional; it defaults to the input name with
# the .xml extension replaced by .srt.
require 'nokogiri'
require 'uri'
require 'net/http'

BASE_URL = 'http://video.google.com/timedtext?hl=en&lang=en&v='
TIME_FORMAT = '%02H:%02M:%02S,%3N'

source_filename = ARGV[0]
output_filename = ARGV[1]

abort 'Usage: ruby youtube2srt.rb input_filename [output_filename]' if source_filename.nil?

# Write every <text> element of the parsed XML document as a numbered SRT cue.
def create_srt(output_filename, source_filename, source_file)
  output_filename ||= source_filename.sub(/\.xml\z/i, '') + '.srt'
  File.open(output_filename, 'w') do |srt_file|
    source_file.css('text').each.with_index(1) do |sub, i|
      # 'start' and 'dur' are offsets in seconds from the start of the video.
      start_time = Time.at(sub['start'].to_f).utc
      end_time = start_time + sub['dur'].to_f
      # The blank line before CAPTION separates one SRT cue from the next.
      srt_file.write(<<~CAPTION)
        #{i}
        #{start_time.strftime(TIME_FORMAT)} --> #{end_time.strftime(TIME_FORMAT)}
        #{Nokogiri::HTML.parse(sub.text).text}

      CAPTION
    end
  end
end

if source_filename =~ /\.xml\z/i
  # Local XML file: parse it directly.
  source_file = File.open(source_filename) { |f| Nokogiri::XML(f, &:noblanks) }
  create_srt(output_filename, source_filename, source_file)
  puts "XML file #{source_filename} converted to srt"
else
  # Otherwise treat the argument as a YouTube video ID and fetch the XML.
  response = Net::HTTP.get_response(URI.parse(BASE_URL + source_filename))
  if response.is_a?(Net::HTTPSuccess)
    source_file = Nokogiri::XML(response.body, &:noblanks)
    create_srt(output_filename, source_filename, source_file)
    puts 'Google timedtext.xml converted to srt'
  else
    puts "Couldn't find subtitles for #{source_filename} at #{BASE_URL + source_filename}"
  end
end
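
The timestamp handling above can be tried without Nokogiri or a network connection. What follows is only a minimal sketch, not part of the gist: `srt_timestamp`, `srt_cue`, and `SKETCH_TIME_FORMAT` are hypothetical names introduced for illustration, and the format string uses `%L` (milliseconds) on the assumption that it produces the same output as the script's `%3N`. The start and duration values are made-up sample data.

```ruby
# Hypothetical constant for this sketch; %L prints milliseconds (000..999).
SKETCH_TIME_FORMAT = '%H:%M:%S,%L'

# Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm).
# Time.at(...).utc counts from the epoch, so offsets of 24 hours or
# more would wrap around; fine for typical video lengths.
def srt_timestamp(seconds)
  Time.at(seconds).utc.strftime(SKETCH_TIME_FORMAT)
end

# Build one SRT cue: index, time range, text, then a blank separator line.
def srt_cue(index, start, dur, text)
  "#{index}\n#{srt_timestamp(start)} --> #{srt_timestamp(start + dur)}\n#{text}\n\n"
end

puts srt_cue(1, 3.5, 2.25, 'Hello, world')
# 1
# 00:00:03,500 --> 00:00:05,750
# Hello, world
```

The trailing blank line in each cue matters: SubRip players rely on it to tell where one subtitle ends and the next begins.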
@lsloan
Author

lsloan commented Oct 31, 2011

This is not bad, although I'd rather see it done with XSL or Python.

@StanBoyet

Well thank you :)

@viliam-durina

viliam-durina commented Jul 16, 2015

@forgijs

forgijs commented Mar 3, 2016

fmt=srt doesn't work anymore, unfortunately.
