@jareware
Last active March 21, 2024 15:14
Conversion script between the TTML & SRT subtitle formats

premiere-subtitle-convert

Conversion script between the TTML & SRT subtitle formats. This is particularly useful with Adobe Premiere, as it doesn't understand the SRT format (which is joyously simple and interoperable). TTML-XML is probably the most straightforward subtitle format it does understand, hence this tool.

Note that due to the simplicity of the SRT format, this conversion is extremely lossy for all the bells and whistles supported by TTML. Not like you'd want fixed-pixel font sizes etc in your subtitles anyway, but you've been warned.
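To make the lossiness concrete: judging by the script's parsers and writers, each subtitle is reduced to just three fields (begin, end, and an array of content lines) before being written back out. The sketch below illustrates that shape with made-up sample values; toSrtCue is an illustrative name that mirrors what subsToSRT does for a single entry, and is not a function in the script itself.

```javascript
// Shape of one subtitle as it travels between the parsers and the writers.
// The timecodes and text here are sample values, not taken from a real file.
const sub = {
  begin: '00:00:01,000',
  end: '00:00:03,500',
  content: ['First line', 'Second line'],
};

// Illustrative helper mirroring subsToSRT for a single entry:
// a 1-based index, the timecode line, then the text lines.
function toSrtCue(sub, index) {
  return (index + 1) + '\n' + sub.begin + ' --> ' + sub.end + '\n' + sub.content.join('\n');
}

console.log(toSrtCue(sub, 0));
```

Anything in the TTML that doesn't fit into those three fields (styling, regions, fonts) has nowhere to go, which is why it's dropped.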

Usage

  • Install Node.js
  • Install libxmljs somewhere where node can find it (if you don't know what this means, consider installing globally with npm install -g libxmljs)
  • Run node premiere-subtitle-convert.js path/to/inputfile.xml > path/to/outputfile.srt (or the other way around with the file types)
  • Profit

Batch usage

This is basically just shell wizardry, but for completeness, here's how to handle the rather common scenario where a bunch of files need to be converted in one go:

for f in *.xml
do
  node /path/to/premiere-subtitle-convert.js "$f" > "${f%.xml}.srt"
done

Because the glob is left unquoted and "$f" is quoted, this also properly handles whitespace in file names.
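If a shell isn't handy (on Windows, say), roughly the same thing can be done from Node itself. This is only a sketch: outputPathFor and convertAll are hypothetical helper names, not part of the script, and it assumes the converter keeps printing its result to stdout as described above.

```javascript
const fs = require('fs');
const path = require('path');
const { execFileSync } = require('child_process');

// Hypothetical helper: swap the extension the same way the script picks
// its direction (.xml -> .srt, .srt -> .xml).
function outputPathFor(inputPath) {
  const ext = path.extname(inputPath);
  const targets = { '.xml': '.srt', '.srt': '.xml' };
  const target = targets[ext.toLowerCase()];
  if (!target) {
    throw new Error('Unknown input type: ' + inputPath);
  }
  return inputPath.slice(0, -ext.length) + target;
}

// Hypothetical batch runner: converts every .xml file in `dir` by running
// the converter at `scriptPath` and saving its stdout next to the input.
function convertAll(dir, scriptPath) {
  fs.readdirSync(dir)
    .filter((name) => path.extname(name).toLowerCase() === '.xml')
    .forEach((name) => {
      const input = path.join(dir, name);
      const output = execFileSync(process.execPath, [scriptPath, input]);
      fs.writeFileSync(outputPathFor(input), output);
    });
}
```

Calling convertAll('.', '/path/to/premiere-subtitle-convert.js') would then convert everything in the current directory. File names are passed as an argument array rather than through a shell, so spaces need no extra quoting.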

Author

Jarno Rantanen (@Jareware)

License: MIT

var XML_PREFIX = '<?xml version="1.0" encoding="UTF-8" standalone="no" ?><tt:tt xmlns:tt="http://www.w3.org/ns/ttml"><tt:head></tt:head><tt:body><tt:div>\n';
var XML_SUFFIX = '\n</tt:div></tt:body></tt:tt>';

var fs = require('fs');
var libxmljs = require('libxmljs');

var args = process.argv.slice(2);
var inputFile = args[0];
var inputData = fs.readFileSync(inputFile, { encoding: 'utf8' });
var inputType = inputFile.replace(/^.*\.(.*?)$/, '$1').toUpperCase(); // the file extension decides the direction

if (inputType === 'SRT') {
  console.log(subsToXML(srtToSubs(inputData)));
} else if (inputType === 'XML') {
  console.log(subsToSRT(xmlToSubs(inputData)));
} else {
  throw new Error('Unknown input type: ' + inputType);
}

function xmlToSubs(xmlString) {
  var inputXML = libxmljs.parseXml(xmlString);
  // only <p> elements in the TTML namespace below are matched
  var subsEls = inputXML.find('//tt:p', { tt: 'http://www.w3.org/ns/ttml' });
  var subs = {};
  subsEls.forEach(function(subEl) {
    var begin = subEl.attr('begin').value().replace(/:(\d+)$/, ',$1'); // ":frame" -> ",frame"
    var end = subEl.attr('end').value().replace(/:(\d+)$/, ',$1');
    var content = subEl.text().trim();
    if (subs[begin]) {
      subs[begin].content.push(content); // combine content with the same timecode
    } else {
      subs[begin] = {
        begin: begin,
        end: end,
        content: [ content ]
      };
    }
  });
  return Object.keys(subs).map(function(key) {
    return subs[key];
  });
}

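function srtToSubs(srtString) {
  // normalize Windows line endings first, then split into cues on blank lines
  return srtString.replace(/\r\n/g, '\n').trim().split(/\n{2,}/).map(function(subPiece) {
    var parts = subPiece.split(/\n/); // [ index, timecodes, ...text lines ]
    var timecodes = parts[1].split(' --> ');
    return {
      begin: timecodes[0],
      end: timecodes[1],
      content: parts.slice(2)
    };
  });
}
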
function subsToXML(subsArray) {
  return XML_PREFIX + subsArray.map(function(sub) {
    return '' +
      '<tt:p tt:begin="' + sub.begin.replace(',', ':') + '" tt:end="' + sub.end.replace(',', ':') + '">' +
        '<tt:span>' +
          sub.content.join('<tt:br />') +
        '</tt:span>' +
      '</tt:p>';
  }).join('\n') + XML_SUFFIX;
}

function subsToSRT(subsArray) {
  return subsArray.map(function(sub, index) {
    return (index + 1) + '\n' + sub.begin + ' --> ' + sub.end + '\n' + sub.content.join('\n');
  }).join('\n\n') + '\n';
}
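
A note on timecodes: xmlToSubs only rewrites the last ":" into ",", so whatever follows it (a frame number, per the comment in the code) lands unchanged in the position where SRT expects milliseconds. If your TTML uses frame-based HH:MM:SS:FF timecodes and you know the frame rate, a conversion along these lines might give more accurate timings. This is a sketch, not part of the script: framesToSrtTime is a hypothetical helper, and both the input format and the frame rate are assumptions you'd need to check against your own files.

```javascript
// Hypothetical helper, not part of the original script.
// Converts "HH:MM:SS:FF" (FF = frame number) to SRT's "HH:MM:SS,mmm",
// assuming a constant frame rate of `fps` frames per second.
function framesToSrtTime(timecode, fps) {
  const match = /^(\d{2}):(\d{2}):(\d{2}):(\d{2})$/.exec(timecode);
  if (!match) {
    throw new Error('Unexpected timecode: ' + timecode);
  }
  const frames = Number(match[4]);
  if (frames >= fps) {
    throw new Error('Frame number ' + frames + ' does not fit into ' + fps + ' fps');
  }
  const millis = Math.round((frames / fps) * 1000);
  return match[1] + ':' + match[2] + ':' + match[3] + ',' + String(millis).padStart(3, '0');
}

console.log(framesToSrtTime('00:00:01:12', 25)); // 00:00:01,480
```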
@pyy

pyy commented Oct 17, 2016

@jareware Thank you so, so much for this useful script. You saved my life! Thanks!

It works perfectly for the captions from MVA.

@jvinhit

jvinhit commented Jan 14, 2017

@jareware I have an error:
$ node premiere-subtitle-convert.js video_cc.xml > outputfile.srt
stdout is not a tty
I have java

@jiangweiatgithub

I ran the script on my dfxp.xml and it silently produced an empty srt file. Any idea?
