Gist by @mebubo, forked from jareware/README.md. Created October 20, 2016 18:43.
Conversion script between the TTML & SRT subtitle formats

premiere-subtitle-convert

Conversion script between the TTML & SRT subtitle formats. This is particularly useful with Adobe Premiere, as it doesn't understand the SRT format (which is joyously simple and interoperable). TTML-XML is probably the most straightforward subtitle format it does understand, hence this tool.

Note that because the SRT format is so simple, this conversion is extremely lossy for all the bells and whistles TTML supports: only each subtitle's start and end times and its plain text survive. Not that you'd want fixed-pixel font sizes etc. in your subtitles anyway, but you've been warned.
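To make "joyously simple" concrete: an SRT cue is a counter line, a `start --> end` line with `HH:MM:SS,mmm` timestamps, and one or more lines of text. A minimal sketch of parsing a single cue, assuming `\n` line endings; `parseSrtCue` and `srtTimestampToMs` are hypothetical helpers for illustration, not part of the script:

```javascript
// Hypothetical helper, not part of premiere-subtitle-convert.js.
// Parses an SRT timestamp such as "00:01:02,500" into milliseconds.
function srtTimestampToMs(ts) {
  const m = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!m) throw new Error('Not an SRT timestamp: ' + ts);
  const [, h, min, s, ms] = m.map(Number); // skip the full match
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Hypothetical helper: parses one cue block made of a counter line,
// a "start --> end" line, and any number of text lines.
function parseSrtCue(block) {
  const lines = block.split('\n');
  const [start, end] = lines[1].split(' --> ');
  return {
    index: Number(lines[0]),
    startMs: srtTimestampToMs(start),
    endMs: srtTimestampToMs(end),
    text: lines.slice(2),
  };
}
```

For example, `parseSrtCue('1\n00:00:01,000 --> 00:00:02,500\nHello\nworld')` yields a cue starting at 1000 ms and ending at 2500 ms with the two text lines `Hello` and `world`.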

Usage

  • Install Node.js
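  • Install libxmljs somewhere Node can find it (if you don't know what this means, run npm install libxmljs in the directory containing the script; a global npm install -g libxmljs is only found by require() if NODE_PATH points at the global modules folder)
  • Run node premiere-subtitle-convert.js path/to/inputfile.xml > path/to/outputfile.srt (or the other way around, from .srt to .xml; the direction is picked from the input file's extension)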
  • Profit
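The script chooses the conversion direction from the input file's extension alone (`.srt` produces TTML, `.xml` produces SRT, case-insensitively), never from the file's contents. A sketch of the same dispatch using Node's built-in `path.extname`; the function name `conversionDirection` is hypothetical, and the script itself uses a regular expression rather than `path`:

```javascript
const path = require('path');

// Hypothetical helper mirroring the script's dispatch: the direction
// depends only on the input file's extension, compared case-insensitively.
function conversionDirection(inputFile) {
  const ext = path.extname(inputFile).slice(1).toUpperCase(); // ".srt" -> "SRT"
  if (ext === 'SRT') return 'SRT -> TTML';
  if (ext === 'XML') return 'TTML -> SRT';
  throw new Error('Unknown input type: ' + ext);
}
```

So `conversionDirection('path/to/my subs.srt')` returns `'SRT -> TTML'`, while a `.vtt` file is rejected just as the script rejects it.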

Batch usage

This is basically just shell wizardry, but for completeness, here's how to handle the rather common scenario where a bunch of files need to be converted in one go:

for f in *.xml
do
  [ -e "$f" ] || continue  # no .xml files: the glob stays unexpanded
  node /path/to/premiere-subtitle-convert.js "$f" > "${f%.xml}.srt"
done

This also handles whitespace in file names correctly.
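If a shell isn't handy (on Windows, for instance), the same batch job can be scripted in Node itself. A minimal sketch; `srtNameFor` and `convertAll` are hypothetical names, and the converter's location is whatever path you pass in:

```javascript
const fs = require('fs');
const path = require('path');
const { execFileSync } = require('child_process');

// Hypothetical helper: "my file.xml" -> "my file.srt".
// Only a trailing ".xml" (any case) is replaced.
function srtNameFor(xmlFile) {
  return xmlFile.replace(/\.xml$/i, '.srt');
}

// Hypothetical helper: converts every *.xml file in `dir` using the
// converter script at `scriptPath`, writing each result next to its input.
// execFileSync passes each file name as a single argument without a shell,
// so spaces in names need no quoting.
function convertAll(dir, scriptPath) {
  for (const name of fs.readdirSync(dir)) {
    if (!/\.xml$/i.test(name)) continue;
    const input = path.join(dir, name);
    const srt = execFileSync(process.execPath, [scriptPath, input], { encoding: 'utf8' });
    fs.writeFileSync(path.join(dir, srtNameFor(name)), srt);
  }
}
```

For example, `convertAll('.', '/path/to/premiere-subtitle-convert.js')` converts every `.xml` file in the current directory. Unlike the `*.xml` shell glob, the filter here is case-insensitive, which matches the script's own extension check.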

Author

Jarno Rantanen (@Jareware)

License: MIT

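// premiere-subtitle-convert.js: converts TTML (XML) subtitles to SRT, and SRT to TTML.
// Usage: node premiere-subtitle-convert.js <input.xml|input.srt> > output
var XML_PREFIX = '<?xml version="1.0" encoding="UTF-8" standalone="no" ?><tt:tt xmlns:tt="http://www.w3.org/ns/ttml"><tt:head></tt:head><tt:body><tt:div>\n';
var XML_SUFFIX = '\n</tt:div></tt:body></tt:tt>';

var fs = require('fs');
var libxmljs = require('libxmljs');

var args = process.argv.slice(2);
var inputFile = args[0];
if (!inputFile) {
  console.error('Usage: node premiere-subtitle-convert.js <input.xml|input.srt>');
  process.exit(1);
}
var inputData = fs.readFileSync(inputFile, { encoding: 'utf8' });
var inputType = inputFile.replace(/^.*\.(.*?)$/, '$1').toUpperCase(); // file extension, e.g. "SRT"

// The conversion direction is picked from the input file's extension.
if (inputType === 'SRT') {
  console.log(subsToXML(srtToSubs(inputData)));
} else if (inputType === 'XML') {
  console.log(subsToSRT(xmlToSubs(inputData)));
} else {
  throw new Error('Unknown input type: ' + inputType);
}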
// Reads every <tt:p> paragraph of a TTML document into { begin, end, content } objects.
function xmlToSubs(xmlString) {
  var inputXML = libxmljs.parseXml(xmlString);
  var subsEls = inputXML.find('//tt:p', { tt: 'http://www.w3.org/ns/ttml' });
  var subs = {};
  subsEls.forEach(function(subEl) {
    // TTML "hh:mm:ss:frame" -> SRT-style "hh:mm:ss,frame". NB: the frame number is
    // copied unchanged into the SRT milliseconds field, not converted to real milliseconds.
    var begin = subEl.attr('begin').value().replace(/:(\d+)$/, ',$1');
    var end = subEl.attr('end').value().replace(/:(\d+)$/, ',$1');
    var content = subEl.text().trim();
    if (subs[begin]) {
      subs[begin].content.push(content); // combine content with the same timecode
    } else {
      subs[begin] = {
        begin: begin,
        end: end,
        content: [ content ]
      };
    }
  });
  return Object.keys(subs).map(function(key) {
    return subs[key];
  });
}
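// Splits an SRT file into cues: a counter line, a "begin --> end" line, then the text lines.
function srtToSubs(srtString) {
  return srtString
    .replace(/\r\n?/g, '\n') // normalize Windows and old Mac line endings
    .trim()
    .split(/\n\s*\n/) // cues are separated by one or more blank lines
    .map(function(subPiece) {
      var parts = subPiece.split('\n');
      var timecodes = parts[1].split(' --> ');
      return {
        begin: timecodes[0].trim(),
        end: timecodes[1].trim(),
        content: parts.slice(2)
      };
    });
}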
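// Escapes characters that would otherwise break the generated XML
// (SRT formatting tags such as <i> therefore come out as literal text).
function escapeXml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Writes one <tt:p> per cue, joining multi-line content with <tt:br />.
function subsToXML(subsArray) {
  return XML_PREFIX + subsArray.map(function(sub) {
    // SRT "hh:mm:ss,mmm" -> TTML "hh:mm:ss:mmm". NB: TTML reads the last field as a
    // frame count, so the milliseconds are copied across without conversion.
    return '' +
      '<tt:p tt:begin="' + sub.begin.replace(',', ':') + '" tt:end="' + sub.end.replace(',', ':') + '">' +
      '<tt:span>' +
      sub.content.map(escapeXml).join('<tt:br />') +
      '</tt:span>' +
      '</tt:p>';
  }).join('\n') + XML_SUFFIX;
}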
// Writes numbered SRT cues separated by blank lines.
function subsToSRT(subsArray) {
  return subsArray.map(function(sub, index) {
    return (index + 1) + '\n' + sub.begin + ' --> ' + sub.end + '\n' + sub.content.join('\n');
  }).join('\n\n') + '\n';
}
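The script carries timecodes across by swapping only the last separator: TTML's `hh:mm:ss:ff`, where `ff` counts frames at the document's `ttp:frameRate`, becomes `hh:mm:ss,ff`, even though SRT reads that field as milliseconds (and vice versa on the way back). If frame-accurate timing matters, the number has to be rescaled. A sketch, assuming a constant integer frame rate known in advance (25 fps in the usage below is only an example; the script has no frame-rate setting) and no drop-frame timecode; `ttmlFramesToSrt` and `srtToTtmlFrames` are hypothetical helpers:

```javascript
// Hypothetical helpers, not part of the original script.
// Assumes a constant integer frame rate; drop-frame timecode is not handled.

function pad(n, width) {
  return String(n).padStart(width, '0');
}

// TTML frame-based clock time "hh:mm:ss:ff" -> SRT "hh:mm:ss,mmm"
function ttmlFramesToSrt(timecode, fps) {
  const [h, m, s, f] = timecode.trim().split(':').map(Number);
  const ms = Math.round((f * 1000) / fps); // nearest millisecond, always < 1000
  return pad(h, 2) + ':' + pad(m, 2) + ':' + pad(s, 2) + ',' + pad(ms, 3);
}

// SRT "hh:mm:ss,mmm" -> TTML frame-based clock time "hh:mm:ss:ff"
function srtToTtmlFrames(timecode, fps) {
  const [hms, msPart] = timecode.trim().split(',');
  const [h, m, s] = hms.split(':').map(Number);
  const f = Math.floor((Number(msPart) * fps) / 1000); // truncate so f stays below fps
  return pad(h, 2) + ':' + pad(m, 2) + ':' + pad(s, 2) + ':' + pad(f, 2);
}
```

At 25 fps, for instance, frame 12 corresponds to 480 ms, so `ttmlFramesToSrt('00:00:01:12', 25)` gives `'00:00:01,480'` and `srtToTtmlFrames` maps it back. Using these would mean replacing the timecode `.replace(...)` calls in `xmlToSubs` and `subsToXML`.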