@aral
Last active September 4, 2020 15:44
Quick and dirty regexp that we use to format video transcripts (CoffeeScript)
#!/usr/bin/env coffee
fs = require 'fs'
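# Get the file name that’s passed as the first argument, or explain usage and quit if there isn’t one
nameOfFile = process.argv[2]
unless nameOfFile
  console.error "Usage: coffee #{process.argv[1]} <transcript.txt>"
  process.exit 1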
# Read the file and convert it from bytes to a string
file = fs.readFileSync(nameOfFile).toString()
# Convert Windows line endings to \n, then remove empty lines (including runs of several), which cause problems for later replacements otherwise
file = file.replace(/\r\n/g, '\n').replace(/\n{2,}/g, '\n')
# Replace "Speaker:" at the start of a line with the h3 header.
# [^\s(]+ excludes whitespace, so the match can't run across a newline and swallow the previous line (which is what used to leave a stray newline inside the <h3>).
file = file.replace(/^([^\s(]+:)/gm, "<h3 class='transcript--name'>$1</h3>")
# Wrap all lines in <p> tags
file = file.replace(/^(.*)$/gm, "<p>$1</p>")
# Remove the <p> tags from lines that begin with the <h3> and make it prettier by adding an empty line before the <h3>s.
file = file.replace(/^<p><h3 class='transcript--name'>(.*)<\/h3>(.*)<\/p>/gm, "\n<h3 class='transcript--name'>$1</h3>\n\n<p>$2</p>")
# Collapse double spaces after punctuation, keeping the original mark (., ? or !). Not necessary for HTML, but easy enough to do, and it looks better in the source.
file = file.replace(/([.?!])\ \ /g, '$1 ')
# Replace ellipsis+dot pairs (found one in Cole’s)
file = file.replace(/…\./gm, '… ')
# Replace ellipses with HTML entity code
file = file.replace(/…/gm, '&hellip;')
# Wrap (applause) in <em>s
file = file.replace(/\(applause\)/g, '<em>(applause)</em>')
# Remove empty paragraph tags if any (<p></p>)
file = file.replace(/<p><\/p>/gm, '')
# Create the new file name by replacing a trailing .txt extension with .html (or appending .html if there is none), so the input file is never overwritten
newNameOfFile = nameOfFile.replace(/(\.txt)?$/, '.html')
# Create and write the updated file string to the new file
fs.writeFileSync(newNameOfFile, file)
# Tell the user that we’re done
console.log "Formatted transcript: #{newNameOfFile}"
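To try the replacements without reading or writing files, the same steps can be wrapped in a pure function that takes and returns a string. This is a sketch, not part of the gist: the name `formatTranscript` is hypothetical. It applies the script's steps in the same order. It also matches speaker names with `[^\s(]+:`, so a name can't swallow the previous line, and it keeps the original punctuation mark when collapsing double spaces.

```coffeescript
# Hypothetical helper (not part of the gist): the script's replacement steps,
# applied to a string so they can be tried without touching the file system.
formatTranscript = (text) ->
  h3 = "<h3 class='transcript--name'>"
  # Convert Windows line endings, then remove empty lines
  text = text.replace(/\r\n/g, '\n').replace(/\n{2,}/g, '\n')
  # A leading "Name:" (no spaces or parentheses) becomes an <h3>
  text = text.replace(/^([^\s(]+:)/gm, "#{h3}$1</h3>")
  # Wrap every line in <p> tags
  text = text.replace(/^(.*)$/gm, '<p>$1</p>')
  # Move each <h3> out of its paragraph, with blank lines around it
  text = text.replace(/^<p><h3 class='transcript--name'>(.*)<\/h3>(.*)<\/p>/gm, "\n#{h3}$1</h3>\n\n<p>$2</p>")
  # Collapse double spaces after ., ? or !, keeping the mark itself
  text = text.replace(/([.?!])\ \ /g, '$1 ')
  # Tidy ellipses and emphasise (applause)
  text = text.replace(/…\./g, '… ').replace(/…/g, '&hellip;')
  text = text.replace(/\(applause\)/g, '<em>(applause)</em>')
  # Drop empty paragraphs and return the result
  text.replace(/<p><\/p>/g, '')
```

For example, `formatTranscript "Thanks.\nAral: Hi"` keeps `<p>Thanks.</p>` as its own paragraph and puts only `Aral:` inside the `<h3>`.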