@macloo
Created May 26, 2018 21:41
Clean any .sbv transcript file from YouTube: remove blank lines and timecodes
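# clean any .sbv transcript file from YouTube
# removes blank lines and timecode lines; preserves the caption line breaks
import os
import re

# .sbv timecode lines look like 0:00:01.000,0:00:04.000 (H:MM:SS.mmm,H:MM:SS.mmm);
# matching the whole pattern also catches timecodes after the one-hour mark
TIMECODE = re.compile(r'^\d+:\d\d:\d\d\.\d+,\d+:\d\d:\d\d\.\d+\s*$')

filename = input('What is the filename? (include .sbv) ')
with open(filename, encoding='utf-8') as myfile:
    mylist = myfile.readlines()

# a typical caption is 3 lines (timecode, text, blank),
# so the cleaned file is roughly one third as long
length = len(mylist)
new_length = length // 3
print('The original file is ' + str(length) + ' lines.')
print('The new file should be about ' + str(new_length) + ' lines.')

# new file with same filename but with '.txt' extension
new_filename = os.path.splitext(filename)[0] + '.txt'
with open(new_filename, 'w', encoding='utf-8') as newfile:
    for item in mylist:
        # keep only caption text: skip blank lines and timecode lines
        if item.strip() and not TIMECODE.match(item):
            newfile.write(item)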
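For reuse or testing, the same filtering can be wrapped in a function that works on a string instead of a file. This is a minimal sketch, not part of the original gist: clean_sbv and TIMECODE are illustrative names, and the pattern assumes YouTube's usual H:MM:SS.mmm,H:MM:SS.mmm timecode lines. Matching the full timecode, rather than only checking for a leading "0:", also removes timecodes past the one-hour mark.

```python
import re

# Assumed .sbv timecode layout, e.g. "0:00:01.000,0:00:04.000"
TIMECODE = re.compile(r"^\d+:\d\d:\d\d\.\d+,\d+:\d\d:\d\d\.\d+\s*$")


def clean_sbv(text: str) -> str:
    """Return only the caption text: timecode lines and blank lines are dropped."""
    kept = [
        line
        for line in text.splitlines()
        if line.strip() and not TIMECODE.match(line)
    ]
    # Keep one caption line per output line, ending with a newline if anything is left
    return "\n".join(kept) + ("\n" if kept else "")


sample = """0:00:00.599,0:00:04.160
Welcome to the course.

1:02:03.000,1:02:05.500
This caption starts after the one-hour mark.
"""

print(clean_sbv(sample), end="")
```

Multi-line captions keep their line breaks, since each text line is kept as its own line.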
macloo commented May 26, 2018

How to download a transcript from YouTube (your own video)

  1. Go to YouTube
  2. Go to "My Channel"
  3. Click the "Creator Studio" button
  4. Open the menu beside "Edit" on an individual video and select "Subtitles/CC"
  5. If prompted to set the language, do so (e.g., English)
  6. Under "Published," select "English (Automatic)" (these are the subtitles YouTube generates automatically)
  7. Open "Actions" and choose a file type (such as .sbv)
  8. The file downloads automatically

.sbv files are the best choice for editing, e.g., to rewrite the script entirely. They are plain text.
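Because .sbv files are plain text, a script can also be rewritten programmatically while keeping its timing. This is a minimal sketch, assuming each cue is one timecode line followed by one or more caption lines, with a blank line between cues; Cue, parse_sbv and to_sbv are hypothetical helpers, not part of this gist. It splits a transcript into cues, lets you replace the text, and writes .sbv back out with the original timecodes.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: str  # e.g. "0:00:00.599"
    end: str    # e.g. "0:00:04.160"
    text: str   # caption text; multi-line captions keep their line breaks


def parse_sbv(text: str) -> List[Cue]:
    """Split .sbv content into cues, assuming blank lines separate them."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    # One or more blank (or whitespace-only) lines end a cue
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().split("\n")
        if not lines[0]:
            continue  # empty input
        start, _, end = lines[0].partition(",")
        cues.append(Cue(start.strip(), end.strip(), "\n".join(lines[1:])))
    return cues


def to_sbv(cues: List[Cue]) -> str:
    """Write cues back out in the same timecode/text/blank-line layout."""
    return "\n\n".join(f"{c.start},{c.end}\n{c.text}" for c in cues) + "\n"


sample = (
    "0:00:00.599,0:00:04.160\nWelcome to the course.\n\n"
    "0:00:04.160,0:00:07.000\nFirst line\nsecond line\n"
)

cues = parse_sbv(sample)
# Rewrite the first caption but keep its timing
cues[0] = cues[0]._replace(text="Hello, and welcome.")
print(to_sbv(cues), end="")
```

Keeping start and end as strings avoids any rounding when the timecodes are written back.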