@zjuyk
Created August 17, 2021 21:24
Extract the dialogue lines from an SRT subtitle file into a txt file
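import re

# The input is expected to be UTF-8. Pass the encoding explicitly:
# without it, open() falls back to the platform's locale encoding.
with open("input.srt", encoding="utf-8") as file:
    lines = file.readlines()

kept = []
for line in lines:
    # Skip cue numbers, timestamp lines, and blank lines; keep the rest.
    if re.search(r'^[0-9]+$', line) is None and \
            re.search(r'^[0-9]{2}:[0-9]{2}:[0-9]{2}', line) is None and \
            re.search(r'^$', line) is None:
        kept.append(line.rstrip('\n'))
text = '\n'.join(kept).lstrip()

with open("output.txt", 'w', encoding="utf-8") as text_file:
    text_file.write(text)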
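
The script above filters line by line, so a dialogue line made up only of digits (for example "1984") is dropped as if it were a cue number, and a byte-order mark at the start of the file can let the first cue number slip through. A minimal sketch of one alternative, not the gist author's method: split the file into cues on blank lines and keep only what follows each timing line. The name `extract_dialogue`, the `TIMING` pattern, and the sample text are all hypothetical, and the sketch assumes well-formed cues (text outside a cue is discarded).

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def extract_dialogue(srt_text):
    """Return the dialogue lines of an SRT document, one per line.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Drop a leading byte-order mark and normalize Windows line endings.
    srt_text = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    dialogue = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        block_lines = block.split("\n")
        # Find the timing line; everything after it in the cue is dialogue.
        for i, line in enumerate(block_lines):
            if TIMING.match(line.strip()):
                dialogue.extend(
                    l.strip() for l in block_lines[i + 1:] if l.strip()
                )
                break
    return "\n".join(dialogue)


# Made-up sample input: the first cue's dialogue is purely numeric.
sample = (
    "1\n"
    "00:00:01,000 --> 00:00:04,000\n"
    "1984\n"
    "\n"
    "2\n"
    "00:00:05,000 --> 00:00:08,000\n"
    "It was a bright cold day in April,\n"
    "and the clocks were striking thirteen.\n"
)
print(extract_dialogue(sample))
```

To use it on a file, read `input.srt` with `encoding="utf-8"`, pass the contents to `extract_dialogue`, and write the result to `output.txt`.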