Created August 17, 2021 21:24
Gist: zjuyk/3842b9ca7565dbc5a344c35feb1aba27
Extract the dialogue lines from an .srt subtitle file into a .txt file.
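import re

# Read the subtitles as UTF-8. open() uses the locale's encoding by default,
# which is not UTF-8 on every platform, so pass it explicitly; "utf-8-sig"
# also drops a leading byte-order mark if the file has one.
with open("input.srt", encoding="utf-8-sig") as file:
    lines = file.readlines()

text = ""
for line in lines:
    # Keep only dialogue: skip cue numbers, timestamp lines and blank separators.
    if (re.search(r'^[0-9]+$', line) is None and
            re.search(r'^[0-9]{2}:[0-9]{2}:[0-9]{2}', line) is None and
            re.search(r'^$', line) is None):
        text += '\n' + line.rstrip('\n')
# Remove the newline (and any other whitespace) at the very start.
text = text.lstrip()

with open("output.txt", 'w', encoding="utf-8") as text_file:
    text_file.write(text)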
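The script above recognises non-dialogue lines by pattern, so a subtitle line that consists only of digits (e.g. "2049") or starts with a time such as "10:30:00" is dropped too. A minimal alternative sketch, assuming a well-formed SRT file (cues separated by blank lines, each beginning with an index line and a timing line containing "-->"), removes those two lines by position instead. The function name `extract_dialogue` and the sample text are hypothetical, not part of the original gist.

```python
import re


def extract_dialogue(srt_text: str) -> str:
    """Return the text lines of every cue, joined with newlines.

    Hypothetical helper; assumes a well-formed SRT file where each cue is
    an index line, a timing line ("... --> ..."), then one or more text lines,
    and cues are separated by blank lines.
    """
    # Drop a leading byte-order mark and normalise Windows/old Mac line endings.
    normalized = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    # Split into cues on blank (or whitespace-only) separator lines.
    cues = re.split(r"\n\s*\n", normalized.strip())
    dialogue = []
    for cue in cues:
        cue_lines = cue.split("\n")
        # Skip the index and timing lines by position, so a text line such as
        # "2049" is kept; cues without a timing line are ignored.
        if len(cue_lines) >= 2 and "-->" in cue_lines[1]:
            dialogue.extend(cue_lines[2:])
    return "\n".join(dialogue)


sample = """1
00:00:01,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:04,000
2049
"""
print(extract_dialogue(sample))
```

Here both "Hello there." and "2049" are printed, whereas the pattern-based filter would discard "2049". To process files, read input.srt with `encoding="utf-8-sig"`, pass its contents to the function, and write the result to output.txt. The trade-off is stricter reliance on the format: if the blank line between two cues is missing, the second cue's index and timing lines end up in the output.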