@dvdbng
Created November 15, 2019 18:56
Convert Chinese subtitles to Chinese + pinyin (the output looks like this: 最zuì高gāo法fǎ院yuàn在zài今jīn天tiān早zǎo上shàng)
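The core idea is to find each run of Han characters with a regular expression and append every character's syllable directly after it. The sketch below shows that step using only the standard library. It is a sketch under stated assumptions: `SAMPLE_READINGS` is a hypothetical hand-written table standing in for `pinyin.get` and covers only the characters in the example, and `annotate`/`interleave` are illustrative names, not part of the script.

```python
import re

# Hypothetical lookup table standing in for the `pinyin` package;
# it only covers the characters used in this example.
SAMPLE_READINGS = {"最": "zuì", "高": "gāo", "法": "fǎ", "院": "yuàn"}

# Same CJK Unified Ideographs range (U+4E00..U+9FFF) the script matches.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def annotate(run: str) -> str:
    """Append each character's reading directly after the character."""
    return "".join(ch + SAMPLE_READINGS.get(ch, "") for ch in run)


def interleave(text: str) -> str:
    # Only Han runs are rewritten; punctuation, digits, Latin text
    # and subtitle markup such as <i>...</i> pass through unchanged.
    return HAN_RUN.sub(lambda m: annotate(m.group(0)), text)


print(interleave("最高法院 (2019)"))  # prints 最zuì高gāo法fǎ院yuàn (2019)
```

Because the replacement is a function of each match, text outside the Han runs never reaches the lookup, which is why the real script can run over whole subtitle lines safely.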
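#!/usr/bin/python3
import sys
import re

import pysrt
import pinyin


def to_pinyin(text):
    # Request space-separated syllables so each one can be paired with its character.
    pinyins = pinyin.get(text, delimiter=" ").split(' ')
    assert len(text) == len(pinyins)
    # Interleave: every character is followed directly by its syllable.
    return "".join(f"{char}{syllable}" for char, syllable in zip(text, pinyins))


def process_text(text):
    # Annotate only runs of CJK Unified Ideographs (U+4E00..U+9FFF);
    # punctuation, Latin text and markup are left untouched.
    return re.sub(r'[\u4e00-\u9fff]+', lambda m: to_pinyin(m.group(0)), text)


def pinyin_subtitles(filename):
    srt = pysrt.open(filename)
    for item in srt:
        item.text = process_text(item.text)
    # Write next to the input: movie.srt -> movie.pinyin.srt
    srt.save(filename[:-4] + '.pinyin.srt', encoding="utf-8")


def usage():
    print('usage: ./pinyin-subtitles file.srt', file=sys.stderr)


def main():
    if len(sys.argv) != 2 or not sys.argv[1].endswith('.srt'):
        usage()
        sys.exit(1)
    else:
        pinyin_subtitles(sys.argv[1])


if __name__ == '__main__':
    main()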
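The script derives the output name by slicing off the last four characters (`filename[:-4] + '.pinyin.srt'`), which relies on the earlier `.endswith('.srt')` check. A hedged alternative sketch with `pathlib` is shown below; `output_path` is a hypothetical helper, and accepting `.SRT` in any letter case is an assumption that differs from the original case-sensitive check.

```python
from pathlib import Path


def output_path(filename: str) -> Path:
    """Map movie.srt -> movie.pinyin.srt, like the script's slicing does."""
    path = Path(filename)
    # Assumption: accept the extension in any case (the script only accepts ".srt").
    if path.suffix.lower() != ".srt":
        raise ValueError(f"expected an .srt file, got {filename!r}")
    # with_suffix replaces only the final suffix, so directories are preserved.
    return path.with_suffix(".pinyin.srt")


print(output_path("subs/movie.srt"))
```

Raising `ValueError` instead of printing the usage text lets a caller decide how to report the problem; a CLI wrapper could catch it and exit with a non-zero status.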