@Elfsong
Last active January 7, 2019 16:32
separate_sentence #python #NLP
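import re

def cut_sent(para):
    """Split a Chinese paragraph into a list of sentences."""
    para = para.replace('\u3000', '')  # drop ideographic (full-width) spaces
    # Break after a single-character terminator (full-width or ASCII) unless a closing quote follows.
    para = re.sub(r'([。！？!?])([^”’])', r'\1\n\2', para)
    # Break after a six-dot ASCII ellipsis.
    para = re.sub(r'(\.{6})([^”’])', r'\1\n\2', para)
    # Break after a two-character Chinese ellipsis (……).
    para = re.sub(r'(…{2})([^”’])', r'\1\n\2', para)
    # A closing quote right after a terminator ends the sentence, so break after the quote.
    para = re.sub(r'([。！？!?][”’])([^，,。！？!?])', r'\1\n\2', para)
    para = para.rstrip()
    return [sentence for sentence in para.split('\n') if sentence]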
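A minimal usage sketch, with the function repeated so the snippet runs on its own. The sample text is made up for illustration, and matching both the full-width ！ ？ ， and their ASCII forms in the character classes is an assumption about the intended punctuation set, not something the gist states.

```python
import re

def cut_sent(para):
    """Split a paragraph into sentences on Chinese/ASCII terminators."""
    para = para.replace('\u3000', '')  # drop ideographic spaces
    # ASSUMPTION: full-width ！？ are matched alongside ASCII !?
    para = re.sub(r'([。！？!?])([^”’])', r'\1\n\2', para)   # single-character terminators
    para = re.sub(r'(\.{6})([^”’])', r'\1\n\2', para)        # six-dot ASCII ellipsis
    para = re.sub(r'(…{2})([^”’])', r'\1\n\2', para)         # two-character Chinese ellipsis
    # Terminator followed by a closing quote: break after the quote.
    para = re.sub(r'([。！？!?][”’])([^，,。！？!?])', r'\1\n\2', para)
    para = para.rstrip()
    return [sentence for sentence in para.split('\n') if sentence]

# Hypothetical sample: a quoted exclamation, an ellipsis, and a question.
sample = '他说：“你好！”然后走了。今天天气不错……我们出去吧？好。'
result = cut_sent(sample)
for sentence in result:
    print(sentence)
```

The quoted exclamation stays attached to its closing quote because the first substitution refuses to break when a quote follows the terminator, and the last substitution then breaks after the quote instead.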