@manliu1225
Created July 19, 2017 07:41
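def cut_sentence_new(words):
    """Split text into sentences, cutting after each run of end punctuation."""
    # Accept UTF-8 bytes as well as already-decoded text.
    if isinstance(words, bytes):
        words = words.decode('utf8')
    # Sentence-ending punctuation marks.
    punt_list = set(u'.!?;~。!?;~')
    sents = []
    start = 0
    for i, word in enumerate(words):
        # Look ahead one character; at the end of the text, reuse the last one.
        token = words[min(i + 1, len(words) - 1)]
        # Cut only after the last mark of a run such as "?!" or "...".
        if word in punt_list and token not in punt_list:
            sents.append(words[start:i + 1])
            start = i + 1
    # Keep whatever follows the final cut, including trailing punctuation.
    if start < len(words):
        sents.append(words[start:])
    return sents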
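
The same splitting rule (cut after a punctuation mark only when the next character is not a punctuation mark) can also be sketched with a regular expression. This is an illustrative alternative, not the gist author's implementation; the names `PUNCT` and `split_sentences` are hypothetical, and the punctuation set is copied from `punt_list` above.

```python
import re

# Hypothetical names for illustration; the set mirrors punt_list in the gist.
PUNCT = u'.!?;~。!?;~'
_MARK = '[' + re.escape(PUNCT) + ']'
_NOT_MARK = '[^' + re.escape(PUNCT) + ']'

# A sentence is the shortest text ending in a mark that is followed by a
# non-mark; failing that, it is everything left in the input.
_SENTENCE = re.compile('.*?' + _MARK + '(?=' + _NOT_MARK + ')|.+', re.DOTALL)


def split_sentences(text):
    """Return the sentences of text, keeping punctuation and whitespace."""
    if isinstance(text, bytes):
        text = text.decode('utf8')
    return _SENTENCE.findall(text)


print(split_sentences(u'Hi?! Ok.'))  # ['Hi?!', ' Ok.']
```

Like the loop version, this keeps leading whitespace on each sentence and leaves trailing punctuation attached to the final piece.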