@kanekomasahiro
Last active March 28, 2021 06:45
Splitting English text into words with regular expressions
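import regex as re  # third-party `regex` package (supports \p{L}, \p{N}); not the stdlib `re`

# Same pattern as GPT-2's pre-tokenizer: contractions, then runs of letters, digits,
# or other symbols (each optionally with one leading space), then whitespace.
PAT = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+")

def split_sentence_to_words(sent):
    return PAT.findall(sent)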
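If installing `regex` is not an option, the pattern can be approximated with the standard-library `re`, which has no `\p{...}` classes. The sketch below is such an approximation, not the gist's own code: it assumes `[^\W\d_]` is close enough to `\p{L}` and `\d` to `\p{N}` (the latter is narrower, e.g. it misses superscript digits). For plain ASCII input like the examples here, both versions should produce the same tokens.

```python
import re

# Stdlib approximation of the regex-module pattern (assumption: adequate for
# typical English text; not identical for every Unicode character).
LETTERS = r"[^\W\d_]"     # word chars minus digits and underscore, ~ \p{L}
DIGITS = r"\d"            # Unicode decimal digits, ~ \p{N} (narrower)
OTHER = r"(?:[^\s\w]|_)"  # not space, letter, or digit, ~ [^\s\p{L}\p{N}]

PAT = re.compile(
    rf"'s|'t|'re|'ve|'m|'ll|'d| ?{LETTERS}+| ?{DIGITS}+| ?{OTHER}+|\s+(?!\S)|\s+"
)

def split_sentence_to_words(sent):
    return PAT.findall(sent)

print(split_sentence_to_words("I'm here, it's 2021!"))
# ['I', "'m", ' here', ',', ' it', "'s", ' 2021', '!']
```

Spaces are kept inside the tokens rather than discarded, and `\s+(?!\S)` leaves the last space of a whitespace run to attach to the following word (`"a  b"` becomes `['a', ' ', ' b']`). Because every character falls into one of the alternatives, `"".join(tokens)` reproduces the input exactly. Note that the contraction alternatives are lowercase only, so `"IT'S"` splits the apostrophe off as a symbol.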