@SerhatTeker
Last active February 27, 2020 18:29
Word Count Regex - Python
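import re
import collections


def count_words(sentence):
    # A word is a run of letters/digits, optionally followed by one
    # apostrophe-joined suffix (e.g. "don't").
    word_list = re.findall(r"[\da-zA-Z]+(?:\'[\da-zA-Z]+)?", sentence.lower())
    return collections.Counter(word_list)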


# Alternative
#------------------------------------------------------------------------------
def unquoted(word):
    # Remove the surrounding quotes from a quoted word: "'large'" -> "large".
    # str.strip keeps inner apostrophes, so "'don't'" -> "don't".
    if word.startswith("'") and word.endswith("'"):
        return word.strip("'")
    return word


def count_words_alternative(sentence):
    # regex is kept simpler to improve readability
    words = re.findall("[A-Za-z']+|[0-9]+", sentence.lower())
    # Tokens made only of quotes become empty after unquoting; drop them.
    return collections.Counter(filter(None, map(unquoted, words)))
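
Usage sketch: the block below restates both counters so it runs on its own, then compares them on made-up input. The sample sentences are illustrative, not from the gist, and the quote stripping via `str.strip` is an assumption about what `unquoted` is meant to do. The two versions agree on ordinary text but tokenize mixed letter/digit words differently.

```python
import collections
import re


def count_words(sentence):
    # Same pattern as the first version: letters and digits form one word.
    word_list = re.findall(r"[\da-zA-Z]+(?:\'[\da-zA-Z]+)?", sentence.lower())
    return collections.Counter(word_list)


def count_words_alternative(sentence):
    # Same pattern as the alternative: letters/apostrophes OR digits.
    # Quote stripping with str.strip is an assumption about unquoted's intent.
    words = re.findall(r"[A-Za-z']+|[0-9]+", sentence.lower())
    return collections.Counter(filter(None, (w.strip("'") for w in words)))


sample = "Joe can't tell between 'large' and large, 1 or 2."
first = count_words(sample)
alternative = count_words_alternative(sample)

# Both versions count the quoted and unquoted "large" as the same word.
print(first["large"], alternative["large"])

# They differ on mixed tokens: "mp3" stays whole in the first version,
# but is split into "mp" and "3" by the alternative.
print(sorted(count_words("mp3 player")))
print(sorted(count_words_alternative("mp3 player")))
```

Which behaviour is preferable depends on whether tokens such as "mp3" should count as one word; the first version keeps them together.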