@kharandziuk
Created August 19, 2020 20:49
words count

Note: I didn't have enough time to finish the "output" part so I am just printing a counter.

  • will determine what our deliverables are here (a script, a library, or a service) and provide an interface that matches the requirements (should the script read a file, or should the library work with a string?).
  • will try to think about the edge cases. For example, I treat '2d' as a word, but it is not a word in your original example. Another possible edge case is a huge file. I say "possible" because performance and input size are sometimes not an issue. Test it with different sets of data (ideally with automated tests).
  • check that the code satisfies the "code" specific to the company/team (naming conventions, style, documentation, etc.).
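import re
import sys
from collections import Counter

def is_proper_word(word):
    # Reject a lone hyphen and purely numeric tokens such as "42".
    return word != '-' and not word.isdigit()

def get_words(text):
    # A word is a run of word characters, hyphens, and apostrophes.
    words = re.findall(r"[\w\-']+", text)
    return [word.lower() for word in words if is_proper_word(word)]

# Assumption: the gist does not show where `text` comes from; read it from stdin.
text = sys.stdin.read()
counter = Counter(get_words(text))
print(counter)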
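
The "huge file" edge case and the library-style interface from the notes above could be sketched roughly as follows. This is only an illustration, not part of the original gist: `count_words` and `WORD_RE` are hypothetical names, and the sketch assumes the input arrives as an iterable of lines (for example an open file), so the whole text never has to be held in memory. Because the pattern never matches a newline, counting line by line should give the same result as counting the whole text at once.

```python
import io
import re
from collections import Counter

# Same pattern as the gist, compiled once because it is reused for every line.
WORD_RE = re.compile(r"[\w\-']+")


def is_proper_word(word):
    # Same filter as the gist: drop a lone hyphen and purely numeric tokens.
    return word != '-' and not word.isdigit()


def count_words(lines):
    """Count words from any iterable of strings (hypothetical helper)."""
    counter = Counter()
    for line in lines:
        counter.update(
            word.lower()
            for word in WORD_RE.findall(line)
            if is_proper_word(word)
        )
    return counter


# A file object is an iterable of lines; io.StringIO stands in for one here.
sample = io.StringIO("The 2d plot - the well-known one\nIt's 42 degrees\n")
counts = count_words(sample)
print(counts.most_common(3))
```

With a real file, the call would look like `count_words(open(path, encoding="utf-8"))`, where `path` is whatever the requirements specify. Note that '2d' is still counted as a word here, which is the edge case flagged in the notes.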