Created May 22, 2020 16:08
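# Compute the frequency of each adjacent pair of symbols (characters or character sequences).
# Accepts a corpus mapping space-separated words to their counts and returns the frequency of each pair.
import collections

def get_stats(corpus):
    pairs = collections.defaultdict(int)
    for word, freq in corpus.items():
        # each word is stored as space-separated symbols, e.g. "l o w"
        symbols = word.split()
        for i in range(len(symbols) - 1):
            # weight every adjacent pair by the frequency of the word it occurs in
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs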
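
A small usage sketch may help show what `get_stats` returns. The toy corpus, the `</w>` end-of-word marker, and the `best_pair` variable are assumptions for illustration and are not part of the original gist. Picking the most frequent pair is how such counts are typically used in a byte-pair-encoding style merge step, which is also an assumption about the intended use here.

```python
import collections

def get_stats(corpus):
    # same function as above, repeated so this example runs on its own
    pairs = collections.defaultdict(int)
    for word, freq in corpus.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

# Hypothetical toy corpus: space-separated symbols mapped to word counts.
corpus = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
}

pairs = get_stats(corpus)

# The pair with the highest count would be the next candidate to merge.
best_pair = max(pairs, key=pairs.get)
print(best_pair, pairs[best_pair])
```

Keys of the returned dictionary are tuples of two adjacent symbols, so `pairs["l", "o"]` and `pairs[("l", "o")]` refer to the same entry.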