@aravindpai
Last active May 22, 2020 19:25
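# Assumes corpus, vocab, merges, get_stats and merge_vocab are defined earlier.
num_merges = 10
for i in range(num_merges):
    # compute the frequency of bigrams in the corpus
    pairs = get_stats(corpus)
    # stop early once no adjacent pairs remain (max() fails on an empty dict)
    if not pairs:
        break
    # pick the most frequent pair
    best = max(pairs, key=pairs.get)
    # merge the most frequent pair in the corpus
    corpus = merge_vocab(best, corpus)
    # append to the merge list and the vocabulary
    merges.append(best)
    vocab.append(best)

# convert each merged pair (a tuple) to a string
merges_in_string = ["".join(pair) for pair in merges]
print("BPE Merge Operations:", merges_in_string)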
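
The loop above relies on `get_stats` and `merge_vocab`, which this gist does not show. The following is a minimal sketch of what they might look like. It assumes the corpus is a dict mapping space-separated symbol sequences to word frequencies, as in the commonly circulated BPE reference code. The function bodies, the toy corpus, and the `</w>` end-of-word marker are illustrative assumptions, not the author's definitions.

```python
import re
from collections import Counter


def get_stats(corpus):
    """Count adjacent symbol pairs, weighted by word frequency (assumed helper)."""
    pairs = Counter()
    for word, freq in corpus.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_vocab(pair, corpus):
    """Join every whole-symbol occurrence of `pair` into one symbol (assumed helper)."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds ensure we match complete symbols, not parts of longer ones.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in corpus.items()}


# Hypothetical toy corpus: word (split into characters) -> frequency.
corpus = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}

merges = []
for _ in range(3):
    pairs = get_stats(corpus)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    corpus = merge_vocab(best, corpus)
    merges.append(best)

print("BPE Merge Operations:", ["".join(pair) for pair in merges])
```

With this toy corpus, ties between equally frequent pairs are broken by the order in which `get_stats` first encounters them, so a different corpus ordering could yield a different merge sequence.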