Last active May 22, 2020 19:25
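num_merges = 10
# merge operations learned so far, in order (assumed to start empty)
merges = []
for i in range(num_merges):
    # compute the frequency of bigrams in the corpus
    pairs = get_stats(corpus)
    # stop early if there are no pairs left to merge
    if not pairs:
        break
    # pick the most frequent pair
    best = max(pairs, key=pairs.get)
    # merge the most frequent pair in the corpus
    corpus = merge_vocab(best, corpus)
    # append to the merge list (extend the vocabulary here too, if one is tracked separately)
    merges.append(best)

# convert each merged pair (a tuple) to a string
merges_in_string = ["".join(pair) for pair in merges]
print("BPE Merge Operations:", merges_in_string)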
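
The loop above calls `get_stats` and `merge_vocab`, which this snippet does not define. The following is a minimal sketch of what they might look like, assuming the corpus is a dict that maps each word (written as space-separated symbols) to its frequency. That representation, the toy corpus, and both function bodies are assumptions made for illustration, not the author's actual helpers.

```python
import collections
import re


def get_stats(corpus):
    """Count adjacent symbol pairs, weighted by word frequency.

    Assumes `corpus` maps space-separated symbol strings to counts.
    """
    pairs = collections.defaultdict(int)
    for word, freq in corpus.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_vocab(pair, corpus):
    """Return a new corpus in which every occurrence of `pair` is fused."""
    merged = "".join(pair)
    # Match the two symbols only when they are whole, whitespace-delimited symbols.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda m: merged, word): freq
            for word, freq in corpus.items()}


# Toy corpus, made up for this example; "</w>" marks the end of a word.
corpus = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}

pairs = get_stats(corpus)
new_corpus = merge_vocab(("e", "s"), corpus)
```

With this toy corpus, the pair `("e", "s")` occurs 9 times (6 in "newest" and 3 in "widest"), and merging it turns `"n e w e s t </w>"` into `"n e w es t </w>"`.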