@aravindpai
Last active May 22, 2020 14:16
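# Initialize the vocabulary with the unique characters in the corpus
# (`corpus` is assumed to be a list of word strings defined earlier)
vocab = list(set(" ".join(corpus)) - {' '})  # drop the space used as the join separator
# Split each word into space-separated characters
corpus = [" ".join(token) for token in corpus]
# Append the end-of-word marker </w>
corpus = [token + ' </w>' for token in corpus]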
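
The snippet above relies on a `corpus` variable defined elsewhere. As a minimal sketch of what the three steps produce, the example below wraps them in a helper and runs it on a toy word list. Both the function name `prepare_corpus` and the sample words are assumptions made for illustration; they are not part of the original gist.

```python
def prepare_corpus(corpus):
    """Return (vocab, words) for a list of word strings.

    vocab: sorted list of the unique characters found in the words.
    words: each word split into space-separated characters, followed by </w>.
    """
    # Unique characters across all words (sorted only to make the output stable).
    vocab = sorted(set("".join(corpus)))
    # Split each word into characters and append the end-of-word marker.
    words = [" ".join(token) + " </w>" for token in corpus]
    return vocab, words


# Hypothetical toy corpus, used only to show the shape of the result.
vocab, words = prepare_corpus(["low", "lower", "newest"])
print(vocab)  # ['e', 'l', 'n', 'o', 'r', 's', 't', 'w']
print(words)  # ['l o w </w>', 'l o w e r </w>', 'n e w e s t </w>']
```

Sorting the vocabulary is a choice made here for reproducible output; the original snippet keeps it as an unordered list.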