@Lexie88rus
Last active August 30, 2019 16:04
Create a vocabulary for kernel titles
def create_vocabulary(titles):
    '''
    Create a vocabulary (a word -> index mapping) from a list of titles.
    '''
    vocab = set()
    for title in titles:
        # skip titles that are empty after cleaning
        if clean_title(title) != '':
            words = extract_words(title)
            vocab.update(words)
    word_list = list(vocab)
    # the end-of-sentence token takes the last index
    word_list.append(end_of_sentence)
    vocabulary = {word: n for n, word in enumerate(word_list)}
    return vocabulary

# create vocabulary out of popular kernel titles
vocab = create_vocabulary(popular_kernel_titles)
vocab_size = len(vocab)
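
The snippet above depends on `clean_title`, `extract_words`, `end_of_sentence`, and `popular_kernel_titles`, none of which are defined in this gist. The sketch below is one way to run it end to end on toy data. The helper bodies, the `'.'` token value, and the sample titles are all assumptions made for illustration, not the author's definitions. It also sorts the words before indexing, which the original does not do, so that the resulting indices are reproducible.

```python
import re

# Hypothetical stand-in: the gist does not show the actual token value.
end_of_sentence = '.'

def clean_title(title):
    # Assumed behavior: lowercase the title and keep only letters and spaces.
    return re.sub(r'[^a-z ]', '', str(title).lower()).strip()

def extract_words(title):
    # Assumed behavior: split the cleaned title on whitespace.
    return clean_title(title).split()

def create_vocabulary(titles):
    vocab = set()
    for title in titles:
        if clean_title(title) != '':
            vocab.update(extract_words(title))
    # Sorting is an addition in this sketch; it makes the indices deterministic.
    word_list = sorted(vocab)
    word_list.append(end_of_sentence)
    return {word: n for n, word in enumerate(word_list)}

# Toy data standing in for popular_kernel_titles.
sample_titles = ['Titanic EDA', 'EDA and Modeling', '123']
sample_vocab = create_vocabulary(sample_titles)

# Reverse mapping, useful for turning predicted indices back into words.
index_to_word = {n: word for word, n in sample_vocab.items()}

# Encode one title as a sequence of indices ending with the end-of-sentence token.
encoded = [sample_vocab[w] for w in extract_words('Titanic EDA')]
encoded.append(sample_vocab[end_of_sentence])
print(sample_vocab)
print(encoded)
```

The title `'123'` is empty after cleaning under these assumed helpers, so it contributes nothing to the vocabulary, which mirrors the `clean_title(title) != ''` check in the original function.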