Working with OOV Tokens
from tensorflow.keras.preprocessing.text import Tokenizer

# Let's add custom sentences
sentences = [
    "Apples are red",
    "Apples are round",
    "Oranges are round",
    "Grapes are green"
]
# Tokenize the sentences; out-of-vocabulary (OOV) words will map to the
# reserved token "<some-word>", which Keras assigns index 1
myTokenizer = Tokenizer(num_words=100, oov_token="<some-word>")
myTokenizer.fit_on_texts(sentences)
print(myTokenizer.word_index)
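# Expected word index (words ordered by frequency, OOV token first;
# tie-breaking among equal counts may vary slightly across Keras versions):
# {'<some-word>': 1, 'are': 2, 'apples': 3, 'round': 4, 'red': 5,
#  'oranges': 6, 'grapes': 7, 'green': 8}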
# Words unseen during fitting ("sour", "but", "sweet") fall back to the OOV token
test_data = [
    "Grapes are sour but oranges are sweet",
]
test_seq = myTokenizer.texts_to_sequences(test_data)
print("\nTest Sequence = ", test_seq, " => ",
      [x for x in myTokenizer.sequences_to_texts_generator(test_seq)])
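For contrast, here is a minimal sketch of the same test run without an oov_token (the name plainTokenizer is just for illustration): unseen words are then silently dropped from the encoded sequence instead of being mapped to a placeholder, which also shifts the word indices because no slot is reserved for OOV.

# Without oov_token, unseen words are simply skipped during encoding
plainTokenizer = Tokenizer(num_words=100)
plainTokenizer.fit_on_texts(sentences)
plain_seq = plainTokenizer.texts_to_sequences(test_data)
# "sour", "but", "sweet" disappear entirely, leaving [[6, 1, 5, 1]],
# which decodes back to just 'grapes are oranges are'
print("\nNo-OOV Sequence = ", plain_seq)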