Tokenizing features
# Tokenize the text features into integer sequences
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
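# NOTE: `features`, `train_features`, and `test_features` are assumed to be
# defined earlier in the original notebook. The toy lists below are
# hypothetical placeholders so this snippet runs on its own.
features = ['the cat sat on the mat', 'dogs chase cats']
train_features = features[:1]
test_features = features[1:]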
# Tokenize feature data
vocab_size = 6000
oov_tok = '<>'
feature_tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
feature_tokenizer.fit_on_texts(features)
feature_index = feature_tokenizer.word_index
print(feature_index)  # word_index is already a dict; no conversion needed
# Convert train and test features to integer sequences
train_feature_sequences = feature_tokenizer.texts_to_sequences(train_features)
test_feature_sequences = feature_tokenizer.texts_to_sequences(test_features)
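
The pad_sequences import above is not used in this snippet. As a minimal sketch of the typical next step, the sequences can be padded to a uniform length before being fed to a model; `max_length` here is a hypothetical value, not from the original gist:

# Pad/truncate every sequence to the same length so they can be batched
max_length = 100  # hypothetical value chosen for illustration
train_padded = pad_sequences(train_feature_sequences, maxlen=max_length,
                             padding='post', truncating='post')
test_padded = pad_sequences(test_feature_sequences, maxlen=max_length,
                            padding='post', truncating='post')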