@sid321axn
Created August 7, 2020 03:03
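# Assumed import path (TensorFlow 2.x bundled Keras); adjust if you use standalone Keras.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# MAX_VOCAB_SIZE, MAX_SEQUENCE_LENGTH and the DataFrame df are expected to be defined earlier.
# convert the sentences (strings) into sequences of integer word indices
tokenizer = Tokenizer(num_words=MAX_VOCAB_SIZE)
tokenizer.fit_on_texts(list(df['title']))
X = tokenizer.texts_to_sequences(list(df['title']))
# pad sequences so that we get an N x T matrix (N titles, T = MAX_SEQUENCE_LENGTH)
X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of data tensor:', X.shape)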
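
To make the two steps above concrete, here is a minimal pure-Python sketch of what tokenizing and padding do to a few titles. It is an illustration only, not the Keras implementation: the helper names (`fit_word_index`, `to_sequences`, `pad_pre`) and the sample titles are hypothetical, and it splits on whitespace instead of applying Keras's punctuation filters. It does follow the Keras defaults of numbering words from 1 by descending frequency, dropping words whose index is not below `num_words`, and padding and truncating at the front with zeros.

```python
from collections import Counter


def fit_word_index(texts):
    """Map each word to an integer, most frequent word first (index 1).

    Index 0 is left unused so it can serve as the padding value.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so ties keep their first-seen order.
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    return {word: i for i, (word, _) in enumerate(ordered, start=1)}


def to_sequences(texts, word_index, num_words=None):
    """Replace words by their indices; skip unknown or out-of-vocabulary words."""
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            idx = word_index.get(word)
            if idx is not None and (num_words is None or idx < num_words):
                seq.append(idx)
        sequences.append(seq)
    return sequences


def pad_pre(sequences, maxlen):
    """Left-pad with zeros, or keep only the last maxlen items (maxlen >= 1)."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


# Hypothetical sample data standing in for df['title'].
titles = ["breaking news today", "news update", "today the news is big news"]
word_index = fit_word_index(titles)
sequences = to_sequences(titles, word_index, num_words=5)
X_demo = pad_pre(sequences, maxlen=4)
print(X_demo)  # [[0, 3, 1, 2], [0, 0, 1, 4], [0, 2, 1, 1]]
```

Every row has the same length, so the result can be treated as an N x T matrix, here 3 x 4. The rare words "the", "is" and "big" disappear from the third title because their indices fall outside the 5-word vocabulary limit.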