@anujdutt9
Created April 13, 2017 17:35
# Step 1: Training Data and Preprocessing
# a) Tokenize the Input text (sentence to words)
# b) Form the Vocabulary and remove Infrequent words
# c) Add "Start" and "End" Tokens to the sentences
# Vocabulary Size: 8000 words
vocab_size = 8000
# Token to replace the infrequent words
unknown_token = 'Unknown_Token'
# Sentence start and end tokens
sentence_start_token = 'Sentence_Start'
sentence_end_token = 'Sentence_End'
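
# A minimal sketch of how steps a) to c) might be wired together, using only
# the standard library. The helper names `tokenize` and `preprocess` are
# hypothetical and not part of the original gist; the regex tokenizer is a
# stand-in for whatever tokenizer the full pipeline uses. It assumes the
# start/end tokens are added before counting so that they receive vocabulary
# entries, and that one vocabulary slot is reserved for the unknown token.

```python
import re
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase, then split into word tokens and punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def preprocess(sentences,
               vocab_size=8000,
               unknown_token='Unknown_Token',
               sentence_start_token='Sentence_Start',
               sentence_end_token='Sentence_End'):
    # a) + c) Tokenize each sentence and wrap it in start/end tokens.
    tokenized = [
        [sentence_start_token] + tokenize(s) + [sentence_end_token]
        for s in sentences
    ]

    # b) Count word frequencies and keep the (vocab_size - 1) most frequent
    #    words; the last slot is reserved for the unknown token.
    counts = Counter(word for sent in tokenized for word in sent)
    index_to_word = [w for w, _ in counts.most_common(vocab_size - 1)]
    index_to_word.append(unknown_token)
    word_to_index = {w: i for i, w in enumerate(index_to_word)}

    # Replace every word outside the vocabulary with the unknown token.
    replaced = [
        [w if w in word_to_index else unknown_token for w in sent]
        for sent in tokenized
    ]
    return replaced, index_to_word, word_to_index
```

# Possible usage, with a tiny vocabulary so that rare words get replaced:
#   sents, index_to_word, word_to_index = preprocess(
#       ["the cat sat", "the cat ran", "the dog sat"], vocab_size=6)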