Last active
February 28, 2018 18:38
-
-
Save montanalow/a67c9580739443f4f2f36b004d4a2eb9 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# my_app/pipelines/product_popularity.py part 2 | |
def get_encoders(self): | |
return ( | |
# An encoder to tokenize product names into max 15 tokens that | |
# occur in the corpus at least 10 times. We also want the | |
# estimator to spend 5x as many resources on name vs department | |
# since there are so many more words in english than there are | |
# grocery store departments. | |
Token('product_name', sequence_length=15, minimum_occurrences=10, embed_scale=5), | |
# An encoder to translate department names into unique | |
# identifiers that occur at least 50 times | |
Unique('department', minimum_occurrences=50) | |
) | |
def get_output_encoder(self): | |
# Sales is floating point which we could Pass encode directly to the | |
# estimator, but Norm will bring it to small values around 0, | |
# which are more amenable to deep learning. | |
return Norm('sales') |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment