- Library imports (NLP, numpy, pandas, matplotlib, seaborn, etc.)
- NLTK downloads (checking packages' status)
- Importing data
- Exploratory data analysis (EDA)
- Converting target classes to integer labels (e.g. orange: 0, red: 1)
- Converting the text column to string dtype with astype
- Regex
- Data visualization
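
The label-conversion, astype, and regex steps above can be sketched as follows. This is only an illustration: the toy DataFrame, the column names `text` and `label`, and the `clean_text` helper are assumptions, not taken from the original notebook.

```python
import re
import pandas as pd

# Hypothetical toy data; column names "text" and "label" are assumptions.
df = pd.DataFrame({
    "text": ["Fresh ORANGE juice!!", 42, "Red apples, 3 for $1"],
    "label": ["orange", "orange", "red"],
})

# Map each class name to an integer code (orange -> 0, red -> 1).
class_to_int = {"orange": 0, "red": 1}
df["label"] = df["label"].map(class_to_int).astype(int)

# Force the text column to str so the regex step never sees non-string values.
df["text"] = df["text"].astype(str)

def clean_text(text):
    """Lowercase, keep only letters and whitespace, collapse repeated spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

df["clean_text"] = df["text"].apply(clean_text)
```

Mapping with an explicit dictionary keeps the class-to-integer assignment visible and reproducible, which is convenient when the codes have to be translated back to class names later.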
class MeanEmbeddingVectorizer(object):
    def __init__(self, word_model):
        self.word_model = word_model
        self.vector_size = word_model.wv.vector_size

    def fit(self, X=None, y=None):  # scikit-learn passes X (and optionally y) to fit
        return self

    def transform(self, docs):  # comply with scikit-learn transformer requirement
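
A minimal sketch of how a mean-embedding transformer of this shape could work end to end, assuming each document is a list of tokens and that `word_model.wv` supports `token in wv`, `wv[token]`, and `wv.vector_size` (as gensim's `KeyedVectors` does). The `_mean_vector` helper, the zero-vector fallback for documents with no known tokens, and the `_ToyModel` stand-in are assumptions made for illustration, not the original author's code.

```python
import numpy as np

class MeanEmbeddingVectorizer:
    def __init__(self, word_model):
        self.word_model = word_model
        self.vector_size = word_model.wv.vector_size

    def fit(self, X=None, y=None):
        # Nothing to learn; the word model is already trained.
        return self

    def transform(self, docs):
        # docs: iterable of token lists -> array of shape (n_docs, vector_size)
        return np.vstack([self._mean_vector(tokens) for tokens in docs])

    def _mean_vector(self, tokens):
        # Hypothetical helper: average the vectors of in-vocabulary tokens.
        wv = self.word_model.wv
        vectors = [wv[t] for t in tokens if t in wv]
        if not vectors:
            # Assumed fallback: zero vector when no token is known.
            return np.zeros(self.vector_size)
        return np.mean(vectors, axis=0)

# Toy stand-in for a trained word model, used only to exercise the class.
class _ToyKeyedVectors(dict):
    vector_size = 2

class _ToyModel:
    wv = _ToyKeyedVectors({
        "red": np.array([1.0, 0.0]),
        "orange": np.array([0.0, 1.0]),
    })

vectorizer = MeanEmbeddingVectorizer(_ToyModel())
features = vectorizer.fit().transform([["red", "orange"], ["unknown"]])
```

Returning a zero vector for out-of-vocabulary documents keeps the output shape fixed, so the result can be passed straight to a downstream scikit-learn estimator.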
Afghanistan
Albania
Algeria
Andorra
Angola
Antigua & Deps
Argentina
Armenia
Australia
Austria