Created October 4, 2019 02:47
A Naive Bayes Classifier for NLP.
# Create a Naive Bayes classifier from text data stored in a pandas DataFrame.
# Assumes `df` (with `lemmatize_text` and `show_name` columns) and `le`
# (presumably a LabelEncoder fitted on the show names) are defined earlier.
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Split the DataFrame into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df.lemmatize_text, df.show_name, test_size=0.2, random_state=42)
# Create a scikit-learn pipeline for Naive Bayes classification
text_clf = Pipeline([('count_vectorizer', CountVectorizer()),
                     ('tfidf_vectorizer', TfidfTransformer()),
                     ('clf', MultinomialNB())
                     ])
# Fit the pipeline on the training data
text_clf.fit(X_train, y_train)
# Predict the categories of the test data
test_predictions = text_clf.predict(X_test)
# Evaluate the predictions against the scripts' actual classes
print(metrics.classification_report(y_test, test_predictions,
                                    target_names=le.classes_))
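
The snippet above relies on a `df` and an `le` that are built elsewhere. As a rough, self-contained sketch of the same steps, the example below runs the pipeline end to end on a tiny made-up data set. The `toy_df` contents and show names are invented for illustration, and fitting a `LabelEncoder` on the show names is only an assumption about how `le` might have been created. The `stratify` argument is also an addition here, used so that both classes appear in the very small test split.

```python
import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the gist's `df`; the text and the
# show names are invented for illustration only.
toy_df = pd.DataFrame({
    "lemmatize_text": [
        "captain engage warp drive starship",
        "starship crew beam down planet",
        "warp engine fail captain order",
        "alien planet scan starship sensor",
        "captain log starship warp",
        "coffee shop friend laugh apartment",
        "friend apartment date coffee",
        "apartment roommate joke friend",
        "coffee date friend wedding",
        "roommate laugh coffee apartment",
    ],
    "show_name": ["space_show"] * 5 + ["sitcom"] * 5,
})

# Assumption: `le` is a LabelEncoder fitted on the class labels, so that
# `le.classes_` holds the sorted class names.
le = LabelEncoder().fit(toy_df.show_name)

# Stratify so the two-row test split contains one script from each show.
X_train, X_test, y_train, y_test = train_test_split(
    toy_df.lemmatize_text, toy_df.show_name,
    test_size=0.2, random_state=42, stratify=toy_df.show_name)

# Same three-step pipeline: token counts -> TF-IDF weights -> multinomial NB.
text_clf = Pipeline([('count_vectorizer', CountVectorizer()),
                     ('tfidf_vectorizer', TfidfTransformer()),
                     ('clf', MultinomialNB())])
text_clf.fit(X_train, y_train)
test_predictions = text_clf.predict(X_test)

# Per-class precision, recall, and F1 on the held-out scripts.
report = metrics.classification_report(y_test, test_predictions,
                                       target_names=le.classes_)
print(report)
```

Because `classification_report` orders string labels alphabetically, the names in `le.classes_` line up with the report rows only if every class appears in the test data, which is worth checking on small or imbalanced data sets.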