@amattu2
Last active December 24, 2020 02:31
Generate a Sklearn multi-label classification model from automotive service appointments. The dataset was built from a proprietary dataset that was anonymized.
"""
Labels automotive appointment description/comments
based on a trained multi-label classification model
Expected CSV structure:
ID|Tech|Service|Comments|mechanical|bodywork|diagnostic|suspension|engine|exhaust|electrical|brakes|tires
Structure notes:
ID, Tech, Service - Irrelevant, used during transcription
Comments - Used to generate the model
mechanical, bodywork, diagnostic, suspension, engine, exhaust, electrical, brakes, tires - Used to categorize (label)
"""
"""
Produced 2020
By https://amattu.com/links/github
Copyright Alec M.
"""
"""
Original Author
https://www.geeksforgeeks.org/an-introduction-to-multilabel-classification/
"""
# Imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from skmultilearn.adapt import MLkNN
# Read model CSV
df = pd.read_csv('model.csv')
vetorizar = TfidfVectorizer(max_features = 3000, max_df = 0.85)
comments = ""
# Model text field
X = df["Comments"]
vetorizar.fit(X)
# Model data field(s)
y = np.asarray(df[df.columns[4:]])
# Split model datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4)
X_train_tfidf = vetorizar.transform(X_train)
X_test_tfidf = vetorizar.transform(X_test)
# using Multi-label kNN classifier
mlknn_classifier = MLkNN()
mlknn_classifier.fit(X_train_tfidf, y_train)
# Continue reading until KeyboardInterrupt
while comments == "":
    # Read appointment comments, test model
    comments = input("Enter an appointment description: ")
    new_sentence_tfidf = vetorizar.transform([comments])
    prediction = mlknn_classifier.predict(new_sentence_tfidf).toarray()[0]
    # Print results
    print("Mechanical: {}\nBodywork: {}\nDiagnostic: {}\nSuspension: {}\nEngine: {}\nExhaust: {}\nElectrical: {}\nBrakes: {}\nTires: {}".format(
        prediction[0],
        prediction[1],
        prediction[2],
        prediction[3],
        prediction[4],
        prediction[5],
        -1, # untrained in dataset
        -1, # untrained in dataset
        -1 # untrained in dataset
    ))
    # Reset comments
    comments = ""
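
The script builds a held-out split (`X_test_tfidf`, `y_test`) but never scores the classifier on it. A minimal sketch of one common multi-label metric, Hamming loss (the fraction of individual label slots predicted incorrectly), is shown below in plain Python. The function name `hamming_loss` and the sample rows are illustrative and not part of the original gist. With the script's objects, one would presumably pass `y_test` and `mlknn_classifier.predict(X_test_tfidf).toarray()`; scikit-learn also ships `sklearn.metrics.hamming_loss` for the same purpose.

```python
def hamming_loss(y_true, y_pred):
    """Return the fraction of label slots that differ (0.0 means a perfect match)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("every row must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)  # count each mismatched label slot
            total += 1
    return wrong / total if total else 0.0


# Hypothetical label rows (e.g. mechanical, bodywork, diagnostic); not real data
y_true = [[1, 0, 0], [0, 1, 1]]
y_pred = [[1, 0, 1], [0, 1, 1]]
print("Hamming loss:", hamming_loss(y_true, y_pred))
```

Here one of the six label slots is wrong, so the loss is one sixth. A lower value is better, and the metric treats every label column equally, including columns that were never positive in the training data.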

amattu2 commented Dec 23, 2020

Dataset not public ATM, currently scraping confidential data out of it.
