@BroaderImpact
Last active February 18, 2023 16:49
ML Model Selection

ml-model-selection

A simple Python script that recommends a scikit-learn model based on your answers to a few yes/no questions.

Installation

Use the package manager pip to install select_model.

pip install select_model

Usage

# Assumes the script below is importable as the select_model module.
from select_model import select_model

# Asks a series of yes/no questions on stdin and prints a recommendation,
# e.g. 'Linear Regression is recommended.' when the target is continuous.
# The return value is an unfitted scikit-learn estimator.
model = select_model()
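
For scripted or batch use, where nobody is available to answer prompts, the same question flow can be written as a pure function. The sketch below is an illustration only: `recommend_model` and its parameter names are hypothetical and not part of select_model. It follows the order of questions as it can be read from the script, and it returns the recommendation name as a string rather than an estimator so that it runs without scikit-learn installed.

```python
def recommend_model(
    continuous_target: bool,
    categorical_outcome: bool = False,
    many_features: bool = False,
    binary_outcome: bool = False,
    small_or_medium_dataset: bool = False,
    group_similar_points: bool = False,
) -> str:
    """Hypothetical non-interactive version of the question flow.

    Each argument stands for the yes/no answer to one prompt. Questions
    are evaluated in the same order as in the interactive script, so
    later answers are ignored once an earlier one decides the result.
    """
    if continuous_target:
        return "Linear Regression"
    if categorical_outcome:
        if many_features:
            return "Random Forest Classifier"
        if binary_outcome:
            return "Logistic Regression"
        if small_or_medium_dataset:
            return "Support Vector Machines"
        return "K-Nearest Neighbors"
    if group_similar_points:
        return "K-Means Clustering"
    return "Principal Component Analysis"


if __name__ == "__main__":
    print(recommend_model(continuous_target=True))
    print(recommend_model(False, categorical_outcome=True, binary_outcome=True))
```

Keeping the decision logic separate from the prompts also makes each branch straightforward to cover in a unit test.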

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def select_model():
    """
    Ask yes/no questions on stdin, print a recommendation, and return
    an unfitted scikit-learn estimator for the recommended model.
    """
    print("Answer the following questions to select a machine learning model:")

    # Regression: a continuous target short-circuits all other questions.
    linear = input("Is the target variable continuous? (yes or no) ")
    if linear.lower() == "yes":
        print("Linear Regression is recommended.")
        return LinearRegression()

    classification = input("Are you trying to predict a categorical outcome? (yes or no) ")
    if classification.lower() == "yes":
        # Supervised classification branch.
        random_forest = input("Do you have a large number of features? (yes or no) ")
        if random_forest.lower() == "yes":
            print("Random Forest Classifier is recommended.")
            return RandomForestClassifier()
        else:
            logistic = input("Is the outcome binary? (yes or no) ")
            if logistic.lower() == "yes":
                print("Logistic Regression is recommended.")
                return LogisticRegression()
            else:
                svm = input("Do you have a small or medium-sized dataset? (yes or no) ")
                if svm.lower() == "yes":
                    print("Support Vector Machines is recommended.")
                    return SVC()
                else:
                    print("K-Nearest Neighbors is recommended.")
                    return KNeighborsClassifier()
    else:
        # Unsupervised branch: clustering, otherwise dimensionality reduction.
        clustering = input("Are you trying to group similar data points? (yes or no) ")
        if clustering.lower() == "yes":
            print("K-Means Clustering is recommended.")
            return KMeans()
        else:
            print("Principal Component Analysis is recommended.")
            return PCA()
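
The script treats any answer other than the exact word "yes" (ignoring case) as a no, so "y" or "yes " with a trailing space falls through to the next question. One possible hardening, sketched below under the assumption that re-prompting is acceptable, is a small helper that normalizes the answer and asks again on unrecognized input. `ask_yes_no` and its `read` parameter are hypothetical names introduced for illustration; `read` defaults to the built-in `input` and exists only so the helper can be exercised without a terminal.

```python
from typing import Callable

YES = {"yes", "y"}
NO = {"no", "n"}


def ask_yes_no(prompt: str, read: Callable[[str], str] = input) -> bool:
    """Ask a yes/no question and return True for yes, False for no.

    Keeps asking until the answer is one of yes/y/no/n, compared
    case-insensitively with surrounding whitespace ignored.
    """
    while True:
        answer = read(prompt).strip().lower()
        if answer in YES:
            return True
        if answer in NO:
            return False
        print("Please answer yes or no.")


if __name__ == "__main__":
    # Scripted answers stand in for keyboard input in this demo.
    scripted = iter(["maybe", " YES "])
    result = ask_yes_no(
        "Is the target variable continuous? (yes or no) ",
        read=lambda _prompt: next(scripted),
    )
    print(result)
```

With a helper like this, each `input(...)` / `.lower() == "yes"` pair in the script could be replaced by a single `ask_yes_no(...)` call.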