Created
November 4, 2020 09:29
Topic | Feature | Ignite ML 2.9 | Spark ML 3.0 | Note (specific to Spark) | |
---|---|---|---|---|---|
Classification | Binomial logistic regression | yes | yes | ||
Classification | Decision tree classifier | yes | yes | ||
Classification | Linear SVM | yes | yes | ||
Classification | Random Forest classifier | yes | yes | ||
Classification | Gradient-boosted tree classifier | yes | yes | ||
Classification | Multilayer perceptron classifier | yes | yes | ||
Classification | KNN | yes | no | ||
Classification | Weighted KNN | yes | no | ||
Classification | Indexed KNN | yes | no | ||
Classification | ANN (approximate nearest neighbour) with ACD strategy | yes | no | Spark supports ANN search only via LSH with pre-built buckets; it has no other index-building method, such as an Annoy index. | |
Classification | Naive Bayes | yes | yes | ||
Regression | Generalized linear regression | no | yes | supports at most 4096 features | |
Regression | Linear regression with LSQR | yes | no | ||
Regression | Linear regression with SGD | yes | yes | ||
Regression | Decision tree regression | yes | yes | ||
Regression | Random Forest regression | yes | yes | ||
Regression | Gradient-boosted tree regression | yes | yes | ||
Regularization | L1, L2, ..., Lp as a trainer hyperparameter | yes | yes | ||
Model composition / model ensembles | OneVsRest training | yes | yes | ||
Model composition / model ensembles | Ensemble as a mean value of predictions | yes | no | supported only for trees | |
Model composition / model ensembles | Majority-based ensembles | yes | no | supported only for trees | |
Model composition / model ensembles | Ensemble as a weighted sum of predictions | yes | no | supported only for trees | |
Model composition / model ensembles | Common bagging | yes | no | ||
Model composition / model ensembles | Common stacking | yes | no | ||
Model composition / model ensembles | Common boosting | yes | no | ||
Clustering | K-means | yes | yes | ||
Clustering | Latent Dirichlet allocation (LDA) | no | yes | ||
Clustering | Bisecting k-means | no | yes | ||
Clustering | Gaussian mixture model (GMM) | yes | yes | ||
Collaborative filtering | SGD | yes | no | ||
Collaborative filtering | ALS | yes | yes | ||
Model selection | Train-test splitting | yes | yes | ||
Model selection | Cross validation | yes | yes | ||
Model selection | Binary evaluator | yes | yes | ||
Model selection | Multi-class evaluator | no | yes | ||
Hyperparameter tuning | Brute-force | yes | yes | ||
Hyperparameter tuning | Random search | yes | no | ||
Hyperparameter tuning | Genetic algorithm | yes | no | ||
Hyperparameter tuning | Parallel search | yes | no | ||
Metrics | Accuracy | yes | yes | ||
Metrics | F-measure | yes | yes | ||
Metrics | Precision | yes | yes | ||
Metrics | Recall | yes | yes | ||
Metrics | ROC AUC | no | yes | ||
Metrics | Regression metrics | yes | yes | ||
Metrics | Clustering metrics | yes | no | only calculated internally in KMeans | |
Inference | Model export/import | yes | yes | PMML is supported for 7 older models | |
Inference | Import from XGBoost | yes | no | ||
Inference | Import from Spark ML | yes | yes | ||
Advanced topics | Model updating (online learning) | yes | no | ||
Advanced topics | Genetic algorithms to solve optimization problems | yes | no | ||
Advanced topics | Training logging | yes | no | ||
Advanced topics | Data generators | yes | no | ||
Advanced topics | Sandbox datasets | yes | no | ||
Feature extraction | TF-IDF | no | yes | related to NLP | |
Feature extraction | Word2vec | no | yes | related to NLP | |
Feature extraction | CountVectorizer | no | yes | ||
Feature extraction | FeatureHasher | no | yes | ||
Feature transformation | Tokenizer | no | yes | related to NLP | |
Feature transformation | StopWordsRemover | no | yes | related to NLP | |
Feature transformation | N-grams | no | yes | related to NLP | |
Feature transformation | Binarizer | yes | yes | ||
Feature transformation | PCA | no | yes | ||
Feature transformation | PolynomialExpansion | no | yes | ||
Feature transformation | Discrete cosine transform (DCT) | no | yes | ||
Feature transformation | StringIndexer | yes | yes | ||
Feature transformation | OneHotEncoder | yes | yes | ||
Feature transformation | Normalizer | yes | yes | ||
Feature transformation | StandardScaler | yes | yes | ||
Feature transformation | MinMaxScaler | yes | yes | ||
Feature transformation | MaxAbsScaler | yes | yes | ||
Feature transformation | QuantileDiscretizer | no | yes | ||
Feature transformation | Imputer | yes | yes | ||
Feature transformation | Locality-Sensitive Hashing (LSH) | no | yes | ||
Feature selection | Feature extraction/vectorizer | yes | yes | VectorAssembler/VectorSlicer and other vector transformers | |
Feature selection | Chi-square test of independence | no | yes | ChiSqSelector | |
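
The three ensemble aggregation strategies named in the table (mean value of predictions, majority vote, weighted sum of predictions) can be sketched in plain Python. This is only an illustration of the general idea: it uses neither the Ignite ML nor the Spark ML API, and the function names are hypothetical. Whether a given library normalizes the weights is not stated above, so the sketch takes them as given.

```python
from collections import Counter
from typing import Sequence


def mean_ensemble(predictions: Sequence[float]) -> float:
    """Average the numeric predictions of all models."""
    return sum(predictions) / len(predictions)


def majority_ensemble(predictions: Sequence[int]) -> int:
    """Return the most frequent class label (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_sum_ensemble(
    predictions: Sequence[float], weights: Sequence[float]
) -> float:
    """Combine predictions as sum(w_i * p_i); weights are used as given."""
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    return sum(w * p for w, p in zip(weights, predictions))
```

Each function takes the per-model predictions for a single input row, so the same helpers apply to any set of base models, not only trees.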