@zaleslaw
Created November 4, 2020 09:29
| Topic | Feature | Ignite ML 2.9 | Spark ML 3.0 | Note (specific to Spark) |
|---|---|---|---|---|
| Classification | Binomial logistic regression | yes | yes | |
| Classification | Decision tree classifier | yes | yes | |
| Classification | Linear SVM | yes | yes | |
| Classification | Random Forest classifier | yes | yes | |
| Classification | Gradient-boosted tree classifier | yes | yes | |
| Classification | Multilayer perceptron classifier | yes | yes | |
| Classification | KNN | yes | no | |
| Classification | Weighted KNN | yes | no | |
| Classification | Indexed KNN | yes | no | |
| Classification | ANN (approximate nearest neighbour) with ACD strategy | yes | no | Spark can run ANN queries over pre-built LSH buckets, but it does not support any other method of building an ANN index (such as Annoy). |
| Classification | Naive Bayes | yes | yes | |
| Regression | Generalized linear regression | no | yes | supports at most 4096 features |
| Regression | Linear regression with LSQR | yes | no | |
| Regression | Linear regression with SGD | yes | yes | |
| Regression | Decision tree regression | yes | yes | |
| Regression | Random Forest regression | yes | yes | |
| Regression | Gradient-boosted tree regression | yes | yes | |
| Regularization | L1, L2, ..., Lp as a trainer hyperparameter | yes | yes | |
| Model composition / model ensembles | OneVsRest training | yes | yes | |
| Model composition / model ensembles | Ensemble as a mean value of predictions | yes | no | supported only for trees |
| Model composition / model ensembles | Majority-based ensembles | yes | no | supported only for trees |
| Model composition / model ensembles | Ensemble as a weighted sum of predictions | yes | no | supported only for trees |
| Model composition / model ensembles | Common bagging | yes | no | |
| Model composition / model ensembles | Common stacking | yes | no | |
| Model composition / model ensembles | Common boosting | yes | no | |
| Clustering | K-means | yes | yes | |
| Clustering | Latent Dirichlet allocation (LDA) | no | yes | |
| Clustering | Bisecting k-means | no | yes | |
| Clustering | Gaussian mixture model (GMM) | yes | yes | |
| Collaborative filtering | SGD | yes | no | |
| Collaborative filtering | ALS | yes | yes | |
| Model selection | Train-test splitting | yes | yes | |
| Model selection | Cross validation | yes | yes | |
| Model selection | Binary evaluator | yes | yes | |
| Model selection | Multi-class evaluator | no | yes | |
| Hyperparameter tuning | Brute-force | yes | yes | |
| Hyperparameter tuning | Random search | yes | no | |
| Hyperparameter tuning | Genetic algorithm | yes | no | |
| Hyperparameter tuning | Parallel search | yes | no | |
| Metrics | Accuracy | yes | yes | |
| Metrics | F-measure | yes | yes | |
| Metrics | Precision | yes | yes | |
| Metrics | Recall | yes | yes | |
| Metrics | ROC AUC | no | yes | |
| Metrics | Regression metrics | yes | yes | |
| Metrics | Clustering metrics | yes | no | KMeans computes some metrics internally |
| Inference | Model export/import | yes | yes | PMML is supported for 7 older models |
| Inference | Import from XGBoost | yes | no | |
| Inference | Import from Spark ML | yes | yes | |
| Advanced topics | Model updating (online learning) | yes | no | |
| Advanced topics | Genetic algorithms to solve optimization problems | yes | no | |
| Advanced topics | Training logging | yes | no | |
| Advanced topics | Data generators | yes | no | |
| Advanced topics | Sandbox datasets | yes | no | |
| Feature extraction | TF-IDF | no | yes | related to NLP |
| Feature extraction | Word2vec | no | yes | related to NLP |
| Feature extraction | CountVectorizer | no | yes | |
| Feature extraction | FeatureHasher | no | yes | |
| Feature transformation | Tokenizer | no | yes | related to NLP |
| Feature transformation | StopWordsRemover | no | yes | related to NLP |
| Feature transformation | N-grams | no | yes | related to NLP |
| Feature transformation | Binarizer | yes | yes | |
| Feature transformation | PCA | no | yes | |
| Feature transformation | PolynomialExpansion | no | yes | |
| Feature transformation | Discrete cosine transform (DCT) | no | yes | |
| Feature transformation | StringIndexer | yes | yes | |
| Feature transformation | OneHotEncoder | yes | yes | |
| Feature transformation | Normalizer | yes | yes | |
| Feature transformation | StandardScaler | yes | yes | |
| Feature transformation | MinMaxScaler | yes | yes | |
| Feature transformation | MaxAbsScaler | yes | yes | |
| Feature transformation | QuantileDiscretizer | no | yes | |
| Feature transformation | Imputer | yes | yes | |
| Feature transformation | Locality-Sensitive Hashing (LSH) | no | yes | |
| Feature selection | Feature extraction/vectorizer | yes | yes | VectorAssembler/VectorSlicer and other vector transformers |
| Feature selection | Chi-square test of independence | no | yes | ChiSqSelector |
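
The note on the ANN row mentions approximate nearest-neighbour search over pre-built LSH buckets. As a rough illustration of that idea only, here is a minimal pure-Python sketch using random-projection hashing. It is not the Spark or Ignite implementation; the helper names (make_hash, build_buckets, approx_nearest), the bucket length, and the toy data are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, bucket_length, seed):
    """Return h(x) = floor((x . v) / bucket_length) for a random unit vector v."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    return lambda x: math.floor(sum(a * b for a, b in zip(x, v)) / bucket_length)


def build_buckets(points, hashes):
    """Pre-build the buckets: one hash table per hash function."""
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        for table, h in enumerate(hashes):
            buckets[(table, h(p))].append(idx)
    return buckets


def approx_nearest(query, points, hashes, buckets, k=1):
    """Rank only the points that share at least one bucket with the query."""
    candidates = set()
    for table, h in enumerate(hashes):
        candidates.update(buckets.get((table, h(query)), []))
    ranked = sorted(candidates, key=lambda i: math.dist(query, points[i]))
    return [(i, math.dist(query, points[i])) for i in ranked[:k]]


# Hypothetical toy data: five 2-D points and three hash tables.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (-3.0, 4.0)]
hashes = [make_hash(dim=2, bucket_length=1.0, seed=s) for s in range(3)]
buckets = build_buckets(points, hashes)

# The query coincides with points[2], so it shares every bucket with that point.
result = approx_nearest((5.0, 5.0), points, hashes, buckets, k=1)
print(result)
```

The search is approximate because a true neighbour that shares no bucket with the query is never examined. Adding hash tables or widening the buckets raises recall at the cost of more candidates to rank.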