- supports NumPy arrays, SciPy sparse matrices, and pandas DataFrames.
Estimator
- learns from data: can be a classification, regression, or clustering algorithm, or a transformer that extracts/filters useful features from raw data
- implements set_params, fit(X, y), predict(T), score (judges the quality of the fit / predictions), predict_proba (per-class probabilities, usable as a confidence level)
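A minimal sketch of the estimator methods listed above, assuming LogisticRegression as the example classifier. The toy data and the C=10.0 setting are illustrative choices, not from the notes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (made up for illustration): one feature, two well-separated classes.
X = np.array([[0.0], [0.5], [1.0], [4.0], [4.5], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.set_params(C=10.0)         # change a hyperparameter after construction
clf.fit(X, y)                  # learn from the training data

T = np.array([[0.2], [4.8]])   # unseen samples
pred = clf.predict(T)          # predicted class labels
proba = clf.predict_proba(T)   # one row per sample, one column per class
acc = clf.score(X, y)          # mean accuracy for a classifier

print(pred, proba.shape, acc)
```

Note that score means accuracy for classifiers and R^2 for regressors, and not every estimator provides predict_proba.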
Transformer
- implements transform (e.g. reduce dimensionality) / inverse_transform
- used to clean (sklearn.preprocessing), reduce dimensions (sklearn.decomposition), expand (sklearn.kernel_approximation) or generate feature representations (sklearn.feature_extraction).
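A small sketch of transform / inverse_transform, assuming StandardScaler for cleaning and PCA for dimensionality reduction. The data is made up, and its two columns are perfectly correlated so that one principal component is enough to reconstruct it.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Toy data: the second column is 10x the first.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# Clean: rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_back = scaler.inverse_transform(X_scaled)   # maps back to the original units

# Reduce: project onto a single principal component.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X_scaled)       # shape (4, 1)
X_approx = pca.inverse_transform(X_reduced)   # back in the 2-d scaled space

print(X_reduced.shape, np.allclose(X_back, X))
```

In general inverse_transform after a reduction is only an approximation; it is exact here only because the toy columns are perfectly correlated.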
properties: labels_, cluster_centers_.
distance metrics - choose one that maximizes the distance between samples in different classes and minimizes it within each class:
- Euclidean distance (l2)
- Manhattan distance (l1) - good for sparse features
- cosine distance - invariant to global scalings
- or any precomputed affinity matrix
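A sketch of the fitted properties and the three metrics above, assuming KMeans as the clustering estimator and sklearn.metrics.pairwise_distances to compute the distances. The four sample points are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

# Toy data: two points along the x-axis, two along the y-axis.
X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 5.0], [0.0, 10.0]])

d_l2 = pairwise_distances(X, metric="euclidean")   # l2
d_l1 = pairwise_distances(X, metric="manhattan")   # l1
d_cos = pairwise_distances(X, metric="cosine")     # 1 - cosine similarity

# Cosine distance depends only on direction, so rescaling all
# samples by a global factor leaves it unchanged.
d_cos_scaled = pairwise_distances(10.0 * X, metric="cosine")

# KMeans itself always uses Euclidean distance; after fit it exposes
# labels_ (cluster index per sample) and cluster_centers_.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
centers = km.cluster_centers_

print(labels, centers.shape)
```

Estimators that accept a precomputed matrix can be given an array such as d_cos directly; which parameter enables that varies by estimator and version, so check the documentation for the one in use.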
DBSCAN
- deterministically separates areas of high density from