Model | Boundary | Needs Scaling | Probabilities | Strengths | Weak Spots |
---|---|---|---|---|---|
Logistic Reg | Linear | ✅ | ✅ (softmax) | Fast, interpretable, calibrated | Nonlinear patterns |
SVM (Linear) | Linear max-margin | ✅ | | Robust separator, few params | Overlap regions |
SVM (RBF) | Nonlinear | ✅ | | Powerful on small data | Tune C, gamma |
KNN | Nonlinear (data-driven) | ✅ | ❌ | Simple, captures local structure | Slow prediction, sensitive to noise and feature scale |
Decision Tree | Nonlinear | ❌ | ✅ (leaf freq) | Explainable |
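To make the Probabilities column concrete, here is a minimal sketch of the two mechanisms the table names: softmax over class scores (logistic regression) and class frequencies inside a leaf (decision tree). The scores and leaf labels are made-up illustration values, not outputs of a fitted model.

```python
import math
from collections import Counter

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def leaf_probabilities(leaf_labels):
    """Class probabilities in a tree leaf are the relative class frequencies."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return {label: count / n for label, count in counts.items()}

# Hypothetical linear scores for three classes (illustration only).
probs = softmax([2.0, 1.0, 0.1])

# Hypothetical training labels that ended up in the same leaf.
leaf = leaf_probabilities(["Yes", "Yes", "Yes", "No"])

print([round(p, 3) for p in probs])
print(leaf)
```

The softmax output varies smoothly with the scores, while leaf frequencies can only take values of the form k/n for a leaf holding n training samples.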

ID | Outlook | Temperature | Play Tennis |
---|---|---|---|
1 | Sunny | Hot | No |
2 | Sunny | Hot | No |
3 | Overcast | Hot | Yes |
4 | Rain | Mild | Yes |
5 | Rain | Cool | Yes |
6 | Rain | Hot | No |
7 | Overcast | Cool | Yes |
8 | Sunny | Mild | No |
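As a worked example on the eight rows above, the sketch below computes entropy-based information gain for each feature. It assumes an ID3-style splitting criterion, which the table itself does not name; the function names are illustrative.

```python
import math
from collections import Counter

# Rows from the table above: (Outlook, Temperature, Play Tennis).
rows = [
    ("Sunny", "Hot", "No"),
    ("Sunny", "Hot", "No"),
    ("Overcast", "Hot", "Yes"),
    ("Rain", "Mild", "Yes"),
    ("Rain", "Cool", "Yes"),
    ("Rain", "Hot", "No"),
    ("Overcast", "Cool", "Yes"),
    ("Sunny", "Mild", "No"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature_index, label_index=2):
    """Parent entropy minus the size-weighted entropy of each feature value's group."""
    labels = [r[label_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[feature_index], []).append(r[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

gain_outlook = information_gain(rows, 0)
gain_temperature = information_gain(rows, 1)
print(f"Outlook:     {gain_outlook:.3f}")      # 0.656
print(f"Temperature: {gain_temperature:.3f}")  # 0.344
```

Outlook gives the larger gain because Sunny and Overcast are already pure groups, so under this criterion it would be chosen as the root split.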

Kernel | Best For | Parameters to Tune | Risk |
---|---|---|---|
Linear | Linearly separable data | None | Poor for complex data |
Polynomial | Structured patterns | c, d | Overfitting with high d |
RBF | Complex, non-linear data | γ | Sensitive to γ |
Sigmoid | Neural network-like problems | α | Can be unstable |
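The four kernels can be sketched in a few lines of plain Python. The parameter names follow the table (c, d, γ, α); the default values and the extra offset `c` in the sigmoid kernel are assumptions chosen for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # Plain inner product; no kernel parameters.
    return dot(x, y)

def polynomial_kernel(x, y, c=1.0, d=2):
    # (x . y + c)^d; a higher degree d gives a more flexible boundary.
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2); a larger gamma means more local influence.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, alpha=0.1, c=0.0):
    # tanh(alpha * x . y + c); the offset c is an assumed extra parameter.
    return math.tanh(alpha * dot(x, y) + c)

x, y = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, y), polynomial_kernel(x, y))
print(rbf_kernel(x, y), sigmoid_kernel(x, y))
```

The RBF kernel always returns 1 for identical points and decays toward 0 with distance, which is why its behavior depends so strongly on γ.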

Model Family | Scale? | Notes |
---|---|---|
KNN / K-Means | ✅ | Distances must be comparable across features |
SVM (linear/RBF) | ✅ | Margin/RBF kernel sensitive to scale |
Logistic/Linear (with penalties) | ✅ | Coefficients & penalties become comparable |
Neural Nets | ✅ | Helps optimization & stability |
Trees / Random Forest / Gradient-Boosted Trees | ❌ | Splits depend only on the ordering of feature values, so monotonic rescaling leaves them unchanged |
Naive Bayes | ❌ | Usually fine unscaled (Gaussian NB ok either way) |
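The KNN row can be illustrated with a small sketch: on unscaled data, the feature with the largest numeric range dominates the Euclidean distance. The points below are hypothetical (age in years, income in dollars), and standardization is done by hand rather than with a library scaler.

```python
import math
import statistics

# Hypothetical training points: (age in years, income in dollars).
train = {"A": (25, 50000), "B": (60, 50500), "C": (40, 90000), "D": (35, 20000)}
query = (26, 50400)

def nearest(points, q):
    """Name of the training point with the smallest Euclidean distance to q."""
    return min(points, key=lambda name: math.dist(points[name], q))

# Per-feature mean and standard deviation, estimated on the training set only.
columns = list(zip(*train.values()))
means = [statistics.fmean(col) for col in columns]
stds = [statistics.pstdev(col) for col in columns]

def standardize(p):
    return tuple((v - m) / s for v, m, s in zip(p, means, stds))

scaled_train = {name: standardize(p) for name, p in train.items()}

nearest_raw = nearest(train, query)
nearest_scaled = nearest(scaled_train, standardize(query))
print(nearest_raw, nearest_scaled)
```

On the raw values the nearest neighbor is B, which is 34 years older but only 100 dollars apart; after standardization it is A, which is close on both features.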

Scaler Type | Outlier Robust? | Distribution Goal | Range/Bias | Sparse-safe |
---|---|---|---|---|
StandardScaler | ❌ | ~Gaussian (z-scores) | Mean 0, Var 1 | |
MinMaxScaler | ❌ | Preserve shape | [0,1] (or custom) | ❌ (use MaxAbsScaler) |
RobustScaler | ✅ | Median/IQR-based | No fixed range | |
MaxAbsScaler | ❌ | Preserve signs | [-1,1] | ✅ |
PowerTransformer | | More Gaussian (de-skew) | No fixed range | ❌ |
QuantileTransformer | ✅ | Uniform/Normal via ranks | [0,1] or Normal | ❌ |
Normalizer (row-wise) |
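The Outlier Robust? column is easiest to see on a tiny example. The sketch below re-implements the arithmetic behind standard, min-max, and median/IQR scaling in plain Python; it is not the scikit-learn classes themselves, and the feature values are made up.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up feature with one outlier

def standard_scale(xs):
    # z-scores: subtract the mean, divide by the (population) standard deviation.
    mean, std = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]

def min_max_scale(xs):
    # Map the minimum to 0 and the maximum to 1.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def robust_scale(xs):
    # Subtract the median, divide by the interquartile range (Q3 - Q1).
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]

standard = standard_scale(values)
minmax = min_max_scale(values)
robust = robust_scale(values)

print([round(v, 2) for v in standard])
print([round(v, 2) for v in minmax])
print(robust)
```

With this input the four inliers are squeezed into roughly [0, 0.03] by min-max scaling, while median/IQR scaling spreads them from -1 to 0.5 and leaves only the outlier far from the rest.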

feature | count | mean | std | min | 25% | 50% | 75% | max | skew | kurtosis |
---|---|---|---|---|---|---|---|---|---|---|
symboling | 201.0 | 0.840796 | 1.254802 | -2.000000 | 0.000000 | 1.000000 | 2.000000 | 3.000000 | 0.197370 | -0.707178 |
normalized-losses | 201.0 | 122.000000 | 31.996250 | 65.000000 | 101.000000 | 122.000000 | 137.000000 | 256.000000 | 0.846546 | 1.319068 |
wheel-base | 201.0 | 98.797015 | 6.066366 | 86.600000 | 94.500000 | 97.000000 | 102.400000 | 120.900000 | 1.031261 | 0.948445 |
length | 201.0 | 0.837102 | 0.059213 | 0.678039 | 0.801538 | 0.832292 | 0.881788 | 1.000000 | 0.154446 | -0.065192 |
width | 201.0 | 0.915126 | 0.029187 | 0.837500 | 0.890278 | 0.909722 | 0.925000 | 1.000000 | 0.875029 | 0.678655 |
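The skew and kurtosis columns can be reproduced by hand. The sketch below assumes they are bias-adjusted sample statistics with excess kurtosis (0 for a normal distribution); the table does not state which convention was used, so treat the formulas as one plausible reading. The sample data is made up.

```python
import math

def central_moment(xs, k):
    """k-th central moment, dividing by n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def sample_skew(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    n = len(xs)
    g1 = central_moment(xs, 3) / central_moment(xs, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurtosis(xs):
    """Bias-adjusted excess kurtosis G2 (needs n >= 4)."""
    n = len(xs)
    g2 = central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 2.0, 3.0, 4.0, 100.0]

print(sample_skew(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skew(right_tailed), sample_excess_kurtosis(right_tailed))
```

A positive skew, as for wheel-base and normalized-losses in the table, indicates a longer right tail; such columns are the usual candidates for a de-skewing transform.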

symboling | normalized-losses | make | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | length | ... | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price | city-L/100km | horsepower-binned | diesel | gas |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 122 | alfa-romero | std | two | convertible | rwd | front | 88.6 | 0.81115 | ... | 9.0 | 111.0 | 5000.0 | 21 | 27 | 13495.0 | 11.190476 | Medium | 0 | 1 |
3 | 122 | alfa-romero | std | two | convertible | rwd | front | 88.6 | 0.81115 | ... | 9.0 | 111.0 |