Created
March 14, 2017 23:22
import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.array([0] * 1 + [0.5] * 7 + [1] * 2).reshape(-1, 1)
qt = QuantileTransformer(n_quantiles=10)
qt.fit(X)
# An undesirable behaviour, although one that should rarely occur
# in practice, is the following.
# transform expects a 2D array of shape (n_samples, n_features).
print('0.5 is mapped to {}'.format(qt.transform(np.array([[0.5]]))))
print('0.4999999 is mapped to {}'.format(qt.transform(np.array([[0.4999999]]))))
# The two values are mapped far from each other because 0.5 is
# repeated in the training data and is mapped to the greatest of
# the quantiles equal to it.
# A solution is to add a small amount of noise while computing the
# quantiles, which makes the mapping more stable.
qt = QuantileTransformer(n_quantiles=10, smoothing_noise=1e-7)
qt.fit(X)
# With the smoothing noise, the two values should now be mapped
# close to each other.
print('0.5 is mapped to {}'.format(qt.transform(np.array([[0.5]]))))
print('0.4999999 is mapped to {}'.format(qt.transform(np.array([[0.4999999]]))))
# However, this case is unlikely to occur in real-world datasets,
# which is why smoothing_noise defaults to None.
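
The jump described in the comments can also be reproduced with NumPy alone, which may help when the `smoothing_noise` parameter used here is not available in the installed scikit-learn. The following is only a minimal sketch under stated assumptions: it treats the sorted samples as the 10 quantiles (valid here because there are exactly 10 samples), and the uniform noise it adds is a hypothetical stand-in for whatever `smoothing_noise` does internally, not the library's implementation.

```python
import numpy as np

X = np.array([0] * 1 + [0.5] * 7 + [1] * 2, dtype=float)

# With 10 samples and 10 quantiles, the quantiles are the sorted samples
# and the reference ranks are evenly spaced in [0, 1].
quantiles = np.sort(X)
references = np.linspace(0, 1, 10)

# The value 0.5 is repeated, so it spans a whole range of ranks.
first = np.searchsorted(quantiles, 0.5, side="left")
last = np.searchsorted(quantiles, 0.5, side="right") - 1
lowest_rank, highest_rank = references[first], references[last]
print("0.5 spans ranks {:.3f} to {:.3f}".format(lowest_rank, highest_rank))
# A value just below 0.5 lands near the lowest rank, while 0.5 itself
# may be sent to the highest one: this is the jump seen above.

# Assumption: break the ties with a tiny positive uniform noise
# (illustrative only) so that the quantiles become distinct.
rng = np.random.default_rng(0)
noisy_quantiles = np.sort(X + rng.uniform(0, 1e-7, size=X.shape))

mapped_half = np.interp(0.5, noisy_quantiles, references)
mapped_below = np.interp(0.4999999, noisy_quantiles, references)
print("0.5 is mapped to {}".format(mapped_half))
print("0.4999999 is mapped to {}".format(mapped_below))
```

Once the ties are broken, the interpolation is well defined around 0.5 and the two inputs receive nearly identical ranks.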