
@kgilpin
Created June 25, 2024 15:00
Navie-generated code change
<changeset>
<change>
<file change-number-for-this-file="1">sklearn/preprocessing/_encoders.py</file>
<original line-count="14" no-ellipsis="true"><![CDATA[
        for i in range(n_features):
            Xi = X[:, i]
            diff, valid_mask = _encode_check_unknown(Xi, self.categories_[i],
                                                     return_mask=True)
            if not np.all(valid_mask):
                if handle_unknown == 'error':
                    msg = ("Found unknown categories {0} in column {1}"
                           " during transform".format(diff, i))
                    raise ValueError(msg)
                else:
                    # Set the problematic rows to an acceptable value and
                    # continue `The rows are marked `X_mask` and will be
                    # removed later.
                    X_mask[:, i] = valid_mask
                    Xi = Xi.copy()
                    Xi[~valid_mask] = self.categories_[i][0]
]]></original>
<modified no-ellipsis="true"><![CDATA[
        for i in range(n_features):
            Xi = X[:, i]
            diff, valid_mask = _encode_check_unknown(Xi, self.categories_[i],
                                                     return_mask=True)
            if not np.all(valid_mask):
                if handle_unknown == 'error':
                    msg = ("Found unknown categories {0} in column {1}"
                           " during transform".format(diff, i))
                    raise ValueError(msg)
                else:
                    # Set the problematic rows to an acceptable value and
                    # continue `The rows are marked `X_mask` and will be
                    # removed later.
                    X_mask[:, i] = valid_mask
                    Xi = Xi.copy()
                    # Fixed-width NumPy string arrays silently truncate longer
                    # strings on assignment; cast to object dtype when the
                    # replacement category is longer than the first unknown
                    # entry so it is stored in full.
                    if (np.issubdtype(Xi.dtype, np.str_)
                            and len(self.categories_[i][0])
                            > len(Xi[~valid_mask][0])):
                        Xi = Xi.astype(object)
                    Xi[~valid_mask] = self.categories_[i][0]
]]></modified>
</change>
</changeset>
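The change targets a NumPy behaviour: an array with a fixed-width string dtype such as `<U5` silently truncates any longer string assigned into it. The sketch below reproduces the `else:` branch outside scikit-learn to show the effect before and after the patch. The helper `fill_unknown`, its `patched` flag, and the sample values are hypothetical and chosen for illustration; they are not part of the changeset or of scikit-learn's API.

```python
import numpy as np


def fill_unknown(Xi, categories, valid_mask, patched=True):
    """Overwrite unknown entries of Xi with categories[0].

    Hypothetical helper mirroring the changeset's `else:` branch;
    patched=False follows the original (unpatched) code path.
    """
    Xi = Xi.copy()
    if (patched and np.issubdtype(Xi.dtype, np.str_)
            and len(categories[0]) > len(Xi[~valid_mask][0])):
        # Widen to object dtype so the longer category is not cut off.
        Xi = Xi.astype(object)
    Xi[~valid_mask] = categories[0]
    return Xi


categories = np.array(["11111111", "22"])  # sorted fitted categories, dtype '<U8'
Xi = np.array(["55555", "22"])             # dtype '<U5'; "55555" was never seen
valid_mask = np.isin(Xi, categories)       # [False, True]

before = fill_unknown(Xi, categories, valid_mask, patched=False)
after = fill_unknown(Xi, categories, valid_mask, patched=True)

print(before[0], before[0] in categories)  # 11111 False
print(after[0], after[0] in categories)    # 11111111 True
```

Without the cast, the replacement value is truncated to `11111`, which is itself not a fitted category, so the later encoding step would presumably still see an unknown value. With the cast, the full category is stored.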