@jmansilla
Created August 26, 2016 13:40
import numpy as np


class OneHotTransformer:
    """One-hot encode func(x), with an extra last column for values not seen in fit."""

    def __init__(self, func):
        self.f = func

    def fit(self, X, y=None):
        unseen = object()  # sentinel that stands for the "unseen value" column
        seen = set()
        for x in X:
            seen.add(self.f(x))
        self.seen = list(sorted(seen)) + [unseen]
        return self

    def transform(self, X):
        return np.array([self.transform_one(x) for x in X])

    def transform_one(self, x):
        result = [0] * len(self.seen)
        value = self.f(x)
        if value in self.seen:
            result[self.seen.index(value)] = 1
        else:
            result[-1] = 1  # value was not seen during fit
        return result
@rafacarrascosa

Using list.index is slow; perhaps seen could be defined as a dict instead: self.seen = {key: i for i, key in enumerate(seen)}
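
A minimal sketch of how that suggestion might look, assuming the values returned by func are hashable and sortable. The class name DictOneHotTransformer and the attributes index and n_columns are made up for illustration and are not part of the gist. To keep the sketch dependency-free, transform returns plain lists here; wrapping the result in np.array, as the gist does, would work the same way.

```python
class DictOneHotTransformer:  # hypothetical name, not from the gist
    def __init__(self, func):
        self.f = func

    def fit(self, X, y=None):
        seen = sorted({self.f(x) for x in X})
        # Map each seen value to its column index for O(1) lookups.
        self.index = {key: i for i, key in enumerate(seen)}
        # One extra column at the end is reserved for unseen values.
        self.n_columns = len(seen) + 1
        return self

    def transform_one(self, x):
        result = [0] * self.n_columns
        # dict.get falls back to the last column when the value is unseen.
        result[self.index.get(self.f(x), self.n_columns - 1)] = 1
        return result

    def transform(self, X):
        return [self.transform_one(x) for x in X]


encoder = DictOneHotTransformer(len).fit(["a", "bb", "ccc"])
encoded = encoder.transform(["zz", "dddd"])
```

With a dict, the membership test and the index lookup collapse into a single hash lookup, so there is no need for the sentinel object in the list.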

@rafacarrascosa

@jmansilla using an extra column for unseen values is new to me. How has it worked out for you? Have you found it useful?
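
For anyone reading along, here is a small standalone sketch of what the extra column does. The helper names fit_categories and encode, and the color values, are illustrative assumptions and do not come from the gist.

```python
def fit_categories(values):
    # Sorted list of the distinct values seen while fitting.
    return sorted(set(values))


def encode(categories, value):
    # One column per known category, plus a final "unseen" column.
    row = [0] * (len(categories) + 1)
    if value in categories:
        row[categories.index(value)] = 1
    else:
        row[-1] = 1  # anything not seen while fitting lands here
    return row


categories = fit_categories(["red", "green", "blue"])
known = encode(categories, "green")
unknown = encode(categories, "purple")
```

A value that was present at fit time sets its own column, while any new value sets only the last one, so every encoded row still contains exactly one 1.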
