@jnothman
Last active August 21, 2017 03:00
vectorize a pandas dataframe with scikit-learn <= 0.19
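The gist relies on two calls. `DataFrame.to_dict('records')` turns each row into a dict, and `DictVectorizer` encodes those dicts, passing numeric values through and one-hot encoding string values as `column=value` features. The sketch below shows that core step without the subclass. It assumes a reasonably recent pandas and scikit-learn, and uses `sparse=False` only so the result is a plain array.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'a']})

# One dict per row: [{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}, {'a': 3, 'b': 'a'}]
records = df.to_dict(orient='records')

# sparse=False returns a dense NumPy array instead of a scipy.sparse matrix.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# Numeric columns keep their name; string values become "column=value" indicators.
print(sorted(vec.vocabulary_))  # ['a', 'b=a', 'b=b']
print(X)
```

Feature names are sorted by default, which is why the columns come out as `a`, `b=a`, `b=b`, matching the doctest below.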
```python
from sklearn.feature_extraction import DictVectorizer


class PandasVectorizer(DictVectorizer):
    """DictVectorizer that accepts a pandas DataFrame.

    Each row is converted to a dict via ``DataFrame.to_dict('records')``
    before delegating to DictVectorizer: numeric columns pass through,
    string columns are one-hot encoded as ``column=value`` features.
    """

    def fit(self, x, y=None):
        return super(PandasVectorizer, self).fit(x.to_dict('records'))

    def fit_transform(self, x, y=None):
        return super(PandasVectorizer, self).fit_transform(x.to_dict('records'))

    def transform(self, x):
        return super(PandasVectorizer, self).transform(x.to_dict('records'))


"""
>>> import pandas as pd
>>> from pandasvectorizer import PandasVectorizer
>>> df = pd.DataFrame({'a': [1,2,3], 'b': ['a', 'b', 'a']})
>>> PandasVectorizer().fit_transform(df).toarray()
array([[ 1.,  1.,  0.],
       [ 2.,  0.,  1.],
       [ 3.,  1.,  0.]])
"""
```