@josepablog
Last active May 30, 2018 11:04
DictVectorizer does not accept a pandas DataFrame out of the box. This is an efficient way to extract your categorical features: convert the DataFrame to a list of per-row dicts first.
from sklearn.feature_extraction import DictVectorizer
import pandas as pd
df = pd.DataFrame({"user_name": ["a", "b", "c"]})
fe_lm = DictVectorizer()
design_lm = fe_lm.fit_transform(df.to_dict(orient="records"))
# Note: this is *much* faster (60 times) than transposing the DataFrame and converting it into a dictionary,
# which is the approach used in http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/
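To make the result concrete, here is a minimal sketch of what the snippet above produces and how the fitted vectorizer could be reused. It assumes the default `DictVectorizer` settings (sparse output, sorted features). The `new_df` frame and the variable names `records`, `records_via_transpose`, `dense`, and `new_design` are hypothetical, added only for illustration. No timing is measured here; the sketch only checks that both conversion routes yield the same records for this small frame.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

df = pd.DataFrame({"user_name": ["a", "b", "c"]})

# One dict per row: [{"user_name": "a"}, {"user_name": "b"}, {"user_name": "c"}]
records = df.to_dict(orient="records")

# The slower route mentioned in the note: transpose, then build a dict keyed by row index.
records_via_transpose = list(df.T.to_dict().values())
same_records = records == records_via_transpose

fe_lm = DictVectorizer()
design_lm = fe_lm.fit_transform(records)  # SciPy sparse matrix by default

# String-valued features are one-hot encoded as "<column>=<value>".
feature_names = fe_lm.feature_names_
dense = design_lm.toarray()

# Hypothetical new data: "z" was not seen during fit, so its row is all zeros.
new_df = pd.DataFrame({"user_name": ["b", "z"]})
new_design = fe_lm.transform(new_df.to_dict(orient="records")).toarray()
```

Categories that were not present at fit time are silently ignored by `transform`, so it may be worth checking for all-zero rows when scoring new data.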