José P. González-Brenes josepablog
josepablog / DictVectorizer_Pandas.py
Last active May 30, 2018 11:04
Extracting features from a pandas DataFrame does not work out of the box with DictVectorizer. This is an efficient way to extract your categorical features.
from sklearn.feature_extraction import DictVectorizer
import pandas as pd
df = pd.DataFrame({"user_name": ["a", "b", "c"]})
fe_lm = DictVectorizer()
design_lm = fe_lm.fit_transform(df.to_dict(orient="records"))
# Note: this is *much* faster (60 times) than transposing the DataFrame and then converting it into a dictionary,
# which is the approach described at http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/
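To make the result of the snippet above concrete, here is a minimal sketch that inspects what DictVectorizer produces for the same toy DataFrame. The variable names `records` and `dense` are illustrative assumptions and are not part of the original gist; the sketch only assumes that pandas and scikit-learn are installed.

```python
from sklearn.feature_extraction import DictVectorizer
import pandas as pd

# Same toy frame as in the gist: a single categorical column.
df = pd.DataFrame({"user_name": ["a", "b", "c"]})

# One dict per row, e.g. {"user_name": "a"}; this is the input DictVectorizer expects.
records = df.to_dict(orient="records")

fe_lm = DictVectorizer()
design_lm = fe_lm.fit_transform(records)  # SciPy sparse matrix by default

# String values are one-hot encoded as "<column>=<value>" features.
print(fe_lm.feature_names_)

# Densify only for inspection; keep the sparse form for large data.
dense = design_lm.toarray()
print(dense)
```

Each distinct string in `user_name` becomes its own indicator column, so the three rows here map to a 3x3 matrix with a single 1.0 per row.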
josepablog / to_redshift.py
Last active August 10, 2021 02:22 — forked from TomAugspurger/to_redshift.py
import gzip
from functools import wraps
import boto3
from sqlalchemy import MetaData
from pandas import DataFrame
from pandas.io.sql import SQLTable, pandasSQL_builder
import psycopg2
import codecs
import cStringIO
from io import BytesIO