DnanaDev /
Last active Sep 18, 2020
[ML- Tree based Models and Categorical data]

One-hot encoded categorical data does not work well with sklearn's RF and XGBoost.

There seem to be differing opinions about using one-hot encoded categorical features with implementations that don't natively support categorical data. Try CatBoost or H2O Random Forest, which support categorical data by design. Also, investigate why one-hot encoding is not recommended for features with high cardinality; it appears to be related to the very sparse features it creates.
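To make the sparsity point concrete, here is a minimal pure-Python sketch (not taken from any of the libraries above). The helper names `one_hot` and `zero_fraction` and the sample data are made up for illustration. Each encoded row holds a single 1 among k columns, so the share of zeros is 1 - 1/k and grows with cardinality.

```python
def one_hot(values):
    """One-hot encode a list of category labels (illustrative helper).

    Returns (categories, rows); each row has one column per distinct category.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one active column per row
        rows.append(row)
    return categories, rows


def zero_fraction(rows):
    """Fraction of entries in the encoded matrix that are zero."""
    total = sum(len(r) for r in rows)
    ones = sum(sum(r) for r in rows)
    return (total - ones) / total


# Made-up data: a 3-level feature and a 100-level feature.
low_card = ["red", "green", "blue"] * 100
high_card = [f"city_{i % 100}" for i in range(300)]

for name, values in [("low", low_card), ("high", high_card)]:
    categories, rows = one_hot(values)
    print(name, len(categories), "columns, zero fraction:", zero_fraction(rows))
```

With many levels, a tree has to split on one nearly-all-zero column at a time, which is one plausible reason the encoding is discouraged for high-cardinality features.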

For reference : \

DnanaDev /
Last active Jul 26, 2020
SQLite Data Ingestion script
""" Data Ingestion for Covid19 Data Pipeline
Using SQLAlchemy engine to interface to PostgreSQL Database.
Functions to create DB according to schema and for ingesting data.
The use case is to run the script and automatically update CSVs in Data/Raw and to
store the cleaned data in the database. Backup of the database is stored in Data/cleaned.
# Data Ingestion Functions
1. add_data_table(engine, tablename, df)
Uses Pandas dataframe from Covid19_india_org_api to append data to table using SQLAlchemy and DF.to_sql()