@mkleinbort
Last active June 26, 2018 21:35
GoodToKnows

missingno (msno)

Good for visualizing missing data and exploring patterns in where values are missing.
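A minimal sketch of the kind of question missingno answers, assuming a small made-up DataFrame (the column names and values are illustrative, not from any real dataset). The two pandas helpers are hypothetical names of my own; they compute the numbers that a nullity bar chart and nullity heatmap would display. The plotting function calls `msno.matrix` and `msno.heatmap`, and imports the library lazily so the rest runs without it installed.

```python
import pandas as pd


def nullity_summary(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, most-missing first."""
    return df.isna().mean().sort_values(ascending=False)


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Correlation between the 'is missing' indicators of each column.

    Columns that are never (or always) missing carry no signal, so drop them.
    """
    mask = df.isna().astype(int)
    mask = mask.loc[:, mask.nunique() > 1]
    return mask.corr()


def plot_missingness(df: pd.DataFrame) -> None:
    """Draw the missingno views; requires missingno and matplotlib."""
    import missingno as msno  # lazy import: only needed for plotting

    msno.matrix(df)   # one row per record, gaps where values are missing
    msno.heatmap(df)  # pairwise nullity correlation between columns


# Illustrative data (assumption): 'age' and 'cabin' tend to be missing together.
df = pd.DataFrame({
    "age": [22, None, 35, None, 41],
    "cabin": ["C85", None, "E46", None, None],
    "fare": [7.25, 71.28, 8.05, 53.10, 8.46],
})

summary = nullity_summary(df)
corr = nullity_correlation(df)
```

In a notebook, `plot_missingness(df)` would render the two charts; `summary` and `corr` give the same information as plain numbers.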

pandas_profiling

A one-line tool for exploratory data analysis (EDA) and summary statistics. I was working on a similar library, but this one is clearly of excellent quality.

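import pandas as pd
from pandas_profiling import ProfileReport

# Load the Titanic dataset straight from GitHub
data = pd.read_csv('https://raw.githubusercontent.com/mkleinbort/resource-datasets/master/titanic/titanic.csv')

# In a notebook, this renders the full report inline
ProfileReport(data)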

Dask

Big data with Python: it scales across clusters and uses parallelism and careful memory management to handle datasets larger than RAM. It extends NumPy arrays and pandas DataFrames (among other things). I used it to clean ~300,000 dataframes (190 GB) in ~1.5 h on my laptop.
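A rough sketch of how cleaning many dataframes in parallel might look with `dask.delayed`. This is not the code I actually ran: the `clean` function and the three tiny in-memory frames are stand-ins (assumptions) for a real cleaning step and for frames loaded from disk.

```python
import dask
import pandas as pd


@dask.delayed
def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop rows with missing values."""
    return frame.dropna().reset_index(drop=True)


# Stand-ins for frames that would normally be read from files (assumption).
frames = [
    pd.DataFrame({"x": [1.0, None, 3.0]}),
    pd.DataFrame({"x": [None, 5.0]}),
    pd.DataFrame({"x": [6.0, 7.0]}),
]

# Calling a delayed function only records a task; nothing runs yet.
tasks = [clean(frame) for frame in frames]

# compute() executes the recorded tasks in parallel and returns a tuple.
cleaned = dask.compute(*tasks)

row_counts = [len(frame) for frame in cleaned]
```

With real data, the read step would usually be wrapped in `dask.delayed` as well, so that only the frames currently being processed sit in memory.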

GeoPy

fbProphet

...
