@karenyyng
Last active April 7, 2016 03:28
Incomplete list of cool Python data-related projects

Personal list of cool Python-related projects to play with. A project counts as long as it has a Python API, regardless of whether the underlying code is written in another language.

Data analytics (pipelined) frameworks

  • PySpark
  • GraphLab Create
  • scikit-learn
  • TensorFlow
  • Cloud Dataflow

Data wrangling

  • Pandas

Specific machine learning / statistics libraries

Neural networks / Deep learning

Topic modeling

  • gensim (its word2vec can be used without first stemming, removing stop words, etc.)

Image processing

Regression

  • XGBoost - gradient-boosted trees
  • Statsmodels
  • PyMC - MCMC sampling for hierarchical models, specified as probabilistic graphical models
  • emcee - affine-invariant ensemble MCMC sampler
  • GPy - Gaussian processes
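
PyMC and emcee both draw posterior samples with Markov chain Monte Carlo (MCMC). To make that idea concrete, here is a minimal random-walk Metropolis sampler in plain Python, targeting a standard normal. It is an illustration only and not how either library works internally (emcee, for example, moves an ensemble of walkers with affine-invariant proposals); `metropolis` and `log_target` are hypothetical names chosen for this sketch.

```python
from __future__ import annotations

import math
import random
from typing import Callable


def log_target(x: float) -> float:
    """Hypothetical target: unnormalized log-density of a standard normal."""
    return -0.5 * x * x


def metropolis(
    log_p: Callable[[float], float],
    x0: float,
    n_steps: int,
    step: float = 1.0,
    seed: int = 0,
) -> list[float]:
    """Random-walk Metropolis: propose x' = x + Normal(0, step**2) and
    accept with probability min(1, p(x') / p(x)); otherwise stay at x."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples: list[float] = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_p(proposal)
        # Compare in log space; 1 - random() lies in (0, 1], so log() is safe.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


if __name__ == "__main__":
    draws = metropolis(log_target, x0=0.0, n_steps=20_000)
    kept = draws[2_000:]  # discard burn-in
    mean = sum(kept) / len(kept)
    var = sum((d - mean) ** 2 for d in kept) / len(kept)
    print(f"mean={mean:.3f} var={var:.3f}")  # should land near 0 and 1
```

Only the unnormalized log-density is needed, because the normalizing constant cancels in the acceptance ratio; that is the property that makes MCMC practical for Bayesian posteriors.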

Bayesian hyperparameter tuning

  • Spearmint - also see the NIPS paper referenced in the repository

Data visualization / reports

  • IPython notebook
  • Matplotlib
  • Seaborn
  • Plotly - can be interactive
  • Bokeh - interactive, browser-based plots (rendered by its own BokehJS library, in the style of D3.js)

Data file formats

  • h5py
  • Feather - for exchanging dataframes between R and Python
  • Parquet (currently via PySpark / Py4J)

Parallel processing

  • Dask
  • Joblib
  • PySpark
  • mpi4py
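
Joblib, Dask and PySpark each offer some way to map a function over independent inputs in parallel (mpi4py instead exposes explicit MPI message passing). The sketch below shows that fan-out/gather pattern using only the standard library's `concurrent.futures`; it is a stand-in, not the API of any project listed here, and `parallel_map` / `expensive_step` are hypothetical names. For comparison, joblib's documented idiom for the same job is `Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))`.

```python
import math
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Iterable, List, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def expensive_step(x: float) -> float:
    """Hypothetical stand-in for a CPU-heavy, independent per-item computation."""
    return math.sqrt(x)


def parallel_map(
    func: Callable[[T], R],
    items: Iterable[T],
    max_workers: int = 2,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> List[R]:
    """Fan items out to a worker pool and gather results in input order.

    Processes sidestep the GIL for CPU-bound work; pass ThreadPoolExecutor
    instead for I/O-bound tasks. With processes, `func` must be picklable
    (a module-level function, not a lambda).
    """
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `items`, not completion order.
        return list(pool.map(func, items))


if __name__ == "__main__":
    # Same computation as the joblib idiom above: sqrt(i**2) for i in range(10).
    print(parallel_map(expensive_step, [i ** 2 for i in range(10)]))
    # prints [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with `spawn`, where each worker re-imports the main module; without it, every worker would try to launch its own pool.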

Compilation technology

Other references