@KWMalik
Created October 12, 2012 21:56
Python packages for Data Analysis

Python packages for Data Analysis, Natural Language Processing, Machine Learning and Statistics

Forever a work in progress.

I recommend installing these with pip, the Python package manager.

Keep in mind that pip only resolves dependencies that are themselves Python packages; any other dependencies you have to resolve yourself. If you cannot do that, install the packages from your distro's package manager instead.

  • rpy2 - Interface Python with R
  • scikit-learn - Machine Learning
  • numpy - Multi-dimensional arrays and matrices, plus functions that operate on them
  • scipy - Scientific Tools
  • pymc - Bayesian statistical models, Markov Chain Monte Carlo sampling
  • matplotlib - 2D Plotting
  • pandas - Data Analysis Toolkit
  • pyqt/pygtk - GUIs
  • nltk - Natural Language Toolkit
  • psycopg2 - PostgreSQL database adapter
  • ipython - An enhanced Python interpreter
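
To see which of the packages above are already available before reaching for pip, something like the following can help. It is only a sketch: the mapping from package name to import name is my assumption (for example, scikit-learn is imported as `sklearn` and ipython as `IPython`), the helper name `missing_packages` is made up for illustration, and pyqt/pygtk are left out because their import names vary by version.

```python
import importlib.util

# Package name (as installed) -> module name (as imported).
# This mapping is an assumption; adjust it to the versions you use.
IMPORT_NAMES = {
    "rpy2": "rpy2",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "scipy": "scipy",
    "pymc": "pymc",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "nltk": "nltk",
    "psycopg2": "psycopg2",
    "ipython": "IPython",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the sorted package names whose module cannot be found."""
    return sorted(
        package
        for package, module in import_names.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    )


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Anything reported as missing can then be installed with pip or your distro's package manager, as described above.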

Install the following two from your distribution's package manager instead: if you install python-tables or h5py with pip, they won't pick up the path to your HDF5 installation.

  • python-tables - Huge Dataset Processing
  • h5py - HDF5 Interface
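
After installing, it can be worth checking which HDF5 library each package actually found. The sketch below assumes that python-tables is imported as `tables` and that the version strings live at `tables.hdf5_version` and `h5py.version.hdf5_version`; the helper `hdf5_version_of` is hypothetical, and it returns `None` if a package or attribute is not there.

```python
import importlib


def hdf5_version_of(module_name, attr_path):
    """Return the HDF5 version a module reports, or None if unavailable."""
    try:
        obj = importlib.import_module(module_name)
    except Exception:
        # Not installed, or its HDF5 library could not be loaded.
        return None
    # Walk a dotted attribute path such as "version.hdf5_version".
    for name in attr_path.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return str(obj)


# Attribute paths are assumptions about where each package keeps the version.
for module_name, attr_path in [
    ("tables", "hdf5_version"),
    ("h5py", "version.hdf5_version"),
]:
    print(module_name, "->", hdf5_version_of(module_name, attr_path))
```

If the two packages report different HDF5 versions, or one of them reports `None`, that is a hint the installation did not find the HDF5 library you intended.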

Need better performance?

  • Rewrite parts of your code in C
  • Use Cython to generate efficient C code by adding static type declarations to your Python code - Cython also supports parallelism through OpenMP
  • Use PyPy instead of CPython - PyPy is often a faster implementation of the Python language, but NumPy is not yet fully supported
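
Before rewriting anything, it helps to measure. The following is a minimal sketch, using only the standard library, that reports which interpreter is running and times one pure-Python function; running the same script under CPython and PyPy gives a rough comparison. The function `sum_of_squares` and the helper `benchmark` are placeholders for your own hot spot, not part of any package listed here.

```python
import platform
import timeit


def sum_of_squares(n):
    """A deliberately loop-heavy function standing in for a hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def benchmark(func, arg, repeat=3, number=5):
    """Return the best observed time in seconds for a single call."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    # The minimum is the measurement least disturbed by other processes.
    return min(timings) / number


implementation = platform.python_implementation()  # e.g. "CPython" or "PyPy"
seconds = benchmark(sum_of_squares, 100_000)
print(f"{implementation}: {seconds:.6f} s per call")
```

A tight loop like this is also the kind of code that tends to benefit most from Cython's static typing or a rewrite in C, so the same measurement can serve as a baseline for those options.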