@hdemers
Last active June 1, 2020 07:17
Installation instructions for doing data science in a Python environment on Ubuntu. We'll install base packages like numpy, scipy, scikit-learn and pandas. We also install the IPython Notebook interactive environment. This is a best practice recommendation for doing research-type work. We make use of virtualenvwrapper, but don't show how to inst…
mkvirtualenv datascience
sudo apt-get install python-scipy libblas-dev liblapack-dev gfortran
sudo apt-get install libffi-dev # for cryptography from scrapy
sudo apt-get install libxslt-dev # for libxml from scrapy
export BLAS=/usr/lib/libblas.so
export LAPACK=/usr/lib/liblapack.so
pip install numpy
pip install scipy
pip install scikit-learn
pip install pandas
pip install patsy
pip install statsmodels
pip install ipython tornado pyzmq
pip install networkx
pip install gensim
pip install scrapy
pip install numexpr bottleneck
pip install sqlalchemy
pip install nltk
pip install seaborn
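
The two `export` lines in the script only last for the current shell session. They are there so that a from-source build of numpy or scipy can locate the BLAS and LAPACK libraries. To keep them across logins, they can be appended to a shell startup file. The following is a minimal sketch, not part of the original gist: the helper names `append_once` and `persist_build_vars` are hypothetical, and the choice of `~/.profile` as the target is an assumption (use whichever startup file your shell actually reads).

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the setup adds no duplicates.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical wrapper: persist the two exports from the script above
# into the startup file given as the first argument.
persist_build_vars() {
    target="$1"
    append_once 'export BLAS=/usr/lib/libblas.so' "$target"
    append_once 'export LAPACK=/usr/lib/liblapack.so' "$target"
}

# Example call (assumes ~/.profile is read by your login shell):
# persist_build_vars "$HOME/.profile"
```

After running `persist_build_vars "$HOME/.profile"`, a new login shell should have both variables set; `echo "$BLAS"` is a quick way to check.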
@startakovsky

Do the export statements stick after reboot? Do they need to be added to a .profile? What are they for?

@startakovsky

Also, it seems like numpy and scipy install automatically with sudo apt-get install python-scipy.

@alvations

@startakovsky, both commands install scipy, but they're somewhat different. Using sudo apt-get install python-scipy gives you the most stable version that the OS supports, whereas pip install scipy installs the latest version that scipy has pushed to the PyPI index.

@MachineTruth

Thank you
