@minesh1291
Created April 18, 2018 14:54
Setting up Ubuntu 14.04 for Deep Learning, PySpark, and Climate Science

Setting up an Ubuntu 14.04 clean install for Development with PySpark and Deep Learning

  • Assumes NVIDIA GPU
  • Prefers Ubuntu native packages over Docker for simplicity
sudo apt-get update && sudo apt-get -y upgrade

0. Install Java

sudo apt-get -y install python-software-properties
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
sudo apt-get -y install oracle-java8-installer
sudo apt-get install -y oracle-java8-set-default
sudo apt-get install -y build-essential emacs24-nox unzip apt-transport-https ca-certificates git zsh lzop

1. Install Docker

sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo 'deb https://apt.dockerproject.org/repo ubuntu-trusty main' | sudo tee -a /etc/apt/sources.list.d/docker.list
sudo apt-get update && sudo apt-get install -y docker-engine

2. Set up SSH keys

mkdir -p ~/.ssh && chmod 700 ~/.ssh
# This line will generate a new key if you need one. You can also scp or rsync
# a key to this new system remotely.
ssh-keygen -t rsa -b 2048

3. Install CUDA

mkdir -p ~/download
cd ~/download && wget 'http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb'
sudo dpkg -i cuda*1404*deb
sudo apt-get update
# Yes this installs a ton of junk, especially on a server … but it’s not a big deal
sudo apt-get -y install cuda
cd

4. Install Extra Hard Drives if Available

This is specific to my system, but if you have extra hard drives that need to be mounted, it might be helpful. You can get the unique ID (UUID) of a drive by running sudo blkid.

# Specific to my hard drives, but add these lines to /etc/fstab
## Other people probably don’t want to run these lines
# echo 'UUID=59bd0993-48f9-4aea-8f69-b54e9baa4be3   /scratch4   ext4   errors=remount-ro  0  1' | sudo tee -a /etc/fstab
# echo 'UUID=5f292600-db1c-413b-85cb-9468adffa005   /scratch1   ext4   errors=remount-ro  0  1' | sudo tee -a  /etc/fstab
# sudo mkdir -p /scratch4 /scratch1
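The fstab lines above follow one pattern: a UUID from sudo blkid, a mount point, and the same ext4 options. A minimal sketch of generating such a line for your own drive, assuming an ext4 filesystem; the function name fstab_entry is hypothetical, and the UUID and mount point in the example call are just the ones from my system:

```shell
# Hypothetical helper: print an /etc/fstab entry for an ext4 drive.
# Arguments: filesystem UUID (as reported by `sudo blkid`) and mount point.
fstab_entry() {
    local uuid="$1" mount_point="$2"
    printf 'UUID=%s   %s   ext4   errors=remount-ro  0  1\n' "$uuid" "$mount_point"
}

# Preview the line only; nothing is written to /etc/fstab here.
fstab_entry 59bd0993-48f9-4aea-8f69-b54e9baa4be3 /scratch4
```

Once the preview looks right, the output could be piped to sudo tee -a /etc/fstab, followed by sudo mkdir -p on the mount point and sudo mount -a to mount it without rebooting.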

5. Download and activate ZSH (optional)

Some environment variables set in this zshrc are necessary for later.

wget 'https://gist.githubusercontent.com/r-shekhar/ed3e1ba3837bb7d5c7589b25b571af1d/raw/.zshrc'
chsh -s /bin/zsh ${USER}

If you did the above, you can skip this block. These are the environment variables I set, but this snippet might be out of date. Get the latest one from the wget command above.

# Put the CUDA tools and the Anaconda python ahead of the system defaults
export PATH="/usr/local/cuda-7.5/bin:${HOME}/anaconda2/bin:${PATH}"
export LD_LIBRARY_PATH="/usr/local/cuda-7.5/lib64:${LD_LIBRARY_PATH}"

# Derive JAVA_HOME from the java binary on the PATH: resolve symlinks,
# then strip the trailing /bin/java
JAVA_HOME=$(which java)
JAVA_HOME=$(readlink -f "${JAVA_HOME}")
export JAVA_HOME=$(echo "${JAVA_HOME}" | sed -e 's/\/bin\/java//')
export JRE_HOME=${JAVA_HOME}

# Make pyspark start a Jupyter notebook server as its driver
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=9755"
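Since later steps depend on these variables, it can help to confirm they were actually exported after opening a new shell. A rough sketch, not part of the original zshrc; missing_vars is a hypothetical helper that relies on printenv, which only sees exported variables:

```shell
# Hypothetical helper: print the names of variables that are not exported
# or are exported with an empty value.
missing_vars() {
    local name
    for name in "$@"; do
        # printenv prints the value of an exported variable, or nothing if unset.
        [ -n "$(printenv "$name")" ] || echo "$name"
    done
}

# Lists whichever of the variables from this step are missing; prints nothing if all are set.
missing_vars JAVA_HOME JRE_HOME PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS
```

An empty result means all four variables are visible to child processes such as pyspark.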

6. Install Anaconda and Reboot

cd ${HOME}/download && wget http://repo.continuum.io/archive/Anaconda2-4.0.0-Linux-x86_64.sh && bash Anaconda*sh -b

Don't skip the reboot. Without it, CUDA will not be available.

sudo reboot

7. Test CUDA

cuda-install-samples-7.5.sh ${HOME}
cd ${HOME}/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery
make && ./deviceQuery | grep PASS
cd
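If deviceQuery does not print PASS, two quick checks can narrow things down before rebuilding anything. This is a sketch under assumptions: driver_loaded is a hypothetical helper, and it assumes the NVIDIA kernel driver exposes /proc/driver/nvidia/version once it is loaded.

```shell
# Hypothetical pre-check for the CUDA test.
# $1: file the NVIDIA kernel driver exposes once loaded (assumed default below).
driver_loaded() {
    local version_file="${1:-/proc/driver/nvidia/version}"
    [ -r "$version_file" ]
}

if driver_loaded; then
    echo "NVIDIA driver is loaded"
else
    echo "NVIDIA driver not loaded; reboot before testing CUDA"
fi

# nvcc lives in /usr/local/cuda-7.5/bin, so this also checks the PATH setup.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at $(command -v nvcc)"
else
    echo "nvcc not on PATH; check the CUDA entries in your shell profile"
fi
```

A missing driver usually points back to the skipped reboot, while a missing nvcc points to the environment variables from step 5.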

8. Install Python modules and Deep Learning software

cd
pip install theano keras lasagne  # Deep learning libraries
# This installs climate science libraries
conda install -c ioos iris cartopy windspharm pyspharm eofs h5netcdf nco palettable xarray
# Install spark and scala
conda install -c anaconda-cluster scala spark apache-maven
# The conda modules are preferable to outdated ubuntu modules
conda install psycopg2 pymongo

# Install tensorflow (0.8 here, you can use a more recent one if available)
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

9. Set up your .theanorc

Create and edit your ~/.theanorc to contain the following:

[global]
floatX = float32
device = gpu0

[lib]
cnmem = 0
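If you would rather script this than open an editor, the same settings can be written with a here-document. A minimal sketch; write_theanorc is a hypothetical helper, and it overwrites whatever file it is pointed at:

```shell
# Hypothetical helper: write the Theano settings from this step to a file.
# $1: target path, defaulting to ~/.theanorc. An existing file is overwritten.
write_theanorc() {
    local target="${1:-${HOME}/.theanorc}"
    cat > "$target" <<'EOF'
[global]
floatX = float32
device = gpu0

[lib]
cnmem = 0
EOF
}
```

Calling write_theanorc with no argument writes ~/.theanorc; passing a path lets you inspect the result somewhere else first.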

10. Install cuDNN libraries

These libraries must be downloaded manually. Get them from the NVIDIA cuDNN site.

  1. Download the cuDNN v4 and v5 libraries manually

  2. scp them from your download location to the Ubuntu system

  3. Install them.

    sudo dpkg -i libcudnn5*deb
    sudo tar -xvf cudnn-7.0-linux-x64-v4.0-prod.tar -C /usr/local/
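To confirm which cuDNN version ended up installed, the version macros in cudnn.h can be read directly. A sketch under assumptions: cudnn_version is a hypothetical helper, and the default header path assumes the tarball unpacked into /usr/local/cuda, which may differ on your system.

```shell
# Hypothetical helper: read the cuDNN version from a cudnn.h header.
# $1: header path; the default assumes the tarball was unpacked under /usr/local.
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '/^#define[ \t]+CUDNN_MAJOR[ \t]/      { major = $3 }
         /^#define[ \t]+CUDNN_MINOR[ \t]/      { minor = $3 }
         /^#define[ \t]+CUDNN_PATCHLEVEL[ \t]/ { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}
```

Running cudnn_version with no argument checks the assumed default location; pass another path if the header lives elsewhere.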

11. Test Deep learning software

You should be able to start python and import theano, keras, and tensorflow without errors.

$ python
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec  6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import keras
Using Theano backend.
Using gpu device 0: GeForce GTX 950 (CNMeM is disabled, cuDNN 4007)
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
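The same check can be run non-interactively, which is handy after later upgrades. A rough sketch; check_imports is a hypothetical helper, and the PYTHON variable is my own convention for choosing the interpreter (it defaults to whichever python is first on the PATH, which should be the Anaconda one if the environment variables are set):

```shell
# Hypothetical helper: try importing each named module, one interpreter run each.
# PYTHON selects the interpreter (assumed default: the python first on the PATH).
check_imports() {
    local py="${PYTHON:-python}" module status=0
    for module in "$@"; do
        if "$py" -c "import ${module}" >/dev/null 2>&1; then
            echo "ok   ${module}"
        else
            echo "FAIL ${module}"
            status=1
        fi
    done
    return "$status"
}
```

For this setup the call would be check_imports theano keras tensorflow. The function returns non-zero if any import fails, so it can gate a script.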

12. Install postgresql

sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
wget -q https://www.postgresql.org/media/keys/ACCC4CF8.asc -O - | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install postgresql postgresql-contrib postgis postgresql-server-dev-9.5

# Optional: make a postgresql directory on /scratch1
sudo mkdir -p /scratch1/postgresql
# Change the owner to postgres:postgres (the account the Ubuntu packages create)
sudo chown -R postgres:postgres /scratch1/postgresql
# Initialize a clean database; initdb refuses to run as root, so run it as postgres
sudo -u postgres /usr/lib/postgresql/9.5/bin/initdb -D /scratch1/postgresql

# Manually set data_directory to /scratch1/postgresql
sudo emacs -nw /etc/postgresql/9.5/main/postgresql.conf

# Restart postgresql
sudo service postgresql restart
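The manual edit of postgresql.conf can also be done with sed. A minimal sketch, assuming GNU sed and that the file contains a single data_directory line, commented out or not; set_data_directory is a hypothetical helper:

```shell
# Hypothetical helper: point data_directory in a postgresql.conf at a new path.
# $1: config file, $2: new data directory. Relies on GNU sed's -i (in-place) flag.
set_data_directory() {
    local conf="$1" new_dir="$2"
    sed -i -E "s|^#?[[:space:]]*data_directory[[:space:]]*=.*|data_directory = '${new_dir}'|" "$conf"
}
```

The real config file is root-owned, so on the live system the same sed expression would need to run under sudo. Trying it on a copy of the file first is the safer route, since any trailing comment on that line is replaced as well.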

13. Install MongoDB

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install -y mongodb-org

The Hadoop setup below is something I'm currently working on and is not fully functional.

# Optional: Setup Hadoop
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

# Add the following three lines to /etc/sysctl.conf and reboot to disable IPv6
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

wget 'http://apache.osuosl.org/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz'
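The three IPv6 lines above have to be added to /etc/sysctl.conf by hand. A sketch of appending them so that re-running the setup does not duplicate them; append_once is a hypothetical helper and is not specific to sysctl:

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
# $1: target file (created if missing), $2: the line to add.
append_once() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Because /etc/sysctl.conf is root-owned, the function would need to be called from a root shell, once per line, for example append_once /etc/sysctl.conf "net.ipv6.conf.all.disable_ipv6 = 1".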