@persiyanov
Last active October 21, 2021 15:35
How to get an Amazon EC2 instance and do machine learning on it. Jupyter 4.0.6 server and Python 2.7.

Goal

We want to move computation to a more powerful machine. We will set up Anaconda 4.0.0 and XGBoost 0.4 (which is tricky to install).

Preliminaries

Let's start

AWS Console and launching EC2 Instance.

  • Register on https://aws.amazon.com/ and sign in to the AWS Console.
  • Get your $$$ from Amazon AWS Educate and the GitHub Students Pack and activate them in your account settings (top right corner of the AWS Console), on the "Credits" tab.
  • Go to the EC2 tab.
  • Click the Launch Instance button.

Step 1. Choose AMI

Choose Ubuntu Server 14.04 LTS (HVM), SSD Volume Type. Click Next.

Step 2. Choose Instance Type

Now you need to choose an Instance Type. The c3, c4, g2 and r3 families seem good for our tasks. You can compare them on http://www.ec2instances.info/ (don't forget to choose your region; you can see and change it in the top right corner of your AWS Console). We choose c4.4xlarge. As a Spot Instance (see Step 3) it costs ~20-25 cents per hour.

Click Next.

Step 3. Configure Instance

There are two types of Amazon instances: On-Demand and Spot Instances. Read about them here: https://aws.amazon.com/ec2/spot/. In short, Spot Instances are sold through an auction for Amazon EC2 capacity.

To pay less (the ~20-25 cents per hour mentioned above), check Request Spot instances. You will see the current prices for this instance type in your region. To get an instance as quickly as possible, set your maximum price 2-5 cents above the highest of the three prices shown.
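
The bidding rule above is simple arithmetic, sketched here in shell. The three prices and the 3-cent margin are made-up example values, not real quotes; substitute the numbers the console shows for your region.

```shell
# Hypothetical current Spot prices (USD per hour), one per availability zone.
prices="0.1812 0.1935 0.2104"
# Margin above the highest price; the text suggests 2-5 cents.
margin=0.03

# Take the maximum of the listed prices and add the margin.
bid=$(printf '%s\n' $prices | awk -v m="$margin" \
    'BEGIN { max = 0 } { if ($1 > max) max = $1 } END { printf "%.4f", max + m }')

echo "Suggested maximum price: $bid USD/hour"
```

A larger margin makes it less likely that the request waits for capacity, at the cost of a higher ceiling on what you may pay.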

Click Next.

Step 4. Add Storage

  • Amazon provides 30 GB of SSD storage for free, so replace the default 8 GiB with 30 GiB.
  • Uncheck Delete on Termination. This keeps your storage from being deleted when the instance is terminated.

Click Next.

Step 5. Tag Instance

No changes here. Press Next.

Step 6. Configure Security Group

  • Check Create a new security group
  • Keep the SSH rule and add two more rules: (HTTPS/TCP/443/Anywhere) and (Custom TCP Rule/TCP/8888/Anywhere).

Click Review and Launch and then Launch.

Step 7. Creating key pair

Create a new key pair following Amazon's instructions and download it. You need this file to connect to your instance through SSH.

Click Request Spot Instance.

You will see a picture like this (the screenshot is not reproduced here):

Connecting instance and configuring SSH config

Wait until the Instance State is Running, then click the Connect button.

According to the instructions, make the key file readable only by you:

chmod 400 aws_c4_xlarge.pem

You can follow these instructions and connect to your instance over SSH with a command like this:

ssh -i "aws_c4_xlarge.pem" ubuntu@ec2-52-38-217-74.us-west-2.compute.amazonaws.com

but it is handier to connect via

ssh aws

Here is how to do that:

  • Create an SSH config (~/.ssh/config) or edit the existing one, adding these lines:
Host aws
	HostName ec2-52-11-148-133.us-west-2.compute.amazonaws.com 
	User ubuntu
	IdentityFile ~/.ssh/aws_c4_xlarge.pem
  • Indent the nested lines with a TAB.
  • Instead of 'aws' you can choose your own alias.
  • Your HostName will be different; check the instructions shown after clicking the Connect button in your AWS Console.

Reopen your terminal and connect to the instance with ssh aws.
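
If you prefer to script the config edit, the following is a minimal sketch that appends the alias only when it is not already present. The SSH_CONFIG variable and the add_ssh_alias function are names invented for this example; by default the sketch writes to a scratch file, so it can be tried without touching your real config.

```shell
# Scratch file by default; set SSH_CONFIG=~/.ssh/config to edit the real one.
SSH_CONFIG="${SSH_CONFIG:-$(mktemp)}"
ALIAS="aws"
# Replace with the HostName shown for your own instance.
HOST_NAME="ec2-52-11-148-133.us-west-2.compute.amazonaws.com"
KEY_FILE="$HOME/.ssh/aws_c4_xlarge.pem"

add_ssh_alias() {
    # Do nothing if a "Host <alias>" line already exists.
    if grep -qE "^Host[[:space:]]+${ALIAS}\$" "$SSH_CONFIG" 2>/dev/null; then
        echo "alias '$ALIAS' already present in $SSH_CONFIG"
        return 0
    fi
    # Append the block, indenting the nested lines with a TAB.
    printf 'Host %s\n\tHostName %s\n\tUser ubuntu\n\tIdentityFile %s\n' \
        "$ALIAS" "$HOST_NAME" "$KEY_FILE" >> "$SSH_CONFIG"
    # Keep the config private to the current user.
    chmod 600 "$SSH_CONFIG"
}

add_ssh_alias
add_ssh_alias   # the second call changes nothing
```

Running the function twice is harmless, which makes the sketch safe to keep in a setup script.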

Configuring environment

In this section, we will install Anaconda 4.0.0 and XGBoost 0.4 and set up a Jupyter server.

Anaconda

To install Anaconda, execute the following lines:

wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda2-4.0.0-Linux-x86_64.sh
chmod +x Anaconda2-4.0.0-Linux-x86_64.sh
./Anaconda2-4.0.0-Linux-x86_64.sh

Install Anaconda by pressing Enter and typing 'yes' at every prompt. Then reconnect to the instance:

logout
ssh aws

Create a virtual environment with Python 2.7:

conda create --name venv anaconda

To activate the virtual environment, type source activate venv; to deactivate it, type source deactivate.
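
A quick way to confirm that the environment is active is to look at which python the shell resolves. The sketch below assumes the default conda layout, in which an environment named venv lives under <anaconda>/envs/venv; env_of_python is a helper name made up for this example.

```shell
# Classify a python path: inside the conda environment "venv", elsewhere, or absent.
env_of_python() {
    case "$1" in
        */envs/venv/bin/python*) echo "venv" ;;
        "")                      echo "none" ;;
        *)                       echo "other" ;;
    esac
}

# Resolve the interpreter the current shell would run (empty if there is none).
current_python=$(command -v python 2>/dev/null || true)
echo "active interpreter: ${current_python:-not found} ($(env_of_python "$current_python"))"
```

If the output says "other" after source activate venv, the activation did not take effect in this shell.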

XGBoost

Activate your virtual environment.

Install XGBoost:

sudo apt-get install git make g++ python-setuptools
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
make -j4
cd python-package
sudo python setup.py install

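The last command can raise the exception ImportError: No module named numpy.distutils.core (likely because sudo runs the system Python rather than the one from the virtual environment). Regardless, XGBoost is correctly installed. The only thing we need to do is add the package to PYTHONPATH: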

# Append (>>) instead of overwriting (>) so that any existing ~/.bash_profile content is kept
echo "export PYTHONPATH=~/xgboost/python-package" >> ~/.bash_profile
source ~/.bash_profile

That's all! Try importing it.
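
The check can be scripted as follows. This is only a sketch: on_path_list is a helper name made up for this example, and the directory is the one used in the PYTHONPATH step above.

```shell
# Directory that the previous step adds to PYTHONPATH.
XGB_DIR="$HOME/xgboost/python-package"

# Succeeds when directory $1 is one of the colon-separated entries of list $2.
on_path_list() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if on_path_list "$XGB_DIR" "${PYTHONPATH:-}"; then
    echo "PYTHONPATH contains $XGB_DIR"
else
    echo "PYTHONPATH does not contain $XGB_DIR; run: source ~/.bash_profile"
fi

# Try the import only when an interpreter and the package directory are present.
if command -v python >/dev/null 2>&1 && [ -d "$XGB_DIR" ]; then
    python -c 'import xgboost' && echo "import xgboost: OK" || echo "import xgboost: FAILED"
fi
```

Run it inside the activated virtual environment so that the same interpreter Jupyter will use is the one being tested.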

Jupyter

You need to generate a Jupyter config to start a remote server. The simplest way is the following:

cd ~
wget https://raw.githubusercontent.com/persiyanov/ml-mipt/master/amazon-howto/jupyter_notebook_ec2.sh
chmod +x jupyter_notebook_ec2.sh
./jupyter_notebook_ec2.sh

Enter the password you want to use when connecting to Jupyter through the browser, and repeat it. Then press Enter several times. This is my log:

(venv)ubuntu@ip-172-31-12-235:~$ chmod +x jupyter_notebook_ec2.sh 
(venv)ubuntu@ip-172-31-12-235:~$ ./jupyter_notebook_ec2.sh 
Writing default config to: /home/ubuntu/.jupyter/jupyter_notebook_config.py
Enter password: 
Verify password: 
Generating a 1024 bit RSA private key
...........................................................................................++++++
...........++++++
writing new private key to 'mycert.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:        
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgits Pty Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:
Email Address []:
(venv)ubuntu@ip-172-31-12-235:~$ 
(venv)ubuntu@ip-172-31-12-235:~$ ls
anaconda2  Anaconda2-4.0.0-Linux-x86_64.sh  certs  jupyter_notebook_ec2.sh  xgboost
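
The contents of jupyter_notebook_ec2.sh are not shown here, but the log suggests that it creates a self-signed certificate with openssl. The following is a rough sketch of an equivalent step, not the script itself. The file names mycert.key and mycert.pem come from the log; the 2048-bit key size, the 365-day validity, the /CN=localhost subject and the scratch directory are assumptions made for this example.

```shell
# Scratch directory by default; on the instance the log shows ~/certs being used.
CERT_DIR="${CERT_DIR:-$(mktemp -d)}"

if command -v openssl >/dev/null 2>&1; then
    # Create an unencrypted private key and a self-signed certificate in one step.
    # -subj skips the interactive Distinguished Name questions seen in the log.
    openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
        -subj "/CN=localhost" \
        -keyout "$CERT_DIR/mycert.key" -out "$CERT_DIR/mycert.pem" 2>/dev/null
    # The private key should be readable by its owner only.
    chmod 600 "$CERT_DIR/mycert.key"
    # Print the subject and expiry date to confirm the certificate is readable.
    openssl x509 -in "$CERT_DIR/mycert.pem" -noout -subject -enddate
else
    echo "openssl is not installed"
fi
```

A certificate made this way is not signed by a trusted authority, so browsers will warn about it on first connection.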

To start the Jupyter server, run the command jupyter notebook --certfile=~/certs/mycert.pem --keyfile ~/certs/mycert.key, or download my bash script, which executes this line:

wget https://raw.githubusercontent.com/persiyanov/ml-mipt/master/amazon-howto/start-jupyter.sh
chmod +x start-jupyter.sh
./start-jupyter.sh

Now the server has started. Try to connect to it via HTTPS: type https://<hostname>:8888 or https://<public_ip>:8888 in your browser. The hostname is the same kind of address we used when writing the SSH config. You can always find your HostName and Public IP in the AWS Console by clicking on your instance. In my case: https://ec2-52-38-217-74.us-west-2.compute.amazonaws.com:8888. Because the certificate is self-signed, the browser will show a security warning that you need to accept.

Now you can connect to Jupyter and run your notebooks on the EC2 instance! But that's not the end. We want to make working with the instance more comfortable. The next two sections are about that.

tmux

We want to be able to turn off our computer or disconnect from the Internet while our models keep computing on the EC2 instance. As it stands, we will lose our SSH session, and the processes started in it, if either of these happens. To solve this problem we use tmux:

# 'jupyter' is an arbitrary session name (tmux may reject names containing '.' or ':')
tmux new -s jupyter
./start-jupyter.sh

We have just started the Jupyter server in a tmux session. Now we can close this SSH connection and all processes will keep running.
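
Inside tmux, pressing Ctrl-b and then d detaches from the session without stopping it. The same workflow can also be scripted without an interactive session, as in this minimal sketch; the session name jupyter is a hypothetical choice, and the script path assumes start-jupyter.sh was downloaded to the home directory as described above.

```shell
SESSION="jupyter"                       # hypothetical session name
START_SCRIPT="$HOME/start-jupyter.sh"   # script downloaded in the Jupyter section

if command -v tmux >/dev/null 2>&1 && [ -x "$START_SCRIPT" ]; then
    # Create the session in the background unless it already exists.
    tmux has-session -t "$SESSION" 2>/dev/null \
        || tmux new-session -d -s "$SESSION" "$START_SCRIPT"
    # Show the running sessions as confirmation.
    tmux list-sessions
    launched=yes
else
    echo "tmux or $START_SCRIPT is missing; nothing started"
    launched=no
fi

# Later, from a new SSH connection, reattach with: tmux attach -t jupyter
```

Starting the session detached means the server keeps running even if the SSH connection drops right away.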

Amazon AMI

We don't want to set up Anaconda, XGBoost, the SSH config and everything else each time we start a new instance. We want to preserve this state. For this purpose, we use an Amazon AMI.

In your AWS Console, on the Instances tab, select your instance and click Actions -> Image -> Create Image. Name your image and click Create Image.

The next time you want to start an instance, select the My AMIs tab at Step 1 and choose your AMI.
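
The same image can be created from a terminal with the AWS CLI, which this how-to does not otherwise use, so treat the following as an optional sketch. It assumes the CLI is installed and configured with your credentials; INSTANCE_ID must be set to your own instance ID, and the image name is a made-up example.

```shell
# Instance ID copied from the Instances tab; left empty here on purpose.
INSTANCE_ID="${INSTANCE_ID:-}"
# Hypothetical image name with today's date appended.
IMAGE_NAME="ml-anaconda-xgboost-$(date +%Y%m%d)"

if [ -n "$INSTANCE_ID" ] && command -v aws >/dev/null 2>&1; then
    # Ask EC2 to create an AMI from the instance.
    aws ec2 create-image --instance-id "$INSTANCE_ID" --name "$IMAGE_NAME"
else
    echo "Set INSTANCE_ID and install the AWS CLI to create image '$IMAGE_NAME'"
fi
```

The command prints the ID of the new image, which then appears under My AMIs.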

@rosscleung

THANK YOU FOR THIS TUTORIAL!

I've been scratching my head for days trying to get XGBoost onto my Ec2 instance. I've followed your syntax to the dot and I got XGboost installed. But when I tried to import xgboost in my Jupyter notebook, I get the following error. I'm a Linux newbie so I have absolutely no idea what to do...Any help will be appreciated!

OSError Traceback (most recent call last)
in ()
1 import pandas as pd
----> 2 import xgboost

/home/ubuntu/xgboost/python-package/xgboost/__init__.py in ()
9 import os
10
---> 11 from .core import DMatrix, Booster
12 from .training import train, cv
13 from . import rabit # noqa

/home/ubuntu/xgboost/python-package/xgboost/core.py in ()
110
111 # load the XGBoost library globally
--> 112 _LIB = _load_lib()
113
114

/home/ubuntu/xgboost/python-package/xgboost/core.py in _load_lib()
104 if len(lib_path) == 0:
105 return None
--> 106 lib = ctypes.cdll.LoadLibrary(lib_path[0])
107 lib.XGBGetLastError.restype = ctypes.c_char_p
108 return lib

/home/ubuntu/anaconda2/lib/python2.7/ctypes/__init__.pyc in LoadLibrary(self, name)
438
439 def LoadLibrary(self, name):
--> 440 return self._dlltype(name)
441
442 cdll = LibraryLoader(CDLL)

/home/ubuntu/anaconda2/lib/python2.7/ctypes/__init__.pyc in __init__(self, name, mode, handle, use_errno, use_last_error)
360
361 if handle is None:
--> 362 self._handle = _dlopen(self._name, mode)
363 else:
364 self._handle = handle

OSError: /home/ubuntu/anaconda2/lib/python2.7/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/ubuntu/xgboost/python-package/xgboost/../../lib/libxgboost.so)

@fx86

fx86 commented Jul 23, 2017

@rosscleung try these - https://askubuntu.com/questions/575505/glibcxx-3-4-20-not-found-how-to-fix-this-error

What worked for me is the second answer : conda install libgcc

@akosturos

I am leaving my terminal window and I am noticing that my jupyter notebook stops running.... I am running python 3.5, could that be why?

@sumit-t

sumit-t commented Nov 1, 2017

Nice tutorial. You can also use AWS Deep Learning AMI that are available for free. They have anaconda, jupyter and popular deep learning frameworks pre installed on them - https://aws.amazon.com/blogs/ai/announcing-new-aws-deep-learning-ami-for-amazon-ec2-p3-instances/
