@ellisonbg
Created August 20, 2011 18:11
Running IPython.parallel on Microsoft Azure
============================
Configuring IPython.parallel
============================
This guide describes the steps needed to run IPython engines on Microsoft Azure to perform a parallel
computation in the Microsoft cloud. We assume that Python, IPython and PyZMQ have already been
installed on a set of Azure compute nodes, as described in the guide "Installing Python and IPython
on Microsoft Azure".
1. Install Python, IPython and PyZMQ on a system having open access to the internet. We used an
Ubuntu 11.04 system for this purpose. This system will run the IPython controller and the IPython
engines running on Azure will connect to the controller.
2. Create an IPython profile with the parallel configuration files::

    ipython profile create default --parallel
3. Edit the controller configuration file `ipcontroller_config.py`. The following lines were added::

    c.IPControllerApp.reuse_files = True
    c.IPControllerApp.log_to_file = False
    c.IPControllerApp.location = '[your public IP address]'
    c.HubFactory.control = (10000,10001)
    c.HubFactory.task = (10002,10003)
    c.HubFactory.ip = '*'
    c.HubFactory.mux = (10004,10005)
    c.HubFactory.notifier_port = 10006
    c.HubFactory.regport = 10007
    c.HubFactory.engine_ip = '*'
    c.HubFactory.iopub = (10008,10009)
    c.HubFactory.hb = (10010,10011)
    c.HubFactory.monitor_ip = '*'
4. Open up the firewall on ports 10000-10011.
5. Start the controller::

    ipcontroller
6. Copy the security file `ipcontroller-engine.json` to the Azure compute nodes. This file needs to be
placed in the directory ``D:\Users\username\.ipython\profile_default\security``. For this purpose we
use `hpcpack`, but you can also copy the file by hand.
7. Start the engines using the fabric script included in this gist::

    fab start_ipengines:16

At this point, your Azure-based IPython compute cluster is ready for use.
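Before starting the engines, it can help to confirm that the controller ports opened in step 4 are reachable from outside the controller's network. The following is a minimal sketch and not part of the original setup: `check_ports` is a hypothetical helper, and the port range 10000-10011 is taken from the configuration in step 3. It only tests that a TCP connection can be opened; it does not verify that the controller is actually listening with the expected protocol.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Return a dict mapping each port to True if a TCP connection succeeded."""
    results = {}
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success instead of raising an exception.
            results[port] = sock.connect_ex((host, port)) == 0
        except socket.error:
            # For example, the host name could not be resolved.
            results[port] = False
        finally:
            sock.close()
    return results


# Usage (replace the placeholder with the controller's public IP address):
#     report = check_ports("<controller public IP>", range(10000, 10012))
#     for port in sorted(report):
#         print("%d: %s" % (port, "open" if report[port] else "unreachable"))
```

Running this from a machine outside the controller's network, while `ipcontroller` is running, gives a quick indication of whether the firewall change took effect.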
"""A fabric script for starting IPython engines on Windows HPC Server.

This script assumes the following:

* Python, IPython and PyZMQ are installed on a set of compute nodes. Those
  compute nodes can be either local cluster nodes or Azure nodes.
* Those compute nodes have been joined to a node group called "ipython".
* The ipcontroller-engine.json file has been copied into the security
  directory of the default profile.
* The cluster-wide IPYTHONDIR environment variable has been set to point to
  the location of the .ipython directory on the compute nodes. The syntax for
  this is: "cluscfg setenvs IPYTHONDIR=C:\Users\username\.ipython".
* Python and fabric are installed on the head node where this script is
  located.

To start IPython engines do:

    $ fab start_ipengines:16

To stop IPython engines do:

    $ fab stop_ipengines
"""

import os
import re

from fabric.api import local, abort

node_group = 'ipython'
# Raw string so the backslashes in the Windows path are never read as escapes.
ipengine_path = r'D:\Python27\Scripts\ipengine.exe'


def save_job_id(job_id):
    """Save a job id to disk."""
    with open('job.id', 'w') as f:
        f.write(job_id)


def load_job_id():
    """Load a job id from disk."""
    if not os.path.isfile('job.id'):
        abort("No job running")
    with open('job.id', 'r') as f:
        job_id = f.read()
    return job_id


def new_job():
    """Create a new job and return the job_id."""
    r = local('job new /nodegroup:%s' % node_group, capture=True)
    # The job id is taken to be the first run of digits in the command output.
    m = re.search(r'\d+', r)
    if m is not None:
        job_id = m.group()
    else:
        abort('Job ID could not be parsed')
    save_job_id(job_id)
    print("Job started with ID: %s" % job_id)
    return job_id


def cancel_job():
    """Cancel the running job."""
    job_id = load_job_id()
    local('job cancel %s' % job_id)
    print("Job cancelled: %s" % job_id)


def start_ipengines(n='4'):
    """Start n IPython engines on the nodegroup."""
    job_id = new_job()
    # Fabric passes task arguments as strings, so convert before looping.
    n = int(n)
    for i in range(n):
        # One scheduler task per engine.
        local('job add %s %s --debug' % (job_id, ipengine_path))
    local('job submit /id:%s' % job_id)


def stop_ipengines():
    """Stop the running IPython engines."""
    cancel_job()
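
To preview what `start_ipengines` will hand to the scheduler without touching the cluster, the command strings can be assembled separately. This is only a sketch: `build_engine_commands` is a hypothetical helper that is not part of the script above, and the job ID in the example is made up. The `job add` and `job submit` command forms are copied from the script.

```python
def build_engine_commands(job_id, n, ipengine_path, debug=True):
    """Return the list of scheduler commands that would start n engines."""
    engine_cmd = ipengine_path + (" --debug" if debug else "")
    # One "job add" task per engine, followed by a single "job submit".
    commands = ["job add %s %s" % (job_id, engine_cmd) for _ in range(int(n))]
    commands.append("job submit /id:%s" % job_id)
    return commands


# Example with a made-up job ID; the commands are printed, not executed.
for cmd in build_engine_commands("42", 2, r"D:\Python27\Scripts\ipengine.exe"):
    print(cmd)
```

Keeping the string construction in a pure function like this would also make it possible to check the commands in a unit test on a machine that has neither fabric nor the HPC tools installed.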
================================================
Installing Python and IPython on Microsoft Azure
================================================
Introduction
============
Azure is Microsoft's cloud computing platform. Azure integrates with the Windows HPC Server 2008 R2 SP2 job
scheduler. This allows you to use the Windows HPC job scheduler to schedule jobs on computing nodes running
in Azure. This guide describes the steps needed to install Python and IPython on Azure compute nodes.
The guide assumes the following:
* A Windows HPC Server 2008 R2 SP2 head node has been configured.
* Azure compute nodes have been started and joined to the head node.
* The Azure compute nodes to be used have been joined to a node group called "ipython".
* You can use Remote Desktop Connection to log onto the Azure compute nodes.
Installation
============
We now describe the steps needed to install the following packages on the Azure compute nodes:
* Python
* Distribute
* IPython
* PyZMQ
This same procedure could be used to install other packages as well.
Upload the packages to the compute nodes using Azure storage
------------------------------------------------------------
The simplest way of installing the above packages is to use Remote Desktop Connection to log
onto the compute nodes, download the packages and install them. However, to save time, we will use
the `hpcpack` command to upload the packages to all of the compute nodes in one shot.
1. Download the Python 2.7 Windows installer and `distribute_setup.py` and put them in a local directory
named "python27".
2. Create an Azure package by doing::

    hpcpack create python27.zip .\python27

3. Upload the package to Azure storage by doing::

    hpcpack upload python27.zip /nodetemplate:"Default AzureNode Template" /relativepath:python27

4. Sync the package to the Azure compute nodes::

    clusrun /nodegroup:ipython hpcsync

At this point, the python27 directory should be present on all the compute nodes. This can be tested
by doing::

    clusrun /nodegroup:ipython dir %CCP_PACKAGE_ROOT%\python27
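
A file missing from the package only shows up later as a failed `clusrun` install, so it may be worth checking the local `python27` directory before creating the package. The following is a minimal sketch, assuming the installer is named `python-2.7.2.msi` as in the `msiexec` command used in this guide; `missing_files` and `REQUIRED_FILES` are hypothetical names introduced for illustration.

```python
import os

# File names expected in the package directory. The .msi name is the one used
# by the msiexec command in this guide; adjust it to the installer you downloaded.
REQUIRED_FILES = ["python-2.7.2.msi", "distribute_setup.py"]


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required file names that are not present in directory."""
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]
```

Calling `missing_files("python27")` from the parent directory returns an empty list when both files are in place, and otherwise the names that still need to be downloaded.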
Installation using `clusrun`
----------------------------
With the packages uploaded to the compute nodes, we can now use `clusrun` to install them.
1. Install Python::

    clusrun /nodegroup:ipython msiexec /i %CCP_PACKAGE_ROOT%\python27\python-2.7.2.msi /qn

2. Install Distribute::

    clusrun /nodegroup:ipython D:\Python27\python.exe %CCP_PACKAGE_ROOT%\python27\distribute_setup.py

3. Install PyZMQ::

    clusrun /nodegroup:ipython D:\Python27\Scripts\easy_install.exe pyzmq

4. Install IPython::

    clusrun /nodegroup:ipython D:\Python27\Scripts\easy_install.exe ipython

At this point, Python, PyZMQ and IPython have been installed on the Azure compute nodes. The
best way of confirming that the installation was successful is to log on to the nodes using
Remote Desktop Connection and try running IPython::

    D:\Python27\Scripts\ipython.exe
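
Logging on to each node becomes tedious beyond a handful of nodes. As a complement, a small script could report on each node whether the packages import at all. The sketch below is an illustration rather than part of the original procedure: `check_imports` is a hypothetical helper, and it relies only on `importlib`, which is available in Python 2.7.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: version string or None}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not importable on this node
        else:
            # Not every module defines __version__, so fall back to "unknown".
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


# Note: the PyZMQ package is imported under the name "zmq".
for name, version in sorted(check_imports(["IPython", "zmq"]).items()):
    print("%s: %s" % (name, version if version is not None else "NOT INSTALLED"))
```

If the script were copied to the nodes alongside the other package files under a name of your choosing, it could presumably be run on all nodes with `clusrun` and `D:\Python27\python.exe`, in the same way as `distribute_setup.py`. A successful import does not prove that the engines will connect, but a failed one pinpoints the node and package to fix.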
Manual installation
-------------------
One problem came up when installing packages using `clusrun`: we were unable to install
NumPy in the same manner. It appears that when Python is installed using `clusrun`,
the registry is not properly updated, so the NumPy installer does not see that Python is
installed and refuses to run.
If you need NumPy to be installed, we found that simply logging onto the Azure compute nodes
using Remote Desktop Connection and performing the installation by hand worked fine.
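
One way to test the registry hypothesis on a node is to look for the entry that Windows installers for Python packages conventionally consult. This is a minimal sketch, assuming the standard `SOFTWARE\Python\PythonCore\<version>\InstallPath` key layout; `python_install_path` is a hypothetical helper, and the check ignores the separate 32-bit registry view on 64-bit Windows.

```python
try:
    import winreg  # Python 3
except ImportError:
    try:
        import _winreg as winreg  # Python 2 on Windows
    except ImportError:
        winreg = None  # not running on Windows


def python_install_path(version="2.7"):
    """Return the registered InstallPath for a Python version, or None."""
    if winreg is None:
        return None
    key_path = r"SOFTWARE\Python\PythonCore\%s\InstallPath" % version
    # Python can be registered per machine or per user, so check both hives.
    for hive in (winreg.HKEY_LOCAL_MACHINE, winreg.HKEY_CURRENT_USER):
        try:
            key = winreg.OpenKey(hive, key_path)
        except OSError:
            continue  # key not present in this hive
        try:
            # The default (unnamed) value holds the install directory.
            return winreg.QueryValue(key, None)
        except OSError:
            pass
        finally:
            winreg.CloseKey(key)
    return None
```

If `python_install_path("2.7")` returns `None` on a node where `D:\Python27\python.exe` exists, that would be consistent with the explanation given above, though it would not by itself confirm it.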