Simple parallelization using mpi4py

For basic applications, MPI is as easy to use as any other message-passing system. The sample scripts below contain the complete communication skeleton for a data-parallel (or embarrassingly parallel) problem using the mpi4py package.

Within the code is a description of the few functions needed to write typical parallel applications.

- Parallel application with simple partitioning: unbalanced load.
- Parallel application with master/slave scheme: dynamically balanced load.
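The simple-partitioning scheme can be sketched roughly as follows. This is a minimal illustration, not the sample scripts themselves. The helper names `work`, `my_share` and `run`, the round-robin split, and the serial fallback when mpi4py cannot be imported are all assumptions made for this example.

```python
# Minimal sketch of static (simple) partitioning with mpi4py.
# Assumption: each rank takes every size-th item (round-robin split).
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption: fall back to a single serial process without mpi4py.
    comm, rank, size = None, 0, 1


def work(item):
    """Hypothetical stand-in for the real per-item computation."""
    return item * item


def my_share(items, rank, size):
    """Return the items assigned to this rank (round-robin)."""
    return items[rank::size]


def run(items):
    """Process this rank's share and collect all results on rank 0."""
    partial = [work(x) for x in my_share(items, rank, size)]
    if comm is None:
        return partial
    # gather() returns a list of per-rank lists on root, None elsewhere.
    gathered = comm.gather(partial, root=0)
    if rank == 0:
        return [y for chunk in gathered for y in chunk]
    return None


if __name__ == "__main__":
    result = run(list(range(10)))
    if rank == 0:
        print(sorted(result))
```

With an MPI installation this would typically be launched as `mpiexec -n 4 python script.py` (the script name is a placeholder). Because the split is fixed up front, a rank that happens to receive the expensive items finishes last, which is the unbalanced load that the master/slave scheme is meant to address.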


Install Apache Spark (OSX 10.6)

You need the package manager Homebrew (if not installed), see

ruby -e "$(curl -fsSL"

Get Spark 'Source Code':

Software install on Mac OS X

Install Fonts for Matplotlib (Helvetica)

Download a font converter (no need to install, unzip and launch the GUI):

Working on remote machines

Some (trivial?) commands to work on a remote machine using the SSH, FTP and HTTP protocols.

SSH certificate

Generate SSH keys to use an SSH connection without entering the password every time you log in.

  1. Generate the key (local machine):

Grid computing with Apple's Xgrid

How to set up a grid of computers using the Mac OS X desktop version.

The Xgrid agent (worker)

Any machine can be an agent. Configure the agent using the Sharing Pane of System Preferences. See Managing Xgrid Agents.


Command line args in Fortran 90/95

Although I'm not a big fan of Fortran, here is a simple example of parsing Unix-style command-line arguments in a program that follows the modern Fortran 2003 standard. Note that the example also works for any F90/F95 code compiled with gfortran or g95.


$ ./thisprog -h 

usage: ./thisprog [-h] [-a ARG_A] [-b ARG_B] [-c]
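For comparison only, the same command-line interface can be sketched in Python with the standard `argparse` module. This is not the Fortran example. The destination names `arg_a`, `arg_b` and `flag_c`, the string-valued options, and the help texts are assumptions chosen so that the generated usage line has the same shape as the one shown above.

```python
import argparse


def build_parser():
    """Build a parser for: ./thisprog [-h] [-a ARG_A] [-b ARG_B] [-c]."""
    parser = argparse.ArgumentParser(prog="./thisprog")
    # Assumption: -a and -b take one string value each.
    parser.add_argument("-a", dest="arg_a", metavar="ARG_A",
                        help="value for option a (hypothetical)")
    parser.add_argument("-b", dest="arg_b", metavar="ARG_B",
                        help="value for option b (hypothetical)")
    # Assumption: -c is a simple on/off switch.
    parser.add_argument("-c", dest="flag_c", action="store_true",
                        help="enable option c (hypothetical)")
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(["-a", "1", "-c"])
print(args.arg_a, args.arg_b, args.flag_c)
```

The `-h` option is added automatically by `argparse`, so it needs no explicit declaration here.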

When to use a specific tool

Brief list describing scenarios calling for specific tools.

If you have (larger-than-memory) petabytes of JSON/XML/CSV files, a simple workflow, and a thousand-node cluster

If you have (larger-than-memory) tens to thousands of gigabytes of binary or numeric data (e.g., HDF5, netCDF4, CSV.gz), complex algorithms, and a (single) large multi-core workstation