parallelization.md

Simple parallelization using mpi4py

For basic applications, MPI is as easy to use as any other message-passing system. The sample scripts below contain the complete communications skeleton for a data-parallel (or embarrassingly parallel) problem using the mpi4py package.

Within the code is a description of the few functions needed to write typical parallel applications.

mpi-submit.py - Parallel application with simple partitioning: unbalanced load.

mpi-submit2.py - Parallel application with master/slave scheme: dynamically balanced load.
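
The scripts themselves are not reproduced here, but the simple-partitioning variant boils down to a few calls. The following is a minimal sketch (illustrative only, not the actual mpi-submit.py), assuming mpi4py is installed and the script is launched with mpirun:

    # Minimal sketch of the simple-partitioning (unbalanced) scheme;
    # illustrative only, not the actual mpi-submit.py.
    # Run with, e.g.:  mpirun -n 4 python sketch.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()        # id of this process
    size = comm.Get_size()        # total number of processes

    data = list(range(100))       # full workload, known to every process
    my_chunk = data[rank::size]   # static partitioning: each rank takes a stride

    partial = sum(x * x for x in my_chunk)   # work on the local chunk only

    # collect the partial results on rank 0 and combine them
    results = comm.gather(partial, root=0)
    if rank == 0:
        print("total:", sum(results))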

install_spark.md

Install Apache Spark (OSX 10.6)

You need the Homebrew package manager. If it is not installed, see http://brew.sh/ or install it with:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Get Spark 'Source Code':

http://spark.apache.org/downloads.html
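
Once Spark is unpacked and built, a quick sanity check from Python looks like this (a minimal sketch, assuming the pyspark package that ships with Spark is importable, e.g. via SPARK_HOME/python on the PYTHONPATH):

    # Minimal sanity check; assumes pyspark is importable.
    from pyspark import SparkContext

    sc = SparkContext("local", "install-test")   # run Spark in local mode
    rdd = sc.parallelize(range(1000))            # distribute a small dataset
    print(rdd.map(lambda x: x * x).sum())        # should print 332833500
    sc.stop()
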
macosx.md

Software install on Mac OS X

Install Fonts for Matplotlib (Helvetica)

Download a font converter (no need to install it; just unzip and launch the GUI):

http://peter.upfold.org.uk/projects/dfontsplitter
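
Once the converted .ttf files are visible to matplotlib (e.g. copied into its fonts/ttf directory or installed system-wide, with the font cache rebuilt), selecting Helvetica is just an rcParams setting; a minimal sketch:

    # Minimal sketch; assumes the converted Helvetica .ttf is already
    # registered with matplotlib's font manager.
    import matplotlib
    matplotlib.rcParams["font.family"] = "sans-serif"
    matplotlib.rcParams["font.sans-serif"] = ["Helvetica"]

    import matplotlib.pyplot as plt
    plt.plot([0, 1, 2], [0, 1, 4])
    plt.title("Helvetica test")
    plt.savefig("helvetica_test.png")
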
remote.md

Working on remote machines

Some (trivial?) commands to work on a remote machine using the SSH, FTP and HTTP protocols.

SSH certificate

Generate SSH keys so that you can use an SSH connection without having to enter your password every time you log in.

  1. Generate the key (local machine):
xgrid.md

Grid computing with Apple's Xgrid

How to set up a grid of computers using the Mac OS X desktop version.

The Xgrid agent (worker)

Any machine can be an agent. Configure the agent using the Sharing Pane of System Preferences. See Managing Xgrid Agents.

argsf90.md

Command line args in Fortran 90/95

Although I'm not a big fan of Fortran, here is a simple example of parsing Unix-like command-line arguments in a program, following the modern F2003 standard. The example also works in any F90/F95 code compiled with gfortran or g95.

Example:

$ ./thisprog -h 

usage: ./thisprog [-h] [-a ARG_A] [-b ARG_B] [-c]
when_to_use.md

When to use a specific tool

A brief list of scenarios that call for a specific tool.

Spark
If you have (larger-than-memory) petabytes of JSON/XML/CSV files, a simple workflow, and a thousand-node cluster

Dask
If you have (larger-than-memory) 10s-1000s of gigabytes of binary or numeric data (e.g., HDF5, netCDF4, CSV.gz), complex algorithms, and a (single) large multi-core workstation
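
As an illustration of the Dask case, a minimal sketch (assuming dask and numpy are installed) of reducing a larger-than-memory array chunk by chunk on a single workstation:

    # Minimal sketch of the Dask use case; assumes dask[array] is installed.
    import dask.array as da

    # An ~80 GB array that is never materialized in memory: Dask splits it
    # into chunks and schedules the work across the cores of one machine.
    x = da.random.random((200000, 50000), chunks=(10000, 10000))
    col_means = x.mean(axis=0)        # lazy: only builds a task graph
    print(col_means[:5].compute())    # compute just the requested piece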