@asadharis
Forked from dan-blanchard/create_single_machine_sge.md
Last active November 16, 2023 08:40
How to set up a single-machine (Sun) Grid Engine installation for unit tests on Travis-CI

Setting up a SGE cluster on a single Amazon EC2 machine

The gist here provides a script to automate the process of installing Sun Grid Engine (SGE) on a single EC2 machine.

Motivation

SGE is a job scheduler for computing clusters, which usually consist of multiple machines. However, many applications do not need a massive cluster; 8 to 30 nodes are sufficient. In this tutorial we set up SGE on a single Amazon EC2 machine. The reasons for doing so are as follows:

  1. Automation: Setting up a multi-machine cluster with SGE is fairly involved, as it requires machines that communicate with each other and share some memory. A single machine with multiple cores is already a simple cluster in which memory is shared across cores.
  2. Moderate Size: Amazon EC2 instances provide a variety of computing options with the number of cores ranging from 1 to 128.
  3. Cost: The On-Demand price structure of AWS makes this a relatively cheap option. Further cost reduction can be achieved by using spot instances.

Tutorial: Setting up SGE on an EC2 instance.

Prerequisites

This tutorial assumes the following:

  • User has an AWS account
  • User can start an Amazon EC2 instance
  • User can SSH into a started EC2 instance
  • (Optional) User has downloaded all needed software/packages. For users working with R, there are numerous publicly available scripts which automate installing R and some required packages. Alternatively, I recommend using an AWS machine image (AMI) which comes with R pre-installed; my personal favorite is one by Louis Aslett.

Cluster setup

Once you have SSH'ed into your instance, run the following commands:

git clone https://gist.github.com/9d14da97d9ad1f8eccc36dc14390e4e0.git files/
cd files
sudo chmod +x install_sge.sh loop.sh sleep.sh
./install_sge.sh
./loop.sh
  1. The first command downloads all the files we will need into a folder named files. You can use a different folder if you like.
  2. The second changes into the directory where we downloaded the files.
  3. The third makes the scripts executable.
  4. The fourth runs the installation script, which asks for permission to install packages; answer Y to all prompts.
  5. The fifth submits nine short test jobs (each sleeps for 10 seconds) to the cluster.

The last command is optional, but it is a good way to check that the cluster is working. Once the jobs have been submitted, we can check their status by running

qstat -f

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@ip-172-31-9-18.us-west-2 BIP   0/4/4          0.04     lx26-amd64
     14 0.50000 sleep.sh   ubuntu       r     12/02/2016 20:55:09     1
     15 0.50000 sleep.sh   ubuntu       r     12/02/2016 20:55:09     1
     16 0.50000 sleep.sh   ubuntu       r     12/02/2016 20:55:09     1
     17 0.50000 sleep.sh   ubuntu       r     12/02/2016 20:55:09     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     18 0.50000 sleep.sh   ubuntu       qw    12/02/2016 20:54:47     1
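
With many jobs, the full listing gets long, and a per-state count is easier to read. The following is a minimal sketch, assuming the column layout of the sample output above (numeric job ID in the first column, state in the fifth). The function name summarize_job_states is hypothetical and not part of the gist.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): read `qstat -f` output on stdin
# and print one "STATE COUNT" line per job state, sorted by state.
summarize_job_states() {
    awk '
        # Job lines start with a numeric job ID; the state is the fifth column.
        # Queue lines and separator lines do not start with a number.
        $1 ~ /^[0-9]+$/ && NF >= 5 { count[$5]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

On the cluster it could be used as `qstat -f | summarize_job_states`. For the sample output above, this would report four jobs in state r (running) and one in state qw (queued and waiting).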
hostname HOST
load_scaling NONE
complex_values NONE
user_lists arusers
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
#!/bin/bash
# This script installs and configures a Sun Grid Engine installation for use
# on a Travis instance.
#
# Written by Dan Blanchard (dblanchard@ets.org), September 2013
#
# Edited by Asad Haris, September 2016, for running on Amazon EC2.
# Since we are no longer on a local machine but on an EC2 AMI, we will use the instance hostname.
# sudo sed -i -r "s/^(127.0.0.1\s)(localhost\.localdomain\slocalhost)/\1localhost localhost.localdomain $(hostname) /" /etc/hosts
# Update first
sudo apt-get update -qq
# Preseed the debconf answers so that SGE installs without opening a pop-up.
# The first option was missing before; without it, a pop-up still opens and asks for email
# configuration.
echo "postfix postfix/main_mailer_type select No configuration" | sudo debconf-set-selections
# Set the SGE master to the instance hostname.
echo "gridengine-master shared/gridenginemaster string $(hostname)" | sudo debconf-set-selections
# The rest remains unchanged.
echo "gridengine-master shared/gridenginecell string default" | sudo debconf-set-selections
echo "gridengine-master shared/gridengineconfig boolean true" | sudo debconf-set-selections
# The first main step. Install grid engine.
sudo apt-get install gridengine-common gridengine-master
# Do this in a separate step to give the master time to start.
# This line changes slightly: I install gridengine-drmaa1.0 since I am using Ubuntu 14.04 on Amazon EC2.
sudo apt-get install gridengine-drmaa1.0 gridengine-client gridengine-exec
# Obtain the number of cores. Some of the steps below remain unchanged.
export CORES=$(grep -c '^processor' /proc/cpuinfo)
sed -i -r "s/template/$USER/" user_template
sudo qconf -Auser user_template
sudo qconf -au $USER arusers
# Instead of adding localhost as a submit host, add the instance hostname.
sudo qconf -as $HOSTNAME
# Add the host name.
sed -i -r "s/HOST/$HOSTNAME/" host_template
sudo qconf -Ae host_template
# Specify number of cores.
sed -i -r "s/UNDEFINED/$CORES/" queue_template
# Add the host name.
sed -i -r "s/HOST/$HOSTNAME/" queue_template
sudo qconf -Ap smp_template
sudo qconf -Aq queue_template
echo "Printing queue info to verify that things are working correctly."
qstat -f -q all.q -explain a
echo "You should see sge_execd and sge_qmaster running below:"
# The [s] bracket keeps the grep process itself out of the listing.
ps aux | grep "[s]ge"
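
The script above edits host_template and queue_template in place, so the HOST and UNDEFINED placeholders are gone after the first run. To preview the filled-in configuration without modifying the files, something like the following could be used. This is a minimal sketch: render_template is a hypothetical helper, not part of the gist, and it assumes the host name contains no "/" characters.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): print a template with its
# placeholders filled in, leaving the file on disk unchanged.
# Usage: render_template FILE HOST CORES
render_template() {
    # Same substitutions as install_sge.sh, but without -i, so the result
    # goes to stdout instead of overwriting the template.
    sed -e "s/HOST/$2/" -e "s/UNDEFINED/$3/" "$1"
}
```

For example, `render_template queue_template "$HOSTNAME" "$CORES"` would print the queue definition that `qconf -Aq` is about to receive, with both the processors and slots entries set to the core count.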
#!/bin/csh
set s = 1
while( $s < 10 )
qsub -cwd -o /dev/null -e /dev/null sleep.sh
@ s++
end
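
The loop above is written for csh, which may not be installed on every image. A bash equivalent might look like the sketch below. The function name submit_test_jobs and the SUBMIT_CMD override are assumptions added for illustration and are not part of the gist; the qsub arguments are copied from the loop above.

```shell
#!/bin/bash
# Hypothetical bash port of the csh loop above (not part of the gist).
# SUBMIT_CMD is an assumed override hook that defaults to qsub, so the
# loop can be dry-run with a stub command.
# Usage: submit_test_jobs [COUNT]
submit_test_jobs() (
    count=${1:-9}    # the csh loop runs for s = 1..9, i.e. 9 jobs
    i=1
    while [ "$i" -le "$count" ]; do
        "${SUBMIT_CMD:-qsub}" -cwd -o /dev/null -e /dev/null sleep.sh
        i=$((i + 1))
    done
)
```

Calling `submit_test_jobs` with no argument submits nine jobs, matching the csh version. The body runs in a subshell, so the loop variables do not leak into the calling shell.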
qname all.q
hostlist HOST
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make smp
rerun FALSE
slots UNDEFINED
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists arusers
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
#!/bin/bash
#
date
sleep 10
date
pe_name smp
slots 999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE
name template
oticket 0
fshare 0
delete_time 0
default_project NONE