@Atlas7
Last active August 10, 2017 11:55
Intel Colfax Cluster - Parallel a For Loop Application On Xeon Phi (Knights Landing) Cluster Node

Intel Colfax Cluster - Notes - Index Page


In this post we show a very simple application that parallelizes a for loop of 18 iterations across 4 threads on a Knights Landing (KNL) node. Imagine a team of 4 people working together to eat 18 apples, where each person can only eat 1 apple at a time. Each person is represented by a thread. We will use the OpenMP framework to write our parallel (multi-threaded) application.

Here comes the demo...

SSH to Colfax Cluster

johnny@Chuns-MBP  $ ssh colfax

######################################################################
# Welcome to Colfax Cluster!
######################################################################
#
# Pre-compiled Machine Learning Frameworks, as well as other tools
# are available in the /opt/ directory
#
# If you have any questions or feature requests, post them on our
# forum at:
# https://colfaxresearch.com/discussion/forum/
#
# Colfax Research Team
######################################################################
Last login: Wed Aug  9 05:49:11 2017 from 10.5.0.7

[u4443@c001 ~]$

We are now at the Login Node of Cluster c001.

Navigate to working directory

Navigate to a working directory of your choice. For example:

[u4443@c001 ~]$ cd deepdive/lec-04/

[u4443@c001 lec-04]$ pwd
/home/u4443/deepdive/lec-04

Create Multi-thread C++ code

Create a C++ source file called hello-parallel-eat.cc:

[u4443@c001 lec-04]$ emacs hello-parallel-eat.cc

Code:

#include <omp.h>
#include <cstdio>

int main() {
  // This code is executed by 1 thread
  const int num_people = 4;
  const int num_apples = 18;
  printf("OpenMP with %d threads\n", num_people);

  // Request a team of num_people threads for the parallel region below
  omp_set_num_threads(num_people);

#pragma omp parallel for
  for (int i = 0; i < num_apples; i++) {
    // This code is executed in parallel by multiple threads
    printf("person (thread) %d eating apple %d \n", omp_get_thread_num(), i);
  }

  return 0;
}

Note:

  • the omp.h header declares the OpenMP runtime functions we call here (omp_set_num_threads() and omp_get_thread_num()).
  • we manually set the number of threads in the code with omp_set_num_threads(num_people), in our case 4 threads. The code still runs on a machine with fewer than 4 hardware threads: the threads then share the available cores, or the runtime may provide a smaller team. We will have no problem in our case as our KNL node has 256 threads. This call takes priority over the OMP_NUM_THREADS environment variable.
  • omp_get_thread_num() returns the ID of the calling thread (0 to 3 here). We print the thread ID so we can see the parallelism at work later on.
  • #pragma omp parallel for is the secret sauce: it creates a team of threads and distributes the iterations of the for loop that follows across them.

Save and exit editor.

Compile Multi-thread C++ Code

We can compile the code with either the Intel C++ Compiler (icpc) or GCC.

If we use icpc (pass the -qopenmp option to enable OpenMP):

[u4443@c001 lec-04]$ icpc -qopenmp -o hello-parallel-eat  hello-parallel-eat.cc

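If we use GCC (pass the -fopenmp option to enable OpenMP). Note that g++ is the more usual driver for C++ source files, because it links the C++ standard library automatically: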

[u4443@c001 lec-04]$ gcc -fopenmp -o hello-parallel-eat  hello-parallel-eat.cc

This will create a binary executable hello-parallel-eat.

Create Shell Script

A shell script will enable us to run the application easily.

Create a shell script hello-parallel-eat.sh in the same directory with the following content (change the working directory to the one of your choice):

echo "hello-parallel-eat starts"
export OMP_NUM_THREADS=10
cd /home/u4443/deepdive/lec-04
./hello-parallel-eat
echo "hello-parallel-eat ends"

Notes:

  • For the sake of experiment, we have included the line export OMP_NUM_THREADS=10, which asks OpenMP to use 10 threads by default. BUT, as we have explicitly requested 4 threads within the C++ code, the program ends up using 4 threads and ignores this setting. We don't really "need" this line. We include it here purely to convince ourselves that omp_set_num_threads() within the code takes priority.
  • the rest is quite self-explanatory: print a start message, change to the working directory, run the binary, and print an end message.

Run Multi-thread Job on a KNL Node

From the working directory where all our files are, submit the shell script to run on a KNL node (if you would prefer a non-KNL node, replace knl with another node type, as long as it is available):

[u4443@c001 lec-04]$ qsub hello-parallel-eat.sh -l nodes=1:knl
21145.c001

Make a note of the job number returned by the command. Now view the output file.

[u4443@c001 lec-04]$ cat hello-parallel-eat.sh.o21145

########################################################################
# Colfax Cluster - https://colfaxresearch.com/
#      Date:           Thu Aug 10 04:12:04 PDT 2017
#    Job ID:           21145.c001
#      User:           u4443
# Resources:           neednodes=1:knl,nodes=1:knl,walltime=24:00:00
########################################################################

hello-parallel-eat starts
OpenMP with 4 threads
person (thread) 0 eating apple 0
person (thread) 0 eating apple 1
person (thread) 0 eating apple 2
person (thread) 0 eating apple 3
person (thread) 0 eating apple 4
person (thread) 1 eating apple 5
person (thread) 1 eating apple 6
person (thread) 1 eating apple 7
person (thread) 1 eating apple 8
person (thread) 1 eating apple 9
person (thread) 3 eating apple 14
person (thread) 3 eating apple 15
person (thread) 3 eating apple 16
person (thread) 3 eating apple 17
person (thread) 2 eating apple 10
person (thread) 2 eating apple 11
person (thread) 2 eating apple 12
person (thread) 2 eating apple 13
hello-parallel-eat ends

########################################################################
# Colfax Cluster
# End of output for job 21145.c001
# Date: Thu Aug 10 04:12:05 PDT 2017
########################################################################

[u4443@c001 lec-04]$

A bit of explanation:

  • hello-parallel-eat starts: this is our 1st (single-thread) step in the serial stream, printed by the shell script.
  • OpenMP with 4 threads: this is our 2nd (single-thread) step in the serial stream, printed by the program before the parallel loop.
  • person (thread) X eating apple Y: this is our 3rd (multi-thread) step in the serial stream, handled by 4 threads concurrently. To understand how this part works, just imagine a team of 4 people trying to eat through 18 apples, each eating 1 apple at a time. Because the threads run at the same time, lines from different threads can appear in any order (here thread 3 printed before thread 2).
  • hello-parallel-eat ends: this is our 4th (single-thread) step in the serial stream, printed by the shell script.

So here we are: a simple Hello World multi-threaded application running on a KNL node, with the for loop parallelized! Now imagine we have millions of apples to be eaten by 256 people (threads) - that's where the 256 threads on Xeon Phi may come in handy. Now also imagine each person eats at a different rate - OpenMP lets us control how the apples are handed out too (for example with the schedule clause).

Conclusion

In this post we have demonstrated a simple C++ Hello World application that combines serial and parallel processing (parallelizing a for loop and distributing the work across threads). We used the OpenMP framework to handle the "hybrid serial and parallel" processing part - having 4 threads process a loop of 18 iterations.

