Bryce Adelstein Lelbach aka wash brycelelbach

  • NVIDIA
  • Sunnyvale, CA
[18:10:16]:wash@ariel00.hermione:/srv/scratch/wash/hpx/gcc-4.6.2-release:0:$ bin/hpx_homogeneous_timed_task_spawn -t16 $(cat counters.cfg)
# BENCHMARK: Homogeneous Timed Task Spawn - HPX (weak scaling, static-balanced distribution)
# VERSION: 67b96d38463dffdf3301aa45cadadc4f5c81721e 02-05-2014
#
## 0:DELAY:Delay [micro-seconds] - Independent Variable
## 1:TASKS:# of Tasks - Independent Variable
## 2:STASKS:# of Tasks to Suspend - Independent Variable
## 3:OSTHRDS:OS-threads - Independent Variable
## 4:WTIME:Total Walltime [seconds]
## 5:A:/counters/arithmetics/add@/papi{locality#0/worker-thread#*}/INST_RETIRED
movq 64(%rsi), %rcx
pushq %rbp
pushq %rbx
pushq %rax
pushq %rdx
pushq %r12
pushq %r13
pushq %r14
pushq %r15
movq %rsp, (%rdi)
// Copyright (c) 2014 Bryce Adelstein-Lelbach - blelbach - at - cct.lsu.edu
//
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#include <hpx/hpx_main.hpp>
#include <hpx/include/runtime.hpp>
#include <hpx/include/iostreams.hpp>
#include <hpx/include/thread_executors.hpp>
{stack-trace}: 8 frames:
0x2aaaab889cc1 : hpx::termination_handler(int) + 0x81 in /srv/scratch/wash/hpx/gcc-4.6.2-release/lib/hpx/libhpx.so.0
0x2aaaadcb6210 : ??? + 0x2aaaadcb6210 in /lib/x86_64-linux-gnu/libpthread.so.0
0x2aaaad050a7d : ??? + 0x2aaaad050a7d in /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
0x2aaaad069331 : ??? + 0x2aaaad069331 in /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
0x2aaaad047350 : free + 0x350 in /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
0x2aaaad4ff2b5 : std::locale::_Impl::~_Impl() + 0x115 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x2aaaad4ff3ed : std::locale::~locale() + 0x2d in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x2aaaadefce2f : __cxa_finalize + 0x8f in /lib/x86_64-linux-gnu/libc.so.6
{what}: Segmentation fault
{config}:
HPX_HAVE_NATIVE_TLS=ON
HPX_HAVE_STACKTRACES=ON
HPX_HAVE_COMPRESSION_BZIP2=OFF
HPX_HAVE_COMPRESSION_SNAPPY=OFF
HPX_HAVE_COMPRESSION_ZLIB=OFF
HPX_HAVE_PARCEL_COALESCING=ON
HPX_HAVE_PARCELPORT_IPC=OFF
HPX_HAVE_PARCELPORT_IBVERBS=OFF
--counter HPXTHRDS,/threads{locality#*/total}/count/cumulative
--counter PHASES,/threads{locality#*/total}/count/cumulative-phases
--counter RES_MEM,/runtime{locality#*/total}/memory/resident
--counter VIR_MEM,/runtime{locality#*/total}/memory/virtual
--counter CTX_RECY,/threads{locality#*/total}/count/stack-recycles
--counter CTX_ALLOC,/threads{locality#*/total}/count/objects
--counter TLB_IM,/arithmetics/add@/papi{locality#0/worker-thread#*}/PAPI_TLB_IM
--counter TLB_DM,/arithmetics/add@/papi{locality#0/worker-thread#*}/PAPI_TLB_DM
--counter TOT_INS,/arithmetics/add@/papi{locality#0/worker-thread#*}/PAPI_TOT_INS
--counter L2_TCM,/arithmetics/add@/papi{locality#0/worker-thread#*}/PAPI_L2_TCM

Concepts:


Kernels used on supercomputers have unusual requirements:

  • When you run on a supercomputer, you are running one application at a time.
  • This application (almost always) has exclusive access to the hardware.
  • The performance of the application is the only thing that matters.
  • A good kernel for parallel applications must reduce OS noise - we want to reduce the number of things that happen behind the scenes in the kernel.
  • A good kernel for parallel applications gives the application as much control as possible while still protecting the hardware.
  • A good kernel for parallel applications is highly asynchronous - kernel calls from userspace should be asynchronous wherever possible.
#!/usr/bin/env python
#
# Copyright (c) 2013-2014 Bryce Adelstein-Lelbach
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
from numpy import zeros
from optparse import OptionParser
/tmp/icpc1oc8n1as_.s:82198: Error: invalid char '{' beginning operand 1 `{rz-sae}'
/tmp/icpc1oc8n1as_.s:82211: Error: invalid char '{' beginning operand 1 `{rn-sae}'
/tmp/icpc1oc8n1as_.s:82214: Error: invalid char '{' beginning operand 1 `{rn-sae}'
/tmp/icpc1oc8n1as_.s:82218: Error: invalid char '{' beginning operand 1 `{rn-sae}'
/tmp/icpc1oc8n1as_.s:82224: Error: invalid char '{' beginning operand 1 `{rn-sae}'
/tmp/icpc1oc8n1as_.s:82226: Error: invalid char '{' beginning operand 1 `{rn-sae}'
/tmp/icpc1oc8n1as_.s:82347: Error: invalid char '{' beginning operand 1 `{rz-sae}'
/tmp/icpc1oc8n1as_.s:82360: Error: invalid char '{' beginning operand 1 `{rn-sae}'
/tmp/icpc1oc8n1as_.s:82363: Error: invalid char '{' beginning operand 1 `{rn-sae}'
/tmp/icpc1oc8n1as_.s:82367: Error: invalid char '{' beginning operand 1 `{rn-sae}'
n: # of HPX-threads
p: payload duration of each HPX-thread
w_M: measured walltime
Performance Model
-----------------
Theoretical Walltime = w_T = np
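
A minimal sketch of evaluating the model above, assuming p is given in micro-seconds and walltimes are in seconds (the notes do not state the units). It evaluates w_T = np exactly as written, with no adjustment for the number of OS-threads. The function names and the per-task overhead metric (w_M - w_T) / n are hypothetical illustrations, not part of the original notes.

```python
def theoretical_walltime(n, p_us):
    """Return w_T = n * p in seconds.

    n    : number of HPX-threads
    p_us : payload duration of each HPX-thread (ASSUMED to be micro-seconds)
    """
    return n * p_us / 1e6


def overhead_per_task(w_m, n, p_us):
    """Hypothetical derived metric: (w_M - w_T) / n, in seconds per HPX-thread.

    w_m : measured walltime in seconds (w_M in the notes)
    """
    return (w_m - theoretical_walltime(n, p_us)) / n


if __name__ == "__main__":
    # Made-up illustrative numbers, not measured results.
    n, p_us, w_m = 500000, 10, 5.5
    print("w_T [s]:", theoretical_walltime(n, p_us))
    print("overhead per task [s]:", overhead_per_task(w_m, n, p_us))
```

Comparing w_M against w_T this way gives a rough sense of how much walltime is not accounted for by the payloads themselves.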