Jeff Squyres jsquyres

@jsquyres
jsquyres / distgraph1.out
Last active December 12, 2017 17:18
Treematch issues on the Cisco cluster for https://github.com/open-mpi/ompi/issues/4303
This file has been truncated, but you can view the full file.
using graph layout 'deterministic complete graph'
testing MPI_Dist_graph_create_adjacent
testing MPI_Dist_graph_create w/ outgoing only
[mpi025:15395] Unable to extract peer [[30746,1],0] nodeid from the modex.
[mpi025:15395] Unable to extract peer [[30746,1],1] nodeid from the modex.
[mpi025:15395] Unable to extract peer [[30746,1],2] nodeid from the modex.
[mpi024:04822] Unable to extract peer [[30746,1],16] nodeid from the modex.
[mpi024:04822] Unable to extract peer [[30746,1],17] nodeid from the modex.
[mpi025:15395] Unable to extract peer [[30746,1],3] nodeid from the modex.
[mpi024:04822] Unable to extract peer [[30746,1],18] nodeid from the modex.
jsquyres / summary.md
Created December 1, 2017 04:47
Logs for github.com:open-mpi/hwloc PR 191

Last updated: 2017-11-30 20:47:46.808895

This log is not yet complete. Keep refreshing this gist to get the latest status.

jsquyres / bisect-helper
Created September 27, 2017 15:07
Jeff's git bisect helper
#!/bin/zsh
#
# Git bisect helper.
#
# 1. Edit the script below to build OMPI how you need it and run the test
# that you need.
#
# 2. Once you have found a starting point for the git bisect, run it
# like this:
#
jsquyres / np=4.ppn=2.out.txt
Last active September 25, 2017 21:08
TCP BTL run across 2 16-core nodes, each with 8 non-loopback IP interfaces, with mca_btl_base_verbose=100
[mpi020:28517] mca: base: components_register: registering framework btl components
[mpi020:28517] mca: base: components_register: found loaded component self
[mpi020:28517] mca: base: components_register: component self register function successful
[mpi020:28517] mca: base: components_register: found loaded component tcp
[mpi020:28518] mca: base: components_register: registering framework btl components
[mpi020:28518] mca: base: components_register: found loaded component self
[mpi019:22888] mca: base: components_register: registering framework btl components
[mpi019:22888] mca: base: components_register: found loaded component self
[mpi019:22889] mca: base: components_register: registering framework btl components
[mpi019:22889] mca: base: components_register: found loaded component self
[2:08] JSQUYRES-M-H05C:~/dev
$ xcode-select -p
/Applications/Xcode.app/Contents/Developer
[2:08] JSQUYRES-M-H05C:~/dev
$ export SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk
[2:08] JSQUYRES-M-H05C:~/dev
$ ls -l /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/
total 8
drwxr-xr-x 5 root wheel 170 Jan 27 2019 MacOSX.sdk/
lrwxr-xr-x 1 root wheel 10 Oct 29 10:23 MacOSX10.12.sdk@ -> MacOSX.sdk
diff --git a/opal/mca/btl/usnic/btl_usnic_module.c b/opal/mca/btl/usnic/btl_usnic_module.c
index cdc301b..ffcfa21 100644
--- a/opal/mca/btl/usnic/btl_usnic_module.c
+++ b/opal/mca/btl/usnic/btl_usnic_module.c
@@ -83,6 +83,10 @@ static int add_procs_block_create_endpoints(opal_btl_usnic_module_t *module,
int rc;
opal_proc_t* my_proc;
size_t num_created = 0;
jsquyres / ofi-msg-fd-sockets-test.c
Created June 20, 2016 21:34
Sample libfabric EP_MSG program showing fd-EQ-waiting problem
#define WANT_FDS 1
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
[pacini061.arcetri.cisco.com:13809] [[10049,0],47] LOOPING BRKS
malloc debug: Request for 139764985974018 bytes failed (rml_oob_send.c, 175)
[pacini061:13809] *** Process received signal ***
[pacini061:13809] Signal: Segmentation fault (11)
[pacini061:13809] Signal code: (128)
[pacini061:13809] Failing at address: (nil)
[pacini061:13809] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x7f1d95ced100]
[pacini061:13809] [ 1] /usr/lib64/libc.so.6(+0x1496d6)[0x7f1d95a666d6]
[pacini061:13809] [ 2] /home/jsquyres/bogus/lib/openmpi/mca_rml_oob.so(+0x36ce)[0x7f1d925736ce]
[pacini061:13809] [ 3] /home/jsquyres/bogus/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x8fc)[0x7f1d96cbe1ec]
jsquyres / mpirun-of-failed-np40-run.txt
Last active December 13, 2015 15:47
mpirun of a failed -np 40 run, with lots of verbosity: see https://github.com/open-mpi/ompi/issues/1215
This file has been truncated, but you can view the full file.
$ mpirun --mca ess_base_verbose 100 --mca grpcomm_base_verbose 100 --mca pmix_base_verbose 100 --mca pml ob1 --mca btl tcp,vader,self --hostfile hosts -np 40 ring_c
[franco.cisco.com:16980] mca: base: components_register: registering framework ess components
[franco.cisco.com:16980] mca: base: components_register: found loaded component env
[franco.cisco.com:16980] mca: base: components_register: component env has no register or open function
[franco.cisco.com:16980] mca: base: components_register: found loaded component hnp
[franco.cisco.com:16980] mca: base: components_register: component hnp has no register or open function
[franco.cisco.com:16980] mca: base: components_register: found loaded component pmi
[franco.cisco.com:16980] mca: base: components_register: component pmi has no register or open function
[franco.cisco.com:16980] mca: base: components_register: found loaded component singleton
[franco.cisco.com:16980] mca: base: components_register: component singleton register function successful
jsquyres / backwards-compat-1.c
Last active November 15, 2015 12:55
2015-12 December MPI Forum slides: sessions
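#include <mpi.h>

/* Library-private communicator (declaration assumed; not shown on the slide). */
static MPI_Comm comm;

void legacy_library_init(void) {
    int init, finalized;

    /* Use MPI only if it has been initialized and not yet finalized. */
    MPI_Initialized(&init);
    MPI_Finalized(&finalized);
    if (init && !finalized) {
        MPI_Comm_dup(MPI_COMM_WORLD, &comm);
        // Do MPI things
    }
}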