
@bosilca
bosilca / gist:f82ffdc76e8f049e9822061e16e2cced
Created February 8, 2022 01:19
Quick fix for the use of paragraph* in the MPI Datatype chapter
diff --git a/chap-datatypes/datatypes.tex b/chap-datatypes/datatypes.tex
index e37de6a1..ef781d01 100644
--- a/chap-datatypes/datatypes.tex
+++ b/chap-datatypes/datatypes.tex
@@ -184,7 +184,7 @@ in \sectionref{sec:f90-problems:derived-types}.
\subsection{Type Constructors with Explicit Addresses}
\label{sec:misc-extent}
In Fortran, the functions
-\mpifunc{MPI\_TYPE\_CREATE\_HVECTOR},\flushline % force break to keep in margin
+\mpifunc{MPI\_TYPE\_CREATE\_HVECTOR},
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <string.h>
#include <mpi.h>
void print_buf(const char* msg, const char *buf, int nbytes,
int start_from, int stop_at, int vals_per_line)
{
@bosilca
bosilca / check_avx.c
Created September 26, 2020 18:06
Playground for the AVX512 support on KNL / KNC.
#include <stdlib.h>
#include <stdio.h>
#include <immintrin.h>
#define OMPI_OP_AVX_HAS_AVX512BW_FLAG 0x00000200
#define OMPI_OP_AVX_HAS_AVX512F_FLAG 0x00000100
#define OMPI_OP_AVX_HAS_AVX2_FLAG 0x00000020
#define OMPI_OP_AVX_HAS_AVX_FLAG 0x00000010
#define OMPI_OP_AVX_HAS_SSE4_1_FLAG 0x00000008
#define OMPI_OP_AVX_HAS_SSE3_FLAG 0x00000004
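The flag constants above suggest a bitmask of detected instruction-set extensions. Below is a minimal sketch of how such a mask could be filled at runtime. It assumes GCC or Clang on x86 and uses the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports` rather than anything taken from the gist. The helpers `detect_avx_flags`, `has_flag`, and `print_avx_flags` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Flag values copied from check_avx.c above. */
#define OMPI_OP_AVX_HAS_AVX512BW_FLAG 0x00000200
#define OMPI_OP_AVX_HAS_AVX512F_FLAG  0x00000100
#define OMPI_OP_AVX_HAS_AVX2_FLAG     0x00000020
#define OMPI_OP_AVX_HAS_AVX_FLAG      0x00000010
#define OMPI_OP_AVX_HAS_SSE4_1_FLAG   0x00000008
#define OMPI_OP_AVX_HAS_SSE3_FLAG     0x00000004

/* Returns nonzero if the single-bit `flag` is set in `flags`. */
static int has_flag(uint32_t flags, uint32_t flag)
{
    assert(flag != 0 && (flag & (flag - 1)) == 0); /* exactly one bit set */
    return (flags & flag) != 0;
}

/* Hypothetical helper (GCC/Clang on x86 only): query the running CPU and
 * OR together the flag of every supported instruction-set extension. */
static uint32_t detect_avx_flags(void)
{
    uint32_t flags = 0;
    __builtin_cpu_init(); /* initialize the CPU model data used below */
    if (__builtin_cpu_supports("sse3"))     flags |= OMPI_OP_AVX_HAS_SSE3_FLAG;
    if (__builtin_cpu_supports("sse4.1"))   flags |= OMPI_OP_AVX_HAS_SSE4_1_FLAG;
    if (__builtin_cpu_supports("avx"))      flags |= OMPI_OP_AVX_HAS_AVX_FLAG;
    if (__builtin_cpu_supports("avx2"))     flags |= OMPI_OP_AVX_HAS_AVX2_FLAG;
    if (__builtin_cpu_supports("avx512f"))  flags |= OMPI_OP_AVX_HAS_AVX512F_FLAG;
    if (__builtin_cpu_supports("avx512bw")) flags |= OMPI_OP_AVX_HAS_AVX512BW_FLAG;
    return flags;
}

/* Print one 0/1 column per extension. */
static void print_avx_flags(uint32_t flags)
{
    printf("SSE3:%d SSE4.1:%d AVX:%d AVX2:%d AVX512F:%d AVX512BW:%d\n",
           has_flag(flags, OMPI_OP_AVX_HAS_SSE3_FLAG),
           has_flag(flags, OMPI_OP_AVX_HAS_SSE4_1_FLAG),
           has_flag(flags, OMPI_OP_AVX_HAS_AVX_FLAG),
           has_flag(flags, OMPI_OP_AVX_HAS_AVX2_FLAG),
           has_flag(flags, OMPI_OP_AVX_HAS_AVX512F_FLAG),
           has_flag(flags, OMPI_OP_AVX_HAS_AVX512BW_FLAG));
}
```

Calling `print_avx_flags(detect_avx_flags())` reports what the current machine supports. Keeping one bit per extension lets a caller test several capabilities with a single mask comparison.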
@bosilca
bosilca / check_coll_names.c
Created July 31, 2020 05:34
A quick benchmark to evaluate the cost of converting an MPI collective communication name into the collective identifier. Because every collective has the same chance of appearing in the configuration file, the benchmark does not measure any particular permutation of the collectives; instead, it measures the cost of looking up each of them once.
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <inttypes.h>
extern int mca_coll_base_name_to_colltype(const char* name);
typedef enum COLLTYPE {
ALLGATHER = 0, /* 0 */
ALLGATHERV, /* 1 */
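The measurement described above, one lookup of every collective name, can be sketched as follows. This is a minimal sketch that assumes a plain linear `strcmp` scan over a name table. The table, `name_to_colltype`, and `time_all_lookups` are hypothetical stand-ins for `mca_coll_base_name_to_colltype`, whose implementation is not shown here. Only the first two table entries are known to match the enum above; the rest are illustrative.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical, abbreviated name table; index == collective identifier. */
static const char *coll_names[] = {
    "allgather", "allgatherv", "allreduce", "alltoall", "alltoallv",
    "alltoallw", "barrier", "bcast", "exscan", "gather", "gatherv",
    "reduce", "reduce_scatter", "scan", "scatter", "scatterv",
};
#define NUM_COLLS ((int)(sizeof(coll_names) / sizeof(coll_names[0])))

/* Linear search: returns the identifier, or -1 for an unknown name. */
static int name_to_colltype(const char *name)
{
    for (int i = 0; i < NUM_COLLS; i++)
        if (0 == strcmp(name, coll_names[i]))
            return i;
    return -1;
}

/* Look up every name once per iteration; returns elapsed nanoseconds. */
static long long time_all_lookups(int iters)
{
    struct timespec start, end;
    volatile unsigned sink = 0; /* keeps the compiler from removing the loop */
    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int it = 0; it < iters; it++)
        for (int i = 0; i < NUM_COLLS; i++)
            sink += (unsigned)name_to_colltype(coll_names[i]);
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void)sink;
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (long long)(end.tv_nsec - start.tv_nsec);
}
```

Dividing the result by `iters * NUM_COLLS` gives the average cost of one lookup. Because every name is searched exactly once per iteration, the order of entries in a configuration file does not bias the average.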
@bosilca
bosilca / cuda_allreduce.cc
Created November 13, 2016 07:45
Quick example to check the performance of MPI_Allreduce from GPU buffers.
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>
#include <time.h>
#include <iostream>
/**
* mpic++ -g -Wall -I/opt/cuda/8.0/include cuda_allreduce.cc -o cuda_allreduce -L/opt/cuda/8.0/lib64 -lcudart
*/
@bosilca
bosilca / gist:64843961946319497da7
Last active February 7, 2016 07:37
Dump the offsets of the jobid and vpid fields in an opal_process_name_t, and show the content of the jobid, job family, and vpid.
/* Compile with gcc -Wall orte_offset.c -o orte_offset */
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h> /* uint32_t */
typedef uint32_t opal_jobid_t;
typedef uint32_t opal_vpid_t;
typedef struct {
opal_jobid_t jobid;
opal_vpid_t vpid;
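The dump described above can be sketched as follows. This minimal sketch reuses the two typedefs from the gist. It assumes that a jobid splits ORTE-style into a 16-bit job family in the high half and a 16-bit local job id in the low half. The macros `JOB_FAMILY` and `LOCAL_JOBID` and the function `dump_name` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t opal_jobid_t;
typedef uint32_t opal_vpid_t;

typedef struct {
    opal_jobid_t jobid;
    opal_vpid_t vpid;
} opal_process_name_t;

/* Assumption: jobid = (job family << 16) | local jobid. */
#define JOB_FAMILY(j)  (((j) >> 16) & 0x0000ffffu)
#define LOCAL_JOBID(j) ((j) & 0x0000ffffu)

/* Print where each field lives and what the jobid encodes. */
static void dump_name(const opal_process_name_t *name)
{
    assert(name != NULL);
    printf("offsetof(jobid) = %zu, offsetof(vpid) = %zu\n",
           offsetof(opal_process_name_t, jobid),
           offsetof(opal_process_name_t, vpid));
    printf("jobid = 0x%08" PRIx32 " (job family %" PRIu32
           ", local jobid %" PRIu32 "), vpid = %" PRIu32 "\n",
           name->jobid, (uint32_t)JOB_FAMILY(name->jobid),
           (uint32_t)LOCAL_JOBID(name->jobid), name->vpid);
}
```

Because both members are `uint32_t`, there is no padding between them, so vpid sits at offset 4.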
Index: datatypes.tex
===================================================================
--- datatypes.tex (revision 1835)
+++ datatypes.tex (working copy)
@@ -128,6 +128,7 @@
lb(Typemap) & = & \min_j disp_j , \nonumber \\
ub(Typemap) & = & \max_j (disp_j + \mpicode{sizeof}(type_j)) + \epsilon , \mbox{ and}
\nonumber \\ extent(Typemap) & = & ub(Typemap) - lb(Typemap).
+\label{soft-lb-ub-definition}
\end{eqnarray}
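The lb/ub/extent definitions above can be checked numerically. This minimal sketch assumes that each typemap entry carries its size and alignment k_j. It also assumes that epsilon is the smallest non-negative increment that rounds the extent up to a multiple of max_j k_j. The names `typemap_entry`, `bounds`, and `compute_bounds` are hypothetical. For the typemap {(double,0),(char,8)} with 8-byte double alignment, the sketch gives lb = 0, ub = 9 + 7 = 16, and extent = 16.

```c
#include <assert.h>
#include <stddef.h>

/* One typemap entry (type_j, disp_j), plus size and alignment of type_j. */
typedef struct {
    long disp;  /* displacement disp_j in bytes */
    long size;  /* sizeof(type_j) */
    long align; /* alignment requirement k_j of type_j */
} typemap_entry;

typedef struct { long lb, ub, extent; } bounds;

/* lb = min_j disp_j; ub = max_j (disp_j + sizeof(type_j)) + epsilon;
 * extent = ub - lb, padded to a multiple of max_j k_j. */
static bounds compute_bounds(const typemap_entry *map, size_t n)
{
    assert(n > 0);
    long lb = map[0].disp;
    long raw_ub = map[0].disp + map[0].size;
    long k = map[0].align;
    for (size_t j = 1; j < n; j++) {
        if (map[j].disp < lb) lb = map[j].disp;
        if (map[j].disp + map[j].size > raw_ub) raw_ub = map[j].disp + map[j].size;
        if (map[j].align > k) k = map[j].align;
    }
    long span = raw_ub - lb;
    long epsilon = (k - span % k) % k; /* smallest pad making extent % k == 0 */
    bounds b = { lb, raw_ub + epsilon, span + epsilon };
    return b;
}
```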