AMD NUMA configuration.
Calculate the accuracy of BLR LU using multiplication of L and U factors.
#include "hicma/hicma.h"
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>
#include <iostream>
#include <fstream>
using namespace hicma;
Low Rank matrix truncation algorithm as specified by Grasedyck.
import numpy as np
np.set_printoptions(precision=2, linewidth=300)
def lr(full, rank):
    u, s, v = np.linalg.svd(full)
    u = u[:, 0:rank]
    s = np.diag(s)[0:rank, 0:rank]
    v = v[0:rank, :]
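The `lr` fragment above stops before it returns anything, so the following is a minimal, self-contained sketch of the same idea rather than the original function. It only performs a plain truncated SVD of a dense matrix; it is not the full recompression of an already-factored matrix described by Grasedyck. The names `truncated_svd` and `relative_error` are hypothetical, introduced here for illustration.

```python
import numpy as np


def truncated_svd(full, rank):
    """Return (u, s, vt) keeping only the `rank` largest singular values."""
    # full_matrices=False gives the economy-size factors.
    u, s, vt = np.linalg.svd(full, full_matrices=False)
    return u[:, :rank], np.diag(s[:rank]), vt[:rank, :]


def relative_error(full, rank):
    """Relative Frobenius-norm error of the rank-`rank` approximation."""
    u, s, vt = truncated_svd(full, rank)
    return np.linalg.norm(full - u @ s @ vt) / np.linalg.norm(full)


def make_rank2_matrix(n=6):
    """Hypothetical test input: a sum of two outer products (rank 2)."""
    x = np.arange(1.0, n + 1.0)
    y = np.cos(x)
    return np.outer(x, x) + np.outer(y, np.ones(n))


if __name__ == "__main__":
    a = make_rank2_matrix()
    for k in (1, 2):
        print(k, relative_error(a, k))
```

For a matrix whose exact rank is at most `rank`, the error should sit at round-off level; below that rank, the Frobenius error equals the root of the sum of the squared discarded singular values.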
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<html xmlns="" lang="en" xml:lang="en">
<!-- 2018-12-12 Wed 18:25 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="generator" content="Org-mode" />
<meta name="author" content="Sameer Deshmukh" />
Stacktrace for SLATE
#0 0x00007ffff7b284b7 in PMPI_Comm_rank ()
from /usr/local/openmpi-3.1.1/lib/
#1 0x0000555555559ca5 in slate::MatrixStorage<double>::MatrixStorage (
this=0x5555557c6fb0, m=16, n=16, nb=4, p=2, q=2, mpi_comm=1)
at /home/sameer.deshmukh/gitrepos/slate/slate_Storage.hh:247
#2 0x00005555555597c8 in __gnu_cxx::new_allocator<slate::MatrixStorage<double> >::construct<slate::MatrixStorage<double>, long&, long&, long&, int&, int&, int&> (this=0x7fffffffc8f7, __p=0x5555557c6fb0, __args#0=@0x7fffffffcc40: 16,
__args#1=@0x7fffffffcc38: 16, __args#2=@0x7fffffffcc30: 4,
__args#3=@0x7fffffffcc2c: 2, __args#4=@0x7fffffffcc28: 2,
__args#5=@0x7fffffffcc80: 1) at /usr/include/c++/7/ext/new_allocator.h:136
#3 0x000055555555962e in std::allocator_traits<std::allocator<slate::MatrixStorage<double> > >::construct<slate::MatrixStorage<double>, long&, long&, long&, int&, int&, int&> (__a=..., __p=0x5555557c6fb0, __args#0=@0x7fffffffcc40: 16,
SLATE makefile
CXX = /usr/bin/mpicxx -g -Wall -fPIC -std=c++11 -O0 -fopenmp -lm -I/home/sameer/gitrepos/slate/blaspp/include -I/home/sameer/gitrepos/slate/lapackpp/include -I/home/sameer/gitrepos/slate/ -I /usr/include/mpi/
SOURCES = ../bin/libslate.a
.PHONY: clean
$(CXX) -c $? -o $@
slate_lu: slate_lu.o $(SOURCES)
$(CXX) $(CXXFLAGS) $? -lblas -lgfortran
➜ slate git:(slate-stunts) ✗ make
/usr/bin/mpicxx -g -Wall -fPIC -std=c++11 -O0 -fopenmp -lm -I/home/sameer/gitrepos/slate/blaspp/include -I/home/sameer/gitrepos/slate/lapackpp/include -I/home/sameer/gitrepos/slate/ -I /usr/include/mpi/ slate_lu.o ../bin/libslate.a -lblas -lgfortran
/usr/bin/mpirun -np 4 ./a.out
[asus401ub:23781] *** Process received signal ***
[asus401ub:23781] Signal: Segmentation fault (11)
[asus401ub:23781] Signal code: Address not mapped (1)
[asus401ub:23781] Failing at address: 0x99
[asus401ub:23783] *** Process received signal ***
[asus401ub:23783] Signal: Segmentation fault (11)
[asus401ub:23783] Signal code: Address not mapped (1)
Failing code for SLATE
#include "slate_Matrix.hh"
int main(int argc, char **argv)
MPI_Init(&argc, &argv);
int rank, size;
int N = 16;
int NB = 4;
int P = 2;
Interfacing internal objects with the Ruby GC

Interfacing with Ruby's GC


Ruby uses a mark-and-sweep GC: it marks every object that is still reachable from its roots, including the interpreter stack, and frees whatever is left unmarked. Unlike Python's GC, it does not offer any reference counting mechanism.
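To make the contrast concrete, here is a small sketch of the Python side of the comparison. It assumes CPython, where reference counting frees an object as soon as its last reference disappears and a separate tracing collector is needed only for reference cycles. The class `Node` and both function names are hypothetical, introduced here for illustration.

```python
import gc
import weakref


class Node:
    """Plain object that can be the target of a weak reference."""


def freed_immediately():
    """True if dropping the last reference frees the object at once."""
    obj = Node()
    probe = weakref.ref(obj)  # does not keep obj alive
    del obj                   # reference count drops to zero here
    return probe() is None


def cycle_needs_collector():
    """Return (alive before collect, alive after collect) for a cycle."""
    gc.disable()  # keep the cyclic collector from running on its own
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # a and b keep each other alive
        probe = weakref.ref(a)
        del a, b                 # counts stay above zero due to the cycle
        alive_before = probe() is not None
        gc.collect()             # the tracing pass reclaims the cycle
        alive_after = probe() is not None
    finally:
        gc.enable()
    return alive_before, alive_after


if __name__ == "__main__":
    print(freed_immediately())
    print(cycle_needs_collector())
```

A Ruby object gets no such immediate release: it lives until a later mark phase fails to reach it, which is why a wrapper has to make sure the GC can see every object it still depends on.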

While both approaches have their pros and cons, in the context of the ndtypes wrapper, it becomes risky to have 'internal' Ruby objects that are only visible

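# In a calling Ruby script caller.rb
require ''
def compute_without_gil
  t = []
  # Start four threads, each running the computation.
  4.times { t << Thread.new { _some_computation } }
  # Wait for every thread to finish.
  t.each(&:join)
end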