Sameer Deshmukh v0dro

v0dro /
Created Jul 12, 2021
AMD NUMA configuration.
v0dro / blr-lu-mm.cpp
Created Nov 16, 2020
Calculate the accuracy of BLR LU using multiplication of L and U factors.
View blr-lu-mm.cpp
#include "hicma/hicma.h"
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>
#include <iostream>
#include <fstream>
using namespace hicma;
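The accuracy check named in the description (multiply the L and U factors back together and compare with the original matrix) can be sketched on a small dense matrix. This is an illustrative NumPy sketch only: it uses a plain unpivoted dense LU rather than a block low-rank one, it does not use the HiCMA API, and the helper names `lu_nopivot` and `relative_error` are hypothetical.

```python
import numpy as np

def lu_nopivot(a):
    """Dense LU without pivoting (assumes every pivot is nonzero)."""
    n = a.shape[0]
    l = np.eye(n)
    u = a.astype(float).copy()
    for k in range(n - 1):
        # Multipliers for column k go into L.
        l[k + 1:, k] = u[k + 1:, k] / u[k, k]
        # Eliminate column k from the rows below the pivot.
        u[k + 1:, :] -= np.outer(l[k + 1:, k], u[k, :])
    return l, u

def relative_error(a, l, u):
    """Relative Frobenius-norm error ||A - L U|| / ||A||."""
    return np.linalg.norm(a - l @ u) / np.linalg.norm(a)

rng = np.random.default_rng(0)
n = 16
# Shift the diagonal so the pivots stay well away from zero.
a = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
l, u = lu_nopivot(a)
rel_err = relative_error(a, l, u)
print(f"relative error of L*U against A: {rel_err:.2e}")
```

For a low-rank approximated factorization the same ratio would be expected to sit near the compression tolerance rather than near machine precision.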
v0dro /
Last active May 17, 2019
Low Rank matrix truncation algorithm as specified by Grasedyck.
import numpy as np
np.set_printoptions(precision=2, linewidth=300)
def lr(full, rank):
    u, s, v = np.linalg.svd(full)
    u = u[:, 0:rank]
    s = np.diag(s)[0:rank, 0:rank]
    v = v[0:rank, :]
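The preview above is cut off before the function returns. As a minimal, self-contained sketch of the same idea (truncating an SVD to a fixed rank), the following assumes the three truncated factors are returned. The name `truncated_svd` is hypothetical, and this is plain SVD truncation, not necessarily the full algorithm the gist goes on to implement.

```python
import numpy as np

def truncated_svd(full, rank):
    """Return rank-`rank` factors (u, s, vt) with full ~= u @ s @ vt."""
    u, s, vt = np.linalg.svd(full, full_matrices=False)
    return u[:, :rank], np.diag(s[:rank]), vt[:rank, :]

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 6))
rank = 3
u, s, vt = truncated_svd(a, rank)
approx = u @ s @ vt

# By the Eckart-Young theorem, the spectral-norm error of the best
# rank-k approximation equals the (k+1)-th singular value.
sigma = np.linalg.svd(a, compute_uv=False)
err = np.linalg.norm(a - approx, 2)
print(f"spectral error: {err:.4f}, sigma[{rank}]: {sigma[rank]:.4f}")
```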
View a.html
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<html xmlns="" lang="en" xml:lang="en">
<!-- 2018-12-12 Wed 18:25 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="generator" content="Org-mode" />
<meta name="author" content="Sameer Deshmukh" />
v0dro / mpi_stacktrace.txt
Created Sep 21, 2018
Stacktrace for SLATE
View mpi_stacktrace.txt
#0 0x00007ffff7b284b7 in PMPI_Comm_rank ()
from /usr/local/openmpi-3.1.1/lib/
#1 0x0000555555559ca5 in slate::MatrixStorage<double>::MatrixStorage (
this=0x5555557c6fb0, m=16, n=16, nb=4, p=2, q=2, mpi_comm=1)
at /home/sameer.deshmukh/gitrepos/slate/slate_Storage.hh:247
#2 0x00005555555597c8 in __gnu_cxx::new_allocator<slate::MatrixStorage<double> >::construct<slate::MatrixStorage<double>, long&, long&, long&, int&, int&, int&> (this=0x7fffffffc8f7, __p=0x5555557c6fb0, __args#0=@0x7fffffffcc40: 16,
__args#1=@0x7fffffffcc38: 16, __args#2=@0x7fffffffcc30: 4,
__args#3=@0x7fffffffcc2c: 2, __args#4=@0x7fffffffcc28: 2,
__args#5=@0x7fffffffcc80: 1) at /usr/include/c++/7/ext/new_allocator.h:136
#3 0x000055555555962e in std::allocator_traits<std::allocator<slate::MatrixStorage<double> > >::construct<slate::MatrixStorage<double>, long&, long&, long&, int&, int&, int&> (__a=..., __p=0x5555557c6fb0, __args#0=@0x7fffffffcc40: 16,
v0dro / Makefile
Created Sep 18, 2018
SLATE makefile
View Makefile
CXX = /usr/bin/mpicxx -g -Wall -fPIC -std=c++11 -O0 -fopenmp -lm -I/home/sameer/gitrepos/slate/blaspp/include -I/home/sameer/gitrepos/slate/lapackpp/include -I/home/sameer/gitrepos/slate/ -I /usr/include/mpi/
SOURCES = ../bin/libslate.a
.PHONY: clean
$(CXX) -c $? -o $@
slate_lu: slate_lu.o $(SOURCES)
$(CXX) $(CXXFLAGS) $? -lblas -lgfortran
View error.txt
➜ slate git:(slate-stunts) ✗ make
/usr/bin/mpicxx -g -Wall -fPIC -std=c++11 -O0 -fopenmp -lm -I/home/sameer/gitrepos/slate/blaspp/include -I/home/sameer/gitrepos/slate/lapackpp/include -I/home/sameer/gitrepos/slate/ -I /usr/include/mpi/ slate_lu.o ../bin/libslate.a -lblas -lgfortran
/usr/bin/mpirun -np 4 ./a.out
[asus401ub:23781] *** Process received signal ***
[asus401ub:23781] Signal: Segmentation fault (11)
[asus401ub:23781] Signal code: Address not mapped (1)
[asus401ub:23781] Failing at address: 0x99
[asus401ub:23783] *** Process received signal ***
[asus401ub:23783] Signal: Segmentation fault (11)
[asus401ub:23783] Signal code: Address not mapped (1)
v0dro / slate.cpp
Created Sep 18, 2018
Failing code for SLATE
View slate.cpp
#include "slate_Matrix.hh"
int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  int N = 16;
  int NB = 4;
  int P = 2;
v0dro /
Created Aug 24, 2018
Interfacing internal objects with the Ruby GC

Interfacing with Ruby's GC


Ruby uses a mark-and-sweep GC: it scans the Ruby interpreter stack for references, marks every object that is still reachable, and frees the rest. Unlike the Python GC, it offers no reference-counting mechanism.

While both approaches have their pros and cons, in the context of the ndtypes wrapper, it becomes risky to have 'internal' Ruby objects that are only visible

View a.rb
# In a calling Ruby script caller.rb
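require ''
def compute_without_gil
  t = []
  # Start four threads, each running the computation.
  4.times { t << Thread.new { _some_computation } }
  # Wait for every thread to finish.
  t.each(&:join)
end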