Sameer Deshmukh v0dro

v0dro / amd-config.md
Created July 12, 2021 00:31
AMD NUMA configuration.
v0dro / blr-lu-mm.cpp
Created November 16, 2020 08:48
Calculate the accuracy of BLR LU using multiplication of L and U factors.
#include "hicma/hicma.h"
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>
#include <iostream>
#include <fstream>
using namespace hicma;
v0dro / lr_truncate.py
Last active May 17, 2019 13:02
Low Rank matrix truncation algorithm as specified by Grasedyck.
import numpy as np
np.set_printoptions(precision=2, linewidth=300)
def lr(full, rank):
    u, s, v = np.linalg.svd(full)
    u = u[:, 0:rank]
    s = np.diag(s)[0:rank, 0:rank]
    v = v[0:rank, :]
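The preview above is cut off, so the rest of the gist is not shown here. As a rough illustration of low-rank truncation in the style usually attributed to Grasedyck and Hackbusch, the sketch below recompresses a matrix already held in factored form `A = U @ V.T` by taking QR factorizations of both factors and an SVD of the small core only. The function name `truncate_low_rank` and the test sizes are assumptions made for this example, and the gist's own implementation may differ.

```python
import numpy as np


def truncate_low_rank(U, V, rank):
    """Recompress A = U @ V.T (U: m x r, V: n x r) down to `rank`.

    Hypothetical helper for illustration; not taken from the gist.
    """
    Qu, Ru = np.linalg.qr(U)  # U = Qu @ Ru, with Ru of size r x r
    Qv, Rv = np.linalg.qr(V)  # V = Qv @ Rv, with Rv of size r x r
    # SVD of the small r x r core instead of the full m x n matrix.
    W, s, Zt = np.linalg.svd(Ru @ Rv.T)
    U_new = Qu @ (W[:, :rank] * s[:rank])  # fold singular values into U
    V_new = Qv @ Zt[:rank, :].T
    return U_new, V_new


rng = np.random.default_rng(0)
U = rng.standard_normal((8, 4))
V = rng.standard_normal((6, 4))
U2, V2 = truncate_low_rank(U, V, rank=2)

# Reference: best rank-2 approximation computed from a full SVD.
u, s, vt = np.linalg.svd(U @ V.T)
best = (u[:, :2] * s[:2]) @ vt[:2, :]
```

Because `Qu` and `Qv` have orthonormal columns, the truncated product `U2 @ V2.T` should agree with `best` up to rounding, while the SVD itself only ever touches an `r x r` matrix.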
v0dro / a.html
Created December 12, 2018 09:26
Part plan
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>16-12-18</title>
<!-- 2018-12-12 Wed 18:25 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="generator" content="Org-mode" />
<meta name="author" content="Sameer Deshmukh" />
v0dro / mpi_stacktrace.txt
Created September 21, 2018 05:42
Stacktrace for SLATE
#0 0x00007ffff7b284b7 in PMPI_Comm_rank ()
from /usr/local/openmpi-3.1.1/lib/libmpi.so.40
#1 0x0000555555559ca5 in slate::MatrixStorage<double>::MatrixStorage (
this=0x5555557c6fb0, m=16, n=16, nb=4, p=2, q=2, mpi_comm=1)
at /home/sameer.deshmukh/gitrepos/slate/slate_Storage.hh:247
#2 0x00005555555597c8 in __gnu_cxx::new_allocator<slate::MatrixStorage<double> >::construct<slate::MatrixStorage<double>, long&, long&, long&, int&, int&, int&> (this=0x7fffffffc8f7, __p=0x5555557c6fb0, __args#0=@0x7fffffffcc40: 16,
__args#1=@0x7fffffffcc38: 16, __args#2=@0x7fffffffcc30: 4,
__args#3=@0x7fffffffcc2c: 2, __args#4=@0x7fffffffcc28: 2,
__args#5=@0x7fffffffcc80: 1) at /usr/include/c++/7/ext/new_allocator.h:136
#3 0x000055555555962e in std::allocator_traits<std::allocator<slate::MatrixStorage<double> > >::construct<slate::MatrixStorage<double>, long&, long&, long&, int&, int&, int&> (__a=..., __p=0x5555557c6fb0, __args#0=@0x7fffffffcc40: 16,
v0dro / Makefile
Created September 18, 2018 13:14
SLATE makefile
CXX = /usr/bin/mpicxx -g -Wall -fPIC -std=c++11 -O0 -fopenmp -lm -I/home/sameer/gitrepos/slate/blaspp/include -I/home/sameer/gitrepos/slate/lapackpp/include -I/home/sameer/gitrepos/slate/ -I /usr/include/mpi/
SOURCES = ../bin/libslate.a
.PHONY: clean
.cpp.o:
	$(CXX) -c $? -o $@
slate_lu: slate_lu.o $(SOURCES)
	$(CXX) $(CXXFLAGS) $? -lblas -lgfortran
v0dro / error.txt
Created September 18, 2018 13:13
SLATE error
➜ slate git:(slate-stunts) ✗ make
/usr/bin/mpicxx -g -Wall -fPIC -std=c++11 -O0 -fopenmp -lm -I/home/sameer/gitrepos/slate/blaspp/include -I/home/sameer/gitrepos/slate/lapackpp/include -I/home/sameer/gitrepos/slate/ -I /usr/include/mpi/ slate_lu.o ../bin/libslate.a -lblas -lgfortran
/usr/bin/mpirun -np 4 ./a.out
[asus401ub:23781] *** Process received signal ***
[asus401ub:23781] Signal: Segmentation fault (11)
[asus401ub:23781] Signal code: Address not mapped (1)
[asus401ub:23781] Failing at address: 0x99
[asus401ub:23783] *** Process received signal ***
[asus401ub:23783] Signal: Segmentation fault (11)
[asus401ub:23783] Signal code: Address not mapped (1)
v0dro / slate.cpp
Created September 18, 2018 13:12
Failing code for SLATE
#include "slate_Matrix.hh"
int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  int N = 16;
  int NB = 4;
  int P = 2;
v0dro / gc.md
Created August 24, 2018 11:51
Interfacing internal objects with the Ruby GC

Interfacing with Ruby's GC

Background

Ruby uses a mark-and-sweep GC: it marks every object reachable from its roots, including references found on the interpreter stack, and frees whatever is left unmarked. Unlike Python, it does not use reference counting.
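To make the Python side of that contrast concrete, here is a minimal sketch of reference counting as CPython implements it. It assumes CPython specifically (other Python implementations may free objects later), and the class name `Node` is a hypothetical stand-in for a wrapped object.

```python
import sys
import weakref


class Node:
    """Hypothetical stand-in for a wrapped internal object."""


obj = Node()
probe = weakref.ref(obj)  # observes obj without keeping it alive
alias = obj               # a second strong reference

count_with_alias = sys.getrefcount(obj)
del alias
count_without_alias = sys.getrefcount(obj)  # one lower than before

del obj  # the last strong reference is gone
freed_immediately = probe() is None  # CPython frees the object right away
```

Under a mark-and-sweep collector there is no such per-object count: an object is reclaimed only when a later collection cycle fails to reach it, which is why references the collector cannot see are a hazard.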

While both approaches have their pros and cons, in the context of the ndtypes wrapper, it becomes risky to have 'internal' Ruby objects that are only visible

# In a calling Ruby script caller.rb
require 'compiled_binary.so'
def compute_without_gil
  t = []
  4.times { t << Thread.new { _some_computation } }
  t.each(&:join)
end