
maleadt / .gitignore
Last active Aug 12, 2021
CxxWrap pointer MWE

.gitignore:
build
Manifest.toml
dump.s:
(gdb) f 0
#0 0x00007fffe06dee9d in zgemm_kernel_n_SANDYBRIDGE () from /home/tbesard/.cache/jl/installs/bin/linux/x64/1.7/julia-latest-linux64/bin/../lib/julia/libopenblas64_.so
(gdb) disassemble
Dump of assembler code for function zgemm_kernel_n_SANDYBRIDGE:
0x00007fffe06dee00 <+0>: sub $0x80,%rsp
0x00007fffe06dee07 <+7>: mov %rbx,(%rsp)
0x00007fffe06dee0b <+11>: mov %rbp,0x8(%rsp)
0x00007fffe06dee10 <+16>: mov %r12,0x10(%rsp)
0x00007fffe06dee15 <+21>: mov %r13,0x18(%rsp)
0x00007fffe06dee1a <+26>: mov %r14,0x20(%rsp)
maleadt / demo.c
Created Jan 23, 2021
Stream-ordered memory allocator + device reset = launch failure

demo.c:
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

#define check(ans) { _check((ans), __FILE__, __LINE__); }

inline void _check(CUresult code, const char *file, int line)
{
    if (code != CUDA_SUCCESS)
    {
        const char *name;
        cuGetErrorName(code, &name);
        fprintf(stderr, "%s at %s:%d\n", name, file, line);
        exit(1);
    }
}
bad.lowered.jl:
CodeInfo(
1 ─ %1 = (#self#)(vals, lo, hi, parity, sync, sync_depth, prev_pivot, lt, by, @_11, -1)
└── return %1
)
Manifest.toml:
# This file is machine-generated - editing it directly is not advised

[[AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "051c95d6836228d120f5f4b984dd5aba1624f716"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "0.5.0"

[[AbstractTrees]]
deps = ["Markdown"]
maleadt / sort.jl
Created Apr 13, 2020
GPU sort using dynamic parallelism (WIP, slow)

sort.jl:
using Test
using CUDA

const MAX_DEPTH = 16
const SELECTION_SORT = 32

function selection_sort(data, left, right)
    @inbounds for i in left:right
        min_val = data[i]
regular.ll:
; ModuleID = 'permutedims!'
source_filename = "permutedims!"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-pc-linux-gnu"
%jl_value_t = type opaque
%jl_array_t = type { i8 addrspace(13)*, i64, i16, i16, i32 }
declare %jl_value_t addrspace(10)* @japi1_checkdims_perm(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32)
maleadt / tdma.jl
Created Jul 25, 2019
batched tdma

tdma.jl:
# experimentation with batched tridiagonal solvers on the GPU for Oceananigans.jl
#
# - reference serial CPU implementation
# - batched GPU implementation using cuSPARSE (fastest)
# - batched GPU implementation based on the serial CPU implementation (slow but flexible)
# - parallel GPU implementation (potentially fast and flexible)
#
# see `test_batched` and `bench_batched`
using LinearAlgebra
maleadt / tdma.jl
Created Jun 6, 2019
Tridiagonal matrix algorithm on the GPU with Julia

tdma.jl:
# experimentation with batched tridiagonal solvers on the GPU for Oceananigans.jl
#
# - reference serial CPU implementation
# - batched GPU implementation using cuSPARSE (fastest)
# - batched GPU implementation based on the serial CPU implementation (slow but flexible)
# - parallel GPU implementation (potentially fast and flexible)
#
# see `test_batched` and `bench_batched`
using CUDAdrv