
Harmen Stoppels haampie

View PrimeMonstrosity.jl
module PrimeMonstrosity
const bit_1 = ~(0x01 << 7)
const bit_2 = ~(0x01 << 6)
const bit_3 = ~(0x01 << 5)
const bit_4 = ~(0x01 << 4)
const bit_5 = ~(0x01 << 3)
const bit_6 = ~(0x01 << 2)
const bit_7 = ~(0x01 << 1)
const bit_8 = ~(0x01 << 0)
haampie / trace.txt
Last active Aug 29, 2019
btmon trace
View trace.txt
Bluetooth monitor ver 5.50
= Note: Linux version 5.0.0-25-generic (x86_64) 0.401834
= Note: Bluetooth subsystem version 2.22 0.401835
= New Index: 80:C5:F2:F8:D6:54 (Primary,USB,hci0) [hci0] 0.401836
@ MGMT Open: btmon (privileged) version 1.14 {0x0001} 0.401851
= bluetoothd: Bluetooth daemon 5.50 3.341004
@ MGMT Open: bluetoothd (privileged) version 1.14 {0x0002} 3.342105
= bluetoothd: Starting SDP server 3.342212
= bluetoothd: Excluding (cli) wiimote 3.342323
@ MGMT Command: Read Management Version In.. (0x0001) plen 0 {0x0002} 3.343725
haampie / testing.cpp
Last active Mar 15, 2019
everything_compile_time.cpp
View testing.cpp
#include <iostream>
#include <vector>
#include <string>
#include <tuple>
using namespace std;
// Some instances of events; we're using "public" const data members.
struct UsernameChanged {
    string const username;
View cerfacs.cu
// Computes y <- alpha * A * x + beta * y for tall and skinny A.
// Compile with `nvcc -O3 -o cerfacs cerfacs.cu`
// Assumes a *large* basis of COLS = 100 columns; you can vary this parameter.
// Timing excludes copies to / from the device (a good Arnoldi implementation should not need them anyway).
// Assumes a fixed number of 256 threads per block.
#include <stdio.h>
#include <sys/time.h>
#define COLS 100
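The operation named in the header comment, y <- alpha * A * x + beta * y, can be checked against a plain CPU reference. The following is a minimal sketch, not the CUDA kernel from the gist: the function name `gemv_reference` is hypothetical, and column-major storage of A is an assumption, since the preview does not show the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for y <- alpha * A * x + beta * y.
// Assumption: A is rows x cols and stored column-major, so A(i, j) = A[i + j * rows].
void gemv_reference(std::size_t rows, std::size_t cols, double alpha,
                    const std::vector<double>& A, const std::vector<double>& x,
                    double beta, std::vector<double>& y)
{
    assert(A.size() == rows * cols);
    assert(x.size() == cols);
    assert(y.size() == rows);

    for (std::size_t i = 0; i < rows; ++i) {
        // Dot product of row i of A with x.
        double acc = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            acc += A[i + j * rows] * x[j];
        // Scale the product and blend it with the old value of y.
        y[i] = alpha * acc + beta * y[i];
    }
}
```

For a tall and skinny A, `rows` is much larger than `cols`, so each output entry needs only a short dot product. A reference like this is mainly useful for validating the output of a GPU kernel on small inputs.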
View spmvbench.jl
"""
This is the implementation currently in SparseArrays
"""
function simple_mul!(y, A, x)
    @inbounds for i = Base.OneTo(A.n)
        xi = x[i]
        for j = A.colptr[i] : A.colptr[i + 1] - 1
            y[A.rowval[j]] += A.nzval[j] * xi
        end
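The visible loop is the column-oriented product for a matrix in compressed sparse column (CSC) form: each column's stored entries are scaled by the matching entry of x and scattered into y. The following is a rough C++ sketch of the same loop. The `CscMatrix` struct and `csc_mul_add` name are hypothetical, indices are 0-based here (the Julia code is 1-based), and the sketch only accumulates into y, as the visible lines do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSC container mirroring the colptr / rowval / nzval fields
// used in the Julia preview, but with 0-based indices.
struct CscMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> colptr;  // cols + 1 entries; column i spans [colptr[i], colptr[i + 1])
    std::vector<std::size_t> rowval;  // row index of each stored entry
    std::vector<double> nzval;        // value of each stored entry
};

// Accumulates y += A * x, one column of A at a time.
void csc_mul_add(std::vector<double>& y, const CscMatrix& A, const std::vector<double>& x)
{
    assert(y.size() == A.rows);
    assert(x.size() == A.cols);
    assert(A.colptr.size() == A.cols + 1);

    for (std::size_t i = 0; i < A.cols; ++i) {
        const double xi = x[i];
        // Scatter column i, scaled by x[i], into y.
        for (std::size_t j = A.colptr[i]; j < A.colptr[i + 1]; ++j)
            y[A.rowval[j]] += A.nzval[j] * xi;
    }
}
```

Because the writes to y are scattered by row index, a caller that wants y = A * x rather than y += A * x has to zero y first.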
View micro.cc
#include <cstdint>
#include <cmath>
// g++ -Wall -O3 -std=c++14 -march=native -fPIC -shared -o givenlib.so micro.cc
extern "C" {
    void fused_horizontal(double * __restrict__ A, int64_t cols, double c1, double s1, double c2, double s2, double c3, double s3, double c4, double s4)
    {
        for (int64_t col = 0; col < cols; ++col, A += 4)
        {
haampie / fusing_perf.jl
Last active Sep 27, 2018
fusing_perf.jl
View fusing_perf.jl
using BenchmarkTools
using LinearAlgebra
using LinearAlgebra: givensAlgorithm
"""
I want to apply 4 'fused' Givens rotations to 4 columns of matrix Q. Here Q
is an n x 4 matrix. In the benchmarks I compare the number of GFLOP/s when the
rotations are applied to Q directly (vertical) versus when Q is first
transposed (horizontal).
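To make the "horizontal" variant concrete, here is a minimal C++ sketch of applying four rotations in one pass over a transposed Q, where the 4 entries belonging to one row of Q are contiguous in memory. The `PlaneRotation` struct and `apply_fused` name are hypothetical. Which column pairs the four rotations act on, and the sign convention of the rotation, are assumptions, so the pairs are left as parameters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one Givens rotation acting on columns i and j.
struct PlaneRotation {
    int i;
    int j;
    double c;  // cosine
    double s;  // sine
};

// Q holds an n x 4 matrix with the 4 entries of each row contiguous.
// For every row, all four rotations are applied before moving on, so each
// row is loaded and stored once ("fused") instead of once per rotation.
void apply_fused(std::vector<double>& Q, std::size_t n,
                 const std::array<PlaneRotation, 4>& rots)
{
    assert(Q.size() == 4 * n);

    for (std::size_t r = 0; r < n; ++r) {
        double* row = Q.data() + 4 * r;
        for (const PlaneRotation& g : rots) {
            const double a = row[g.i];
            const double b = row[g.j];
            // Assumed convention: [a b] * [c -s; s c].
            row[g.i] = g.c * a + g.s * b;
            row[g.j] = -g.s * a + g.c * b;
        }
    }
}
```

The point of fusing is memory traffic: applying the rotations one after another over the full matrix reads and writes each column several times, whereas the fused loop touches each row once.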
View fusing.jl
using LinearAlgebra
using LinearAlgebra: givensAlgorithm
using Test
using BenchmarkTools
import LinearAlgebra: rmul!
abstract type SmallRotation end
struct Rotation2{Tc,Ts} <: SmallRotation
View example.jl
using LinearAlgebra
using LinearAlgebra: givensAlgorithm
using Test
using BenchmarkTools
import LinearAlgebra: rmul!
abstract type SmallRotation end
struct Rotation2{Tc,Ts} <: SmallRotation
haampie / 01_example.txt
Last active Sep 24, 2018
multishift qr and blas3
View 01_example.txt
Chasing two double-shift-bulges one step forward using two
reflections G1 and G2 of size 3 (they are each composed of
two Givens rotations).
x x x x x x x x x x x x x x x x
x x x x x x x x ┐ x x x x x x x x
x x x x x x x x │ double shift G2 . x x x x x x x
x x x x x x x x ┘ . x x x x x x x
. . . x x x x x ┐ . x x x x x x x
. . . x x x x x │ double shift G1 . . . . x x x x