@tcantenot
tcantenot / shared_mem_indexing.cu
Last active November 2, 2020 12:03
Shared memory tile with border fetches
// Store tile values (with border) in shared memory:
// - first, a subset of the block's threads computes and stores the "border" values
// - then, each thread computes and stores its own "center" value (the one it will use afterwards).
//
// "Center" values: o
// "Border" values: c, h, v
//
// The "border" values are classified into 3 groups:
// - The first tile row + the left border of the 2nd row (range end: r0)
// - The left and right border values + the border and the last row (range end: r1)
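For comparison, here is a minimal sketch of a simpler and widely used way to fill a bordered tile: a strided loop in which every thread of the block loads one or more elements of the padded tile. This is not the border classification described in the preview above (the `r0`/`r1` ranges are not reproduced here). The names `TILE`, `BORDER`, `PADDED`, `clampi`, `paddedToGlobal` and `loadTile` are hypothetical, and the sketch assumes a `TILE` x `TILE` thread block and clamp-to-edge addressing at the image boundary.

```cuda
// Let the index helper compile as plain C++ too (for host-side checks).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

constexpr int TILE   = 16;                // center size (assumed block size: TILE x TILE)
constexpr int BORDER = 1;                 // border width on each side
constexpr int PADDED = TILE + 2 * BORDER; // tile size including the border

__host__ __device__ inline int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

// Map linear index i in [0, PADDED * PADDED) of the padded tile to a linear
// index into the image, clamping coordinates that fall outside the image.
__host__ __device__ inline int paddedToGlobal(int i, int tileX, int tileY,
                                              int width, int height)
{
    int lx = i % PADDED - BORDER; // tile-local x, in [-BORDER, TILE + BORDER)
    int ly = i / PADDED - BORDER; // tile-local y, same range
    int gx = clampi(tileX * TILE + lx, 0, width - 1);
    int gy = clampi(tileY * TILE + ly, 0, height - 1);
    return gy * width + gx;
}

#ifdef __CUDACC__
// Copies the input to the output through the shared tile; a real kernel
// would read neighbouring tile entries after the barrier instead.
__global__ void loadTile(const float* in, float* out, int width, int height)
{
    __shared__ float tile[PADDED * PADDED];

    // Every thread loads elements tid, tid + TILE*TILE, ... of the padded tile,
    // so border and center values are filled by the same loop.
    int tid = threadIdx.y * TILE + threadIdx.x;
    for (int i = tid; i < PADDED * PADDED; i += TILE * TILE)
        tile[i] = in[paddedToGlobal(i, blockIdx.x, blockIdx.y, width, height)];

    __syncthreads(); // the whole tile must be written before any thread reads it

    int gx = blockIdx.x * TILE + threadIdx.x;
    int gy = blockIdx.y * TILE + threadIdx.y;
    if (gx < width && gy < height)
        out[gy * width + gx] =
            tile[(threadIdx.y + BORDER) * PADDED + (threadIdx.x + BORDER)];
}
#endif
```

A launch under these assumptions would use `dim3 block(TILE, TILE)` and a grid of `ceil(width / TILE)` by `ceil(height / TILE)` blocks. The strided loop trades the explicit border bookkeeping for a few integer divisions per load, so it is easier to get right but not necessarily faster.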
@tcantenot
tcantenot / lop3.cu
Created March 25, 2020 23:54 — forked from allanmac/lop3.cu
Test to see if the bit hack "Conditionally set or clear bits without branching" maps to a single Maxwell LOP3.LUT opcode
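As a rough illustration of why a single three-input logic instruction is plausible here: the bit hack from the Stanford page is `w ^= (-f ^ w) & m`, which is a purely bitwise function of three words (`w`, `m` and `-f`). The sketch below is not the gist's test kernel; the helper names `setOrClear` and `setOrClearLop3` are hypothetical. The immediate `0xB8` is derived with the PTX convention of evaluating the function on the constants `0xF0`, `0xCC` and `0xAA`: `0xF0 ^ ((0xAA ^ 0xF0) & 0xCC) = 0xB8`. Note that computing `-f` is a separate operation from the logic op itself.

```cuda
// Let the portable helper compile as plain C++ too (for host-side checks).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Portable form of the bit hack: for every bit set in mask m,
// set that bit in w if f is true, clear it if f is false.
__host__ __device__ inline unsigned setOrClear(unsigned w, unsigned m, bool f)
{
    unsigned negf = 0u - static_cast<unsigned>(f); // all ones if f, else zero
    return w ^ ((negf ^ w) & m);
}

#ifdef __CUDACC__
// Same function spelled as one explicit lop3.b32 via inline PTX (sm_50+).
// Operands: a = w, b = m, c = -f; immLut = 0xB8 as derived in the text above.
__device__ inline unsigned setOrClearLop3(unsigned w, unsigned m, bool f)
{
    unsigned negf = 0u - static_cast<unsigned>(f);
    unsigned d;
    asm("lop3.b32 %0, %1, %2, %3, 0xB8;"
        : "=r"(d)
        : "r"(w), "r"(m), "r"(negf));
    return d;
}
#endif
```

Whether the compiler actually folds the portable form into one `LOP3.LUT` is what the gist sets out to check; inspecting the SASS with `cuobjdump -sass` on the generated cubin is one way to see the result.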
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no -cubin lop3.cu"; -*-
#define KERNEL_QUALIFIERS extern "C" __global__
//
// Bit hack: "Conditionally set or clear bits without branching"
// http://graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching
//
// This bit hack *should* map to a single LOP3.LUT opcode:
//