Allan MacKinnon allanmac

  • Dispatch3 Inc.
  • South Florida, USA
  • X @pixelio
@allanmac
allanmac / ldg.cu
Created March 5, 2015 00:20
ld.global.nc (LDG.CI) operations not being generated when const+restrict pointers are within a const struct passed as a kernel argument
//
//
//
#define WARP_SIZE 32
#define KERNEL_QUALIFIERS extern "C" __global__
#define DEVICE_INTRINSIC_QUALIFIERS __device__ __forceinline__
#define RESTRICT __restrict__
//
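The situation described for ldg.cu can be sketched roughly as follows. This is not the gist's code: the struct `copy_args`, the three kernels, and the host helper `copy_variants_agree` are hypothetical names made up for illustration. The sketch contrasts const+restrict pointers passed directly, the same pointers wrapped in a const struct argument, and an explicit `__ldg()` load (available on sm_35 and later) as one way to request a read-only-cache load regardless of what the compiler infers.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define RESTRICT __restrict__

// Hypothetical argument struct: const + restrict pointers wrapped in a struct.
struct copy_args
{
  const float* const RESTRICT in;
  float*       const RESTRICT out;
};

// (a) Pointers passed directly as kernel arguments.
extern "C" __global__
void copy_direct(const float* const RESTRICT in, float* const RESTRICT out)
{
  const unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;
  out[gid] = in[gid];
}

// (b) The same pointers inside a const struct passed by value.
extern "C" __global__
void copy_struct(const copy_args args)
{
  const unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;
  args.out[gid] = args.in[gid];
}

// (c) Struct argument again, but the load is requested explicitly with __ldg().
extern "C" __global__
void copy_struct_ldg(const copy_args args)
{
  const unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;
  args.out[gid] = __ldg(args.in + gid);
}

// Hypothetical host helper: runs all three kernels on n floats and reports
// whether each one reproduced the input. n must be a multiple of 64.
bool copy_variants_agree(const int n)
{
  const size_t bytes = n * sizeof(float);
  std::vector<float> h_in(n), h_out(3 * n);
  for (int i = 0; i < n; i++) h_in[i] = (float)i;

  float *d_in, *d_out;
  cudaMalloc(&d_in,  bytes);
  cudaMalloc(&d_out, bytes);
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  const copy_args args = { d_in, d_out };

  // Clear the output before each launch so stale data cannot mask a failure.
  cudaMemset(d_out, 0, bytes);
  copy_direct<<<n / 64, 64>>>(d_in, d_out);
  cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

  cudaMemset(d_out, 0, bytes);
  copy_struct<<<n / 64, 64>>>(args);
  cudaMemcpy(h_out.data() + n, d_out, bytes, cudaMemcpyDeviceToHost);

  cudaMemset(d_out, 0, bytes);
  copy_struct_ldg<<<n / 64, 64>>>(args);
  cudaMemcpy(h_out.data() + 2 * n, d_out, bytes, cudaMemcpyDeviceToHost);

  cudaFree(d_in);
  cudaFree(d_out);

  bool ok = true;
  for (int i = 0; i < n; i++)
    ok = ok && h_out[i] == h_in[i] && h_out[n + i] == h_in[i] && h_out[2 * n + i] == h_in[i];
  return ok;
}
```

All three kernels compute the same result; the difference, if any, only shows up in the generated machine code. Dumping the SASS with `cuobjdump -sass` and comparing the load instructions in each kernel is one way to check which variants end up using the read-only path. The exact mnemonic depends on the target architecture and compiler version.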
@allanmac
allanmac / f16.cu
Last active August 29, 2015 14:13
Blind attempt at getting f16v2 ops to work... Totally untested.
// -*- compile-command: "nvcc -m 32 -arch sm_52 -Xptxas=-v,-abi=no -cubin f16.cu" ; -*-
#include <stdint.h>
//
//
//
#define KERNEL_QUALIFIERS __global__
#define KERNEL_QUALIFIERS_EXTERN extern KERNEL_QUALIFIERS
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no -cubin int_mul.cu" ; -*-
#include <stdint.h>
#include <cuda_fp16.h>
//
//
//
#define KERNEL_QUALIFIERS __global__
@allanmac
allanmac / scan_64.cu
Last active August 29, 2015 14:13
Signed 64-bit reduce-add using shuffles (untested)
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no -cubin scan_64.cu" ; -*-
#include <stdint.h>
//
//
//
#define KERNEL_QUALIFIERS __global__
#define KERNEL_QUALIFIERS_EXTERN extern KERNEL_QUALIFIERS
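A minimal sketch of a signed 64-bit warp reduce-add built on shuffles, written independently of the gist. It assumes CUDA 9 or later for `__shfl_down_sync` (code from 2015 would have used the older `__shfl_down`), and it splits each 64-bit value into two 32-bit halves to show the idea explicitly. The names `shfl_down_s64`, `warp_reduce_add_s64`, `reduce_kernel`, and `warp_sum_on_device` are hypothetical.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Shuffle a signed 64-bit value down by `delta` lanes as two 32-bit halves.
__device__ __forceinline__
int64_t shfl_down_s64(const int64_t v, const unsigned int delta)
{
  uint32_t lo = (uint32_t)v;
  uint32_t hi = (uint32_t)((uint64_t)v >> 32);

  lo = __shfl_down_sync(0xffffffffu, lo, delta);
  hi = __shfl_down_sync(0xffffffffu, hi, delta);

  return (int64_t)(((uint64_t)hi << 32) | lo);
}

// Tree reduction across a full warp. Only lane 0 ends up with the warp total.
__device__ __forceinline__
int64_t warp_reduce_add_s64(int64_t v)
{
  for (unsigned int delta = 16; delta > 0; delta >>= 1)
    v += shfl_down_s64(v, delta);
  return v;
}

// One warp: each lane loads one value, lane 0 stores the sum.
extern "C" __global__
void reduce_kernel(const int64_t* const in, int64_t* const out)
{
  const int64_t sum = warp_reduce_add_s64(in[threadIdx.x]);
  if (threadIdx.x == 0)
    *out = sum;
}

// Hypothetical host helper: sums exactly 32 values with a single warp.
int64_t warp_sum_on_device(const int64_t* const h_in)
{
  int64_t *d_in, *d_out, h_out = 0;
  cudaMalloc(&d_in,  32 * sizeof(int64_t));
  cudaMalloc(&d_out, sizeof(int64_t));
  cudaMemcpy(d_in, h_in, 32 * sizeof(int64_t), cudaMemcpyHostToDevice);

  reduce_kernel<<<1, 32>>>(d_in, d_out);

  cudaMemcpy(&h_out, d_out, sizeof(int64_t), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return h_out;
}
```

Recent toolkits also accept `long long` directly in `__shfl_down_sync`, so the manual split is not strictly required there; it is kept here because it makes the two 32-bit shuffles per step visible.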
@allanmac
allanmac / ipc.cu
Created January 5, 2015 16:14
Try to push a kernel's IPC metric as high as possible!
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no ipc.cu -o ipc" ; -*-
//
//
//
#define KERNEL_QUALIFIERS __global__
#define KERNEL_QUALIFIERS_EXTERN extern KERNEL_QUALIFIERS
#define KERNEL_QUALIFIERS_EXTERN_C extern "C" KERNEL_QUALIFIERS
@allanmac
allanmac / kth.cu
Last active August 29, 2015 14:12
Acquire the indices of all set bits in a 32-bit word. For Fermi and Kepler the indices will be captured from MSB to LSB. For Maxwell the indices are ordered from LSB to MSB.
// -*- compile-command: "nvcc -m 32 -arch compute_20 -Xptxas=-v,-abi=no -cubin kth.cu"; -*-
#include <stdio.h>
#include <stdint.h>
//
//
//
typedef uint8_t u8;
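The two orderings mentioned in the description can be illustrated with a simple sketch. This is not the gist's implementation and makes no claim about which instructions the author targeted on each architecture: it just uses the standard `__ffs()` and `__clz()` intrinsics to walk the set bits from either end. The function, kernel, and helper names are hypothetical.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Collect set-bit indices from LSB to MSB. Returns the number of set bits.
__device__ int set_bits_lsb_first(uint32_t w, uint8_t* const idx)
{
  int n = 0;
  while (w != 0) {
    idx[n++] = (uint8_t)(__ffs((int)w) - 1); // __ffs() is 1-based
    w &= w - 1;                              // clear the lowest set bit
  }
  return n;
}

// Collect set-bit indices from MSB to LSB. Returns the number of set bits.
__device__ int set_bits_msb_first(uint32_t w, uint8_t* const idx)
{
  int n = 0;
  while (w != 0) {
    const int msb = 31 - __clz((int)w);      // position of the highest set bit
    idx[n++] = (uint8_t)msb;
    w ^= 1u << msb;                          // clear the highest set bit
  }
  return n;
}

// Single-thread kernel that records both orderings.
extern "C" __global__
void set_bits_kernel(const uint32_t w,
                     uint8_t* const lsb_first,
                     uint8_t* const msb_first,
                     int*     const count)
{
  *count = set_bits_lsb_first(w, lsb_first);
  set_bits_msb_first(w, msb_first);
}

// Hypothetical host helper: fills both 32-entry arrays and returns the count.
int set_bit_indices(const uint32_t w, uint8_t* const lsb_first, uint8_t* const msb_first)
{
  uint8_t* d_idx;
  int*     d_count;
  int      count = 0;
  cudaMalloc(&d_idx, 64);
  cudaMalloc(&d_count, sizeof(int));
  cudaMemset(d_idx, 0, 64);

  set_bits_kernel<<<1, 1>>>(w, d_idx, d_idx + 32, d_count);

  cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
  cudaMemcpy(lsb_first, d_idx,      32, cudaMemcpyDeviceToHost);
  cudaMemcpy(msb_first, d_idx + 32, 32, cudaMemcpyDeviceToHost);
  cudaFree(d_idx);
  cudaFree(d_count);
  return count;
}
```

For the word `0x80000105` (bits 0, 2, 8, and 31 set), the first routine yields the indices in the order 0, 2, 8, 31 and the second in the order 31, 8, 2, 0.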
@allanmac
allanmac / xmad.cu
Created October 7, 2014 22:07
Try to generate XMAD instructions
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no -cubin xmad.cu"; -*-
//
//
//
#define KERNEL_QUALIFIERS extern "C" __global__
#define RESTRICT __restrict__
//
@allanmac
allanmac / int_ipc.cu
Last active August 29, 2015 14:06
Try to push an integer math kernel's IPC metric as high as possible!
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no int_ipc.cu -o int_ipc" ; -*-
//
//
//
#define KERNEL_QUALIFIERS __global__
#define KERNEL_QUALIFIERS_EXTERN extern KERNEL_QUALIFIERS
#define KERNEL_QUALIFIERS_EXTERN_C extern "C" KERNEL_QUALIFIERS
@allanmac
allanmac / tk1_gpu_max_clock.sh
Last active August 14, 2017 13:25
Some login tweaks for L4T 19.3 to disable USB autosuspend and lock the GPU to max MHz. I have lines 3-11 of tk1_tweaks.sh appended to my .profile.
#!/bin/bash
#
# Lock GPU and MEM clocks to max MHz
#
# - benchmarking shows there is no need to override the MEM clock
#
echo
echo Locking GPU clock to max MHz for debugging porpoises...
# set to max
@allanmac
allanmac / lop3.cu
Last active March 25, 2020 23:54
Test to see if the bit hack "Conditionally set or clear bits without branching" maps to a single Maxwell LOP3.LUT opcode
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no -cubin lop3.cu"; -*-
#define KERNEL_QUALIFIERS extern "C" __global__
//
// Bit hack: "Conditionally set or clear bits without branching"
// http://graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching
//
// This bit hack *should* map to a single LOP3.LUT opcode:
//
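For reference, here is a small sketch of the bit hack itself, written from the formula on the Stanford page rather than taken from the rest of this gist. The function and kernel names are hypothetical. The sketch also works out the 8-bit truth table that a three-input logic instruction would need for this function, assuming the PTX `lop3.b32` convention in which the three inputs are represented by 0xF0, 0xCC, and 0xAA.

```cuda
#include <cstdint>

// Bit hack form: w ^= (-f ^ w) & m
// Sets the bits selected by m if f is true, clears them if f is false.
__host__ __device__ inline
uint32_t cond_set_or_clear(const uint32_t w, const uint32_t m, const bool f)
{
  const uint32_t neg_f = 0u - (uint32_t)f;   // all ones if f, else zero
  return w ^ ((neg_f ^ w) & m);
}

// Equivalent "mux" form: take neg_f where m is set, w elsewhere.
__host__ __device__ inline
uint32_t cond_set_or_clear_mux(const uint32_t w, const uint32_t m, const bool f)
{
  const uint32_t neg_f = 0u - (uint32_t)f;
  return (w & ~m) | (neg_f & m);
}

// Truth table for (a & ~c) | (b & c) with a = w, b = -f, c = m,
// using the PTX lop3 convention a = 0xF0, b = 0xCC, c = 0xAA.
constexpr unsigned int LOP3_LUT = ((0xF0 & ~0xAA) | (0xCC & 0xAA)) & 0xFF;
static_assert(LOP3_LUT == 0xD8, "mux of (w, -f) selected by m");

// Kernel wrapper so the generated machine code can be inspected.
extern "C" __global__
void lop3_kernel(uint32_t* const out, const uint32_t w, const uint32_t m, const int f)
{
  *out = cond_set_or_clear(w, m, f != 0);
}
```

Whether the compiler actually folds the expression into one LOP3.LUT depends on the compiler version and target. Building with `-cubin` as in the compile command above and dumping the result with `cuobjdump -sass` shows what was emitted for `lop3_kernel`.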