Allan MacKinnon allanmac

  • Dispatch3 Inc.
  • South Florida, USA
  • X @pixelio
@allanmac
allanmac / assert_cuda.c
Last active April 11, 2026 23:42
A tiny example of CUDA + OpenGL interop with write-only surfaces and CUDA kernels. Uses GLFW+GLAD.
//
//
//
#include <stdlib.h>
#include <stdio.h>
//
//
//
allanmac / vulkaninfo_S905X5M_Mali_G310.txt
Created October 19, 2025 16:36
vulkaninfo ODROID C5 S905X5M Mali G310
==========
VULKANINFO
==========
Vulkan Instance Version: 1.4.328
Instance Extensions: count = 14
===============================
VK_EXT_debug_report : extension revision 10
allanmac / vkpeak_S905X5M_Mali_G310.txt
Created October 19, 2025 16:34
vkpeak ODROID C5 S905X5M Mali G310
device = Mali-G310
fp32-scalar = 52.58 GFLOPS
fp32-vec4 = 44.02 GFLOPS
fp16-scalar = 52.00 GFLOPS
fp16-vec4 = 99.07 GFLOPS
fp16-matrix = 0.00 GFLOPS
fp64-scalar = 0.00 GFLOPS
allanmac / vulkaninfo_Mali_G610.txt
Last active September 9, 2025 01:43
Radxa Rock 5B (Rockchip 3588) Vulkan vulkaninfo
Captured on a Weston Wayland compositor.
---------------------------
vulkanCapsViewer data here: https://vulkan.gpuinfo.org/displayreport.php?id=41734
---------------------------
$ vulkaninfo
arm_release_ver: g24p0-00eac0, rk_so_ver: 10
allanmac / vkpeak_RK3588_Mali_G610.txt
Last active September 8, 2025 23:46
Radxa Rock 5B (Rockchip 3588) Vulkan vkpeak benchmark
$ ./vkpeak 0
arm_release_ver: g24p0-00eac0, rk_so_ver: 10
device = Mali-G610
arm_release_ver: g24p0-00eac0, rk_so_ver: 10
fp32-scalar = 467.75 GFLOPS
fp32-vec4 = 496.48 GFLOPS
fp16-scalar = 470.50 GFLOPS
fp16-vec4 = 977.37 GFLOPS
allanmac / ck_2.cu
Last active July 13, 2025 20:24
Concurrent kernel test that demonstrates _different_ kernels running concurrently. Hacked from NVIDIA's example. ck_2.cu has two kernels, each requiring half of an sm_50 multiprocessor's shared memory. Kernel "a" is launched on 5 out of every 6 launches; kernel "b" is launched on the remaining one. ck_6.cu has six kernels.
/*
* Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
*
* Please refer to the NVIDIA end user license agreement (EULA) associated
* with this source code for terms and conditions that govern your use of
* this software. Any use, reproduction, disclosure, or distribution of
* this software and related documentation outside the terms of the EULA
* is strictly prohibited.
*
*/
allanmac / sha256.cu
Last active May 2, 2025 07:10
A CUDA SHA-256 subroutine using macro expansion
// -*- compile-command: "nvcc -m 32 -arch sm_35 -Xptxas=-v,-abi=no -cubin sha256.cu"; -*-
//
// Copyright 2013 Allan MacKinnon <allanmac@alum.mit.edu>
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
allanmac / sort.cu
Last active January 19, 2025 02:53
Thrust Radix Sort benchmark
// -*- compile-command: "nvcc -D THRUST_SORT_TYPE=uint32_t -arch sm_50 -o sort sort_32.cu"; -*-
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>
allanmac / shflrot.cu
Last active September 21, 2023 22:59
Experiments with shfl.idx/up/down to see how negative indices or offsets are handled. The shuffled value and its predicate are returned.
#include <stdio.h>
//
//
//
#define WARP_SIZE 32
#define KERNEL_QUALIFIERS extern "C" __global__
#define DEVICE_INTRINSIC_QUALIFIERS __device__ __forceinline__
allanmac / README.md
Last active June 10, 2023 11:11
Macros for neatly error checking OpenCL API functions.

Simply adding two parentheses, cl(...), gives you error checking for OpenCL API functions that return a cl_int error code.

The second macro, cl_ok(err), is for error checking API functions that report their error code through an argument.

The header also includes a useful function for converting OpenCL errors to strings:

char const * clGetErrorString(cl_int const err);
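
As a rough illustration of how two such macros could be built, here is a minimal C sketch. It is not the gist's actual header: `demo_cl_int`, `DEMO_CL_SUCCESS`, `demo_cl_assert`, `clDemoQuery` and `clDemoCreate` are hypothetical stand-ins so the sketch builds without the OpenCL headers, and the assumption that `cl(...)` pastes the `cl` prefix onto the wrapped call is a guess based on the description above. The checker here reports and returns the code instead of aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins (assumptions) so this builds without <CL/cl.h>.
   Real code would use cl_int and CL_SUCCESS. */
typedef int demo_cl_int;
#define DEMO_CL_SUCCESS 0

/* Hypothetical checker: prints file and line on failure,
   optionally exits, and returns the code unchanged. */
static demo_cl_int demo_cl_assert(demo_cl_int code, char const *file,
                                  int line, bool abort_on_error)
{
  if (code != DEMO_CL_SUCCESS) {
    fprintf(stderr, "\"%s\", line %d: error %d\n", file, line, code);
    if (abort_on_error)
      exit(EXIT_FAILURE);
  }
  return code;
}

/* cl(Foo(args)) expands to a checked call of clFoo(args):
   the ## operator pastes "cl" onto the first token of the argument. */
#define cl(...)    demo_cl_assert((cl##__VA_ARGS__), __FILE__, __LINE__, false)

/* cl_ok(err) checks a code that a call wrote through an out-parameter. */
#define cl_ok(err) demo_cl_assert((err), __FILE__, __LINE__, false)

/* Hypothetical API that returns its status directly. */
static demo_cl_int clDemoQuery(int value, int *out)
{
  if (out == NULL)
    return -1; /* arbitrary nonzero error code */
  *out = value * 2;
  return DEMO_CL_SUCCESS;
}

/* Hypothetical API that returns a handle and reports status via an argument. */
static int clDemoCreate(int size, demo_cl_int *err)
{
  *err = (size > 0) ? DEMO_CL_SUCCESS : -2; /* arbitrary nonzero error code */
  return (size > 0) ? size : 0;
}
```

With these definitions, a call such as `clDemoQuery(21, &out)` becomes `cl(DemoQuery(21, &out))`, and a handle-returning call is followed by `cl_ok(err)`. Passing `true` as the last argument of the checker would make failures fatal, which is a common choice for small test programs.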