Allan MacKinnon allanmac

  • Dispatch3 Inc.
  • South Florida, USA
  • X @pixelio
@allanmac
allanmac / cub_sort.cu
Last active June 17, 2022 17:29
Benchmark CUB Radix Sort with uniformly random data
//
// Build:
//
// nvcc -lcurand --generate-code arch=compute_50,code=compute_50 --generate-code arch=compute_75,code=compute_75 -D CUB_SORT_TYPE=uint32_t -o sort_cub_32 cub_sort.cu
// nvcc -lcurand --generate-code arch=compute_50,code=compute_50 --generate-code arch=compute_75,code=compute_75 -D CUB_SORT_TYPE=uint64_t -o sort_cub_64 cub_sort.cu
//
#define THRUST_IGNORE_CUB_VERSION_CHECK
#include <curand.h>
@allanmac
allanmac / sort.cu
Last active August 15, 2018 16:54
CUB Radix Sort benchmark
// -*- compile-command: "nvcc -I ../cub-1.8.0 -lcurand -arch sm_50 -o sort sort.cu"; -*-
#include <curand.h>
#include <cub/cub.cuh>
//
//
//
#include <stdbool.h>
@allanmac
allanmac / warp_scan.cu
Created August 6, 2016 18:48
Inclusive vs. exclusive warp scan
#include <stdio.h>
#include <stdint.h>
#define WARP_SIZE 32
//
//
//
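The difference between the two scans named in this gist's description can be sketched on the host. The following is a plain C simulation of one 32-lane warp, not the gist's own code: the function names `warp_scan_inclusive` and `warp_scan_exclusive` are hypothetical, and the inner loop only stands in for what a shuffle such as `__shfl_up_sync` would do on a GPU.

```c
#include <stdint.h>
#include <string.h>

#define WARP_SIZE 32

/* Inclusive scan (Hillis-Steele): after the call, lane i holds the sum of
 * the original values of lanes 0..i. Each pass adds the value held by the
 * lane `d` positions lower, which is the role a warp shuffle plays on a GPU. */
static void warp_scan_inclusive(uint32_t v[WARP_SIZE])
{
  for (int d = 1; d < WARP_SIZE; d *= 2) {
    uint32_t prev[WARP_SIZE];
    memcpy(prev, v, sizeof(prev)); /* snapshot: all lanes read before any lane writes */
    for (int lane = d; lane < WARP_SIZE; lane++)
      v[lane] = prev[lane] + prev[lane - d];
  }
}

/* Exclusive scan: lane i holds the sum of the original values of lanes
 * 0..i-1, so lane 0 holds 0. For addition this is the inclusive result
 * minus the lane's own input. */
static void warp_scan_exclusive(uint32_t v[WARP_SIZE])
{
  uint32_t in[WARP_SIZE];
  memcpy(in, v, sizeof(in));
  warp_scan_inclusive(v);
  for (int lane = 0; lane < WARP_SIZE; lane++)
    v[lane] -= in[lane];
}
```

With every lane set to 1, the inclusive scan leaves lane i holding i + 1 and the exclusive scan leaves it holding i. Subtracting the input only works for invertible operators such as addition; for max or min, an exclusive scan would instead shift the inclusive result up by one lane.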
@allanmac
allanmac / README.md
Last active June 10, 2023 11:11
Macros for neatly error checking OpenCL API functions.

Wrapping a call in two parentheses, cl(...), gives you error checking for OpenCL API functions that return a cl_int error code.

The second macro, cl_ok(err), is for error checking API functions that return their error code through an argument.

The header also includes a useful function for converting OpenCL errors to strings:

char const * clGetErrorString(cl_int const err);
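
One plausible shape for these macros is sketched below; it is an assumption, not the header from this gist. To compile without an OpenCL SDK, the sketch declares stand-ins for `cl_int` and `CL_SUCCESS`, and `clFakeCreate` and `clFakeRelease` are hypothetical functions that merely mimic the two calling conventions. The helper `cl_assert` and its `abort_on_error` parameter are also assumptions, and the string table covers only three codes.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds without <CL/cl.h> (assumption). */
typedef int cl_int;
#define CL_SUCCESS 0

/* Hypothetical call that returns its error code directly. */
static cl_int clFakeRelease(int handle) { return handle >= 0 ? CL_SUCCESS : -1; }

/* Hypothetical call that reports its error code through an argument. */
static int clFakeCreate(int size, cl_int *err)
{
  *err = size > 0 ? CL_SUCCESS : -30;
  return size > 0 ? 1 : -1;
}

/* Deliberately partial error-to-string table. */
char const *clGetErrorString(cl_int const err)
{
  switch (err) {
  case 0:   return "CL_SUCCESS";
  case -1:  return "CL_DEVICE_NOT_FOUND";
  case -30: return "CL_INVALID_VALUE";
  default:  return "UNKNOWN OPENCL ERROR";
  }
}

/* Report a failing code with its source location; optionally abort. */
static cl_int cl_assert(cl_int code, char const *file, int line, bool abort_on_error)
{
  if (code != CL_SUCCESS) {
    fprintf(stderr, "\"%s\", line %d: %s (%d)\n", file, line,
            clGetErrorString(code), code);
    if (abort_on_error)
      exit(EXIT_FAILURE);
  }
  return code;
}

/* cl(FakeRelease(h)) pastes the "cl" prefix back on: clFakeRelease(h). */
#define cl(...)    cl_assert((cl##__VA_ARGS__), __FILE__, __LINE__, true)
#define cl_ok(err) cl_assert((err), __FILE__, __LINE__, true)
```

Under these assumptions, `cl(FakeRelease(h));` checks a call that returns its code, while `h = clFakeCreate(16, &err); cl_ok(err);` checks one that writes its code to an argument. Token pasting keeps the call site short, at the cost of making the real function name harder to search for.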
View unit16v2.cu
// -*- compile-command: "nvcc -arch sm_50 -Xptxas=-v -use_fast_math unit16v2.cu -o unit16v2"; -*-
#include <stdio.h>
#include <stdint.h>
//
//
//
#define WARP_SIZE 32
@allanmac
allanmac / ck_2.cu
Last active May 19, 2022 03:08
Concurrent kernel test that demonstrates _different_ kernels running concurrently. Hacked from NVIDIA's example. ck_2.cu has two kernels each requiring half of an sm_50 multiprocessor's shared memory. Kernel "a" is run on 5 out of 6 launches, otherwise kernel "b" is launched. ck_6.cu has six kernels.
/*
* Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
*
* Please refer to the NVIDIA end user license agreement (EULA) associated
* with this source code for terms and conditions that govern your use of
* this software. Any use, reproduction, disclosure, or distribution of
* this software and related documentation outside the terms of the EULA
* is strictly prohibited.
*
*/
View alpha_gamma_test.svg
@allanmac
allanmac / sort.cu
Last active May 30, 2021 14:06
Thrust Radix Sort benchmark
// -*- compile-command: "nvcc -D THRUST_SORT_TYPE=uint32_t -arch sm_50 -o sort sort_32.cu"; -*-
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>
@allanmac
allanmac / probe_bw.cu
Last active October 10, 2019 15:27
Measure achieved bandwidth when performing 128, 256 or 512 byte transactions on a multi-megabyte extent. This appears to reproduce @Genoil's original findings: https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7
// -*- compile-command: "nvcc -m 64 -arch compute_30 -Xptxas=-v -o probe_bw probe_bw.cu"; -*-
//
// Copyright 2015 Allan MacKinnon <allanmac@alum.mit.edu>
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
@allanmac
allanmac / assert_cuda.c
Last active April 16, 2023 18:42
A tiny example of CUDA + OpenGL interop with write-only surfaces and CUDA kernels. Uses GLFW+GLAD.
//
//
//
#include <stdlib.h>
#include <stdio.h>
//
//
//