Allan MacKinnon allanmac

Block or report user

Report or block allanmac

Hide content and notifications from this user.

Learn more about blocking users

Contact Support about this user’s behavior.

Learn more about reporting abuse

Report abuse
View GitHub Profile
@allanmac
allanmac / sort.cu
Last active Aug 15, 2018
CUB Radix Sort benchmark
View sort.cu
// -*- compile-command: "nvcc -I ../cub-1.8.0 -lcurand -arch sm_50 -o sort sort.cu"; -*-
#include <curand.h>
#include <cub/cub.cuh>
//
//
//
#include <stdbool.h>
@allanmac
allanmac / warp_scan.cu
Created Aug 6, 2016
Inclusive vs. exclusive warp scan
View warp_scan.cu
#include <stdio.h>
#include <stdint.h>
#define WARP_SIZE 32
//
//
//
@allanmac
allanmac / README.md
Last active Dec 16, 2018
Macros for neatly error checking OpenCL API functions.
View README.md

Wrapping a call in the cl(...) macro, which adds only two parentheses, gives you error checking for OpenCL API functions that return a cl_int error code.

The second macro, cl_ok(err), is for error checking API functions that return their error code through a pointer argument.

The header also includes a useful function for converting OpenCL errors to strings:

char const * clGetErrorString(cl_int const err);
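
The README excerpt does not show the macro definitions, so the following is only a minimal sketch of how such a pair of macros could be written. It assumes that "adding two parentheses" means a call like `clFoo(args)` becomes `cl(Foo(args))`, with the macro pasting the `cl` prefix back on. The checker `cl_check`, the counter `cl_failures`, and the API functions `clFakeQuery` and `clFakeCreate` are hypothetical stand-ins, as are the `cl_int` and `CL_SUCCESS` definitions, which let the sketch compile without an OpenCL SDK. The actual header may differ, for example by aborting on error and printing the result of `clGetErrorString`.

```c
#include <stdio.h>

/* Stand-ins so the sketch builds without an OpenCL SDK (assumption:
   real code would include <CL/cl.h> and drop these two lines). */
typedef int cl_int;
#define CL_SUCCESS 0

/* Hypothetical checker: reports and counts failures instead of aborting. */
static int cl_failures = 0;

static cl_int cl_check(cl_int err, char const *file, int line)
{
  if (err != CL_SUCCESS) {
    cl_failures++;
    fprintf(stderr, "OpenCL error %d at %s:%d\n", (int)err, file, line);
  }
  return err;
}

/* cl(Foo(args)) pastes the prefix back on and checks clFoo(args). */
#define cl(...)    cl_check((cl##__VA_ARGS__), __FILE__, __LINE__)

/* cl_ok(err) checks a code that an API call wrote through a pointer. */
#define cl_ok(err) cl_check((err), __FILE__, __LINE__)

/* Hypothetical API functions, NOT part of OpenCL. */
static cl_int clFakeQuery(int fail)
{
  return fail ? -5 : CL_SUCCESS;   /* error code is the return value */
}

static int clFakeCreate(int fail, cl_int *err)
{
  *err = fail ? -6 : CL_SUCCESS;   /* error code is an out-parameter */
  return fail ? 0 : 42;            /* pretend handle */
}

/* Exercises both macros and returns the number of failures seen. */
int cl_demo(void)
{
  cl_int err;
  int    handle;

  cl_failures = 0;

  cl(FakeQuery(0));                /* succeeds, nothing reported */
  cl(FakeQuery(1));                /* fails, reported once       */

  handle = clFakeCreate(0, &err);
  cl_ok(err);                      /* succeeds, nothing reported */
  (void)handle;

  return cl_failures;
}
```

Counting failures rather than exiting keeps the sketch easy to exercise in isolation. A production checker would more likely terminate or propagate the error.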
View unit16v2.cu
// -*- compile-command: "nvcc -arch sm_50 -Xptxas=-v -use_fast_math unit16v2.cu -o unit16v2"; -*-
#include <stdio.h>
#include <stdint.h>
//
//
//
#define WARP_SIZE 32
@allanmac
allanmac / ck_2.cu
Last active Oct 19, 2019
Concurrent kernel test that demonstrates _different_ kernels running concurrently. Hacked from NVIDIA's example. ck_2.cu has two kernels, each requiring half of an sm_50 multiprocessor's shared memory. Kernel "a" is launched on 5 out of every 6 launches; kernel "b" is launched on the remaining one. ck_6.cu has six kernels.
View ck_2.cu
/*
* Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
*
* Please refer to the NVIDIA end user license agreement (EULA) associated
* with this source code for terms and conditions that govern your use of
* this software. Any use, reproduction, disclosure, or distribution of
* this software and related documentation outside the terms of the EULA
* is strictly prohibited.
*
*/
View alpha_gamma_test.svg
@allanmac
allanmac / sort.cu
Last active Aug 28, 2019
Thrust Radix Sort benchmark
View sort.cu
// -*- compile-command: "nvcc -D THRUST_SORT_TYPE=uint32_t -arch sm_50 -o sort sort_32.cu"; -*-
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>
@allanmac
allanmac / probe_bw.cu
Last active Oct 10, 2019
Measure achieved bandwidth when performing 128, 256 or 512 byte transactions on a multi-megabyte extent. This appears to reproduce @Genoil's original findings: https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7
View probe_bw.cu
// -*- compile-command: "nvcc -m 64 -arch compute_30 -Xptxas=-v -o probe_bw probe_bw.cu"; -*-
//
// Copyright 2015 Allan MacKinnon <allanmac@alum.mit.edu>
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
@allanmac
allanmac / assert_cuda.c
Last active Jul 22, 2019
A tiny example of CUDA + OpenGL interop with write-only surfaces and CUDA kernels. Uses GLFW+GLAD.
View assert_cuda.c
//
//
//
#include <stdlib.h>
#include <stdio.h>
//
//
//
@allanmac
allanmac / ldg.cu
Created Mar 5, 2015
ld.global.nc (LDG.CI) operations not being generated when const+restrict pointers are within a const struct passed as a kernel argument
View ldg.cu
//
//
//
#define WARP_SIZE 32
#define KERNEL_QUALIFIERS extern "C" __global__
#define DEVICE_INTRINSIC_QUALIFIERS __device__ __forceinline__
#define RESTRICT __restrict__
//