Allan MacKinnon allanmac

  • Dispatch3 Inc.
  • South Florida, USA
  • X @pixelio
__global__
void fmaTest(float* const values)
{
  const unsigned int tidx = threadIdx.x;
  const float b = values[  tidx];
  float       a = values[2*tidx];
  // Fused multiply-add, round-to-nearest-even: a*b + c with a single rounding.
  a = __fmaf_rn(a, b, 0.73f);
  a = __fmaf_rn(a, b, 0.37f);
allanmac / fmuladd.cu
Last active December 10, 2015 08:38
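__global__
void fmuladdTest(float* const values)
{
  const unsigned int tidx = threadIdx.x;
  const float b = values[  tidx];
  float       a = values[2*tidx];
  // Separate multiply and add, each rounded to nearest-even; these
  // intrinsics are never contracted into a single FMA by the compiler.
  a = __fmul_rn(a, b);
  a = __fadd_rn(a, 0.73f);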
allanmac / threadedCode.cu
Created December 9, 2012 05:42
A primitive example of threaded code in CUDA.
#include <stdio.h>
//
//
//
#define LAUNCH_BOUNDS // __launch_bounds__(512)
#define DEVICE_FUNCTION_QUALIFIERS __device__