Oscar Barenys oscarbg

oscarbg / gist:fac8a36539e82ab5f97e
Created January 17, 2015 15:12
Trying to access fp16x2 on Maxwell on CUDA 7.0
// Compiles cleanly until ptxas, which reports something about invalid arguments, so at least the f16x2-modified atom instruction is recognized?
u32
atomf16x2(u32 a, u32 b)
{
u32 d;
asm("atom.global.add.f16x2 %0, [%1], %2;" : "=r"(d) : "r"(a), "r"(b));
//atom.global.add.u32 %r5, [%rd2], 10;
//asm("mul.wide.s16 %0, %1, %2;" : "=r"(d) : "h"(a), "h"(b));
oscarbg / int_mul.cu
Last active August 29, 2015 14:13 — forked from allanmac/int_mul.cu
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no -cubin int_mul.cu" ; -*-
#include <stdint.h>
//
//
//
#define KERNEL_QUALIFIERS __global__
#define KERNEL_QUALIFIERS_EXTERN extern KERNEL_QUALIFIERS