jbarczak
View gist:04e4ab898edf2e335917
Microsoft compiler appears to ignore prefetches inside a loop.
Tested this on MSVC 2013 Express Edition. Microsoft's Connect site says I am not authorized to submit feedback, for who knows what reason, or else I'd send it there directly.
Code I used:
void Foo( char* p, int* q )
{
for( size_t i=0; i<8; i++ )
jbarczak / PrefixSum
Last active Aug 29, 2015
Prefix sum improvements suggested by ryg
View PrefixSum
static void __fastcall ReorderRays( StackFrame& frame, size_t nGroups )
{
RayPacket** pPackets = frame.pActivePackets;
uint32 pIDs[MAX_TRACER_SIZE];
size_t nHitLoc = 0;
size_t nMissLoc = 8*nGroups;
const char* pRays = (const char*) frame.pRays;
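The preview above is cut off, but it shows two cursors: `nHitLoc` starting at 0 and `nMissLoc` starting at `8*nGroups`. The following is a minimal scalar sketch of that two-cursor idea, not the author's code: hit IDs fill an array from the front and miss IDs fill it from the back. The names `PartitionRayIDs` and `BoundarySplitsPacket` are hypothetical, and the 8-ray packet width is an assumption taken from the `8*nGroups` expression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar stand-in for a hit/miss reorder.
// Hits are written from the front of 'ids', misses from the back.
// Returns the number of hits.
std::size_t PartitionRayIDs(const std::vector<bool>& hit,
                            std::vector<std::uint32_t>& ids)
{
    ids.resize(hit.size());
    std::size_t hitLoc  = 0;           // next free slot at the front
    std::size_t missLoc = hit.size();  // one past the next free slot at the back

    for (std::size_t i = 0; i < hit.size(); ++i)
    {
        if (hit[i])
            ids[hitLoc++] = static_cast<std::uint32_t>(i);
        else
            ids[--missLoc] = static_cast<std::uint32_t>(i);
    }

    // Every slot has been written exactly once, so the cursors meet.
    assert(hitLoc == missLoc);
    return hitLoc;
}

// Assuming 8 rays per packet and a ray count that is a multiple of 8,
// the packet containing the hit/miss boundary is mixed whenever the
// hit count is not a multiple of 8.
bool BoundarySplitsPacket(std::size_t nHits)
{
    return nHits % 8 != 0;
}
```

In this sketch the misses come out in reverse order, which is harmless if only the hit/miss grouping matters.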
jbarczak / Reorder_with_shuffle_LUT
Created Jun 14, 2015
ray reordering with shuffle lut
View Reorder_with_shuffle_LUT
// Tried this, and it was marginally slower
//
// Some notes about this:
// 1. Separate hit/miss arrays force me to use a lot more stack than I did before, and
// probably don't use the cache quite as well.
// 2. The prefetching of the rays doesn't fit in quite as neatly, and doesn't help anymore if I stick it in there.
// It might make more sense to move that elsewhere anyway.
// 3. LUT is 256 bytes. Not too bad, but it's probably knocking a few rays out of the cache.
// 4. Reordering can produce at least one packet that is partially miss and partially hit.
View Reorder_with_lut_again
static const __m128i SHUFFLE_TABLE[16] = {
_mm_setr_epi8(12,13,14,15, 8, 9,10,11, 4, 5, 6, 7, 0, 1, 2, 3),
_mm_setr_epi8( 0, 1, 2, 3,12,13,14,15, 8, 9,10,11, 4, 5, 6, 7),
_mm_setr_epi8( 4, 5, 6, 7,12,13,14,15, 8, 9,10,11, 0, 1, 2, 3),
_mm_setr_epi8( 0, 1, 2, 3, 4, 5, 6, 7,12,13,14,15, 8, 9,10,11),
_mm_setr_epi8( 8, 9,10,11,12,13,14,15, 4, 5, 6, 7, 0, 1, 2, 3),
_mm_setr_epi8( 0, 1, 2, 3, 8, 9,10,11,12,13,14,15, 4, 5, 6, 7),
_mm_setr_epi8( 4, 5, 6, 7, 8, 9,10,11,12,13,14,15, 0, 1, 2, 3),
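The seven rows visible above follow a regular pattern: the table index reads as a 4-bit hit mask, hit lanes are packed to the front in ascending order, and miss lanes fill from the back. The sketch below regenerates entries under that reading. It is inferred only from the rows shown, the remaining nine entries are not visible here, and `MakeShuffleEntry` is a hypothetical name. It produces plain byte indices rather than `__m128i` values so that it stays portable.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical generator for one shuffle-table entry.
// Bit 'lane' of hitMask is set when that 32-bit lane is a hit.
// Returns 16 byte indices in the order they would be passed to _mm_setr_epi8.
std::array<std::uint8_t, 16> MakeShuffleEntry(unsigned hitMask)
{
    assert(hitMask < 16);

    int order[4];          // order[slot] = source lane that lands in 'slot'
    int front = 0;         // next slot for a hit lane
    int back  = 3;         // next slot for a miss lane
    for (int lane = 0; lane < 4; ++lane)
    {
        if (hitMask & (1u << lane))
            order[front++] = lane;
        else
            order[back--] = lane;
    }

    // Expand each 32-bit lane index into its four byte indices.
    std::array<std::uint8_t, 16> bytes{};
    for (int slot = 0; slot < 4; ++slot)
        for (int b = 0; b < 4; ++b)
            bytes[4 * slot + b] = static_cast<std::uint8_t>(4 * order[slot] + b);
    return bytes;
}
```

Sixteen entries of 16 bytes each give a 256-byte table.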
View Tokenizer_Comparison.cpp
#include <string>
#include <fstream>
#include <istream>
#include <sstream>
#include <boost/tokenizer.hpp>
#include <boost/timer/timer.hpp>
using namespace std;
View gist:26db1e78c893afb84c3cc9b50f763ac5
Results for Sebastian Aaltonen's buffer tester https://github.com/sebbbi/perftest
From Intel Haswell GT2 (i3-4010U). It was necessary to reduce the threadgroup count to 64x64, down from 1024, or the test would TDR.
Load R8 invariant: 2.106ms
Load R8 linear: 13.438ms
Load R8 random: 6.053ms
Load RG8 invariant: 2.105ms
Load RG8 linear: 12.763ms
Load RG8 random: 6.229ms
Load RGBA8 invariant: 2.105ms