Syoyo Fujita syoyo

💗
ray tracing
View GitHub Profile
@syoyo
syoyo / gist:ef68a9c5b46b040e88db
Created June 12, 2015 04:24
exp() approximate function on Epiphany.
The clock cycle count for "expf()" (reference) is 141645.
The clock cycle count for "expapprox()" is 74.
The clock cycle count for "expapprox4()" is 127 (/4 = 31).
// GCC
#define RESTRICT __restrict__
// Disabling the range check makes evaluation of exp() faster.
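The preview stops before the body of expapprox(). To illustrate the general idea behind this kind of approximation (range reduction followed by a short polynomial, with no range check), here is a minimal sketch. It is not the code from the gist or from the blog post it cites: the name exp_sketch and the degree-5 Taylor polynomial are assumptions chosen for readability, not for speed on Epiphany.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT the gist's expapprox().
// Computes exp(x) as 2^n * exp(r), where n = round(x * log2(e)) and
// r is the remainder scaled back to base e, so |r| <= ln(2) / 2.
// No range check: n + 127 must be a valid normal-float exponent (1..254).
inline float exp_sketch(float x) {
  const float kLog2e = 1.44269504f;  // log2(e)
  const float kLn2 = 0.69314718f;    // ln(2)

  float t = x * kLog2e;
  float n = std::floor(t + 0.5f);    // nearest integer to t
  float r = (t - n) * kLn2;          // reduced argument

  // Degree-5 Taylor polynomial of exp(r), evaluated with Horner's scheme.
  float p = 1.0f + r * (1.0f + r * (0.5f + r * (1.0f / 6.0f +
            r * (1.0f / 24.0f + r * (1.0f / 120.0f)))));

  // Build the float 2^n by writing the biased exponent into bits 23..30.
  uint32_t bits = static_cast<uint32_t>(static_cast<int32_t>(n) + 127) << 23;
  float scale;
  std::memcpy(&scale, &bits, sizeof(scale));
  return p * scale;
}
```

With the reduced argument bounded by ln(2)/2, the truncation error of the polynomial is on the order of 1e-6 relative, which is usually acceptable where an approximate exp() is used at all. Inputs far outside roughly [-87, 88] produce garbage because the exponent is not clamped.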
syoyo / gist:9484d27be95e3303789b
Created June 8, 2015 14:15
expapprox() compiled for Parallella Epiphany
// Code from http://gallium.inria.fr/blog/fast-vectorizable-math-approx/
$ e-gcc compile options: -O3 -mno-soft-cmpsf -mcmove -mfp-mode=truncate
// 73 clocks
00000f40 <_expapprox>:
f40: 200b 0002 mov r1,0x0
f44: 476b 0aa2 mov r2,0xaa3b
f48: 470b 14b2 movt r2,0x4b38
f4c: 2fcb 14e2 movt r1,0x4e7e
syoyo / gist:24d0bf30dd2a9b5b2b69
Last active March 26, 2016 08:26
Intel C compiler bug or ill-defined C++?
// Is writing (&x)[i] in operator[] safe/correct C++ code or not?
// The following is a reduced code fragment (so it will not work by just copy & paste) that the Intel C++ compiler (ver 13 and 15) miscompiles (Release build only) when a real3 object is accessed through operator[] inside an OpenMP loop.
// clang and gcc compile and run it correctly.
typedef float real;
struct real3 {
real3() {}
real3(real xx, real yy, real zz) {
x = xx;
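The preview is cut off before operator[] appears. For comparison, here is a sketch of an accessor that avoids the (&x)[i] pattern entirely. As far as I understand the standard, indexing past a single non-array member with pointer arithmetic is undefined behavior even when the members happen to be laid out contiguously, so a compiler is allowed to miscompile it. The struct name real3_sketch and its layout are assumptions for illustration, not the gist's full code.

```cpp
typedef float real;

// Hypothetical variant of real3; the name real3_sketch is not from the gist.
struct real3_sketch {
  real3_sketch() : x(0), y(0), z(0) {}
  real3_sketch(real xx, real yy, real zz) : x(xx), y(yy), z(zz) {}

  // Select the member explicitly instead of doing pointer arithmetic on &x.
  // Any index other than 0 or 1 maps to z; no bounds check is performed.
  real operator[](int i) const { return i == 0 ? x : (i == 1 ? y : z); }
  real &operator[](int i) { return i == 0 ? x : (i == 1 ? y : z); }

  real x, y, z;
};
```

Compilers typically turn the conditional into a branch-free select or a small jump, so the cost over the pointer trick is usually small. Storing the components in a real v[3] array is another well-defined option.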
syoyo / gist:8399386
Created January 13, 2014 12:21
pocl CLinfo.
CL_DEVICE_NAME: pthread-Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
CL_DEVICE_VENDOR:
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 1.2 pocl
CL_DRIVER_VERSION: 0.9
CL_DEVICE_EXTENSIONS: cl_khr_fp64 cl_khr_fp16 cl_khr_byte_addressable_store
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
syoyo / gist:7245357
Created October 31, 2013 07:01
Bullet3 on Radeon R9 280X(1050 MHz OC model)
Demo settings:
SelectedDemo=1, demoname = BoxBox
x_dim=30, y_dim=30, z_dim=30
x_gap=16.299999, y_gap=6.300000, z_gap=16.299999
OpenCL settings:
Preferred cl_device index 1
Preferred cl_platform index-1
Platform info:
syoyo / gist:778d2294fd5534e0f923
Created December 15, 2015 10:55
ninja generator example
#!/usr/bin/env python
import ninja_syntax
import glob
import os
cxx_files = [
"src/OptionParser.cpp"
, "src/easywsclient.cpp"
, "src/json_to_eson.cc"
syoyo / gist:4945410
Created February 13, 2013 15:30
clang's vector extension -> ARM NEON codegen test.
typedef float float4 __attribute__((ext_vector_type(4)));
void
clang_test(float* out, float* in)
{
float4 a, b;
b = *((float4*)in);
a = b * b;
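The preview ends before the result is stored. The sketch below shows the same load, multiply, store pattern in complete form. Note the swap: it uses the GCC-style vector_size attribute, which both GCC and clang accept, instead of clang's ext_vector_type from the gist, and it uses memcpy for the load and store because casting float* to a vector pointer assumes 16-byte alignment. The names float4_sketch and square4_sketch are made up for this example, and no claim is made here about the NEON instructions a given compiler emits.

```cpp
#include <cstring>

// GCC-style vector type (also accepted by clang); 4 floats = 16 bytes.
// This replaces the gist's clang-only ext_vector_type(4) for portability.
typedef float float4_sketch __attribute__((vector_size(16)));

// Hypothetical function: squares 4 consecutive floats from in into out.
inline void square4_sketch(float *out, const float *in) {
  float4_sketch b;
  std::memcpy(&b, in, sizeof(b));   // unaligned-safe load
  float4_sketch a = b * b;          // element-wise multiply
  std::memcpy(out, &a, sizeof(a));  // unaligned-safe store
}
```

Optimizing compilers generally lower these fixed-size memcpy calls to plain vector loads and stores, so the alignment-safe form should not cost anything at -O2 or above.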
syoyo / gist:4945456
Created February 13, 2013 15:35
clang's ARM NEON codegen test.
void
clang_test(float* out, float* in)
{
for (int i = 0; i < 4; i++) {
out[i] = in[i] * in[i];
}
}
// asm output
syoyo / gist:4665840
Created January 29, 2013 17:05
valgrind --tool=memcheck --leak-check=full ../../ispc --emit-c++ --target=generic-16 -h ao_ispc.h -o bora.cc ao.ispc
This file has been truncated, but you can view the full file.
==20402== Memcheck, a memory error detector
==20402== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==20402== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==20402== Command: ../../ispc --emit-c++ --target=generic-16 -h ao_ispc.h -o bora.cc ao.ispc
==20402==
==20402== Conditional jump or move depends on uninitialised value(s)
==20402== at 0xBC0085: clang::SourceManager::getColumnNumber(clang::FileID, unsigned int, bool*) const (in /home/syoyo/work/ispc/ispc)
==20402== by 0xBC3838: clang::SourceManager::getPresumedLoc(clang::SourceLocation) const (in /home/syoyo/work/ispc/ispc)
==20402== by 0x50734D: (anonymous namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, clang::FileID) (in /home/syoyo/work/ispc/ispc)
==20402== by 0xC17F3B: clang::Preprocessor::HandleDigitDirective(clang::Token&) (in /home/syoyo/work/ispc/ispc)
syoyo / gist:d502bd890f9e32f159da
Created October 25, 2015 06:12
Simple task queue using C++11 thread&atomic
std::vector<std::thread> workers;
std::atomic<unsigned int> i(0);
for (auto t = 0; t < std::thread::hardware_concurrency(); t++){
workers.push_back(std::thread([&,t](){
int index = 0;
while((index = i++) < numItems){
...
}
}));
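The snippet above elides the per-item work and the join. A self-contained version of the same pattern might look like the following sketch, in which an atomic counter hands out item indices to worker threads. The function name square_all_sketch and the squaring workload are placeholders of mine; the fallback to one thread covers the case where hardware_concurrency() returns 0, which the standard permits.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example workload: square every element of items in parallel.
inline std::vector<int> square_all_sketch(const std::vector<int> &items) {
  std::vector<int> out(items.size());
  const unsigned int numItems = static_cast<unsigned int>(items.size());

  // Shared counter: each fetch-and-increment claims one item index.
  std::atomic<unsigned int> next(0);

  unsigned int numThreads = std::thread::hardware_concurrency();
  if (numThreads == 0) numThreads = 1;  // value may be 0 if unknown

  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < numThreads; t++) {
    workers.emplace_back([&]() {
      unsigned int index;
      while ((index = next++) < numItems) {
        // Each index is claimed by exactly one thread, so no lock is needed.
        out[index] = items[index] * items[index];
      }
    });
  }

  // Wait for all workers before returning the results.
  for (auto &w : workers) w.join();
  return out;
}
```

Using unsigned int for both the loop variable and the claimed index avoids the signed/unsigned comparison in the original snippet. On some toolchains the program must be linked with -pthread.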