The clock cycle count for "expf()" (reference) is 141645.
The clock cycle count for "expapprox()" is 74.
The clock cycle count for "expapprox4()" is 127 (/4 = 31).
// GCC
#define RESTRICT __restrict__
// Disabling the range check makes exp() evaluate faster.
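To illustrate the trade-off mentioned in the comment above, here is a minimal sketch of a fast exp() approximation with an optional range check. It is not the gist's `expapprox()`: the function name `exp_sketch`, the `check_range` parameter, and the clamp bounds are all assumptions made for illustration. It uses plain range reduction (x = n*ln2 + r) followed by a degree-5 Taylor polynomial, whose truncation error at the edge of the reduced interval is roughly r^6/720, i.e. a few parts per million.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (NOT the gist's expapprox()): approximates exp(x)
// by range reduction followed by a degree-5 Taylor polynomial.
// When check_range is true the input is clamped so that the result
// stays inside the finite, normal range of float.
inline float exp_sketch(float x, bool check_range = true) {
  if (check_range) {
    if (x > 88.0f) x = 88.0f;    // exp(88) is still below FLT_MAX
    if (x < -87.0f) x = -87.0f;  // exp(-87) is still above FLT_MIN
  }
  const float kLog2e = 1.44269504f;  // 1 / ln(2)
  const float kLn2 = 0.693147181f;   // ln(2)
  // Split x = n * ln(2) + r, with r close to the interval [-ln(2)/2, ln(2)/2].
  const float n = std::floor(x * kLog2e + 0.5f);
  const float r = x - n * kLn2;
  // exp(r) ~= 1 + r + r^2/2 + r^3/6 + r^4/24 + r^5/120, in Horner form.
  const float p =
      1.0f + r * (1.0f + r * (0.5f + r * (1.0f / 6.0f +
                 r * (1.0f / 24.0f + r * (1.0f / 120.0f)))));
  // Scale the polynomial result by 2^n.
  return std::ldexp(p, static_cast<int>(n));
}
```

Calling `exp_sketch(x, false)` skips the two clamping branches, which is the kind of saving the comment refers to; the caller is then responsible for keeping `x` in a range where the result is representable.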
// Code from http://gallium.inria.fr/blog/fast-vectorizable-math-approx/
$ e-gcc compile options: -O3 -mno-soft-cmpsf -mcmove -mfp-mode=truncate
// 73 clocks
00000f40 <_expapprox>:
 f40: 200b 0002 mov r1,0x0
 f44: 476b 0aa2 mov r2,0xaa3b
 f48: 470b 14b2 movt r2,0x4b38
 f4c: 2fcb 14e2 movt r1,0x4e7e
// Is writing (&x)[i] in operator[] safe/correct C++ code or not?
// The following is a reduced code fragment (so it will not work by just copy & paste) that the Intel C++ compiler (ver 13 and 15) miscompiles, in Release builds only, for accesses to a real3 object through operator[] inside an OpenMP loop.
// clang and gcc compile and run it correctly.
typedef float real;
struct real3 {
  real3() {}
  real3(real xx, real yy, real zz) {
    x = xx;
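On the question raised in the comments above: as far as I understand the C++ standard, pointer arithmetic on `&x` treats `x` as an array of one element, so `(&x)[1]` and `(&x)[2]` are not guaranteed to reach `y` and `z`, even though the members are usually laid out contiguously. The following is a minimal sketch of an `operator[]` that avoids the question entirely by selecting the member explicitly. The member layout `x, y, z` and the bounds assertion are assumptions, since the original fragment is truncated.

```cpp
#include <cassert>

typedef float real;

// Sketch of a real3 whose operator[] never does pointer arithmetic
// across members. The x/y/z member layout is assumed here.
struct real3 {
  real3() : x(0), y(0), z(0) {}
  real3(real xx, real yy, real zz) : x(xx), y(yy), z(zz) {}

  // Read-only access: pick the member by index.
  real operator[](int i) const {
    assert(i >= 0 && i < 3);
    return i == 0 ? x : (i == 1 ? y : z);
  }

  // Writable access: returns a reference to the selected member.
  real& operator[](int i) {
    assert(i >= 0 && i < 3);
    return i == 0 ? x : (i == 1 ? y : z);
  }

  real x, y, z;
};
```

Another option would be to store the components in a `real v[3]` array and index that directly, at the cost of losing the `.x`, `.y`, `.z` member names.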
CL_DEVICE_NAME: pthread-Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
CL_DEVICE_VENDOR:
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 1.2 pocl
CL_DRIVER_VERSION: 0.9
CL_DEVICE_EXTENSIONS: cl_khr_fp64 cl_khr_fp16 cl_khr_byte_addressable_store
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
Demo settings:
SelectedDemo=1, demoname = BoxBox
x_dim=30, y_dim=30, z_dim=30
x_gap=16.299999, y_gap=6.300000, z_gap=16.299999
OpenCL settings:
Preferred cl_device index 1
Preferred cl_platform index-1
Platform info:
#!/usr/bin/env python
import ninja_syntax
import glob
import os
cxx_files = [
    "src/OptionParser.cpp"
  , "src/easywsclient.cpp"
  , "src/json_to_eson.cc"
typedef float float4 __attribute__((ext_vector_type(4)));
void
clang_test(float* out, float* in)
{
  float4 a, b;
  b = *((float4*)in);
  a = b * b;
void
clang_test(float* out, float* in)
{
  for (int i = 0; i < 4; i++) {
    out[i] = in[i] * in[i];
  }
}
// asm output
==20402== Memcheck, a memory error detector
==20402== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==20402== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==20402== Command: ../../ispc --emit-c++ --target=generic-16 -h ao_ispc.h -o bora.cc ao.ispc
==20402==
==20402== Conditional jump or move depends on uninitialised value(s)
==20402==    at 0xBC0085: clang::SourceManager::getColumnNumber(clang::FileID, unsigned int, bool*) const (in /home/syoyo/work/ispc/ispc)
==20402==    by 0xBC3838: clang::SourceManager::getPresumedLoc(clang::SourceLocation) const (in /home/syoyo/work/ispc/ispc)
==20402==    by 0x50734D: (anonymous namespace)::PrintPPOutputPPCallbacks::FileChanged(clang::SourceLocation, clang::PPCallbacks::FileChangeReason, clang::SrcMgr::CharacteristicKind, clang::FileID) (in /home/syoyo/work/ispc/ispc)
==20402==    by 0xC17F3B: clang::Preprocessor::HandleDigitDirective(clang::Token&) (in /home/syoyo/work/ispc/ispc)
std::vector<std::thread> workers;
std::atomic<unsigned int> i(0);
for (unsigned int t = 0; t < std::thread::hardware_concurrency(); t++){
  workers.push_back(std::thread([&,t](){
    // Use the counter's own unsigned type to avoid a signed/unsigned mismatch.
    unsigned int index = 0;
    while((index = i++) < numItems){
      ...
    }
  }));
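The fragment above hands out work items through a shared atomic counter. A self-contained sketch of the same pattern follows, with the pieces the fragment elides filled in by assumption: the function name `process_items`, the per-item work (incrementing a slot in a `hits` vector), the fallback to one thread, and the final join loop are all illustrative and not taken from the original.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical driver: each worker repeatedly claims the next unclaimed
// index from a shared atomic counter until all items are taken.
std::vector<int> process_items(unsigned int numItems) {
  std::vector<int> hits(numItems, 0);
  std::atomic<unsigned int> next(0);

  // hardware_concurrency() may return 0 when the value is not computable.
  unsigned int numThreads = std::thread::hardware_concurrency();
  if (numThreads == 0) numThreads = 1;

  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < numThreads; t++) {
    workers.emplace_back([&]() {
      unsigned int index = 0;
      // next++ is an atomic fetch-and-increment, so every index is
      // handed to exactly one worker.
      while ((index = next++) < numItems) {
        hits[index] += 1;  // placeholder for the real per-item work
      }
    });
  }

  // Wait for all workers before reading the results.
  for (auto& w : workers) w.join();
  return hits;
}
```

Because each index is claimed by exactly one thread, the writes to `hits` never touch the same element from two threads, so no extra locking is needed for this placeholder workload. On some toolchains the program has to be linked with thread support (for example `-pthread` with GCC).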