#include <stdio.h>
#include <arrayfire.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/adjacent_difference.h>
using namespace af;
#define ITER 100
int main(int argc, char **argv)
I have to be frank here, this is going to be
- Criticism of thrust
- Showing off ArrayFire (of which I am a core developer)
*Criticism of thrust*
They do a good job at optimizing parallel algorithms for vector inputs.
They use data level parallelism (among other things) to parallelize algorithms that work really well for large, vector inputs.
But they fail to improve upon it and go all the way to perform true data level parallelism, i.e. a large number of small problems.
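To make "a large number of small problems" concrete, here is a minimal CPU-only sketch of one common way to batch many small sorts into a single large one: tag every value with the index of the row it belongs to, then sort the (row, value) pairs once. The function name `sort_rows_batched` and the row-major layout are assumptions made for illustration; this is not thrust's or ArrayFire's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts every row of a row-major matrix with `cols`
// columns (cols > 0) using one sort call instead of one call per row.
std::vector<int> sort_rows_batched(const std::vector<int>& data, std::size_t cols) {
    // Tag each value with its row index.
    std::vector<std::pair<std::size_t, int>> tagged;
    tagged.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        tagged.emplace_back(i / cols, data[i]);

    // Pairs compare by row first, then by value, so a single sort
    // keeps rows in place and orders the values inside each row.
    std::sort(tagged.begin(), tagged.end());

    // Drop the tags to recover the matrix with sorted rows.
    std::vector<int> out;
    out.reserve(data.size());
    for (const auto& p : tagged)
        out.push_back(p.second);
    return out;
}
```

On a GPU the same idea is usually expressed as a sort by key, so that one large, well-parallelized sort does the work of many small ones.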
#include <stdio.h>
#include <arrayfire.h>
using namespace af;
array in;
void bench_blas()
{
    array out = matmul(in, in);
}
[ 109.455]
X.Org X Server 1.12.2
Release Date: 2012-05-29
[ 109.455] X Protocol Version 11, Revision 0
[ 109.455] Build Operating System: Linux 3.0.32-1-lts x86_64
[ 109.455] Current Operating System: Linux archer 3.3.7-1-ARCH #1 SMP PREEMPT Tue May 22 00:26:26 CEST 2012 x86_64
[ 109.455] Kernel command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=baa1b6d1-36e6-4243-b6c9-3b74361152ec ro init=/bin/systemd quiet add_efi_memmap
[ 109.455] Build Date: 30 May 2012 07:24:13PM
[ 109.455]
[ 109.455] Current version of pixman: 0.26.0
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#define N 10
__global__
static void find_groups(int *locs, int *sorted, int num)
{
    int bid = blockIdx.y * gridDim.x + blockIdx.x;
$ ./test
size    malloc + for    malloc+memset   calloc
2045    0.003761        0.002064        0.002071
2046    0.003695        0.002043        0.002039
2047    0.003695        0.002036        0.002044
2048    0.003704        0.002038        0.002036
2049    0.003734        0.002039        0.002069
2050    0.004025        0.004864        0.000002
2051    0.003986        0.004832        0.000002
2052    0.004025        0.004823        0.000002
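The benchmark source is not shown here, so the following is only a sketch of what the three column headers suggest: three ways of obtaining a zero-filled buffer, plus a small timing helper. All names (`zero_with_loop`, `zero_with_memset`, `zero_with_calloc`, `time_alloc`) are hypothetical, and the unit of the `size` column above is not known, so the sketch simply takes a byte count.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// "malloc + for": allocate, then clear the buffer with an explicit loop.
unsigned char* zero_with_loop(std::size_t n) {
    unsigned char* p = static_cast<unsigned char*>(std::malloc(n));
    if (!p) return nullptr;
    for (std::size_t i = 0; i < n; ++i) p[i] = 0;
    return p;
}

// "malloc+memset": allocate, then clear the buffer with memset.
unsigned char* zero_with_memset(std::size_t n) {
    unsigned char* p = static_cast<unsigned char*>(std::malloc(n));
    if (!p) return nullptr;
    std::memset(p, 0, n);
    return p;
}

// "calloc": let the allocator hand back zeroed memory directly.
unsigned char* zero_with_calloc(std::size_t n) {
    return static_cast<unsigned char*>(std::calloc(n, 1));
}

// Times a single call of one strategy and returns the elapsed seconds.
// The buffer is freed after the clock stops, so free() is not measured.
template <typename Alloc>
double time_alloc(Alloc alloc, std::size_t n) {
    auto t0 = std::chrono::steady_clock::now();
    unsigned char* p = alloc(n);
    auto t1 = std::chrono::steady_clock::now();
    std::free(p);
    return std::chrono::duration<double>(t1 - t0).count();
}
```

A call such as `time_alloc(zero_with_calloc, n)` gives one sample; a real benchmark would repeat each measurement and touch the buffer afterwards. Note that an optimizing compiler may rewrite the first two variants, and that the sudden drop in the calloc column might be due to the allocator receiving already-zeroed pages from the operating system above some size threshold, which the output alone does not confirm.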
#include <stdio.h>
#include <clFFT.h>
#define ERR(str, status) do {               \
    printf("%s(%d):" str,                   \
           __FILE__, __LINE__, status);     \
    return status;                          \
} while(0)
#define CLFFT(fn) do { \
#!/usr/bin/python3
def format(f):
    return int(f * 100) / 100
def isprime_6k(p):
    # Basic Test
    if (p % 2 == 0):
        return 0,1
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 CUDA 4.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Platform Name: NVIDIA CUDA
Number of devices: 2
========================================================
AN INTERNAL KERNEL BUILD ERROR OCCURRED!
device name = Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
error = -11
memory pattern = Cached global memory based subgroup gemm, computing kernel generator
Subproblem dimensions: dims[0].itemY = 16, dims[0].itemX = 8, dims[0].y = 16, dims[0].x = 8, dims[0].bwidth = 64; ; dims[1].itemY = 4, dims[1].itemX = 4, dims[1].y = 4, dims[1].x = 4, dims[1].bwidth = 8; ;
Parallelism granularity: pgran->wgDim = 2, pgran->wgSize[0] = 8, pgran->wgSize[1] = 8, pgran->wfSize = 64
Kernel extra flags: 939556625