Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save genkuroki/c9ae4e03fb379235ac6b26f5d62e4da5 to your computer and use it in GitHub Desktop.
Save genkuroki/c9ae4e03fb379235ac6b26f5d62e4da5 to your computer and use it in GitHub Desktop.
Benchmark test of Monte Carlo calculation of pi (Julia v1.1.1)
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Benchmark test of Monte Carlo calculation of pi (Julia v1.1.1)\n\ngcc Allied Forces vs. Julia Imperial Military \n\nGen Kuroki\n\n2017-08-20~2019-05-18"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "using Libdl\nusing Distributed",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "versioninfo()",
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": "Julia Version 1.1.1\nCommit 55e36cc308 (2019-05-16 04:10 UTC)\nPlatform Info:\n OS: Windows (x86_64-w64-mingw32)\n CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz\n WORD_SIZE: 64\n LIBM: libopenlibm\n LLVM: libLLVM-6.0.1 (ORCJIT, haswell)\nEnvironment:\n JULIA_CMDSTAN_HOME = C:\\CmdStan\n JULIA_NUM_THREADS = 4\n JULIA_PKGDIR = C:\\JuliaPkg\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# julia -p auto\n\naddprocs(8)\nnworkers()",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": "8"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "run(`gcc --version`)",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "gcc (Rev3, Built by MSYS2 project) 6.3.0\r\nCopyright (C) 2016 Free Software Foundation, Inc.\r\nThis is free software; see the source for copying conditions. There is NO\r\nwarranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\r\n\r\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 4,
"data": {
"text/plain": "Process(`\u001b[4mgcc\u001b[24m \u001b[4m--version\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "ENV[\"OMP_NUM_THREADS\"]",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": "\"8\""
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "using BenchmarkTools",
"execution_count": 6,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function findpi_julia(n)\n s = 0\n for j in 1:n\n s += ifelse(rand()^2+rand()^2 ≤ 1, 1, 0)\n end\n return 4.0 * s / n\nend\nfindpi_julia(10^5)",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/plain": "3.14248"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_julia(10^8)",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": " 0.421451 seconds (1.93 k allocations: 116.137 KiB)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": "3.14163068"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# julia -p auto\n\nfunction findpi_julia_parallel(n)\n s = @distributed (+) for j in 1:n\n ifelse(rand()^2+rand()^2 ≤ 1, 1, 0)\n end\n return 4.0 * s / n\nend\nfindpi_julia_parallel(10^5)",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": "3.13112"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_julia_parallel(10^8)",
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": " 0.116158 seconds (786 allocations: 35.344 KiB)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/plain": "3.14176644"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "C_code= raw\"\"\"\n#include <time.h>\n#include <stdlib.h>\n\ndouble findpi(unsigned long n){\n double x,y;\n srand((unsigned)time(NULL));\n double R = (double) RAND_MAX;\n unsigned long count = 0;\n for(unsigned long j=0; j < n; j++){\n x = ((double) rand())/R;\n y = ((double) rand())/R;\n if(x*x + y*y <= 1){\n count++;\n }\n }\n return ((double) 4.0)*((double) count)/((double) n);\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -Wall -O3 -march=native -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 328961 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_C009.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_C009.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_rand_lib = filename\nfindpi_gcc_rand(n::Int64) = ccall((:findpi, findpi_gcc_rand_lib), Float64, (Int64,), n)\nfindpi_gcc_rand(10^5)",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/plain": "3.13984"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_rand(10^8)",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": " 2.727899 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 13,
"data": {
"text/plain": "3.14178392"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "C_code= raw\"\"\"\n#include <time.h>\n#include <stdlib.h>\n\ndouble findpi(unsigned long n){\n unsigned long x,y;\n srand((unsigned)time(NULL));\n double RR = (unsigned long) RAND_MAX*RAND_MAX;\n unsigned long count = 0;\n for(unsigned long j=0; j < n; j++){\n x = (unsigned long) rand();\n y = (unsigned long) rand();\n if(x*x + y*y <= RR){\n count++;\n }\n }\n return ((double) 4.0)*((double) count)/((double) n);\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -Wall -O3 -march=native -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 328961 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_AE2B.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 14,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_AE2B.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_rand_int_lib = filename\nfindpi_gcc_rand_int(n::Int64) = ccall((:findpi, findpi_gcc_rand_int_lib), Float64, (Int64,), n)\nfindpi_gcc_rand_int(10^5)",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/plain": "3.1384"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_rand_int(10^8)",
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"text": " 1.883796 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 16,
"data": {
"text/plain": "3.14150884"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Download http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.sep.tgz\n#\n# Extract\n# mt19937ar.sep/mt19937ar.h\n# mt19937ar.sep/mt19937ar.c\n\nC_code= raw\"\"\"\n#include <stdio.h>\n#include <stdlib.h>\n#include <time.h>\n#include \"mt19937ar.sep/mt19937ar.h\"\n#include \"mt19937ar.sep/mt19937ar.c\"\n\ndouble findpi(unsigned long n){\n srand((unsigned)time(NULL));\n init_genrand(rand());\n unsigned long count = 0;\n double x,y;\n for(unsigned long j = 0; j < n; j++){\n x = genrand_real1();\n y = genrand_real1();\n if(x*x + y*y <= 1){\n count++;\n }\n }\n return ((double)4.0)*((double)count)/((double)n);\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -Wall -O3 -march=native -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 330305 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_43DE.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 17,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_43DE.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_MT_lib = filename\nfindpi_gcc_MT(n::Int64) = ccall((:findpi, findpi_gcc_MT_lib), Float64, (Int64,), n)\nfindpi_gcc_MT(10^5)",
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 18,
"data": {
"text/plain": "3.15404"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_MT(10^8)",
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"text": " 1.055235 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 19,
"data": {
"text/plain": "3.14167884"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Download http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/mt19937-64stdint.tgz\n#\n# Extract\n# mt19937-64/mt64.h\n# mt19937-64/mt19937-64.c\n\nC_code= raw\"\"\"\n#include <stdio.h>\n#include <stdlib.h>\n#include <time.h>\n#define HAVE_SSE2\n#include \"mt19937-64/mt64.h\"\n#include \"mt19937-64/mt19937-64.c\"\n\ndouble findpi(uint64_t n){\n srand((unsigned)time(NULL));\n uint64_t init[4]={(uint64_t)rand(), (uint64_t)rand(), (uint64_t)rand(), (uint64_t)rand()};\n uint64_t length = 4;\n init_by_array64(init, length);\n uint64_t count = 0;\n double x,y;\n for(uint64_t j = 0; j < n; j++){\n x = genrand64_real2();\n y = genrand64_real2();\n if(x*x + y*y <= 1){\n count++;\n }\n }\n return ((double)4.0)*((double)count)/((double)n);\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -Wall -O3 -march=native -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 332335 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_B81F.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 20,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_B81F.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_MT64_lib = filename\nfindpi_gcc_MT64(n::Int64) = ccall((:findpi, findpi_gcc_MT64_lib), Float64, (Int64,), n)\nfindpi_gcc_MT64(10^5)",
"execution_count": 21,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 21,
"data": {
"text/plain": "3.1332"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_MT64(10^8)",
"execution_count": 22,
"outputs": [
{
"output_type": "stream",
"text": " 1.144136 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 22,
"data": {
"text/plain": "3.14182552"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Download http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/dSFMT-src-2.2.3.tar.gz\n#\n# Extract\n# dSFMT-src-2.2.3/dSFMT.h\n# dSFMT-src-2.2.3/dSFMT.c\n\nC_code= raw\"\"\"\n#include <stdio.h>\n#include <stdlib.h>\n#include <time.h>\n#define HAVE_SSE2\n#define DSFMT_MEXP 521\n#include \"dSFMT-src-2.2.3/dSFMT.h\"\n#include \"dSFMT-src-2.2.3/dSFMT.c\"\n\ndouble findpi(unsigned long n){\n srand((unsigned)time(NULL));\n dsfmt_t dsfmt;\n dsfmt_init_gen_rand(&dsfmt, rand());\n unsigned long count = 0;\n double x,y;\n for(unsigned long j = 0; j < n; j++){\n x = dsfmt_genrand_close_open(&dsfmt);\n y = dsfmt_genrand_close_open(&dsfmt);\n if(x*x + y*y <= 1){\n count++;\n }\n }\n return ((double)4.0)*((double)count)/((double)n);\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -Wall -O3 -march=native -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 23,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 364419 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_895D.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 23,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_895D.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_dSFMT_lib = filename\nfindpi_gcc_dSFMT(n::Int64) = ccall((:findpi, findpi_gcc_dSFMT_lib), Float64, (Int64,), n)\nfindpi_gcc_dSFMT(10^5)",
"execution_count": 24,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 24,
"data": {
"text/plain": "3.13508"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_dSFMT(10^8)",
"execution_count": 25,
"outputs": [
{
"output_type": "stream",
"text": " 0.286933 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 25,
"data": {
"text/plain": "3.14149744"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Download http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/dSFMT-src-2.2.3.tar.gz\n#\n# Extract\n# dSFMT-src-2.2.3/dSFMT.h\n# dSFMT-src-2.2.3/dSFMT.c\n\n# https://gist.github.com/Toru3/fba81b9106d8fa88e120ce85ab1d98f7\n# findpi_dSMFT_openmp.c\n\nC_code= raw\"\"\"\n#include <stdio.h>\n#include <time.h>\n#include <stdlib.h>\n#include <omp.h>\n#define HAVE_SSE2\n#define DSFMT_MEXP 521\n#include \"dSFMT-src-2.2.3/dSFMT.h\"\n#include \"dSFMT-src-2.2.3/dSFMT.c\"\n\ndouble findpi(unsigned long n){\n srand((unsigned)time(NULL));\n unsigned long threads = omp_get_num_procs();\n n = (n+threads-1)/threads*threads;\n const unsigned long size = n/threads;\n\n dsfmt_t dsfmt[threads];\n for(unsigned long i=0; i<threads; i++){\n dsfmt_init_gen_rand(dsfmt+i, rand());\n }\n\n unsigned long count = 0;\n#pragma omp parallel for reduction(+:count)\n for(unsigned long i=0; i<threads; i++){\n for(unsigned long j=0; j<size; j++){\n double x = dsfmt_genrand_close_open(dsfmt+i);\n double y = dsfmt_genrand_close_open(dsfmt+i);\n if(x*x+y*y<=1){\n count++;\n }\n }\n }\n return 4.0*count/n;\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -Wall -O3 -march=native -fopenmp -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 26,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 365820 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_719F.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 26,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_719F.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_dSFMT_OMP_lib = filename\nfindpi_gcc_dSFMT_OMP(n::Int64) = ccall((:findpi, findpi_gcc_dSFMT_OMP_lib), Float64, (Int64,), n)\nfindpi_gcc_dSFMT_OMP(10^5)",
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 27,
"data": {
"text/plain": "3.15112"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_dSFMT_OMP(10^8)",
"execution_count": 28,
"outputs": [
{
"output_type": "stream",
"text": " 0.132410 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 28,
"data": {
"text/plain": "3.14185336"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Download http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/dSFMT-src-2.2.3.tar.gz\n#\n# Extract\n# dSFMT-src-2.2.3/dSFMT.h\n# dSFMT-src-2.2.3/dSFMT.c\n\n# https://gist.github.com/Toru3/fba81b9106d8fa88e120ce85ab1d98f7\n# findpi_dSMFT_openmp_sse.c\n\nC_code= raw\"\"\"\n#include <stdio.h>\n#include <time.h>\n#include <stdlib.h>\n#include <omp.h>\n#include <emmintrin.h>\n#define HAVE_SSE2\n#define DSFMT_MEXP 521\n#include \"dSFMT-src-2.2.3/dSFMT.h\"\n#include \"dSFMT-src-2.2.3/dSFMT.c\"\n\ndouble findpi(unsigned long n){\n srand((unsigned)time(NULL));\n unsigned long threads = omp_get_num_procs();\n const unsigned long N = 8;\n unsigned long width = threads * N;\n n = (n+width-1)/width*width;\n const unsigned long size = n/threads;\n\n dsfmt_t dsfmt[threads];\n for(unsigned long i=0; i<threads; i++){\n dsfmt_init_gen_rand(dsfmt+i, rand());\n }\n\n unsigned long count = 0;\n#pragma omp parallel for reduction(+:count)\n for(unsigned long i=0; i<threads; i++){\n __m128d one;\n {\n double one_d = 1;\n one = _mm_loadl_pd(one, &one_d);\n one = _mm_loadh_pd(one, &one_d);\n }\n double s[2*N];\n double t[N/2][2];\n __m128d x0, x1, x2, x3;\n __m128d y0, y1, y2, y3;\n __m128d c0, c1, c2, c3;\n for(unsigned long j=0; j<size; j+=N){\n dsfmt_fill_array_close1_open2(dsfmt+i, s, 2*N);\n x0 = _mm_loadu_pd(s+0);\n y0 = _mm_loadu_pd(s+2);\n x1 = _mm_loadu_pd(s+4);\n y1 = _mm_loadu_pd(s+6);\n x2 = _mm_loadu_pd(s+8);\n y2 = _mm_loadu_pd(s+10);\n x3 = _mm_loadu_pd(s+12);\n y3 = _mm_loadu_pd(s+14);\n\n x0 = _mm_sub_pd(x0, one);\n y0 = _mm_sub_pd(y0, one);\n x1 = _mm_sub_pd(x1, one);\n y1 = _mm_sub_pd(y1, one);\n x2 = _mm_sub_pd(x2, one);\n y2 = _mm_sub_pd(y2, one);\n x3 = _mm_sub_pd(x3, one);\n y3 = _mm_sub_pd(y3, one);\n\n x0 = _mm_mul_pd(x0, x0);\n y0 = _mm_mul_pd(y0, y0);\n x1 = _mm_mul_pd(x1, x1);\n y1 = _mm_mul_pd(y1, y1);\n x2 = _mm_mul_pd(x2, x2);\n y2 = _mm_mul_pd(y2, y2);\n x3 = _mm_mul_pd(x3, x3);\n y3 = _mm_mul_pd(y3, y3);\n\n x0 = _mm_add_pd(x0, y0);\n x1 = _mm_add_pd(x1, y1);\n x2 = _mm_add_pd(x2, y2);\n x3 = _mm_add_pd(x3, y3);\n c0 = _mm_cmple_pd(x0, one);\n c1 = _mm_cmple_pd(x1, one);\n c2 = _mm_cmple_pd(x2, one);\n c3 = _mm_cmple_pd(x3, one);\n\n _mm_storeu_pd(t[0], c0);\n _mm_storeu_pd(t[1], c1);\n _mm_storeu_pd(t[2], c2);\n _mm_storeu_pd(t[3], c3);\n\n count +=\n (t[0][0]!=0)+(t[0][1]!=0)\n +(t[1][0]!=0)+(t[1][1]!=0)\n +(t[2][0]!=0)+(t[2][1]!=0)\n +(t[3][0]!=0)+(t[3][1]!=0);\n }\n }\n return 4.0*count/n;\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -O3 -march=native -fopenmp -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 29,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 366332 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_4826.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 29,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_4826.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_dSFMT_OMP_SSE_lib = filename\nfindpi_gcc_dSFMT_OMP_SSE(n::Int64) = ccall((:findpi, findpi_gcc_dSFMT_OMP_SSE_lib), Float64, (Int64,), n)\nfindpi_gcc_dSFMT_OMP_SSE(10^5)",
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 30,
"data": {
"text/plain": "3.1421145233525274"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_dSFMT_OMP_SSE(10^8)",
"execution_count": 31,
"outputs": [
{
"output_type": "stream",
"text": " 0.126900 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 31,
"data": {
"text/plain": "3.14174264"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Download http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/dSFMT-src-2.2.3.tar.gz\n#\n# Extract\n# dSFMT-src-2.2.3/dSFMT.h\n# dSFMT-src-2.2.3/dSFMT.c\n\n# https://gist.github.com/Toru3/fba81b9106d8fa88e120ce85ab1d98f7\n# findpi_dSMFT_openmp_fma.c\n\nC_code= raw\"\"\"\n#include <stdio.h>\n#include <time.h>\n#include <stdlib.h>\n#include <omp.h>\n#include <immintrin.h>\n#define HAVE_SSE2\n#define DSFMT_MEXP 521\n#include \"dSFMT-src-2.2.3/dSFMT.h\"\n#include \"dSFMT-src-2.2.3/dSFMT.c\"\n\ndouble findpi(unsigned long n){\n srand((unsigned)time(NULL));\n unsigned long threads = omp_get_num_procs();\n const unsigned long N = 32;\n unsigned long width = threads * N;\n n = (n+width-1)/width*width;\n const unsigned long size = n/threads;\n\n unsigned long seed[threads];\n for(unsigned long i=0; i<threads; i++) seed[i] = rand();\n\n unsigned long count = 0;\n#pragma omp parallel for reduction(+:count)\n for(unsigned long i=0; i<threads; i++){\n dsfmt_t dsfmt;\n dsfmt_init_gen_rand(&dsfmt, seed[i]);\n __m256d one;\n {\n double one_d = 1;\n one = _mm256_broadcast_sd(&one_d);\n }\n double s[2*N];\n long long t[N/4][4];\n __m256d x0, x1, x2, x3, x4, x5, x6, x7;\n __m256d y0, y1, y2, y3, y4, y5, y6, y7;\n __m256d c0, c1, c2, c3, c4, c5, c6, c7;\n for(unsigned long j=0; j<size; j+=N){\n dsfmt_fill_array_close1_open2(&dsfmt, s, 2*N);\n\n x0 = _mm256_loadu_pd(s+ 0);\n x1 = _mm256_loadu_pd(s+ 4);\n x2 = _mm256_loadu_pd(s+ 8);\n x3 = _mm256_loadu_pd(s+12);\n x4 = _mm256_loadu_pd(s+16);\n x5 = _mm256_loadu_pd(s+20);\n x6 = _mm256_loadu_pd(s+24);\n x7 = _mm256_loadu_pd(s+28);\n\n y0 = _mm256_loadu_pd(s+32);\n y1 = _mm256_loadu_pd(s+36);\n y2 = _mm256_loadu_pd(s+40);\n y3 = _mm256_loadu_pd(s+44);\n y4 = _mm256_loadu_pd(s+48);\n y5 = _mm256_loadu_pd(s+52);\n y6 = _mm256_loadu_pd(s+56);\n y7 = _mm256_loadu_pd(s+60);\n\n x0 = _mm256_sub_pd(x0, one);\n x1 = _mm256_sub_pd(x1, one);\n x2 = _mm256_sub_pd(x2, one);\n x3 = _mm256_sub_pd(x3, one);\n x4 = _mm256_sub_pd(x4, one);\n x5 = _mm256_sub_pd(x5, one);\n x6 = _mm256_sub_pd(x6, one);\n x7 = _mm256_sub_pd(x7, one);\n\n y0 = _mm256_sub_pd(y0, one);\n y1 = _mm256_sub_pd(y1, one);\n y2 = _mm256_sub_pd(y2, one);\n y3 = _mm256_sub_pd(y3, one);\n y4 = _mm256_sub_pd(y4, one);\n y5 = _mm256_sub_pd(y5, one);\n y6 = _mm256_sub_pd(y6, one);\n y7 = _mm256_sub_pd(y7, one);\n\n x0 = _mm256_mul_pd(x0, x0);\n x1 = _mm256_mul_pd(x1, x1);\n x2 = _mm256_mul_pd(x2, x2);\n x3 = _mm256_mul_pd(x3, x3);\n x4 = _mm256_mul_pd(x4, x4);\n x5 = _mm256_mul_pd(x5, x5);\n x6 = _mm256_mul_pd(x6, x6);\n x7 = _mm256_mul_pd(x7, x7);\n\n x0 = _mm256_fmadd_pd(y0, y0, x0);\n x1 = _mm256_fmadd_pd(y1, y1, x1);\n x2 = _mm256_fmadd_pd(y2, y2, x2);\n x3 = _mm256_fmadd_pd(y3, y3, x3);\n x4 = _mm256_fmadd_pd(y4, y4, x4);\n x5 = _mm256_fmadd_pd(y5, y5, x5);\n x6 = _mm256_fmadd_pd(y6, y6, x6);\n x7 = _mm256_fmadd_pd(y7, y7, x7);\n\n c0 = _mm256_cmp_pd(x0, one, _CMP_LE_OQ);\n c1 = _mm256_cmp_pd(x1, one, _CMP_LE_OQ);\n c2 = _mm256_cmp_pd(x2, one, _CMP_LE_OQ);\n c3 = _mm256_cmp_pd(x3, one, _CMP_LE_OQ);\n c4 = _mm256_cmp_pd(x4, one, _CMP_LE_OQ);\n c5 = _mm256_cmp_pd(x5, one, _CMP_LE_OQ);\n c6 = _mm256_cmp_pd(x6, one, _CMP_LE_OQ);\n c7 = _mm256_cmp_pd(x7, one, _CMP_LE_OQ);\n\n _mm256_storeu_pd((double*)t[0], c0);\n _mm256_storeu_pd((double*)t[1], c1);\n _mm256_storeu_pd((double*)t[2], c2);\n _mm256_storeu_pd((double*)t[3], c3);\n _mm256_storeu_pd((double*)t[4], c4);\n _mm256_storeu_pd((double*)t[5], c5);\n _mm256_storeu_pd((double*)t[6], c6);\n _mm256_storeu_pd((double*)t[7], c7);\n\n unsigned int f =\n ((t[0][0]&1 | t[0][1]&2 | t[0][2]&4 | t[0][3]&8) << 0) |\n ((t[1][0]&1 | t[1][1]&2 | t[1][2]&4 | t[1][3]&8) << 4) |\n ((t[2][0]&1 | t[2][1]&2 | t[2][2]&4 | t[2][3]&8) << 8) |\n ((t[3][0]&1 | t[3][1]&2 | t[3][2]&4 | t[3][3]&8) << 12) |\n ((t[4][0]&1 | t[4][1]&2 | t[4][2]&4 | t[4][3]&8) << 16) |\n ((t[5][0]&1 | t[5][1]&2 | t[5][2]&4 | t[5][3]&8) << 20) |\n ((t[6][0]&1 | t[6][1]&2 | t[6][2]&4 | t[6][3]&8) << 24) |\n ((t[7][0]&1 | t[7][1]&2 | t[7][2]&4 | t[7][3]&8) << 28) ;\n\n f = (f&0x55555555) + ((f>> 1)&0x55555555);\n f = (f&0x33333333) + ((f>> 2)&0x33333333);\n f = (f&0x0f0f0f0f) + ((f>> 4)&0x0f0f0f0f);\n f = (f&0x00ff00ff) + ((f>> 8)&0x00ff00ff);\n f = (f&0x0000ffff) + ((f>>16)&0x0000ffff);\n count += f;\n }\n }\n return 4.0*count/n;\n}\n\"\"\"\n\nfilename = tempname()\nfilenamedl = filename * \".\" * Libdl.dlext\n\nopen(`gcc -O3 -march=native -fopenmp -xc -shared -o $filenamedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filenamedl`)",
"execution_count": 32,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 366844 May 18 00:31 C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_E70D.tmp.dll\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 32,
"data": {
"text/plain": "Process(`\u001b[4mls\u001b[24m \u001b[4m-l\u001b[24m \u001b[4m'C:\\Users\\GENKUR~1\\AppData\\Local\\Temp\\jl_E70D.tmp.dll'\u001b[24m`, ProcessExited(0))"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "const findpi_gcc_dSFMT_OMP_FMA_lib = filename\nfindpi_gcc_dSFMT_OMP_FMA(n::Int64) = ccall((:findpi, findpi_gcc_dSFMT_OMP_FMA_lib), Float64, (Int64,), n)\nfindpi_gcc_dSFMT_OMP_FMA(10^5)",
"execution_count": 33,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 33,
"data": {
"text/plain": "3.137827685421995"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_gcc_dSFMT_OMP_FMA(10^8)",
"execution_count": 34,
"outputs": [
{
"output_type": "stream",
"text": " 0.069022 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 34,
"data": {
"text/plain": "3.14149036"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@time findpi_julia(10^9)\n@time findpi_gcc_rand(10^9)\n@time findpi_gcc_rand_int(10^9)\n@time findpi_gcc_MT(10^9)\n@time findpi_gcc_dSFMT(10^9)",
"execution_count": 35,
"outputs": [
{
"output_type": "stream",
"text": " 3.555915 seconds (6 allocations: 192 bytes)\n 25.356221 seconds (6 allocations: 192 bytes)\n 19.378051 seconds (6 allocations: 192 bytes)\n 10.162761 seconds (6 allocations: 192 bytes)\n 3.153303 seconds (6 allocations: 192 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 35,
"data": {
"text/plain": "3.141498072"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@btime findpi_julia(10^9)\n@btime findpi_gcc_MT64(10^9)\n@btime findpi_gcc_MT(10^9)\n@btime findpi_gcc_dSFMT(10^9)",
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"text": " 3.516 s (0 allocations: 0 bytes)\n 12.357 s (0 allocations: 0 bytes)\n 9.104 s (0 allocations: 0 bytes)\n 3.039 s (0 allocations: 0 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 36,
"data": {
"text/plain": "3.141627472"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@btime findpi_julia_parallel(10^9)\n@btime findpi_gcc_dSFMT_OMP(10^9)\n@btime findpi_gcc_dSFMT_OMP_SSE(10^9)\n@btime findpi_gcc_dSFMT_OMP_FMA(10^9)",
"execution_count": 37,
"outputs": [
{
"output_type": "stream",
"text": " 1.060 s (805 allocations: 35.38 KiB)\n 1.328 s (0 allocations: 0 bytes)\n 1.145 s (0 allocations: 0 bytes)\n 573.449 ms (0 allocations: 0 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 37,
"data": {
"text/plain": "3.141635224"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"_draft": {
"nbviewer_url": "https://gist.github.com/c9ae4e03fb379235ac6b26f5d62e4da5"
},
"gist": {
"id": "c9ae4e03fb379235ac6b26f5d62e4da5",
"data": {
"description": "Benchmark test of Monte Carlo calculation of pi (Julia v1.1.1)",
"public": true
}
},
"kernelspec": {
"name": "julia-1.1",
"display_name": "Julia 1.1.1",
"language": "julia"
},
"language_info": {
"file_extension": ".jl",
"name": "julia",
"mimetype": "application/julia",
"version": "1.1.1"
},
"toc": {
"nav_menu": {
"height": "12px",
"width": "252px"
},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment