Skip to content

Instantly share code, notes, and snippets.

@genkuroki
Last active January 30, 2018 06:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save genkuroki/13d644a2cf9210ce68f25da94a3fabbd to your computer and use it in GitHub Desktop.
Save genkuroki/13d644a2cf9210ce68f25da94a3fabbd to your computer and use it in GitHub Desktop.
calc pi by gcd
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"metadata": {
"toc": true
},
"cell_type": "markdown",
"source": "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#calc-pi-by-gcd\" data-toc-modified-id=\"calc-pi-by-gcd-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>calc pi by gcd</a></span><ul class=\"toc-item\"><li><span><a href=\"#Julia版\" data-toc-modified-id=\"Julia版-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Julia版</a></span><ul class=\"toc-item\"><li><span><a href=\"#通常Julia版\" data-toc-modified-id=\"通常Julia版-1.1.1\"><span class=\"toc-item-num\">1.1.1&nbsp;&nbsp;</span>通常Julia版</a></span></li><li><span><a href=\"#@inline版のgcdを使った場合\" data-toc-modified-id=\"@inline版のgcdを使った場合-1.1.2\"><span class=\"toc-item-num\">1.1.2&nbsp;&nbsp;</span>@inline版のgcdを使った場合</a></span></li><li><span><a href=\"#Euclidの互除法版\" data-toc-modified-id=\"Euclidの互除法版-1.1.3\"><span class=\"toc-item-num\">1.1.3&nbsp;&nbsp;</span>Euclidの互除法版</a></span></li><li><span><a href=\"#@parallel版\" data-toc-modified-id=\"@parallel版-1.1.4\"><span class=\"toc-item-num\">1.1.4&nbsp;&nbsp;</span>@parallel版</a></span></li></ul></li><li><span><a href=\"#gcc版との比較\" data-toc-modified-id=\"gcc版との比較-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>gcc版との比較</a></span><ul class=\"toc-item\"><li><span><a href=\"#gcc-non-omp-版\" data-toc-modified-id=\"gcc-non-omp-版-1.2.1\"><span class=\"toc-item-num\">1.2.1&nbsp;&nbsp;</span>gcc non-omp 版</a></span></li><li><span><a href=\"#gcc版のgcdを使ったJulia版\" data-toc-modified-id=\"gcc版のgcdを使ったJulia版-1.2.2\"><span class=\"toc-item-num\">1.2.2&nbsp;&nbsp;</span>gcc版のgcdを使ったJulia版</a></span></li><li><span><a href=\"#gcc-omp-版\" data-toc-modified-id=\"gcc-omp-版-1.2.3\"><span class=\"toc-item-num\">1.2.3&nbsp;&nbsp;</span>gcc omp 版</a></span></li><li><span><a href=\"#gcc版のgcdと@parallelを使ったJulia版\" data-toc-modified-id=\"gcc版のgcdと@parallelを使ったJulia版-1.2.4\"><span class=\"toc-item-num\">1.2.4&nbsp;&nbsp;</span>gcc版のgcdと@parallelを使ったJulia版</a></span></li></ul></li><li><span><a href=\"#n-=-20000-での比較\" data-toc-modified-id=\"n-=-20000-での比較-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>n = 20000 での比較</a></span></li><li><span><a href=\"#gcdをUInt64ではなくInt64で計算すると速くなる.\" data-toc-modified-id=\"gcdをUInt64ではなくInt64で計算すると速くなる.-1.4\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>gcdをUInt64ではなくInt64で計算すると速くなる.</a></span><ul class=\"toc-item\"><li><span><a href=\"#gcdの定義中のunsignedを削ると速くなる.\" data-toc-modified-id=\"gcdの定義中のunsignedを削ると速くなる.-1.4.1\"><span class=\"toc-item-num\">1.4.1&nbsp;&nbsp;</span>gcdの定義中のunsignedを削ると速くなる.</a></span></li><li><span><a href=\"#UInt64を使うと遅くなる.\" data-toc-modified-id=\"UInt64を使うと遅くなる.-1.4.2\"><span class=\"toc-item-num\">1.4.2&nbsp;&nbsp;</span>UInt64を使うと遅くなる.</a></span></li><li><span><a href=\"#n-=-20000-で-Int64-版-gcd-を試してみる\" data-toc-modified-id=\"n-=-20000-で-Int64-版-gcd-を試してみる-1.4.3\"><span class=\"toc-item-num\">1.4.3&nbsp;&nbsp;</span>n = 20000 で Int64 版 gcd を試してみる</a></span></li></ul></li><li><span><a href=\"#Juliaのgcdの2^63での挙動\" data-toc-modified-id=\"Juliaのgcdの2^63での挙動-1.5\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>Juliaのgcdの2^63での挙動</a></span></li><li><span><a href=\"#mygcd-函数のベンチマークテスト\" data-toc-modified-id=\"mygcd-函数のベンチマークテスト-1.6\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>mygcd 函数のベンチマークテスト</a></span></li></ul></li></ul></div>"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# calc pi by gcd\n\n黒木玄\n\n2018-01-28\n\n2つの整数の組 $(a,b)$ が互いに素になる「確率」(最大公約数が $1$ になる「確率」)は $\\zeta(2)=\\pi^2/6$ の逆数になることがよく知られている. このノートブックではJulia言語の `gcd` 函数のテストをしながら, そのよく知られている事実を数値的に確認してみよう.\n\n**注意:** このノートブックのコードは極めて効率が悪い. `gcd` 函数のテストをせずに効率のよい計算法を採用すればものすごく速く計算できる. その方法については次のリンク先を参照して欲しい.\n\nhttps://twitter.com/brv00/status/957631161529810944"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "versioninfo()",
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": "Julia Version 0.6.2\nCommit d386e40c17* (2017-12-13 18:08 UTC)\nPlatform Info:\n OS: Windows (x86_64-w64-mingw32)\n CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz\n WORD_SIZE: 64\n BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)\n LAPACK: libopenblas64_\n LIBM: libopenlibm\n LLVM: libLLVM-3.9.1 (ORCJIT, haswell)\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "using BenchmarkTools\n\nN = 2^12",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 2,
"data": {
"text/plain": "4096"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Julia版"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### 通常Julia版"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function calc_pi_by_gcd(N)\n s = 0\n for a in 1:N\n for b in 1:N\n s += ifelse(gcd(a,b)==1, 1, 0)\n end\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd(N)\n@benchmark calc_pi_by_gcd(N)",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 601.057 ms (0.00% GC)\n median time: 623.556 ms (0.00% GC)\n mean time: 628.945 ms (0.00% GC)\n maximum time: 677.749 ms (0.00% GC)\n --------------\n samples: 8\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### @inline版のgcdを使った場合\n\nほんの少しだけ速くなる."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# https://github.com/JuliaLang/julia/blob/d386e40c17d43b79fc89d3e579fc04547241787c/base/intfuncs.jl#L28-L49\n#\n# binary GCD (aka Stein's) algorithm\n# about 1.7x (2.1x) faster for random Int64s (Int128s)\n@inline function gcd_inline(a::T, b::T) where T<:Union{Int64,UInt64,Int128,UInt128}\n a == 0 && return abs(b)\n b == 0 && return abs(a)\n za = trailing_zeros(a)\n zb = trailing_zeros(b)\n k = min(za, zb)\n u = unsigned(abs(a >> za))\n v = unsigned(abs(b >> zb))\n while u != v\n if u > v\n u, v = v, u\n end\n v -= u\n v >>= trailing_zeros(v)\n end\n r = u << k\n # T(r) would throw InexactError; we want OverflowError instead\n r > typemax(T) && throw(OverflowError())\n r % T\nend\n\nfunction calc_pi_by_gcd_inline(N)\n s = 0\n for a in 1:N\n for b in 1:N\n s += ifelse(gcd_inline(a,b)==1, 1, 0)\n end\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd_inline(N)\n@benchmark calc_pi_by_gcd_inline(N)",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd_inline(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 4,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 564.928 ms (0.00% GC)\n median time: 567.645 ms (0.00% GC)\n mean time: 571.927 ms (0.00% GC)\n maximum time: 593.906 ms (0.00% GC)\n --------------\n samples: 9\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Euclidの互除法版\n\n非常に遅い."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Euclidean algorithm\n@inline function gcd_euclid(a::T, b::T) where T<:Integer\n while b != 0\n t = b\n b = rem(a, b)\n a = t\n end\n abs(a)\nend\n\nfunction calc_pi_by_gcd_euclid(N)\n s = 0\n for a in 1:N\n for b in 1:N\n s += ifelse(gcd_euclid(a,b)==1, 1, 0)\n end\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd_euclid(N)\n@benchmark calc_pi_by_gcd_euclid(N)",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd_euclid(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 1.528 s (0.00% GC)\n median time: 1.531 s (0.00% GC)\n mean time: 1.539 s (0.00% GC)\n maximum time: 1.564 s (0.00% GC)\n --------------\n samples: 4\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### @parallel版"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@show workers();",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": "workers() = [2, 3, 4, 5, 6, 7, 8, 9]\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function calc_pi_by_gcd_parallel(N)\n s = @parallel (+) for a in 1:N\n s = 0\n for b in 1:N\n s += ifelse(gcd(a,b)==1, 1, 0)\n end\n s\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd_parallel(N)\n@benchmark calc_pi_by_gcd_parallel(N)",
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd_parallel(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 89.94 KiB\n allocs estimate: 995\n --------------\n minimum time: 261.367 ms (0.00% GC)\n median time: 263.501 ms (0.00% GC)\n mean time: 264.096 ms (0.00% GC)\n maximum time: 268.579 ms (0.00% GC)\n --------------\n samples: 19\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## gcc版との比較\n\n以下の C_code は実質的に\n\nhttps://gist.github.com/Toru3/7c2fbb64b5a46d5e87273696c5d2e594\n\nからのコピー&ペーストです。Toru3さんにはいつもお世話になっています。\n\nhttps://mathtod.online/@tetu/1070894\n\nhttps://mathtod.online/@genkuroki/1073489\n"
},
{
"metadata": {
"code_folding": [],
"trusted": true
},
"cell_type": "code",
"source": "C_code = raw\"\"\"\n//gcc pi_gcd.c -O3 -march=native -lm -fopenmp\n#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ninline int ctz(long x){\n // GCCの組み込み関数\n return __builtin_ctzl(x);\n}\n\ninline int min(int a, int b){\n return a<b ? a : b;\n}\n\nlong gcd(long a, long b){\n if(a==0) return b;\n if(b==0) return a;\n a = labs(a);\n b = labs(b);\n int ta = ctz(a);\n int tb = ctz(b);\n int shift = min(ta,tb);\n a >>= ta;\n b >>= tb;\n if(a<b){\n long t=a;\n a=b;\n b=t;\n }\n do{\n a-=b;\n a >>= ctz(a);\n if(a<b){\n long t=a;\n a=b;\n b=t;\n }\n }while(b!=0);\n return a << shift;\n}\n\ndouble pi_gcd_omp(long n){\n long count=0;\n#pragma omp parallel for reduction(+:count) collapse(2)\n for(long i=1; i<=n; i++){\n for(long j=1; j<=n; j++){\n if(gcd(i,j)==1){\n count++;\n }\n }\n }\n return sqrt(6.0*n*n/count);\n}\n\ndouble pi_gcd(long n){\n long count=0;\n for(long i=1; i<=n; i++){\n for(long j=1; j<=n; j++){\n if(gcd(i,j)==1){\n count++;\n }\n }\n }\n return sqrt(6.0*n*n/count);\n}\n\nint main(int argc, const char* argv[]){\n long n=20000;\n if(argc>=2){\n n=atol(argv[1]);\n }\n printf(\"%.15f\\n\", pi_gcd_omp(n));\n}\n\"\"\"\n\n@everywhere file = \"calc_pi_by_gcd_gcc_gcc\"\nfiledl = file * \".\" * Libdl.dlext\n\nopen(`gcc -Wall -O3 -march=native -lm -fopenmp -xc -shared -o $filedl -`, \"w\") do f\n print(f, C_code)\nend\n\nrun(`ls -l $filedl`)\n\n@everywhere const calc_pi_by_gcd_gcc_lib = file\n@everywhere calc_pi_by_gcd_gcc_gcc(n::Int64) = ccall((:pi_gcd, calc_pi_by_gcd_gcc_lib), Float64, (Int64,), n)\n@everywhere calc_pi_by_gcd_gcc_gcc_omp(n::Int64) = ccall((:pi_gcd_omp, calc_pi_by_gcd_gcc_lib), Float64, (Int64,), n)\n@everywhere gcd_gcc(m::Int64, n::Int64) = ccall((:gcd, calc_pi_by_gcd_gcc_lib), Int64, (Int64, Int64,), m, n)\n\n@show calc_pi_by_gcd_gcc_gcc(N)\n@show calc_pi_by_gcd_gcc_gcc_omp(N);",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": "-rwxr-xr-x 1 genkuroki genkuroki 120962 Jan 30 15:19 calc_pi_by_gcd_gcc_gcc.dll\ncalc_pi_by_gcd_gcc_gcc(N) = 3.1414825885490334\ncalc_pi_by_gcd_gcc_gcc_omp(N) = 3.1414825885490334\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### gcc non-omp 版"
},
{
"metadata": {
"code_folding": [],
"trusted": true
},
"cell_type": "code",
"source": "@benchmark calc_pi_by_gcd_gcc_gcc(N)",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 451.784 ms (0.00% GC)\n median time: 481.064 ms (0.00% GC)\n mean time: 480.601 ms (0.00% GC)\n maximum time: 512.907 ms (0.00% GC)\n --------------\n samples: 11\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### gcc版のgcdを使ったJulia版"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function calc_pi_by_gcd_gcc_julia(N)\n s = 0\n for a in 1:N\n for b in 1:N\n s += ifelse(gcd_gcc(a,b)==1, 1, 0)\n end\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd_gcc_julia(N)\n@benchmark calc_pi_by_gcd_gcc_julia(N)",
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd_gcc_julia(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 456.009 ms (0.00% GC)\n median time: 472.454 ms (0.00% GC)\n mean time: 473.144 ms (0.00% GC)\n maximum time: 496.123 ms (0.00% GC)\n --------------\n samples: 11\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "このようにJulia版で gcd を gcc 版の gcd で置き換えると gcc 版とほぼ同じ速さになる.\n\nJulia版が遅くなっているのは gcd が遅いことが原因である."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### gcc omp 版"
},
{
"metadata": {
"code_folding": [],
"trusted": true
},
"cell_type": "code",
"source": "@benchmark calc_pi_by_gcd_gcc_gcc_omp(N)",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 77.485 ms (0.00% GC)\n median time: 86.072 ms (0.00% GC)\n mean time: 92.076 ms (0.00% GC)\n maximum time: 125.016 ms (0.00% GC)\n --------------\n samples: 55\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### gcc版のgcdと@parallelを使ったJulia版"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function calc_pi_by_gcd_gcc_julia_parallel(N)\n s = @parallel (+) for a in 1:N\n s = 0\n for b in 1:N\n s += ifelse(gcd_gcc(a,b)==1, 1, 0)\n end\n s\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd_gcc_julia_parallel(N)\n@benchmark calc_pi_by_gcd_gcc_julia_parallel(N)",
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd_gcc_julia_parallel(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 91.39 KiB\n allocs estimate: 1030\n --------------\n minimum time: 236.239 ms (0.00% GC)\n median time: 238.812 ms (0.00% GC)\n mean time: 239.103 ms (0.00% GC)\n maximum time: 244.510 ms (0.00% GC)\n --------------\n samples: 21\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "gcc omp 版が非常に速いことがわかる!"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## n = 20000 での比較"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# gcc版 (omp parallel を使用**しない**場合)\n\n@btime calc_pi_by_gcd_gcc_gcc(20000)",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": " 12.605 s (0 allocations: 0 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 13,
"data": {
"text/plain": "3.1415283804654663"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# julia版\n\n@btime calc_pi_by_gcd(20000)",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"text": " 16.953 s (0 allocations: 0 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 14,
"data": {
"text/plain": "3.1415283804654663"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Julia版 (gcd を gcc 版の gcd に置き換えた場合)\n\n@btime calc_pi_by_gcd_gcc_julia(20000)",
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"text": " 13.052 s (0 allocations: 0 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/plain": "3.1415283804654663"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "gcd を gcc 版 gcd_gcc に置き換えたら、純粋 gcc 版とほぼ同じ速さになった.\n\nJulia版が遅くなっているのは gcd が遅いことが原因である.\n\nJuliaの単純forループが遅いというようなことにはなっていない."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# gcc版 (omp parallel を使用**した**場合)\n\n@btime calc_pi_by_gcd_gcc_gcc_omp(20000)",
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"text": " 2.141 s (0 allocations: 0 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 16,
"data": {
"text/plain": "3.1415283804654663"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# julia版 (@parallel を使った場合)\n\n@btime calc_pi_by_gcd_parallel(20000)",
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"text": " 6.835 s (1118 allocations: 94.50 KiB)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 17,
"data": {
"text/plain": "3.1415283804654663"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# julia版 (gcd を gcc 版に置き換えて、@parallel を使った場合)\n\n@btime calc_pi_by_gcd_gcc_julia_parallel(20000)",
"execution_count": 18,
"outputs": [
{
"output_type": "stream",
"text": " 6.109 s (1149 allocations: 96.28 KiB)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 18,
"data": {
"text/plain": "3.1415283804654663"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## gcdをUInt64ではなくInt64で計算すると速くなる."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### gcdの定義中のunsignedを削ると速くなる."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# https://github.com/JuliaLang/julia/blob/d386e40c17d43b79fc89d3e579fc04547241787c/base/intfuncs.jl#L28-L49\n#\n# binary GCD (aka Stein's) algorithm\n# about 1.7x (2.1x) faster for random Int64s (Int128s)\n@inline function gcd_inline_nounsigned(a::T, b::T) where T<:Union{Int64,UInt64,Int128,UInt128}\n a == 0 && return abs(b)\n b == 0 && return abs(a)\n za = trailing_zeros(a)\n zb = trailing_zeros(b)\n k = min(za, zb)\n # u = unsigned(abs(a >> za))\n # v = unsigned(abs(b >> zb))\n u = abs(a >> za)\n v = abs(b >> zb)\n while u != v\n if u > v\n u, v = v, u\n end\n v -= u\n v >>= trailing_zeros(v)\n end\n r = u << k\n # T(r) would throw InexactError; we want OverflowError instead\n r > typemax(T) && throw(OverflowError())\n r % T\nend\n\nfunction calc_pi_by_gcd_inline_nounsigned(N)\n s = 0\n for a in 1:N\n for b in 1:N\n s += ifelse(gcd_inline_nounsigned(a,b)==1, 1, 0)\n end\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd_inline_nounsigned(N)\n@benchmark calc_pi_by_gcd_inline_nounsigned(N)",
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd_inline_nounsigned(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 19,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 453.976 ms (0.00% GC)\n median time: 462.135 ms (0.00% GC)\n mean time: 469.617 ms (0.00% GC)\n maximum time: 502.549 ms (0.00% GC)\n --------------\n samples: 11\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### UInt64を使うと遅くなる."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function calc_pi_by_gcd_inline_nounsigned_UInt64(N)\n s = 0\n for a in UnitRange{UInt64}(1:N)\n for b in UnitRange{UInt64}(1:N)\n s += ifelse(gcd_inline_nounsigned(a,b)==1, 1, 0)\n end\n end\n sqrt(6N^2/s)\nend\n\n@show calc_pi_by_gcd_inline_nounsigned_UInt64(N)\n@benchmark calc_pi_by_gcd_inline_nounsigned_UInt64(N)",
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"text": "calc_pi_by_gcd_inline_nounsigned_UInt64(N) = 3.1414825885490334\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 20,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 16 bytes\n allocs estimate: 1\n --------------\n minimum time: 559.848 ms (0.00% GC)\n median time: 562.741 ms (0.00% GC)\n mean time: 567.686 ms (0.00% GC)\n maximum time: 587.242 ms (0.00% GC)\n --------------\n samples: 9\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### n = 20000 で Int64 版 gcd を試してみる"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# 純粋 Julia 版 (gcd を Int64 で計算する)\n\n@btime calc_pi_by_gcd_inline_nounsigned(20000)",
"execution_count": 21,
"outputs": [
{
"output_type": "stream",
"text": " 12.773 s (0 allocations: 0 bytes)\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 21,
"data": {
"text/plain": "3.1415283804654663"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "**結論: Julia言語のgcd函数をInt64で計算するように変更すると gcc や Rust と同等の速さになる.**"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Juliaのgcdの2^63での挙動"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# オーバーフローエラー (これは意図通りの挙動)\n\ngcd(2^63,2^63)",
"execution_count": 22,
"outputs": [
{
"output_type": "error",
"ename": "LoadError",
"evalue": "\u001b[91mOverflowError()\u001b[39m",
"traceback": [
"\u001b[91mOverflowError()\u001b[39m",
"",
"Stacktrace:",
" [1] \u001b[1mgcd\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::Int64, ::Int64\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\intfuncs.jl:47\u001b[22m\u001b[22m",
" [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\loading.jl:522\u001b[22m\u001b[22m"
]
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# これはエラーにならず、負の Int64 の値を返す!\n\n@show gcd(2^63,0);",
"execution_count": 23,
"outputs": [
{
"output_type": "stream",
"text": "gcd(2 ^ 63, 0) = -9223372036854775808\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# すべて Int64 で計算するとこうなる.\n\n@show gcd_inline_nounsigned(2^63, 2^63)\n@show gcd_inline_nounsigned(2^63, 0);",
"execution_count": 24,
"outputs": [
{
"output_type": "stream",
"text": "gcd_inline_nounsigned(2 ^ 63, 2 ^ 63) = -9223372036854775808\ngcd_inline_nounsigned(2 ^ 63, 0) = -9223372036854775808\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Uint64(2^63) は InexactError() になる.\n\nUInt64(2^63)",
"execution_count": 25,
"outputs": [
{
"output_type": "error",
"ename": "LoadError",
"evalue": "\u001b[91mInexactError()\u001b[39m",
"traceback": [
"\u001b[91mInexactError()\u001b[39m",
"",
"Stacktrace:",
" [1] \u001b[1mUInt64\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::Int64\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\sysimg.jl:77\u001b[22m\u001b[22m",
" [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\loading.jl:522\u001b[22m\u001b[22m"
]
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# 2^63 % UInt64 ならばエラーにならない.\n\n@show a = 2^63 % UInt64\n@show gcd(a, a)\n@show gcd(a, 0);",
"execution_count": 26,
"outputs": [
{
"output_type": "stream",
"text": "a = 2 ^ 63 % UInt64 = 0x8000000000000000\ngcd(a, a) = 0x8000000000000000\ngcd(a, 0) = 0x8000000000000000\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Int128 の場合\n\n@show b = Int128(2)^127",
"execution_count": 27,
"outputs": [
{
"output_type": "stream",
"text": "b = Int128(2) ^ 127 = -170141183460469231731687303715884105728\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 27,
"data": {
"text/plain": "-170141183460469231731687303715884105728"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "gcd(b, b)",
"execution_count": 28,
"outputs": [
{
"output_type": "error",
"ename": "LoadError",
"evalue": "\u001b[91mOverflowError()\u001b[39m",
"traceback": [
"\u001b[91mOverflowError()\u001b[39m",
"",
"Stacktrace:",
" [1] \u001b[1mgcd\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::Int128, ::Int128\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\intfuncs.jl:47\u001b[22m\u001b[22m",
" [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\loading.jl:522\u001b[22m\u001b[22m"
]
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "gcd(b, zero(b))",
"execution_count": 29,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 29,
"data": {
"text/plain": "-170141183460469231731687303715884105728"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@show c = b % UInt128\n@show gcd(c, c)\n@show gcd(c, zero(c));",
"execution_count": 30,
"outputs": [
{
"output_type": "stream",
"text": "c = b % UInt128 = 0x80000000000000000000000000000000\ngcd(c, c) = 0x80000000000000000000000000000000\ngcd(c, zero(c)) = 0x80000000000000000000000000000000\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## mygcd 函数のベンチマークテスト"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "@inline function mygcd(a::T, b::T) where T<:Union{Int64,UInt64,Int128,UInt128}\n # When T is Int64 or Int128, then abs(typemin(T)) throws InexactError().\n # We want OverflowError instead.\n a == typemin(T) < 0 && throw(OverflowError())\n b == typemin(T) < 0 && throw(OverflowError())\n a, b = abs(a), abs(b)\n a == 0 && return b\n b == 0 && return a\n za = trailing_zeros(a)\n zb = trailing_zeros(b)\n k = min(za, zb)\n u = a >> za\n v = b >> zb\n while u != v\n if u > v\n u, v = v, u\n end\n v -= u\n v >>= trailing_zeros(v)\n end\n u << k\nend",
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 31,
"data": {
"text/plain": "mygcd (generic function with 1 method)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(0,2^63)",
"execution_count": 32,
"outputs": [
{
"output_type": "error",
"ename": "LoadError",
"evalue": "\u001b[91mOverflowError()\u001b[39m",
"traceback": [
"\u001b[91mOverflowError()\u001b[39m",
"",
"Stacktrace:",
" [1] \u001b[1mmygcd\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::Int64, ::Int64\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\In[31]:5\u001b[22m\u001b[22m",
" [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\loading.jl:522\u001b[22m\u001b[22m"
]
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(2^63, 0)",
"execution_count": 33,
"outputs": [
{
"output_type": "error",
"ename": "LoadError",
"evalue": "\u001b[91mOverflowError()\u001b[39m",
"traceback": [
"\u001b[91mOverflowError()\u001b[39m",
"",
"Stacktrace:",
" [1] \u001b[1mmygcd\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::Int64, ::Int64\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\In[31]:4\u001b[22m\u001b[22m",
" [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\loading.jl:522\u001b[22m\u001b[22m"
]
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(2^63+1, 2^63+1)",
"execution_count": 34,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 34,
"data": {
"text/plain": "9223372036854775807"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "a = 2^63 % UInt64",
"execution_count": 35,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 35,
"data": {
"text/plain": "0x8000000000000000"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(a, zero(a))",
"execution_count": 36,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 36,
"data": {
"text/plain": "0x8000000000000000"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(zero(a), a)",
"execution_count": 37,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 37,
"data": {
"text/plain": "0x8000000000000000"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "b = Int128(2)^127",
"execution_count": 38,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 38,
"data": {
"text/plain": "-170141183460469231731687303715884105728"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(b, b)",
"execution_count": 39,
"outputs": [
{
"output_type": "error",
"ename": "LoadError",
"evalue": "\u001b[91mOverflowError()\u001b[39m",
"traceback": [
"\u001b[91mOverflowError()\u001b[39m",
"",
"Stacktrace:",
" [1] \u001b[1mmygcd\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::Int128, ::Int128\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\In[31]:4\u001b[22m\u001b[22m",
" [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\loading.jl:522\u001b[22m\u001b[22m"
]
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(b, zero(0))",
"execution_count": 52,
"outputs": [
{
"output_type": "error",
"ename": "LoadError",
"evalue": "\u001b[91mMethodError: no method matching mygcd(::Array{Int64,1}, ::Int64)\u001b[0m\nClosest candidates are:\n mygcd(\u001b[91m::T<:Union{Int128, Int64, UInt128, UInt64}\u001b[39m, ::T<:Union{Int128, Int64, UInt128, UInt64}) where T<:Union{Int128, Int64, UInt128, UInt64} at In[31]:4\u001b[39m",
"traceback": [
"\u001b[91mMethodError: no method matching mygcd(::Array{Int64,1}, ::Int64)\u001b[0m\nClosest candidates are:\n mygcd(\u001b[91m::T<:Union{Int128, Int64, UInt128, UInt64}\u001b[39m, ::T<:Union{Int128, Int64, UInt128, UInt64}) where T<:Union{Int128, Int64, UInt128, UInt64} at In[31]:4\u001b[39m",
"",
"Stacktrace:",
" [1] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m.\\loading.jl:522\u001b[22m\u001b[22m"
]
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "c = b % UInt128",
"execution_count": 41,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 41,
"data": {
"text/plain": "0x80000000000000000000000000000000"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(c, c)",
"execution_count": 42,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 42,
"data": {
"text/plain": "0x80000000000000000000000000000000"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "mygcd(c, zero(c))",
"execution_count": 43,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 43,
"data": {
"text/plain": "0x80000000000000000000000000000000"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "srand(2018)\na = rand(Int64, 10^6)\nb = rand(Int64, 10^6);",
"execution_count": 44,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bm_gcd = @benchmark gcd.(a,b)",
"execution_count": 45,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 45,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 7.63 MiB\n allocs estimate: 28\n --------------\n minimum time: 188.958 ms (0.00% GC)\n median time: 191.370 ms (0.00% GC)\n mean time: 196.693 ms (1.86% GC)\n maximum time: 271.267 ms (26.56% GC)\n --------------\n samples: 26\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bm_mygcd = @benchmark mygcd.(a,b)",
"execution_count": 46,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 46,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 7.63 MiB\n allocs estimate: 28\n --------------\n minimum time: 134.023 ms (0.00% GC)\n median time: 139.579 ms (0.00% GC)\n mean time: 141.028 ms (1.84% GC)\n maximum time: 196.875 ms (31.47% GC)\n --------------\n samples: 36\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bm_gcd.times[1]/bm_mygcd.times[1]",
"execution_count": 47,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 47,
"data": {
"text/plain": "1.4098880519825778"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "srand(2018)\nc = rand(Int128, 10^6)\nd = rand(Int128, 10^6);",
"execution_count": 48,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bm_gcd128 = @benchmark gcd.(c,d)",
"execution_count": 49,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 49,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 15.26 MiB\n allocs estimate: 28\n --------------\n minimum time: 828.751 ms (0.00% GC)\n median time: 835.910 ms (0.03% GC)\n mean time: 844.289 ms (1.39% GC)\n maximum time: 890.303 ms (6.87% GC)\n --------------\n samples: 6\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bm_mygcd128 = @benchmark mygcd.(c,d)",
"execution_count": 50,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 50,
"data": {
"text/plain": "BenchmarkTools.Trial: \n memory estimate: 15.26 MiB\n allocs estimate: 28\n --------------\n minimum time: 792.539 ms (0.00% GC)\n median time: 797.836 ms (0.00% GC)\n mean time: 805.527 ms (1.28% GC)\n maximum time: 856.856 ms (7.33% GC)\n --------------\n samples: 7\n evals/sample: 1"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bm_gcd128.times[1]/bm_mygcd128.times[1]",
"execution_count": 51,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 51,
"data": {
"text/plain": "1.04569075745307"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"_draft": {
"nbviewer_url": "https://gist.github.com/13d644a2cf9210ce68f25da94a3fabbd"
},
"gist": {
"id": "13d644a2cf9210ce68f25da94a3fabbd",
"data": {
"description": "calc pi by gcd",
"public": true
}
},
"kernelspec": {
"name": "julia-0.6-pauto",
"display_name": "Julia 0.6.2 procs auto",
"language": "julia"
},
"language_info": {
"file_extension": ".jl",
"name": "julia",
"mimetype": "application/julia",
"version": "0.6.2"
},
"toc": {
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment