Skip to content

Instantly share code, notes, and snippets.

@simonbyrne
Created April 16, 2015 09:45
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save simonbyrne/2a7a3bdefeb75d2d60ac to your computer and use it in GitHub Desktop.
Save simonbyrne/2a7a3bdefeb75d2d60ac to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Inside Julia\n",
"\n",
"### Simon Byrne\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"# About Me\n",
"\n",
"* I'm a researcher in statistics at University College London.\n",
"* Focus on a class of algorithms known as Markov chain Monte Carlo (MCMC).\n",
" * Largely theoretical analysis of computational techniques.\n",
" * but occasionally I need to write some code.\n",
" * Sequential algorithms: can't be vectorised\n",
"\n",
"# Why Julia?\n",
"\n",
"* Quick to write: easy to try out ideas.\n",
"* Reasonably performant, but representative of the speed of the algorithm.\n",
"* Easy to understand: if I need to look at the code 6 months later.\n",
"* Able to peek under the hood.\n",
"\n",
"# How I met Julia\n",
"\n",
"> Just to never write another .mex file, I'll very seriously consider Julia for new projects\n",
"> \n",
"> &ndash; <cite>Justin Domke (September 2012)</cite>\n",
"\n",
"I had Matlab code full of:\n",
"```\n",
"R = chol(A) \n",
"y = R \\ ( R' \\ x)\n",
"```\n",
"\n",
"I really liked being able to write Julia code:\n",
"```\n",
"C = cholfact(A) \n",
"y = C \\ x\n",
"```\n",
"(and it was faster).\n",
"\n",
"I filed my first pull request 3 weeks later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# My Contributions\n",
"\n",
"* Base Julia: mostly numeric related\n",
" * Numeric comparisons\n",
" * Linear algebra\n",
" * Various mathematical functions\n",
" \n",
"* Stats\n",
" * StatsBase.jl: basic statistics functionality\n",
" * Distributions.jl: probability distributions\n",
" * KernelDensity.jl: kernel density estimators\n",
"* Other\n",
" * AppleAccelerate.jl: an interface to Apple's Accelerate library\n",
" * InplaceOps.jl: avoids unnecessary allocation of arrays\n",
" * DoubleDouble.jl: extended precision arithmetic without BigFloats"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Numeric comparisons\n",
"or, why does:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"0.1 > 1//10"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.100000000000000005551115123125782702118158340454101562500000"
]
}
],
"source": [
"@printf \"%.60f\" 0.1"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.30000000000000004"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"0.1+0.2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Few other languages let you compare numbers of different types.\n",
"\n",
"The rough idea:\n",
"* every non-NaN `Real` value corresponds to exactly 1 real number, or an infinity.\n",
"* all non-NaN `Real` values can be compared correctly."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Float vs Integer comparisons"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"function <(x::Float64, y::Int64)\n",
" fy = Float64(y)\n",
" (x < fy) | ((x == fy) & (fy < Float64(typemax(Int64))) & (unsafe_trunc(Int64,fy) < y))\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why is this so complicated?\n",
"\n",
"Neither type is a subset of the other.\n",
"\n",
"* `Float64` can only represent all integers up to 2<sup>53</sup>.\n",
"* Any integers above this may be rounded when converted."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"9007199254740993"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2^53+1"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9007199254740992"
]
}
],
"source": [
"@printf \"%i\" Float64(2^53+1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So the rough idea is:\n",
"1. Convert to `Float64`, and compare the values\n",
"2. If they are not equal, then the comparison is correct.\n",
"3. If they are equal, then `y` must be an integer, so can be compared as integers\n",
" * `unsafe_trunc` is fast: it doesn't throw exceptions (same as a C cast).\n",
"4. Extra check is for integers that can't be round-tripped (ones that get rounded to 2<sup>63</sup>)."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"9223372036854775807"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = typemax(Int64) # 2^63-1"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9223372036854775808"
]
}
],
"source": [
"fy = Float64(y)\n",
"@printf \"%i\" fy # rounded to 2^64"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "LoadError",
"evalue": "LoadError: InexactError()\nwhile loading In[32], in expression starting on line 1",
"output_type": "error",
"traceback": [
"LoadError: InexactError()\nwhile loading In[32], in expression starting on line 1",
"",
" in call at base.jl:36"
]
}
],
"source": [
"Int64(fy)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"-9223372036854775808"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unsafe_trunc(Int64,fy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## So what happens to your functions?\n",
"\n",
"You can see the underlying Julia code using `@less`. But what happens after?\n",
"\n",
"### 0. Parsing\n",
"* Syntax is parsed (`parse` function): you can also see what this looks like by using `quote`, and playing around with the `Expr` object.\n",
"* Macros are expanded: try out `macroexpand` on a quoted macro.\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
":(1.0 < 2)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"e = parse(\"1.0 < 2\")"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
":comparison"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"e.head"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"3-element Array{Any,1}:\n",
" 1.0\n",
" :<\n",
" 2 "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"e.args"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"quote \n",
" #308#out = Base.Printf.STDOUT\n",
" #309###x#2686 = 0.1\n",
" local #314#neg, #313#pt, #312#len, #307#exp, #310#do_out, #311#args\n",
" if Base.Printf.isfinite(#309###x#2686)\n",
" (#310#do_out,#311#args) = Base.Printf.fix_dec(#308#out,#309###x#2686,\"\",0,60,'f')\n",
" if #310#do_out\n",
" (#312#len,#313#pt,#314#neg) = #311#args\n",
" #314#neg && Base.Printf.write(#308#out,'-')\n",
" Base.Printf.print_fixed(#308#out,60,#313#pt,#312#len)\n",
" end\n",
" else \n",
" Base.Printf.write(#308#out,begin # printf.jl, line 143:\n",
" if Base.Printf.isnan(#309###x#2686)\n",
" \"NaN\"\n",
" else \n",
" if (#309###x#2686 Base.Printf.< 0)\n",
" \"-Inf\"\n",
" else \n",
" \"Inf\"\n",
" end\n",
" end\n",
" end)\n",
" end\n",
" Base.Printf.nothing\n",
"end"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"macroexpand(:(@printf \"%.60f\" 0.1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. `@code_lowered` \n",
"This is a proto-Julia syntax: a couple of things have been done:\n",
"* Syntax desugaring, e.g. \n",
" * expand `a+=b` to `a=a+b`\n",
" * substitute `a'*b` with `Ac_mul_B(a,b)`\n",
"* Control flow are replaced by simple \"goto\"-style statements.\n",
"* Resolve scope of functions"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1-element Array{Any,1}:\n",
" :($(Expr(:lambda, Any[:x,:y], Any[Any[:fy],Any[Any[:x,:Any,0],Any[:y,:Any,0],Any[:fy,:Any,18]],Any[],0], :(begin # In[3], line 2:\n",
" fy = Float64(y)::ANY # line 3:\n",
" return (x < fy)::ANY | ((x == fy)::ANY & (fy < Float64(typemax(Int64)::ANY)::ANY)::ANY)::ANY & (unsafe_trunc(Int64,fy)::ANY < y)::ANY::ANY::ANY\n",
" end::ANY))))"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@code_lowered 1.0 < 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. `@code_typed`\n",
"\n",
"The Julia magic happens\n",
"* type inference\n",
"* method sorting\n",
"* inlining, and other optimisations"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1-element Array{Any,1}:\n",
" :($(Expr(:lambda, Any[:x,:y], Any[Any[:fy],Any[Any[:x,Float64,0],Any[:y,Int64,0],Any[:fy,Float64,18]],Any[],Any[]], :(begin # In[3], line 2:\n",
" fy = (top(box))(Float64,(top(sitofp))(Float64,y::Int64)) # line 3:\n",
" return (top(box))(Bool,(top(or_int))((top(lt_float))(x::Float64,fy::Float64)::Bool,(top(box))(Bool,(top(and_int))((top(box))(Bool,(top(and_int))((top(eq_float))(x::Float64,fy::Float64)::Bool,(top(lt_float))(fy::Float64,(top(box))(Float64,(top(sitofp))(Float64,9223372036854775807)))::Bool)),(top(slt_int))((top(box))(Int64,(top(fptosi))(Int64,fy::Float64)),y::Int64)::Bool))))\n",
" end::Bool))))"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@code_typed 1.0 < 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. `@code_llvm`\n",
"\n",
"`@code_llvm` gives the LLVM IR (intermediate representation). Think of it as a cross-platform, assembly-like language.\n",
"\n",
"The code is shown after various optimsation passes (constant propagation, dead code elimination, etc.).\n",
"\n",
"If you want to see what each pass does, you can start `julia-debug` with `JULIA_LLVM_ARGS = -print-after-all`. Beware, this will print a *LOT* of text."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"define i1 @\"julia_<_43379\"(double, i64) {\n",
"top:\n",
" %2 = sitofp i64 %1 to double, !dbg !3108\n",
" %3 = fcmp ogt double %2, %0, !dbg !3110\n",
" %4 = fcmp oeq double %2, %0, !dbg !3110\n",
" %5 = fcmp olt double %2, 0x43E0000000000000, !dbg !3110\n",
" %6 = and i1 %4, %5, !dbg !3110\n",
" %7 = fptosi double %2 to i64, !dbg !3110\n",
" %8 = icmp slt i64 %7, %1, !dbg !3110\n",
" %9 = and i1 %6, %8, !dbg !3110\n",
" %10 = or i1 %3, %9, !dbg !3110\n",
" ret i1 %10, !dbg !3110\n",
"}\n"
]
}
],
"source": [
"@code_llvm 1.0 < 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. `@code_native`\n",
"Further platform-dependent optimisations, generate the assembly instructions"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\t.section\t__TEXT,__text,regular,pure_instructions\n",
"Filename: In[3]\n",
"Source line: 2\n",
"\tpushq\t%rbp\n",
"\tmovq\t%rsp, %rbp\n",
"Source line: 2\n",
"\tvcvtsi2sdq\t%rdi, %xmm0, %xmm1\n",
"\tvucomisd\t%xmm0, %xmm1\n",
"Source line: 3\n",
"\tseta\t%cl\n",
"\tsetnp\t%dl\n",
"\tsete\t%al\n",
"\tandb\t%dl, %al\n",
"\tmovabsq\t$13336766176, %rdx ## imm = 0x31AEEE6E0\n",
"\tvmovsd\t(%rdx), %xmm0\n",
"\tvucomisd\t%xmm1, %xmm0\n",
"\tseta\t%dl\n",
"\tandb\t%al, %dl\n",
"\tvcvttsd2si\t%xmm1, %rax\n",
"\tcmpq\t%rdi, %rax\n",
"\tsetl\t%al\n",
"\tandb\t%dl, %al\n",
"\torb\t%cl, %al\n",
"\tpopq\t%rbp\n",
"\tret\n"
]
}
],
"source": [
"@code_native 1.0 < 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Does this mean that comparing integers and floats is computationally expensive?\n",
"\n",
"Not always."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"isneg (generic function with 1 method)"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isneg(x) = x < 0"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1-element Array{Any,1}:\n",
" :($(Expr(:lambda, Any[:x], Any[Any[],Any[Any[:x,:Any,0]],Any[],0], :(begin # In[42], line 1:\n",
" return x < 0::ANY\n",
" end::ANY))))"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@code_lowered isneg(1.0)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"define i1 @julia_isneg_43442(float) {\n",
"top:\n",
" %1 = fcmp olt float %0, 0.000000e+00, !dbg !3138\n",
" ret i1 %1, !dbg !3138\n",
"}\n"
]
}
],
"source": [
"@code_llvm isneg(1f0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Float vs MathConst comparisons"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"π = 3.1415926535897..."
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pi"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"MathConst{:π}"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"typeof(pi)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"3.141592653589793238462643383279502884197169399375105820974944592307816406286198e+00 with 256 bits of precision"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"big(pi)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"true"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = Float64(pi)\n",
"x < pi < nextfloat(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"<(x::MathConst, y::Float64) = Float64(x,RoundUp) <= y\n",
"<(x::Float64, y::MathConst) = x <= Float64(y,RoundDown)\n",
"\n",
"# this is called by Float64(c,r)\n",
"stagedfunction call{T<:Union(Float32,Float64),s}(t::Type{T},c::MathConst{s},r::RoundingMode)\n",
" f = T(big(c()),r())\n",
" :($f)\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## So what is a `stagedfunction`?\n",
"\n",
"> `magicfunction`\n",
">\n",
"> Defines a magical function that shares similarities with warp drives\n",
"and matter replicators. Also, the arguments are the types, not values.\n",
">\n",
"> No it is not a macro.\n",
">\n",
"> Also, we don't use words like \"homoiconic\" in this household young man.\n",
">\n",
"> &ndash; <cite>Iain Dunning</cite>\n",
"\n",
"A `stagedfunction` consists of 2 stages:\n",
"* at compile time (i.e. when first called), it is evaluated with its arguments as types, and returning an expression.\n",
"* this is expression becomes the function for those types."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1-element Array{Any,1}:\n",
" :($(Expr(:lambda, Any[:(t::Any::ANY),:(c::Any::ANY),:(r::Any::ANY)], Any[Any[],Any[Any[:t,Type{Float64},0],Any[:c,MathConst{:π},0],Any[:r,Base.Rounding.RoundingMode{:Down},0]],Any[],Any[]], :(begin \n",
" return 3.141592653589793\n",
" end::Float64))))"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@code_typed Float64(pi,RoundDown)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1-element Array{Any,1}:\n",
" :($(Expr(:lambda, Any[:(t::Any::ANY),:(c::Any::ANY),:(r::Any::ANY)], Any[Any[],Any[Any[:t,Type{Float64},0],Any[:c,MathConst{:π},0],Any[:r,Base.Rounding.RoundingMode{:Up},0]],Any[],Any[]], :(begin \n",
" return 3.1415926535897936\n",
" end::Float64))))"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@code_typed Float64(pi,RoundUp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Particularly useful for:\n",
"* When functions behave slighlty differently for different parametric types: can avoid lots of \n",
"```\n",
"for T in (:Type1,:Type2)\n",
" @eval foo(x::$T) = ...\n",
"end\n",
"```\n",
"* Variable agurments, e.g. multidimensional indexing:\n",
"```\n",
"getindex(A::MyArray, inds...)\n",
"```\n",
"* Used in Keno's [Cxx.jl](https://github.com/Keno/Cxx.jl): allows calling C++ straight from Julia (no wrappers, no precompilation)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Special functions\n",
"\n",
"Do you know what your `sin` function does?\n",
"\n",
"* Recent libraries on modern CPUs (support SSE instructions, i.e. the last 5 years or so) don't use the x87 assembly routines (e.g. FSIN)\n",
"\n",
"In [openlibm](https://github.com/JuliaLang/openlibm) (the default math library used by Julia), there are two steps:\n",
"\n",
"1. Range reduction: find an integer $n$ and $y \\in [-\\frac{\\pi}{4},\\frac{\\pi}{4}]$ such that\n",
" $$ x = n \\frac{\\pi}{2} + y $$\n",
" * this is tricky (as $\\pi$ does not correspond to a floating point number):\n",
" - Cody&ndash;Waite for small values\n",
" - Payne&ndash;Hanek for large values\n",
" \n",
" * Calculate $y$ with extra precision: typically stored as a pair of floating point numbers `y_hi + y_lo`\n",
" \n",
"2. Based on $n$, compute the appropriately signed `sin`/`cos` of $y$.\n",
" * a minimax polynomial approximation (similar, but distinct from, a Taylor's series truncation).\n",
" * some other libraries incorporate a further range reduction (e.g. $[-\\pi/128,\\pi/128]$), combined with a table-lookup and a shorter polynomial."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However [a](http://stackoverflow.com/questions/14429902/abssinpi-for-mitigating-systematic-rounding-error) [LOT](http://stackoverflow.com/questions/2670999/how-can-i-work-around-the-fact-that-in-c-sinm-pi-is-not-0) [of](http://stackoverflow.com/questions/22439827/why-is-sinpi-4-different-from-cospi-4-in-python) [people](http://stackoverflow.com/questions/7561698/javascripts-geometrical-methods-zero-isnt-exactly-zero-is-it) [want](http://stackoverflow.com/questions/6566512/value-of-sine-180-is-coming-out-as-1-22465e-16) [to](http://stackoverflow.com/questions/27643267/storing-cos-and-sin-into-a-vector-and-getting-weird-values) [calculate](http://stackoverflow.com/questions/9780057/sin-function-not-giving-correct-answer)\n",
"\n",
"$$ \\sin \\frac{n \\pi}{N}$$\n",
"\n",
"for $n = 0,1,\\ldots,N$, e.g.\n",
"* Simple discretisations of the circle,\n",
"* Fast Fourier transforms (FFTs)\n",
"* Bessel functions, complex numbers, etc.\n",
"\n",
"Although technically correct (according to the IEEE 754 floating point standard), it does seem silly to give answers like:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1.2246467991473532e-16"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sin(10*pi/10)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.49999999999999994"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sin(pi/6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But we can do better&hellip;"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sinpi(10/10)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.5"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sinpi(1/6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This function is basically\n",
"1. A range reduction modulo 1/2\n",
" * *can* be done exactly using a standard `rem` operation (or even faster using some other tricks).\n",
"2. Multiply by $\\pi$\n",
" * Do this using \"double-double\" arithmetic to get `y_hi + y_lo`\n",
"3. Call the \"kernel\" `sin`/`cos`\n",
"\n",
"\n",
"* More accurate: to within 1 unit in the last place,\n",
"* Is actually *faster* then `sin(pi*x)` (we avoid the expensive range-reduction),\n",
"* Is implemented *all* in native Julia."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Base.Math.DoubleFloat64(0.5235987755982988,2.835264368075643e-17)"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = Base.Math.mulpi_ext(1/6)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.5"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Base.Math.sin_kernel(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mathematical functions\n",
"\n",
"Slowly, a lot of the mathematical functions are being implented in Julia.\n",
"\n",
"* `reinterpret` makes it easy to implement \"bit-twiddling\" functions\n",
" * e.g. `significand`, `exponent`, `frexp`.\n",
" * much clearer than type-punning in C"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1.2"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# find y = x * 2^k ∈ [1,2)\n",
"reinterpret(Float64,\n",
" (reinterpret(UInt64, 0.3) & 0x800f_ffff_ffff_ffff)| 0x3ff0000000000000\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* `@label`/`@goto` makes it very easy to translate Fortran code\n",
" * See Mike Nolta's [SpecialFunctions.jl](https://github.com/nolta/SpecialFunctions.jl): a translation of openspecfun (AMOS and Fadeeva)\n",
"\n",
"Will hopefully allow users to easily understand what their code is doing (and how to improve it)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lower-level functions\n",
"\n",
"`llvmcall` allows you to call LLVM IR directly. This allows you to directly call LLVM intrinsics without C code."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"getmxcsr (generic function with 1 method)"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# accesses the SSE floating point control status register (mxcsr)\n",
"\n",
"function getmxcsr()\n",
" Base.llvmcall(\"\"\"\n",
" %ptr = alloca i32\n",
" call void @llvm.x86.sse.stmxcsr(i32 * %ptr)\n",
" %curval = load i32 * %ptr\n",
" ret i32 %curval\"\"\", UInt32, ())\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0x00001fa1"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"getmxcsr()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Flags: 100001\n",
"Den = 0: 0\n",
"Masks: 111111\n",
"Rounding: 00\n",
"Flush den: 0\n"
]
}
],
"source": [
"function decmx(u::UInt32)\n",
" b = bits(u)\n",
" println(\"Flags: \",b[27:32])\n",
" println(\"Den = 0: \",b[26])\n",
" println(\"Masks: \",b[20:25])\n",
" println(\"Rounding: \",b[18:19])\n",
" println(\"Flush den: \",b[17])\n",
"end\n",
"decmx(getmxcsr())"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Inf"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = 1.0 / 0.0"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Flags: 100101\n",
"Den = 0: 0\n",
"Masks: 111111\n",
"Rounding: 00\n",
"Flush den: 0\n"
]
}
],
"source": [
"decmx(getmxcsr())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"function setmxcsr(u::UInt32)\n",
" Base.llvmcall(\"\"\"%ptr = alloca i32\n",
" store i32 %0, i32 * %ptr\n",
" call void @llvm.x86.sse.ldmxcsr(i32 * %ptr)\n",
" ret void\"\"\", Void, (UInt32,),u)\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"FE_MASK_DIVBYZERO = 0b0_0_00_000100_0_000000\n",
"u = getmxcsr()\n",
"setmxcsr(u & ~FE_MASK_DIVBYZERO)\n",
"decmx(getmxcsr())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"1.0/0.0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"setmxcsr(u)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In future, I'm hoping to be able to use this to allow for a sane approach to floating point exceptions.\n",
" * Ideally, integrate this into the future debugger."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Even lower\n",
"\n",
"We can also use this for inline assembly (requires LLVM 3.4+)."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"asm_pi (generic function with 1 method)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function asm_pi()\n",
" Base.llvmcall(\"\"\"\n",
" %pi = call double asm \"fldpi\", \"={st},~{dirflag},~{fpsr},~{flags}\"() #0\n",
" ret double %pi\"\"\",\n",
" Float64, ())\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"3.141592653589793"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"asm_pi()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n",
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n",
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n",
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n",
"error: failed to compute relocation: X86_64_RELOC_UNSIGNED\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\t.section\t__TEXT,__text,regular,pure_instructions\n",
"Filename: In[21]\n",
"Source line: 0\n",
"\tpushq\t%rbp\n",
"\tmovq\t%rsp, %rbp\n",
"Source line: 2\n",
"\tfldpi\n",
"\tfstpl\t-8(%rbp)\n",
"\tvmovsd\t-8(%rbp), %xmm0\n",
"\tpopq\t%rbp\n",
"\tretq\n"
]
}
],
"source": [
"@code_native asm_pi()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Other cool stuff in 0.4: meta tags\n",
"\n",
"These are \"extra\" instructions to control various optimisations.\n",
"\n",
"## `@inbounds`\n",
"\n",
"This disables boundary checking of array access and assignment (i.e. standard C behaviour)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"second_element (generic function with 1 method)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"second_element(x) = x[2]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"define i64 @julia_second_element_43258(%jl_value_t*) {\n",
"top:\n",
" %1 = getelementptr inbounds %jl_value_t* %0, i64 2, !dbg !2566\n",
" %2 = bitcast %jl_value_t* %1 to i64*, !dbg !2566\n",
" %3 = load i64* %2, align 8, !dbg !2566, !tbaa %jtbaa_arraylen\n",
" %4 = icmp ugt i64 %3, 1, !dbg !2566\n",
" br i1 %4, label %idxend, label %oob, !dbg !2566\n",
"\n",
"oob: ; preds = %top\n",
" %5 = alloca i64, align 8, !dbg !2566\n",
" store i64 2, i64* %5, align 8, !dbg !2566\n",
" call void @jl_bounds_error_ints(%jl_value_t* %0, i64* %5, i64 1), !dbg !2566\n",
" unreachable, !dbg !2566\n",
"\n",
"idxend: ; preds = %top\n",
" %6 = getelementptr inbounds %jl_value_t* %0, i64 1, !dbg !2566\n",
" %7 = bitcast %jl_value_t* %6 to i8**, !dbg !2566\n",
" %8 = load i8** %7, align 8, !dbg !2566, !tbaa %jtbaa_arrayptr\n",
" %9 = getelementptr i8* %8, i64 8, !dbg !2566\n",
" %10 = bitcast i8* %9 to i64*, !dbg !2566\n",
" %11 = load i64* %10, align 8, !dbg !2566, !tbaa %jtbaa_user\n",
" ret i64 %11, !dbg !2566\n",
"}\n"
]
}
],
"source": [
"@code_llvm second_element([1,2,3])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"second_element (generic function with 1 method)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"second_element(x) = @inbounds x[2]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"define %jl_value_t* @julia_second_element_43259(%jl_value_t*, %jl_value_t**, i32) {\n",
"top:\n",
" ret %jl_value_t* inttoptr (i64 4371070976 to %jl_value_t*), !dbg !2570\n",
"}\n"
]
}
],
"source": [
"@code_llvm second_element([1,2,3])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"\n",
"## `@inline`\n",
"\n",
"Function inlining replaces a function call with the actual code from the function. Julia will automatically inline certain functions based on certain rules. \n",
"\n",
"Some additional control is provided by:\n",
"\n",
"* `@inline` macro marks functions at declaration which should be inlined (similar to C `inline` statement).\n",
"\n",
"* `@noinline` marks functions which should NOT be inlined.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `@fastmath`\n",
"\n",
"This enables certain floating point optimisations that aren't strictly valid according to the IEEE-754 floating point standard:\n",
"* allows reordering of instructions\n",
"* allows substitution of fused multiply-add (FMA) instructions\n",
"* assumes all values are finite (no Inf or NaNs)\n",
"* treats signed zeros are the the same.\n",
"* allows reciprocal approximations to division. \n",
"\n",
"Similar to the C `-ffast-math` compiler flag, but applied only locally."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"define double @julia_third_43558(double) {\n",
"top:\n",
" %1 = fdiv double %0, 3.000000e+00, !dbg !3506\n",
" ret double %1, !dbg !3506\n",
"}\n"
]
}
],
"source": [
"third(x) = x/3\n",
"@code_llvm third(1.0)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"define double @julia_third_43559(double) {\n",
"top:\n",
" %1 = fmul double %0, 0x3FD5555555555555, !dbg !3510\n",
" ret double %1, !dbg !3510\n",
"}\n"
]
}
],
"source": [
"third(x) = @fastmath x/3\n",
"@code_llvm third(1.0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"In future could be used for faster, but less accurate, special functions (`sin`, `log`, etc.)."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## `@simd`\n",
"\n",
"Allows for limited reordering of loops so as to be able to take advantage of SIMD (single instruction, multiple data) operations on modern processors. \n",
"\n",
"* Particularly useful for `Int32`/`Float32` operands, though there is typically still some speedup on `Float64` as well."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Conclusions\n",
"\n",
"* `@less` is your friend.\n",
"* `@code_*` can tell you lots of useful stuff.\n",
"* Meta tags are going to allow more power, and avoid the need for compiler flags"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 0.4.0-dev",
"language": "julia",
"name": "julia 0.4"
},
"language_info": {
"name": "julia",
"version": "0.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment