Skip to content

Instantly share code, notes, and snippets.

@tobydriscoll
Created September 30, 2016 19:37
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tobydriscoll/7449b8d30704663835214af91d2b4d90 to your computer and use it in GitHub Desktop.
Save tobydriscoll/7449b8d30704663835214af91d2b4d90 to your computer and use it in GitHub Desktop.
TB Lecture 13
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lecture 13: Floating point arithmetic\n",
"\n",
"In order to complete computations in finite space and bounded time, we replace the real numbers with a surrogate finite set $\\mathbb{F}$, the floating point numbers. (The \"floating point\" term originally differentiated it from \"fixed point\", which was an early alternative system based on absolute errors rather than relative errors.) Most scientific computing now conforms to the IEEE 754 double precision standard. We won't need to think about the details of this standard going forward, but it is useful to briefly explore what is really going on in the computer. \n",
"\n",
"In double precision there are 64 binary bits used to represent the members of $\\mathbb{F}$. We can get them directly."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\"0011111111110000000000000000000000000000000000000000000000000000\""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"one = bits(1.0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These bits define three integers $s$, $e$, and $m$ used in the representation\n",
"\n",
"$$ x = (-1)^s \\cdot \\left( 1 + 2^{-52}m \\right) \\cdot 2^e.$$\n",
"\n",
"Here $s\\in\\{0,1\\}$ requires one bit, $e\\in\\{-1022,\\ldots,1023\\}$ requires 11 bits, and $m\\in\\{0,1,\\ldots,2^{52}-1\\}$ requires 52 bits. We can dissect a double precision number to see these parts."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"((\"0\",0),(\"01111111111\",0),(\"0000000000000000000000000000000000000000000000000000\",0))"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function ieee(x::Float64)\n",
" b = bits(x);\n",
" s = (b[1:1],parse(Int,b[1:1]));\n",
" e = (b[2:12],parse(Int,b[2:12],2)-1023);\n",
" f = (b[13:64],parse(Int,b[13:64],2));\n",
" return s,e,f\n",
"end\n",
"\n",
"ieee(1.0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next-greater element of $\\mathbb{F}$ is $1+\\epsilon_M$, for machine epsilon $\\epsilon_M=2^{-52}$. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"eps() = 2.220446049250313e-16\n"
]
},
{
"data": {
"text/plain": [
"((\"0\",0),(\"01111111111\",0),(\"0000000000000000000000000000000000000000000000000001\",1))"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@show eps()\n",
"ieee(1.0+eps())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are $2^{52}$ elements of $\\mathbb{F}$ equally spaced throughout $[1,2)$. After these, the exponent increases to 1 and the value of $m$ resets to zero. Thus there are also $2^{52}$ elements equally spaced throughout $[2,4)$, as well as $[4,8)$, $[1/2,1)$, and in general, $[2^{e},2^{e+1})$. Let $\\text{fl}(x)$ be the element of $\\mathbb{F}$ nearest to any real number $x$. Consequently, if we momentarily ignore the bounds on the exponent $e$,\n",
"\n",
"$$ \\frac{\\bigl|x-\\text{fl}(x)\\bigr|}{\\bigl|x\\bigr|} \\le \\frac{1}{2} \\epsilon_M.$$\n",
"\n",
"In practice, the largest finite value in $\\mathbb{F}$ is"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R = 2.0 ^ 1023 * (1 + (2 ^ 52 - 1) / 2 ^ 52) = 1.7976931348623157e308\n"
]
},
{
"data": {
"text/plain": [
"((\"0\",0),(\"11111111110\",1023),(\"1111111111111111111111111111111111111111111111111111\",4503599627370495))"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@show R = (2.0^1023)*(1 + (2^52-1)/2^52);\n",
"ieee(R)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Results that should be larger than this become the special value Inf; this situation is called _overflow_."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Inf"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nextfloat(R)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2.2250738585072014e-308"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2.0^(-1022)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The analogous situation near zero is called _underflow_. Ignoring some special \"denormalized\" numbers, the smallest positive element of $\\mathbb{F}$ is "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"r = 2.0 ^ -1022 = 2.2250738585072014e-308\n"
]
},
{
"data": {
"text/plain": [
"((\"0\",0),(\"00000000001\",-1022),(\"0000000000000000000000000000000000000000000000000000\",0))"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@show r = 2.0^-1022;\n",
"ieee(r)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this minimum value is far smaller than $\\epsilon_M$, which is the number spacing relative to 1. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 0.5.0",
"language": "julia",
"name": "julia-0.5"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "0.5.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment