Created
May 11, 2020 21:39
-
-
Save jorisvandenbossche/ddc824d4f5fc4e4efb575c3bd59ea201 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Floating point data type with NaN and NA/NULL" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Some exploration of how NaN is handles in different languages and frameworks:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Julia" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Julia has both as separate concepts for float arrays using the `Missing` type (https://docs.julialang.org/en/v1/manual/missing/):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Union{Missing, Float64},1}:\n", | |
" 0.0 \n", | |
" NaN \n", | |
" missing" | |
] | |
}, | |
"execution_count": 1, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"arr = [0.0, NaN, missing]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element BitArray{1}:\n", | |
" false\n", | |
" false\n", | |
" true" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ismissing.(arr)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Union{Missing, Bool},1}:\n", | |
" false \n", | |
" true \n", | |
" missing" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"isnan.(arr)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"NaN can be created by division of 0 by 0:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Union{Missing, Float64},1}:\n", | |
" NaN \n", | |
" NaN \n", | |
" missing" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"arr ./ 0.0" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Julia follows IEEE 754 when it comes to element-wise comparisons (always false, except for `!=`):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Union{Missing, Bool},1}:\n", | |
" false \n", | |
" false \n", | |
" missing" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"arr .> 1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Union{Missing, Bool},1}:\n", | |
" true \n", | |
" false \n", | |
" missing" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"arr .< 1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Union{Missing, Bool},1}:\n", | |
" false \n", | |
" false \n", | |
" missing" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"arr .== NaN" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"3-element Array{Union{Missing, Bool},1}:\n", | |
" true \n", | |
" true \n", | |
" missing" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"arr .!= NaN" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"But for sorting, it deviates from those rules to provide deterministic results (NaNs always last, but before missing):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"8-element Array{Union{Missing, Float64},1}:\n", | |
" 3.5 \n", | |
" 4.0 \n", | |
" 5.0 \n", | |
" NaN \n", | |
" NaN \n", | |
" NaN \n", | |
" missing\n", | |
" missing" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sort([NaN, 5, NaN, missing, 4.0, 3.5, NaN, missing])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Reducations propagate both missing and NaN. Missing values can be skipped with `skipmissing`:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"missing" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sum([1.0, NaN, missing])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"NaN" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sum(skipmissing([1.0, NaN, missing]))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1.0" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sum(skipmissing([1.0, missing]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## R" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"a <- c(1., 0., NaN, NA)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<style>\n", | |
".list-inline {list-style: none; margin:0; padding: 0}\n", | |
".list-inline>li {display: inline-block}\n", | |
".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n", | |
"</style>\n", | |
"<ol class=list-inline><li>1</li><li>0</li><li>NaN</li><li><NA></li></ol>\n" | |
], | |
"text/latex": [ | |
"\\begin{enumerate*}\n", | |
"\\item 1\n", | |
"\\item 0\n", | |
"\\item NaN\n", | |
"\\item <NA>\n", | |
"\\end{enumerate*}\n" | |
], | |
"text/markdown": [ | |
"1. 1\n", | |
"2. 0\n", | |
"3. NaN\n", | |
"4. <NA>\n", | |
"\n", | |
"\n" | |
], | |
"text/plain": [ | |
"[1] 1 0 NaN NA" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"a" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<style>\n", | |
".list-inline {list-style: none; margin:0; padding: 0}\n", | |
".list-inline>li {display: inline-block}\n", | |
".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n", | |
"</style>\n", | |
"<ol class=list-inline><li>Inf</li><li>NaN</li><li>NaN</li><li><NA></li></ol>\n" | |
], | |
"text/latex": [ | |
"\\begin{enumerate*}\n", | |
"\\item Inf\n", | |
"\\item NaN\n", | |
"\\item NaN\n", | |
"\\item <NA>\n", | |
"\\end{enumerate*}\n" | |
], | |
"text/markdown": [ | |
"1. Inf\n", | |
"2. NaN\n", | |
"3. NaN\n", | |
"4. <NA>\n", | |
"\n", | |
"\n" | |
], | |
"text/plain": [ | |
"[1] Inf NaN NaN NA" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"a / 0." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<style>\n", | |
".list-inline {list-style: none; margin:0; padding: 0}\n", | |
".list-inline>li {display: inline-block}\n", | |
".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n", | |
"</style>\n", | |
"<ol class=list-inline><li>FALSE</li><li>FALSE</li><li>TRUE</li><li>TRUE</li></ol>\n" | |
], | |
"text/latex": [ | |
"\\begin{enumerate*}\n", | |
"\\item FALSE\n", | |
"\\item FALSE\n", | |
"\\item TRUE\n", | |
"\\item TRUE\n", | |
"\\end{enumerate*}\n" | |
], | |
"text/markdown": [ | |
"1. FALSE\n", | |
"2. FALSE\n", | |
"3. TRUE\n", | |
"4. TRUE\n", | |
"\n", | |
"\n" | |
], | |
"text/plain": [ | |
"[1] FALSE FALSE TRUE TRUE" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"is.na(a)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<style>\n", | |
".list-inline {list-style: none; margin:0; padding: 0}\n", | |
".list-inline>li {display: inline-block}\n", | |
".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n", | |
"</style>\n", | |
"<ol class=list-inline><li>FALSE</li><li>FALSE</li><li>TRUE</li><li>FALSE</li></ol>\n" | |
], | |
"text/latex": [ | |
"\\begin{enumerate*}\n", | |
"\\item FALSE\n", | |
"\\item FALSE\n", | |
"\\item TRUE\n", | |
"\\item FALSE\n", | |
"\\end{enumerate*}\n" | |
], | |
"text/markdown": [ | |
"1. FALSE\n", | |
"2. FALSE\n", | |
"3. TRUE\n", | |
"4. FALSE\n", | |
"\n", | |
"\n" | |
], | |
"text/plain": [ | |
"[1] FALSE FALSE TRUE FALSE" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"is.nan(a)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"── \u001b[1mAttaching packages\u001b[22m ─────────────────────────────────────── tidyverse 1.2.1 ──\n", | |
"\n", | |
"\u001b[32m✔\u001b[39m \u001b[34mggplot2\u001b[39m 3.2.1 \u001b[32m✔\u001b[39m \u001b[34mpurrr \u001b[39m 0.3.2\n", | |
"\u001b[32m✔\u001b[39m \u001b[34mtibble \u001b[39m 2.1.3 \u001b[32m✔\u001b[39m \u001b[34mdplyr \u001b[39m 0.8.3\n", | |
"\u001b[32m✔\u001b[39m \u001b[34mtidyr \u001b[39m 1.0.0 \u001b[32m✔\u001b[39m \u001b[34mstringr\u001b[39m 1.4.0\n", | |
"\u001b[32m✔\u001b[39m \u001b[34mreadr \u001b[39m 1.3.1 \u001b[32m✔\u001b[39m \u001b[34mforcats\u001b[39m 0.4.0\n", | |
"\n", | |
"── \u001b[1mConflicts\u001b[22m ────────────────────────────────────────── tidyverse_conflicts() ──\n", | |
"\u001b[31m✖\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mfilter()\u001b[39m masks \u001b[34mstats\u001b[39m::filter()\n", | |
"\u001b[31m✖\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mlag()\u001b[39m masks \u001b[34mstats\u001b[39m::lag()\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"library(tidyverse)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"df <- tibble(x = c(-1., 0., 1., NaN, NA))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A tibble: 5 × 1</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>x</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td> -1</td></tr>\n", | |
"\t<tr><td> 0</td></tr>\n", | |
"\t<tr><td> 1</td></tr>\n", | |
"\t<tr><td>NaN</td></tr>\n", | |
"\t<tr><td> NA</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A tibble: 5 × 1\n", | |
"\\begin{tabular}{l}\n", | |
" x\\\\\n", | |
" <dbl>\\\\\n", | |
"\\hline\n", | |
"\t -1\\\\\n", | |
"\t 0\\\\\n", | |
"\t 1\\\\\n", | |
"\t NaN\\\\\n", | |
"\t NA\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A tibble: 5 × 1\n", | |
"\n", | |
"| x <dbl> |\n", | |
"|---|\n", | |
"| -1 |\n", | |
"| 0 |\n", | |
"| 1 |\n", | |
"| NaN |\n", | |
"| NA |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" x \n", | |
"1 -1\n", | |
"2 0\n", | |
"3 1\n", | |
"4 NaN\n", | |
"5 NA" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"df" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A tibble: 5 × 2</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>x</th><th scope=col>x2</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th><th scope=col><dbl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td> -1</td><td>-Inf</td></tr>\n", | |
"\t<tr><td> 0</td><td> NaN</td></tr>\n", | |
"\t<tr><td> 1</td><td> Inf</td></tr>\n", | |
"\t<tr><td>NaN</td><td> NaN</td></tr>\n", | |
"\t<tr><td> NA</td><td> NA</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A tibble: 5 × 2\n", | |
"\\begin{tabular}{ll}\n", | |
" x & x2\\\\\n", | |
" <dbl> & <dbl>\\\\\n", | |
"\\hline\n", | |
"\t -1 & -Inf\\\\\n", | |
"\t 0 & NaN\\\\\n", | |
"\t 1 & Inf\\\\\n", | |
"\t NaN & NaN\\\\\n", | |
"\t NA & NA\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A tibble: 5 × 2\n", | |
"\n", | |
"| x <dbl> | x2 <dbl> |\n", | |
"|---|---|\n", | |
"| -1 | -Inf |\n", | |
"| 0 | NaN |\n", | |
"| 1 | Inf |\n", | |
"| NaN | NaN |\n", | |
"| NA | NA |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" x x2 \n", | |
"1 -1 -Inf\n", | |
"2 0 NaN\n", | |
"3 1 Inf\n", | |
"4 NaN NaN\n", | |
"5 NA NA" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"df %>% mutate(x2 = x / 0)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"Warning message in log(x):\n", | |
"“NaNs produced”\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A tibble: 5 × 2</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>x</th><th scope=col>x2</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th><th scope=col><dbl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td> -1</td><td> NaN</td></tr>\n", | |
"\t<tr><td> 0</td><td>-Inf</td></tr>\n", | |
"\t<tr><td> 1</td><td> 0</td></tr>\n", | |
"\t<tr><td>NaN</td><td> NaN</td></tr>\n", | |
"\t<tr><td> NA</td><td> NA</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A tibble: 5 × 2\n", | |
"\\begin{tabular}{ll}\n", | |
" x & x2\\\\\n", | |
" <dbl> & <dbl>\\\\\n", | |
"\\hline\n", | |
"\t -1 & NaN\\\\\n", | |
"\t 0 & -Inf\\\\\n", | |
"\t 1 & 0\\\\\n", | |
"\t NaN & NaN\\\\\n", | |
"\t NA & NA\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A tibble: 5 × 2\n", | |
"\n", | |
"| x <dbl> | x2 <dbl> |\n", | |
"|---|---|\n", | |
"| -1 | NaN |\n", | |
"| 0 | -Inf |\n", | |
"| 1 | 0 |\n", | |
"| NaN | NaN |\n", | |
"| NA | NA |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" x x2 \n", | |
"1 -1 NaN\n", | |
"2 0 -Inf\n", | |
"3 1 0\n", | |
"4 NaN NaN\n", | |
"5 NA NA" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"df %>% mutate(x2 = log(x))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A tibble: 5 × 4</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>x</th><th scope=col>x2</th><th scope=col>x3</th><th scope=col>x4</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th><th scope=col><lgl></th><th scope=col><lgl></th><th scope=col><lgl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td> -1</td><td>FALSE</td><td> TRUE</td><td>NA</td></tr>\n", | |
"\t<tr><td> 0</td><td>FALSE</td><td>FALSE</td><td>NA</td></tr>\n", | |
"\t<tr><td> 1</td><td> TRUE</td><td>FALSE</td><td>NA</td></tr>\n", | |
"\t<tr><td>NaN</td><td> NA</td><td> NA</td><td>NA</td></tr>\n", | |
"\t<tr><td> NA</td><td> NA</td><td> NA</td><td>NA</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A tibble: 5 × 4\n", | |
"\\begin{tabular}{llll}\n", | |
" x & x2 & x3 & x4\\\\\n", | |
" <dbl> & <lgl> & <lgl> & <lgl>\\\\\n", | |
"\\hline\n", | |
"\t -1 & FALSE & TRUE & NA\\\\\n", | |
"\t 0 & FALSE & FALSE & NA\\\\\n", | |
"\t 1 & TRUE & FALSE & NA\\\\\n", | |
"\t NaN & NA & NA & NA\\\\\n", | |
"\t NA & NA & NA & NA\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A tibble: 5 × 4\n", | |
"\n", | |
"| x <dbl> | x2 <lgl> | x3 <lgl> | x4 <lgl> |\n", | |
"|---|---|---|---|\n", | |
"| -1 | FALSE | TRUE | NA |\n", | |
"| 0 | FALSE | FALSE | NA |\n", | |
"| 1 | TRUE | FALSE | NA |\n", | |
"| NaN | NA | NA | NA |\n", | |
"| NA | NA | NA | NA |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" x x2 x3 x4\n", | |
"1 -1 FALSE TRUE NA\n", | |
"2 0 FALSE FALSE NA\n", | |
"3 1 TRUE FALSE NA\n", | |
"4 NaN NA NA NA\n", | |
"5 NA NA NA NA" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"df %>% mutate(x2 = x > 0, x3 = x < 0, x4 = x != NaN)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A tibble: 1 × 2</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>x1</th><th scope=col>x2</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th><th scope=col><dbl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td>0</td><td>NaN</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A tibble: 1 × 2\n", | |
"\\begin{tabular}{ll}\n", | |
" x1 & x2\\\\\n", | |
" <dbl> & <dbl>\\\\\n", | |
"\\hline\n", | |
"\t 0 & NaN\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A tibble: 1 × 2\n", | |
"\n", | |
"| x1 <dbl> | x2 <dbl> |\n", | |
"|---|---|\n", | |
"| 0 | NaN |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" x1 x2 \n", | |
"1 0 NaN" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"df %>% summarise(x1 = mean(x, na.rm = TRUE), x2 = mean(x, na.rm = FALSE))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
"<caption>A tibble: 8 × 1</caption>\n", | |
"<thead>\n", | |
"\t<tr><th scope=col>x</th></tr>\n", | |
"\t<tr><th scope=col><dbl></th></tr>\n", | |
"</thead>\n", | |
"<tbody>\n", | |
"\t<tr><td>3.5</td></tr>\n", | |
"\t<tr><td>4.0</td></tr>\n", | |
"\t<tr><td>5.0</td></tr>\n", | |
"\t<tr><td> NA</td></tr>\n", | |
"\t<tr><td> NA</td></tr>\n", | |
"\t<tr><td>NaN</td></tr>\n", | |
"\t<tr><td>NaN</td></tr>\n", | |
"\t<tr><td>NaN</td></tr>\n", | |
"</tbody>\n", | |
"</table>\n" | |
], | |
"text/latex": [ | |
"A tibble: 8 × 1\n", | |
"\\begin{tabular}{l}\n", | |
" x\\\\\n", | |
" <dbl>\\\\\n", | |
"\\hline\n", | |
"\t 3.5\\\\\n", | |
"\t 4.0\\\\\n", | |
"\t 5.0\\\\\n", | |
"\t NA\\\\\n", | |
"\t NA\\\\\n", | |
"\t NaN\\\\\n", | |
"\t NaN\\\\\n", | |
"\t NaN\\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"\n", | |
"A tibble: 8 × 1\n", | |
"\n", | |
"| x <dbl> |\n", | |
"|---|\n", | |
"| 3.5 |\n", | |
"| 4.0 |\n", | |
"| 5.0 |\n", | |
"| NA |\n", | |
"| NA |\n", | |
"| NaN |\n", | |
"| NaN |\n", | |
"| NaN |\n", | |
"\n" | |
], | |
"text/plain": [ | |
" x \n", | |
"1 3.5\n", | |
"2 4.0\n", | |
"3 5.0\n", | |
"4 NA\n", | |
"5 NA\n", | |
"6 NaN\n", | |
"7 NaN\n", | |
"8 NaN" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"df <- tibble(x = c(NaN, 5, NaN, NA, 4.0, 3.5, NaN, NA))\n", | |
"df %>% arrange(x)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Arrow / Dremio" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Arrow itself also distinguishes both concepts, since it uses bitmasks consistently for missing values:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import pyarrow as pa" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<pyarrow.lib.DoubleArray object at 0x7ff14a15cee8>\n", | |
"[\n", | |
" 0,\n", | |
" 1,\n", | |
" nan,\n", | |
" null\n", | |
"]" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"a = pa.array([0.0, 1.0, float(\"nan\"), None])\n", | |
"a" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This is just storage, though, and there not yet many actual computional operations provided in the `pyarrow` package itself. \n", | |
"\n", | |
"One part of Apache Arrow that already provides computational functionality is Gandiva (LLVM-based execution kernel for SQL-like expressions, https://www.dremio.com/announcing-gandiva-initiative-for-apache-arrow/), which is eg used in Dremio's data services.\n", | |
"\n", | |
"Here, we can check eg the behaviour of division by 0:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from pyarrow import gandiva" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"table = pa.table({'a': a, 'b': [0.0, 0.0, 0.0, 0.0]})" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"builder = gandiva.TreeExprBuilder()\n", | |
"node_a = builder.make_field(table.schema.field(\"a\"))\n", | |
"node_b = builder.make_field(table.schema.field(\"b\"))\n", | |
"node_upper = builder.make_function(\"divide\", [node_a, node_b], pa.float64())\n", | |
"field_result = pa.field('res', pa.float64())\n", | |
"expr = builder.make_expression(node_upper, field_result)\n", | |
"projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"ename": "ArrowException", | |
"evalue": "ExecutionError in Gandiva: divide by zero error", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mArrowException\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-6-2f99515c79a6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprojector\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mevaluate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtable\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_batches\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[0;32m~/miniconda3/envs/arrow-0151/lib/python3.7/site-packages/pyarrow/gandiva.pyx\u001b[0m in \u001b[0;36mpyarrow.gandiva.Projector.evaluate\u001b[0;34m()\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/envs/arrow-0151/lib/python3.7/site-packages/pyarrow/error.pxi\u001b[0m in \u001b[0;36mpyarrow.lib.check_status\u001b[0;34m()\u001b[0m\n", | |
"\u001b[0;31mArrowException\u001b[0m: ExecutionError in Gandiva: divide by zero error" | |
] | |
} | |
], | |
"source": [ | |
"projector.evaluate(table.to_batches()[0])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Where an error is raised, similar to other databases." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## PostgreSQL" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%load_ext sql" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'Connected: postgres@test_db'" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql postgresql://postgres:postgres@localhost/test_db" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"Done.\n", | |
"Done.\n", | |
"4 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"[]" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%sql\n", | |
"DROP TABLE IF EXISTS test;\n", | |
"CREATE TABLE test (\n", | |
" x double precision,\n", | |
" y double precision\n", | |
");\n", | |
"INSERT INTO test VALUES (0.0, 0.0), (1.0, 0.0), ('NaN'::double precision, 0.0), (NULL, 0.0);" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"4 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>x</th>\n", | |
" <th>y</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0.0</td>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1.0</td>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(0.0, 0.0), (1.0, 0.0), (nan, 0.0), (None, 0.0)]" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT * FROM test;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Division by zero raises an error (so does not result in NaN):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": true, | |
"jupyter": { | |
"outputs_hidden": true | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n" | |
] | |
}, | |
{ | |
"ename": "DataError", | |
"evalue": "(psycopg2.errors.DivisionByZero) division by zero\n\n[SQL: SELECT x / y FROM test]\n(Background on this error at: http://sqlalche.me/e/9h9h)", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mDivisionByZero\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py\u001b[0m in \u001b[0;36m_execute_context\u001b[0;34m(self, dialect, constructor, statement, parameters, *args)\u001b[0m\n\u001b[1;32m 1247\u001b[0m self.dialect.do_execute(\n\u001b[0;32m-> 1248\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1249\u001b[0m )\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py\u001b[0m in \u001b[0;36mdo_execute\u001b[0;34m(self, cursor, statement, parameters, context)\u001b[0m\n\u001b[1;32m 589\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdo_execute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcursor\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 590\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 591\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;31mDivisionByZero\u001b[0m: division by zero\n", | |
"\nThe above exception was the direct cause of the following exception:\n", | |
"\u001b[0;31mDataError\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-15-ea847347b56b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'SELECT x / y FROM test'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py\u001b[0m in \u001b[0;36mrun_line_magic\u001b[0;34m(self, magic_name, line, _stack_depth)\u001b[0m\n\u001b[1;32m 2305\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'local_ns'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getframe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstack_depth\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf_locals\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2306\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuiltin_trap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2307\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2308\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2309\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m</home/joris/miniconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-127>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m</home/joris/miniconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-126>\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/IPython/core/magic.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(f, *a, **k)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;31m# but it's overkill for just that one bit of state.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmagic_deco\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0mcall\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcallable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sql/magic.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, line, cell, local_ns)\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 94\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 95\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msql\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparsed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'sql'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muser_ns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 96\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 97\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mresult\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumn_local_vars\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sql/run.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(conn, sql, config, user_namespace)\u001b[0m\n\u001b[1;32m 338\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 339\u001b[0m \u001b[0mtxt\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msqlalchemy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msql\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstatement\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 340\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msession\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtxt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muser_namespace\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 341\u001b[0m \u001b[0m_commit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mconn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconfig\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mconfig\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 342\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mresult\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mconfig\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfeedback\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, object_, *multiparams, **params)\u001b[0m\n\u001b[1;32m 982\u001b[0m )\n\u001b[1;32m 983\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 984\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mmeth\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmultiparams\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 985\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 986\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_execute_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmultiparams\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/sql/elements.py\u001b[0m in \u001b[0;36m_execute_on_connection\u001b[0;34m(self, connection, multiparams, params)\u001b[0m\n\u001b[1;32m 291\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_execute_on_connection\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconnection\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmultiparams\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 292\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msupports_execution\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 293\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mconnection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_execute_clauseelement\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmultiparams\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 294\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 295\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mObjectNotExecutableError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py\u001b[0m in \u001b[0;36m_execute_clauseelement\u001b[0;34m(self, elem, multiparams, params)\u001b[0m\n\u001b[1;32m 1101\u001b[0m \u001b[0mdistilled_params\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1102\u001b[0m \u001b[0mcompiled_sql\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1103\u001b[0;31m \u001b[0mdistilled_params\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1104\u001b[0m )\n\u001b[1;32m 1105\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_has_events\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_has_events\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py\u001b[0m in \u001b[0;36m_execute_context\u001b[0;34m(self, dialect, constructor, statement, parameters, *args)\u001b[0m\n\u001b[1;32m 1286\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mBaseException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1287\u001b[0m self._handle_dbapi_exception(\n\u001b[0;32m-> 1288\u001b[0;31m \u001b[0me\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcursor\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1289\u001b[0m )\n\u001b[1;32m 1290\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py\u001b[0m in \u001b[0;36m_handle_dbapi_exception\u001b[0;34m(self, e, statement, parameters, cursor, context)\u001b[0m\n\u001b[1;32m 1480\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mshould_wrap\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1481\u001b[0m util.raise_(\n\u001b[0;32m-> 1482\u001b[0;31m \u001b[0msqlalchemy_exception\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mwith_traceback\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mexc_info\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfrom_\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1483\u001b[0m )\n\u001b[1;32m 1484\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py\u001b[0m in \u001b[0;36mraise_\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 176\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 177\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 178\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mexception\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 179\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 180\u001b[0m \u001b[0;31m# credit to\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py\u001b[0m in \u001b[0;36m_execute_context\u001b[0;34m(self, dialect, constructor, statement, parameters, *args)\u001b[0m\n\u001b[1;32m 1246\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mevt_handled\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1247\u001b[0m self.dialect.do_execute(\n\u001b[0;32m-> 1248\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1249\u001b[0m )\n\u001b[1;32m 1250\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;32m~/miniconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py\u001b[0m in \u001b[0;36mdo_execute\u001b[0;34m(self, cursor, statement, parameters, context)\u001b[0m\n\u001b[1;32m 588\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 589\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdo_execute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcursor\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 590\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 591\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 592\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdo_execute_no_params\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcursor\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcontext\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;31mDataError\u001b[0m: (psycopg2.errors.DivisionByZero) division by zero\n\n[SQL: SELECT x / y FROM test]\n(Background on this error at: http://sqlalche.me/e/9h9h)" | |
] | |
} | |
], | |
"source": [ | |
"%sql SELECT x / y FROM test" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Comparisons with NaN do not always give False:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"4 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>?column?</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>True</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>True</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(False,), (True,), (True,), (None,)]" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x > y FROM test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"4 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>?column?</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>False</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(False,), (False,), (False,), (None,)]" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x < y FROM test" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note from https://www.postgresql.org/docs/current/datatype-numeric.html:\n", | |
"\n", | |
"> In most implementations of the “not-a-number” concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"1 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>sum</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(nan,)]" | |
] | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT sum(x) FROM test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"1 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>sum</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(1.0,)]" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT Sum(x) FROM test WHERE x != 'NaN'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"4 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>x</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(None,), (nan,), (1.0,), (0.0,)]" | |
] | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x FROM test ORDER BY x DESC" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * postgresql://postgres:***@localhost/test_db\n", | |
"4 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>x</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1.0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(0.0,), (1.0,), (nan,), (None,)]" | |
] | |
}, | |
"execution_count": 24, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x FROM test ORDER BY x ASC" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Clickhouse" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'Connected: default@default'" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql clickhouse://default:clickhouse@localhost/default" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[]" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql DROP TABLE IF EXISTS test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%sql CREATE TABLE test (id Int, x Nullable(Double), y Nullable(Double)) ENGINE = MergeTree ORDER BY id" | |
] | |
}, | |
{ | |
"cell_type": "raw", | |
"metadata": {}, | |
"source": [ | |
"INSERT INTO test VALUES (1, 0.0, 0.0), (1, 1.0, 0.0), (3, NaN, 0.0), (4, NULL, 0.0)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>id</th>\n", | |
" <th>x</th>\n", | |
" <th>y</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>3</td>\n", | |
" <td>nan</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>4</td>\n", | |
" <td>None</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(1, '0', '0'), (1, '1', '0'), (3, 'nan', '0'), (4, None, '0')]" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT * FROM test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>divide(x, y)</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>inf</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('nan',), ('inf',), ('nan',), (None,)]" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x / y FROM test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>greater(x, y)</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('0',), ('1',), ('0',), (None,)]" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x > y FROM test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>isNaN(x)</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('0',), ('0',), ('1',), (None,)]" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT isNaN(x) FROM test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>Sum(x)</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('nan',)]" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT Sum(x) FROM test" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>Sum(x)</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('1',)]" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT Sum(x) FROM test WHERE isFinite(x)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>x</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('1',), ('0',), ('nan',), (None,)]" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x FROM test ORDER BY x DESCENDING" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" * clickhouse://default:***@localhost/default\n", | |
"Done.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>x</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>nan</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('0',), ('1',), ('nan',), (None,)]" | |
] | |
}, | |
"execution_count": 22, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%sql SELECT x FROM test ORDER BY x ASCENDING" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Spark\n", | |
"\n", | |
"\n", | |
"```\n", | |
"\n", | |
"df = spark.createDataFrame(\n", | |
" [\n", | |
" (1, 0.0, 0.0),\n", | |
" (2, 1.0, 0.0),\n", | |
" (3, float('nan'), 0.0),\n", | |
" (4, None, 0.0),\n", | |
" ],\n", | |
" ['id', 'x', 'y']\n", | |
")\n", | |
"\n", | |
"\n", | |
"df.show()\n", | |
"\n", | |
"\n", | |
"\n", | |
"import pyspark.sql.functions as F\n", | |
"\n", | |
"df.withColumn(\"x2\", (F.col(\"x\") / F.col(\"y\"))).show()\n", | |
"df.withColumn(\"x2\", F.log(\"x\")).show()\n", | |
"df.na.fill(0).show()\n", | |
"```\n", | |
"\n", | |
"```\n", | |
"In [4]: df.withColumn(\"x2\", (F.col(\"x\") / F.col(\"y\"))).show()\n", | |
"+---+----+---+----+\n", | |
"| id| x| y| x2|\n", | |
"+---+----+---+----+\n", | |
"| 1| 0.0|0.0|null|\n", | |
"| 2| 1.0|0.0|null|\n", | |
"| 3| NaN|0.0|null|\n", | |
"| 4|null|0.0|null|\n", | |
"+---+----+---+----+\n", | |
"\n", | |
"In [5]: df.withColumn(\"x2\", F.log(\"x\")).show()\n", | |
"+---+----+---+----+\n", | |
"| id| x| y| x2|\n", | |
"+---+----+---+----+\n", | |
"| 1| 0.0|0.0|null|\n", | |
"| 2| 1.0|0.0| 0.0|\n", | |
"| 3| NaN|0.0| NaN|\n", | |
"| 4|null|0.0|null|\n", | |
"+---+----+---+----+\n", | |
"\n", | |
"In [6]: df.na.fill(0).show()\n", | |
"+---+---+---+\n", | |
"| id| x| y|\n", | |
"+---+---+---+\n", | |
"| 1|0.0|0.0|\n", | |
"| 2|1.0|0.0|\n", | |
"| 3|0.0|0.0|\n", | |
"| 4|0.0|0.0|\n", | |
"+---+---+---+\n", | |
"```\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "R", | |
"language": "R", | |
"name": "ir" | |
}, | |
"language_info": { | |
"codemirror_mode": "r", | |
"file_extension": ".r", | |
"mimetype": "text/x-r-source", | |
"name": "R", | |
"pygments_lexer": "r", | |
"version": "3.6.1" | |
}, | |
"toc-autonumbering": false | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment