Created
December 22, 2020 22:49
-
-
Save guilhermeleobas/098a8b4c8199942eed03beff39a75cce to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 114, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import random" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 115, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from numba import types" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 116, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from rbc.omniscidb import RemoteOmnisci" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 117, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"omnisci = RemoteOmnisci(user='admin', password='HyperInteractive',\n", | |
" host='127.0.0.1', port=6274, dbname='omnisci')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 123, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Create table\n", | |
"omnisci.sql_execute('DROP TABLE IF EXISTS mytable')\n", | |
"omnisci.sql_execute('CREATE TABLE IF NOT EXISTS mytable (i INT[], j INT[]);');\n", | |
"# insert values\n", | |
"omnisci.sql_execute('INSERT INTO mytable VALUES (ARRAY[1,2,3,2,3,4,3,4,5,6], ARRAY[7,2,10,2,7,4,9,4,9,8])');" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 124, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[([1, 2, 3, 2, 3, 4, 3, 4, 5, 6], [7, 2, 10, 2, 7, 4, 9, 4, 9, 8])]\n" | |
] | |
} | |
], | |
"source": [ | |
"# query values\n", | |
"_, result = omnisci.sql_execute('SELECT * FROM mytable')\n", | |
"print(list(result))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 125, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Notice the import below\n", | |
"import rbc.omnisci_backend as np" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 126, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# euclidean distance\n", | |
"@omnisci('float32(int32[], int32[])')\n", | |
"def euclidean_distance(a, b):\n", | |
" return np.sqrt(np.sum(np.power((a - b), 2)))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Internally, the function above will be compiled to LLVM intermediate representation and sent to OmniSciDB to be executed. Notice that `np.sum` will be computed using a for-loop, so, that might not be the most performatic solution" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 127, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[(12.529964447021484,)]\n" | |
] | |
} | |
], | |
"source": [ | |
"# query that computes the euclidean distance\n", | |
"_, result = omnisci.sql_execute('SELECT euclidean_distance(i, j) from mytable;')\n", | |
"print(list(result))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Compute the euclidean distance using a User-Defined Table Function" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"While the function above works, one can also compute the euclidean distance using a table function (UDTF). For that, let's create a table with two columns that holds random (x, y) coordinates:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 157, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Create another table\n", | |
"omnisci.sql_execute('DROP TABLE IF EXISTS other_table')\n", | |
"omnisci.sql_execute('CREATE TABLE IF NOT EXISTS other_table (a INT, b INT);');\n", | |
"# insert values\n", | |
"a, b = [], []\n", | |
"for i in range(30000):\n", | |
" a.append(random.randint(0, 100))\n", | |
" b.append(random.randint(0, 100))\n", | |
"omnisci.load_table_columnar('other_table', a=a, b=b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can query the column to grab the first five elements" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 143, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[(4, 19), (77, 3), (97, 27), (68, 75), (40, 45)]\n" | |
] | |
} | |
], | |
"source": [ | |
"# query\n", | |
"_, result = omnisci.sql_execute('SELECT a, b from other_table limit 5;')\n", | |
"print(list(result))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now, let's create a table function that computes the power $(a[i] - b[i]) ^ 2$. One can aggregate the powers using the builtin function `SUM` and finally compute the square root using `SQRT`.\n", | |
"\n", | |
"For large datasets, this might be the most performatic approach at the moment since the aggregation step will be computed in parallel in a GPU." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 156, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[(103.31989159885913,)]\n" | |
] | |
} | |
], | |
"source": [ | |
"# The table function (UDTF) below receives 2 columns<int32> as arguments and returns another column<int32>\n", | |
"# For more information of how to use table functions, please check\n", | |
"# https://github.com/xnd-project/rbc/blob/master/notebooks/rbc-omnisci-udtf-normalize.ipynb\n", | |
"@omnisci('int32(Column<int32>, Column<int32>, RowMultiplier, OutputColumn<int32>)')\n", | |
"def power_udtf(a, b, m, r):\n", | |
" input_row_count = len(a)\n", | |
" for i in range(input_row_count):\n", | |
" r[i] = (a[i] - b[i])*(a[i] - b[i])\n", | |
" # return the number of rows we touched\n", | |
" return input_row_count;\n", | |
"\n", | |
"\n", | |
"_, result = omnisci.sql_execute(\n", | |
" ' SELECT SQRT(SUM(out0)) FROM TABLE(power_udtf('\n", | |
" ' cursor(SELECT a from other_table),' # first column\n", | |
" ' cursor(SELECT b from other_table),' # second column\n", | |
" ' 1))')\n", | |
"\n", | |
"print(list(result)) # the result" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.8.6" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment