Skip to content

Instantly share code, notes, and snippets.

@guilhermeleobas
Created December 22, 2020 22:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save guilhermeleobas/098a8b4c8199942eed03beff39a75cce to your computer and use it in GitHub Desktop.
Save guilhermeleobas/098a8b4c8199942eed03beff39a75cce to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [],
"source": [
"from numba import types"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [],
"source": [
"from rbc.omniscidb import RemoteOmnisci"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [],
"source": [
"omnisci = RemoteOmnisci(user='admin', password='HyperInteractive',\n",
" host='127.0.0.1', port=6274, dbname='omnisci')"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [],
"source": [
"# Create table\n",
"omnisci.sql_execute('DROP TABLE IF EXISTS mytable')\n",
"omnisci.sql_execute('CREATE TABLE IF NOT EXISTS mytable (i INT[], j INT[]);');\n",
"# insert values\n",
"omnisci.sql_execute('INSERT INTO mytable VALUES (ARRAY[1,2,3,2,3,4,3,4,5,6], ARRAY[7,2,10,2,7,4,9,4,9,8])');"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[([1, 2, 3, 2, 3, 4, 3, 4, 5, 6], [7, 2, 10, 2, 7, 4, 9, 4, 9, 8])]\n"
]
}
],
"source": [
"# query values\n",
"_, result = omnisci.sql_execute('SELECT * FROM mytable')\n",
"print(list(result))"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [],
"source": [
"# Notice the import below\n",
"import rbc.omnisci_backend as np"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [],
"source": [
"# euclidean distance\n",
"@omnisci('float32(int32[], int32[])')\n",
"def euclidean_distance(a, b):\n",
" return np.sqrt(np.sum(np.power((a - b), 2)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Internally, the function above will be compiled to LLVM intermediate representation and sent to OmniSciDB to be executed. Notice that `np.sum` will be computed using a for-loop, so, that might not be the most performatic solution"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(12.529964447021484,)]\n"
]
}
],
"source": [
"# query that computes the euclidean distance\n",
"_, result = omnisci.sql_execute('SELECT euclidean_distance(i, j) from mytable;')\n",
"print(list(result))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compute the euclidean distance using a User-Defined Table Function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While the function above works, one can also compute the euclidean distance using a table function (UDTF). For that, let's create a table with two columns that holds random (x, y) coordinates:"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {},
"outputs": [],
"source": [
"# Create another table\n",
"omnisci.sql_execute('DROP TABLE IF EXISTS other_table')\n",
"omnisci.sql_execute('CREATE TABLE IF NOT EXISTS other_table (a INT, b INT);');\n",
"# insert values\n",
"a, b = [], []\n",
"for i in range(30000):\n",
" a.append(random.randint(0, 100))\n",
" b.append(random.randint(0, 100))\n",
"omnisci.load_table_columnar('other_table', a=a, b=b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can query the column to grab the first five elements"
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(4, 19), (77, 3), (97, 27), (68, 75), (40, 45)]\n"
]
}
],
"source": [
"# query\n",
"_, result = omnisci.sql_execute('SELECT a, b from other_table limit 5;')\n",
"print(list(result))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's create a table function that computes the power $(a[i] - b[i]) ^ 2$. One can aggregate the powers using the builtin function `SUM` and finally compute the square root using `SQRT`.\n",
"\n",
"For large datasets, this might be the most performatic approach at the moment since the aggregation step will be computed in parallel in a GPU."
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(103.31989159885913,)]\n"
]
}
],
"source": [
"# The table function (UDTF) below receives 2 columns<int32> as arguments and returns another column<int32>\n",
"# For more information of how to use table functions, please check\n",
"# https://github.com/xnd-project/rbc/blob/master/notebooks/rbc-omnisci-udtf-normalize.ipynb\n",
"@omnisci('int32(Column<int32>, Column<int32>, RowMultiplier, OutputColumn<int32>)')\n",
"def power_udtf(a, b, m, r):\n",
" input_row_count = len(a)\n",
" for i in range(input_row_count):\n",
" r[i] = (a[i] - b[i])*(a[i] - b[i])\n",
" # return the number of rows we touched\n",
" return input_row_count;\n",
"\n",
"\n",
"_, result = omnisci.sql_execute(\n",
" ' SELECT SQRT(SUM(out0)) FROM TABLE(power_udtf('\n",
" ' cursor(SELECT a from other_table),' # first column\n",
" ' cursor(SELECT b from other_table),' # second column\n",
" ' 1))')\n",
"\n",
"print(list(result)) # the result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment