Skip to content

Instantly share code, notes, and snippets.

@rwcitek
Last active January 15, 2024 00:16
Show Gist options
  • Save rwcitek/375a3b62bbae98878bf440d396149907 to your computer and use it in GitHub Desktop.
Save rwcitek/375a3b62bbae98878bf440d396149907 to your computer and use it in GitHub Desktop.
ctes.ipynb
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/rwcitek/375a3b62bbae98878bf440d396149907/ctes.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "bed5ab6c-059f-4491-87ea-63c1774807a6",
"metadata": {
"id": "bed5ab6c-059f-4491-87ea-63c1774807a6"
},
"source": [
"# Lightning Talk: SQL Common Table Expression ( CTE )\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "ZN7f2_bJsfog",
"metadata": {
"id": "ZN7f2_bJsfog"
},
"source": [
"Common Table Expressions: collection of really useful temporary tables ( RUTTs ).\n"
]
},
{
"cell_type": "markdown",
"source": [
"## Link to this notebook\n"
],
"metadata": {
"id": "RnczyToCtRJN"
},
"id": "RnczyToCtRJN"
},
{
"cell_type": "markdown",
"source": [
"![cte.qrcode.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJ8AAACfAQMAAADd1ZmjAAAABlBMVEUAAAD///+l2Z/dAAAAAnRSTlP//8i138cAAAAJcEhZcwAACxIAAAsSAdLdfvwAAAJcSURBVEiJ5Za9jeMwEIVHYMDMaoAA22DGlqgGJLkBuSVmbIOAGhAzBQLn3izsxQYXaO6iwwmGYX82Bpyf94bEv3nof4EH0ZL9i80aiKIj8irY+FqYEtHDXtOJz0qY3WyvFP1h63heyeph5FYQdu/B/QFM1jRGOrySHkpGphXfuI78M81bEPWcivt+fRf5JsRzWH4Wf5DpOM+nxXdhvgbUBK20Fe+TEnbaX9k9COehOfrt60gKaDEGdSz7GupS6iPqIBJp2WxkDuuPaPjUwR7qVNBHmrLv8pVV8AiS1BrMZj1OlawOop4jRGQdJmE83XeRb0Ik8iwXIR2Jf70n5DbkUodCQ6YUayI3Zh1s2TcMIQ524g8Iq4TlWorv1pF1Q8E4sQpyNk+J6eAAY66fKt2GSAcOEMW+FvnsVbCTR0NfLIO0BWSng5whgYp0pnPvkEPWQXSzsW8n3nmTYdZBySgjGjISHx5OVkF0c7VfHOYZ6yPooKg47D066BcialkHIdsDIkIpGMOAReBVsMv0miOAwAfe/nkfHnGHBmdLc8D28WvUwXaiDrxFOGd92P1VWAczQnGP5iC0A7/qYCeEMthZC2YpGpiYEu5QMUVKFl4kg62CzA4DIA521iFfc9BBBOl259OsEfZVkxKingOuHAEz7MYiu1sF8bSMy4ZISbRsdVC2tkBoGTPpUtBB3BlmK/uu8S7LyyphdrIuoWIkBRW/3VsBp4xqYOthAxouaphEyNhcNGaaP0e6C5FRhGnDOc2rXNPnEnUTyk31hBCw+PwzX+nTjpvwL2/U/zD8BWRWdjWPhsjZAAAAAElFTkSuQmCC)\n"
],
"metadata": {
"id": "qqaoYbJFuQf0"
},
"id": "qqaoYbJFuQf0"
},
{
"cell_type": "markdown",
"id": "qiqueMrmmRQz",
"metadata": {
"id": "qiqueMrmmRQz",
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"## Setup\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "RwK_8lsL18Df",
"metadata": {
"id": "RwK_8lsL18Df"
},
"source": [
"- Install zip, unzip, and sqlite3\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0g9EPsfW2CZ4",
"metadata": {
"id": "0g9EPsfW2CZ4"
},
"outputs": [],
"source": [
"%%capture output\n",
"%%bash\n",
"apt-get update\n",
"apt-get install -y zip unzip sqlite3\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9cnAQr2MTkGC",
"metadata": {
"id": "9cnAQr2MTkGC",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c261bc91-1461-4005-dad2-e2a2796cb671"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"3.37.2 2022-01-06 13:25:41 872ba256cbf61d9290b571c0e6d82a20c224ca3ad82971edc46b29818d5dalt1\n"
]
}
],
"source": [
"%%bash\n",
"sqlite3 --version"
]
},
{
"cell_type": "markdown",
"id": "vCCQTAA91695",
"metadata": {
"id": "vCCQTAA91695"
},
"source": [
"- Download and uncompress Chinook data set.\n",
"\n",
" - https://www.sqlitetutorial.net/sqlite-sample-database/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "sl9JEMIT2WF5",
"metadata": {
"id": "sl9JEMIT2WF5",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "50379241-b27e-4699-93d7-a0d4366db779"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
" % Total % Received % Xferd Average Speed Time Time Time Current\n",
" Dload Upload Total Spent Left Speed\n",
"\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 298k 100 298k 0 0 1113k 0 --:--:-- --:--:-- --:--:-- 1117k\n"
]
}
],
"source": [
"%%bash\n",
"curl -O https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "z3RYUMKi2jvR",
"metadata": {
"id": "z3RYUMKi2jvR",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8ced0e70-9164-4084-d9e4-d141623ff4f7"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"total 312k\n",
"-rw-r--r-- 1 root root 306k Jan 13 20:28 chinook.zip\n",
"drwxr-xr-x 1 root root 4.1k Jan 11 17:02 sample_data\n"
]
}
],
"source": [
"%%bash\n",
"ls -l --si"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "V432MRhZ2miV",
"metadata": {
"id": "V432MRhZ2miV",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "bef7fcb4-db74-4640-f9fb-3122ea8b5a45"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Archive: chinook.zip\n",
" inflating: chinook.db \n"
]
}
],
"source": [
"%%bash\n",
"unzip -u chinook.zip"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "X2pnZuSSI2Xf",
"metadata": {
"id": "X2pnZuSSI2Xf",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a0970624-fe4b-458b-e6c6-d470ea635fa9"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"total 1.2M\n",
"-rw-r--r-- 1 root root 885k Nov 29 2015 chinook.db\n",
"-rw-r--r-- 1 root root 306k Jan 13 20:28 chinook.zip\n",
"drwxr-xr-x 1 root root 4.1k Jan 11 17:02 sample_data\n"
]
}
],
"source": [
"%%bash\n",
"ls -l --si"
]
},
{
"cell_type": "markdown",
"id": "MpRlw9idHNHh",
"metadata": {
"id": "MpRlw9idHNHh"
},
"source": [
"- List tables in database"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "wx9XTWUIHQsi",
"metadata": {
"id": "wx9XTWUIHQsi",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "69a4d6f2-d429-466d-c099-e4bcd7be68fe"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"albums employees invoices playlists \n",
"artists genres media_types tracks \n",
"customers invoice_items playlist_track\n"
]
}
],
"source": [
"%%script sqlite3 chinook.db\n",
".tables"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "lB4j7bvKI-ga",
"metadata": {
"id": "lB4j7bvKI-ga",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "ded1d034-5b6c-41a1-9975-0d3fb96a1694"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"name \n",
"--------------\n",
"albums \n",
"artists \n",
"customers \n",
"employees \n",
"genres \n",
"invoices \n",
"invoice_items \n",
"media_types \n",
"playlists \n",
"playlist_track\n",
"tracks \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"SELECT\n",
" name\n",
"FROM\n",
" sqlite_schema\n",
"WHERE\n",
" type ='table' AND\n",
" name NOT LIKE 'sqlite_%';"
]
},
{
"cell_type": "markdown",
"id": "dvh5S92c2UvF",
"metadata": {
"id": "dvh5S92c2UvF"
},
"source": [
"- View Chinook ER diagram.\n",
" - https://www.sqlitetutorial.net/wp-content/uploads/2018/03/sqlite-sample-database-diagram-color.pdf\n"
]
},
{
"cell_type": "markdown",
"id": "ouDGOGinaTrd",
"metadata": {
"id": "ouDGOGinaTrd"
},
"source": [
"## Structure of a CTE\n"
]
},
{
"cell_type": "markdown",
"id": "r-n_KRxns0lE",
"metadata": {
"id": "r-n_KRxns0lE"
},
"source": [
"Two parts:\n",
"\n",
"- `WITH ...`\n",
"- `SELECT ...`\n",
"\n",
"The `WITH` creates the temporary table structures. There can be more than one and they can reference each other. The `SELECT` uses the temporary table(s) ... or not."
]
},
{
"cell_type": "markdown",
"id": "eXBXq4aXjLX0",
"metadata": {
"id": "eXBXq4aXjLX0"
},
"source": [
"## Uses\n"
]
},
{
"cell_type": "markdown",
"id": "4UvaYQaAmHXv",
"metadata": {
"id": "4UvaYQaAmHXv"
},
"source": [
"### As an alias\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0yqU8l4WMLFy",
"metadata": {
"id": "0yqU8l4WMLFy",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "785d10d2-d11e-41c2-f437-638a800f7b25"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email \n",
"---------- -------- --------- ------------------- --------- ------------------- ------------------- --------------------------- ---------- ----- ------- ---------- ----------------- ----------------- ------------------------\n",
"1 Adams Andrew General Manager 1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW Edmonton AB Canada T5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457 andrew@chinookcorp.com \n",
"2 Edwards Nancy Sales Manager 1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW Calgary AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322 nancy@chinookcorp.com \n",
"3 Peacock Jane Sales Support Agent 2 1973-08-29 00:00:00 2002-04-01 00:00:00 1111 6 Ave SW Calgary AB Canada T2P 5M5 +1 (403) 262-3443 +1 (403) 262-6712 jane@chinookcorp.com \n",
"4 Park Margaret Sales Support Agent 2 1947-09-19 00:00:00 2003-05-03 00:00:00 683 10 Street SW Calgary AB Canada T2P 5G3 +1 (403) 263-4423 +1 (403) 263-4289 margaret@chinookcorp.com\n",
"5 Johnson Steve Sales Support Agent 2 1965-03-03 00:00:00 2003-10-17 00:00:00 7727B 41 Ave Calgary AB Canada T3B 1Y7 1 (780) 836-9987 1 (780) 836-9543 steve@chinookcorp.com \n",
"6 Mitchell Michael IT Manager 1 1973-07-01 00:00:00 2003-10-17 00:00:00 5827 Bowness Road NW Calgary AB Canada T3B 0C5 +1 (403) 246-9887 +1 (403) 246-9899 michael@chinookcorp.com \n",
"7 King Robert IT Staff 6 1970-05-29 00:00:00 2004-01-02 00:00:00 590 Columbia Boulevard West Lethbridge AB Canada T1K 5N8 +1 (403) 456-9986 +1 (403) 456-8485 robert@chinookcorp.com \n",
"8 Callahan Laura IT Staff 6 1968-01-09 00:00:00 2004-03-04 00:00:00 923 7 ST NW Lethbridge AB Canada T1H 1Y8 +1 (403) 467-3351 +1 (403) 467-8772 laura@chinookcorp.com \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" workers as ( SELECT * FROM Employees )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" workers\n"
]
},
{
"cell_type": "markdown",
"id": "b0xZipEFmJXk",
"metadata": {
"id": "b0xZipEFmJXk"
},
"source": [
"### As a temporary view\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3XnWpbkoMxG8",
"metadata": {
"id": "3XnWpbkoMxG8",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "314ed484-258d-4987-e7bf-c69eba1fa5dc"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"EmployeeID Name ReportsTo\n",
"---------- ---------------- ---------\n",
"1 Andrew Adams \n",
"2 Nancy Edwards 1 \n",
"3 Jane Peacock 2 \n",
"4 Margaret Park 2 \n",
"5 Steve Johnson 2 \n",
"6 Michael Mitchell 1 \n",
"7 Robert King 6 \n",
"8 Laura Callahan 6 \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" workers as ( SELECT EmployeeID, FirstName || \" \" || LastName as Name, ReportsTo FROM Employees )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" workers\n"
]
},
{
"cell_type": "markdown",
"id": "5ZfAle2lmKqb",
"metadata": {
"id": "5ZfAle2lmKqb"
},
"source": [
"### As an alternative to a subquery\n"
]
},
{
"cell_type": "markdown",
"id": "z_d5b28IVazC",
"metadata": {
"id": "z_d5b28IVazC"
},
"source": [
"The subquery to select all above-average invoices."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "bEnEx2oVVWDU",
"metadata": {
"id": "bEnEx2oVVWDU",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "25ba6ea1-e3ae-4ab4-b2a4-45808b8924c6"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"InvoiceId Total Mean \n",
"--------- ----- ----------------\n",
"3 5.94 5.65194174757282\n",
"4 8.91 5.65194174757282\n",
"5 13.86 5.65194174757282\n",
"10 5.94 5.65194174757282\n",
"11 8.91 5.65194174757282\n",
"12 13.86 5.65194174757282\n",
"17 5.94 5.65194174757282\n",
"18 8.91 5.65194174757282\n",
"19 13.86 5.65194174757282\n",
"24 5.94 5.65194174757282\n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"SELECT\n",
" InvoiceID, Total, ( SELECT AVG(Total) FROM invoices ) as Mean\n",
"FROM\n",
" invoices\n",
"WHERE\n",
" Total > Mean\n",
"LIMIT\n",
" 10"
]
},
{
"cell_type": "markdown",
"id": "718edbfb-3efd-4a64-967f-d42138ab761e",
"metadata": {
"id": "718edbfb-3efd-4a64-967f-d42138ab761e"
},
"source": [
"<br>\n",
"<br>\n",
"The same with a CTE, using a cross join ( or cartesian join )."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "gcg5932ZRluq",
"metadata": {
"id": "gcg5932ZRluq",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "de2e8008-557e-413e-ab26-72b4c413f923"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"InvoiceId Total Mean \n",
"--------- ----- ----------------\n",
"3 5.94 5.65194174757282\n",
"4 8.91 5.65194174757282\n",
"5 13.86 5.65194174757282\n",
"10 5.94 5.65194174757282\n",
"11 8.91 5.65194174757282\n",
"12 13.86 5.65194174757282\n",
"17 5.94 5.65194174757282\n",
"18 8.91 5.65194174757282\n",
"19 13.86 5.65194174757282\n",
"24 5.94 5.65194174757282\n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" inv_avg ( Mean ) as ( SELECT AVG(Total) FROM invoices )\n",
"\n",
"SELECT\n",
" InvoiceID, Total, Mean\n",
"FROM\n",
" invoices JOIN inv_avg\n",
"WHERE\n",
" Total > Mean\n",
"LIMIT\n",
" 10\n"
]
},
{
"cell_type": "markdown",
"id": "e15edb5c-e128-4c5a-b621-a4299c4105ab",
"metadata": {
"id": "e15edb5c-e128-4c5a-b621-a4299c4105ab"
},
"source": [
"#### Deconstruct"
]
},
{
"cell_type": "markdown",
"id": "d75be3a1-ce33-4847-a232-d351426bf7a6",
"metadata": {
"id": "d75be3a1-ce33-4847-a232-d351426bf7a6"
},
"source": [
"Remove the `WHERE`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f54206ae-49ed-49e9-8b8a-bd91f1b4960c",
"metadata": {
"id": "f54206ae-49ed-49e9-8b8a-bd91f1b4960c",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f66a9fb2-06fc-421a-db29-48ddbc70d5e1"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"InvoiceId Total Mean \n",
"--------- ----- ----------------\n",
"1 1.98 5.65194174757282\n",
"2 3.96 5.65194174757282\n",
"3 5.94 5.65194174757282\n",
"4 8.91 5.65194174757282\n",
"5 13.86 5.65194174757282\n",
"6 0.99 5.65194174757282\n",
"7 1.98 5.65194174757282\n",
"8 1.98 5.65194174757282\n",
"9 3.96 5.65194174757282\n",
"10 5.94 5.65194174757282\n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" inv_avg ( Mean ) as ( SELECT AVG(Total) FROM invoices )\n",
"\n",
"SELECT\n",
" InvoiceID, Total, Mean\n",
"FROM\n",
" invoices JOIN inv_avg\n",
"LIMIT\n",
" 10\n"
]
},
{
"cell_type": "markdown",
"id": "c3d0cf08-a993-4c82-875e-bede1d2ce013",
"metadata": {
"id": "c3d0cf08-a993-4c82-875e-bede1d2ce013"
},
"source": [
"<br>\n",
"<br>\n",
"Select all the columns."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1b2aff18-2119-4160-8f95-8bf384b958c1",
"metadata": {
"id": "1b2aff18-2119-4160-8f95-8bf384b958c1",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "411791a8-67fc-4f3f-e0ed-1d800250e2f9"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total Mean \n",
"--------- ---------- ------------------- ----------------------- ----------- ------------ -------------- ----------------- ----- ----------------\n",
"1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart Germany 70174 1.98 5.65194174757282\n",
"2 4 2009-01-02 00:00:00 Ullevålsveien 14 Oslo Norway 0171 3.96 5.65194174757282\n",
"3 8 2009-01-03 00:00:00 Grétrystraat 63 Brussels Belgium 1000 5.94 5.65194174757282\n",
"4 14 2009-01-06 00:00:00 8210 111 ST NW Edmonton AB Canada T6G 2C7 8.91 5.65194174757282\n",
"5 23 2009-01-11 00:00:00 69 Salem Street Boston MA USA 2113 13.86 5.65194174757282\n",
"6 37 2009-01-19 00:00:00 Berger Straße 10 Frankfurt Germany 60316 0.99 5.65194174757282\n",
"7 38 2009-02-01 00:00:00 Barbarossastraße 19 Berlin Germany 10779 1.98 5.65194174757282\n",
"8 40 2009-02-01 00:00:00 8, Rue Hanovre Paris France 75002 1.98 5.65194174757282\n",
"9 42 2009-02-02 00:00:00 9, Place Louis Barthou Bordeaux France 33000 3.96 5.65194174757282\n",
"10 46 2009-02-03 00:00:00 3 Chatham Street Dublin Dublin Ireland 5.94 5.65194174757282\n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" inv_avg ( Mean ) as ( SELECT AVG(Total) FROM invoices )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" invoices JOIN inv_avg\n",
"LIMIT\n",
" 10\n"
]
},
{
"cell_type": "markdown",
"id": "e24b5b2d-3023-49c8-94ca-c28b03ed7ed6",
"metadata": {
"id": "e24b5b2d-3023-49c8-94ca-c28b03ed7ed6"
},
"source": [
"<br>\n",
"<br>\n",
"Calculate the average, a scalar, using a CTE to create a temporary table. Notice that you can name/rename columns."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "wy_YCKH3PBqp",
"metadata": {
"id": "wy_YCKH3PBqp",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "49f34e2d-3a03-4a09-ec9c-dc9bb7c19487"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mean \n",
"----------------\n",
"5.65194174757282\n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" inv_avg ( Mean ) as ( SELECT AVG(Total) FROM invoices )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" inv_avg\n"
]
},
{
"cell_type": "markdown",
"id": "6de0c143-686f-45a9-8830-a2ede0732068",
"metadata": {
"id": "6de0c143-686f-45a9-8830-a2ede0732068"
},
"source": [
"### Calculate descriptive statistics"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "191785a7-98fc-4149-8b67-0fed972d65d6",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "191785a7-98fc-4149-8b67-0fed972d65d6",
"outputId": "3a9c4c03-85f9-468b-82e7-510ed0372e32"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"n mean delta2sum variance stdev \n",
"--- ---------------- ---------------- ---------------- ---------------\n",
"412 5.65194174757282 9254.92224660193 22.5180589941653 4.7453196935681\n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" vals as ( select Total as value from Invoices ),\n",
" -- calculate descriptive statistics\n",
" count_mean as ( select count(1) as n, AVG(value) as mean from vals ),\n",
" delta as ( select value, n, mean, value - mean as delta from vals, count_mean ),\n",
" delta2 as ( select power(delta, 2) as delta2 from delta ),\n",
" delta2sum as ( select sum(delta2) as delta2sum from delta2 ),\n",
" variance as ( select delta2sum/(n-1) as variance from delta2sum, count_mean ),\n",
" stdev as ( select power(variance, 0.5) as stdev from variance ),\n",
" desc_stats as ( select * from count_mean, delta2sum, variance, stdev )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" desc_stats\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "939e59c2-3c49-41d4-911d-7ae96a029f9b",
"metadata": {
"id": "939e59c2-3c49-41d4-911d-7ae96a029f9b"
},
"source": [
"<br>\n",
"<br>\n",
"Calculate for another column in another table."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "e7ac2404-d130-49b9-87d2-c7dc667b3826",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "e7ac2404-d130-49b9-87d2-c7dc667b3826",
"outputId": "42516504-c2c6-4e23-9d59-f4470ba76a98"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"n mean delta2sum variance stdev \n",
"---- ---------------- ---------------- ------------------ -----------------\n",
"3503 1.05080502426483 200.048529831573 0.0571240804773195 0.239006444426337\n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" vals as ( select UnitPrice as value from Tracks ),\n",
" -- calculate descriptive statistics\n",
" count_mean as ( select count(1) as n, AVG(value) as mean from vals ),\n",
" delta as ( select value, n, mean, value - mean as delta from vals, count_mean ),\n",
" delta2 as ( select power(delta, 2) as delta2 from delta ),\n",
" delta2sum as ( select sum(delta2) as delta2sum from delta2 ),\n",
" variance as ( select delta2sum/(n-1) as variance from delta2sum, count_mean ),\n",
" stdev as ( select power(variance, 0.5) as stdev from variance ),\n",
" desc_stats as ( select * from count_mean, delta2sum, variance, stdev )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" desc_stats\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "2-a1y54rmLlM",
"metadata": {
"id": "2-a1y54rmLlM"
},
"source": [
"### Rewrite a self join as CTE\n"
]
},
{
"cell_type": "markdown",
"id": "2CF8ahGvnC5-",
"metadata": {
"id": "2CF8ahGvnC5-"
},
"source": [
"#### Self join\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "FV5LQmi-Upw1",
"metadata": {
"id": "FV5LQmi-Upw1",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "4948603c-a272-44da-ca8e-e1f728ad6403"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Worker Supervisor \n",
"---------------- ----------------\n",
"Robert King Michael Mitchell\n",
"Laura Callahan Michael Mitchell\n",
"Jane Peacock Nancy Edwards \n",
"Margaret Park Nancy Edwards \n",
"Steve Johnson Nancy Edwards \n",
"Nancy Edwards Andrew Adams \n",
"Michael Mitchell Andrew Adams \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"SELECT\n",
" w.FirstName || \" \" || w.LastName as Worker,\n",
" s.FirstName || \" \" || s.LastName as Supervisor\n",
"FROM\n",
" Employees as w JOIN Employees as s ON w.ReportsTo = s.EmployeeID\n",
"ORDER BY\n",
" s.EmployeeID DESC,\n",
" w.EmployeeID ASC\n"
]
},
{
"cell_type": "markdown",
"id": "_DfgvqkdnGJY",
"metadata": {
"id": "_DfgvqkdnGJY"
},
"source": [
"#### Self join using a CTE\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "JrZSOlvIiy5M",
"metadata": {
"id": "JrZSOlvIiy5M",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "002cd39b-714e-4702-ca57-66d5e87b4640"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Worker Supervisor \n",
"---------------- ----------------\n",
"Robert King Michael Mitchell\n",
"Laura Callahan Michael Mitchell\n",
"Jane Peacock Nancy Edwards \n",
"Margaret Park Nancy Edwards \n",
"Steve Johnson Nancy Edwards \n",
"Nancy Edwards Andrew Adams \n",
"Michael Mitchell Andrew Adams \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" Emps as ( SELECT FirstName || \" \" || LastName as Name, EmployeeID, ReportsTo FROM Employees ),\n",
" Workers ( Worker, EmployeeID, SupervisorID ) as ( SELECT Name, EmployeeID, ReportsTo FROM Emps ),\n",
" Supervisors ( SupID, Supervisor) as ( SELECT EmployeeID, Name FROM Emps )\n",
"\n",
"SELECT\n",
" Worker, Supervisor\n",
"FROM\n",
" workers JOIN supervisors ON SupervisorID = SupID\n",
"ORDER BY\n",
" SupID DESC,\n",
" EmployeeID ASC\n"
]
},
{
"cell_type": "markdown",
"id": "aef28778-c463-4a65-9ab6-083312437ee9",
"metadata": {
"id": "aef28778-c463-4a65-9ab6-083312437ee9"
},
"source": [
"#### Deconstruct\n"
]
},
{
"cell_type": "markdown",
"id": "600c1e27-52fb-4c81-aa7b-b423f79b2bc8",
"metadata": {
"id": "600c1e27-52fb-4c81-aa7b-b423f79b2bc8"
},
"source": [
"Select all columns and look at the `JOIN`"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "36752931-bff8-4620-981a-80baf88a5397",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "36752931-bff8-4620-981a-80baf88a5397",
"outputId": "41f55adf-aabd-497b-e892-58a5d481a453"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Worker EmployeeID SupervisorID SupID Supervisor \n",
"---------------- ---------- ------------ ----- ----------------\n",
"Robert King 7 6 6 Michael Mitchell\n",
"Laura Callahan 8 6 6 Michael Mitchell\n",
"Jane Peacock 3 2 2 Nancy Edwards \n",
"Margaret Park 4 2 2 Nancy Edwards \n",
"Steve Johnson 5 2 2 Nancy Edwards \n",
"Nancy Edwards 2 1 1 Andrew Adams \n",
"Michael Mitchell 6 1 1 Andrew Adams \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" Emps as ( SELECT FirstName || \" \" || LastName as Name, EmployeeID, ReportsTo FROM Employees ),\n",
" Workers ( Worker, EmployeeID, SupervisorID ) as ( SELECT Name, EmployeeID, ReportsTo FROM Emps ),\n",
" Supervisors ( SupID, Supervisor) as ( SELECT EmployeeID, Name FROM Emps )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" workers JOIN supervisors\n",
" ON SupervisorID = SupID\n",
"ORDER BY\n",
" SupID DESC,\n",
" EmployeeID ASC\n"
]
},
{
"cell_type": "markdown",
"id": "395a33e2-2db3-4c35-b548-760635aacdbf",
"metadata": {
"id": "395a33e2-2db3-4c35-b548-760635aacdbf"
},
"source": [
"\n",
"\n",
"<br>\n",
"<br>\n",
"Display Supervisors."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "C_d6kfmUoJla",
"metadata": {
"id": "C_d6kfmUoJla",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "346104ca-c21b-4152-d39d-e0c0b90a4503"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"SupID Supervisor \n",
"----- ----------------\n",
"1 Andrew Adams \n",
"2 Nancy Edwards \n",
"3 Jane Peacock \n",
"4 Margaret Park \n",
"5 Steve Johnson \n",
"6 Michael Mitchell\n",
"7 Robert King \n",
"8 Laura Callahan \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" Emps as ( SELECT FirstName || \" \" || LastName as Name, EmployeeID, ReportsTo FROM Employees ),\n",
" Workers ( Worker, EmployeeID, SupervisorID ) as ( SELECT Name, EmployeeID, ReportsTo FROM Emps ),\n",
" Supervisors ( SupID, Supervisor) as ( SELECT EmployeeID, Name FROM Emps )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" supervisors\n"
]
},
{
"cell_type": "markdown",
"id": "2ed3de7a-c9bf-49ad-a699-e6f4491c1a51",
"metadata": {
"id": "2ed3de7a-c9bf-49ad-a699-e6f4491c1a51"
},
"source": [
"<br>\n",
"<br>\n",
"Display Workers."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "KCybDNX-oQnC",
"metadata": {
"id": "KCybDNX-oQnC",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "4e5646e0-854f-46d5-d6ca-7b7c2c027810"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Worker EmployeeID SupervisorID\n",
"---------------- ---------- ------------\n",
"Andrew Adams 1 \n",
"Nancy Edwards 2 1 \n",
"Jane Peacock 3 2 \n",
"Margaret Park 4 2 \n",
"Steve Johnson 5 2 \n",
"Michael Mitchell 6 1 \n",
"Robert King 7 6 \n",
"Laura Callahan 8 6 \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" Emps as ( SELECT FirstName || \" \" || LastName as Name, EmployeeID, ReportsTo FROM Employees ),\n",
" Workers ( Worker, EmployeeID, SupervisorID ) as ( SELECT Name, EmployeeID, ReportsTo FROM Emps ),\n",
" Supervisors ( SupID, Supervisor) as ( SELECT EmployeeID, Name FROM Emps )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" workers\n"
]
},
{
"cell_type": "markdown",
"id": "a00e236f-dde3-4181-9452-168411151b52",
"metadata": {
"id": "a00e236f-dde3-4181-9452-168411151b52"
},
"source": [
"<br>\n",
"<br>\n",
"Display the Emps table."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "BHpcEgXnoVmR",
"metadata": {
"id": "BHpcEgXnoVmR",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "67d42cc9-6ae0-4228-a544-561c9f9f5ecd"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Name EmployeeID ReportsTo\n",
"---------------- ---------- ---------\n",
"Andrew Adams 1 \n",
"Nancy Edwards 2 1 \n",
"Jane Peacock 3 2 \n",
"Margaret Park 4 2 \n",
"Steve Johnson 5 2 \n",
"Michael Mitchell 6 1 \n",
"Robert King 7 6 \n",
"Laura Callahan 8 6 \n"
]
}
],
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" Emps as ( SELECT FirstName || \" \" || LastName as Name, EmployeeID, ReportsTo FROM Employees ),\n",
" Workers ( EmployeeID, Worker, SupervisorID ) as ( SELECT EmployeeID, Name, ReportsTo FROM Emps ),\n",
" Supervisors ( SupID, Supervisor) as ( SELECT EmployeeID, Name FROM Emps )\n",
"\n",
"SELECT\n",
" *\n",
"FROM\n",
" Emps"
]
},
{
"cell_type": "markdown",
"id": "AUZwAb_2mMOd",
"metadata": {
"id": "AUZwAb_2mMOd"
},
"source": [
"### Generate data\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "rcxokz-ineyb",
"metadata": {
"id": "rcxokz-ineyb"
},
"source": [
"A recursive CTE has two parts that are combined with a `UNION`:\n",
"- a terminal/base `SELECT`\n",
"- a self-referencing `SELECT`."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "-mBumGQzjjEI",
"metadata": {
"id": "-mBumGQzjjEI",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "004f32d5-2db1-4681-8a22-e4bff1a1104a"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"number\n",
"------\n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"5 \n",
"6 \n",
"7 \n",
"8 \n",
"9 \n",
"10 \n"
]
}
],
"source": [
"%%script sqlite3 --header --column\n",
"WITH seq AS (\n",
" SELECT 1 AS number\n",
" UNION\n",
" SELECT number + 1 FROM seq WHERE number < 10 )\n",
"\n",
"SELECT\n",
" number\n",
"FROM\n",
" seq\n"
]
},
{
"cell_type": "markdown",
"source": [
"## Summary\n",
"\n"
],
"metadata": {
"id": "YlRSO-YzqGD7"
},
"id": "YlRSO-YzqGD7"
},
{
"cell_type": "markdown",
"source": [
"- alias tables\n",
"- temporary view\n",
"- subquery\n",
"- self join\n",
"- generate data\n",
"- deconstruct CTE\n",
"\n",
"\n",
"\n"
],
"metadata": {
"id": "VFsx_aADuEPn"
},
"id": "VFsx_aADuEPn"
},
{
"cell_type": "markdown",
"id": "B1qf5klnqdTA",
"metadata": {
"id": "B1qf5klnqdTA"
},
"source": [
"## References\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "3DV_wIQHsYaM",
"metadata": {
"id": "3DV_wIQHsYaM"
},
"source": [
"LinkedIn Learning:\n",
"- [ Intermediate SQL for Data Scientists ]( https://www.linkedin.com/learning/intermediate-sql-for-data-scientists/ )\n",
"- [Advanced SQL for Data Scientists]( https://www.linkedin.com/learning/advanced-sql-for-data-scientists-13972889/ )\n",
"- [SQL Tips and Tricks for Data Science]( https://www.linkedin.com/learning/sql-tips-and-tricks-for-data-science/ )\n",
"\n",
"O'Reilly Learning Library\n",
"- [SQL Cookbook, 2nd Edition]( https://learning.oreilly.com/library/view/sql-cookbook-2nd/9781492077435/ )\n",
"- [Joe Celko's Trees and Hierarchies in SQL for Smarties, 2nd Edition]( https://learning.oreilly.com/library/view/joe-celkos-trees/9780123877338/ )\n",
"- [Joe Celko's SQL for Smarties, 4th Edition]( https://learning.oreilly.com/library/view/joe-celkos-sql/9780123820228/ )\n",
"\n"
]
},
{
"cell_type": "markdown",
"source": [
"## Link to this notebook\n"
],
"metadata": {
"id": "SnczyToCtRJN"
},
"id": "SnczyToCtRJN"
},
{
"cell_type": "markdown",
"source": [
"![cte.qrcode.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJ8AAACfAQMAAADd1ZmjAAAABlBMVEUAAAD///+l2Z/dAAAAAnRSTlP//8i138cAAAAJcEhZcwAACxIAAAsSAdLdfvwAAAJcSURBVEiJ5Za9jeMwEIVHYMDMaoAA22DGlqgGJLkBuSVmbIOAGhAzBQLn3izsxQYXaO6iwwmGYX82Bpyf94bEv3nof4EH0ZL9i80aiKIj8irY+FqYEtHDXtOJz0qY3WyvFP1h63heyeph5FYQdu/B/QFM1jRGOrySHkpGphXfuI78M81bEPWcivt+fRf5JsRzWH4Wf5DpOM+nxXdhvgbUBK20Fe+TEnbaX9k9COehOfrt60gKaDEGdSz7GupS6iPqIBJp2WxkDuuPaPjUwR7qVNBHmrLv8pVV8AiS1BrMZj1OlawOop4jRGQdJmE83XeRb0Ik8iwXIR2Jf70n5DbkUodCQ6YUayI3Zh1s2TcMIQ524g8Iq4TlWorv1pF1Q8E4sQpyNk+J6eAAY66fKt2GSAcOEMW+FvnsVbCTR0NfLIO0BWSng5whgYp0pnPvkEPWQXSzsW8n3nmTYdZBySgjGjISHx5OVkF0c7VfHOYZ6yPooKg47D066BcialkHIdsDIkIpGMOAReBVsMv0miOAwAfe/nkfHnGHBmdLc8D28WvUwXaiDrxFOGd92P1VWAczQnGP5iC0A7/qYCeEMthZC2YpGpiYEu5QMUVKFl4kg62CzA4DIA521iFfc9BBBOl259OsEfZVkxKingOuHAEz7MYiu1sF8bSMy4ZISbRsdVC2tkBoGTPpUtBB3BlmK/uu8S7LyyphdrIuoWIkBRW/3VsBp4xqYOthAxouaphEyNhcNGaaP0e6C5FRhGnDOc2rXNPnEnUTyk31hBCw+PwzX+nTjpvwL2/U/zD8BWRWdjWPhsjZAAAAAElFTkSuQmCC)\n"
],
"metadata": {
"id": "wqaoYbJFuQf0"
},
"id": "wqaoYbJFuQf0"
},
{
"cell_type": "code",
"source": [
"%%script sqlite3 --header --column chinook.db\n",
"WITH\n",
" vals as ( select UnitPrice as value from Tracks ),\n",
" -- calculate scalar descriptive statistics\n",
" n (n) as ( select count(value) from vals ),\n",
" sum (sum) as ( select sum(value) from vals ),\n",
" mean (mean) as ( select avg(value) from vals ),\n",
" min (min) as ( select min(value) from vals ),\n",
" max (max) as ( select max(value) from vals ),\n",
" one (one) as ( select 1 )\n",
"\n",
"-- select rowid, * from n, sum, mean, min, max, one\n",
"select rowid from Tracks order by rowid limit 10"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8VbZcA4rg6Yy",
"outputId": "8bac6b4b-f6af-4a8f-d58f-97bebf6ee4eb"
},
"id": "8VbZcA4rg6Yy",
"execution_count": 31,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"TrackId\n",
"-------\n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"5 \n",
"6 \n",
"7 \n",
"8 \n",
"9 \n",
"10 \n"
]
}
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "_pLbXMSXhtqb"
},
"id": "_pLbXMSXhtqb",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment