Skip to content

Instantly share code, notes, and snippets.

@tmthyjames
Created August 14, 2016 22:53
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tmthyjames/1366b21d0efffb73f1a91361a25b9a55 to your computer and use it in GitHub Desktop.
Save tmthyjames/1366b21d0efffb73f1a91361a25b9a55 to your computer and use it in GitHub Desktop.
SQL Magic Function for Jupyter Notebook
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I love Jupyter Notebooks and the many ways you can customize your notebooks. I try to out-source any work I do in other apps to Jupyter simply because I like the interface, it's intuitive, easy to navigate, and a joy to use. One example of getting rid of an app that I can use Jupyter for is pgAdmin. I absolutely hate pgAdmin, a postgresql interface. The main function I use it for, like anyone else I presume, is retrieving data (not necessarily the admin tasks, which it's great for). But the main reason for using Jupyter for the primary pgAdmin tasks is because I don't like opening two apps when one will do. Plus I was bored.\n",
"\n",
"To start, I knew I needed to create a custom \"<a href=\"http://ipython.readthedocs.io/en/stable/\">magic function</a>\" in Jupyter. To do this, you need to import IPython's `register_line_cell_magic` decorator from its core library and wrap a function with it."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from IPython.core.magic import (register_line_cell_magic)\n",
"\n",
"@register_line_cell_magic\n",
"def magic_test(line, cell=None):\n",
" return line, cell"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, we now have access to the `line` argument, which is the text directly to the right of the magic function, and the `cell` argument, the text directly under it."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(u'hello, line.', u'hello, cell.')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%magic_test hello, line.\n",
"hello, cell."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Let's try a slightly more complicated example, using actual SQL. For this, we need to `import` SQLAlchemy's `create_engine` function to establish a connection to the database. See <a href\"http://docs.sqlalchemy.org/en/latest/core/engines.html\">SQLAlchemy's docs</a> for more information about engine configuration. You will also need <a href=\"http://pandas.pydata.org/\">pandas</a> in order to view the data in a nice table. Later, we will write our own function for viewing an HTML table in case you don't have pandas installed or don't want to install pandas."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sqlalchemy import create_engine\n",
"import pandas as pd\n",
"\n",
"# choose an existing databae you have access to\n",
"engine = create_engine('postgresql://tdobbins:tdobbins@localhost:5432/bls')"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@register_line_cell_magic\n",
"def sql(path, cell=None):\n",
" data = engine.execute(cell)\n",
" columns = data.keys()\n",
" df = pd.DataFrame([i for i in data])\n",
" if df.empty:\n",
" return 'No data available'\n",
"\n",
" df.columns = columns\n",
" \n",
" return df"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M02</td>\n",
" <td>6.1</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M03</td>\n",
" <td>6.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M04</td>\n",
" <td>5.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M05</td>\n",
" <td>5.8</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000003 1976 M02 6.1 None\n",
"2 LASST470000000000003 1976 M03 6.0 None\n",
"3 LASST470000000000003 1976 M04 5.9 None\n",
"4 LASST470000000000003 1976 M05 5.8 None"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"select *\n",
"from la_unemployment \n",
"limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"BOOM! But this is FAR from perfect. Let's continue adding functionality to it so it becomes easier to use and more flexible. After all, we'd like to contribute this to open source and allow other people to use it! So what's missing? We need to allow our magic function to take arguments. But how? Using python's builtin `exec` function."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@register_line_cell_magic\n",
"def sql(args, cell=None):\n",
" \n",
" exec(args) # executes a string as python code\n",
" print PATH # notice that this gets printed and was the argument we passed\n",
" \n",
" return cell"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/tdobbins/Desktop/data.csv\n"
]
},
{
"data": {
"text/plain": [
"u'select *\\nfrom la_unemployment\\nlimit 10'"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql PATH='/Users/tdobbins/Desktop/data.csv'\n",
"select *\n",
"from la_unemployment\n",
"limit 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's build this into our example and put the argument practical use by writing the resulting query results to a CSV."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@register_line_cell_magic\n",
"def sql(args, cell=None):\n",
" \n",
" exec(args)\n",
" \n",
" data = engine.execute(cell)\n",
" columns = data.keys()\n",
" df = pd.DataFrame([i for i in data])\n",
" if df.empty:\n",
" return 'No data available'\n",
" \n",
" df.columns = columns\n",
" \n",
" df.to_csv(PATH)\n",
" \n",
" return df"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M02</td>\n",
" <td>6.1</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M03</td>\n",
" <td>6.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M04</td>\n",
" <td>5.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M05</td>\n",
" <td>5.8</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000003 1976 M02 6.1 None\n",
"2 LASST470000000000003 1976 M03 6.0 None\n",
"3 LASST470000000000003 1976 M04 5.9 None\n",
"4 LASST470000000000003 1976 M05 5.8 None"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql PATH='/Users/tdobbins/Desktop/data.csv'\n",
"select * \n",
"from la_unemployment\n",
"limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should now have data where ever you specified! Cool right? But we still have a long way to go to make this user-friendly. We need more functionality and ways to parse the arguments. And I'd like to not have to import libraries upon starting a new Jupyter notebook so that this is useful out of the box. Also, how about making the DataFrame available after the call? Let's start with making the DataFrame available after the query results are returned."
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@register_line_cell_magic\n",
"def sql(path, cell=None):\n",
" \n",
" args = path.split(' ')\n",
" for i in args:\n",
" if i.startswith('MAKE_GLOBAL'):\n",
" glovar = i.split('=')\n",
" exec(glovar[0]+'='+glovar[1]+'=None')\n",
"\n",
" data = engine.execute(cell)\n",
" columns = data.keys()\n",
" table_data = [i for i in data]\n",
" df = pd.DataFrame(table_data)\n",
" \n",
" if df.empty:\n",
" return 'No data available'\n",
" \n",
" df.columns = columns\n",
"\n",
" if 'PATH' in locals():\n",
" df.to_csv(PATH)\n",
"\n",
" if 'MAKE_GLOBAL' in locals():\n",
" exec('global ' + glovar[1] + '\\n' + glovar[1] + '=df')\n",
" \n",
" return df"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M02</td>\n",
" <td>6.1</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M03</td>\n",
" <td>6.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M04</td>\n",
" <td>5.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M05</td>\n",
" <td>5.8</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000003 1976 M02 6.1 None\n",
"2 LASST470000000000003 1976 M03 6.0 None\n",
"3 LASST470000000000003 1976 M04 5.9 None\n",
"4 LASST470000000000003 1976 M05 5.8 None"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql MAKE_GLOBAL=WHATEVER_VARIABLE_NAME_YOU_WANT\n",
"select *\n",
"from la_unemployment\n",
"limit 5"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M02</td>\n",
" <td>6.1</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M03</td>\n",
" <td>6.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M04</td>\n",
" <td>5.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M05</td>\n",
" <td>5.8</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000003 1976 M02 6.1 None\n",
"2 LASST470000000000003 1976 M03 6.0 None\n",
"3 LASST470000000000003 1976 M04 5.9 None\n",
"4 LASST470000000000003 1976 M05 5.8 None"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"WHATEVER_VARIABLE_NAME_YOU_WANT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"COOOOOOOL! YOU CAN PASS IT ANY VARIABLE NAME (sorry, forgot about my caps), and it will pass the data to that variable!\n",
"\n",
"Another cool feature would be to allow the user to switch databases without reconfiguring the engine. Let's try it."
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@register_line_cell_magic\n",
"def sql(path, cell=None):\n",
" \n",
" args = path.split(' ')\n",
" for i in args:\n",
" if i.startswith('MAKE_GLOBAL'):\n",
" glovar = i.split('=')\n",
" exec(glovar[0]+'='+glovar[1]+'=None')\n",
" elif i.startswith('DB'):\n",
" from sqlalchemy import create_engine\n",
" db = i.replace('DB=', '')\n",
" exec(\"global engine\\nengine=engine = create_engine('postgresql://tdobbins:tdobbins@localhost:5432/\"+db+\"')\")\n",
" exec('global DB\\nDB=db')\n",
" else:\n",
" exec(i)\n",
"\n",
" data = engine.execute(cell)\n",
" columns = data.keys()\n",
" table_data = [i for i in data]\n",
" df = pd.DataFrame(table_data)\n",
" \n",
" if df.empty:\n",
" return 'No data available'\n",
" \n",
" df.columns = columns\n",
"\n",
" if 'PATH' in locals():\n",
" df.to_csv(PATH)\n",
"\n",
" if 'MAKE_GLOBAL' in locals():\n",
" exec('global ' + glovar[1] + '\\n' + glovar[1] + '=df')\n",
" \n",
" return df"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>dateof</th>\n",
" <th>team</th>\n",
" <th>opp</th>\n",
" <th>points</th>\n",
" <th>fg</th>\n",
" <th>fgatt</th>\n",
" <th>ft</th>\n",
" <th>ftatt</th>\n",
" <th>fg3</th>\n",
" <th>fg3att</th>\n",
" <th>offrebounds</th>\n",
" <th>defrebounds</th>\n",
" <th>totalrebounds</th>\n",
" <th>assists</th>\n",
" <th>blocks</th>\n",
" <th>fouls</th>\n",
" <th>steals</th>\n",
" <th>turnovers</th>\n",
" <th>side</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1369</td>\n",
" <td>2015-10-27</td>\n",
" <td>DET</td>\n",
" <td>ATL</td>\n",
" <td>106</td>\n",
" <td>37</td>\n",
" <td>96</td>\n",
" <td>20</td>\n",
" <td>26</td>\n",
" <td>12</td>\n",
" <td>29</td>\n",
" <td>23</td>\n",
" <td>36</td>\n",
" <td>59</td>\n",
" <td>23</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>5</td>\n",
" <td>15</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1370</td>\n",
" <td>2015-10-27</td>\n",
" <td>ATL</td>\n",
" <td>DET</td>\n",
" <td>94</td>\n",
" <td>37</td>\n",
" <td>82</td>\n",
" <td>12</td>\n",
" <td>15</td>\n",
" <td>8</td>\n",
" <td>27</td>\n",
" <td>7</td>\n",
" <td>33</td>\n",
" <td>40</td>\n",
" <td>22</td>\n",
" <td>4</td>\n",
" <td>25</td>\n",
" <td>9</td>\n",
" <td>15</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1371</td>\n",
" <td>2015-10-27</td>\n",
" <td>CLE</td>\n",
" <td>CHI</td>\n",
" <td>95</td>\n",
" <td>38</td>\n",
" <td>94</td>\n",
" <td>10</td>\n",
" <td>17</td>\n",
" <td>9</td>\n",
" <td>29</td>\n",
" <td>11</td>\n",
" <td>39</td>\n",
" <td>50</td>\n",
" <td>26</td>\n",
" <td>7</td>\n",
" <td>21</td>\n",
" <td>5</td>\n",
" <td>11</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1372</td>\n",
" <td>2015-10-27</td>\n",
" <td>CHI</td>\n",
" <td>CLE</td>\n",
" <td>97</td>\n",
" <td>37</td>\n",
" <td>87</td>\n",
" <td>16</td>\n",
" <td>23</td>\n",
" <td>7</td>\n",
" <td>19</td>\n",
" <td>7</td>\n",
" <td>40</td>\n",
" <td>47</td>\n",
" <td>13</td>\n",
" <td>10</td>\n",
" <td>22</td>\n",
" <td>6</td>\n",
" <td>13</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1373</td>\n",
" <td>2015-10-27</td>\n",
" <td>NO</td>\n",
" <td>GS</td>\n",
" <td>95</td>\n",
" <td>35</td>\n",
" <td>83</td>\n",
" <td>19</td>\n",
" <td>27</td>\n",
" <td>6</td>\n",
" <td>18</td>\n",
" <td>8</td>\n",
" <td>25</td>\n",
" <td>33</td>\n",
" <td>21</td>\n",
" <td>3</td>\n",
" <td>26</td>\n",
" <td>9</td>\n",
" <td>19</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id dateof team opp points fg fgatt ft ftatt fg3 fg3att \\\n",
"0 1369 2015-10-27 DET ATL 106 37 96 20 26 12 29 \n",
"1 1370 2015-10-27 ATL DET 94 37 82 12 15 8 27 \n",
"2 1371 2015-10-27 CLE CHI 95 38 94 10 17 9 29 \n",
"3 1372 2015-10-27 CHI CLE 97 37 87 16 23 7 19 \n",
"4 1373 2015-10-27 NO GS 95 35 83 19 27 6 18 \n",
"\n",
" offrebounds defrebounds totalrebounds assists blocks fouls steals \\\n",
"0 23 36 59 23 3 15 5 \n",
"1 7 33 40 22 4 25 9 \n",
"2 11 39 50 26 7 21 5 \n",
"3 7 40 47 13 10 22 6 \n",
"4 8 25 33 21 3 26 9 \n",
"\n",
" turnovers side \n",
"0 15 None \n",
"1 15 None \n",
"2 11 None \n",
"3 13 None \n",
"4 19 None "
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql DB=sports\n",
"select *\n",
"from nbateamstats\n",
"limit 5"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M02</td>\n",
" <td>6.1</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M03</td>\n",
" <td>6.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M04</td>\n",
" <td>5.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M05</td>\n",
" <td>5.8</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000003 1976 M02 6.1 None\n",
"2 LASST470000000000003 1976 M03 6.0 None\n",
"3 LASST470000000000003 1976 M04 5.9 None\n",
"4 LASST470000000000003 1976 M05 5.8 None"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql DB=bls PATH='/Users/tdobbins/Desktop/bls.csv'\n",
"select *\n",
"from la_unemployment\n",
"limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And the last specified database name will maintain the connection. So you don't need to specify which database until you want to change databases."
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M02</td>\n",
" <td>6.1</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M03</td>\n",
" <td>6.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M04</td>\n",
" <td>5.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M05</td>\n",
" <td>5.8</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000003 1976 M02 6.1 None\n",
"2 LASST470000000000003 1976 M03 6.0 None\n",
"3 LASST470000000000003 1976 M04 5.9 None\n",
"4 LASST470000000000003 1976 M05 5.8 None"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql \n",
"select *\n",
"from la_unemployment\n",
"limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And just like that we can switch databases. But I think we can make this even more user friendly. Let's add SQLAlchemy parameters and a way to parse the arguments."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@register_line_cell_magic\n",
"def sql(path, cell=None):\n",
"\n",
" args = path.split(' ')\n",
" for i in args:\n",
" if i.startswith('MAKE_GLOBAL'):\n",
" glovar = i.split('=')\n",
" exec(glovar[0]+'='+glovar[1]+'=None')\n",
" elif i.startswith('DB'):\n",
" from sqlalchemy import create_engine\n",
" db = i.replace('DB=', '')\n",
" exec(\"global engine\\nengine=create_engine('\"+driver+\"://\"+username+\":\"+password+\"@\"+host+\":\"+port+\"/\"+db+\"')\")\n",
" exec('global DB\\nDB=db')\n",
" else:\n",
" exec(i)\n",
" \n",
" if 'PARAMS' in locals():\n",
" for i in PARAMS.keys():\n",
" ref = '%('+i+')s'\n",
" if ref in cell:\n",
" cell = cell.replace(ref, (\"'\"+(str(eval(i)))+\"'\" if eval(i) != 'null' else 'null'))\n",
"\n",
" data = engine.execute(cell)\n",
" columns = data.keys()\n",
" table_data = [i for i in data] if 'pd' in globals() else [columns] + [i for i in data]\n",
" df = pd.DataFrame(table_data)\n",
" \n",
" if df.empty:\n",
" return 'No data available'\n",
" \n",
" df.columns = columns\n",
"\n",
" if 'PATH' in locals():\n",
" df.to_csv(PATH)\n",
"\n",
" if 'MAKE_GLOBAL' in locals():\n",
" exec('global ' + glovar[1] + '\\n' + glovar[1] + '=df')\n",
" \n",
" return df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This part is cool. I use SQLAlchemy named parameters all the time in production, but when I copy and paste my queries into pgAdmin, I have to go through my query and delete the SQLAlchemy placeholders and put static values in there, and when I switch back to my development environment, I have to delete the static values and use the named parameters again. Annoying!!!!!\n",
"\n",
"It shouldn't have to be like this."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# define your parameters here\n",
"year = '1976'\n",
"period = 'M01'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000004</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>111152.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000005</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>1691780.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000006</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>1802930.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LAUST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000004 1976 M01 111152.0 None\n",
"2 LASST470000000000005 1976 M01 1691780.0 None\n",
"3 LASST470000000000006 1976 M01 1802930.0 None\n",
"4 LAUST470000000000003 1976 M01 6.9 None"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql PARAMS={'year':year,'period':period}\n",
"select *\n",
"from la_unemployment\n",
"where year = %(year)s\n",
" and period = %(period)s\n",
"limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Very userful! Let's finish this script. We'll add a way to display the HTML table for those who don't have pandas and a way to write to a CSV. This example contains a `class` that converts the data into an HTML table and a `class` method that writes the data to a CSV. If the pandas import fails, then we invoke the class."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from IPython.core.magic import (register_line_magic, register_cell_magic,\n",
" register_line_cell_magic)\n",
"\n",
"class HTMLTable(list):\n",
" \"\"\"\n",
" Creates an HTML table if pandas isn't installed.\n",
" The .empty attribute takes the place of df.empty,\n",
" and to_csv takes the place of df.to_csv.\n",
" \"\"\"\n",
" \n",
" empty = []\n",
" \n",
" def _repr_html_(self):\n",
" table = '<table width=100%>'\n",
" thead = '<thead><tr>'\n",
" tbody = '<tbody><tr>'\n",
" for n,row in enumerate(self):\n",
" if n == 0:\n",
" thead += ''.join([('<th>' + str(r) + '</th>') for r in row])\n",
" else:\n",
" tbody += '<tr>' + ''.join([('<td>' + str(r) + '</td>') for r in row]) + '</tr>'\n",
" thead += '</tr></thead>'\n",
" tbody += '</tbody>'\n",
" table += thead + tbody\n",
" return table\n",
"\n",
" def to_csv(self, path):\n",
" import csv\n",
" with open(path, 'w') as fp:\n",
" a = csv.writer(fp, delimiter=',')\n",
" a.writerows(self)\n",
" \n",
"try:\n",
" import pandas as pd\n",
" pd.options.display.max_columns = None\n",
" to_table = pd.DataFrame\n",
"except ImportError as e:\n",
" to_table = HTMLTable\n",
" \n",
"# default connection string info here\n",
"driver = 'postgresql'\n",
"username = 'tdobbins'\n",
"password = 'tdobbins'\n",
"host = 'localhost'\n",
"port = '5432'\n",
"\n",
"@register_line_cell_magic\n",
"def sql(path, cell=None):\n",
" \"\"\"\n",
" Create magic cell function to treat cell text as SQL\n",
" to remove the need of third party SQL interfaces. The \n",
" args are split on spaces so don't use spaces except to \n",
" input a new argument.\n",
" Args:\n",
" PATH (str): path to write dataframe to in csv.\n",
" PARAMS (dict): allows SQLAlchemy named parameters.\n",
" MAKE_GLOBAL: make dataframe available globally.\n",
" DB: name of database to connect to.\n",
" Returns:\n",
" DataFrame:\n",
" \"\"\"\n",
" args = path.split(' ')\n",
" for i in args:\n",
" if i.startswith('MAKE_GLOBAL'):\n",
" glovar = i.split('=')\n",
" exec(glovar[0]+'='+glovar[1]+'=None')\n",
" elif i.startswith('DB'):\n",
" from sqlalchemy import create_engine\n",
" db = i.replace('DB=', '')\n",
" exec(\"global engine\\nengine=create_engine('\"+driver+\"://\"+username+\":\"+password+\"@\"+host+\":\"+port+\"/\"+db+\"')\")\n",
" exec('global DB\\nDB=db')\n",
" else:\n",
" exec(i)\n",
" \n",
" if 'PARAMS' in locals():\n",
" for i in PARAMS.keys():\n",
" ref = '%('+i+')s'\n",
" if ref in cell:\n",
" cell = cell.replace(ref, (\"'\"+(str(eval(i)))+\"'\" if eval(i) != 'null' else 'null'))\n",
"\n",
" data = engine.execute(cell)\n",
" columns = data.keys()\n",
" table_data = [i for i in data] if 'pd' in globals() else [columns] + [i for i in data]\n",
" df = to_table(table_data)\n",
" \n",
" if df.empty:\n",
" return 'No data available'\n",
" \n",
" df.columns = columns\n",
"\n",
" if 'PATH' in locals():\n",
" df.to_csv(PATH)\n",
"\n",
" if 'MAKE_GLOBAL' in locals():\n",
" exec('global ' + glovar[1] + '\\n' + glovar[1] + '=df')\n",
" \n",
" return df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With pandas:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>series_id</th>\n",
" <th>year</th>\n",
" <th>period</th>\n",
" <th>value</th>\n",
" <th>footnote_codes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LASST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.2</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LASST470000000000004</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>111152.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LASST470000000000005</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>1691780.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LASST470000000000006</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>1802930.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LAUST470000000000003</td>\n",
" <td>1976</td>\n",
" <td>M01</td>\n",
" <td>6.9</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" series_id year period value footnote_codes\n",
"0 LASST470000000000003 1976 M01 6.2 None\n",
"1 LASST470000000000004 1976 M01 111152.0 None\n",
"2 LASST470000000000005 1976 M01 1691780.0 None\n",
"3 LASST470000000000006 1976 M01 1802930.0 None\n",
"4 LAUST470000000000003 1976 M01 6.9 None"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql PATH='/Users/tdobbins/Desktop/data.csv' PARAMS={'year':year,'period':period} MAKE_GLOBAL=DF DB=bls\n",
"select *\n",
"from la_unemployment\n",
"where year = %(year)s\n",
" and period = %(period)s\n",
"limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Without pandas:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<table width=100%><thead><tr><th>series_id</th><th>year</th><th>period</th><th>value</th><th>footnote_codes</th></tr></thead><tbody><tr><tr><td>LASST470000000000003</td><td>1976</td><td>M01</td><td>6.2</td><td>None</td></tr><tr><td>LASST470000000000004</td><td>1976</td><td>M01</td><td>111152.0</td><td>None</td></tr><tr><td>LASST470000000000005</td><td>1976</td><td>M01</td><td>1691780.0</td><td>None</td></tr><tr><td>LASST470000000000006</td><td>1976</td><td>M01</td><td>1802930.0</td><td>None</td></tr><tr><td>LAUST470000000000003</td><td>1976</td><td>M01</td><td>6.9</td><td>None</td></tr></tbody>"
],
"text/plain": [
"[[u'series_id', u'year', u'period', u'value', u'footnote_codes'],\n",
" (u'LASST470000000000003', 1976, u'M01', 6.2, None),\n",
" (u'LASST470000000000004', 1976, u'M01', 111152.0, None),\n",
" (u'LASST470000000000005', 1976, u'M01', 1691780.0, None),\n",
" (u'LASST470000000000006', 1976, u'M01', 1802930.0, None),\n",
" (u'LAUST470000000000003', 1976, u'M01', 6.9, None)]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql PATH='/Users/tdobbins/Desktop/data.csv' PARAMS={'year':year,'period':period} MAKE_GLOBAL=DF DB=bls\n",
"select *\n",
"from la_unemployment\n",
"where year = %(year)s\n",
" and period = %(period)s\n",
"limit 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you put this script in your ~/.ipython/profile_default/startup directory, it will load the magic function at the start of every Jupyter notebook."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment