Skip to content

Instantly share code, notes, and snippets.

@aok1425
Created September 11, 2019 22:34
Show Gist options
  • Save aok1425/f09ec6875f6d7fe0ab954d722981966e to your computer and use it in GitHub Desktop.
Save aok1425/f09ec6875f6d7fe0ab954d722981966e to your computer and use it in GitHub Desktop.
Example of how to query data using SQL in Python
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to use SQL on this dataset\n",
"\n",
"- If you're comfortable with SQL, we encourage you to use it to inspect your data.\n",
"- It is simpler to do some of these inspections using SQL rather than Pandas."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!pip install ipython-sql"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import sqlite3"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Making the connections"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"conn = sqlite3.connect('translation_experiment_data.db3')\n",
"c = conn.cursor()\n",
"conn.commit()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"u'Connected: @translation_experiment_data.db3'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql \n",
"sqlite:///translation_experiment_data.db3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example queries"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///translation_experiment_data.db3\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>id_guest_anon</th>\n",
" <th>dim_treatment</th>\n",
" <th>timestamp_assigned_to_experiment</th>\n",
" <th>dim_country</th>\n",
" <th>dim_traffic_source_of_visitor</th>\n",
" </tr>\n",
" <tr>\n",
" <td>555399913</td>\n",
" <td>treatment</td>\n",
" <td>2018-10-17 20:26:11</td>\n",
" <td>US</td>\n",
" <td>Paid ads</td>\n",
" </tr>\n",
" <tr>\n",
" <td>175791602</td>\n",
" <td>control</td>\n",
" <td>2018-10-18 07:59:14</td>\n",
" <td>MX</td>\n",
" <td>SEO</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(u'555399913', u'treatment', u'2018-10-17 20:26:11', u'US', u'Paid ads'),\n",
" (u'175791602', u'control', u'2018-10-18 07:59:14', u'MX', u'SEO')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"select *\n",
"from guests\n",
"limit 2"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///translation_experiment_data.db3\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>id_guest_anon</th>\n",
" <th>timestamp_of_booking</th>\n",
" <th>bookings</th>\n",
" </tr>\n",
" <tr>\n",
" <td>1358668028</td>\n",
" <td>2018-11-02 13:59:27</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1358668028</td>\n",
" <td>2018-12-04 18:24:51</td>\n",
" <td>1</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(u'1358668028', u'2018-11-02 13:59:27', 1),\n",
" (u'1358668028', u'2018-12-04 18:24:51', 1)]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"select *\n",
"from bookings\n",
"limit 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How to put query results into Pandas DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///translation_experiment_data.db3\n",
"Done.\n",
"Returning data to local variable df_guests_sql_result\n"
]
}
],
"source": [
"%%sql df_guests_sql_result <<\n",
"\n",
"select *\n",
"from guests\n",
"limit 2"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id_guest_anon</th>\n",
" <th>dim_treatment</th>\n",
" <th>timestamp_assigned_to_experiment</th>\n",
" <th>dim_country</th>\n",
" <th>dim_traffic_source_of_visitor</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>555399913</td>\n",
" <td>treatment</td>\n",
" <td>2018-10-17 20:26:11</td>\n",
" <td>US</td>\n",
" <td>Paid ads</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>175791602</td>\n",
" <td>control</td>\n",
" <td>2018-10-18 07:59:14</td>\n",
" <td>MX</td>\n",
" <td>SEO</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id_guest_anon dim_treatment timestamp_assigned_to_experiment dim_country \\\n",
"0 555399913 treatment 2018-10-17 20:26:11 US \n",
"1 175791602 control 2018-10-18 07:59:14 MX \n",
"\n",
" dim_traffic_source_of_visitor \n",
"0 Paid ads \n",
"1 SEO "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_guests = df_guests_sql_result.DataFrame()\n",
"print(type(df_guests))\n",
"df_guests.head()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.15"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"navigate_num": "#000000",
"navigate_text": "#333333",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700",
"sidebar_border": "#EEEEEE",
"wrapper_background": "#FFFFFF"
},
"moveMenuLeft": false,
"navigate_menu": false,
"number_sections": false,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": true,
"widenNotebook": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment