Skip to content

Instantly share code, notes, and snippets.

@yhilpisch
Last active May 9, 2020 13:02
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 4 You must be signed in to fork a gist
  • Save yhilpisch/2561a238e9d78744207a05ab1d110e18 to your computer and use it in GitHub Desktop.
Save yhilpisch/2561a238e9d78744207a05ab1d110e18 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='http://hilpisch.com/tpq_logo.png' width=\"300px\" align=\"right\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# FPQ Bootcamp &mdash; Day 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Data Science**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieving the Data Set"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import urllib.request"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# urllib.request.urlretrieve('http://hilpisch.com/Titanic.csv',\n",
"# filename='Titanic_from_web.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Data Set"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"01_bc_day_02.md \u001b[31mTitanic.csv\u001b[m\u001b[m*\r\n",
"02_data_science.ipynb \u001b[31mTitanic.txt\u001b[m\u001b[m*\r\n"
]
}
],
"source": [
"ls"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"f = open('Titanic.csv', 'r')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<_io.TextIOWrapper name='Titanic.csv' mode='r' encoding='UTF-8'>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\"\",\"Class\",\"Sex\",\"Age\",\"Survived\",\"Freq\"\n",
"\"1\",\"1st\",\"Male\",\"Child\",\"No\",0\n",
"\"2\",\"2nd\",\"Male\",\"Child\",\"No\",0\n",
"\"3\",\"3rd\",\"Male\",\"Child\",\"No\",35\n",
"\"4\",\"Crew\",\"Male\",\"Child\",\"No\",0\n",
"\"5\",\"1st\",\"Female\",\"Child\",\"No\",0\n",
"\"6\",\"2nd\",\"Female\",\"Child\",\"No\",0\n",
"\"7\",\"3rd\",\"Female\",\"Child\",\"No\",17\n",
"\"8\",\"Crew\",\"Female\",\"Child\",\"No\",0\n",
"\"9\",\"1st\",\"Male\",\"Adult\",\"No\",118\n",
"\"10\",\"2nd\",\"Male\",\"Adult\",\"No\",154\n",
"\"11\",\"3rd\",\"Male\",\"Adult\",\"No\",387\n",
"\"12\",\"Crew\",\"Male\",\"Adult\",\"No\",670\n",
"\"13\",\"1st\",\"Female\",\"Adult\",\"No\",4\n",
"\"14\",\"2nd\",\"Female\",\"Adult\",\"No\",13\n",
"\"15\",\"3rd\",\"Female\",\"Adult\",\"No\",89\n",
"\"16\",\"Crew\",\"Female\",\"Adult\",\"No\",3\n",
"\"17\",\"1st\",\"Male\",\"Child\",\"Yes\",5\n",
"\"18\",\"2nd\",\"Male\",\"Child\",\"Yes\",11\n",
"\"19\",\"3rd\",\"Male\",\"Child\",\"Yes\",13\n",
"\"20\",\"Crew\",\"Male\",\"Child\",\"Yes\",0\n",
"\"21\",\"1st\",\"Female\",\"Child\",\"Yes\",1\n",
"\"22\",\"2nd\",\"Female\",\"Child\",\"Yes\",13\n",
"\"23\",\"3rd\",\"Female\",\"Child\",\"Yes\",14\n",
"\"24\",\"Crew\",\"Female\",\"Child\",\"Yes\",0\n",
"\"25\",\"1st\",\"Male\",\"Adult\",\"Yes\",57\n",
"\"26\",\"2nd\",\"Male\",\"Adult\",\"Yes\",14\n",
"\"27\",\"3rd\",\"Male\",\"Adult\",\"Yes\",75\n",
"\"28\",\"Crew\",\"Male\",\"Adult\",\"Yes\",192\n",
"\"29\",\"1st\",\"Female\",\"Adult\",\"Yes\",140\n",
"\"30\",\"2nd\",\"Female\",\"Adult\",\"Yes\",80\n",
"\"31\",\"3rd\",\"Female\",\"Adult\",\"Yes\",76\n",
"\"32\",\"Crew\",\"Female\",\"Adult\",\"Yes\",20\n"
]
}
],
"source": [
"for line in f:\n",
" print(line, end='')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\"\",\"Class\",\"Sex\",\"Age\",\"Survived\",\"Freq\"\n",
"\"1\",\"1st\",\"Male\",\"Child\",\"No\",0\n",
"\"2\",\"2nd\",\"Male\",\"Child\",\"No\",0\n",
"\"3\",\"3rd\",\"Male\",\"Child\",\"No\",35\n",
"\"4\",\"Crew\",\"Male\",\"Child\",\"No\",0\n",
"\"5\",\"1st\",\"Female\",\"Child\",\"No\",0\n",
"\"6\",\"2nd\",\"Female\",\"Child\",\"No\",0\n",
"\"7\",\"3rd\",\"Female\",\"Child\",\"No\",17\n",
"\"8\",\"Crew\",\"Female\",\"Child\",\"No\",0\n",
"\"9\",\"1st\",\"Male\",\"Adult\",\"No\",118\n",
"\"10\",\"2nd\",\"Male\",\"Adult\",\"No\",154\n",
"\"11\",\"3rd\",\"Male\",\"Adult\",\"No\",387\n",
"\"12\",\"Crew\",\"Male\",\"Adult\",\"No\",670\n",
"\"13\",\"1st\",\"Female\",\"Adult\",\"No\",4\n",
"\"14\",\"2nd\",\"Female\",\"Adult\",\"No\",13\n",
"\"15\",\"3rd\",\"Female\",\"Adult\",\"No\",89\n",
"\"16\",\"Crew\",\"Female\",\"Adult\",\"No\",3\n",
"\"17\",\"1st\",\"Male\",\"Child\",\"Yes\",5\n",
"\"18\",\"2nd\",\"Male\",\"Child\",\"Yes\",11\n",
"\"19\",\"3rd\",\"Male\",\"Child\",\"Yes\",13\n",
"\"20\",\"Crew\",\"Male\",\"Child\",\"Yes\",0\n",
"\"21\",\"1st\",\"Female\",\"Child\",\"Yes\",1\n",
"\"22\",\"2nd\",\"Female\",\"Child\",\"Yes\",13\n",
"\"23\",\"3rd\",\"Female\",\"Child\",\"Yes\",14\n",
"\"24\",\"Crew\",\"Female\",\"Child\",\"Yes\",0\n",
"\"25\",\"1st\",\"Male\",\"Adult\",\"Yes\",57\n",
"\"26\",\"2nd\",\"Male\",\"Adult\",\"Yes\",14\n",
"\"27\",\"3rd\",\"Male\",\"Adult\",\"Yes\",75\n",
"\"28\",\"Crew\",\"Male\",\"Adult\",\"Yes\",192\n",
"\"29\",\"1st\",\"Female\",\"Adult\",\"Yes\",140\n",
"\"30\",\"2nd\",\"Female\",\"Adult\",\"Yes\",80\n",
"\"31\",\"3rd\",\"Female\",\"Adult\",\"Yes\",76\n",
"\"32\",\"Crew\",\"Female\",\"Adult\",\"Yes\",20\n"
]
}
],
"source": [
"with open('Titanic.csv', 'r') as f:\n",
" for line in f:\n",
" print(line, end='')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"with open('Titanic.csv', 'r') as f:\n",
" raw = [line for line in f]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['\"\",\"Class\",\"Sex\",\"Age\",\"Survived\",\"Freq\"\\n',\n",
" '\"1\",\"1st\",\"Male\",\"Child\",\"No\",0\\n',\n",
" '\"2\",\"2nd\",\"Male\",\"Child\",\"No\",0\\n',\n",
" '\"3\",\"3rd\",\"Male\",\"Child\",\"No\",35\\n',\n",
" '\"4\",\"Crew\",\"Male\",\"Child\",\"No\",0\\n',\n",
" '\"5\",\"1st\",\"Female\",\"Child\",\"No\",0\\n',\n",
" '\"6\",\"2nd\",\"Female\",\"Child\",\"No\",0\\n',\n",
" '\"7\",\"3rd\",\"Female\",\"Child\",\"No\",17\\n',\n",
" '\"8\",\"Crew\",\"Female\",\"Child\",\"No\",0\\n',\n",
" '\"9\",\"1st\",\"Male\",\"Adult\",\"No\",118\\n',\n",
" '\"10\",\"2nd\",\"Male\",\"Adult\",\"No\",154\\n',\n",
" '\"11\",\"3rd\",\"Male\",\"Adult\",\"No\",387\\n',\n",
" '\"12\",\"Crew\",\"Male\",\"Adult\",\"No\",670\\n',\n",
" '\"13\",\"1st\",\"Female\",\"Adult\",\"No\",4\\n',\n",
" '\"14\",\"2nd\",\"Female\",\"Adult\",\"No\",13\\n',\n",
" '\"15\",\"3rd\",\"Female\",\"Adult\",\"No\",89\\n',\n",
" '\"16\",\"Crew\",\"Female\",\"Adult\",\"No\",3\\n',\n",
" '\"17\",\"1st\",\"Male\",\"Child\",\"Yes\",5\\n',\n",
" '\"18\",\"2nd\",\"Male\",\"Child\",\"Yes\",11\\n',\n",
" '\"19\",\"3rd\",\"Male\",\"Child\",\"Yes\",13\\n',\n",
" '\"20\",\"Crew\",\"Male\",\"Child\",\"Yes\",0\\n',\n",
" '\"21\",\"1st\",\"Female\",\"Child\",\"Yes\",1\\n',\n",
" '\"22\",\"2nd\",\"Female\",\"Child\",\"Yes\",13\\n',\n",
" '\"23\",\"3rd\",\"Female\",\"Child\",\"Yes\",14\\n',\n",
" '\"24\",\"Crew\",\"Female\",\"Child\",\"Yes\",0\\n',\n",
" '\"25\",\"1st\",\"Male\",\"Adult\",\"Yes\",57\\n',\n",
" '\"26\",\"2nd\",\"Male\",\"Adult\",\"Yes\",14\\n',\n",
" '\"27\",\"3rd\",\"Male\",\"Adult\",\"Yes\",75\\n',\n",
" '\"28\",\"Crew\",\"Male\",\"Adult\",\"Yes\",192\\n',\n",
" '\"29\",\"1st\",\"Female\",\"Adult\",\"Yes\",140\\n',\n",
" '\"30\",\"2nd\",\"Female\",\"Adult\",\"Yes\",80\\n',\n",
" '\"31\",\"3rd\",\"Female\",\"Adult\",\"Yes\",76\\n',\n",
" '\"32\",\"Crew\",\"Female\",\"Adult\",\"Yes\",20\\n']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['32', 'Crew', 'Female', 'Adult', 'Yes', '20']"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw[-1].replace('\"', '')[:-1].split(',')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"data = [row.replace('\"', '')[:-1].split(',') for row in raw]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[['', 'Class', 'Sex', 'Age', 'Survived', 'Freq'],\n",
" ['1', '1st', 'Male', 'Child', 'No', '0'],\n",
" ['2', '2nd', 'Male', 'Child', 'No', '0'],\n",
" ['3', '3rd', 'Male', 'Child', 'No', '35'],\n",
" ['4', 'Crew', 'Male', 'Child', 'No', '0'],\n",
" ['5', '1st', 'Female', 'Child', 'No', '0'],\n",
" ['6', '2nd', 'Female', 'Child', 'No', '0'],\n",
" ['7', '3rd', 'Female', 'Child', 'No', '17'],\n",
" ['8', 'Crew', 'Female', 'Child', 'No', '0'],\n",
" ['9', '1st', 'Male', 'Adult', 'No', '118'],\n",
" ['10', '2nd', 'Male', 'Adult', 'No', '154'],\n",
" ['11', '3rd', 'Male', 'Adult', 'No', '387'],\n",
" ['12', 'Crew', 'Male', 'Adult', 'No', '670'],\n",
" ['13', '1st', 'Female', 'Adult', 'No', '4'],\n",
" ['14', '2nd', 'Female', 'Adult', 'No', '13'],\n",
" ['15', '3rd', 'Female', 'Adult', 'No', '89'],\n",
" ['16', 'Crew', 'Female', 'Adult', 'No', '3'],\n",
" ['17', '1st', 'Male', 'Child', 'Yes', '5'],\n",
" ['18', '2nd', 'Male', 'Child', 'Yes', '11'],\n",
" ['19', '3rd', 'Male', 'Child', 'Yes', '13'],\n",
" ['20', 'Crew', 'Male', 'Child', 'Yes', '0'],\n",
" ['21', '1st', 'Female', 'Child', 'Yes', '1'],\n",
" ['22', '2nd', 'Female', 'Child', 'Yes', '13'],\n",
" ['23', '3rd', 'Female', 'Child', 'Yes', '14'],\n",
" ['24', 'Crew', 'Female', 'Child', 'Yes', '0'],\n",
" ['25', '1st', 'Male', 'Adult', 'Yes', '57'],\n",
" ['26', '2nd', 'Male', 'Adult', 'Yes', '14'],\n",
" ['27', '3rd', 'Male', 'Adult', 'Yes', '75'],\n",
" ['28', 'Crew', 'Male', 'Adult', 'Yes', '192'],\n",
" ['29', '1st', 'Female', 'Adult', 'Yes', '140'],\n",
" ['30', '2nd', 'Female', 'Adult', 'Yes', '80'],\n",
" ['31', '3rd', 'Female', 'Adult', 'Yes', '76'],\n",
" ['32', 'Crew', 'Female', 'Adult', 'Yes', '20']]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2201"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([int(row[-1]) for row in data[1:]])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1731"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([int(row[-1]) for row in data[1:] if row[2] == 'Male'])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"470"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([int(row[-1]) for row in data[1:] if row[2] == 'Female'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using the csv module"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"import csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### reader"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"csv_reader = csv.reader(open('Titanic.csv'))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['', 'Class', 'Sex', 'Age', 'Survived', 'Freq']\n",
"['1', '1st', 'Male', 'Child', 'No', '0']\n",
"['2', '2nd', 'Male', 'Child', 'No', '0']\n",
"['3', '3rd', 'Male', 'Child', 'No', '35']\n",
"['4', 'Crew', 'Male', 'Child', 'No', '0']\n",
"['5', '1st', 'Female', 'Child', 'No', '0']\n",
"['6', '2nd', 'Female', 'Child', 'No', '0']\n",
"['7', '3rd', 'Female', 'Child', 'No', '17']\n",
"['8', 'Crew', 'Female', 'Child', 'No', '0']\n",
"['9', '1st', 'Male', 'Adult', 'No', '118']\n",
"['10', '2nd', 'Male', 'Adult', 'No', '154']\n",
"['11', '3rd', 'Male', 'Adult', 'No', '387']\n",
"['12', 'Crew', 'Male', 'Adult', 'No', '670']\n",
"['13', '1st', 'Female', 'Adult', 'No', '4']\n",
"['14', '2nd', 'Female', 'Adult', 'No', '13']\n",
"['15', '3rd', 'Female', 'Adult', 'No', '89']\n",
"['16', 'Crew', 'Female', 'Adult', 'No', '3']\n",
"['17', '1st', 'Male', 'Child', 'Yes', '5']\n",
"['18', '2nd', 'Male', 'Child', 'Yes', '11']\n",
"['19', '3rd', 'Male', 'Child', 'Yes', '13']\n",
"['20', 'Crew', 'Male', 'Child', 'Yes', '0']\n",
"['21', '1st', 'Female', 'Child', 'Yes', '1']\n",
"['22', '2nd', 'Female', 'Child', 'Yes', '13']\n",
"['23', '3rd', 'Female', 'Child', 'Yes', '14']\n",
"['24', 'Crew', 'Female', 'Child', 'Yes', '0']\n",
"['25', '1st', 'Male', 'Adult', 'Yes', '57']\n",
"['26', '2nd', 'Male', 'Adult', 'Yes', '14']\n",
"['27', '3rd', 'Male', 'Adult', 'Yes', '75']\n",
"['28', 'Crew', 'Male', 'Adult', 'Yes', '192']\n",
"['29', '1st', 'Female', 'Adult', 'Yes', '140']\n",
"['30', '2nd', 'Female', 'Adult', 'Yes', '80']\n",
"['31', '3rd', 'Female', 'Adult', 'Yes', '76']\n",
"['32', 'Crew', 'Female', 'Adult', 'Yes', '20']\n"
]
}
],
"source": [
"for line in csv_reader:\n",
" print(line)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"csv_reader = csv.reader(open('Titanic.csv'))"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"data = [row for row in csv_reader]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[['', 'Class', 'Sex', 'Age', 'Survived', 'Freq'],\n",
" ['1', '1st', 'Male', 'Child', 'No', '0'],\n",
" ['2', '2nd', 'Male', 'Child', 'No', '0'],\n",
" ['3', '3rd', 'Male', 'Child', 'No', '35'],\n",
" ['4', 'Crew', 'Male', 'Child', 'No', '0']]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[:5]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"470"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([int(row[-1]) for row in data[1:] if row[2] == 'Female'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### DictReader"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"csv_reader = csv.DictReader(open('Titanic.csv'))"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"data = [row for row in csv_reader]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[OrderedDict([('', '1'),\n",
" ('Class', '1st'),\n",
" ('Sex', 'Male'),\n",
" ('Age', 'Child'),\n",
" ('Survived', 'No'),\n",
" ('Freq', '0')]),\n",
" OrderedDict([('', '2'),\n",
" ('Class', '2nd'),\n",
" ('Sex', 'Male'),\n",
" ('Age', 'Child'),\n",
" ('Survived', 'No'),\n",
" ('Freq', '0')])]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[:2]"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2201"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([int(row['Freq']) for row in data])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1731"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([int(row['Freq']) for row in data if row['Sex'] == 'Male'])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"470"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([int(row['Freq']) for row in data if row['Sex'] == 'Female'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Record Arrays"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(15)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"rn = np.random.random(15)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.16191317, 0.52646587, 0.34916601, 0.1670475 , 0.43703538,\n",
" 0.52421733, 0.19033995, 0.92102025, 0.24972195, 0.88369696,\n",
" 0.18681813, 0.31057942, 0.13754434, 0.31872844, 0.6372609 ])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rn"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('int64')"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.dtype"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rn.dtype"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"s = np.array(data)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ OrderedDict([('', '1'), ('Class', '1st'), ('Sex', 'Male'), ('Age', 'Child'), ('Survived', 'No'), ('Freq', '0')]),\n",
" OrderedDict([('', '2'), ('Class', '2nd'), ('Sex', 'Male'), ('Age', 'Child'), ('Survived', 'No'), ('Freq', '0')]),\n",
" OrderedDict([('', '3'), ('Class', '3rd'), ('Sex', 'Male'), ('Age', 'Child'), ('Survived', 'No'), ('Freq', '35')])], dtype=object)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[:3]"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('O')"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.dtype"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"csv_reader = csv.reader(open('Titanic.csv'))"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"data = [tuple(row) for row in csv_reader]"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('', 'Class', 'Sex', 'Age', 'Survived', 'Freq'),\n",
" ('1', '1st', 'Male', 'Child', 'No', '0'),\n",
" ('2', '2nd', 'Male', 'Child', 'No', '0'),\n",
" ('3', '3rd', 'Male', 'Child', 'No', '35'),\n",
" ('4', 'Crew', 'Male', 'Child', 'No', '0')]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[:5]"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"dt = np.dtype({'names': data[0], 'formats': 'int O O O O int'.split()})"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype([('', '<i8'), ('Class', 'O'), ('Sex', 'O'), ('Age', 'O'), ('Survived', 'O'), ('Freq', '<i8')])"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dt"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"r = np.array(data[1:], dtype=dt)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([( 1, '1st', 'Male', 'Child', 'No', 0),\n",
" ( 2, '2nd', 'Male', 'Child', 'No', 0),\n",
" ( 3, '3rd', 'Male', 'Child', 'No', 35),\n",
" ( 4, 'Crew', 'Male', 'Child', 'No', 0),\n",
" ( 5, '1st', 'Female', 'Child', 'No', 0),\n",
" ( 6, '2nd', 'Female', 'Child', 'No', 0),\n",
" ( 7, '3rd', 'Female', 'Child', 'No', 17),\n",
" ( 8, 'Crew', 'Female', 'Child', 'No', 0),\n",
" ( 9, '1st', 'Male', 'Adult', 'No', 118),\n",
" (10, '2nd', 'Male', 'Adult', 'No', 154),\n",
" (11, '3rd', 'Male', 'Adult', 'No', 387),\n",
" (12, 'Crew', 'Male', 'Adult', 'No', 670),\n",
" (13, '1st', 'Female', 'Adult', 'No', 4),\n",
" (14, '2nd', 'Female', 'Adult', 'No', 13),\n",
" (15, '3rd', 'Female', 'Adult', 'No', 89),\n",
" (16, 'Crew', 'Female', 'Adult', 'No', 3),\n",
" (17, '1st', 'Male', 'Child', 'Yes', 5),\n",
" (18, '2nd', 'Male', 'Child', 'Yes', 11),\n",
" (19, '3rd', 'Male', 'Child', 'Yes', 13),\n",
" (20, 'Crew', 'Male', 'Child', 'Yes', 0),\n",
" (21, '1st', 'Female', 'Child', 'Yes', 1),\n",
" (22, '2nd', 'Female', 'Child', 'Yes', 13),\n",
" (23, '3rd', 'Female', 'Child', 'Yes', 14),\n",
" (24, 'Crew', 'Female', 'Child', 'Yes', 0),\n",
" (25, '1st', 'Male', 'Adult', 'Yes', 57),\n",
" (26, '2nd', 'Male', 'Adult', 'Yes', 14),\n",
" (27, '3rd', 'Male', 'Adult', 'Yes', 75),\n",
" (28, 'Crew', 'Male', 'Adult', 'Yes', 192),\n",
" (29, '1st', 'Female', 'Adult', 'Yes', 140),\n",
" (30, '2nd', 'Female', 'Adult', 'Yes', 80),\n",
" (31, '3rd', 'Female', 'Adult', 'Yes', 76),\n",
" (32, 'Crew', 'Female', 'Adult', 'Yes', 20)],\n",
" dtype=[('', '<i8'), ('Class', 'O'), ('Sex', 'O'), ('Age', 'O'), ('Survived', 'O'), ('Freq', '<i8')])"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4,\n",
" 13, 89, 3, 5, 11, 13, 0, 1, 13, 14, 0, 57, 14,\n",
" 75, 192, 140, 80, 76, 20])"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r['Freq']"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2201"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r['Freq'].sum()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ True, True, True, True, False, False, False, False, True,\n",
" True, True, True, False, False, False, False, True, True,\n",
" True, True, False, False, False, False, True, True, True,\n",
" True, False, False, False, False], dtype=bool)"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r['Sex'] == 'Male'"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([( 1, '1st', 'Male', 'Child', 'No', 0),\n",
" ( 2, '2nd', 'Male', 'Child', 'No', 0),\n",
" ( 3, '3rd', 'Male', 'Child', 'No', 35),\n",
" ( 4, 'Crew', 'Male', 'Child', 'No', 0),\n",
" ( 9, '1st', 'Male', 'Adult', 'No', 118),\n",
" (10, '2nd', 'Male', 'Adult', 'No', 154),\n",
" (11, '3rd', 'Male', 'Adult', 'No', 387),\n",
" (12, 'Crew', 'Male', 'Adult', 'No', 670),\n",
" (17, '1st', 'Male', 'Child', 'Yes', 5),\n",
" (18, '2nd', 'Male', 'Child', 'Yes', 11),\n",
" (19, '3rd', 'Male', 'Child', 'Yes', 13),\n",
" (20, 'Crew', 'Male', 'Child', 'Yes', 0),\n",
" (25, '1st', 'Male', 'Adult', 'Yes', 57),\n",
" (26, '2nd', 'Male', 'Adult', 'Yes', 14),\n",
" (27, '3rd', 'Male', 'Adult', 'Yes', 75),\n",
" (28, 'Crew', 'Male', 'Adult', 'Yes', 192)],\n",
" dtype=[('', '<i8'), ('Class', 'O'), ('Sex', 'O'), ('Age', 'O'), ('Survived', 'O'), ('Freq', '<i8')])"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r[r['Sex'] == 'Male']"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([(17, '1st', 'Male', 'Child', 'Yes', 5),\n",
" (18, '2nd', 'Male', 'Child', 'Yes', 11),\n",
" (19, '3rd', 'Male', 'Child', 'Yes', 13),\n",
" (20, 'Crew', 'Male', 'Child', 'Yes', 0),\n",
" (25, '1st', 'Male', 'Adult', 'Yes', 57),\n",
" (26, '2nd', 'Male', 'Adult', 'Yes', 14),\n",
" (27, '3rd', 'Male', 'Adult', 'Yes', 75),\n",
" (28, 'Crew', 'Male', 'Adult', 'Yes', 192)],\n",
" dtype=[('', '<i8'), ('Class', 'O'), ('Sex', 'O'), ('Age', 'O'), ('Survived', 'O'), ('Freq', '<i8')])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r[(r['Sex'] == 'Male') & (r['Survived'] == 'Yes')]"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1731"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r['Freq'][r['Sex'] == 'Male'].sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Relational Databases"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"import sqlite3 as sq3"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"# adjust path to your taste\n",
"con = sq3.connect('/Users/yves/Documents/Temp/data/Titanic.sql')"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<sqlite3.Connection at 0x1041948f0>"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"con"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"q = con.execute"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<sqlite3.Cursor at 0x1042eedc0>"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('CREATE TABLE data (id int, class str, sex str, age str, survived str, freq int)')"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('table',\n",
" 'data',\n",
" 'data',\n",
" 2,\n",
" 'CREATE TABLE data (id int, class str, sex str, age str, survived str, freq int)')]"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('SELECT * FROM sqlite_master').fetchall()"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[['', 'Class', 'Sex', 'Age', 'Survived', 'Freq'],\n",
" ['1', '1st', 'Male', 'Child', 'No', '0'],\n",
" ['2', '2nd', 'Male', 'Child', 'No', '0']]"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_reader = csv.reader(open('Titanic.csv'))\n",
"raw = [row for row in csv_reader]\n",
"raw[:3]"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<sqlite3.Cursor at 0x105400810>"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"con.executemany('INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)', raw[1:])"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(1, '1st', 'Male', 'Child', 'No', 0),\n",
" (2, '2nd', 'Male', 'Child', 'No', 0),\n",
" (3, '3rd', 'Male', 'Child', 'No', 35),\n",
" (4, 'Crew', 'Male', 'Child', 'No', 0),\n",
" (5, '1st', 'Female', 'Child', 'No', 0),\n",
" (6, '2nd', 'Female', 'Child', 'No', 0),\n",
" (7, '3rd', 'Female', 'Child', 'No', 17),\n",
" (8, 'Crew', 'Female', 'Child', 'No', 0),\n",
" (9, '1st', 'Male', 'Adult', 'No', 118),\n",
" (10, '2nd', 'Male', 'Adult', 'No', 154),\n",
" (11, '3rd', 'Male', 'Adult', 'No', 387),\n",
" (12, 'Crew', 'Male', 'Adult', 'No', 670),\n",
" (13, '1st', 'Female', 'Adult', 'No', 4),\n",
" (14, '2nd', 'Female', 'Adult', 'No', 13),\n",
" (15, '3rd', 'Female', 'Adult', 'No', 89),\n",
" (16, 'Crew', 'Female', 'Adult', 'No', 3),\n",
" (17, '1st', 'Male', 'Child', 'Yes', 5),\n",
" (18, '2nd', 'Male', 'Child', 'Yes', 11),\n",
" (19, '3rd', 'Male', 'Child', 'Yes', 13),\n",
" (20, 'Crew', 'Male', 'Child', 'Yes', 0),\n",
" (21, '1st', 'Female', 'Child', 'Yes', 1),\n",
" (22, '2nd', 'Female', 'Child', 'Yes', 13),\n",
" (23, '3rd', 'Female', 'Child', 'Yes', 14),\n",
" (24, 'Crew', 'Female', 'Child', 'Yes', 0),\n",
" (25, '1st', 'Male', 'Adult', 'Yes', 57),\n",
" (26, '2nd', 'Male', 'Adult', 'Yes', 14),\n",
" (27, '3rd', 'Male', 'Adult', 'Yes', 75),\n",
" (28, 'Crew', 'Male', 'Adult', 'Yes', 192),\n",
" (29, '1st', 'Female', 'Adult', 'Yes', 140),\n",
" (30, '2nd', 'Female', 'Adult', 'Yes', 80),\n",
" (31, '3rd', 'Female', 'Adult', 'Yes', 76),\n",
" (32, 'Crew', 'Female', 'Adult', 'Yes', 20)]"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('SELECT * FROM data').fetchall()"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(1, '1st', 'Male', 'Child', 'No', 0),\n",
" (2, '2nd', 'Male', 'Child', 'No', 0),\n",
" (3, '3rd', 'Male', 'Child', 'No', 35),\n",
" (4, 'Crew', 'Male', 'Child', 'No', 0),\n",
" (5, '1st', 'Female', 'Child', 'No', 0)]"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('SELECT * FROM data').fetchmany(5)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(2201,)]"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('SELECT Sum(freq) FROM data').fetchall()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(68.78125,)]"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('SELECT Avg(freq) FROM data').fetchall()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(1731,)]"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('SELECT Sum(freq) FROM data WHERE sex == \"Male\"').fetchall()"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<sqlite3.Cursor at 0x1054008f0>"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q('DROP TABLE IF EXISTS data')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='http://hilpisch.com/tpq_logo.png' width=\"300px\" align=\"right\">"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
View raw

(Sorry about that, but we can’t show files that are this big right now.)

View raw

(Sorry about that, but we can’t show files that are this big right now.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment