Skip to content

Instantly share code, notes, and snippets.

@cthoyt
Last active January 18, 2016 13:59
Show Gist options
  • Save cthoyt/bc262c07df6da9e6e345 to your computer and use it in GitHub Desktop.
Save cthoyt/bc262c07df6da9e6e345 to your computer and use it in GitHub Desktop.
Biological Databases Practical 2016 Day 2
/* How many countries exist */
select count(*) as num_countries
from country;
/* Select the first 10 countries and order them by name */
select *
from country
order by country.Name
limit 10;
/* How many countries exists per Contitent */
select country.Continent, count(country.Code)
from country
group by country.Continent;
/* What is the min,max,average and standard deviation surface area of all country. */
select min(country.SurfaceArea), max(country.SurfaceArea), avg(country.SurfaceArea), std(country.SurfaceArea)
from country;
/* How many languages exists per country */
select country.Name, count(countryLanguage.Language)
from country
join countryLanguage on (country.Code = countryLanguage.CountryCode)
group by country.Name;
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MySQL on the shell"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we start using ipython notebook as our main teaching tool we start the mysql access on the system command-line interpreter console(Linux) or CMD(Window). To remember the general structure of the database here again the Entity–relationship model\n",
"\n",
"![ER model](workbench_er_model_world.png)\n",
"\n",
"Type in the console:\n",
"```\n",
"mysql -u root -p\n",
"```\n",
"We will see\n",
"```\n",
"Enter password:\n",
"```\n",
"Enter your password and you will get the mysql prompt:\n",
"```\n",
"mysql>\n",
"```\n",
"Now we can check for all databases with\n",
"```\n",
"show databases;\n",
"```\n",
"Change to the ***world*** database\n",
"```\n",
"use world\n",
"```\n",
"Show the tables\n",
"```\n",
"show tables;\n",
"```\n",
"Show how the columns in the table city are defined\n",
"```\n",
"show columns from city;\n",
"```\n",
"Select the first 10 entries from city table\n",
"```\n",
"Select * from city limit 10;\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, that's enough! No fun to type in everything. Let's change to execute SQL directly in ipython notebook "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SQL in ipython notebook\n",
"First we have to load ipython-sql"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"ERROR: Line magic function `%sql` not found.\n"
]
}
],
"source": [
"%sql mysql+pymysql://root:YOUR_PASSWORD@localhost/world"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Count how many rows we have in the table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select count(*) from city;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Selecting on the column Name and Population"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select Name,Population from city limit 10;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Order the cities by Population and show the first 10 of them in descending order"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select Name,CountryCode,Population from city order by Population desc limit 10;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find out what is the meaning of the CountryCode **IND**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"show columns in country;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select * from country where Code = 'IND';"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select all countries starting with an **A**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select * from country where Name like 'Z%';"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select all countries ending with a **C**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select * from country where Name like '%c';"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select all countries with contains **ee**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select * from country where Name like '%ee%';"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To show the result in column per line use \\G instead of ;\n",
"***Select * from country where Code='IND'\\G***\n",
"This you can do only on console"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show sum, minimum, maximum, average, standard deviation of Population over all cities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select min(Population),max(Population),avg(Population), std(Population) from city;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What's the name of the city with the smallest population"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select Name from city where Population = (Select min(Population) from city);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show how big is the population and how many cities for each country (for the listed cities), order by number of cities per country and show only the first 10 in descending order"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select CountryCode,sum(Population),count(*) from city group by CountryCode order by count(*) desc limit 10;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Joining tables\n",
"\n",
"Now we join the information of the city table and the country table. But let's how these both are realted.\n",
"![ER model](images/workbench_er_model_world2.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ... with where"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"%%sql\n",
"Select * from city, country where city.CountryCode=country.Code limit 2;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ... with the key word: join"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select * from city join country on (city.CountryCode=country.Code) limit 2;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Complex queries with joins\n",
"Joins show now the power of relational databases\n",
"\n",
"Our question now is: What is the percentage distribution of the cities in relation to their country"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"Select \n",
" city.Name as City, \n",
" country.Name as Country, \n",
" city.Population cityPopulation,\n",
" country.Population countryPopulation,\n",
" city.Population/country.Population*100 as percentage_of_whole_country\n",
"from \n",
" city join country \n",
" on (city.CountryCode=country.Code)\n",
" order by percentage_of_whole_country desc\n",
" limit 10;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, this is very strange: Singapore city have more people than the country\n",
"\n",
"Let's check what wikipedia says: https://en.wikipedia.org/wiki/Singapore\n",
"\n",
"-> 5535000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercises\n",
"* How many countries exists\n",
"* Select the first 10 countries and order them by name\n",
"* How many countries exists per Contitent\n",
"* What is the min,max,average and standard deviation surface area of all country.\n",
"* How many languages exists per country"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
@cthoyt
Copy link
Author

cthoyt commented Jan 16, 2016

alternatively the second query could be construed as select 10, then order. Depending on the use case, this statement could be written as

select * 
from (select * from country limit 10)
order by country.Name;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment