Created
January 20, 2021 13:08
-
-
Save jonalv/6305c13c491b65bf4d8e6cb709a41f18 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
\documentclass[12pt, oneside]{memoir} | |
\usepackage{tikz} | |
\usetikzlibrary{intersections} | |
\usetikzlibrary{calc} | |
\usetikzlibrary{er} | |
\usepackage{letltxmacro} | |
\usepackage[utf8]{inputenc} | |
\usepackage[cmintegrals,cmbraces]{newtxmath} | |
\usepackage[lining]{ebgaramond} | |
\usepackage{ebgaramond-maths} | |
\usepackage[T1]{fontenc} | |
\usepackage{nicefrac} | |
\usepackage{microtype} | |
\usepackage{icomma} | |
\usepackage{amsmath} | |
\usepackage{mathtools} | |
\usepackage{pgfplots} | |
\usepackage{keystroke} | |
\usepackage{alltt} | |
\usepackage{cmap} | |
\usepackage{textcomp} | |
\usepackage{wasysym} | |
\usepackage{fancyvrb} | |
\usepackage{datetime} | |
\usepackage[textsize=subscriptsize, linecolor=magenta, bordercolor=magenta, | |
backgroundcolor=magenta, textwidth=7cm]{todonotes} | |
\makeatletter | |
\renewcommand{\@todonotes@drawMarginNoteWithLine}{% | |
\begin{tikzpicture}[remember picture, overlay, baseline=-0.75ex]% | |
\node [coordinate] (inText) {};% | |
\end{tikzpicture}% | |
\marginpar[{% Draw note in left margin | |
\@todonotes@drawMarginNote{r}% | |
\@todonotes@drawLineToLeftMargin% | |
}]{% Draw note in right margin | |
\@todonotes@drawMarginNote{l}% | |
\@todonotes@drawLineToRightMargin% | |
}% | |
} | |
\renewcommand{\@todonotes@drawMarginNote}[1]{ | |
\makebox[\marginparwidth][#1]{\begin{tikzpicture}[remember picture,baseline=(X.base)]% | |
\node(X){\vphantom{X}};% | |
\draw node[notestyle,font=\@todonotes@sizecommand,anchor=north, text width=2.5cm] (inNote) at (X.north)% | |
{\@todonotes@text};% | |
\if@todonotes@authorgiven% | |
\draw node[notestyle,font=\@todonotes@sizecommand,anchor=north, text width=2.5cm] (inNote) at (X.north)% | |
{\@todonotes@sizecommand\@todonotes@author};% | |
\node(Y)[below=of X]{};% | |
\draw node[notestyle,font=\@todonotes@sizecommand,anchor=north, text width=2.5cm] (inNote) at (X.south)% | |
{\@todonotes@text};% | |
\else% | |
\draw node[notestyle,font=\@todonotes@sizecommand,anchor=north, text width=2.5cm] (inNote) at (X.north)% | |
{\@todonotes@text};% | |
\fi% | |
\end{tikzpicture}% | |
}} | |
\makeatother | |
\LetLtxMacro{\oldtodo}{\todo} | |
\renewcommand{\todo}[1]{{\color{white}\oldtodo{\textsf{#1}}}} | |
\renewcommand*{\sideparfont}{\footnotesize\itshape} | |
\sideparmargin{outer} | |
\renewcommand*{\sideparform}{\raggedright} | |
\begin{document} | |
\sidepar{\miniscule \textsf{Version: \today, \currenttime}} | |
\vskip 1 \baselineskip | |
{\let\clearpage\relax\chapter*{Databases exercise 1} | |
In this exercise you will create a simple SQLite database, populate it with some | |
data and execute some select statements.\sidepar{ | |
When writing on the command line you should always hit | |
\Enter to execute, so after this page I will not remind you.} | |
\section*{Starting the SQLite command line client} | |
Open Jupyter Lab and then start a terminal by: \texttt{File} $\rightarrow$ | |
\texttt{New} $\rightarrow$ \texttt{Terminal}. You might want to create a | |
separate directory for this exercise and go into it. If so, write: (don't write | |
the \$ symbol, it is already there) | |
\begin{alltt} | |
\$ mkdir sqlite \Enter \vspace{2pt} | |
\$ cd sqlite \Enter | |
\end{alltt} | |
Next start SQLite using a database file names \texttt{work.db}:\sidepar{We will be using SQLite version 3} | |
\begin{alltt} | |
\$ sqlite3 work.db \Enter | |
\end{alltt}% | |
When doing SQL it is very helpful to have a text editor | |
open to write things in first and then copy from and paste into the terminal. | |
Do \texttt{File} $\rightarrow$ \texttt{New} $\rightarrow$ \texttt{Text File} | |
and drag that tab next to your terminal tab to get something looking a bit like this: | |
\sidepar{This is the \texttt{Settings}$ \rightarrow$ \\ \texttt{Jupyter Lab Theme}$ \rightarrow$ \\ \texttt{Jupyter Lab Dark} theme.} | |
\enlargethispage{3\baselineskip} | |
\noindent | |
%\includegraphics[width=1\textwidth]{images/texteditor.png} | |
Next we will create tables in our new database but first a few words about | |
SQLite. SQLite is not very strict with data types. Normally in SQL there are a | |
few standard data types. SQLite will accept them and transfer them into it's | |
own datatypes. For example most SQL languages have a multitude of | |
\texttt{Integer} versions of different size. This because in some cases using a | |
smaller datatype can mean that the database becomes significantly smaller. | |
SQLite maps all of them to it's own \texttt{Integer} datatype. In this exercise | |
I will use the more standard SQL datatypes meaning that if you find yourself | |
with another database system than SQLite it should still look pretty familiar. | |
\section*{Creating tables}\label{createTableSection} | |
We are going to create the tables from the lecture. It will be a bit to | |
write but take it slowly and step by step. We start by creating the | |
\texttt{Person} table. A \texttt{Person} has a name, an address and a birth | |
date. We also give each table a column \texttt{id} so that we can easily refer | |
to a specific row in the table. | |
\Needspace{8\baselineskip} | |
\sidepar{also the extra white spaces (indentations) are not important, I have | |
just added them to make it look good} | |
We need to write: (Don't write the \texttt{...>} symbols, | |
SQLite adds them for you when you make a newline,) | |
\begin{alltt} | |
sqlite> CREATE TABLE Person ( | |
...> id INTEGER PRIMARY KEY AUTOINCREMENT, | |
...> name VARCHAR(128), | |
...> address VARCHAR(256), | |
...> birthdate DATE | |
...> ); | |
\end{alltt}\label{CreateTableStatement} | |
\sidepar{this works differently on different database systems, in fact the | |
\texttt{AUTOINCREMENT} keyword is not strictly needed in SQL\kern1ptite but it is | |
supported}The \texttt{AUTOINCREMENT} command means that whenever a new person | |
is created it will get a new increasing number. | |
Next, a \texttt{Location} has a name and is situated in a town. Create the | |
\texttt{Location} table: | |
\begin{alltt} | |
sqlite> CREATE TABLE Location ( | |
...> id INTEGER PRIMARY KEY AUTOINCREMENT, | |
...> name VARCHAR(128), | |
...> town VARCHAR(128) | |
...> ); | |
\end{alltt} | |
Finally a \texttt{Job} ties it together. The \texttt{Job} table specifies which | |
\texttt{Person} has what title and works at which \texttt{Location}. Create the | |
\texttt{Job} table: | |
\Needspace{10\baselineskip} | |
\begin{alltt} | |
sqlite> CREATE TABLE Job ( | |
...> id INTEGER PRIMARY KEY AUTOINCREMENT, | |
...> title VARCHAR(128), | |
...> Location_id int, | |
...> Person_id int, | |
...> | |
...> FOREIGN KEY (Location_id) REFERENCES Location(id), | |
...> FOREIGN KEY (Person_id) REFERENCES Person(id) | |
...> ); | |
\end{alltt} | |
Now, the \texttt{Job} table has two \texttt{FOREIGN KEY} specifications. They | |
enforce that we can't create a \texttt{Job} that references a \texttt{Location} | |
or \texttt{Person} that doesn't exist yet. As we will see later on. | |
\section*{Populating tables} | |
\sidepar{Tired of writing? These lines are without `\texttt{...>}' so you should be | |
able to copy from the pdf.}Next we will add some data to our | |
tables. Remember that the command for adding rows to a table is called | |
\texttt{INSERT} and that we must specify which fields we want to add data to. | |
In fact, any field that we don't specify will be set to \texttt{NULL}, | |
\textit{i.e.}, set to nothing, for that entry. First let's create some rows in | |
the \texttt{Person} table: | |
\begin{alltt} | |
INSERT INTO Person(name, address, birthdate) | |
VALUES (\texttt{\textquotesingle}Mary Smith\texttt{\textquotesingle}, \texttt{\textquotesingle}Main street 1\texttt{\textquotesingle}, \texttt{\textquotesingle}1970-04-17\texttt{\textquotesingle}); | |
INSERT INTO Person(name, address, birthdate) | |
VALUES (\texttt{\textquotesingle}John Doe\texttt{\textquotesingle}, \texttt{\textquotesingle}Highway 2\texttt{\textquotesingle}, \texttt{\textquotesingle}1972-07-24\texttt{\textquotesingle}); | |
INSERT INTO Person(name, address, birthdate) | |
VALUES (\texttt{\textquotesingle}Clara Doe\texttt{\textquotesingle}, \texttt{\textquotesingle}Highway 2\texttt{\textquotesingle}, \texttt{\textquotesingle}1995-11-11\texttt{\textquotesingle}); | |
INSERT INTO Person(name, address, birthdate) | |
VALUES (\texttt{\textquotesingle}Anna Jones\texttt{\textquotesingle}, \texttt{\textquotesingle}Midway 34\texttt{\textquotesingle}, \texttt{\textquotesingle}1974-02-03\texttt{\textquotesingle}); | |
\end{alltt} | |
While we are at it let's create the two example rows into the \texttt{Location} | |
table as well: | |
\begin{alltt} | |
INSERT INTO Location(name, town) | |
VALUES (\texttt{\textquotesingle}Site A\texttt{\textquotesingle}, \texttt{\textquotesingle}New York\texttt{\textquotesingle}); | |
INSERT INTO Location(name, town) | |
VALUES (\texttt{\textquotesingle}Site B\texttt{\textquotesingle}, \texttt{\textquotesingle}London\texttt{\textquotesingle}); | |
\end{alltt} | |
All that remains now is the \texttt{Job} table. Remember that it has references | |
to rows in the other tables. In a sense it connects the other two tables. In | |
order to create an entry in the \texttt{Job} table we need to references a | |
\texttt{Person} and a \texttt{Location}. We do this reference by their id | |
field. What id's for \texttt{Location} and \texttt{Person} are are in the | |
database then? An easy way to figure this out is to run: | |
\Needspace{8\baselineskip} | |
\begin{alltt} | |
sqlite> SELECT * FROM Person; | |
{ \fontfamily{pcr}\selectfont | |
1|Mary Smith|Main street 1|1970-04-17 | |
2|John Doe|Highway 2|1972-07-24 | |
3|Clara Doe|Highway 2|1995-11-11 | |
4|Anna Jones|Midway 34|1974-02-03} | |
\end{alltt} | |
However, it would be nice to see the name of the headers, and we can configure | |
SQLite to also print the columns in a nicer way. Run: | |
\begin{alltt} | |
sqlite> .headers on | |
sqlite> .mode column | |
\end{alltt} | |
and the same command as before will look like this instead: | |
\begin{alltt} | |
sqlite> SELECT * FROM Person; | |
{ \fontfamily{pcr}\selectfont | |
id name address birthdate | |
---------- ---------- ------------- ---------- | |
1 Mary Smith Main street 1 1970-04-17 | |
2 John Doe Highway 2 1972-07-24 | |
3 Clara Doe Highway 2 1995-11-11 | |
4 Anna Jones Midway 34 1974-02-03} | |
\end{alltt} | |
and let's also list the locations: | |
\begin{alltt} | |
sqlite> SELECT * FROM Location; | |
{ \fontfamily{pcr}\selectfont | |
id name town | |
---------- ---------- ---------- | |
1 Site A New York | |
2 Site B London} | |
\end{alltt} | |
As you can see from these tables every time a new entry has been inserted a new | |
id number has been given to it. Now, if we try to insert a row into the | |
\texttt{Job} table that references a \texttt{Person} by an id that doesn't yet | |
exist I have said that it shouldn't work. That is not entirely true because SQLite only honors foreign key relations if that feature is activated. Let's try it out by first activating it: | |
\begin{alltt} | |
sqlite> PRAGMA foreign_keys = ON; | |
\end{alltt} | |
and then try: | |
\begin{alltt} | |
sqlite> INSERT INTO Job(title, Location_id, Person_id) | |
...> VALUES ('CEO', 1, 5); | |
\end{alltt} | |
%\sidepar{You should be able to copy and paste these inserts from the pdf | |
%too.} | |
What did you get? At least now we can't create a job for a | |
non-existing \texttt{Person}. That is good to know. | |
\Needspace{8\baselineskip} | |
Now, let's add some rows to the \texttt{Jobs} table as well so that we have | |
something to work with: | |
\begin{alltt} | |
INSERT INTO Job(title, Location_id, Person_id) | |
VALUES (\texttt{\textquotesingle}Secretary\texttt{\textquotesingle}, 1, 1); | |
INSERT INTO Job(title, Location_id, Person_id) | |
VALUES (\texttt{\textquotesingle}Secretary\texttt{\textquotesingle}, 2, 2); | |
INSERT INTO Job(title, Location_id, Person_id) | |
VALUES (\texttt{\textquotesingle}Intern\texttt{\textquotesingle}, 1, 3); | |
INSERT INTO Job(title, Location_id, Person_id) | |
VALUES (\texttt{\textquotesingle}Administrator\texttt{\textquotesingle}, 2, 4); | |
INSERT INTO Job(title, Location_id, Person_id) | |
VALUES (\texttt{\textquotesingle}Fill-in\texttt{\textquotesingle}, 1, 4); | |
\end{alltt} | |
\needspace{12\baselineskip} | |
\section*{Working with our data} | |
\sidepar{``Do I really need to write all this code?'' --- Yes, and no\dots\ I | |
think that you actually learn by trying it out and doing it yourself though. | |
Also, if you think of something else that you want to try, Go Right | |
Ahead! \smiley{}}Finally, we now have a populated database. Maybe just a small example | |
with only a few rows in each table but at least enough for us to try some | |
things on. First let's try the \texttt{UPDATE} command. We can pretend that it | |
turns out that 'Anna Jones' has moved so we need to update her address: | |
\begin{alltt} | |
sqlite> UPDATE Person | |
...> SET address = 'Upper street 12' | |
...> WHERE name = 'Anna Jones'; | |
\end{alltt} | |
We should also try the \texttt{DELETE} command. Let's say that it turns out | |
that this same 'Anna Jones' no longer fills in at the New York office. Then we | |
should delete that entry from the \texttt{Job} table: | |
\Needspace{8\baselineskip} | |
\begin{alltt} | |
sqlite> DELETE FROM Job | |
...> WHERE title='Fill-in'; | |
\end{alltt} | |
Notice that this command removes all fill-ins. In this case only one exists though. | |
\Needspace{8\baselineskip} | |
These two commands didn't really show what the data looked like after they were | |
executed. If we want to do that we need a \texttt{SELECT} statement again. We | |
want to check the address and the jobs of\kern 4pt `Anna Jones' to see that our changes | |
took effect. Remembering what our database looks like: \sidepar{You might also | |
want to have a look at the \texttt{CREATE TABLE} statements on | |
page~\pageref{CreateTableStatement}} | |
\begin{center} | |
\begin{tikzpicture} | |
\footnotesize | |
\node[entity] (person) at (0,0) {Person} | |
child {node [key attribute] {id}} | |
child {node {address}} | |
child {node {name}} | |
child {node {birth date}}; | |
\node[entity] (job) at (3,1) {Job} | |
child {node [key attribute] {id}} | |
child {node {title}}; | |
\node[relationship] (has) at ($(person)!0.5!(job)$) {has} | |
edge (person) | |
edge (job); | |
\node[entity] (location) at (6,0) {Location} | |
child {node [key attribute] {id}} | |
child {node {name}} | |
child {node {town}}; | |
\node[relationship] (has2) at ($(location)!0.5!(job)$) {has} | |
edge (location) | |
edge (job); | |
\end{tikzpicture} | |
\end{center} | |
\noindent | |
we know that we need to join the \texttt{Person} and \texttt{Job} table and | |
that we want the person's name to be \texttt{'Anna Jones'}. Try to write the | |
query yourself before looking at the solution on next page: | |
\newpage | |
\begin{alltt} | |
sqlite> SELECT p.name, p.address, j.title | |
...> FROM Person p, Job j | |
...> WHERE p.name='Anna Jones' | |
...> AND p.id = j.Person_id; | |
{ \fontfamily{pcr}\selectfont | |
name address title | |
---------- --------------- ------------- | |
Anna Jones Upper street 12 Administrator} | |
\end{alltt} | |
We see that our \texttt{UPDATE} of the address had effect and that the | |
\texttt{Job} entry for 'Fill-in' has been removed. | |
Our database is quite small so an \texttt{INDEX} will not make a difference | |
that we can measure but if the database were to grow large it might be useful | |
with and \texttt{INDEX} on the name of \texttt{Person}. Let's add one: | |
\begin{alltt} | |
sqlite> CREATE INDEX name_index ON Person(name); | |
\end{alltt} | |
\newpage | |
\section*{Exercises} | |
Now it's time for you to try to write your own queries. Here are a few | |
exercises for you to work on. We will go through the solutions tomorrow. | |
Tomorrow we will also look at the ChEMBL database which is a database | |
containing chemical data. | |
\begin{itemize} | |
\item [Exercise 1:] Write a \texttt{SELECT} statement that retrieves title, | |
name and address for every person in the database. You will need to join the | |
Person table with the Job table to do that. | |
\item [Exercise 2:] Do you remember the flat file example from the lecture? It | |
looked a bit like this in the lecture: | |
\sidepar{The column headers used in the flat file example is not the same as in | |
the database. If you want to change them, try using Google to find out how to do it.} | |
\tiny | |
\begin{Verbatim}[frame=single] | |
ID NAME HOME ADDRESS BIRTHDATE TITLE OFFICE SITE OFFICE TOWN | |
1 Mary Smith Main street 1 1970-04-17 Secretary Site B London | |
2 John Doe Highway 2 1972-07-24 Secretary Site A New York | |
3 Clara Doe Highway 2 1995-11-11 Intern Site A New York | |
4 Anna Jones Midway 34 1974-02-03 Administrator Site B London | |
5 Anna Jones Midway 34 1974-02-03 Fill-in Site A New York | |
\end{Verbatim} | |
\normalsize | |
Write a \texttt{SELECT} statement that outputs those fields. The output won't | |
be identical, for example remember that we removed the `Fill-in' \texttt{Job} | |
and changed the address of `Anna Jones'. This time you will need to join three | |
tables and list quite a few fields to be included. | |
\item [Exercise 3:] Write a \texttt{SELECT} statement for name, address and | |
title of all the persons working at the New York office. | |
\end{itemize} | |
\end{document} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment