kclejeune/MySQL.md

## MySQL.md

      
    Raw
  

              MySQL.md
            
          
    A Severely Mediocre Crash Course in MySQL

So you're in the WCA statistics group, and you want to learn how to make the nifty tables of mystery stats?
Here are the tools you need to make it happen. If anything here doesn't work, google it or ask someone more qualified than myself.
Installing MySQL - macOS

I'd suggest using Homebrew if you don't already.  To check if you have it and install accordingly, run:
if [ ! -e /usr/local/bin/brew ]; then 
    /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
fi
Once you've done that, install MySQL and start the service like so:
brew install mysql
brew services start mysql
Installing MySQL - Windows

Download Ubuntu 20.04 from the Microsoft store. Then, open it and use the instructions for linux.
Installing MySQL - Linux

If you use linux, you probably know your way around a package manager already. Install mysql with something like this;
sudo apt update && sudo apt install mysql-server
To start the service, do something like:
sudo service mysql start
Configuring the database

Open a terminal or command prompt and run
sudo mysql_secure_installation
and follow the instructions to change the root password. Once you've done that, run:
mysql -u root -p
if you're on mac, or
sudo mysql
on linux or WSL.
It might prompt for the root password that you just changed.  If you'd rather skip this step, on UNIX systems the default root password is usually blank, so just hit enter.
You should now see something like this, with MySQL or MariaDB (they are the same for all intents and purposes):
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.22 Homebrew

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

To create the database (imaginatively named wca) and set it as your working database, run:
CREATE DATABASE wca;
USE wca;
SOURCE [filepath to WCA_export.sql];
Type 'exit' or ctrl+d to get out of this shell and you're done.
Importing (or updating) the Database

Updating the database is fairly straightforward. First, download the new export at https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip and extract WCA_export.sql. You can do this with
curl -# -o wcaExport.sql.zip https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip && unzip wcaExport.sql.zip WCA_export.sql && rm wcaExport.sql.zip
To import the sql dump, run:
mysql -u root -p wca < [path to WCA_export.sql]
on mac, or
sudo mysql wca < [path to WCA_export.sql]
and enter your password.
Alternatively, login to MySQL like before:
mysql -u root -p wca
Then, update the database source with:
SOURCE [path to WCA_export.sql];
I've included the script I use to automate this process below in wcaUpdateScript.sh. It's good to do it manually once to understand what's going on, but I'd recommend using my script (or your own, if you have different preferences) in the future to standardize the process.
Querying the Database (a crash course in SQL)

Understanding Database Structure

Login to MySQL and use the WCA database like so:
mysql -u root -p wca
To see what we're working with, try:
SHOW TABLES;
You'll see something like this:
mysql> SHOW TABLES;
+-----------------------------------------+
| Tables_in_wca                           |
+-----------------------------------------+
| Competitions                            |
| Continents                              |
| Countries                               |
| Events                                  |
| Formats                                 |
| Persons                                 |
| RanksAverage                            |
| RanksSingle                             |
| Results                                 |
| RoundTypes                              |
| Rounds                                  |
| Scrambles                               |
| championships                           |
| eligible_country_iso2s_for_championship |
+-----------------------------------------+
Let's take a look inside of the Results table (probably the one you'll use most frequently) using the DESCRIBE command:
mysql> DESCRIBE Results;
+-----------------------+-------------+------+-----+---------+-------+
| Field                 | Type        | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+-------+
| competitionId         | varchar(32) | NO   |     |         |       |
| eventId               | varchar(6)  | NO   |     |         |       |
| roundTypeId           | char(1)     | NO   |     |         |       |
| pos                   | smallint(6) | NO   |     | 0       |       |
| best                  | int(11)     | NO   |     | 0       |       |
| average               | int(11)     | NO   |     | 0       |       |
| personName            | varchar(80) | YES  |     | NULL    |       |
| personId              | varchar(10) | NO   |     |         |       |
| personCountryId       | varchar(50) | YES  |     | NULL    |       |
| formatId              | char(1)     | NO   |     |         |       |
| value1                | int(11)     | NO   |     | 0       |       |
| value2                | int(11)     | NO   |     | 0       |       |
| value3                | int(11)     | NO   |     | 0       |       |
| value4                | int(11)     | NO   |     | 0       |       |
| value5                | int(11)     | NO   |     | 0       |       |
| regionalSingleRecord  | char(3)     | YES  |     | NULL    |       |
| regionalAverageRecord | char(3)     | YES  |     | NULL    |       |
+-----------------------+-------------+------+-----+---------+-------+
We can see each table is like a page of a spreadsheet, where the fields are the columns. The entire database is like a collection of these pages. We can see values to store a person's name, ID, average, position, times, and more for any given round of an event.
Basic Query Structure

SQL queries typically follow a structure something like this:
SELECT [attribute], [another attribute], [...] 
FROM [table] 
WHERE [condition] 
AND [another condition] 
AND [...]
GROUP BY [attribute to group by for aggregate queries]
ORDER BY [column you're sorting] ASC, [potentially more sorting criteria via same format];
For example, if you wanted to get the top 10 'global averages' (average of all official averages)
for an event (we'll say 3x3), you might run something like this:
mysql> SELECT personId, personName, AVG(average)/100 AS globalAverage
    -> FROM Results
    -> WHERE eventId = '333' AND average > 0
    -> GROUP BY personId, personName
    -> ORDER BY globalAverage ASC
    -> LIMIT 10;
+------------+-----------------------------+---------------+
| personId   | personName                  | globalAverage |
+------------+-----------------------------+---------------+
| 2009ZEMD01 | Feliks Zemdegs              |    7.33307463 |
| 2012PARK03 | Max Park                    |    7.64177122 |
| 2013NAHM01 | Seung Hyuk Nahm (남승혁)    |    7.86930693 |
| 2012PONC02 | Patrick Ponce               |    8.12995851 |
| 2010WEYE01 | Philipp Weyer               |    8.22181269 |
| 2017VILL41 | Sean Patrick Villanueva     |    8.33681818 |
| 2010WEYE02 | Sebastian Weyer             |    8.35223926 |
| 2015MANS03 | Kian Mansour                |    8.42538462 |
| 2017SIAU02 | Max Siauw                   |    8.46956250 |
| 2018CHAN50 | Heidi Chan                  |    8.47250000 |
+------------+-----------------------------+---------------+
10 rows in set (2.01 sec)
The query returns a new table, with the selected attributes and a single tuple for each of the GROUP BY attributes:
It's worth noting that GROUP BY in this context denotes the grouping of an 'aggregate query' (a computational operation across a set of tuples; think SUM, AVG, COUNT, MAX, etc). That means that it'll compute this result for each distinct tuple specified by your attribute.  In this case, it's the difference between getting the average for every person as opposed to getting the average of all times ever recorded in the WCA.
You can use as many or as few of these conditions as you'd like, as long as you're selecting information. For example, the same query without ordering looks like this:
mysql> SELECT personName, AVG(average) / 100 AS globalAverage
    -> FROM Results
    -> WHERE eventId = '333'
    ->   AND average > 0
    -> GROUP BY personName
    -> LIMIT 10;
+-------------------------+---------------+
| personName              | globalAverage |
+-------------------------+---------------+
| Etienne Amany           |   21.90750000 |
| Thomas Rouault          |   22.33272727 |
| Antoine Simon-Chautemps |   21.69850000 |
| Irène Mallordy          |   19.07416667 |
| Marlène Desmaisons      |   28.08444444 |
| Ton Dennenbroek         |   20.21324405 |
| Arnaud van Galen        |   22.45707819 |
| Cyrille Cornu           |   30.45000000 |
| Christophe Woittequand  |   24.03273973 |
| Georges Poinsot         |   30.95166667 |
+-------------------------+---------------+
10 rows in set (1.72 sec)
We can see that instead of sorting by average, the columns are in the order that they were checked. In this case, it's based on personName in alphabetical order.
If you want to filter on an aggregated value, you must do so after the aggregation is completed using the HAVING command. For example, if you wanted to get all people with a sub 8 global average, you might try something like this:
mysql> SELECT personName    
    -> FROM Results
    -> WHERE eventId = '333'    
    ->   AND average > 0    
    ->   AND AVG(average) / 100 < 8
    -> GROUP BY personName;
ERROR 1111 (HY000): Invalid use of group function
This is because we can't perform an aggregation before we actually have a group on which to aggregate. Instead, we do:
mysql> SELECT personName
    -> FROM Results
    -> WHERE eventId = '333'
    ->   AND average > 0
    -> GROUP BY personName
    -> HAVING AVG(average) / 100 < 8;
+-----------------------------+
| personName                  |
+-----------------------------+
| Feliks Zemdegs              |
| Max Park                    |
| Seung Hyuk Nahm (남승혁)    |
+-----------------------------+
3 rows in set (1.77 sec)
Using Joins

Joins are how we can construct mappings between related tables in a database. Related tables are connected by foreign keys that specify which tuples are connected between tables. For example, let's try and get the number of competitors on each continent, using these tables:
mysql> DESCRIBE Persons;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| id        | varchar(10) | NO   |     |         |       |
| subid     | tinyint     | NO   |     | 1       |       |
| name      | varchar(80) | YES  |     | NULL    |       |
| countryId | varchar(50) | NO   |     |         |       |
| gender    | char(1)     | YES  |     |         |       |
+-----------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> DESCRIBE Countries;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id          | varchar(50) | NO   |     |         |       |
| name        | varchar(50) | NO   |     |         |       |
| continentId | varchar(50) | NO   |     |         |       |
| iso2        | char(2)     | YES  |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> DESCRIBE Continents;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id         | varchar(50) | NO   |     |         |       |
| name       | varchar(50) | NO   |     |         |       |
| recordName | char(3)     | NO   |     |         |       |
| latitude   | int         | NO   |     | 0       |       |
| longitude  | int         | NO   |     | 0       |       |
| zoom       | tinyint     | NO   |     | 0       |       |
+------------+-------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
Intuitively, we want to find all people who have a country id that falls within a specific continent, and we want to count the number of people in this group for each continent, and then sort them by population for readability. Then, we can write:
mysql> SELECT continent.name, COUNT(person.id) as totalCompetitors
    -> FROM Persons person
    ->          INNER JOIN Countries country on person.countryId = country.id
    ->          INNER JOIN Continents continent on continent.id = country.continentId
    -> GROUP BY continent.name
    -> ORDER BY totalCompetitors DESC;
+---------------+------------------+
| name          | totalCompetitors |
+---------------+------------------+
| Asia          |            56997 |
| North America |            36280 |
| Europe        |            34224 |
| South America |            14308 |
| Oceania       |             3554 |
| Africa        |             1927 |
+---------------+------------------+
6 rows in set (0.33 sec)
INNER JOIN selects only the tuples (rows) where all of the conditions are met. In this case, each continent tuple must contain all of the countries within it and all competitors within those countries. It can be useful to visualize this with a venn diagram.  is particularly useful.
This is all you need to get started writing your own queries! If you want to get into more complicated queries, you can read the MySQL documentation at https://dev.mysql.com/doc/ for a pretty comprehensive guide on syntax or features that I didn't explain here.

  
## wcaUpdateScript.sh
#! /bin/bash
# always remove lingering zip archives before we proceed
if [[ -e wcaExport.sql.zip ]]; then
    rm wcaExport.sql.zip
fi

# remove and redownload the file if the current export is more than 1 day old
if [[ $(find "WCA_export.sql" -mtime +1 -print) ]]; then
    rm WCA_Export.sql
    echo "Downloading Current WCA Export"
    curl -# -o wcaExport.sql.zip https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip
    # extract the sql dump
    unzip wcaExport.sql.zip WCA_export.sql && rm wcaExport.sql.zip
fi

# specify a .sqlPass with root user password and nothing else, otherwise prompt for the password
if [[ -e .sqlPass ]]; then
    pass=$(cat .sqlPass)
else
    read -s -p "MySQL Password: " pass
fi

echo "Importing Database"
# import sql dump
mysql -u root --password=$pass wca < WCA_export.sql

echo "Database import complete. Removing archived export files."

# clean up files, leave sql dump file until the next update
rm wcaExport.sql.zip
clear
echo Updated $(date) >> .updateLog
	#! /bin/bash
	# always remove lingering zip archives before we proceed
	if [[ -e wcaExport.sql.zip ]]; then
	rm wcaExport.sql.zip
	fi

	# remove and redownload the file if the current export is more than 1 day old
	if [[ $(find "WCA_export.sql" -mtime +1 -print) ]]; then
	rm WCA_Export.sql
	echo "Downloading Current WCA Export"
	curl -# -o wcaExport.sql.zip https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip
	# extract the sql dump
	unzip wcaExport.sql.zip WCA_export.sql && rm wcaExport.sql.zip
	fi

	# specify a .sqlPass with root user password and nothing else, otherwise prompt for the password
	if [[ -e .sqlPass ]]; then
	pass=$(cat .sqlPass)
	else
	read -s -p "MySQL Password: " pass
	fi

	echo "Importing Database"
	# import sql dump
	mysql -u root --password=$pass wca < WCA_export.sql

	echo "Database import complete. Removing archived export files."

	# clean up files, leave sql dump file until the next update
	rm wcaExport.sql.zip
	clear
	echo Updated $(date) >> .updateLog