So you're in the WCA statistics group, and you want to learn how to make the nifty tables of mystery stats? Here are the tools you need to make it happen. If anything here doesn't work, google it or ask someone more qualified than myself.
I'd suggest using Homebrew if you don't already. To check if you have it and install accordingly, run:
if [ ! -e /usr/local/bin/brew ]; then
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
fi
Once you've done that, install MySQL and start the service like so:
brew install mysql
brew services start mysql
Download Ubuntu 20.04 from the Microsoft store. Then, open it and use the instructions for linux.
If you use linux, you probably know your way around a package manager already. Install mysql with something like this;
sudo apt update && sudo apt install mysql-server
To start the service, do something like:
sudo service mysql start
Open a terminal or command prompt and run
sudo mysql_secure_installation
and follow the instructions to change the root password. Once you've done that, run:
mysql -u root -p
if you're on mac, or
sudo mysql
on linux or WSL.
It might prompt for the root password that you just changed. If you'd rather skip this step, on UNIX systems the default root password is usually blank, so just hit enter.
You should now see something like this, with MySQL or MariaDB (they are the same for all intents and purposes):
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.22 Homebrew
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
To create the database (imaginatively named wca) and set it as your working database, run:
CREATE DATABASE wca;
USE wca;
SOURCE [filepath to WCA_export.sql];
Type 'exit' or ctrl+d to get out of this shell and you're done.
Updating the database is fairly straightforward. First, download the new export at https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip and extract WCA_export.sql. You can do this with
curl -# -o wcaExport.sql.zip https://www.worldcubeassociation.org/results/misc/WCA_export.sql.zip && unzip wcaExport.sql.zip WCA_export.sql && rm wcaExport.sql.zip
To import the sql dump, run:
mysql -u root -p wca < [path to WCA_export.sql]
on mac, or
sudo mysql wca < [path to WCA_export.sql]
and enter your password.
Alternatively, login to MySQL like before:
mysql -u root -p wca
Then, update the database source with:
SOURCE [path to WCA_export.sql];
I've included the script I use to automate this process below in wcaUpdateScript.sh
. It's good to do it manually once to understand what's going on, but I'd recommend using my script (or your own, if you have different preferences) in the future to standardize the process.
Login to MySQL and use the WCA database like so:
mysql -u root -p wca
To see what we're working with, try:
SHOW TABLES;
You'll see something like this:
mysql> SHOW TABLES;
+-----------------------------------------+
| Tables_in_wca |
+-----------------------------------------+
| Competitions |
| Continents |
| Countries |
| Events |
| Formats |
| Persons |
| RanksAverage |
| RanksSingle |
| Results |
| RoundTypes |
| Rounds |
| Scrambles |
| championships |
| eligible_country_iso2s_for_championship |
+-----------------------------------------+
Let's take a look inside of the Results table (probably the one you'll use most frequently) using the DESCRIBE
command:
mysql> DESCRIBE Results;
+-----------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+-------+
| competitionId | varchar(32) | NO | | | |
| eventId | varchar(6) | NO | | | |
| roundTypeId | char(1) | NO | | | |
| pos | smallint(6) | NO | | 0 | |
| best | int(11) | NO | | 0 | |
| average | int(11) | NO | | 0 | |
| personName | varchar(80) | YES | | NULL | |
| personId | varchar(10) | NO | | | |
| personCountryId | varchar(50) | YES | | NULL | |
| formatId | char(1) | NO | | | |
| value1 | int(11) | NO | | 0 | |
| value2 | int(11) | NO | | 0 | |
| value3 | int(11) | NO | | 0 | |
| value4 | int(11) | NO | | 0 | |
| value5 | int(11) | NO | | 0 | |
| regionalSingleRecord | char(3) | YES | | NULL | |
| regionalAverageRecord | char(3) | YES | | NULL | |
+-----------------------+-------------+------+-----+---------+-------+
We can see each table is like a page of a spreadsheet, where the fields are the columns. The entire database is like a collection of these pages. We can see values to store a person's name, ID, average, position, times, and more for any given round of an event.
SQL queries typically follow a structure something like this:
SELECT [attribute], [another attribute], [...]
FROM [table]
WHERE [condition]
AND [another condition]
AND [...]
GROUP BY [attribute to group by for aggregate queries]
ORDER BY [column you're sorting] ASC, [potentially more sorting criteria via same format];
For example, if you wanted to get the top 10 'global averages' (average of all official averages) for an event (we'll say 3x3), you might run something like this:
mysql> SELECT personId, personName, AVG(average)/100 AS globalAverage
-> FROM Results
-> WHERE eventId = '333' AND average > 0
-> GROUP BY personId, personName
-> ORDER BY globalAverage ASC
-> LIMIT 10;
+------------+-----------------------------+---------------+
| personId | personName | globalAverage |
+------------+-----------------------------+---------------+
| 2009ZEMD01 | Feliks Zemdegs | 7.33307463 |
| 2012PARK03 | Max Park | 7.64177122 |
| 2013NAHM01 | Seung Hyuk Nahm (남승혁) | 7.86930693 |
| 2012PONC02 | Patrick Ponce | 8.12995851 |
| 2010WEYE01 | Philipp Weyer | 8.22181269 |
| 2017VILL41 | Sean Patrick Villanueva | 8.33681818 |
| 2010WEYE02 | Sebastian Weyer | 8.35223926 |
| 2015MANS03 | Kian Mansour | 8.42538462 |
| 2017SIAU02 | Max Siauw | 8.46956250 |
| 2018CHAN50 | Heidi Chan | 8.47250000 |
+------------+-----------------------------+---------------+
10 rows in set (2.01 sec)
The query returns a new table, with the selected attributes and a single tuple for each of the GROUP BY
attributes:
It's worth noting that GROUP BY in this context denotes the grouping of an 'aggregate query' (a computational operation across a set of tuples; think SUM, AVG, COUNT, MAX, etc). That means that it'll compute this result for each distinct tuple specified by your attribute. In this case, it's the difference between getting the average for every person as opposed to getting the average of all times ever recorded in the WCA.
You can use as many or as few of these conditions as you'd like, as long as you're selecting information. For example, the same query without ordering looks like this:
mysql> SELECT personName, AVG(average) / 100 AS globalAverage
-> FROM Results
-> WHERE eventId = '333'
-> AND average > 0
-> GROUP BY personName
-> LIMIT 10;
+-------------------------+---------------+
| personName | globalAverage |
+-------------------------+---------------+
| Etienne Amany | 21.90750000 |
| Thomas Rouault | 22.33272727 |
| Antoine Simon-Chautemps | 21.69850000 |
| Irène Mallordy | 19.07416667 |
| Marlène Desmaisons | 28.08444444 |
| Ton Dennenbroek | 20.21324405 |
| Arnaud van Galen | 22.45707819 |
| Cyrille Cornu | 30.45000000 |
| Christophe Woittequand | 24.03273973 |
| Georges Poinsot | 30.95166667 |
+-------------------------+---------------+
10 rows in set (1.72 sec)
We can see that instead of sorting by average, the columns are in the order that they were checked. In this case, it's based on personName in alphabetical order.
If you want to filter on an aggregated value, you must do so after the aggregation is completed using the HAVING
command. For example, if you wanted to get all people with a sub 8 global average, you might try something like this:
mysql> SELECT personName
-> FROM Results
-> WHERE eventId = '333'
-> AND average > 0
-> AND AVG(average) / 100 < 8
-> GROUP BY personName;
ERROR 1111 (HY000): Invalid use of group function
This is because we can't perform an aggregation before we actually have a group on which to aggregate. Instead, we do:
mysql> SELECT personName
-> FROM Results
-> WHERE eventId = '333'
-> AND average > 0
-> GROUP BY personName
-> HAVING AVG(average) / 100 < 8;
+-----------------------------+
| personName |
+-----------------------------+
| Feliks Zemdegs |
| Max Park |
| Seung Hyuk Nahm (남승혁) |
+-----------------------------+
3 rows in set (1.77 sec)
Joins are how we can construct mappings between related tables in a database. Related tables are connected by foreign keys that specify which tuples are connected between tables. For example, let's try and get the number of competitors on each continent, using these tables:
mysql> DESCRIBE Persons;
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| id | varchar(10) | NO | | | |
| subid | tinyint | NO | | 1 | |
| name | varchar(80) | YES | | NULL | |
| countryId | varchar(50) | NO | | | |
| gender | char(1) | YES | | | |
+-----------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql> DESCRIBE Countries;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id | varchar(50) | NO | | | |
| name | varchar(50) | NO | | | |
| continentId | varchar(50) | NO | | | |
| iso2 | char(2) | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> DESCRIBE Continents;
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id | varchar(50) | NO | | | |
| name | varchar(50) | NO | | | |
| recordName | char(3) | NO | | | |
| latitude | int | NO | | 0 | |
| longitude | int | NO | | 0 | |
| zoom | tinyint | NO | | 0 | |
+------------+-------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
Intuitively, we want to find all people who have a country id that falls within a specific continent, and we want to count the number of people in this group for each continent, and then sort them by population for readability. Then, we can write:
mysql> SELECT continent.name, COUNT(person.id) as totalCompetitors
-> FROM Persons person
-> INNER JOIN Countries country on person.countryId = country.id
-> INNER JOIN Continents continent on continent.id = country.continentId
-> GROUP BY continent.name
-> ORDER BY totalCompetitors DESC;
+---------------+------------------+
| name | totalCompetitors |
+---------------+------------------+
| Asia | 56997 |
| North America | 36280 |
| Europe | 34224 |
| South America | 14308 |
| Oceania | 3554 |
| Africa | 1927 |
+---------------+------------------+
6 rows in set (0.33 sec)
INNER JOIN
selects only the tuples (rows) where all of the conditions are met. In this case, each continent tuple must contain all of the countries within it and all competitors within those countries. It can be useful to visualize this with a venn diagram. is particularly useful.
This is all you need to get started writing your own queries! If you want to get into more complicated queries, you can read the MySQL documentation at https://dev.mysql.com/doc/ for a pretty comprehensive guide on syntax or features that I didn't explain here.