Counting earthquakes in the contiguous United States.
Dataset here: http://2016.padjo.org/tutorials/sqlite-data-starterpacks/#more-info-m3-0-earthquakes-in-the-contiguous-u-s-1995-through-2015
Let's start with a count of M3.0 earthquakes in the entire dataset (which spans from 1995 through 2015)
SELECT COUNT(*)
FROM quakes;
COUNT(*) |
---|
16941 |
Let's narrow that query to count only M4.0+ quakes:
SELECT COUNT(*)
FROM quakes
WHERE mag >= 4.0;
COUNT(*) |
---|
1945 |
When the COUNT
aggregate function is used on its own, it crunches the dataset into a single scalar value, which is what we want: the total number of earthquakes.
But when we want more than just one count, such as a count by year, then we use GROUP BY
There is no year column in the dataset. But the time
column follows standard UTC format, e.g. 2005-01-02T20:58:48.280Z
Which means SQLite will treat it the time
column with formality (SQLite has no time/date datatype).
Let's ignore the aggregation for now and get a list of top 10 earthquakes by magnitude, time, and extracted year. The strftime function, confusing as it is, can be used here to extract the year:
SELECT mag, time,
strftime('%Y', time) AS year
FROM quakes
ORDER BY mag DESC
LIMIT 10;
mag | time | year |
---|---|---|
7.2 | 2010-04-04 22:40:42.360000 | 2010 |
7.1 | 1999-10-16 09:46:44.460000 | 1999 |
7.0 | 2012-04-12 07:15:48.500000 | 2012 |
6.9 | 2009-08-03 17:59:56.170000 | 2009 |
6.8 | 2001-02-28 18:54:32.830000 | 2001 |
6.7 | 2015-09-13 08:14:08.560000 | 2015 |
6.7 | 2010-10-21 17:53:13.570000 | 2010 |
6.6 | 2013-10-19 17:54:54.700000 | 2013 |
6.6 | 2006-01-04 08:32:32.400000 | 2006 |
6.6 | 1995-08-28 10:46:12.060000 | 1995 |
However, if strftime
is overkill here (and maddeningly besides). SQLite treats the time
column as a string, which means we can use SUBSTR
to same effect:
SELECT mag, time,
SUBSTR(time, 1, 4) AS year
FROM quakes
ORDER BY mag DESC
LIMIT 10;
Going back to the aggregation problem; to find quakes per year, we needed to get a year value. Now that we have it, we can GROUP BY
it:
SELECT
SUBSTR(time, 1, 4) AS year,
COUNT(*) AS qcount
FROM quakes
GROUP BY year
ORDER BY year ASC;
The result:
year | qcount |
---|---|
1995 | 657 |
1996 | 572 |
1997 | 618 |
1998 | 649 |
1999 | 1249 |
2000 | 559 |
2001 | 612 |
2002 | 585 |
2003 | 593 |
2004 | 851 |
2005 | 554 |
2006 | 529 |
2007 | 460 |
2008 | 604 |
2009 | 537 |
2010 | 2053 |
2011 | 799 |
2012 | 580 |
2013 | 724 |
2014 | 1460 |
2015 | 1696 |