Skip to content

Instantly share code, notes, and snippets.

@dannguyen
Last active November 7, 2018 06:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dannguyen/1ff581b1e76fddb72cf2ff1542aa6c62 to your computer and use it in GitHub Desktop.
Save dannguyen/1ff581b1e76fddb72cf2ff1542aa6c62 to your computer and use it in GitHub Desktop.
Sample SQLite problem in which, given a table of records, I want to link each record with its prior version. Seems like recursion is the solution -- putting down this correlated query for ow

Using recursive CTE in SQLite instead of a correlated query

Given a table of exams, in which each exam belongs to a "student", we want to derive a new table in which for any given student exam, we see how much the score has changed compared to the student's previous exam.

Here's the table of exams, which is generated by the SQL in sql-make-exams-table.sql](#file-sql-make-exams-table-sql):

starting exams table

exam_id year student score
14 2014 Alice 87
1 2010 Alice 76
2 2011 Alice 82
3 2010 Bob 55
4 2013 Alice 92
5 2009 Bob 70
6 2012 Dan 84
7 2011 Eric 34
8 2016 Alice 80
9 2015 Dan 100
10 2013 Bob 42

Derived table

The end goal is to produce a table in which we can do interesting analyses, such as find the student with the biggest improvement, or worst change, etc.:

student time_period score_diff prior_year prior_score year score prior_id exam_id
Alice 2014-2016 -7 2014 87 2016 80 14 8
Alice 2013-2014 -5 2013 92 2014 87 4 14
Alice 2011-2013 10 2011 82 2013 92 2 4
Alice 2010-2011 6 2010 76 2011 82 1 2
Alice 2010 76 1
Bob 2010-2013 -13 2010 55 2013 42 3 10
Bob 2009-2010 -15 2009 70 2010 55 5 3
Bob 2009 70 5
Dan 2012-2015 16 2012 84 2015 100 6 9
Dan 2012 84 6
Eric 2011 34 7

Alice is an easy example -- in 2010, she scored 87 points on exam #1. In 2011, she scored 82 points on exam #2.

If we wanted to record how exam#2 is related to exam#1, we could create a lookup table consisting of 2 columns -- the "current" exam id, and the id of the exam that preceded it, scoped by the student:

exam_id prior_id
2 1

This seems like it could be done recursively, but we can also use a correlated query:

SELECT 
	f.exam_id AS exam_id
	, (SELECT exam_id 
		FROM exams AS g
		WHERE 
			f.student = g.student 
			AND f.year > g.year
		ORDER BY 
			g.year DESC
		LIMIT 1
	) AS prior_id

FROM exams AS f;

Which gets us this result:

exam_id prior_id
14 4
1
2 1
3 5
4 2
5
6
7
8 14
9 6
10 3

We can make a view/temp table of the previous "lookups" query, but I like just throwing it into a common table expression:

WITH lookups AS
		( SELECT 
			f.exam_id AS exam_id
			, (SELECT exam_id 
				FROM exams AS g
				WHERE 
					f.student = g.student 
					AND f.year > g.year
				ORDER BY 
					g.year DESC
				LIMIT 1
			) AS prior_id
		FROM exams AS f)

SELECT 
	ex.student
	, px.year || '-' || ex.year
		AS time_period
	, ex.score - px.score
		AS score_diff
	, px.year AS prior_year
	, px.score AS prior_score
	, ex.year
	, ex.score
	, lookups.prior_id 
	, ex.exam_id 

FROM exams AS ex

INNER JOIN lookups
	ON ex.exam_id = lookups.exam_id
LEFT JOIN exams AS px
	ON lookups.prior_id = px.exam_id

ORDER BY 
	ex.student ASC 
	, ex.year DESC
;				
DROP TABLE IF EXISTS exams;
CREATE TABLE exams(
exam_id INTEGER
, year INTEGER
, student VARCHAR
, score INTEGER
);
CREATE INDEX exams_ix_id ON exams(exam_id);
CREATE INDEX exams_ix_student_year ON exams(student, year);
INSERT INTO exams
VALUES
(14, 2014, 'Alice', 87)
, (1, 2010, 'Alice', 76)
, (2, 2011, 'Alice', 82)
, (3, 2010, 'Bob', 55)
, (4, 2013, 'Alice', 92)
, (5, 2009, 'Bob', 70)
, (6, 2012, 'Dan', 84)
, (7, 2011, 'Eric', 34)
, (8, 2016, 'Alice', 80)
, (9, 2015, 'Dan', 100)
, (10, 2013, 'Bob', 42)
;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment