Let's make things a bit harder, shall we?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| -- This is a simplified (toy) version of an actual problem I'm | |
| -- trying to solve. It's written for SQLite 3.8.5. | |
| -- We have this schema: | |
| CREATE TABLE samples(curve_id INTEGER, x INTEGER, y REAL); | |
| CREATE INDEX samples_idx ON samples(curve_id, x); | |
| -- representing sample points in line charts (the curve_id identifies | |
| -- which chart). | |
| -- Now, when drawing these things, we're looking at some subset of the x axis | |
| -- (say [x0,x1]) and want to get the end points of all line segments that | |
| -- overlap [x0,x1]. Thus, in general we do not only need all points within | |
| -- [x0,x1] (which is a simple BETWEEN test), but usually also their immediate | |
| -- predecessors and successors wrt. x axis ordering. (These could be | |
| -- arbitrarily far away, so indexing on (x,curve_id) wouldn't be useful.) | |
| -- Let's say our query interval is [10,20]. Here's some test data: | |
| INSERT INTO samples(curve_id, x, y) VALUES | |
| (0, 0, -1.0), | |
| (0, 5, 1.0), | |
| (0, 15, 2.0), | |
| (0, 25, 3.0), | |
| (0, 35, -1.0), | |
| (1, 11, 4.0), | |
| (1, 19, 5.0), | |
| (2, 5, -1.0), | |
| (2, 10, 6.0), | |
| (2, 20, 7.0), | |
| (2, 25, -1.0); | |
| -- A correct query should return all the points marked up with a positive | |
| -- y coordinate, in sequential order, with nothing missing. | |
| -- Of the ways I can come up with, this seems to be the best overall: | |
| WITH | |
| -- Parameters phrased as CTE since I tested this with the SQLite | |
| -- CLI, in practice these would just be named parameters. | |
| range(x0, x1) AS (VALUES(10, 20)) | |
| SELECT s.curve_id, s.x, s.y | |
| -- In a first step, we determine an expanded range for each curve | |
| -- that includes the extra points we need. | |
| FROM ( | |
| -- If a point is the first or last on the chart, it might not have | |
| -- a predecessor/successor. Hence the LEFT JOIN on the filtered | |
| -- samples and the munging with "coalesce" to substitute in our | |
| -- query range start/end when there were no matches. | |
| SELECT cur.curve_id AS curve_id, | |
| coalesce(max(s.x), range.x0) AS x0, | |
| coalesce(min(t.x), range.x1) AS x1 | |
| FROM (SELECT DISTINCT curve_id FROM samples) cur, range | |
| LEFT JOIN samples s ON s.curve_id=cur.curve_id AND s.x <= range.x0 | |
| LEFT JOIN samples t ON t.curve_id=cur.curve_id AND t.x >= range.x1 | |
| GROUP BY cur.curve_id | |
| ) covered_range | |
| JOIN samples s ON s.curve_id = covered_range.curve_id | |
| AND s.x BETWEEN covered_range.x0 AND covered_range.x1 | |
| ORDER BY s.curve_id, s.x; | |
| -- Except that doesn't work very well either, because the double left join | |
| -- in the covered_range subquery ends up as a nested loop. So I ended up | |
| -- with the even more involved: | |
| WITH | |
| range(x0, x1) AS (VALUES(10, 20)) | |
| SELECT s.curve_id, s.x, s.y | |
| FROM ( | |
| SELECT cur.curve_id AS curve_id, coalesce(max(s.x), range.x0) AS x | |
| FROM (SELECT DISTINCT curve_id FROM samples) cur, range | |
| LEFT JOIN samples s ON s.curve_id=cur.curve_id AND s.x <= range.x0 | |
| GROUP BY cur.curve_id | |
| ) range_l | |
| JOIN ( | |
| SELECT cur.curve_id AS curve_id, coalesce(min(s.x), range.x1) AS x | |
| FROM (SELECT DISTINCT curve_id FROM samples) cur, range | |
| LEFT JOIN samples s ON s.curve_id=cur.curve_id AND s.x >= range.x1 | |
| GROUP BY cur.curve_id | |
| ) range_r ON range_r.curve_id = range_l.curve_id | |
| JOIN samples s ON s.curve_id = range_l.curve_id | |
| AND s.x BETWEEN range_l.x AND range_r.x | |
| ORDER BY s.curve_id, s.x; |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Here's a trivial join-free version for a single curve only that might give you some ideas. Notably the use of UNION means it doesn't do any joins or coalescing to get the coverage range. If there are no samples outside the range, then the first and last selects in the UNION don't contribute.