Skip to content

Instantly share code, notes, and snippets.

@rygorous
Last active August 29, 2015 14:03
Embed
What would you like to do?
Let's make things a bit harder, shall we?
-- This is a simplified (toy) version of an actual problem I'm
-- trying to solve. It's written for SQLite 3.8.5.
-- We have this schema:
CREATE TABLE samples(curve_id INTEGER, x INTEGER, y REAL);
CREATE INDEX samples_idx ON samples(curve_id, x);
-- representing sample points in line charts (the curve_id identifies
-- which chart).
-- Now, when drawing these things, we're looking at some subset of the x axis
-- (say [x0,x1]) and want to get the end points of all line segments that
-- overlap [x0,x1]. Thus, in general we do not only need all points within
-- [x0,x1] (which is a simple BETWEEN test), but usually also their immediate
-- predecessors and successors wrt. x axis ordering. (These could be
-- arbitrarily far away, so indexing on (x,curve_id) wouldn't be useful.)
-- Let's say our query interval is [10,20]. Here's some test data:
INSERT INTO samples(curve_id, x, y) VALUES
(0, 0, -1.0),
(0, 5, 1.0),
(0, 15, 2.0),
(0, 25, 3.0),
(0, 35, -1.0),
(1, 11, 4.0),
(1, 19, 5.0),
(2, 5, -1.0),
(2, 10, 6.0),
(2, 20, 7.0),
(2, 25, -1.0);
-- A correct query should return all the points marked up with a positive
-- y coordinate, in sequential order, with nothing missing.
-- Of the ways I can come up with, this seems to be the best overall:
WITH
-- Parameters phrased as CTE since I tested this with the SQLite
-- CLI, in practice these would just be named parameters.
range(x0, x1) AS (VALUES(10, 20))
SELECT s.curve_id, s.x, s.y
-- In a first step, we determine an expanded range for each curve
-- that includes the extra points we need.
FROM (
-- If a point is the first or last on the chart, it might not have
-- a predecessor/successor. Hence the LEFT JOIN on the filtered
-- samples and the munging with "coalesce" to substitute in our
-- query range start/end when there were no matches.
SELECT cur.curve_id AS curve_id,
coalesce(max(s.x), range.x0) AS x0,
coalesce(min(t.x), range.x1) AS x1
FROM (SELECT DISTINCT curve_id FROM samples) cur, range
LEFT JOIN samples s ON s.curve_id=cur.curve_id AND s.x <= range.x0
LEFT JOIN samples t ON t.curve_id=cur.curve_id AND t.x >= range.x1
GROUP BY cur.curve_id
) covered_range
JOIN samples s ON s.curve_id = covered_range.curve_id
AND s.x BETWEEN covered_range.x0 AND covered_range.x1
ORDER BY s.curve_id, s.x;
-- Except that doesn't work very well either, because the double left join
-- in the covered_range subquery ends up as a nested loop. So I ended up
-- with the even more involved:
WITH
range(x0, x1) AS (VALUES(10, 20))
SELECT s.curve_id, s.x, s.y
FROM (
SELECT cur.curve_id AS curve_id, coalesce(max(s.x), range.x0) AS x
FROM (SELECT DISTINCT curve_id FROM samples) cur, range
LEFT JOIN samples s ON s.curve_id=cur.curve_id AND s.x <= range.x0
GROUP BY cur.curve_id
) range_l
JOIN (
SELECT cur.curve_id AS curve_id, coalesce(min(s.x), range.x1) AS x
FROM (SELECT DISTINCT curve_id FROM samples) cur, range
LEFT JOIN samples s ON s.curve_id=cur.curve_id AND s.x >= range.x1
GROUP BY cur.curve_id
) range_r ON range_r.curve_id = range_l.curve_id
JOIN samples s ON s.curve_id = range_l.curve_id
AND s.x BETWEEN range_l.x AND range_r.x
ORDER BY s.curve_id, s.x;
@cocoademon
Copy link

Here's a trivial join-free version for a single curve only that might give you some ideas. Notably the use of UNION means it doesn't do any joins or coalescing to get the coverage range. If there are no samples outside the range, then the first and last selects in the UNION don't contribute.

SELECT s.curve_id, s.x, s.y
    FROM
        (
            SELECT * FROM
                (SELECT * FROM samples WHERE (x <= 10) AND curve_id=0 ORDER BY x DESC LIMIT 1)
            UNION SELECT * FROM samples WHERE x BETWEEN 10 AND 20 AND curve_id=0 
            UNION SELECT * FROM
                (SELECT * FROM samples WHERE (x >= 20) AND curve_id=0 ORDER BY x ASC LIMIT 1)
        ) s    
    ORDER BY
        s.curve_id, s.x;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment