How to perform a binary search in SQL to efficiently find records in a logs
table where the created_at
date has no type of any indices. This method narrows down the search range by iteratively adjusting the id
range, and then performs the final query on the optimized range, while id
is sequential auto increment column.
Find the logs which have been created after 2024-04-21.
-- Step 1: Determine the range of `id`s
SELECT MIN(id) INTO min_id FROM logs;
SELECT MAX(id) INTO max_id FROM logs;
-- Initialize variables
SET search_date = '2024-04-21';
SET found = FALSE;
-- Step 2: Perform the binary search
WHILE NOT found LOOP
SET mid_id = (min_id + max_id) / 2;
-- Check the `created_at` value at `mid_id`
SELECT created_at INTO mid_date FROM logs WHERE id = mid_id;
IF mid_date >= search_date THEN
-- If the `created_at` at mid_id is greater than or equal to the search date,
-- move the upper bound to the midpoint
SET max_id = mid_id;
ELSE
-- Otherwise, move the lower bound to the midpoint
SET min_id = mid_id;
END IF;
-- If the range is sufficiently narrow, exit the loop
IF max_id - min_id <= 1 THEN
SET found = TRUE;
END IF;
END LOOP;
-- Step 3: Perform the final query using the found `id` range
SELECT *
FROM logs
WHERE id >= min_id
AND created_at > '2024-04-21';
SELECT MIN(id) INTO min_id FROM logs;
andSELECT MAX(id) INTO max_id FROM logs;
determine the minimum and maximumid
values in thelogs
table.
- A loop is used to perform binary search to narrow down the
id
range. mid_id
is calculated as the midpoint of the current range.- The
created_at
value atmid_id
is checked. - If the
created_at
atmid_id
is greater than or equal to the target date (search_date
), the upper bound (max_id
) is moved tomid_id
. - Otherwise, the lower bound (
min_id
) is moved tomid_id
. - The loop continues until the range is sufficiently narrow (
max_id - min_id <= 1
).
- Once the binary search loop narrows down the
id
range, a final query is performed to select the desired records withcreated_at
greater than '2024-04-21'.
This method is useful for optimizing queries on large datasets where a full table scan is impractical, especially when there is no index on the created_at
column.
Generated @ poe.com powered by GPT-4o ❤️