Skip to content

Instantly share code, notes, and snippets.

@oodavid
Created April 12, 2012 20:01
Show Gist options
  • Save oodavid/2370598 to your computer and use it in GitHub Desktop.
Save oodavid/2370598 to your computer and use it in GitHub Desktop.
MySQL - Group by grid-square

MySQL - Group by grid-square

Inspired by the police.uk crime maps, where they cleverly cluster data together depending on how far zoomed in you are, this MySQL snippet goes a long way to explaining the logic behind such an operation.

A perfect grid

This snippet groups sites to a perfect grid while the police.uk algorithm has some sort of trickery that moves their points about somewhat, looking at their blogs and documentation I would conclude that they have a predefined set of points, each with a catchment area of postcodes. It looks like a whole lot of scraping and manual labour went into their database and quite frankly I don't have the time nor inclination do attempt such a task.

Example

One option would be to use a triangular grid which would give your circular points a bit more breathing space Another would be to group into the grids as we do in the example, HOWEVER take an average lat / lng value of the points within to give a more reasonable centre point - probably be easier to boot.

--
-- This example inserts a massive set of lat/lng points into a database, then selects them back
-- grouped into grid-squares
--
-- Switch to the test table (don't pollute the real database)
USE `test`;
--
-- CREATE SOME DUMMY DATA
--
-- Create our points table
-- using FLOAT(10,6) as anything more accurate is a bit useless (for my application anyhow)
-- the index on lat and lng is important for speed
DROP TABLE IF EXISTS `points`;
CREATE TABLE `points` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`lat` FLOAT(10,6) NOT NULL,
`lng` FLOAT(10,6) NOT NULL,
INDEX `latlng` (`lat`,`lng`)
) ENGINE=InnoDB;
-- Insert a blank row into the db (you'll see why in a moment)
INSERT INTO `points` (lat, lng) VALUES (0, 0);
-- Using that first "seed" insert new rows, repeat this statement to get an exponential amount of rows
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 2
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 4
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 8
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 16
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 32
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 64
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 128
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 256
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 512
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 1024
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 2048
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 4096
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 8192
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 16384
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 32768
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 65536
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 131072
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 262144
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 524288
INSERT INTO `points` (lat, lng) SELECT (RAND() * 180 - 90), (RAND() * 360 - 180) FROM `points`; -- 1048576
-- If you've gone the whole hog and inserted a million rows then you might need to wait till your index is fully populated before continueing...
-- but really, that last statement took 25 minutes
-- go put the kettle on :-)
-- Delete that first row with crap values
DELETE FROM `points` WHERE lat=0 AND lng=0;
--
-- SELECT SOME GROUPED DATA
--
SET @lng_min = -20; -- LEFT
SET @lng_max = 20; -- RIGHT
SET @lat_min = 0; -- TOP
SET @lat_max = 20; -- BOTTOM
SET @gridsize = 5; -- for rounding logic
-- Select the points that fit within the grid, grouped into grid-squares
SELECT
(ROUND(lat / @gridsize) * @gridsize) AS rlat,
(ROUND(lng / @gridsize) * @gridsize) AS rlng,
AVG(lat) AS alat,
AVG(lng) AS alng,
COUNT(*) AS numpoints
FROM `points`
WHERE
lat >= @lat_min AND
lat <= @lat_max AND
lng >= @lng_min AND
lng <= @lng_max
GROUP BY rlat, rlng;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment