Yuri Astrakhan (nyurik)

gist:3f44ad42b062da14243df8159e058b1b
https://docs.google.com/presentation/d/1Rl3k_bu7e3YZ-p8mGoQ-rqeJIUUlr6JfDSTCg3cWAog
nyurik / format.rs
Last active June 1, 2023 01:21
Rust format! double referencing performance impact
// Place this file at benches/format.rs in a Rust project created with `cargo new fmttest --lib`
// Add to Cargo.toml:
//
// [dev-dependencies]
// criterion = { version = "0.4", features = ["html_reports"] }
//
// [[bench]]
// name = "format"
// harness = false
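The benchmark body itself is not part of this preview. As a rough, dependency-free sketch, assuming "double referencing" means handing `format!` an already-borrowed value as `&r` (so the argument is a `&&T`), the two forms below must print the same text and can only differ in the code the compiler generates. `direct`, `double_ref`, and `time_it` are illustrative names, not the gist's, and `Instant`-based timing is far noisier than the criterion setup described above.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Format an integer directly, using inline format args.
fn direct(v: i32) -> String {
    format!("{v}")
}

/// Format the same integer through an extra reference: the argument is `&&i32`.
fn double_ref(v: i32) -> String {
    let r = &v;
    format!("{}", &r)
}

/// Rough wall-clock timing; criterion gives far more reliable numbers.
fn time_it(label: &str, f: fn(i32) -> String) {
    let start = Instant::now();
    for i in 0..100_000 {
        // black_box keeps the optimizer from removing the calls.
        black_box(f(black_box(i)));
    }
    println!("{label}: {:?}", start.elapsed());
}

fn main() {
    // Both forms must produce identical text; only the generated code may differ.
    assert_eq!(direct(42), double_ref(42));
    time_it("direct", direct);
    time_it("double_ref", double_ref);
}
```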
nyurik / benches_iters.rs
Last active May 9, 2023 02:37
Add “iterate with separators” iterator function
// Benchmarks for the Rust iterator extension discussion at
// https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13
// Place this file at benches/iters.rs in a Rust project created with `cargo new itertest --lib`
// Add to Cargo.toml:
//
// [dependencies]
// itertools = "0.10"
//
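The preview stops before the benchmarked code. A minimal std-only sketch of the kind of comparison under discussion, assuming it pits "write each item with a separator straight into one buffer" against "stringify everything into a `Vec`, then join": `join_with` and `join_via_vec` are hypothetical names. itertools offers a comparable `Itertools::join(&mut self, sep)` for iterators of `Display` items.

```rust
use std::fmt::{Display, Write};

/// Join the items of any iterator with `sep`, writing each item straight into
/// one output String instead of first collecting a Vec<String>.
fn join_with<I>(iter: I, sep: &str) -> String
where
    I: IntoIterator,
    I::Item: Display,
{
    let mut out = String::new();
    let mut iter = iter.into_iter();
    if let Some(first) = iter.next() {
        // fmt::Write for String does not fail on its own, so unwrap() is safe
        // for std types; it would only trip on a Display impl that errors.
        write!(out, "{first}").unwrap();
        for item in iter {
            out.push_str(sep);
            write!(out, "{item}").unwrap();
        }
    }
    out
}

/// Allocation-heavy baseline: stringify every item, collect, then join.
fn join_via_vec<I>(iter: I, sep: &str) -> String
where
    I: IntoIterator,
    I::Item: Display,
{
    iter.into_iter().map(|x| x.to_string()).collect::<Vec<_>>().join(sep)
}

fn main() {
    let words = ["a", "b", "c"];
    assert_eq!(join_with(words, ", "), join_via_vec(words, ", "));
    println!("{}", join_with(1..=3, "-"));
}
```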
nyurik / bench.rs
Created April 5, 2023 19:30
Benchmark to evaluate linear vs dup-indexer performance
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use dup_indexer::DupIndexer;

fn benchmark_strings(c: &mut Criterion) {
    let mut group = c.benchmark_group("Strings");
    group.bench_function("String", |b| {
        b.iter(|| {
            let mut di = DupIndexer::new();
            for _ in 0..100 {
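Only `DupIndexer::new()` is visible in this preview, so the sketch below does not use the crate at all. It is a std-only illustration of the trade-off the title suggests, assuming "linear" means scanning previously stored values for a duplicate: `linear_insert` costs O(n) per call, while the hypothetical `HashIndexer` answers from a `HashMap` in O(1) on average. Both return the index of the first occurrence of each value.

```rust
use std::collections::HashMap;

/// Linear baseline: scan every stored value before inserting (O(n) per call).
fn linear_insert(values: &mut Vec<String>, v: &str) -> usize {
    if let Some(pos) = values.iter().position(|x| x.as_str() == v) {
        return pos;
    }
    values.push(v.to_string());
    values.len() - 1
}

/// Hypothetical hash-based deduplicating indexer: average O(1) per call.
#[derive(Default)]
struct HashIndexer {
    values: Vec<String>,
    lookup: HashMap<String, usize>,
}

impl HashIndexer {
    /// Return the index of `v`, storing it first if it has not been seen.
    fn insert(&mut self, v: &str) -> usize {
        if let Some(&idx) = self.lookup.get(v) {
            return idx;
        }
        let idx = self.values.len();
        self.values.push(v.to_string());
        self.lookup.insert(v.to_string(), idx);
        idx
    }
}

fn main() {
    let input = ["a", "b", "a", "c", "b"];
    let mut linear = Vec::new();
    let mut hashed = HashIndexer::default();
    for s in input {
        // Both strategies must hand out identical indices.
        assert_eq!(linear_insert(&mut linear, s), hashed.insert(s));
    }
    assert_eq!(linear, hashed.values);
}
```

The hash version stores each string twice (once in `values`, once as a map key); avoiding that duplication is a natural motivation for a dedicated crate.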
nyurik / query.sql
Created January 16, 2023 05:58
Statistics of MVT tile GIS errors when encoding/decoding with ST_AsMVTGeom
SELECT x,
       y,
       ST_Y(p_mid) mid_lat,
       ST_Y(p_min) min_lat,
       ST_Y(p_max) max_lat,
       ST_Y(d_mid) mid_lat_decoded,
       ST_Y(d_min) min_lat_decoded,
       ST_Y(d_max) max_lat_decoded,
       abs(ST_Y(p_mid) - ST_Y(d_mid)) mid_lat_error,
       abs(ST_Y(p_min) - ST_Y(d_min)) min_lat_error,
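The query's error columns compare a latitude with its value after an encode/decode round trip. A minimal Rust model of just the rounding step, assuming Web Mercator tiles and ST_AsMVTGeom's default extent of 4096 (it does not call PostGIS and ignores clipping and buffers): project the latitude into tile space, round its position inside the tile to an integer, project back, and take the absolute difference. `roundtrip_error` and its helpers are hypothetical names.

```rust
use std::f64::consts::PI;

/// Latitude (degrees) -> normalized Web Mercator y in [0, 1], 0 at the top.
fn lat_to_norm_y(lat: f64) -> f64 {
    let phi = lat.to_radians();
    (1.0 - (phi.tan() + 1.0 / phi.cos()).ln() / PI) / 2.0
}

/// Normalized Web Mercator y -> latitude (degrees).
fn norm_y_to_lat(y: f64) -> f64 {
    (PI * (1.0 - 2.0 * y)).sinh().atan().to_degrees()
}

/// Round-trip a latitude through integer tile coordinates at `zoom` with the
/// given `extent`, returning the absolute error in degrees.
fn roundtrip_error(lat: f64, zoom: u32, extent: f64) -> f64 {
    let scale = f64::from(1u32 << zoom); // number of tiles per axis
    let y = lat_to_norm_y(lat) * scale; // global tile-space y
    let row = y.floor(); // tile row containing the point
    let q = ((y - row) * extent).round(); // quantized position inside the tile
    let decoded = norm_y_to_lat((row + q / extent) / scale);
    (lat - decoded).abs()
}

fn main() {
    for zoom in [0, 5, 10, 14] {
        let err = roundtrip_error(45.0, zoom, 4096.0);
        println!("zoom {zoom}: |error| at 45° = {err:e} degrees");
    }
}
```

Because rounding moves a point by at most half a grid cell, the error shrinks roughly by half with each additional zoom level.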
nyurik / optimized-mbtiles.sql
Created June 10, 2022 19:25
Some ideas on optimizing mbtiles file storage size with a single 32-bit index instead of z/x/y
create table map
(
    tile_index  INTEGER,
    tmp_zoom    INTEGER GENERATED ALWAYS AS ((tile_index & 0xFC000000) >> 26) VIRTUAL,
    tile_column INTEGER GENERATED ALWAYS AS (CASE
        WHEN tmp_zoom <= 13 THEN (tile_index & 0x3FFE000) >> 13
        ELSE (tile_index & 0x3FFF8000) >> 15 END) VIRTUAL,
    tile_row    INTEGER GENERATED ALWAYS AS (CASE
        WHEN tmp_zoom <= 13 THEN tile_index & 0x1FFF
        ELSE tile_index & 0x7FFF END) VIRTUAL,
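The generated columns decode a packed index with masks and shifts. A small Rust sketch of the matching encoder for the `zoom <= 13` branch only: at those zooms a tile grid has at most 2^13 = 8192 tiles per axis, so column and row each fit in 13 bits below a 6-bit zoom field. `pack_low_zoom` and `unpack_low_zoom` are hypothetical names; the `ELSE` branch of the excerpt is not modeled here.

```rust
/// Pack (zoom, column, row) into one u32 using the zoom <= 13 layout:
/// zoom in bits 26..=31, column in bits 13..=25, row in bits 0..=12.
fn pack_low_zoom(zoom: u32, col: u32, row: u32) -> u32 {
    assert!(zoom <= 13, "this sketch covers only the zoom <= 13 branch");
    assert!(col < (1 << 13) && row < (1 << 13), "column/row need at most 13 bits");
    (zoom << 26) | (col << 13) | row
}

/// Unpack with the same masks and shifts as the SQL generated columns.
fn unpack_low_zoom(index: u32) -> (u32, u32, u32) {
    let zoom = (index & 0xFC00_0000) >> 26;
    let col = (index & 0x03FF_E000) >> 13;
    let row = index & 0x1FFF;
    (zoom, col, row)
}

fn main() {
    let index = pack_low_zoom(13, 8191, 4096);
    assert_eq!(unpack_low_zoom(index), (13, 8191, 4096));
    println!("{index:#010x}");
}
```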
nyurik / types.rs
Last active February 21, 2022 22:56
Multidimensional Geo-types with separate Metadata
use num_traits::{Float, Num, NumCast};
use std::fmt::Debug;
trait CoordinateType: Default + Num + Copy + NumCast + PartialOrd + Debug {}
impl<T: Default + Num + Copy + NumCast + PartialOrd + Debug> CoordinateType for T {}
trait CoordNum: CoordinateType {}
impl<T: CoordinateType + Debug> CoordNum for T {}
trait CoordFloat: CoordNum + Float {}
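The traits above rely on blanket impls: any type meeting the bounds gets the trait for free. A dependency-free sketch of that pattern, combined with one possible reading of "separate Metadata" in which metadata lives in its own type parameter `M`. `CoordScalar`, `Coord`, and `delta` are hypothetical stand-ins (std bounds instead of `num_traits`), not the gist's definitions.

```rust
use std::fmt::Debug;
use std::ops::Sub;

// Hypothetical stand-in for a num_traits-based numeric bound.
trait CoordScalar: Default + Copy + PartialOrd + Debug + Sub<Output = Self> {}
// Blanket impl: i32, i64, f32, f64, ... all qualify without explicit impls.
impl<T: Default + Copy + PartialOrd + Debug + Sub<Output = T>> CoordScalar for T {}

/// Hypothetical coordinate whose metadata sits in its own type parameter,
/// so the numeric part stays generic over `T` alone.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Coord<T, M = ()> {
    x: T,
    y: T,
    meta: M,
}

impl<T: CoordScalar, M> Coord<T, M> {
    /// Component-wise difference; metadata of either side is ignored.
    fn delta<N>(&self, other: &Coord<T, N>) -> (T, T) {
        (self.x - other.x, self.y - other.y)
    }
}

fn main() {
    let a = Coord { x: 3, y: 5, meta: "label" };
    let b = Coord { x: 1, y: 2, meta: () };
    assert_eq!(a.delta(&b), (2, 3));
    assert_eq!(a.meta, "label");
}
```

Keeping `M` separate means geometry code only ever constrains `T`, and coordinates without metadata pay nothing for it (`M = ()` is zero-sized).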
nyurik / types.rs
Last active February 21, 2022 16:21
Multidimensional Geo-types
use num_traits::{Float, Num, NumCast};
use std::fmt::Debug;
trait CoordinateType: Default + Num + Copy + NumCast + PartialOrd + Debug {}
impl<T: Default + Num + Copy + NumCast + PartialOrd + Debug> CoordinateType for T {}
trait CoordNum: CoordinateType {}
impl<T: CoordinateType + Debug> CoordNum for T {}
trait CoordFloat: CoordNum + Float {}
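One way to make coordinates multidimensional while keeping a single generic numeric type is a const-generic dimension. The sketch below is a hypothetical illustration (`CoordN` and `squared_distance` are not from the gist): N = 2 for x/y, 3 for x/y/z, and mixing dimensions becomes a compile-time error. It assumes `T::default()` is zero, which holds for the built-in integer and float types.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical N-dimensional coordinate stored as a fixed-size array.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CoordN<T, const N: usize>([T; N]);

impl<T, const N: usize> CoordN<T, N>
where
    T: Copy + Default + Add<Output = T> + Sub<Output = T> + Mul<Output = T>,
{
    /// Squared Euclidean distance; `other` must have the same dimension N.
    fn squared_distance(&self, other: &Self) -> T {
        self.0
            .iter()
            .zip(other.0.iter())
            .fold(T::default(), |acc, (&a, &b)| {
                let d = a - b;
                acc + d * d
            })
    }
}

fn main() {
    let p = CoordN([1.0, 2.0, 2.0]);
    let origin = CoordN([0.0; 3]);
    assert_eq!(p.squared_distance(&origin), 9.0);
}
```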
nyurik / denormalize_osm_data.md
Last active February 15, 2022 06:53
Convenient OSM data

OpenStreetMap data is heavily normalized, which makes it hard to process. Its model follows a relational database, yet it seems to have missed the second half of the proverb "Normalize until it hurts; denormalize until it works."

Each node has an ID, and every way and relation references its nodes by those IDs. This means every data consumer must keep an enormous cache of 8 billion node IDs and their corresponding lat/lng pairs while processing input data, even though in most cases the node ID is discarded right after parsing.
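A minimal Rust sketch of the lookup step every consumer currently performs, assuming a simplified model with only ways and plain (lat, lng) pairs: the `nodes` map plays the role of the node-ID cache described above, and `denormalize` turns a way's ID list into inline coordinates. `Way`, `DenormWay`, and `denormalize` are hypothetical names, not the proposed format.

```rust
use std::collections::HashMap;

/// A way as stored in normalized OSM data: it only lists node IDs.
struct Way {
    id: i64,
    node_ids: Vec<i64>,
}

/// Hypothetical denormalized way: coordinates travel with the way itself.
#[derive(Debug, PartialEq)]
struct DenormWay {
    id: i64,
    coords: Vec<(f64, f64)>, // (lat, lng)
}

/// Resolve every node ID through the consumer-side cache.
/// Returns None if any referenced node is missing from the cache.
fn denormalize(way: &Way, nodes: &HashMap<i64, (f64, f64)>) -> Option<DenormWay> {
    let coords = way
        .node_ids
        .iter()
        .map(|id| nodes.get(id).copied())
        .collect::<Option<Vec<_>>>()?;
    Some(DenormWay { id: way.id, coords })
}

fn main() {
    let nodes = HashMap::from([(1, (51.5, -0.1)), (2, (51.6, -0.2))]);
    let way = Way { id: 10, node_ids: vec![1, 2, 1] };
    let resolved = denormalize(&way, &nodes).expect("all nodes cached");
    assert_eq!(resolved.coords, vec![(51.5, -0.1), (51.6, -0.2), (51.5, -0.1)]);
}
```

With denormalized input, the consumer would receive `DenormWay`-like records directly and could drop the cache entirely.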

I would like to propose a new, easy-to-process data structure for both bulk downloads and streaming updates.

Target audience

  • YES -- Data consumers who transform OSM data into something else, e.g. tiles, shapes, or analytical reports.