Yuri Astrakhan nyurik
@nyurik
nyurik / types.rs
Last active February 21, 2022 22:56
Multidimensional Geo-types with separate Metadata
use num_traits::{Float, Num, NumCast};
use std::fmt::Debug;
trait CoordinateType: Default + Num + Copy + NumCast + PartialOrd + Debug {}
impl<T: Default + Num + Copy + NumCast + PartialOrd + Debug> CoordinateType for T {}
trait CoordNum: CoordinateType {}
impl<T: CoordinateType + Debug> CoordNum for T {}
trait CoordFloat: CoordNum + Float {}
@nyurik
nyurik / types.rs
Last active February 21, 2022 16:21
Multidimensional Geo-types
use num_traits::{Float, Num, NumCast};
use std::fmt::Debug;
trait CoordinateType: Default + Num + Copy + NumCast + PartialOrd + Debug {}
impl<T: Default + Num + Copy + NumCast + PartialOrd + Debug> CoordinateType for T {}
trait CoordNum: CoordinateType {}
impl<T: CoordinateType + Debug> CoordNum for T {}
trait CoordFloat: CoordNum + Float {}
@nyurik
nyurik / denormalize_osm_data.md
Last active February 15, 2022 06:53
Convenient OSM data

OpenStreetMap data is heavily normalized, making it very hard to process. Modeled on a relational database, it seems to have missed the second part of the "Normalize until it hurts; denormalize until it works" proverb.

Each node has an ID, and every way and relation uses that ID to reference the node. This means that every data consumer must keep an enormous cache of 8 billion node IDs and their corresponding lat,lng pairs while processing input data. In most cases, the node ID is discarded right after parsing.
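
To make the cost concrete, here is a minimal sketch of the lookup step a consumer has to perform today. The function name `resolve_ways`, the tuple layouts, and the sample data are hypothetical illustrations, not part of any OSM library or file format.

```python
from typing import Dict, Iterable, List, Tuple

LatLng = Tuple[float, float]


def resolve_ways(
    nodes: Iterable[Tuple[int, float, float]],
    ways: Iterable[Tuple[int, List[int]]],
) -> Dict[int, List[LatLng]]:
    """Replace node ID references in each way with lat,lng pairs.

    Assumption: nodes are (node_id, lat, lng) and ways are (way_id, [node_id, ...]).
    """
    # Every node must be cached before any way can be resolved.
    # On the full planet this dictionary would hold billions of entries.
    node_cache: Dict[int, LatLng] = {
        node_id: (lat, lng) for node_id, lat, lng in nodes
    }

    resolved: Dict[int, List[LatLng]] = {}
    for way_id, refs in ways:
        # The node IDs are only needed for this lookup and are dropped afterwards.
        resolved[way_id] = [node_cache[ref] for ref in refs]
    return resolved


# Hypothetical sample data
nodes = [(1, 52.0, 13.0), (2, 52.1, 13.1), (3, 52.2, 13.2)]
ways = [(100, [1, 2, 3]), (101, [3, 1])]
resolved = resolve_ways(nodes, ways)
```

A denormalized format would ship each way with its coordinates already attached, so the consumer could skip building `node_cache` entirely.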

I would like to propose a new, easy-to-process data structure for both bulk-download and streaming-update use cases.

Target audience

  • YES -- Data consumers who transform OSM data into something else, e.g. tiles, shapes, or analytical reports.
@nyurik
nyurik / is_tf_in_pr.py
Created April 28, 2021 14:43
A script to detect when Terraform projects or the modules they depend on are part of a Git pull request change
#!/usr/bin/env python3
# A script to detect when Terraform projects or the modules they depend on are part of a Git pull request change
#
# Usage: python3 pr_tf_changes.py <branch> <dir>...
#
# <branch> GIT branch to compare using git diff branch... shell call
# <dir> One or more directories to monitor, including all sub-dirs, relative to repo's root
#
# Set DEBUG env var to see additional debugging information
# If a match is found, the exit code is 0; otherwise it is 1
{
"$schema": "https://vega.github.io/schema/vega/v3.0.json",
"data": [
{
"name": "data",
"values": {
"aggregations": {
"pairs": {
"buckets": [
{"key": "aa:cc", "doc_count": 10},
@nyurik
nyurik / kibana_vega_question_guide.md
Last active June 3, 2020 21:46
How to submit a Kibana Vega question

How to submit a Kibana Vega question

It is usually very difficult to debug Vega questions without having your data. To make it easier, please follow these steps to include data with your graph when posting:

  • Reduce your data query to the smallest possible dataset, e.g. set the time range to 15 minutes. It will work as long as it is not empty and represents your data well enough.
  • Open the browser's developer tools (in Chrome, right-click and choose Inspect)
  • Switch to the Console tab
  • Copy the right command, paste it in the console at the > symbol, and press Enter (check the schema in your graph to see whether you use Vega or Vega-Lite)
@nyurik
nyurik / OptimizeLabelGrid.sql
Last active May 19, 2020 22:23
Optimizing LabelGrid - the result is worse than before??
------- Testing:
-- git clone https://github.com/openmaptiles/openmaptiles
-- git checkout upgrade-v5-pg12
-- place this file in the openmaptiles/ dir as "test-func.sql"
-- use make start-db to create a new database (in docker)
-- use make bash to start tools (another docker)
-- test with this command. The test call is taken from the openmaptiles-tools/tests/sql/LabelGrid.sql test. Note the "volatile" keyword - without it the query planner will optimize away multiple calls with the same value.
-- profile-pg-func --file test-func.sql "LabelGrid_pgsql(ST_GeomFromText('POINT(100 -100)',900913), 64*9.5546285343)" "LabelGrid_sql(ST_GeomFromText('POINT(100 -100)',900913), 64*9.5546285343)"
-- The results are not that great:
{
$schema: https://vega.github.io/schema/vega/v3.json
data: [
{
name: esdata
url: {
%context%: true
%timefield%: @timestamp
index: logstash-*
body: {
{
"extra": {
"merge-plugin": {
"include": [
"extensions/Wikibase/composer.json",
"../settings.d/composer/*.json"
]
}
}
}
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
fixes['osmlinks'] = {
'regex': True,
'nocase': True,
'msg': {
# '_default': 'Param cleanup, remove obsolete lang parameters - template detects it automatically',
'_default': 'Tag template cleanup - format combinations, use proper {{Tag}} template with params, remove kl= and vl= (handled automatically)',
},