@atlefren
Created January 11, 2021 09:19

An Event-Based Pipeline for Geospatial Vector Data Management

The Internet and digitization have changed the way geospatial data is distributed and have made map data easier to find than ever. New technologies and techniques for surveying, monitoring, and disseminating geospatial data have created a data abundance. The digitization of maps also changed how map data is created and maintained. Traditionally, the high cost of surveying meant that map production was the domain of the state and large corporations. New methods and technologies paved the way for crowdsourced map data, termed Volunteered Geographical Information (VGI). In parallel with this development, states and governmental institutions changed their practices. The concept of Open Data challenged and changed the practice of selling map data to third parties, and more and more governmental geospatial data is now available under an open licence, free of charge.

Thus, with sources such as VGI and Open Data, geospatial data is no longer a scarce resource; abundance has replaced scarcity. The challenge is no longer to get hold of data, but to handle an ever-increasing flow of both new datasets and revisions to existing datasets.

Event sourcing is a pattern for data storage and management in which changes are stored as a sequence of events, and the current state is derived from those events. The pattern is well suited for handling a stream of geospatial data with increasing volume and velocity. In addition, it should be capable of handling data variety without sacrificing consistency. However, open governmental geospatial data is rarely distributed as a stream of events; it is usually updated in bulk at regular intervals.
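To make the gap between bulk updates and an event stream concrete, the following is a minimal sketch of how events might be derived by comparing two bulk snapshots, and how state can then be rebuilt by replaying the event log. It is an illustration only, not the method used in the thesis: the names `FeatureEvent`, `diff_snapshots`, and `replay` are hypothetical, geometries are represented as plain WKT strings, and it assumes that features carry stable identifiers across snapshots, which real datasets may not provide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureEvent:
    """One immutable change to a feature (hypothetical event shape)."""
    kind: str                       # "created", "modified", or "deleted"
    feature_id: str
    geometry: Optional[str] = None  # WKT string; None for deletions


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> list[FeatureEvent]:
    """Derive events from two bulk snapshots keyed by feature id.

    Assumption: feature ids are stable between snapshots.
    """
    events = []
    for fid in sorted(new.keys() - old.keys()):
        events.append(FeatureEvent("created", fid, new[fid]))
    for fid in sorted(old.keys() & new.keys()):
        if old[fid] != new[fid]:
            events.append(FeatureEvent("modified", fid, new[fid]))
    for fid in sorted(old.keys() - new.keys()):
        events.append(FeatureEvent("deleted", fid))
    return events


def replay(events: list[FeatureEvent]) -> dict[str, str]:
    """Rebuild the current state by applying events in order."""
    state: dict[str, str] = {}
    for event in events:
        if event.kind == "deleted":
            state.pop(event.feature_id, None)
        else:
            state[event.feature_id] = event.geometry
    return state


# Two bulk releases of the same (made-up) dataset.
v1 = {"a": "POINT (10 63)", "b": "POINT (11 64)"}
v2 = {"a": "POINT (10 63.5)", "c": "POINT (12 65)"}

# The append-only log is the stored record; state is derived from it.
log = diff_snapshots({}, v1) + diff_snapshots(v1, v2)
current = replay(log)
```

Because the log is append-only, replaying only the events from the first release would reproduce `v1`, which is the kind of history access that a store holding only the latest state cannot offer.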

This thesis argues that the principle of event sourcing is a viable and effective solution to many of the problems related to managing a large number of heterogeneous spatial datasets. By leveraging the services available through a public cloud provider, a scalable, resilient, and performant solution can be created. In addition, this thesis shows how data from different datasets can be combined through the use of micro-tasking, and how data can be used through read projections and event listeners.
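As a rough illustration of read projections and event listeners, the sketch below shows an in-memory event log that notifies subscribers of each appended event, and a projection that maintains a per-dataset feature count. All names here (`EventLog`, `FeatureCountProjection`, and the event dictionary keys) are assumptions made for the example; the thesis itself describes a cloud-based implementation that this does not attempt to reproduce.

```python
from collections import Counter
from typing import Callable

# Assumed event shape: {"kind": str, "dataset": str, "feature_id": str}
Event = dict
Listener = Callable[[Event], None]


class EventLog:
    """Append-only, in-memory event log with listener support."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener, replay_existing: bool = True) -> None:
        # A late subscriber can catch up by replaying the stored history.
        if replay_existing:
            for event in self._events:
                listener(event)
        self._listeners.append(listener)

    def append(self, event: Event) -> None:
        self._events.append(event)
        for listener in self._listeners:
            listener(event)


class FeatureCountProjection:
    """Read projection: number of live features per dataset."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, event: Event) -> None:
        if event["kind"] == "created":
            self.counts[event["dataset"]] += 1
        elif event["kind"] == "deleted":
            self.counts[event["dataset"]] -= 1


log = EventLog()
log.append({"kind": "created", "dataset": "roads", "feature_id": "r1"})
log.append({"kind": "created", "dataset": "roads", "feature_id": "r2"})

# The projection subscribes after two events exist and replays them first.
projection = FeatureCountProjection()
log.subscribe(projection)

log.append({"kind": "deleted", "dataset": "roads", "feature_id": "r1"})
log.append({"kind": "created", "dataset": "buildings", "feature_id": "b1"})
```

The projection is disposable: since it is derived entirely from the log, it can be dropped and rebuilt, or a differently shaped projection can be added later without changing how events are stored.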

The scientific contributions of this thesis consist of four published peer-reviewed research articles, detailing algorithms and approaches related to the implementation of an event-based pipeline for geospatial vector data management. The thesis further details the components needed to implement such a pipeline and concludes that this is a viable approach that offers several benefits over a traditional management pipeline.