@dmfenton
Created May 5, 2016 19:01

Lessons learned AWS GeoSpatial Big Data System

A team working under a DARPA contract moved a system for geolocating images from a local cluster to AWS.

  • aws s3 => always sync thrice
  • do your own key store in the root of your bucket?
  • map high volume IO to local ephemeral disks
  • if you are using a NAT within a VPC, know that all traffic moves through that instance so size it accordingly
  • snapshots are cheap, compressed, differential
  • https://libcloud.apache.org/ is a library that abstracts different cloud providers into the same API
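
One way to read the "sync thrice" tip is to repeat the same `aws s3 sync` command a fixed number of times, so files missed on one pass get picked up on the next. A minimal sketch, assuming the AWS CLI is installed and configured; the function name, the `run` parameter, and the bucket path in the usage note are hypothetical:

```python
import subprocess


def sync_thrice(src, dst, passes=3, run=subprocess.run):
    """Run `aws s3 sync src dst` several times in a row.

    `run` is injectable so the loop can be exercised without the AWS CLI.
    Returns the number of passes that completed.
    """
    cmd = ["aws", "s3", "sync", src, dst]
    completed = 0
    for _ in range(passes):
        # check=True raises CalledProcessError if a pass exits non-zero
        run(cmd, check=True)
        completed += 1
    return completed
```

Usage would look like `sync_thrice("./data", "s3://example-bucket/data")`, where `example-bucket` is a placeholder.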

Oak Ridge National Lab

A research scientist presented on a ton of cool big data applications, mostly around tracking ecological health

Slides available here: https://t.co/3scXM0GCYT

HHypermap

HHypermap is an opendata.arcgis.com-like project from Harvard and Terranodo to index all the geospatial services on the web. There is a lot of overlap with our project and a lot of room for collaboration, e.g. we could share our index with them or use their uptime monitoring statistics.

  • http://hypersearch.cga.terranodo.io/
  • Every remote service is cached
  • Initial top level view is cached
    • Using map proxy in the background
  • Harvest initial view
  • Lucene backed
  • Building feature-level search across 20 million features
    • coming in the next couple months
  • Uptime stats on everything, used in ranking

What's coming in Postgres 9.6/9.7

Hagander.net/talks/PostgreSQL_9.6.pdf

  • Parallelism
    • helps with cpu bound operations
    • mark function parallel safe
    • no json
    • no string
    • no array
  • Foreign data wrappers have more pushdown capabilities
  • Datetimes faster
  • Heavy write loads faster due to a better locking scheme

GeoWave - Distributed GeoSpatial Indexing

GeoWave is a project sponsored by the NSA for indexing billions of features. Its key innovation is using a Hilbert space-filling curve.
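
To illustrate the idea, here is a minimal sketch of the textbook 2D Hilbert curve mapping, which turns a grid cell into a single sortable integer so that cells close on the curve are also close in space. This is not GeoWave's code; the function names and the lon/lat quantization are assumptions made for the example:

```python
def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) on a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def lonlat_to_cell(lon, lat, order):
    """Quantize a lon/lat pair to integer grid coordinates (clamped at the edges)."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


def hilbert_key(lon, lat, order=16):
    """One-dimensional key for a point, usable as a sort or row key."""
    x, y = lonlat_to_cell(lon, lat, order)
    return hilbert_index(order, x, y)
```

Sorting features by `hilbert_key` keeps spatial neighbors mostly contiguous, which is what makes range scans over a key-value store practical for bounding-box queries.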

Valhalla Routing

Valhalla is Mapzen's routing implementation.

  • valhalla now supports multimodal

  • mapzen has a mobility team

  • valhalla can be embedded due to its dynamic runtime costing

  • single dataset multiple route types + options

  • super granular on just biking

    • bike type
    • use roads
    • hills

very personalized routing
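
The bike options above map onto per-request costing options. A rough sketch of building such a request body, assuming the field names from Valhalla's public route API (`costing`, `costing_options`, `bicycle_type`, `use_roads`, `use_hills`); check them against the version you run. The helper name and coordinates are made up for the example:

```python
import json


def bicycle_route_request(start, end, bicycle_type="Road", use_roads=0.5, use_hills=0.5):
    """Build a route request dict for two (lat, lon) points with bicycle costing.

    use_roads and use_hills are preferences between 0 and 1.
    """
    for name, value in (("use_roads", use_roads), ("use_hills", use_hills)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return {
        "locations": [
            {"lat": start[0], "lon": start[1]},
            {"lat": end[0], "lon": end[1]},
        ],
        "costing": "bicycle",
        "costing_options": {
            "bicycle": {
                "bicycle_type": bicycle_type,
                "use_roads": use_roads,
                "use_hills": use_hills,
            }
        },
    }


# A rider on a mountain bike who avoids roads and does not mind hills
body = json.dumps(
    bicycle_route_request((40.70, -74.00), (40.75, -73.98), "Mountain", 0.1, 0.9)
)
```

Because the costing is applied at request time, the same tile dataset can serve very different riders without rebuilding anything.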

  • routing tiles
    • local
      • roads and paths
    • arterial
      • remove road and paths
    • highways
      • trunks, highways
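
As a sketch of how the three levels can be addressed: Valhalla's tile documentation describes fixed-size lat/lon tiles per level (4 degrees for highway, 1 degree for arterial, 0.25 degrees for local), numbered row by row from the south-west corner. Those sizes come from the Valhalla docs, not from the talk notes, so treat them as an assumption here:

```python
# Tile size in degrees per hierarchy level (assumed from Valhalla's tile docs):
# 0 = highway, 1 = arterial, 2 = local
TILE_SIZE_DEG = {0: 4.0, 1: 1.0, 2: 0.25}


def tile_id(level, lat, lon):
    """Row-major tile index for a point, counting from (-90, -180)."""
    size = TILE_SIZE_DEG[level]
    columns = int(360 / size)
    row = int((lat + 90) / size)
    col = int((lon + 180) / size)
    return row * columns + col
```

The same point falls in one coarse highway tile and one much smaller local tile, so a long route can stay on the sparse upper levels and only touch local tiles near its ends.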

Precision agriculture

tractors are autosteered

  • connected to large WIFI network
  • 200k pts of data per day
  • multiple crop management zones
  • plant seeds based on the soil characteristics
  • know exactly where the fields are

sats, drones, stationary sensors

  • 1 billion points a day
  • all the APIs are different
  • 3-4 week turnaround for download
  • 99% has spatial attributes

big mongo bucket => postgis

filo db? cassandra with spatial hooks

US Sugar has an Esri license they aren't using

hardware vendors are fighting sharing with opendata

using spotfire for analytics

Mapping the planet from outer space

earth observatory

sharing spatial imagery formats is a challenge

lots of different products: processing levels, coverage, time

NISAR is going to dramatically increase the data coming in per day

GitHub: nasa-gibs

all this stuff is developed in house

TIE brings everything together into a Meta-Raster-Format

investigating move to the cloud
