@grischard
Created September 7, 2023 17:42

Planet on S3 proposal

Goals

  • Move planet.openstreetmap.org to S3
  • Maintain existing URLs to latest files
  • Maximise backwards compatibility
  • Make it possible to get notified of new files by subscribing to a prefix via AWS SNS
  • Simplify directory structure
    • Paul's goals:
      • put PBF and XML side by side -> ok
      • minutely replication under the same tree as the planet -> won't do, it would make subscriptions too noisy
      • split out changesets and discussions and put them under the same tree as changeset replication
      • structure notes so we can add replication at some point

Proposed setup

A bucket called planet.openstreetmap.org, configured for static website hosting, holds all the files. A backup is kept outside of AWS.

Proposed layout

/planet/
  ├── changesets-latest.osm.bz2 (Redirect for backwards compatibility)
  ├── changesets-latest.osm.bz2.torrent (Redirect for backwards compatibility)
  ├── planet-latest.osm.pbf (Redirect)
  ├── planet-latest.osm.pbf.md5 (Redirect)
  ├── planet-latest.osm.pbf.torrent (Redirect)
  ├── planet-latest.osm.bz2 (Redirect)
  ├── planet-latest.osm.bz2.md5 (Redirect)
  ├── planet-latest.osm.bz2.torrent (Redirect)
  ├── /YYYY/
  │   ├── planet-YYMMDD.osm.pbf (with md5 and torrent)
  │   └── planet-YYMMDD.osm.bz2 (with md5 and torrent)
  ├── planet-pbf-rss.xml
  ├── planet-bz2-rss.xml
  └── /replication/
      ├── /minute/
      ├── /hour/
      └── /day/

/full_history/
  ├── history-latest.osm.bz2 (Redirect)
  ├── history-latest.osm.bz2.md5 (Redirect)
  ├── history-latest.osm.bz2.torrent (Redirect)
  ├── history-bz2-rss.xml
  └── /YYYY/
      └── history-YYMMDD.osm.bz2 (with md5 and torrent)

/statistics/
  └── data_stats.html

/changesets/
  ├── /replication/
  │   └── /changesets/
  ├── /discussions/
  │   ├── discussions-latest.osm.bz2 (Redirect)
  │   ├── discussions-latest.osm.bz2.md5 (Redirect)
  │   ├── discussions-latest.osm.bz2.torrent (Redirect)
  │   ├── discussions-bz2-rss.xml
  │   └── /YYYY/
  │       └── discussions-YYMMDD.osm.bz2 (with md5 and torrent)
  └── /history/
      ├── history-latest.osm.bz2 (Redirect)
      ├── history-latest.osm.bz2.md5 (Redirect)
      ├── history-latest.osm.bz2.torrent (Redirect)
      ├── changesets-bz2-rss.xml
      └── /YYYY/
          └── history-YYMMDD.osm.bz2 (with md5 and torrent)

/notes/
  └── /archive/
      ├── planet-notes-latest.osn.bz2 (Redirect)
      ├── planet-notes-latest.osn.bz2.md5 (Redirect)
      └── /YYYY/
          └── planet-notes-YYMMDD.osn.bz2 (with md5)

/tile_logs/
  └── /YYYY/
      ├── tiles-YYYY-MM-DD.txt.xz
      ├── apps-YYYY-MM-DD.csv
      └── hosts-YYYY-MM-DD.csv

/users/
  ├── /deleted/
  │   └── users_deleted.txt
  └── /agreed/
      └── (three lists, unchanged)

Plus unchanged archives: gps, cc-by-sa, etc.
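Most dated files in the layout above follow one pattern, <prefix>/YYYY/<name>-YYMMDD.<ext>; the tile logs are the exception, since they use full YYYY-MM-DD dates. A minimal sketch of a helper that builds such a key, assuming the generation scripts have the date as YYYY-MM-DD. The function name dated_key is hypothetical:

```shell
# Hypothetical helper: build <prefix>/YYYY/<name>-YYMMDD.<ext>
# $1: directory prefix (e.g. planet, full_history, changesets/discussions)
# $2: file name stem (e.g. planet, history, discussions)
# $3: date as YYYY-MM-DD
# $4: extension (e.g. osm.pbf, osm.bz2, osm.bz2.md5)
dated_key() {
  yyyy=${3%%-*}   # year, e.g. 2023 for 2023-09-04
  rest=${3#*-}    # month and day, e.g. 09-04
  mm=${rest%%-*}  # month, e.g. 09
  dd=${rest#*-}   # day, e.g. 04
  yy=${yyyy#??}   # drop the century, e.g. 23
  printf '%s/%s/%s-%s%s%s.%s\n' "$1" "$yyyy" "$2" "$yy" "$mm" "$dd" "$4"
}
```

For example, `dated_key planet planet 2023-09-04 osm.pbf` prints `planet/2023/planet-230904.osm.pbf`, and `dated_key notes/archive planet-notes 2023-09-04 osn.bz2` prints `notes/archive/2023/planet-notes-230904.osn.bz2`.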

Handling of redirects

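S3 has no symlinks. For a bucket configured for static website hosting, however, an object can carry the x-amz-website-redirect-location metadata, and the website endpoint then answers HTTP requests for that object with a 301 redirect. The planet export script could set it by copying the small placeholder object onto itself: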

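# Copy the small "latest" placeholder onto itself, replacing its metadata so
# the website endpoint redirects HTTP clients to the newest dated file.
# $LATEST_BZ2 is that file's key, e.g. planet/2023/planet-230904.osm.bz2.
aws s3api copy-object \
  --bucket "$BUCKET_NAME" \
  --copy-source "$BUCKET_NAME/planet/planet-latest.osm.bz2" \
  --key "planet/planet-latest.osm.bz2" \
  --website-redirect-location "/$LATEST_BZ2" \
  --content-type "text/plain" \
  --metadata-directive "REPLACE"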
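The command above repoints a single placeholder, but each planet export produces six "latest" files: PBF and bz2, each with an md5 and a torrent. A sketch of a loop that repoints all of them after an upload. It assumes the placeholders already exist as planet/planet-latest.<ext> and that the dated files follow planet/YYYY/planet-YYMMDD.<ext>. The function name and the AWS override variable are hypothetical; the override only exists so the commands can be printed instead of run:

```shell
# Hypothetical sketch: after uploading the planet files dated $1 (YYYY-MM-DD),
# point every planet-latest.* placeholder at the matching dated file.
# BUCKET_NAME must be set; set AWS=echo to print the commands (dry run).
update_planet_redirects() {
  yyyy=${1%%-*}
  rest=${1#*-}
  yymmdd="${yyyy#??}${rest%%-*}${rest#*-}"   # e.g. 230904
  for ext in osm.pbf osm.pbf.md5 osm.pbf.torrent \
             osm.bz2 osm.bz2.md5 osm.bz2.torrent; do
    placeholder="planet/planet-latest.$ext"
    # Website redirect targets must start with "/" (or http:// / https://).
    target="/planet/$yyyy/planet-$yymmdd.$ext"
    ${AWS:-aws} s3api copy-object \
      --bucket "$BUCKET_NAME" \
      --copy-source "$BUCKET_NAME/$placeholder" \
      --key "$placeholder" \
      --website-redirect-location "$target" \
      --content-type "text/plain" \
      --metadata-directive "REPLACE" || return 1
  done
}
```

For a dry run, `AWS=echo BUCKET_NAME=planet.openstreetmap.org update_planet_redirects 2023-09-04` prints the six commands without contacting AWS. The redirects are updated only after the upload, so a placeholder never points at a file that is not there yet.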

Clients using the S3 API (s3:// URLs) ignore this redirect metadata and receive the placeholder object's contents instead. The placeholder should therefore be small and contain a short help text:

"To access the latest file over s3://, see https://wiki.openstreetmap.org/wiki/Planet/S3"
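Because s3:// clients only see the help text, they need another way to find the newest file. One option is to list the prefix and pick the highest dated key. The sketch below assumes that the bucket allows anonymous listing and that keys follow planet/YYYY/planet-YYMMDD.<ext>. The helper newest_key is hypothetical and accepts either bare keys or the lines printed by `aws s3 ls --recursive`:

```shell
# Hypothetical helper: read a key listing on stdin and print the newest dated
# planet file whose extension is exactly $1 (e.g. osm.pbf, osm.bz2).
newest_key() {
  # Escape the dots in the extension so grep matches them literally.
  ext=$(printf '%s' "$1" | sed 's/\./\\./g')
  # Keep only the last column (the key), select dated files with that
  # extension, and take the highest name: YYYY/...-YYMMDD sorts chronologically.
  awk '{ print $NF }' | grep "/planet-[0-9]\{6\}\.$ext\$" | sort | tail -n 1
}
```

For example, `aws s3 ls s3://planet.openstreetmap.org/planet/ --recursive --no-sign-request | newest_key osm.pbf` would print the key to pass to `aws s3 cp`. The placeholder and the md5 and torrent files are skipped because they do not end in a six-digit date followed by the exact extension.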

Further work

  • Write/port README files for each directory
  • Write the documentation for the root directory
  • Write code examples to fetch the latest files over HTTP, BitTorrent, and S3
  • Patch the generation code to update the redirects after uploading each file
  • Patch the RSS generation code to use the new paths
  • Patch the torrent generation code to use the new paths