Created September 7, 2023 17:42

Planet on S3 proposal


  • Move to S3
  • Maintain existing URLs to latest files
  • Maximise backwards compatibility
  • Make it possible to get notifications for new files by subscribing to a prefix over AWS SNS
  • Simplify directory structure
    • Paul's goals:
      • put PBF and XML side-by-side -> ok
      • minutely replication under same tree as planet -> won't do, makes subscriptions too noisy
      • split out changesets/discussions and put under same tree as changeset replication
      • structure notes so we can add replication at some point

Proposed setup

A bucket is set up for static website hosting. All the files are hosted in that bucket. We keep a backup outside of AWS.
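
One way the prefix notifications from the goals could be wired up is an S3 event notification with a prefix filter that publishes to an SNS topic. The sketch below only generates the JSON for that configuration. The function name, the topic ARN and the "planet/" prefix are assumptions for illustration, since this proposal does not fix topic names or top-level prefixes.

```shell
# Sketch: print an S3 notification configuration that publishes
# "object created" events for one key prefix to one SNS topic.
# Arguments: topic ARN, key prefix (both hypothetical, supplied by the caller).
# The values are inserted verbatim, so they must not contain JSON
# special characters such as double quotes or backslashes.
notification_config() {
  topic_arn=$1
  prefix=$2
  cat <<EOF
{
  "TopicConfigurations": [
    {
      "TopicArn": "${topic_arn}",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "${prefix}" }
          ]
        }
      }
    }
  ]
}
EOF
}

# Possible usage (assumes $TOPIC_ARN and $BUCKET_NAME are set):
#   notification_config "$TOPIC_ARN" "planet/" > notification.json
#   aws s3api put-bucket-notification-configuration \
#     --bucket "$BUCKET_NAME" \
#     --notification-configuration file://notification.json
```

Note that put-bucket-notification-configuration replaces the bucket's whole notification configuration, so a real setup with several prefixes would need all of them in one document, and the topic's access policy has to allow S3 to publish to it.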

Proposed layout

  ├── changeset-latest.osm.bz2 (Redirect for backwards compatibility)
  ├── changeset-latest.osm.bz2.torrent (Redirect for backwards compatibility)
  ├── planet-latest.osm.pbf (Redirect)
  ├── planet-latest.osm.pbf.md5 (Redirect)
  ├── planet-latest.osm.pbf.torrent (Redirect)
  ├── planet-latest.osm.bz2 (Redirect)
  ├── planet-latest.osm.bz2.md5 (Redirect)
  ├── planet-latest.osm.bz2.torrent (Redirect)
  ├── /YYYY/
  │   ├── planet-YYMMDD.osm.pbf (with md5 and torrent)
  │   └── planet-YYMMDD.osm.bz2 (with md5 and torrent)
  ├── planet-pbf-rss.xml
  ├── planet-bz2-rss.xml
  ├── /replication/
  │   ├── /minute/
  │   ├── /hour/
  │   └── /day/

  ├── history-latest.osm.bz2.md5 (Redirect)
  ├── history-bz2-rss.xml
  ├── history-latest.osm.bz2.torrent
  ├── history-latest.osm.bz2 (Redirect)
  ├── /YYYY/
  │   └── history-YYMMDD.osm.bz2 (with md5 and torrent)

  └── data_stats.html

  ├── /replication/
  │   └── changesets/
  ├── /discussions/
  │   ├── discussions-latest.bz2 (Redirect)
  │   ├── discussions-latest.md5 (Redirect)
  │   ├── discussions-latest.torrent (Redirect)
  │   ├── /YYYY/
  │   │   └── discussions-YYMMDD.osm.bz2 (with md5 and torrent)
  │   └── discussions-bz2-rss.xml
  └── /history/
      ├── history-latest.bz2 (Redirect)
      ├── history-latest.md5 (Redirect)
      ├── history-latest.torrent (Redirect)
      ├── /YYYY/
      │   └── history-YYMMDD.osm.bz2 (with md5 and torrent)
      └── changesets-bz2-rss.xml

  ├── /archive/
  │   ├── planet-notes-latest.osn.bz2 (Redirect)
  │   ├── planet-notes-latest.md5 (Redirect)
  │   └── /YYYY/
  │       └── planet-notes-YYMMDD.osn.bz2 (with md5)

  ├── /YYYY/
  │   ├── tiles-YYYY-MM-DD.txt.xz
  │   ├── apps-YYYY-MM-DD.csv
  │   └── hosts-YYYY-MM-DD.csv

  ├── /deleted/
  │   └── users_deleted.txt
  ├── /agreed/
  │   └── (three lists, unchanged)

Plus unchanged archives: gps, cc-by-sa, etc.
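
The dated names in the layout above follow a YYYY/name-YYMMDD.ext pattern. As a rough sketch of how the export script could derive such a key from a dump date, assuming a hypothetical helper called dump_key and an ISO date as input:

```shell
# Sketch: build the dated key for a dump, relative to its tree,
# following the /YYYY/<name>-YYMMDD.<ext> pattern from the layout.
# Arguments: name (e.g. planet), ISO date (YYYY-MM-DD), extension (e.g. osm.pbf).
# dump_key is a hypothetical helper, not part of the existing export code.
dump_key() {
  name=$1
  iso_date=$2
  ext=$3
  year=${iso_date%%-*}    # text before the first dash: YYYY
  rest=${iso_date#*-}     # text after the first dash: MM-DD
  month=${rest%%-*}       # MM
  day=${rest#*-}          # DD
  yy=${year#??}           # drop the century: YY
  printf '%s/%s-%s%s%s.%s\n' "$year" "$name" "$yy" "$month" "$day" "$ext"
}

# Possible usage:
#   LATEST_BZ2=$(dump_key planet 2023-09-04 osm.bz2)
```

The helper returns a key relative to its tree, so the caller would still need to prepend whichever top-level prefix the file lives under.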

Handling of redirects

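S3 has no native symlinks or redirects between objects. An object can, however, carry a website redirect location in its metadata (x-amz-website-redirect-location), which the static website endpoint turns into an HTTP 301 for http clients. After each upload, the planet export script could run: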

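# Copy the placeholder object onto itself, replacing its metadata
# so that the website endpoint redirects to the newest dated file.
aws s3api copy-object \
  --bucket "$BUCKET_NAME" \
  --copy-source "$BUCKET_NAME/planet-latest.osm.bz2" \
  --key planet-latest.osm.bz2 \
  --website-redirect-location "/$LATEST_BZ2" \
  --metadata-directive REPLACE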

Clients using the s3:// protocol do not follow this redirect, however, and get the contents of the placeholder object itself. That object should therefore be small and contain a short help text:

"To access the latest file over s3://, see"
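
A possible alternative to the copy-object call is to upload the placeholder text and set the redirect in a single put-object call. This is only a sketch: update_latest_redirect is a hypothetical function, and the placeholder wording it writes is an assumption, not the final help text.

```shell
# Sketch: (re)create a "latest" placeholder object whose body is a short
# help text and whose website redirect points at the newest dated key.
# Arguments: bucket, latest key, dated target key.
update_latest_redirect() {
  bucket=$1
  latest_key=$2    # e.g. planet-latest.osm.bz2
  target_key=$3    # e.g. the dated key of the file just uploaded
  stub=$(mktemp) || return 1
  # Hypothetical wording; s3:// clients will see this text instead of a redirect.
  printf 'This object is a placeholder. The latest file is s3://%s/%s\n' \
    "$bucket" "$target_key" > "$stub"
  aws s3api put-object \
    --bucket "$bucket" \
    --key "$latest_key" \
    --body "$stub" \
    --content-type "text/plain" \
    --website-redirect-location "/$target_key"
  status=$?
  rm -f "$stub"
  return "$status"
}

# Possible usage:
#   update_latest_redirect "$BUCKET_NAME" planet-latest.osm.bz2 "$LATEST_BZ2"
```

Writing the current target into the body would let s3:// clients find the newest file without an HTTP request, at the cost of the placeholder changing with every dump.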

Further work

  • Write/port README files for each directory
  • Write the documentation for the root directory
  • Write code examples to fetch the latest files, over http, torrent, and s3
  • Patch the generation code to update the redirects after uploading the file
  • Patch the rss generation code to use the new paths
  • Patch the torrent generation code to use the new paths
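
As a starting point for the http code example mentioned in the list above, the following sketch resolves a "latest" name to its dated target by reading the Location header of the redirect. The base URL is an assumption passed in by the caller, and both function names are hypothetical.

```shell
# Sketch: extract the Location header value from HTTP response headers on stdin.
location_from_headers() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Sketch: ask the website endpoint where a "latest" name currently points.
# Arguments: base URL of the website endpoint (assumed), file name.
resolve_latest() {
  base_url=$1
  name=$2
  curl -sI "$base_url/$name" | location_from_headers
}

# Possible usage (assumes $BASE_URL is the bucket's website endpoint):
#   resolve_latest "$BASE_URL" planet-latest.osm.pbf
#   curl -L -O "$BASE_URL/planet-latest.osm.pbf"   # or simply follow the redirect
```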