@jsanz

jsanz/README.md Secret

Last active April 15, 2024 14:36
[Elastic] Geospatial ES repositories

Geospatial ES repositories

The table below (data.csv) lists Elasticsearch datasets available as snapshots. Follow these instructions to restore any of them in your cluster.

How to restore a dataset

First, edit the elasticsearch.yml file on each node of the cluster and add the following setting. It is a static setting, so restart the nodes for it to take effect:

repositories.url.allowed_urls: 
        - "https://storage.googleapis.com/jsanz-bucket/*"

Next, pick a dataset from the table and create a repository for it, using the value in its repo_basepath column:

PUT /_snapshot/{repository_name}
{
  "type": "url",
  "settings": {
    "url": "https://storage.googleapis.com/jsanz-bucket/{repo_basepath}/"
  }
}

Warning ⚠️ Pay attention to the forward slash at the end of the URL!
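
The trailing slash is easy to get wrong when the URL is assembled by hand. As a minimal sketch, the request above could be built in Python like this. The helper name `url_repository_request` and the repository name `osm` are hypothetical, chosen only for illustration; the bucket URL and the `v8/osm` base path come from this page.

```python
import json

# Bucket URL taken from the allowed_urls setting above.
BUCKET_URL = "https://storage.googleapis.com/jsanz-bucket"


def url_repository_request(repository_name, repo_basepath):
    """Build the method, path and JSON body that register a read-only URL repository."""
    # Drop any slashes the caller passed, then add exactly one trailing slash.
    url = f"{BUCKET_URL}/{repo_basepath.strip('/')}/"
    body = {"type": "url", "settings": {"url": url}}
    return "PUT", f"/_snapshot/{repository_name}", body


# "osm" is a hypothetical repository name; "v8/osm" is a repo_basepath from the table.
method, path, body = url_repository_request("osm", "v8/osm")
print(method, path)
print(json.dumps(body, indent=2))
```

The function only builds the request, so the output can be pasted into the console or sent with any HTTP client.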

Check which snapshots the repository contains:

GET _snapshot/{repository_name}/*

Then restore the snapshot. Each snapshot typically holds a single index.

POST /_snapshot/{repository_name}/{snapshot}/_restore

You can override the index settings during the restore. For example, on a single-node cluster you may want to disable replicas:

POST /_snapshot/{repository_name}/{snapshot}/_restore
{
  "index_settings": {
    "index.number_of_replicas": 0
  }
}
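
The same restore can be scripted outside the console. The sketch below uses only the Python standard library and assumes an unsecured cluster reachable at a URL you supply; authentication and TLS are left out. The names `restore_request` and `send`, and the repository name `osm`, are hypothetical.

```python
import json
import urllib.request


def restore_request(repository_name, snapshot, replicas=None):
    """Build the restore call; optionally override the replica count."""
    body = {}
    if replicas is not None:
        body["index_settings"] = {"index.number_of_replicas": replicas}
    return "POST", f"/_snapshot/{repository_name}/{snapshot}/_restore", body


def send(es_url, method, path, body):
    """Send a request to the cluster and return the decoded JSON response."""
    # Send no payload at all when the body is empty.
    data = json.dumps(body).encode("utf-8") if body else None
    request = urllib.request.Request(
        es_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (assumes a local, unsecured cluster and a repository named "osm"):
# send("http://localhost:9200", *restore_request("osm", "osm_andorra", replicas=0))
```

Keeping request construction separate from sending makes it possible to inspect the path and body before anything touches the cluster.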

How to back up an index into a bucket

Follow the instructions here to register a service account in the Elasticsearch keystore, using secondary as the client NAME (that is, the gcs.client.secondary.credentials_file setting).

Create a repository in your cluster with:

PUT _snapshot/{repository_name}
{
  "type": "gcs",
  "settings": {
    "bucket": "jsanz-bucket",
    "base_path": "v8/{repository_name}",
    "client": "secondary"
  }
}

Create a snapshot in the GCP bucket with:

PUT /_snapshot/{repository_name}/{snapshot_name}?wait_for_completion=true
{
  "indices": "index1,index2,...",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "jsanz",
    "taken_because": "{A description of the data backup}"
  }
}
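
When the list of indices comes from a script, it helps to build the request body programmatically. The sketch below joins the index names with commas and no surrounding whitespace, to avoid any ambiguity from stray spaces in the `indices` value. The function name `snapshot_request` and the example names passed to it are hypothetical.

```python
import json


def snapshot_request(repository_name, snapshot_name, indices, taken_by, reason):
    """Build the snapshot call for a list of index names."""
    body = {
        # Comma-separated index names, trimmed of surrounding whitespace.
        "indices": ",".join(name.strip() for name in indices),
        "ignore_unavailable": True,
        "include_global_state": False,
        "metadata": {"taken_by": taken_by, "taken_because": reason},
    }
    path = f"/_snapshot/{repository_name}/{snapshot_name}?wait_for_completion=true"
    return "PUT", path, body


# Hypothetical repository, snapshot and index names, for illustration only.
method, path, body = snapshot_request(
    "my_repo", "my_snapshot", ["index1", " index2"], "jsanz", "Example backup"
)
print(method, path)
print(json.dumps(body, indent=2))
```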

Confirm with:

GET _snapshot/{repository_name}/*

data.csv

| description | version | repo_basepath | snapshot | index | size | docs |
| --- | --- | --- | --- | --- | --- | --- |
| Colorado GHCND observations (datastream) | 8 | v8/ghcnd | ghcnd_daily_obs_ds | ghcnd_daily_obs_ds | 95.7mb | 353480 |
| GHCND observations | 8 | v8/ghcnd | ghcnd_daily_observations | ghcnd_daily_observations | 22.2gb | 66712912 |
| Eruptive pits (Cumbre Vieja) | 8 | v8/cumbre-vieja | snapshot | eruptive_pits | 9.5kb | 16 |
| Earthquakes (Cumbre Vieja) | 8 | v8/cumbre-vieja | snapshot | earthquakes | 2.7mb | 17668 |
| Lava Flows (Cumbre Vieja) | 8 | v8/cumbre-vieja | snapshot | lapalma | 48.2mb | 144 |
| La Palma buildings (Cumbre Vieja) | 8 | v8/cumbre-vieja | snapshot | lapalma_buildings | 255mb | 422444 |
| OSM Italy center | 8 | v8/osm | osm_italy_centro | osm_italy_centro | 16.8gb | 86005418 |
| OSM Estonia | 8 | v8/osm | osm_estonia | osm_estonia | 5.7gb | 25575218 |
| OSM Valencia (Spain) | 8 | v8/osm | osm_spain_valencia | osm_spain_valencia | 4.9gb | 24710000 |
| OSM Arizona (USA) | 8 | v8/osm | osm_usa_arizona | osm_usa_arizona | 10.3gb | 62320000 |
| OSM Andorra | 8 | v8/osm | osm_andorra | osm_andorra | 110mb | 569238 |
| Madrid buildings | 8 | v8/geospatial_demos | madrid_cadastre | madrid_cadastre | 1.1gb | 2090990 |
| Barcelona lines | 8 | v8/geospatial_demos | barcelona_lines | barcelona_lines | 39230 | 39230 |
| Geonames | 8 | v8/geospatial_demos | geonames | geonames | 5.4gb | 23936628 |
| NYC 311 | 7 | nyc311_repo | snapshot_1 | 311 | 13.4gb | 37516678 |
| NYC Boroughs | 7 | nyc311_repo | snapshot_2 | nyc_boroughs | 9.4mb | 10 |

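
To find the repo_basepath and snapshot name for a given index without scanning the table by eye, the data can be loaded with Python's `csv` module. This is a sketch that assumes data.csv is comma-separated with the column names shown above; the excerpt embedded here copies three rows from the table, and the helper name `find_dataset` is hypothetical.

```python
import csv
import io

# Excerpt of data.csv, assuming comma-separated values with this header.
DATA_CSV = """description,version,repo_basepath,snapshot,index,size,docs
OSM Andorra,8,v8/osm,osm_andorra,osm_andorra,110mb,569238
Earthquakes (Cumbre Vieja),8,v8/cumbre-vieja,snapshot,earthquakes,2.7mb,17668
NYC Boroughs,7,nyc311_repo,snapshot_2,nyc_boroughs,9.4mb,10
"""


def find_dataset(csv_text, index_name):
    """Return the row describing index_name, or None if it is not listed."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["index"] == index_name:
            return row
    return None


row = find_dataset(DATA_CSV, "earthquakes")
print(row["repo_basepath"], row["snapshot"])
```

The two values returned are the ones needed to create the repository and to call the restore endpoint.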
@nickpeihl

This is fantastic! Thank you!
