The table below (`data.csv`) lists the Elasticsearch datasets available as snapshots. Follow the instructions below to restore any of them into your cluster.
First, edit the cluster's `elasticsearch.yml` file and add the following setting:

```yaml
repositories.url.allowed_urls:
  - "https://storage.googleapis.com/jsanz-bucket/*"
```
Given a dataset, first create the repository using the value of its `repo_basepath` column:

```json
PUT /_snapshot/{repository_name}
{
  "type": "url",
  "settings": {
    "url": "https://storage.googleapis.com/jsanz-bucket/{repo_basepath}/"
  }
}
```
> ⚠️ **Warning**: pay attention to the forward slash at the end of the URL!
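As a sketch, the repository body can be assembled programmatically so the trailing slash is never forgotten. This is an illustrative Python helper, not part of any official client; `my-dataset` is a made-up `repo_basepath` value.

```python
import json

# Bucket URL from the snippet above.
BUCKET_URL = "https://storage.googleapis.com/jsanz-bucket"

def url_repo_body(repo_basepath):
    """Build the JSON body for registering a read-only URL repository.

    Ensures the URL ends with a forward slash, which the warning
    above calls out as required.
    """
    url = f"{BUCKET_URL}/{repo_basepath}"
    if not url.endswith("/"):
        url += "/"  # the trailing slash Elasticsearch expects
    return json.dumps({"type": "url", "settings": {"url": url}})

# Example with a hypothetical basepath:
print(url_repo_body("my-dataset"))
```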
Check the snapshot contents:

```json
GET _snapshot/{repository_name}/*
```
Then you can restore the snapshot; each snapshot typically holds a single index.

```json
POST /_snapshot/{repository_name}/{snapshot}/_restore
```
You can override the index settings; for example, on a single-node cluster you may want to avoid replicas:

```json
POST /_snapshot/{repository_name}/{snapshot}/_restore
{
  "index_settings": {
    "index.number_of_replicas": 0
  }
}
```
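The restore call above can be sketched as a small Python helper that returns the request path and body. The repository and snapshot names here are hypothetical placeholders.

```python
import json

def restore_request(repository, snapshot, replicas=None):
    """Return the (path, body) pair for a snapshot restore call.

    When `replicas` is given, the body overrides
    `index.number_of_replicas`, e.g. 0 on a single-node cluster.
    """
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    body = {}
    if replicas is not None:
        body["index_settings"] = {"index.number_of_replicas": replicas}
    return path, json.dumps(body)

# Hypothetical repository and snapshot names:
path, body = restore_request("my-repo", "snapshot-1", replicas=0)
print(path)
print(body)
```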
Follow the instructions here to register a service account in the Elasticsearch keystore, using `secondary` as the client `NAME`.
Create a repository in your cluster with:

```json
PUT _snapshot/{repository_name}
{
  "type": "gcs",
  "settings": {
    "bucket": "jsanz-bucket",
    "base_path": "v8/{repository_name}",
    "client": "secondary"
  }
}
```
Create a snapshot in the GCP bucket with:

```json
PUT /_snapshot/{repository_name}/{snapshot_name}?wait_for_completion=true
{
  "indices": "index1,index2,...",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "jsanz",
    "taken_because": "{A description of the data backup}"
  }
}
```
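The snapshot-creation request above can likewise be composed in Python. This sketch mirrors that request (global state excluded, missing indices ignored); all names and the backup reason are illustrative.

```python
import json
from urllib.parse import urlencode

def create_snapshot_request(repository, snapshot, indices, taken_by, reason):
    """Return the (path, body) pair for creating a snapshot.

    `indices` is a list of index names, joined into the
    comma-separated string the snapshot API expects.
    """
    query = urlencode({"wait_for_completion": "true"})
    path = f"/_snapshot/{repository}/{snapshot}?{query}"
    body = {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": False,
        "metadata": {"taken_by": taken_by, "taken_because": reason},
    }
    return path, json.dumps(body)

# Hypothetical names for illustration:
path, body = create_snapshot_request(
    "my-repo", "snapshot-1", ["index1", "index2"], "jsanz", "weekly backup"
)
print(path)
```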
Confirm with:

```json
GET _snapshot/{repository_name}/*
```