A little background on my context:
We run multiple Solr JVMs per box, which live in directories like:
/mnt/solr_8983
/mnt/solr_8984
/mnt/solr_8985
...
/mnt/solr_${PORT}
I also keep a directory at /mnt/solr that is writable by the solr user, which all Solr processes run as.
I have two SolrCloud collections indexing identical content. There are two because they have different performance SLAs and retention requirements.
On solr1-collection1.mycorp.com:
curl "http://localhost:8983/solr/collection1_shard1_replica1/replication?command=backup&location=/mnt/solr/data"
Note that the backup call above is asynchronous. You can check for completion by polling this endpoint:
curl "http://localhost:8983/solr/collection1_shard1_replica1/replication?command=details&wt=json"
Once the backup has completed you'll see something like this (for example, by piping the curl above into jq .details.backup):
[
"startTime",
"Mon Oct 13 18:18:19 UTC 2014",
"fileCount",
408,
"status",
"success",
"snapshotCompletedAt",
"Mon Oct 13 18:20:29 UTC 2014",
"snapshotName",
null
]
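The polling step can be scripted. What follows is a minimal sketch, not part of the original procedure: the function names backup_status and wait_for_backup are hypothetical, and the interval and retry defaults are arbitrary. It assumes the backup section keeps the flat key/value layout shown above, and it treats only a status of "success" as finished, since other status values may differ between Solr versions.

```shell
#!/bin/sh
# Hypothetical helper: read a replication "details" response on stdin and
# print the string value that follows the "status" key in the backup section.
# Accepts both the flat-array layout ("status","success") and a map layout
# ("status":"success"). The numeric responseHeader status is ignored because
# it is not a quoted string.
backup_status() {
  tr -d ' \n' | sed -n 's/.*"status"[,:]"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: poll the details URL until the backup reports success.
# $1 = details URL, $2 = seconds between polls (default 5),
# $3 = maximum number of polls (default 120).
# Returns 0 on success, 1 if the poll limit is reached first.
wait_for_backup() {
  url=$1
  interval=${2:-5}
  max_tries=${3:-120}
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    if [ "$(curl -s "$url" | backup_status)" = "success" ]; then
      return 0
    fi
    sleep "$interval"
    tries=$((tries + 1))
  done
  return 1
}
```

Usage would look like: wait_for_backup "http://localhost:8983/solr/collection1_shard1_replica1/replication?command=details&wt=json" && echo "backup done". Note that a status left over from an earlier backup would also read as "success", so comparing startTime against the time you issued the backup call is a sensible extra check.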
This endpoint is also useful for estimating how much disk space the backup will occupy, since it reports the size of the index on disk.
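As a rough sketch of that size check, the snippet below pulls the reported index size out of the details response so it can be compared by eye with the free space on the backup volume. The function name index_size is hypothetical, and the sketch assumes the response carries the size under an indexSize key as a human-readable string; confirm that against the output from your own Solr version.

```shell
#!/bin/sh
# Hypothetical helper: read a replication "details" response on stdin and
# print the value of the "indexSize" key (assumed to be a quoted,
# human-readable string such as "1.5 GB").
index_size() {
  tr -d '\n' | sed -n 's/.*"indexSize" *[,:] *"\([^"]*\)".*/\1/p'
}

# Example (commented out so that sourcing this file makes no network calls):
# curl -s "http://localhost:8983/solr/collection1_shard1_replica1/replication?command=details&wt=json" | index_size
# df -h /mnt/solr
```

Running df -h against the backup location right after this gives the two numbers to compare before you start the backup.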
On solr1-collection2.mycorp.com:
cd /mnt/tmp/index_transfer
scp -r solr1-collection1.mycorp.com:/mnt/solr/data/snapshot.20141013181819643 .
cd /mnt/solr_8983/home/myconfig/collection2_shard1_replica1/
# stop indexers
curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collection2_shard1_replica1"
sudo mv ./data ./data_old
sudo mkdir ./data && sudo chown solr:solr ./data
sudo mv /mnt/tmp/index_transfer/snapshot.20141013181819643/ ./data/index
sudo chown -R solr:solr ./data
curl "http://localhost:8983/solr/admin/collections?action=CREATESHARD&shard=shard1&collection=collection2&node=solr1-collection2.mycorp.com:8983_solr"
Note that you only need to update the port designation in the node parameter; SolrCloud will handle the routing for you otherwise.
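If you repeat this for several JVMs on the same box, the CREATESHARD call can be generated per port. This is a small sketch under assumptions: createshard_url is a hypothetical helper, and it always sends the request to localhost:8983, changing only the port inside the node parameter, as described above.

```shell
#!/bin/sh
# Hypothetical helper: print the CREATESHARD URL for a shard that should live
# on the Solr JVM listening on a given port of a given host.
# $1 = node host, $2 = node port, $3 = collection, $4 = shard
# The request itself always goes to localhost:8983; only the port inside the
# node parameter changes.
createshard_url() {
  node_host=$1
  node_port=$2
  collection=$3
  shard=$4
  printf 'http://localhost:8983/solr/admin/collections?action=CREATESHARD&shard=%s&collection=%s&node=%s:%s_solr\n' \
    "$shard" "$collection" "$node_host" "$node_port"
}
```

For example, curl "$(createshard_url solr1-collection2.mycorp.com 8984 collection2 shard1)" would target the JVM on port 8984.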