CASSANDRA SNAPSHOTTING
//READ nodetool snapshot documentation in cassandra
//READ stack-overflow-topic gist or directly in stackoverflow -> http://stackoverflow.com/questions/25465904/how-can-i-restore-cassandra-snapshots
//EXPORT the database schemas
cqlsh -e "DESCRIBE SCHEMA" > my_backup_name.schema
//Create snapshot of the whole server
nodetool snapshot my_backup_name
//Compress the backups
tar czf my_backup_name.tgz cassandra/data/*/*/snapshots/my_backup_name
//Copy the schema file {my_backup_name.schema} to the other server (see scp command)
//Copy the snapshot file {my_backup_name.tgz} to the other server
//Restore the schema on the other server
cqlsh -f my_backup_name.schema
//Extract the tar
tar xzf my_backup_name.tgz
//Execute the script from the gist called restore_snapshot-location_snapshot-name.sh with the following parameters
restore_snapshot-location_snapshot-name.sh {extractedTarLocation} {my_backup_name}
//PROFIT :)
#!/bin/bash
# Usage: restore_snapshot-location_snapshot-name.sh {extractedTarLocation} {my_backup_name}
# ${1}: location the tar was extracted to. It is prepended directly to
#       var/lib/cassandra/data, so it must end with a slash.
# ${2}: snapshot name.
KEYSPACES=`cqlsh -e "DESCRIBE KEYSPACES"`
for KEYSPACE in $KEYSPACES; do
    # Skip Cassandra's own keyspaces.
    if [ "$KEYSPACE" = "system" ]; then
        echo "------------------SYSTEM KEYSPACE ${KEYSPACE}: SKIPPING---------------------------"
        continue
    fi
    if [ "$KEYSPACE" = "system_traces" ]; then
        echo "------------------SYSTEM KEYSPACE ${KEYSPACE}: SKIPPING---------------------------"
        continue
    fi
    echo "-------------------Restoring: ${KEYSPACE}-------------------------"
    echo "Tables to restore:"
    # Table names are taken from the CREATE TABLE statements of the keyspace.
    TABLE_LIST=`cqlsh -e "DESCRIBE KEYSPACE ${KEYSPACE}" | grep "CREATE TABLE" | sed -e 's+CREATE TABLE++' | sed -e 's+(++'`
    echo $TABLE_LIST
    for table in $TABLE_LIST; do
        SNAPSHOT_DIR="${1}var/lib/cassandra/data/${KEYSPACE}/${table}/snapshots/${2}"
        if [ ! -d "${SNAPSHOT_DIR}" ]; then
            echo SKIP "${SNAPSHOT_DIR}"
        else
            echo START COPY "${SNAPSHOT_DIR}"
            # Copy the snapshot files into the live table directory, then load them.
            sudo cp -a "${SNAPSHOT_DIR}/." "/var/lib/cassandra/data/${KEYSPACE}/${table}/"
            echo COPIED TO "/var/lib/cassandra/data/${KEYSPACE}/${table}"
            sudo chown -R cassandra "/var/lib/cassandra/data/${KEYSPACE}/${table}"
            sudo nodetool refresh -- "${KEYSPACE}" "${table}"
            echo NODE TOOL SUCCESS
        fi
    done
    echo "Restored: ${KEYSPACE}----------------"
done
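The script above skips only system and system_traces. Later Cassandra versions ship further internal keyspaces (for example system_auth, system_schema and system_distributed), so a prefix match may be more robust. A minimal sketch, assuming every keyspace named `system` or starting with `system_` is internal; `is_system_keyspace` and `filter_user_keyspaces` are hypothetical helper names, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) for Cassandra's internal keyspaces.
# Assumption: internal keyspaces are named "system" or start with "system_",
# so a user keyspace with such a name would be skipped as well.
is_system_keyspace() {
  case "$1" in
    system|system_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print only the non-system keyspaces from the given list, one per line.
filter_user_keyspaces() {
  local ks
  for ks in "$@"; do
    if ! is_system_keyspace "$ks"; then
      echo "$ks"
    fi
  done
}
```

In the restore loop this could replace the two explicit checks, for example `for KEYSPACE in $(filter_user_keyspaces $KEYSPACES); do ...`.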
I'm building a backup and restore process for a Cassandra database so that it's ready when I need it, and so that I understand the details in order to build something that will work for production. I'm following DataStax's instructions here:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html.
As a start, I'm seeding the database on a dev box then attempting to make the backup/restore work. Here's the backup script:
#!/bin/bash
cd /opt/apache-cassandra-2.0.9
./bin/nodetool clearsnapshot -t after_seeding makeyourcase
./bin/nodetool snapshot -t after_seeding makeyourcase
cd /var/lib/
tar czf after_seeding.tgz cassandra/data/makeyourcase/*/snapshots/after_seeding
Yes, tar is not the most efficient way, perhaps, but I'm just trying to get something working right now. I've checked the tar, and all the files are there.
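The check of the archive contents can also be scripted. A minimal sketch, assuming the archive was created as in the backup script so that every entry path contains `/snapshots/<snapshot name>`; `archive_only_has_snapshot` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the archive is non-empty and every
# entry lies under a snapshots/<name> directory.
# Usage: archive_only_has_snapshot after_seeding.tgz after_seeding
archive_only_has_snapshot() {
  local archive="$1" name="$2" entries stray
  # List the archive; fail if tar cannot read it.
  entries=$(tar tzf "$archive") || return 1
  [ -n "$entries" ] || return 1
  # Collect entries that do not belong to the snapshot.
  stray=$(printf '%s\n' "$entries" | grep -v -E "/snapshots/${name}(/|\$)" || true)
  [ -z "$stray" ]
}
```

It only inspects entry paths. It does not verify that every table actually has a snapshot directory in the archive.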
Once the database is backed up, I shut down Cassandra and my app, then rm -rf /var/lib/cassandra/ to simulate a complete loss.
Now to restore the database. Restoration "Method 2" from http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html is more compatible with my schema-creation component than Method 1.
So, Method 2/Step 1, "Recreate the schema": Restart Cassandra, then my app. The app is built to recreate the schema on startup when necessary. Once it's up, there's a working Cassandra node with a schema for the app, but no data.
Method 2/Step 2 "Restore the snapshot": They give three alternatives, the first of which is to use sstableloader, documented at http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html. The folder structure that the loader requires is nothing like the folder structure created by the snapshot tool, so everything has to be moved into place. Before going to all that trouble, I'll just try it out on one table:
>./bin/sstableloader makeyourcase/users
Error: Could not find or load main class org.apache.cassandra.tools.BulkLoader
Hmmm, well, that's not going to work. BulkLoader is in ./lib/apache-cassandra-2.0.9.jar, but the loader doesn't seem to be set up to work out of the box. Rather than debug the tool, let's move on to the second alternative, copying the snapshot directory into the makeyourcase/users/snapshots/ directory. This should be easy, since we're throwing the snapshot directory right back where it came from, so tar xzf after_seeding.tgz should do the trick:
cd /var/lib/
tar xzf after_seeding.tgz
chmod -R u+rwx cassandra/data/makeyourcase
and that puts the snapshot directories back under their respective 'snapshots' directories, and a refresh should restore the data:
cd /opt/apache-cassandra-2.0.9
./bin/nodetool refresh -- makeyourcase users
This runs without complaint. Note that you have to run this for each and every table, so you have to generate the list of tables first. But, before we do that, note that there's something interesting in the Cassandra logs:
INFO 14:32:26,319 Loading new SSTables for makeyourcase/users...
INFO 14:32:26,326 No new SSTables were found for makeyourcase/users
So, we put the snapshot back, but Cassandra didn't find it. I also tried moving the snapshot directory under the existing SSTables directory, and copying the old SSTable files into the existing directory, with the same message in the log. Cassandra doesn't log where it expects to find them, just that it can't find them. The docs say to put them into a directory named data/keyspace/table_name-UUID, but there is no such directory. There is one named data/makeyourcase/users/snapshots/1408820504987-users/, but putting the snapshot dir there, or the individual files, didn't work.
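Since the refresh runs without complaint even when nothing is loaded, it may help to check the log after each refresh. A minimal sketch that scans log text for the message shown above; `refresh_found_sstables` is a hypothetical helper, and the location of the log file is an assumption the caller has to supply:

```shell
#!/bin/bash
# Hypothetical helper: reads Cassandra log text on stdin and fails if the
# last SSTable-loading message for keyspace/table says nothing was found.
# It succeeds otherwise, including when the log has no message for the table.
# Usage: refresh_found_sstables makeyourcase users < /path/to/system.log
refresh_found_sstables() {
  local target="$1/$2" last
  # Keep only the last loading-related line that ends with this table's name.
  last=$(grep -E "SSTables.* for ${target}(\.\.\.)?\$" | tail -n 1 || true)
  case "$last" in
    *"No new SSTables were found"*) return 1 ;;
    *) return 0 ;;
  esac
}
```

A restore loop could call this after each `nodetool refresh` and stop at the first table that reports no new SSTables.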
The third alternative, the "Node restart method" doesn't look suitable for a multi-node production environment, so I didn't try that.
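Going back to the first alternative: sstableloader takes the keyspace and table names from the last two components of the directory it is given, so the snapshot files would have to be staged into a `<keyspace>/<table>` directory first. A minimal sketch of that staging step, assuming the Cassandra 2.0 on-disk layout used in this question; `stage_for_sstableloader` and the staging path are hypothetical, and the loader itself is not invoked here:

```shell
#!/bin/bash
# Hypothetical helper: copy one table's snapshot files into
# <staging>/<keyspace>/<table> and print the resulting directory.
# Usage: stage_for_sstableloader /var/lib/cassandra/data makeyourcase users after_seeding /tmp/staging
stage_for_sstableloader() {
  local data_dir="$1" keyspace="$2" table="$3" snapshot="$4" staging="$5"
  local src="${data_dir}/${keyspace}/${table}/snapshots/${snapshot}"
  local dest="${staging}/${keyspace}/${table}"
  [ -d "$src" ] || { echo "no snapshot at ${src}" >&2; return 1; }
  mkdir -p "$dest" || return 1
  # "/." copies the directory contents rather than the directory itself.
  cp -a "${src}/." "${dest}/" || return 1
  echo "$dest"
}

# The staged directory could then be passed to the loader, for example:
#   sstableloader -d <host> /tmp/staging/makeyourcase/users
```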
Edit:
Just to make this perfectly explicit for the next person, here are the preliminary, working backup and restore scripts that apply the accepted answer.
myc_backup.sh:
#!/bin/bash
cd ~/bootstrap/apache-cassandra-2.0.9
./bin/nodetool clearsnapshot -t after_seeding makeyourcase
./bin/nodetool snapshot -t after_seeding makeyourcase
cd /var/lib/
tar czf after_seeding.tgz cassandra/data/makeyourcase/*/snapshots/after_seeding
myc_restore.sh:
#!/bin/bash
cd /var/lib/
tar xzf after_seeding.tgz
chmod -R u+rwx cassandra/data/makeyourcase
cd ~/bootstrap/apache-cassandra-2.0.9
TABLE_LIST=`./bin/nodetool cfstats makeyourcase | grep "Table: " | sed -e 's+^.*: ++'`
for TABLE in $TABLE_LIST; do
    echo "Restore table ${TABLE}"
    cd /var/lib/cassandra/data/makeyourcase/${TABLE}
    if [ -d "snapshots/after_seeding" ]; then
        cp snapshots/after_seeding/* .
        cd ~/bootstrap/apache-cassandra-2.0.9
        ./bin/nodetool refresh -- makeyourcase ${TABLE}
        cd /var/lib/cassandra/data/makeyourcase/${TABLE}
        rm -rf snapshots/after_seeding
        echo " Table ${TABLE} restored."
    else
        echo " >>> Nothing to restore."
    fi
done
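The restore script hard-codes the keyspace, snapshot name and install path. A minimal sketch of a parameterized variant, assuming the Cassandra 2.0 layout where each table directory is named after the table. Unlike the script above, it walks the table directories instead of parsing `nodetool cfstats`, and it leaves the snapshot directory in place. `restore_keyspace` and the `NODETOOL` override variable are hypothetical names:

```shell
#!/bin/bash
# Hypothetical generalization of myc_restore.sh for any keyspace and snapshot.
# NODETOOL is an assumed override for the nodetool path (default: "nodetool").
# Usage: restore_keyspace /var/lib/cassandra/data makeyourcase after_seeding
restore_keyspace() {
  local data_dir="$1" keyspace="$2" snapshot="$3"
  local nodetool="${NODETOOL:-nodetool}" table_dir table snap_dir
  for table_dir in "${data_dir}/${keyspace}"/*/; do
    # An unmatched glob stays literal, so skip anything that is not a directory.
    [ -d "$table_dir" ] || continue
    table=$(basename "$table_dir")
    snap_dir="${table_dir}snapshots/${snapshot}"
    if [ -d "$snap_dir" ]; then
      # Copy the SSTable files next to the live data, then ask Cassandra to load them.
      cp -a "${snap_dir}/." "$table_dir" || return 1
      "$nodetool" refresh -- "$keyspace" "$table" || return 1
      echo "Table ${table} restored."
    else
      echo "Table ${table}: nothing to restore."
    fi
  done
}
```

With a tarball install, as in the question, the call might look like `NODETOOL=~/bootstrap/apache-cassandra-2.0.9/bin/nodetool restore_keyspace /var/lib/cassandra/data makeyourcase after_seeding`.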