CockroachDB supports enterprise grade backup and restore using object storage targets. For local development, a userfile
scheme was introduced to allow fast prototyping without a heavy burden of spinning up costly infra. A customer requested ability to quickly validate whether they can recover from a complete tear down of a cluster without using object storage. This is my experience and current workaround to get this to work.
Recover from an Oops with CockroachDB
This tutorial takes a short detour compared to my other article due to a specific requirement that we need to recover from a local disaster into a new cluster. userfile
is very helpful but it assumes the cluster is up and restore can proceed as intended. However, when you intend to shut down the source cluster, remove all of the underlying storage along with the user space where userfile
stores its backups, userfile
stops being useful. In that case, we have to download the backups out of the user space prior to shutting down the cluster and removing the data volumes.
- Start a three node CockroachDB cluster in Docker
- Load data
- Issue a backup
- Download the backup to an external location
- Terminate the cluster and remove data volumes
- Start a new cluster
- Upload the full backup from an external location into the user space
- Restore from the full backup
- Verify
- Clean up
cockroach start \
--insecure \
--store=path=node1 \
--listen-addr=127.0.0.1 \
--port=26257 \
--http-port=8080 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background
cockroach start \
--insecure \
--store=path=node2 \
--listen-addr=127.0.0.1 \
--port=26259 \
--http-port=8081 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background
cockroach start \
--insecure \
--store=path=node3 \
--listen-addr=127.0.0.1 \
--port=26261 \
--http-port=8082 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background
cockroach init --insecure
cockroach workload \
fixtures import tpcc \
'postgresql://root@127.0.0.1:26257?sslmode=disable'
cockroach sql --insecure
BACKUP DATABASE tpcc TO 'userfile://defaultdb.public.userfiles_root/tpcc-backup' AS OF SYSTEM TIME '-10s';
\q
cockroach userfile list --insecure
tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM
tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM
tpcc-backup/BACKUP-STATISTICS
tpcc-backup/BACKUP_MANIFEST
tpcc-backup/BACKUP_MANIFEST-CHECKSUM
tpcc-backup/data/671996324698980353.sst
tpcc-backup/data/671996324712611841.sst
tpcc-backup/data/671996325002772481.sst
tpcc-backup/data/671996325674582018.sst
tpcc-backup/data/671996325674647554.sst
tpcc-backup/data/671996325861130241.sst
tpcc-backup/data/671996326321553409.sst
tpcc-backup/data/671996328409071617.sst
tpcc-backup/data/671996328410742785.sst
tpcc-backup/data/671996331349704706.sst
tpcc-backup/data/671996339443695618.sst
mkdir backups
cockroach userfile get tpcc --insecure backups
downloading tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM..downloaded tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM to backups/tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM (4 B)
downloaded tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM to backups/tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM (4 B)
downloaded tpcc-backup/BACKUP-STATISTICS to backups/tpcc-backup/BACKUP-STATISTICS (47 KiB)
downloaded tpcc-backup/BACKUP_MANIFEST to backups/tpcc-backup/BACKUP_MANIFEST (2.9 KiB)
downloaded tpcc-backup/BACKUP_MANIFEST-CHECKSUM to backups/tpcc-backup/BACKUP_MANIFEST-CHECKSUM (4 B)
downloaded tpcc-backup/data/671996324698980353.sst to backups/tpcc-backup/data/671996324698980353.sst (973 B)
downloaded tpcc-backup/data/671996324712611841.sst to backups/tpcc-backup/data/671996324712611841.sst (94 KiB)
downloaded tpcc-backup/data/671996325002772481.sst to backups/tpcc-backup/data/671996325002772481.sst (3.3 MiB)
downloaded tpcc-backup/data/671996325674582018.sst to backups/tpcc-backup/data/671996325674582018.sst (960 KiB)
downloaded tpcc-backup/data/671996325674647554.sst to backups/tpcc-backup/data/671996325674647554.sst (2.8 MiB)
downloaded tpcc-backup/data/671996325861130241.sst to backups/tpcc-backup/data/671996325861130241.sst (1.5 KiB)
downloaded tpcc-backup/data/671996326321553409.sst to backups/tpcc-backup/data/671996326321553409.sst (554 KiB)
downloaded tpcc-backup/data/671996328409071617.sst to backups/tpcc-backup/data/671996328409071617.sst (507 KiB)
downloaded tpcc-backup/data/671996328410742785.sst to backups/tpcc-backup/data/671996328410742785.sst (488 KiB)
downloaded tpcc-backup/data/671996331349704706.sst to backups/tpcc-backup/data/671996331349704706.sst (6.5 MiB)
downloaded tpcc-backup/data/671996339443695618.sst to backups/tpcc-backup/data/671996339443695618.sst (5.1 MiB)
➜ 070121 tree backups
backups
└── tpcc-backup
├── BACKUP-CHECKPOINT-671996313361481729-CHECKSUM
├── BACKUP-CHECKPOINT-CHECKSUM
├── BACKUP-STATISTICS
├── BACKUP_MANIFEST
├── BACKUP_MANIFEST-CHECKSUM
└── data
├── 671996324698980353.sst
├── 671996324712611841.sst
├── 671996325002772481.sst
├── 671996325674582018.sst
├── 671996325674647554.sst
├── 671996325861130241.sst
├── 671996326321553409.sst
├── 671996328409071617.sst
├── 671996328410742785.sst
├── 671996331349704706.sst
└── 671996339443695618.sst
2 directories, 16 files
➜ 070121 pkill cockroach
➜ 070121 initiating graceful shutdown of server
initiating graceful shutdown of server
initiating graceful shutdown of server
server drained and shutdown completed
server drained and shutdown completed
rm -rf node*
cockroach start \
--insecure \
--store=path=node1 \
--listen-addr=127.0.0.1 \
--port=26257 \
--http-port=8080 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background
cockroach start \
--insecure \
--store=path=node2 \
--listen-addr=127.0.0.1 \
--port=26259 \
--http-port=8081 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background
cockroach start \
--insecure \
--store=path=node3 \
--listen-addr=127.0.0.1 \
--port=26261 \
--http-port=8082 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background
cockroach init --insecure
cockroach userfile upload backups/tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM --insecure tpcc-backup/BACKUP-CHECKPOINT-672014548244594689-CHECKSUM
cockroach userfile upload backups/tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM --insecure tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM
cockroach userfile upload backups/tpcc-backup/BACKUP-STATISTICS --insecure tpcc-backup/BACKUP-STATISTICS
cockroach userfile upload backups/tpcc-backup/BACKUP_MANIFEST --insecure tpcc-backup/BACKUP_MANIFEST
cockroach userfile upload backups/tpcc-backup/BACKUP_MANIFEST-CHECKSUM --insecure tpcc-backup/BACKUP_MANIFEST-CHECKSUM
cockroach userfile upload backups/tpcc-backup/data/671996324698980353.sst --insecure tpcc-backup/data/671996324698980353.sst
cockroach userfile upload backups/tpcc-backup/data/671996324712611841.sst --insecure tpcc-backup/data/671996324712611841.sst
cockroach userfile upload backups/tpcc-backup/data/671996325002772481.sst --insecure tpcc-backup/data/671996325002772481.sst
cockroach userfile upload backups/tpcc-backup/data/671996325674582018.sst --insecure tpcc-backup/data/671996325674582018.sst
cockroach userfile upload backups/tpcc-backup/data/671996325674647554.sst --insecure tpcc-backup/data/671996325674647554.sst
cockroach userfile upload backups/tpcc-backup/data/671996325861130241.sst --insecure tpcc-backup/data/671996325861130241.sst
cockroach userfile upload backups/tpcc-backup/data/671996326321553409.sst --insecure tpcc-backup/data/671996326321553409.sst
cockroach userfile upload backups/tpcc-backup/data/671996328409071617.sst --insecure tpcc-backup/data/671996328409071617.sst
cockroach userfile upload backups/tpcc-backup/data/671996328410742785.sst --insecure tpcc-backup/data/671996328410742785.sst
cockroach userfile upload backups/tpcc-backup/data/671996331349704706.sst --insecure tpcc-backup/data/671996331349704706.sst
cockroach userfile upload backups/tpcc-backup/data/671996339443695618.sst --insecure tpcc-backup/data/671996339443695618.sst
cockroach sql --insecure
restore database tpcc FROM 'userfile://defaultdb.public.userfiles_root/tpcc-backup';
USE tpcc;
SHOW TABLES;
schema_name | table_name | type | owner | estimated_row_count | locality
--------------+------------+-------+-------+---------------------+-----------
public | customer | table | root | 30000 | NULL
public | district | table | root | 10 | NULL
public | history | table | root | 30000 | NULL
public | item | table | root | 100000 | NULL
public | new_order | table | root | 9000 | NULL
public | order | table | root | 30000 | NULL
public | order_line | table | root | 300343 | NULL
public | stock | table | root | 100000 | NULL
public | warehouse | table | root | 1 | NULL
(9 rows)
SELECT COUNT(*) FROM tpcc.order_line;
count
----------
300343
(1 row)
Our engineering team acknowledges the gaps in userfile upload
and they advised me there is work in progress to address the issue. There is a PR out to make userfile upload
recursive and it will be coming in the next release. In the meantime, here's a handy script to automate the upload and admittedly, it can be improved to handle more error logic but I am a bit in a hurry here.
# parent dir
for x in backups/tpcc-backup/*;
do
echo $x
cockroach userfile upload ./$x $x --url='postgres://root@127.0.0.1:26257?sslmode=disable';
done
# child data dir
for x in backups/tpcc-backup/data/*;
do
echo $x
cockroach userfile upload ./$x $x --url='postgres://root@127.0.0.1:26257?sslmode=disable';
done
Then restoring the table into a new cluster will look like so
restore database tpcc FROM 'userfile://defaultdb.public.userfiles_root/backups/tpcc-backup';
pkill cockroach
rm -rf node*
rm -rf backups