Skip to content

Instantly share code, notes, and snippets.

@dbist
Last active December 9, 2022 13:20
Show Gist options
  • Save dbist/1a470798216e74b69cc4ea35ae96e0da to your computer and use it in GitHub Desktop.
Save dbist/1a470798216e74b69cc4ea35ae96e0da to your computer and use it in GitHub Desktop.

Restore a database from a userfile backup


CockroachDB supports enterprise grade backup and restore using object storage targets. For local development, a userfile scheme was introduced to allow fast prototyping without a heavy burden of spinning up costly infra. A customer requested ability to quickly validate whether they can recover from a complete tear down of a cluster without using object storage. This is my experience and current workaround to get this to work.


Related articles

Recover from an Oops with CockroachDB


Motivation

This tutorial takes a short detour compared to my other article due to a specific requirement that we need to recover from a local disaster into a new cluster. userfile is very helpful but it assumes the cluster is up and restore can proceed as intended. However, when you intend to shut down the source cluster, remove all of the underlying storage along with the user space where userfile stores its backups, userfile stops being useful. In that case, we have to download the backups out of the user space prior to shutting down the cluster and removing the data volumes.

High Level Steps

  • Start a three node CockroachDB cluster in Docker
  • Load data
  • Issue a backup
  • Download the backup to an external location
  • Terminate the cluster and remove data volumes
  • Start a new cluster
  • Upload the full backup from an external location into the user space
  • Restore from the full backup
  • Verify
  • Clean up

Step by step instructions

Start a 3 node cluster

cockroach start \
--insecure \
--store=path=node1 \
--listen-addr=127.0.0.1 \
--port=26257 \
--http-port=8080 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background

cockroach start \
--insecure \
--store=path=node2 \
--listen-addr=127.0.0.1 \
--port=26259 \
--http-port=8081 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background

cockroach start \
--insecure \
--store=path=node3 \
--listen-addr=127.0.0.1 \
--port=26261 \
--http-port=8082 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background

Initialize the cluster

cockroach init --insecure

Load some data

cockroach workload \              
 fixtures import tpcc \
 'postgresql://root@127.0.0.1:26257?sslmode=disable'

Login into sql shell to backup a database

cockroach sql --insecure

Run a full backup of the generated database

BACKUP DATABASE tpcc TO 'userfile://defaultdb.public.userfiles_root/tpcc-backup' AS OF SYSTEM TIME '-10s';

\q

The full backup for this particular table contains 16 files total

cockroach userfile list --insecure


tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM
tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM
tpcc-backup/BACKUP-STATISTICS
tpcc-backup/BACKUP_MANIFEST
tpcc-backup/BACKUP_MANIFEST-CHECKSUM
tpcc-backup/data/671996324698980353.sst
tpcc-backup/data/671996324712611841.sst
tpcc-backup/data/671996325002772481.sst
tpcc-backup/data/671996325674582018.sst
tpcc-backup/data/671996325674647554.sst
tpcc-backup/data/671996325861130241.sst
tpcc-backup/data/671996326321553409.sst
tpcc-backup/data/671996328409071617.sst
tpcc-backup/data/671996328410742785.sst
tpcc-backup/data/671996331349704706.sst
tpcc-backup/data/671996339443695618.sst

Create an external directory to the cluster

mkdir backups

Download the tpcc backup to an external directory

cockroach userfile get tpcc --insecure backups

downloading tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM..downloaded tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM to backups/tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM (4 B)
downloaded tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM to backups/tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM (4 B)
downloaded tpcc-backup/BACKUP-STATISTICS to backups/tpcc-backup/BACKUP-STATISTICS (47 KiB)
downloaded tpcc-backup/BACKUP_MANIFEST to backups/tpcc-backup/BACKUP_MANIFEST (2.9 KiB)
downloaded tpcc-backup/BACKUP_MANIFEST-CHECKSUM to backups/tpcc-backup/BACKUP_MANIFEST-CHECKSUM (4 B)
downloaded tpcc-backup/data/671996324698980353.sst to backups/tpcc-backup/data/671996324698980353.sst (973 B)
downloaded tpcc-backup/data/671996324712611841.sst to backups/tpcc-backup/data/671996324712611841.sst (94 KiB)
downloaded tpcc-backup/data/671996325002772481.sst to backups/tpcc-backup/data/671996325002772481.sst (3.3 MiB)
downloaded tpcc-backup/data/671996325674582018.sst to backups/tpcc-backup/data/671996325674582018.sst (960 KiB)
downloaded tpcc-backup/data/671996325674647554.sst to backups/tpcc-backup/data/671996325674647554.sst (2.8 MiB)
downloaded tpcc-backup/data/671996325861130241.sst to backups/tpcc-backup/data/671996325861130241.sst (1.5 KiB)
downloaded tpcc-backup/data/671996326321553409.sst to backups/tpcc-backup/data/671996326321553409.sst (554 KiB)
downloaded tpcc-backup/data/671996328409071617.sst to backups/tpcc-backup/data/671996328409071617.sst (507 KiB)
downloaded tpcc-backup/data/671996328410742785.sst to backups/tpcc-backup/data/671996328410742785.sst (488 KiB)
downloaded tpcc-backup/data/671996331349704706.sst to backups/tpcc-backup/data/671996331349704706.sst (6.5 MiB)
downloaded tpcc-backup/data/671996339443695618.sst to backups/tpcc-backup/data/671996339443695618.sst (5.1 MiB)

Confirm there are 16 files and 2 directories

➜  070121 tree backups 
backups
└── tpcc-backup
    ├── BACKUP-CHECKPOINT-671996313361481729-CHECKSUM
    ├── BACKUP-CHECKPOINT-CHECKSUM
    ├── BACKUP-STATISTICS
    ├── BACKUP_MANIFEST
    ├── BACKUP_MANIFEST-CHECKSUM
    └── data
        ├── 671996324698980353.sst
        ├── 671996324712611841.sst
        ├── 671996325002772481.sst
        ├── 671996325674582018.sst
        ├── 671996325674647554.sst
        ├── 671996325861130241.sst
        ├── 671996326321553409.sst
        ├── 671996328409071617.sst
        ├── 671996328410742785.sst
        ├── 671996331349704706.sst
        └── 671996339443695618.sst

2 directories, 16 files

Stop the cluster and delete all of the data directories

➜  070121 pkill cockroach
➜  070121 initiating graceful shutdown of server
initiating graceful shutdown of server
initiating graceful shutdown of server
server drained and shutdown completed
server drained and shutdown completed

delete data directories

rm -rf node*

start a new cluster

cockroach start \
--insecure \
--store=path=node1 \
--listen-addr=127.0.0.1 \
--port=26257 \
--http-port=8080 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background

cockroach start \
--insecure \
--store=path=node2 \
--listen-addr=127.0.0.1 \
--port=26259 \
--http-port=8081 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background

cockroach start \
--insecure \
--store=path=node3 \
--listen-addr=127.0.0.1 \
--port=26261 \
--http-port=8082 \
--join=127.0.0.1:26257,127.0.0.1:26259,127.0.0.1:26261 \
--background

Initialize the cluster

cockroach init --insecure

Upload the full backup to the new cluster, directory structure must be preserved

cockroach userfile upload backups/tpcc-backup/BACKUP-CHECKPOINT-671996313361481729-CHECKSUM --insecure tpcc-backup/BACKUP-CHECKPOINT-672014548244594689-CHECKSUM

cockroach userfile upload backups/tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM --insecure tpcc-backup/BACKUP-CHECKPOINT-CHECKSUM

cockroach userfile upload backups/tpcc-backup/BACKUP-STATISTICS --insecure tpcc-backup/BACKUP-STATISTICS

cockroach userfile upload backups/tpcc-backup/BACKUP_MANIFEST --insecure tpcc-backup/BACKUP_MANIFEST
cockroach userfile upload backups/tpcc-backup/BACKUP_MANIFEST-CHECKSUM --insecure tpcc-backup/BACKUP_MANIFEST-CHECKSUM

cockroach userfile upload backups/tpcc-backup/data/671996324698980353.sst --insecure tpcc-backup/data/671996324698980353.sst

cockroach userfile upload backups/tpcc-backup/data/671996324712611841.sst --insecure tpcc-backup/data/671996324712611841.sst

cockroach userfile upload backups/tpcc-backup/data/671996325002772481.sst --insecure tpcc-backup/data/671996325002772481.sst

cockroach userfile upload backups/tpcc-backup/data/671996325674582018.sst --insecure tpcc-backup/data/671996325674582018.sst

cockroach userfile upload backups/tpcc-backup/data/671996325674647554.sst --insecure tpcc-backup/data/671996325674647554.sst

cockroach userfile upload backups/tpcc-backup/data/671996325861130241.sst --insecure tpcc-backup/data/671996325861130241.sst

cockroach userfile upload backups/tpcc-backup/data/671996326321553409.sst --insecure tpcc-backup/data/671996326321553409.sst

cockroach userfile upload backups/tpcc-backup/data/671996328409071617.sst --insecure tpcc-backup/data/671996328409071617.sst

cockroach userfile upload backups/tpcc-backup/data/671996328410742785.sst --insecure tpcc-backup/data/671996328410742785.sst

cockroach userfile upload backups/tpcc-backup/data/671996331349704706.sst --insecure tpcc-backup/data/671996331349704706.sst

cockroach userfile upload backups/tpcc-backup/data/671996339443695618.sst --insecure tpcc-backup/data/671996339443695618.sst

Restore the database into the new cluster

cockroach sql --insecure
restore database tpcc FROM 'userfile://defaultdb.public.userfiles_root/tpcc-backup';

Verify

USE tpcc;
SHOW TABLES;
schema_name | table_name | type  | owner | estimated_row_count | locality
--------------+------------+-------+-------+---------------------+-----------
  public      | customer   | table | root  |               30000 | NULL
  public      | district   | table | root  |                  10 | NULL
  public      | history    | table | root  |               30000 | NULL
  public      | item       | table | root  |              100000 | NULL
  public      | new_order  | table | root  |                9000 | NULL
  public      | order      | table | root  |               30000 | NULL
  public      | order_line | table | root  |              300343 | NULL
  public      | stock      | table | root  |              100000 | NULL
  public      | warehouse  | table | root  |                   1 | NULL
(9 rows)
SELECT COUNT(*) FROM tpcc.order_line;
  count
----------
  300343
(1 row)

UPDATE

Our engineering team acknowledges the gaps in userfile upload and they advised me there is work in progress to address the issue. There is a PR out to make userfile upload recursive and it will be coming in the next release. In the meantime, here's a handy script to automate the upload and admittedly, it can be improved to handle more error logic but I am a bit in a hurry here.

# parent dir
for x in backups/tpcc-backup/*; 
do 
    echo $x
    cockroach userfile upload ./$x $x --url='postgres://root@127.0.0.1:26257?sslmode=disable';
done

# child data dir
for x in backups/tpcc-backup/data/*; 
do 
    echo $x
    cockroach userfile upload ./$x $x --url='postgres://root@127.0.0.1:26257?sslmode=disable';
done

Then restoring the table into a new cluster will look like so

restore database tpcc FROM 'userfile://defaultdb.public.userfiles_root/backups/tpcc-backup';

Cleanup

pkill cockroach
rm -rf node*
rm -rf backups
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment