@Slach
Last active September 14, 2021 12:52
clickhouse-backup diff every hour

How to install this example:

mkdir -p /opt/clickhouse-backup-diff/
git clone https://gist.github.com/d6390005975b9643344e9ee935fa882a /opt/clickhouse-backup-diff/
chmod +x /opt/clickhouse-backup-diff/*.sh
cp -fv /opt/clickhouse-backup-diff/cron_example.txt /etc/cron.d/clickhouse_backup_diff
cp -fv /opt/clickhouse-backup-diff/clickhouse-backup-config-example.yml /etc/clickhouse-backup/config.yml

How to restore the last created backup from remote:

last_remote_backup="$(clickhouse-backup list remote | tail -n 1 | cut -d " " -f 1)"
clickhouse-backup download "${last_remote_backup}"
clickhouse-backup restore --rm "${last_remote_backup}"
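The pipeline above takes the last line of `clickhouse-backup list remote` and keeps the first whitespace-separated field, i.e. the backup name. A minimal sketch of that extraction against made-up listing output (the stub function and its column layout are assumptions, not the tool's exact format):

```shell
# Stub standing in for `clickhouse-backup list remote`:
# one backup per line, name first, oldest first (format assumed).
list_remote_stub() {
  printf '%s\n' \
    "full-2021-09-14-00-00-01   1.2GiB   14/09/2021 00:00:30   remote" \
    "diff-2021-09-14-01-00-01   100MiB   14/09/2021 01:00:12   remote"
}

# last line -> first space-separated field = newest backup's name
last_remote_backup="$(list_remote_stub | tail -n 1 | cut -d " " -f 1)"
echo "${last_remote_backup}"
```

The same `tail | cut` chain works against the real command once you trust its output format; if the listing ever changes column order, only the `cut` field needs adjusting.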
# place this file into /etc/clickhouse-backup/config.yml
general:
  # keep the last 24 backups (one full day of hourly runs); this can allocate
  # extra space, because hardlinked data is not freed during background merges
  keep_backup_local: 24
  keep_backup_remote: 24
  remote_storage: s3
s3:
  compression_level: 1
  compression_format: gzip
  sse: AES256
  access_key: "your_s3_access_key"
  secret_key: "your_s3_secret_key"
  bucket: "backup-bucket-name"
  endpoint: "https://your-endpoint-host"
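With `keep_backup_local: 24` and hourly cron runs, the retention window is roughly one day of backups. A toy sketch of what that policy amounts to, pruning a fabricated list of 26 hourly backup names down to the newest 24 (the pruning itself is done by clickhouse-backup, not by this snippet):

```shell
# Illustration only: mimic a keep-last-24 retention policy on fake names.
keep=24
all_backups=$(for h in $(seq -w 0 25); do echo "diff-2021-09-14-${h}-00-00"; done)

# keep only the newest $keep entries (list is oldest-first)
kept=$(printf '%s\n' "${all_backups}" | tail -n "${keep}")

printf '%s\n' "${kept}" | wc -l   # 24 backups survive; the 2 oldest are dropped
```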
#!/bin/bash
# put this script into /opt/clickhouse-backup-diff/clickhouse-backup-cron.sh
# and don't forget: chmod +x /opt/clickhouse-backup-diff/clickhouse-backup-cron.sh
set +x
run_diff=$1
backup_date=$(date +%Y-%m-%d-%H-%M-%S)
if [[ "${run_diff}" == "run_diff" && "$(clickhouse-backup list local | wc -l)" -ge 2 ]]; then
  echo "create diff local backup"
  clickhouse-backup create "diff-${backup_date}"
  echo "upload backup as diff from the previous backup (we were run with the 'run_diff' parameter)"
  clickhouse-backup upload --diff-from "$(clickhouse-backup list local | tail -n 2 | head -n 1 | cut -d " " -f 1)" "diff-${backup_date}"
elif [[ "${run_diff}" == "" ]]; then
  echo "create full local backup"
  clickhouse-backup create "full-${backup_date}"
  echo "upload backup as full, and remove all old backups: hardlinks keep allocating disk space even after data parts are merged in the background"
  KEEP_BACKUPS_LOCAL=1 KEEP_BACKUPS_REMOTE=1 clickhouse-backup upload "full-${backup_date}"
fi
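The script branches on its first argument, and for a diff upload it selects the second-to-last line of `clickhouse-backup list local` as the `--diff-from` base. That selection logic can be exercised standalone against a stubbed backup list (the stub and its column layout are assumptions):

```shell
# Stub standing in for `clickhouse-backup list local` (format assumed,
# oldest backup first, newest last).
list_local_stub() {
  printf '%s\n' \
    "full-2021-09-14-00-00-01   1.2GiB   14/09/2021 00:00:30   local" \
    "diff-2021-09-14-01-00-01   100MiB   14/09/2021 01:00:12   local"
}

run_diff="run_diff"
# same guard as the cron script: diff mode needs at least one previous backup
if [[ "${run_diff}" == "run_diff" && "$(list_local_stub | wc -l)" -ge 2 ]]; then
  # second-to-last line -> first field = the backup to diff against
  prev_backup="$(list_local_stub | tail -n 2 | head -n 1 | cut -d " " -f 1)"
  echo "would run: clickhouse-backup upload --diff-from ${prev_backup} diff-..."
fi
```

With fewer than two local backups the guard fails and nothing is printed, which matches the real script falling through to (or past) the full-backup branch.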
# put this file into /etc/cron.d/clickhouse_backup_diff on the server where clickhouse-server is installed
MAILTO=your_email_for_notification_about_backup_process
# run create + upload of a full backup at 0:00
0 0 * * * clickhouse /opt/clickhouse-backup-diff/clickhouse-backup-cron.sh
# run create + upload of a diff backup every hour except 0:00
0 1-23 * * * clickhouse /opt/clickhouse-backup-diff/clickhouse-backup-cron.sh run_diff
@gowth

gowth commented Sep 7, 2021

Slach,

Even when there is a difference between the previous backup and the latest backup, it is not uploaded; the script goes straight to the else branch.

@gowth

gowth commented Sep 9, 2021

clickhouse /bin/clickhouse-backup-cron.sh run_diff not giving any output

@gowth

gowth commented Sep 9, 2021

This is my bash script:

set +x
run_diff=$1
backup_date=$(date +%Y-%M-%d-%H-%M-%S)

if [[ "run_diff" == "${run_diff}" && "2" -le "$(clickhouse-backup list local | wc -l)" ]]; then
  echo "create diff local backup"
  clickhouse-backup create --tables=BBYFlow.aggFlowsFive_local_table,BBYFlow.aggFlowsFive_local_v1,BBYFlow.aggFlowsHour_local,BBYFlow.aggFlowsHour_local_table,BBYFlow.aggFlowsPrefix_local,BBYFlow.aggFlowsPrefix_local_table "diff-${backup_date}"
  echo "upload backup as diff from previous backup, when we run with 'run_diff' parameter"
  clickhouse-backup upload --diff-from "$(clickhouse-backup list local | tail -n 2 | head -n 1 | cut -d " " -f 1)" "diff-${backup_date}"
elif [[ "" == "${run_diff}" ]]; then
  echo "create full local backup"
  clickhouse-backup create --tables=BBYFlow.aggFlowsFive_local_table,BBYFlow.aggFlowsFive_local_v1,BBYFlow.aggFlowsHour_local,BBYFlow.aggFlowsHour_local_table,BBYFlow.aggFlowsPrefix_local,BBYFlow.aggFlowsPrefix_local_table "full-${backup_date}"
  echo "upload backup as full, and remove all old backups to avoid allocate too much extra space cause hardlinks will still allocate disk space even after data parts merged in background"
  clickhouse-backup upload "full-${backup_date}"
fi

@gowth

gowth commented Sep 14, 2021

The script produces duplicate rows each time we restore the diff backups on the new server.

@Slach
Author

Slach commented Sep 14, 2021

@gowth did you try to run clickhouse-backup restore --rm <backup-name> ?
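For context on why `--rm` matters here: as I understand it, `restore --rm` drops the existing tables before attaching the backup's data, while a plain restore attaches the backup's parts on top of whatever rows the tables already hold. A toy simulation of that difference with plain shell word lists, no ClickHouse involved (`existing`, `backup` and the row names are made up):

```shell
# Toy data: rows already in the target table, and rows in the backup.
existing="row1 row2"
backup="row1 row2 row3"

# restore WITHOUT --rm: backup parts are attached alongside existing data
no_rm="${existing} ${backup}"
echo "${no_rm}" | wc -w    # 5 rows: row1 and row2 are now duplicated

# restore WITH --rm: existing data is dropped first, then the backup attached
with_rm="${backup}"
echo "${with_rm}" | wc -w  # 3 rows: no duplicates
```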
