@oofnikj
Last active November 19, 2024 01:10
PostgreSQL dump / restore to Google Cloud Storage
#!/bin/bash
# Dump / restore PostgreSQL databases to and from a Google Cloud Storage bucket.

PGUSER=${PGUSER:-"postgres"}
CATALOG_BUCKET="normalized-catalog"

# List all databases except the default system / template databases.
_get_dbs() {
  psql -U "$PGUSER" -c "COPY (
    SELECT datname FROM pg_database
    WHERE datname NOT IN (
      'postgres',
      'template0',
      'template1'
    )
  ) TO STDOUT;"
}

# Dump a single database in compressed custom format and stream it to GCS.
_pg_dump() {
  echo "dumping database $1..."
  pg_dump -U "$PGUSER" -d "$1" \
    --format=c \
    --compress=1 \
    | gsutil cp - "gs://$CATALOG_BUCKET/$1.pgdump"
}

# Emit a dump on stdout, either from stdin ('-') or from the GCS bucket.
_cat() {
  if [[ $1 == "-" ]]; then
    echo "restoring from stdin..." >&2
    cat
  else
    echo "restoring database $1 from GCS..." >&2
    gsutil cat "gs://$CATALOG_BUCKET/$1.pgdump"
  fi
}

# Restore a dump, dropping and recreating the database if it already exists.
_pg_restore() {
  _cat "$1" \
    | pg_restore -d postgres -U "$PGUSER" \
      --create \
      --no-tablespaces \
      --clean \
      --if-exists
}

_usage() {
  cat <<EOF
Usage:
  ${BASH_SOURCE[0]} [ dump_all | dump | restore ] [database]

Commands 'dump' and 'restore' require the user to provide a database name.
If the database name is '-', the database will be restored from standard input.
'dump_all' dumps all databases in compressed pg_dump custom format.
EOF
  exit 1
}

case $1 in
  dump_all)
    for db in $(_get_dbs); do
      _pg_dump "$db"
    done
    ;;
  dump)
    [[ -n $2 ]] || { echo 'must provide database name' >&2; exit 1; }
    _pg_dump "$2"
    ;;
  restore)
    [[ -n $2 ]] || { echo 'must provide database name' >&2; exit 1; }
    _pg_restore "$2"
    ;;
  *)
    _usage
    ;;
esac
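
For reference, a minimal usage sketch (the filename pg_gcs.sh and the database name mydb are illustrative; the target bucket is hardcoded in CATALOG_BUCKET at the top of the script, and the connection user defaults to postgres via PGUSER):

# dump every non-template database to the bucket
./pg_gcs.sh dump_all

# dump or restore a single database by name
./pg_gcs.sh dump mydb
./pg_gcs.sh restore mydb

# restore from a dump supplied on standard input
gsutil cat gs://normalized-catalog/mydb.pgdump | ./pg_gcs.sh restore -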

pnmartinez commented Feb 16, 2022

Hi @oofnikj. Good gist, even if you only use parts of it.

I have an odd case that you can probably help with.

I've hit a limit, though: GCS has a hard limit of 5 TB for a single object, so I'm now looking for a way to handle backups larger than 5 TB.

Note: generating a file locally and then splitting it to upload the parts is not an option here, due to disk space constraints on the DB server. I cannot generate a complete backup file on the server.

So the backup must be dumped with pg_dump, split, and streamed to GCS in a single operation.

I would kindly ask you to review my approach below, which basically pipes Linux's split utility into the middle of the pg_dump | gsutil cp pipeline:

# pg_dump, piped to split, piped to gsutil cp
pg_dump -d $DB -b --format=t \
    | split -b 50G - \
    | gsutil cp - gs://$BUCKET/$BACKUP
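
(A possible streaming variant, sketched here for reference only: GNU coreutils split has a --filter option that pipes each chunk to a command instead of writing chunk files to local disk. The bucket, prefix, and 50G chunk size are illustrative, and each resulting object stays under the 5 TB limit. Chunk suffixes aa, ab, ... sort lexicographically, which matches the order gsutil wildcards list objects in, so the chunks can be concatenated back on restore.)

# BUCKET must be exported so the filter sub-shell started by split can see it;
# split sets $FILE to each chunk's name ("$BACKUP".aa, "$BACKUP".ab, ...)
export BUCKET
pg_dump -U "$PGUSER" -d "$DB" --format=c --compress=1 \
    | split -b 50G --filter='gsutil cp - "gs://$BUCKET/$FILE"' - "$BACKUP".

# restore by concatenating the chunks back into a single stream
gsutil cat "gs://$BUCKET/$BACKUP.*" \
    | pg_restore -U "$PGUSER" -d postgres --create --clean --if-exists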

I've posted on StackOverflow, in case you'd like the reputation.

@digitalml

@pnmartinez Did you solve this issue? Did split work?

@pnmartinez

> @pnmartinez Did you solve this issue? Did split work?

Hi! Short answer: yes, it works. Long answer: no, it does not.

The script works, though I had serious problems with the connection not being stable enough. GCS has some parameters to allow resuming interrupted uploads, but I guess piping three different commands together makes it fail at some other stage. So it works, but it's very brittle.

In the end, the task I needed this for was cancelled, so I never looked for an alternative. I must say, however, that I did not find anything better on the interwebs at the time.
