Setting Up a Replica for Backups for Postgres 15 in Docker

Introduction

Backing up a write-heavy database like Synapse can be a challenge: in my case, a dump of the database would take >15 minutes and cause all sorts of performance and locking issues in the process.

This guide will walk you through setting up replication from a primary Postgres 15 Docker container to a secondary container dedicated for backups. By the end, you'll have a backup system that's efficient, minimizes performance hits, and ensures data safety.

Preparing Docker Compose

Below is an example of my database entry in docker-compose.yml:

volumes:
  sockets:

services:
  db:
    cpus: 4
    image: postgres:alpine
    environment:
      POSTGRES_DB: synapse
      POSTGRES_USER: synapse
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_INITDB_ARGS: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
    mem_limit: 8G
    restart: always
    volumes:
      - sockets:/sockets
      - ./pgsql:/var/lib/postgresql/data

As you can see, I'm using a "sockets" volume for Unix socket communication, which, as well as avoiding unnecessary open TCP ports, provides a lower-latency connection when the containers are on the same host.

To do the same, just ensure you have this in your postgresql.conf to let Postgres know where to write its sockets:

unix_socket_directories = '/sockets'

If you're not using sockets (e.g. your replica is on a different host), you may need to adjust some of the later steps to replicate over TCP instead.
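
To confirm the socket directory is actually in use, you can query the running primary over the socket itself (using the container name that appears later in this guide - adjust it to match yours):

docker exec -it synapse-db-1 psql -h /sockets -U synapse -d synapse -c "SHOW unix_socket_directories;"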

I've then added this replica, which is almost identical except for the standby configuration and lower resource limits:

  db-replica:
    cpus: 2
    depends_on:
      - db
    image: postgres:alpine
    environment:
      POSTGRES_DB: synapse
      POSTGRES_USER: synapse
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_STANDBY_MODE: "on"
      POSTGRES_PRIMARY_CONNINFO: host=/sockets user=synapse password=${POSTGRES_PASSWORD}
    mem_limit: 2G
    restart: always
    volumes:
      - sockets:/sockets
      - ./pgreplica:/var/lib/postgresql/data

You can try setting lower limits, but I prefer to allow the replica 2 cores to avoid replication interruptions while the backup runs, as on fast storage the dump can easily drive one core to 100%.
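
If you want to see whether the replica actually hits those limits while a backup is running, Docker's built-in stats view is enough (again, adjust the container name to match your setup):

docker stats synapse-db-replica-1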

Configuration

  1. Primary Postgres Configuration:

    Now, you'll likely want this at the bottom of your postgresql.conf to make sure it's ready to replicate:

    hot_standby = on
    archive_mode = off
    wal_level = replica
    max_wal_senders = 3
    wal_keep_size = 1024

    It'll need to be restarted for these changes to take effect, which is safest done now, before copying the data:

    docker compose down db && docker compose up db -d
  2. Preparing Replica Data:

    Postgres replication involves streaming updates as they're made to the database, so to start we'll need to create a duplicate of the current database to use for the replica.

    You can create a copy of your entire database like this, just substitute the container name and user as required:

    docker exec -it synapse-db-1 pg_basebackup -h /sockets -U synapse -D /tmp/pgreplica

    The data is initially written to /tmp/ inside the container as it's safest for permissions. We can then move it to /var/lib/postgresql/data/ so we can more easily access it from the host OS:

    docker exec -it synapse-db-1 mv /tmp/pgreplica /var/lib/postgresql/data/

    You should now be able to access the data from the host, so move it into a new directory for your replica and update the ownership to match your existing Postgres data directory:

    mv ./pgsql/pgreplica ./
    chown -R 70:1000 ./pgreplica
  3. Replica Postgres Configuration:

    Now for the replica's postgresql.conf, add this to the bottom to tell it that it's a secondary and scale back its resource usage as it won't be actively serving clients:

    port = 5433
    hot_standby = on
    checkpoint_timeout = 30min
    shared_buffers = 512MB
    effective_cache_size = 1GB
    maintenance_work_mem = 128MB
    work_mem = 4MB
    max_wal_size = 2GB
    max_parallel_workers_per_gather = 1
    max_parallel_workers = 1
    max_parallel_maintenance_workers = 1
  4. Primary Postgres Replication:

    This will instruct the primary to allow replication:

    # Enable replication for the user (connecting via the socket directory configured earlier)
    docker exec -it your_primary_container_name psql -h /sockets -U synapse -c "ALTER USER synapse WITH REPLICATION;"

    # Create a replication slot
    docker exec -it your_primary_container_name psql -h /sockets -U synapse -c "SELECT * FROM pg_create_physical_replication_slot('replica_slot_name');"
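
A replication slot only takes effect once the replica is configured to use it; otherwise it simply sits inactive. A minimal sketch, assuming the slot name created above, is to add this line to the bottom of the replica's postgresql.conf alongside the settings from step 3:

primary_slot_name = 'replica_slot_name'

You can later confirm the slot exists and is being consumed by querying the primary:

docker exec -it synapse-db-1 psql -h /sockets -U synapse -d synapse -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"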

Starting Replication

Once you run docker compose up db-replica -d, your new replica should be running.

Running this command confirms that the primary sees the replica and is streaming data to it:

docker exec -it synapse-db-1 psql -h /sockets -U synapse -d synapse -c "SELECT application_name, state, sync_priority, sync_state, pg_current_wal_lsn() - sent_lsn AS bytes_behind FROM pg_stat_replication;"

The output should look something like this:

 application_name |   state   | sync_priority | sync_state | bytes_behind
------------------+-----------+---------------+------------+--------------
 walreceiver      | streaming |             0 | async      |            0
(1 row)
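
You can also check the lag from the replica's own side: pg_last_xact_replay_timestamp() returns the commit time of the last replayed transaction, so its difference from now() gives a rough estimate of how far behind the replica is. Note the -p 5433, since the replica listens on that port as configured above:

docker exec -it synapse-db-replica-1 psql -h /sockets -p 5433 -U synapse -d synapse -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"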

Replica Logs

When running docker logs synapse-db-replica-1 (adjusting your replica's name as necessary) we should now see messages distinct from the primary's typical "checkpoint" logs. Here's a concise breakdown using an example log:

LOG:  entering standby mode
LOG:  consistent recovery state reached at [WAL location]
LOG:  invalid record length at [WAL location]: wanted [X], got 0
LOG:  started streaming WAL from primary at [WAL location] on timeline [X]
LOG:  restartpoint starting: [reason]
LOG:  restartpoint complete: ...
LOG:  recovery restart point at [WAL location]

Key Points:

  • Entering Standby Mode: The replica is ready to receive WAL records.
  • Consistent Recovery State: The replica is synchronized with the primary's WAL records.
  • Invalid Record Length: An informational message indicating the end of available WAL records.
  • Started Streaming WAL: Active replication is in progress.
  • Restart Points: Periodic checkpoints in the replica for data consistency.
  • Recovery Restart Point: The point where recovery would begin if the replica restarts.

If you're seeing errors here, double-check the steps above: Postgres will refuse to start if the configuration of the two containers is too different, so if you've skipped steps or done them out of order, the logs should explain quite verbosely what went wrong.

Backup Script

I've written the following to take a backup - the files are automatically compressed using gzip before they're written to save space and minimise wear on your storage:

#!/bin/bash

# Define backup directory, filenames, and the number of backups to keep
BACKUP_DIR="/path/to/backups"
CURRENT_BACKUP="$BACKUP_DIR/backup_$(date +%Y%m%d%H%M).sql.gz"
NUM_BACKUPS_TO_KEEP=6

# Without pipefail, a pg_dump failure would be hidden by gzip's exit code
set -o pipefail

# Take the backup from the replica (it listens on port 5433, as configured
# above - the default 5432 socket belongs to the primary) and compress it
docker exec synapse-db-replica-1 pg_dump -h /sockets -p 5433 -U synapse -d synapse | gzip > "$CURRENT_BACKUP"

# Check if the backup was successful
if [ $? -eq 0 ]; then
    echo "Backup successful!"
    # Check if previous backups exist and manage them
    ...
else
    echo "Backup failed!"
    rm -f "$CURRENT_BACKUP"
fi

To configure it, set BACKUP_DIR to the location where you want your backups stored, NUM_BACKUPS_TO_KEEP to the number of previous backups to keep before removal, and update the docker exec line to match your replica's details.
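
The rotation step elided in the script can be as simple as deleting everything beyond the newest NUM_BACKUPS_TO_KEEP files; this is one possible sketch rather than the script's original logic (it relies on the backup filenames containing no spaces, which the date-based naming above guarantees):

# Keep only the newest $NUM_BACKUPS_TO_KEEP compressed dumps in $BACKUP_DIR
ls -1t "$BACKUP_DIR"/backup_*.sql.gz | tail -n +$((NUM_BACKUPS_TO_KEEP + 1)) | xargs -r rm --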

You could also tailor the script to your specific needs, for example, by adding email notifications to let you know when backups are failing for any reason.

Make sure to mark this script as executable so it can be run:

chmod +x /path/to/postgres_backup.sh

We can then configure a cron job (e.g. in /etc/cron.d/postgres) to run it:

30 */4 * * * root /path/to/postgres_backup.sh 2>&1 | logger -t "postgres-backup"

This would run every four hours starting at 00:30; alternatively, you could set a specific list of hours like this:

30 3,7,11,15,19,23 * * * root /path/to/postgres_backup.sh 2>&1 | logger -t "postgres-backup"
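
Because the script's output is piped through logger with the postgres-backup tag, you can review past runs in the system journal (assuming a systemd-based host):

journalctl -t postgres-backup --since "1 day ago"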

Round Up

And there you have it! A dedicated backup replica for your Postgres 15 in Docker.

Remember: this is a general guide that should work for a lot of people, but it's impossible to cover every scenario, so if you've read this far, it's worth trying this with a test database first and spending some time deciding whether this solution is right for you.

For potential troubleshooting or further reading, consider referring to the official Postgres documentation on replication and backups.
