Skip to content

Instantly share code, notes, and snippets.

@coltoneshaw
Last active March 19, 2024 07:41
Show Gist options
  • Save coltoneshaw/48c67b0b2af0018849211596b2c46e36 to your computer and use it in GitHub Desktop.
Save coltoneshaw/48c67b0b2af0018849211596b2c46e36 to your computer and use it in GitHub Desktop.
Postgres Replica Setup Observations

Observations

Config Suggestions

If the database has less than 12GB RAM, reduce the below settings.

Note that you can set shared_buffers and effective_cache_size higher on the primary database because they generally receive less traffic.

Primary Database

# This value should match the "SqlSettings.MaxConnections" value within your config.json for Mattermost
# This is a suggestion and can be set lower / higher based on the size of your server.
max_connections = 1020
tcp_keepalives_idle = 5
tcp_keepalives_interval = 1
tcp_keepalives_count = 5

# Set both of the below settings to 65% of total memory. For a 32 GB instance, it should be 21 GB.
# If on a smaller server, set this to 20% or less total RAM.
# ex: 512MB would work for a 4GB RAM server
shared_buffers = 512MB
effective_cache_size = 512MB

# Set it to 16 MB for readers and 32 MB for writers. If it's a single instance, 16 MB should be sufficient. If the instance is of a lower capacity than r5.xlarge, then set it to a lower number.
work_mem = 16MB

# 1GB (reduce this to 512MB if your server has less than 32GB of RAM)
autovacuum_work_mem = 512MB
autovacuum_max_workers = 4
autovacuum_vacuum_cost_limit = 500

#Set it to 1.1 unless the DB is using spinning disks.
random_page_cost = 1.1

restart_after_crash = on

Replica Database

In addition to the above settings

# connection string to sending server
# This was created when you added a replication role to the primary database.
# username - replace "test" with the role you made
# password - replace "testpassword" with the role password
# host - replace "x.x.x.x" with your IP or URL string.
primary_conninfo = 'user=test password=testpassword host=x.x.x.x port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'

# replication slot on sending server
# This needs to be configured using the docs below for "Keeping the replica and primary in sync."
primary_slot_name = 'replica1'

hot_standby = on
# Allows query communication between reader and primary. Suggested to prevent any timeouts.
hot_standby_feedback = on

Enable Replica Lag logging in Mattermost

  1. Add to the Mattermost config.

    Within the Mattermost config.json, add the below to the SqlSettings.ReplicaLagSettings. The query will pull the diff between the current WAL and the replica. The DataSource string should point to your primary database.

    • username - Is the PostgreSQL username with permissions to the mattermost database. Usually, this is mmuser.
    • password - Password assigned to the username from above.
    • connectionString - IP address or URL of the PostgresSQL instance. It doesn't require :5432.
    {
    "SqlSettings": {
        "ReplicaLagSettings": [
        {
            "DataSource": "postgres://username:password@connectionString/mattermost?sslmode=disable\u0026connect_timeout=10",
            "QueryAbsoluteLag": "select usename, pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) as metric from pg_stat_replication;",
            "QueryTimeLag": null
        }
        ]
    }
    }
  2. Give permissions

    Before restarting, give the mmuser access to the PostgreSQL monitoring role. Updating permissions should be done on the PostgreSQL primary database, not the replica.

    PostgreSQL Role documentation

    sudo -u postgres psql
    postgres=# GRANT pg_monitor TO mmuser;
    
  3. Update chart and restart Mattermost.

    In the Mattermost Performance Monitoring v2, update the Replica Lag chart to use the value mattermost_db_replica_lag_abs instead of the current time value.

    The query should be the below:

    mattermost_db_replica_lag_abs{instance=~"$server"}
    
  4. Restart Mattermost

    You should see this go into effect. Check the logs for any query errors.

    sudo systemctl restart mattermost
    

Keeping the replica and primary in sync

If a read replica fails, when it attempts to recover, it will be unable to because the primary database will not have stored the WAL log it requires to sync back up. You'll see the below in the logs usually if this happens.

2022-03-25 04:38:27.369 UTC [77723] mmuser@mattermost FATAL:  the database system is in recovery mode
...
2022-03-25 04:44:12.456 UTC [77947] LOG:  entering standby mode
2022-03-25 04:44:12.465 UTC [77947] LOG:  redo starts at 6/5FA18838
2022-03-25 04:44:12.969 UTC [77950] mmuser@mattermost FATAL:  the database system is starting up
2022-03-25 04:44:12.971 UTC [77951] mmuser@mattermost FATAL:  the database system is starting up
2022-03-25 04:44:13.574 UTC [77947] LOG:  consistent recovery state reached at 6/6DA47060
2022-03-25 04:44:13.575 UTC [77946] LOG:  database system is ready to accept read only connections
2022-03-25 04:44:23.290 UTC [77947] LOG:  invalid resource manager ID 215 at 6/7236A070
2022-03-25 04:44:23.306 UTC [77975] LOG:  started streaming WAL from primary at 6/72000000 on timeline 1
2022-03-25 04:44:23.306 UTC [77975] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000600000072 has already been removed

Option 1 - WAL Archive / Recovery

You can use WAL Archiving with recovery on the primary / read-replica. I did not explore this option in testing because it requires an additional setup. However, docs can be found on PostgreSQL's page for continuous archiving.

Option 2 - Replication Slots

Replication slots allow the primary database to save the WAL archive for the read-replica until all transactions have been updated, confirmed then it gets flushed. Some information on this can be found here or here Effectively, this allows for an easy recovery on the read-replica if it crashes.

The only genuine concern with this option is that if your read replica goes offline for an extended period, the primary will store the WAL archive until the replica is back online. With the replica offline, it could consume a large amount of storage if offline for days/weeks as the primary stores the WAL archive.

  • To view existing replication slots - select * from pg_replication_slots;
  • Check replication slot lag - SELECT redo_lsn, slot_name,restart_lsn,
  • Remove a replication slot - select pg_drop_replication_slot(‘ocean’);

Setup

  1. On the primary, you need to add a replication slot with the below. You can replace replica1 with any unique value you want.

    select pg_create_physical_replication_slot(‘replica1’);
  2. In the PostgreSQL config on the read replica, set the primary_slot_name value to the slot name above

  3. Edit the read replica Postgres.conf file to use the slot name.

    File location - /etc/postgres/12/main/postgres.conf

    primary_slot_name = 'replica1'
  4. Restart the replica PostgreSQL service.

    sudo systemctl restart postgresql
  5. You can confirm this is running by running the below command on the primary PostgreSQL instance.

    SELECT
    	client_addr, state, pid, slot_name, active
    FROM
    	pg_stat_replication
    INNER JOIN
    	pg_replication_slots on pid = pg_replication_slots.active_pid;

    You should see:

      client_addr  |   state   |  pid   | slot_name | active
     ---------------+-----------+--------+-----------+--------
     x.x.x.x       | streaming | 116774 | replica1   | t

Preventing query cancels on read-replica

Option 1 - Enable Hot Standby Feedback

This allows the primary/replica to communicate about queries happening and not cancel the read replica. This seemed to have the most overall benefit with the least amount of errors on the replica. They claim it adds primary system load, but it appears to be minimal at most.

hot_standby_feedback = on

Option 2 - Add a standby delay

This delay allows the standby server to wait for an update on the primary instead of canceling the query. This will reduce the overall errors in the database getting updated during a query. However, it can cause slower responses within Mattermost. Additionally, this still has some replica lag and some timed-out queries due to reaching the Mattermost max query time.

max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s

Enable Crash Restart

restart_after_crash = on

Disable OOM

This should be a last resort. If it's getting OOM killed, increase the server specs first. https://www.percona.com/blog/2019/08/02/out-of-memory-killer-or-savior/

Set up PostgreSQL to use read replication.

This guide assumes you've already set up your Mattermost database and have it running correctly.

Setting up the primary

  1. Create the replica role

    We are creating a role on your primary to allow an incoming connection to replicate the database.

    • username replace this with the username you want your replica to use to connect to the primary.
    • password replace with your replica connection password.
    sudo -u postgres psql
    postgres=# CREATE ROLE username WITH REPLICATION PASSWORD 'password' LOGIN;
    Output
    CREATE ROLE
  2. Create a replication slot

    You can replace replica1 with anything you want to identify your read replica. Note that this needs to be a unique name per replica you connect.

    select pg_create_physical_replication_slot(‘replica1’);
  3. Allow the replica communication

    Edit the pg_hba.conf file on the primary database to allow incoming communication via the configured above username

    /etc/postgresql/12/main/pg_hba.conf
    • username - This is the role username configured in step 1 above.
    • replica-IP - This is the IP address of your replica.
    ...
    host    replication     username    replica-IP/32   md5
  4. Restart Postgres

    sudo systemctl restart postgresql

Setup the Read Node

  1. Install Postgres

    sudo apt-get install postgres
  2. Clear the replica data directory.

    Find the data directory first by the command below:

    sudo -u postgres psql
    postgres=#SHOW data_directory;
        data_directory
    -----------------------------
    /var/lib/postgresql/12/main
    (1 row)

    Delete the contents of the data directory on the replica.

    sudo -u postgres rm -r /var/lib/postgresql/12/main
    sudo -u postgres mkdir /var/lib/postgresql/12/main
    sudo -u postgres chmod 700 /var/lib/postgresql/12/main
  3. Backup the primary database onto the replica

    The backup will take some time, depending on the database size. Testing with a 16GB database took ~4 minutes.

    • The -U username option allows you to specify the user you connect to the primary cluster as. This is the role you created in the previous step.
    • It will prompt for a password. This will be what was specified in the prior step
    sudo -u postgres pg_basebackup -h primary-ip-addr -p 5432 -U username -D /var/lib/postgresql/12/main/ -Fp -Xs -R
  4. Edit the read replica postgres.conf file to use the slot name.

    This primary_slot_name is the same slot you configured above in Step 2 of setting up the primary.

    File location - /etc/postgres/12/main/postgres.conf

    primary_slot_name = 'replica1'
  5. Restart postgres

    sudo systemctl restart postgresql

Verify connectivity

  1. Connect to PostgreSQL on the primary server.

    sudo -u postgres psql
  2. Copy and run the below command

    SELECT client_addr, state, pid, slot_name, active
    FROM pg_stat_replication
    INNER JOIN pg_replication_slots on pid = pg_replication_slots.active_pid;

    You should see:

    client_addr  |   state   |  pid   | slot_name | active
    ---------------+-----------+--------+-----------+--------
    x.x.x.x       | streaming | 116774 | replica1   | t

Setup the replica config in Mattermost

Edit the config.json file on the Mattermost server and include the below:

  • username - This is the same username on your primary database
  • password - This is the same password on your primary database
  • databaseIP - This points to the database IP of your replica
{
    "SqlSettings": {
        "DataSourceReplicas": [
            "postgres://username:password@databaseIP/mattermost?sslmode=disable\u0026connect_timeout=10"
        ],
    }
}

Setup

These tests were preformed on:

  • Mattermost instance running in AWS on c5.2xlarge
  • Mattermost version 6.3.9.
  • Two Postgres 12 databases running AWS c5.large and the above replication setup.

Results

Test # Test Name Start Time End Time Highest Replica Lag NumErrors Termination Query Cancelling User Request Supported Users Changes Notes
1 Tuned PG with hot standby feedback 2022-03-28 16:32:41 2022-03-28 17:33:12 43.6MB 3536 0 19 25 4672 Tuned Postgres to Agniva's settings. Adjusted the reader to 512MB for the RAM settings and hot_standby_feedback = on High replica lag only occured once. Usually less than 1MB
1b Validation of above 2022-03-28 17:37:28 2022-03-28 18:36:24 39.4MB 3297 0 0 12 4623 - High replica lag only occured once. Usually less than 1MB
2 Tuned PG with hot standby feedback off 2022-03-28 18:42:29 2022-03-28 19:44:21 150MB 6370 886 5941 6 4685 Same as Test 1 but with hot_standby_feedback = off Replica lag was over 100MB for a bulk of the time once 4k users was hit.
2b Validation of above 2022-03-28 19:47:12 2022-03-28 20:46:55 166MB 5149 831 5657 2 4618 - Same as test 2, but slightly higher max replica lag,
3 Tuned, hot standby feedback off, delay 2022-03-29 13:14:22 2022-03-29 14:21:04 271MB 33785 745 3918 10 5239 hot_standby_feedback = off with delay set to 100s Postgres crashed. Replica lag was over 200MB multiple times once 5k users was hit.
3b Validation of test 3 2022-03-29 14:38:51 2022-03-29 15:43:22 218MB 6097 436 3754 10 4729 - Lag was over 150MB for the bulk after 4k users.
4 Delay increased to 300s 2022-03-29 16:53:15 2022-03-29 17:56:37 466MB 6989 125 937 15 4608 increased the delay settings to 300s Replica lag increased dramatically. Frequently hit 200MB+
4b Validation of test 4 2022-03-29 18:09:54 2022-03-29 19:10:14 264MB 6664 93 829 73 4512 - Not as high of replica lag as test 4, but still averaged over 150MB frequently

Troubleshooting Postgres

Log Location:

/var/log/postgresql/postgresql-<version>-main.log

Out of Sync Replica

Check on the primary database to see if the replica is still connected and streaming. If no replica is connected, you will see a blank table returned.

postgres=# select * from pg_stat_replication \x\g\x
-[ RECORD 1 ]----+------------------------------
pid              | 4643
usesysid         | 16384
usename          | test
application_name | 12/main
client_addr      | 172.31.70.183
client_hostname  |
client_port      | 41482
backend_start    | 2022-03-28 13:11:48.668763+00
backend_xmin     |
state            | streaming
sent_lsn         | 14/61502C98
write_lsn        | 14/61502C98
flush_lsn        | 14/61502C98
replay_lsn       | 14/61502C98
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2022-03-28 13:40:28.312833+00

Check how bad the delay is between the primary and replica. The higher the number, the more out of sync the replica is. The query will return the difference in bytes for how out of sync it is.

So, 200000000 will be 200MB out of sync. Ideally, this metric is below 10MB out of sync, and when it starts to reach high numbers, that usually means the replica has gone down and requires a restart.

select usename, client_addr, state, pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) as delay from pg_stat_replication;
 usename |  client_addr  |   state   | delay
---------+---------------+-----------+-------
 test    | 172.31.70.183 | streaming |     0
(1 row)

Resyncing a Replica

Usually, this is as simple as just restarting the PostgreSQL service on the read replica. If the read replica was misconfigured or fallen too far behind the WAL archive on the primary, you might need to rebuild the replica's database with the below commands.

  1. Delete all the postgres data on the read replica.

    sudo -u postgres rm -r /var/lib/postgresql/12/main
    sudo -u postgres mkdir /var/lib/postgresql/12/main
    sudo -u postgres chmod 700 /var/lib/postgresql/12/main
  2. Redownload the data from the primary database.

    Use this command on the replica.

    • -h primary-ip-addr - This will be the IP address of your primary database.
    • -U username - the username here is the role that you configured within the primary database for the replica to connect with. You can check the name by using \du+ on the primary database.
    • It will prompt for the password you configured for that role.
    sudo -u postgres pg_basebackup -h primary-ip-addr -p 5432 -U username -D /var/lib/postgresql/12/main/ -Fp -Xs -R
    
    • -U test is the replication role configured on the primary to accept communications. If you do not remember the role name, you can run \du+ on the primary database.
    • -D /var/lib/postgresql/12/main/ this tells the command where to output the backup.
    • -h primary-ip-addr The IP address of the primary database.
  3. Restart Postgres on the read replica.

    sudo systemctl restart postgresql

    If PostgreSQL fails to start, use sudo tail -n 300 /var/log/postgresql/* to check what issues occurred during startup.

Check Connection

pg_isready is a utility designed to check if your box can access the PostgreSQL install.

  1. Install pg_isready.

    Replace 12 with the version string of your PostgreSQL instance.

    Example for Postgres 12

    sudo apt-get install postgresql-client-common postgresql-client-12
  2. Test the connection

    This will reach out to the PostgreSQL instance running at the below and make sure it's accepting connections properly.

    You can find all this data in the mattermost config.json file. If you run the below you can use most of the data from the DataSource string to fill this in.

    sudo grep "SqlSettings" -A 10 /opt/mattermost/config/config.json
    pg_isready -d <db_name> -h <host_name> -p <port_number> -U <db_user>

    Expected Response:

    accepting connections
    
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment