@jeanfbrito
Forked from andywer/postgres-outage.md
Created October 2, 2018 13:33
Post Mortem: Postgres outage on 2018-08-20

2018-08-20: Postgres failure

What happened

The PostgreSQL container stopped unexpectedly and was restarted automatically, but after the restart it no longer accepted any connections, neither from the API service containers nor from the MacBook over the internet.

Error in logs:

FATAL:  pg_hba.conf rejects connection for host "10.0.1.2", user "postgres", database "******", SSL off

Cause

Two lines had been added at the beginning of the /var/lib/postgresql/data/pg_hba.conf file, even before the initial comment block (possibly automatically by some script of the Postgres Docker image):

host all postgres 0.0.0.0/0 reject
host all pgdbadm 0.0.0.0/0 md5

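The first line caused the outage: it matches every IPv4 connection made as the postgres user and, sitting at the top of the file, takes precedence over any later rule that would have allowed the connection.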
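PostgreSQL reads pg_hba.conf from top to bottom and applies the first record that matches a connection, which is why the position of the reject line matters. The following is a minimal sketch of that lookup, not PostgreSQL's actual matching logic: the function name `first_host_rule` is made up for illustration, every address is assumed to match (as it does for 0.0.0.0/0), and the last line of the sample file is a hypothetical rule that was not quoted in the original config.

```shell
# Print the first "host" record in a pg_hba.conf-style file that applies to
# the given user. Simplified: addresses and databases are not checked, and
# user lists or group syntax are not handled.
first_host_rule() {
  # $1 = path to the file, $2 = user name
  awk -v user="$2" '
    NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
    $1 == "host" && ($3 == user || $3 == "all") { print; exit }
  ' "$1"
}

# Sample file: the two lines from this incident, followed by a
# hypothetical later rule that would otherwise have allowed the login.
sample=$(mktemp)
cat > "$sample" <<'EOF'
host all postgres 0.0.0.0/0 reject
host all pgdbadm 0.0.0.0/0 md5
# hypothetical rule further down the file
host all all all md5
EOF

first_host_rule "$sample" postgres   # prints the reject line
first_host_rule "$sample" pgdbadm    # prints the pgdbadm md5 line
```

For the postgres user the lookup stops at the reject line and never reaches the later md5 rule, which mirrors the behaviour seen in the error log.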

Fix

$ docker ps
$ docker exec -it <postgres-container-ID> bash
# In the container:
$ vi /var/lib/postgresql/data/pg_hba.conf

Change the first line of pg_hba.conf or (untested) remove the top two lines:

- host all postgres 0.0.0.0/0 reject
+ host all postgres 0.0.0.0/0 md5

Run (still in the Postgres container):

$ su - postgres
$ pg_ctl reload

That's it. I was now able to connect from the MacBook, and the API services worked again.
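The manual edit could also be scripted instead of done in vi. This is an untested-in-production sketch under a few assumptions: the helper name `fix_pg_hba` is hypothetical, the offending rule is on line 1, and it is spelled with single spaces exactly as quoted above. The demo runs against a temporary file rather than the real config.

```shell
# Change the auth method of a leading "reject" rule for the postgres user
# to md5. A copy of the original file is kept next to it as <file>.bak.
fix_pg_hba() {
  # $1 = path to pg_hba.conf
  cp "$1" "$1.bak"
  # Only line 1 is touched, and only if it is exactly the offending rule.
  sed '1s|^host all postgres 0\.0\.0\.0/0 reject$|host all postgres 0.0.0.0/0 md5|' "$1.bak" > "$1"
}

# Demo on a temporary file containing the two lines from this incident.
conf=$(mktemp)
printf '%s\n' 'host all postgres 0.0.0.0/0 reject' \
              'host all pgdbadm 0.0.0.0/0 md5' > "$conf"
fix_pg_hba "$conf"
head -n 1 "$conf"
```

In the container the function would be pointed at /var/lib/postgresql/data/pg_hba.conf, followed by the same `pg_ctl reload` as above. If the real file uses tabs or different spacing, the pattern will not match and the file is left unchanged.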

How to prevent in the future

Not possible to prevent until the cause of the configuration change is known.
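Until the cause is known, the symptom could at least be detected early. The check below is only a sketch: `has_reject_rule` is a hypothetical helper, and it assumes that no reject rule is expected in this particular config, which would not hold for setups that use reject on purpose.

```shell
# Succeed (exit status 0) if the file contains an active, uncommented
# record whose last field is "reject"; fail otherwise.
has_reject_rule() {
  # $1 = path to pg_hba.conf
  awk '$1 !~ /^#/ && $NF == "reject" { found = 1 } END { exit !found }' "$1"
}

# Demo on a temporary file that reproduces the bad first line.
conf=$(mktemp)
printf '%s\n' '# comment' 'host all postgres 0.0.0.0/0 reject' > "$conf"
if has_reject_rule "$conf"; then
  echo "WARNING: reject rule found in $conf"
fi
```

Run periodically or right after a container restart, a check like this would flag the unexpected rule before clients start failing.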
