On May 12, for about five hours in the afternoon EDT, mastodon.technology experienced intermittent outages, lasting from twenty seconds to five minutes. The outages were caused in the course of an investigation into a failure to backup the database. The database is fine, and turned out that the backup method itself wasn't working properly.
March 10, 2018: I upgrade the instance to Mastodon 2.3.0, from 2.1.x. After the upgrade, I notice Sidekiq keeps segfaulting and dropping jobs. I investigate and open an issue, which is later resolved. However, in the process of investigation, I upgrade my host's Docker version.
May 12, 2018, Around Noon EDT: I notice that local backups are not nearly as large as they should be. The offsite backups on S3 indicate the problem started on the March 11. I reproduce the failure to run pg_dump
, which hangs every time. It gets stuck around the same table