@hugodias
Forked from mlafeldt/postmortem.md
Last active November 22, 2018 23:22
Post mortem Siteground - All servers down

Siteground (incident #1)

Date

2018-11-22

Authors

  • Hugo

Status

Completed

Summary

All of our servers at Siteground were down. Blogs hosted on those servers were also down.

Impact

At least 200 blogs were down for at least 37 minutes.

Resolution

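Siteground's administrators restored connectivity on their side. Their status page described the cause as a network reachability problem in Chicago.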

Detection

I opened Siteground server 2 and noticed that it kept loading indefinitely. I then visited a blog on that same server (https://blog.planetarionaescola.com.br/) and it was down too.
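The manual check described above (load a page, see whether it answers) could be automated along these lines. This is only a minimal sketch using the Python standard library, not something that was in place during the incident; the function names `is_up` and `check_all`, the 10-second timeout, and the rule that any HTTP status below 500 counts as "up" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP status below 500.

    `opener` is injectable so the check can be exercised without a network
    (hypothetical design choice for this sketch).
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered with an error status; treat only 5xx as down.
        return err.code < 500
    except OSError:
        # Covers URLError, refused connections, DNS failures, and timeouts
        # (a page that "loads forever" ends up here once the timeout expires).
        return False


def check_all(urls, **kwargs):
    """Map each URL to True (reachable) or False (down)."""
    return {url: is_up(url, **kwargs) for url in urls}
```

Running `check_all(["https://blog.planetarionaescola.com.br/"])` on a schedule and alerting on any `False` value would flag an outage like this one without someone having to open the server by hand.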

Action Items

| Action Item | Type | Owner | Bug |
| --- | --- | --- | --- |
| Opened a chat with Siteground's support | investigate | hugo | n/a DONE |

Timeline

2018-11-22 (all times UTC−03:00)

| Time | Description |
| --- | --- |
| 20:40 | I tried to add SSL to blog.planetarionaescola.com.br and noticed that the server 184.154.163.146 was down for an unknown reason. |
| 20:45 | I contacted Siteground's support, and Antonio K. confirmed that all servers were down: "At the moment we seem to experience an issues with the connectivity of the server and our administrators are working on it and at the moment we can only wait for them to fix it" |
| 20:51 | I found a status page (https://ot.singlehop.com/incidents/z8m9dnvh1hqk) for Siteground's servers showing this message: "We're currently experiencing problems with network reachability in Chicago. We're aware of the problem and investigating the cause. We'll provide updates shortly." |
| 20:53 | I asked Antonio K. to send an email when the service gets back online. He replied: "I cannot really say if there will be an email about that , as we have still not been informed if a mass email will be sent" |
| 20:56 | Noticed that the website was back online. |
| 20:57 | WHM was still down, but Antonio shared a new link to access cPanel: https://usm1350.sgded.com:2083 |
| 21:00 | INCIDENT ENDS. WHM and all blogs are back online. |

Supporting Information
