@stowler
Created January 23, 2014 06:12
Timeline of events for 1599 cooling outage. Pasted from the notes I started the day after the outage.
======================== F, 20140103: ========================
- 4:47p - email from hippostore: ERROR - Battery temperature is too high
- 4:57p - email from hippostore: ERROR - Enclosure temperature too high: encl=0, temperature sensor=2
- 4:58p - email from hippostore: ERROR - Enclosure temperature too high: encl=0, temperature sensor=3
- 5:16p - email from hippostore: ERROR - Enclosure temperature sensor error: encl=0, temperature sensor=0
- 5:16p - email from hippostore: ERROR - Enclosure temperature too high: encl=0, temperature sensor=1
- 5:16p - email from hippostore: ERROR - Enclosure temperature sensor error: encl=0, temperature sensor=1
- 5:30p - email from Emory: Power outage at 1599 clifton rd COLO server room (cooling systems impacted)
- 5:33p - email from hippostore: ERROR - Battery charging fault
- 5:38p - I email Keith and depart for 1599: True positive alarms: all cooling died in 1599...on my way in a sec
- 5:42p - email from Dan: his servers are hot, wants to be sure I saw Emory's 5:30 announcement
- 5:50p - I arrive at 1599
- 5:52p - email from Keith: hippostore RAID controller is 53 C, RAID battery failed. Drive temps are 55 degrees. Keith shuts down remotely.
- 6:07p - I push the power button on all qballs (Keith and I can't get them to respond otherwise). Greg Keys doesn't know who owns the still-active equipment below ours.
- 6:22p - portable blowers start arriving
- 6:30p - I call Gopi: someone thinks equipment below us is CSI/BITC
- 6:45p - Lei from CSI/BITC arrives and starts shutting down the equipment below ours
- 6:49p - 119 F ambient according to free-standing probe in rack G20 (low-density, coolest area of room)
- 7:51p - Cooling units repaired. Bringing our equipment back online. No visible external alarms.
- 8:20p - I email an update to VA imagers
- 9:00p - email from Emory: Service Restored Date/Time: 2014-01-03 09:00 PM EST
- 9:00p - I leave 1599