2014-07-08 - Berkshelf v2 Outage - Community
Start every PM by stating the following:
- This is a blameless Post Mortem.
- We will not focus on the past events as they pertain to "could've" "should've"...
- All follow up action items will be assigned to a team/individual before the end of the meeting. If the item is not going to be top priority leaving the meeting, don't make it a follow up item.
Incident Leader: Christopher Webber
Berkshelf v2.x was unable to follow redirects from http to https and, as a result, its users were unable to do work.
All Times UTC
- 2014-07-07 19:15:00 - DNS is updated in Dyn so that community.opscode.com and cookbooks.opscode.com are served by Supermarket.
- 2014-07-07 19:40:00 - Launch is declared complete.
- 2014-07-07 19:51:24 - pcorliss reports in #berkshelf that there are issues with Berkshelf v2.
  "Chef response did not contain a JSON body"
- 2014-07-07 20:17:48 - pcorliss reports the same issue in #chef
- 2014-07-07 20:20:20 - davidordave confirms the issue pcorliss is seeing
- 2014-07-07 20:31:35 - cubed confirms issue as well.
- 2014-07-07 20:38:39 - Adam reaches out to cwebber and jtimberman to look at issue in #chef.
- 2014-07-07 20:40:37 - cwebber starts discussion with pcorliss about the issue in #berkshelf.
- 2014-07-07 20:42:54 - pcorliss provides a gist of the error output.
- 2014-07-07 20:44:13 - cwebber makes note of the http -> https uplift issue we had seen in testing.
- 2014-07-07 20:47:09 - cwebber reaches out to Adam to confirm escalation path
- 2014-07-07 20:48:00 - cwebber notifies the Ops, Dev and Community rooms in HipChat that an incident is being started
- 2014-07-07 20:48:00 - Ops On-Call is notified via HipChat that we are starting an incident for a Berkshelf v2 Outage
- 2014-07-07 20:50:00 - cwebber recaps issue in Incident room in HipChat
- 2014-07-07 20:51:00 - sethvargo updates http://status.opscode.com
- 2014-07-07 20:52:00 - sethvargo dives into the code to verify the issue
- 2014-07-07 20:53:00 - jtimberman begins work on allowing cookbooks.opscode.com to pass through without the 301 to https.
- 2014-07-07 20:59:00 - jtimberman explains fix prior to implementation
- 2014-07-07 21:03:00 - @jgoldschrafe reports that he is seeing similar issues with an on-prem berkshelf-api server https://twitter.com/jgoldschrafe/status/486254022890778624
- 2014-07-07 21:03:00 - @sethvargo responds with a fix to use https://supermarket.getchef.com/api/v1 instead of http://cookbooks.opscode.com/api/v1
- 2014-07-07 21:04:00 - jtimberman posts diff of changes for review
- 2014-07-07 21:09:00 - cwebber updates the attributes for the staging environment
- 2014-07-07 21:09:00 - jtimberman uploads v2.4.2 of the supermarket cookbook to Chef Server
- 2014-07-07 21:09:00 - jtimberman uploads changes to the supermarket-app role
- 2014-07-07 21:10:00 - CCR (chef-client run) on supermarket-app in prod
- 2014-07-07 21:14:00 - jtimberman begins process of attempting to install berkshelf 2.0.17 for testing
- 2014-07-07 21:16:00 - cwebber confirms calls to http://cookbooks.opscode.com pass through as http
- 2014-07-07 21:19:05 - cwebber reaches out to pcorliss, cubed, davidordave to confirm the fix
- 2014-07-07 21:21:16 - pcorliss reports that things are working again
- 2014-07-07 21:22:48 - davidordave responds that he is still seeing errors: http://pastebin.com/NmLF00a4
- 2014-07-07 21:26:09 - icarus reports that https://supermarket.getchef.com/cookbooks/ant is white paging
- 2014-07-07 21:26:00 - Focus switches away from this incident to Supermarket being unresponsive
- 2014-07-07 21:50:00 - status.opscode.com updated to reflect the ongoing issue http://status.opscode.com/post/91084396786/berkshelf-2-outage-ongoing
- 2014-07-07 22:06:00 - status.opscode.com updated with resolved status. http://status.opscode.com/post/91085603896/berkshelf-2-outage-resolved
- 2014-07-07 23:58:00 - Josh Glass reports in Incident room that he is still seeing issues with berkshelf
- 2014-07-08 00:00:00 - Ryan Cragun notes that the issue is with https://github.com/ruby/ruby/blob/v1_9_3_547/lib/open-uri.rb#L235-244
- 2014-07-08 00:08:00 - sethvargo begins work on correcting the bug with Berkshelf 2.
- 2014-07-08 00:09:00 - jtimberman points to http://mislav.uniqpath.com/2011/07/faraday-advanced-http/ for more info.
- 2014-07-08 00:11:00 - sethvargo notes that this is actually a bug in Ruby: http://stackoverflow.com/questions/10013293/open-uri-is-not-redirecing-http-to-https
- 2014-07-08 00:20:00 - sethvargo opens https://github.com/berkshelf/berkshelf/pull/1251
- 2014-07-08 00:41:00 - reset releases Berkshelf 2.0.18
Ruby's open-uri library has a bug: it does not follow a redirect from http to https.
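
The failure mode can be illustrated with a minimal sketch of the kind of scheme check open-uri applies before following a redirect. This is a simplified approximation of the Ruby 1.9.3 lines linked in the timeline, not the library's actual code; the helper name `redirect_allowed?` and the https target URL are hypothetical and chosen only for illustration.

```ruby
require "uri"

# Hypothetical helper approximating the open-uri redirect rule:
# a redirect is followed only if the scheme is unchanged, or if
# both schemes are one of http/ftp. An http -> https redirect
# satisfies neither condition, so it is refused.
def redirect_allowed?(from, to)
  from_scheme = URI.parse(from).scheme.downcase
  to_scheme   = URI.parse(to).scheme.downcase
  legacy      = %w[http ftp]

  from_scheme == to_scheme ||
    (legacy.include?(from_scheme) && legacy.include?(to_scheme))
end

# Illustrative URLs: the http endpoint Berkshelf 2 called, and an
# assumed https location that a 301 might point at.
http_url  = "http://cookbooks.opscode.com/api/v1"
https_url = "https://cookbooks.opscode.com/api/v1"

puts redirect_allowed?(http_url, https_url) # => false
puts redirect_allowed?(http_url, http_url)  # => true
```

Under this rule a client that starts on http cannot be uplifted to https by a 301 alone, which is consistent with the two remediations taken: serving cookbooks.opscode.com over plain http, and changing how Berkshelf 2 handles the redirect.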
- Allowed cookbooks.opscode.com to be served via http
- Released Berkshelf v2.0.18
- Adam advised users in IRC to update their sources for Berkshelf
Users of Berkshelf v2.x were unable to use cookbooks.opscode.com until a new version was released.
Duration: ~ 5.5 hrs
- Add a note to the README for the supermarket cookbook to test with Berkshelf 2 (cwebber)
- Post status.getchef.com updates to IRC (cwebber)