@cwebberOps
Created July 14, 2014 18:24
Berkshelf API Protocol Mismatch

2014-07-14 - Berkshelf Protocol Mismatch - Community

Start every PM stating the following

  1. This is a blameless Post Mortem.
  2. We will not focus on the past events as they pertain to "could've" "should've"...
  3. All follow-up action items will be assigned to a team/individual before the end of the meeting. If the item is not going to be top priority leaving the meeting, don't make it a follow-up item.

Incident Leader: Christopher Webber

Description

https://supermarket.getchef.com/universe was returning http URLs instead of https URLs, causing Berkshelf clients to crash.

Timeline

All times UTC

  • Unknown - Issue occurs
  • 2014-07-14 09:07:32 - (#chef) acoulton reports seeing an issue.
  • 2014-07-14 09:07:57 - (#chef) jrwesolo confirms issue.
  • 2014-07-14 09:10:06 - (#chef) coderanger mentions cwebber.
  • 2014-07-14 09:11:19 - (#chef) cwebber responds and begins investigating
  • 2014-07-14 09:13:49 - cwebber confirms that /universe is returning http URLs
  • 2014-07-14 09:14:12 - (#chef) cwebber notifies #chef that he has determined the issue
  • 2014-07-14 09:17:29 - cwebber flushes the cache for the Universe controller: knife ssh 'role:supermarket-app AND chef_environment:supermarket-prod' -a ec2.public_hostname '(cd /srv/supermarket/current && sudo RAILS_ENV=production bundle exec rails runner "Rails.cache.delete(Api::V1::UniverseController::CACHE_KEY)")'
  • 2014-07-14 09:18:23 - (#chef) cwebber asks acoulton to confirm that things are fixed.
  • 2014-07-14 09:20:27 - (#chef) acoulton confirms that issue is resolved
  • 2014-07-14 09:20:38 - (#chef) ambient sound also confirms that the issue is resolved
  • 2014-07-14 09:33:30 - cwebber updates https://status.getchef.com

Contributing Factors

The cache key used to store the universe endpoint response does not account for the request protocol. As a result, the cached response contains URLs with the protocol of whichever request populated the cache, and every later request is served those URLs regardless of its own protocol.
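
One way to avoid this class of problem is to make the cache key vary with the request protocol. The following is a minimal sketch of that idea in plain Ruby, not Supermarket's actual code: `UniverseCache`, `universe_cache_key`, `render_universe`, `universe_for`, and the `supermarket.example` host are all hypothetical names introduced for illustration, and the in-memory hash only stands in for `Rails.cache`.

```ruby
# Hypothetical in-memory stand-in for Rails.cache (assumption, for illustration).
class UniverseCache
  def initialize
    @store = {}
  end

  # Return the cached value for key, computing and storing it on a miss.
  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

# Hypothetical key builder: the protocol is part of the key, so an http
# request can never populate the entry that https requests read.
def universe_cache_key(protocol)
  "universe-#{protocol}"
end

# Hypothetical renderer: builds a tiny universe document whose URLs use
# the protocol of the current request.
def render_universe(protocol)
  {
    "apt" => {
      "2.4.0" => {
        "download_url" => "#{protocol}://supermarket.example/cookbooks/apt/2.4.0/download"
      }
    }
  }
end

# Look up the universe for a request made over the given protocol.
def universe_for(cache, protocol)
  cache.fetch(universe_cache_key(protocol)) { render_universe(protocol) }
end

cache = UniverseCache.new
universe_for(cache, "http")            # an http request warms its own entry
secure = universe_for(cache, "https")  # the https entry is built separately
puts secure["apt"]["2.4.0"]["download_url"]
```

An alternative with the same effect would be to always generate https URLs regardless of the incoming request, which keeps a single cache entry; which option fits depends on whether http clients still need to be served.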

Stabilization Steps

Flushed the cache for the Universe controller: knife ssh 'role:supermarket-app AND chef_environment:supermarket-prod' -a ec2.public_hostname '(cd /srv/supermarket/current && sudo RAILS_ENV=production bundle exec rails runner "Rails.cache.delete(Api::V1::UniverseController::CACHE_KEY)")'

Impact

Berkshelf v3 clients crashed when attempting to download the first cookbook.

Corrective Actions

  • Add monitoring to ensure that none of the URLs returned by /universe use http
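
The monitoring item above could be sketched roughly as follows. This is an illustrative check, not an existing monitor: `insecure_urls` is a hypothetical helper, the sample document and its field names (`location_path`, `download_url`) are assumptions about the shape of the /universe response, and a real check would fetch the live endpoint rather than parse an inline sample. The helper walks the whole parsed document, so it does not depend on those field names being exact.

```ruby
require "json"

# Recursively collect every string value that starts with "http://".
def insecure_urls(node)
  case node
  when Hash   then node.values.flat_map { |value| insecure_urls(value) }
  when Array  then node.flat_map { |value| insecure_urls(value) }
  when String then node.start_with?("http://") ? [node] : []
  else []
  end
end

# Inline sample standing in for the /universe response body (assumed shape).
sample = JSON.parse(<<~BODY)
  {
    "apt": {
      "2.4.0": {
        "location_path": "https://supermarket.example/api/v1",
        "download_url": "http://supermarket.example/cookbooks/apt/2.4.0/download",
        "dependencies": {}
      }
    }
  }
BODY

bad = insecure_urls(sample)
if bad.empty?
  puts "OK: all universe URLs use https"
else
  puts "CRITICAL: #{bad.size} universe URL(s) use http"
  bad.each { |url| puts "  #{url}" }
end
```

Wired into a monitoring system, the non-empty case would map to an alerting exit status so that a recurrence is caught before Berkshelf users report it.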