- This is a blameless Post Mortem.
- We will not focus on the past events as they pertain to "could've", "should've", etc.
- All follow-up action items will be assigned to a team/individual before the end of the meeting. If the item is not going to be top priority leaving the meeting, don't make it a follow-up item.
The yum_repository resource was added and released in chef-client version 12.14.60. The resource did not fully support the custom resource shipped as part of the yum cookbook.
The 12.14.60 release of Chef Client included a number of other regressions as well. We will use the specific regressions around the yum_repository resource as a proxy for the release and not dig into the specifics of the other regressions though they will be captured in this incident report.
All times listed in UTC.
- 4-Sep-2016 18:42 - GitHub issue 5282 - yum_repository action :delete doesn't seem to work opened.
- 9-Sep-2016 23:02 - Pre-release Announcement for Chef-Client 12.14.60 posted to the Chef Mailing List.
- 9-Sep-2016 21:10 - CHANGELOG includes "Create and delete yum repositories #5187 (thommay)" as a listed change.
- 12-Sep-2016 13:56 - Chef Client Release Notes 12.14
- 14-Sep-2016 18:19 - Chef Client 12.14.60 released
- 14-Sep-2016 19:09 - GitHub issue 5317 - Yum Repository with url parameter fails opened.
- 14-Sep-2016 20:13 - GitHub Issue 5318 - User manage_home not working in chef-client 12.14.60
- 14-Sep-2016 21:13 - GitHub issue 5282 - https://github.com/chef/chef/issues/5282 fixed in the master branch.
- 14-Sep-2016 22:01 - Release Announcement posted to the Chef Mailing List.
- 14-Sep-2016 22:07 - GitHub issue 117 opened.
- 14-Sep-2016 22:17 - GitHub issue 5321 - CHANGELOG.md doesn't mention addition of yum_repository opened.
- 14-Sep-2016 22:24 - GitHub issue 5317 - Yum Repository with url parameter fails fixed in the master branch.
- 14-Sep-2016 22:50 - GitHub issue 5321 - CHANGELOG.md doesn't mention addition of yum_repository issue updated to include a link to the documentation in the CHANGELOG.
- 14-Sep-2016 22:50 - Release Notes and CHANGELOG updated to clarify the inclusion of the yum_repository resource in commit 0987d724.
- 14-Sep-2016 22:51 - GitHub issue 5321 - CHANGELOG.md doesn't mention addition of yum_repository closed.
- 15-Sep-2016 01:11 - current build of chef-client released that includes the fixes.
- 15-Sep-2016 12:23 - GitHub Issue 5324 - systemd_unit resource not creating units anymore
- 15-Sep-2016 19:28 - TRI-259 Jira ticket opened internally.
- 19-Sep-2016 - yum_repository added to the codebase for the Chef Docs site in commit 69c1995c.
- 19-Sep-2016 23:46 - Chef Client 12.14.77 Released. This release resolves the issue.
- 20-Sep-2016 23:20 - yum_repository added to Chef Docs website.
- 20-Sep-2016 23:22 - GitHub issue 117 closed.
- 28-Sep-2016 23:59 - GitHub Issue 5397 - gpgcheck default on new yum_repository
- Time to detect - 50 minutes - 18:19 - Chef Client released, 19:09 - GitHub issue 5317 opened.
- Time to resolve - 6 days, 5 hours, 1 minute
- 6 hours, 52 minutes - 14-Sep-2016 18:19 Chef Client 12.14.60 released, 15-Sep-2016 01:11 current build of chef-client released that includes the fixes.
- 5 days, 5 hours, 27 minutes - 14-Sep-2016 18:19 Chef Client 12.14.60 released, 19-Sep-2016 23:46 Chef Client 12.14.77 released
- 6 days, 5 hours, 1 minute - 14-Sep-2016 18:19 Chef Client 12.14.60 released, 20-Sep-2016 23:20 Doc site includes yum_repository resource
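The intervals in this section can be rechecked from the timeline with a few lines of Ruby. This is a minimal sketch: the constant names and the `elapsed` helper are illustrative, not part of any Chef tooling, and the timestamps are copied from the timeline (all UTC).

```ruby
# Timestamps copied from the timeline (all UTC). Constant names are illustrative.
RELEASED      = Time.utc(2016, 9, 14, 18, 19) # Chef Client 12.14.60 released
FIRST_REPORT  = Time.utc(2016, 9, 14, 19, 9)  # GitHub issue 5317 opened
CURRENT_BUILD = Time.utc(2016, 9, 15, 1, 11)  # current build with the fixes
FIXED_RELEASE = Time.utc(2016, 9, 19, 23, 46) # Chef Client 12.14.77 released
DOCS_LIVE     = Time.utc(2016, 9, 20, 23, 20) # yum_repository on the docs site

# Break the gap between two times into whole days, hours, and minutes.
def elapsed(from, to)
  total_minutes = ((to - from) / 60).round
  days, rest = total_minutes.divmod(24 * 60)
  hours, minutes = rest.divmod(60)
  { days: days, hours: hours, minutes: minutes }
end

{
  'Time to detect'            => FIRST_REPORT,
  'Fix in current build'      => CURRENT_BUILD,
  'Fix in 12.14.77'           => FIXED_RELEASE,
  'Docs include the resource' => DOCS_LIVE
}.each do |label, moment|
  gap = elapsed(RELEASED, moment)
  puts format('%-26s %d days, %d hours, %d minutes',
              label, gap[:days], gap[:hours], gap[:minutes])
end
```

Every interval is measured from the 12.14.60 release time, which matches how the figures in this section are stated.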
- GitHub issue 5282 - yum_repository action :delete doesn't seem to work was still open at time of release.
- Not recognized as a regression.
- Expected: The cookbook's custom resource overrides the core provider. Actual: The core provider won out.
- The core provider's provides method will always win out on systems with yum.
- No clear communication between release & community engineering
- Not recognized as a regression.
- Moving custom resources from cookbooks into core chef reduces our test coverage on the resources.
- CHANGELOG for 12.14.60 was unclear about the scope of the change at time of release.
- Chef documentation site did not include release notes until 20-Sep-2016, six days after release.
- Other regressions and broken Travis and Jenkins builds:
- UID and GID collisions in Jenkins clogged up the build pipeline
- Resolved late Thursday before a "no-release" Friday
- Engineering was not aware of the Zendesk / Customer Support issues being opened.
- Bumping Ruby in the same release - that's a big change!
- Release Notes and CHANGELOG on GitHub updated to reflect the changes.
- Updates made to master branch of source code to address the issues.
- Nightly release of chef-client with the required changes.
- Released a version of the yum cookbook that added deprecation warnings
- Failed chef-client runs for anyone using a yum_repository resource with a url parameter or a delete action and chef-client version 12.14.60.
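For illustration, recipes of roughly the following shape fall into the affected category. This is a sketch only, not taken from any reported issue: the repository names, description, and URL are hypothetical, and it assumes the `url` property and `:delete` action named in the impact statement above. It is a recipe fragment, so it only runs inside a chef-client run.

```ruby
# Hypothetical repository definition using the `url` parameter.
yum_repository 'example-internal' do
  description 'Example internal repository' # hypothetical value
  url 'http://repo.example.com/el/7/x86_64' # hypothetical URL
  action :create
end

# Hypothetical removal of a repository using the delete action.
yum_repository 'example-retired' do
  action :delete
end
```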
GitHub Issues
- 14-Sep-2016 20:13 - GitHub Issue 5318 - User manage_home not working in chef-client 12.14.60
- 15-Sep-2016 12:23 - GitHub Issue 5324 - systemd_unit resource not creating units anymore
- 28-Sep-2016 23:59 - GitHub Issue 5397 - gpgcheck default on new yum_repository
Build Failures
- 15-Sep-2016 - Travis build fails
- 16-Sep-2016 03:00 - Travis build passes with change 40aa8daf. (Friday morning)
- 16-Sep-2016 - Jenkins CI, acceptance environment, fails (Friday)
- 19-Sep-2016 22:20 - Jenkins fixed with change ca81ec75. (Monday)
- Decide and document a process for recommending our customers hold off on upgrading. COOL team and Product Management
- Suggest: Announce all regressions that are going to trigger a bug fix release.
- Discuss moving release target days of the week: Pre-release announcement moves to Wednesday. Target release moves to Monday. COOL team and Product Management
- Migrate tests from cookbooks with custom providers when migrating providers to core chef-client. Community Engineering / Tim Smith & Lamont Granquist
- Document how the provider resolver works and provide guidance when migrating providers to core chef-client. Lamont Granquist
- Research other projects' documentation practices. How can we get better release notes at time of PR? Ryan Hass
- Additional automation for creating docs from source code. David Wrede
- Ideas, suggestions, and possibilities:
- Autogenerate resource documentation from code in a similar fashion to InSpec team
- Move docs for chef-client into the code base
A post outlining this incident is available on the Chef blog. TODO: Update this with the link.