Review of the "Modern Release Engineering in a Nutshell" paper. I first quote a few lines from the paper and then explain my views on them.
Broader questions include: what will be the long-term effect on user-perceived quality of releases [43, 45],
how quickly will technical debt ramp up when release cycles are so short and can end users keep up with a 
continuous stream of new releases?

Consider a single Kubernetes cluster running a particular version in a production environment with live request traffic. For every new release, a new version is deployed. During a release, a release engineer might expect some delay or downtime due to merge conflicts, incorrect package dependency resolution, etc. That is what I understand the author to mean by increased technical debt.

Such technical debt can be minimized by using high-availability cluster setups to divert traffic. DNS providers like NS1, Cloudflare, and Amazon Route 53 can health-check your applications. We can configure the DNS provider with failover, i.e. if one cluster fails to respond to requests, the provider routes the traffic to another cluster. Alternatively, we can use a global load balancer like the one that AWS provides.
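
The failover idea can be illustrated with a minimal Python sketch. It is not any DNS provider's API: the endpoint names and the `is_healthy` callback are hypothetical, and a real health check would issue an HTTP or TCP probe instead of consulting a set.

```python
from typing import Callable, Optional, Sequence


def pick_cluster(
    clusters: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first cluster, in priority order, whose health check passes."""
    for endpoint in clusters:
        if is_healthy(endpoint):
            return endpoint
    return None  # every cluster failed its health check


# Hypothetical endpoints, listed in priority order.
CLUSTERS = ["primary.example.com", "secondary.example.com"]

# Simulate an outage of the primary cluster during a release.
down = {"primary.example.com"}
print(pick_cluster(CLUSTERS, lambda host: host not in down))
```

With the primary marked as down, traffic would go to the secondary cluster; once the primary passes its check again, it is chosen first.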

Despite careful reviewing of individual commits, branch merges still pose a risk of so-called merge 
conflicts [18, 67], which cause significant costs for companies [17]. Basically, while a team is working 
in isolation in its branch, other teams are doing the same in their own branches. The longer that this
parallel development takes before merging, the more code changes have been merged in the meantime into 
the parent branch, and the more incompatible both branches will have become.

As a developer, text-level conflicts do not scare me, because they are resolved when I rebase and edit the code. But when a security vulnerability (CVE) is fixed and a patch is created for it, the release engineer also has to cherry-pick those changes to the previously supported versions of the application. While doing so, the engineer might come across conflicts that could delay the release. I like this part of the paper, not only because it demonstrates practical use cases but also because it introduced me to new types of conflicts such as test-level conflicts, build conflicts, and semantic conflicts.

Recent empirical studies suggest that the rapid feedback loop provided by CI has a positive effect on 
team productivity, while not significantly impacting code quality [88].

Not only does CI have a positive effect on team productivity, it also has a significant impact on code quality. As part of CI, static code analysis tools such as Pylint, ESLint, and JSLint are often run after a PR is submitted. They help identify unused variables, syntax errors, and incorrect indentation, which improves the readability and quality of the code to a great extent.

To keep CI builds short, CI typically does not run a full test suite after compilation, but a 
representative subset. The idea is that the rest of the test suite, as well as tests that take 
more time, such as integration, system or performance tests, will be run in later stages, typically
at set intervals such as nightly or on weekends.

I partially agree with this, as CI requires resources and can be an expensive affair for some organizations. The majority of open source projects that I have worked on run unit, functional, and integration tests on each pull request. This becomes essential in these projects since pull requests come from a variety of contributors. It is possible that a few "bad actors" raise a pull request containing a vulnerability that does not catch the eye of a reviewer at first glance, and the code gets merged. However, running the functional and integration tests could have caught those errors. So when untrusted parties are involved, it is good practice to run CI on each PR. For an internal team, on the other hand, it is acceptable to run the full CI on a nightly basis.

Points that I would like to add to this paper, including improvements and a few solutions:

  • Changelogs: The primary purpose of a changelog is to convey noteworthy changes. Consider a project that is running in a production environment and was affected by a Common Vulnerabilities and Exposures (CVE) entry. After an upgrade, a client who is unsure whether their system is still affected by that particular CVE can simply check the changelog to see whether the patch is included. For best practices refer to this document. A changelog is also useful for testers, developers, and management to keep track of the project's progress.
  • Secure Release Framework: The main idea here is to create a secure release pipeline for the application build. This is inspired by the recent SolarWinds hack, in which a bad actor injected malicious code into the software's updates and thereby gained remote access to the machines on which the application was installed. Signature verification with gpg --verify, or a checksum tool such as sha1sum, can be used to check that the software installed or updated on end users' machines is the same as the one that was released.
  • Cherry Picking and Backporting: When supporting multiple versions of an application, it is important for release engineers to identify and cherry-pick or backport relevant patches. This means isolating those patches from the rest of the code, making sure they have no side effects, and then cherry-picking the crucial patches to previous versions.
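
The checksum idea from the Secure Release Framework point can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses SHA-256 from the standard library instead of sha1sum (SHA-1 is no longer considered collision resistant), the function names are hypothetical, and it assumes the published digest reaches the user over a trusted channel. A digest alone does not prove who published the release; that is what a GPG signature adds.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the artifact as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(artifact: bytes, published_hex: str) -> bool:
    """Check the downloaded artifact against the digest published with the release."""
    expected = published_hex.strip().lower()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_hex(artifact), expected)


# Hypothetical release contents; a real check would read the downloaded file.
released = b"release-1.2.3 contents"
published = sha256_hex(released)   # digest the publisher would list
tampered = released + b" plus injected code"

print(matches_published_digest(released, published))
print(matches_published_digest(tampered, published))
```

The first check passes and the second fails, because any change to the artifact changes its digest.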

Agreeing with the author, I believe that the highlight of the paper is the checklist. Looking at the fast growth of the field of release engineering, I would like to add a few more points to the checklist:

  • Architecture and changing environment: a release engineer must be cognizant of the target environment. The successful release of an application also depends on the type of processor or hardware that it runs on.
  • Choice of programming language: programming languages today play a vital role in deployments. For example, Go binaries are typically statically linked, i.e. their dependencies are resolved and bound at compile time. Therefore, one is less likely to face dependency or version issues at run time, since updates to system libraries will probably not break Go binaries.

Building on top of this paper, if time permitted, I would also like to mention a few topics that are relevant to it:

  1. Reliability of 9
  2. SLA and SLO: how do they affect today's release engineering processes?
  3. Role of a Site Reliability Engineer (SRE) in the release process.
  4. With today's technological advancements, how companies are moving away from deploying monoliths and rewriting software as serverless. When using a monolithic architecture, how would the release engineer think about capacity planning?
  5. Shift-left testing
  6. Fedora release packaging
  7. Signed packages
  8. How have agentless configuration management tools like Ansible played an important role in the release process?
  9. Importance of Continuous Delivery (CD).