Kubernetes Uncommon CI And Automation Tooling Experiences

Kubernetes has a number of unique challenges, such as the high volume of pull requests being merged to the main Kubernetes repositories, which have necessitated building custom tooling to handle these situations. Many projects of this size and scale have gone through the same process; Cloud Foundry, for example, has released its tooling for others to use via Concourse.

Why it's important to capture uncommon tooling features and experiences

Developers who work on Kubernetes often work on other projects in parallel. For example, they may have their own deployment tools, be building applications that run on Kubernetes, or maintain applications for their employers.

In addition to working on other projects in parallel, developers typically come to Kubernetes from other software projects. Few, if any, have started their software development journey on Kubernetes itself.

When tooling features are implemented in uncommon ways compared to the other projects developers use, developers need to:

  • Slow down to figure out what's going on
  • When context switching, pay attention to how the tooling works in their current context
  • Spend extra time learning how to develop Kubernetes beyond the code itself

All of this is time developers need to spend that isn't related to working on the code itself.

The people most affected by this are first time contributors and those who don't contribute regularly.

The goal in capturing these experiences and trying to understand them is to improve the contributor experience.

Uncommon CI Tooling Experiences

The following is a list of notable uncommon experiences.

OWNERS Files, Approvers, Reviewers, and Automated Merges

Kubernetes, the repository, has an unusual problem. It is a monorepo, which is itself uncommon in the current landscape, and different people own or are prepared to review different parts of the code. To control access and make reviewers part of the workflow, the Kubernetes project has layered tooling on top of GitHub.

The tooling, which is currently implemented in the tide, blunderbuss, and prow projects, enables two complementary things:

  1. The use of OWNERS files to detail who is approved to merge code for a part of the tree and who is a reviewer of the code for that section of the tree (a rough sketch of this follows the list).
  2. Automation to merge changes for parts of the tree where an individual does not have write access to the repository itself, and automation to request reviews on pull requests from the listed reviewers.
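
Below is a minimal, illustrative sketch of the first piece, not the actual implementation in kubernetes/test-infra: an OWNERS file is a small YAML document listing the approvers and reviewers for a directory, and the automation consults it to decide who may approve a change there. The usernames here are made up.

```go
// Illustrative sketch only: models the OWNERS data described above and a
// simple approval check. The real machinery in kubernetes/test-infra also
// handles aliases, per-directory inheritance, and more.
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

// Owners mirrors the two lists an OWNERS file declares for a directory:
// who may approve changes there and who should be asked to review them.
type Owners struct {
	Approvers []string `yaml:"approvers"`
	Reviewers []string `yaml:"reviewers"`
}

// sample is roughly what an OWNERS file for a subdirectory looks like
// (the usernames are invented for the example).
const sample = `
approvers:
  - alice
  - bob
reviewers:
  - carol
  - dave
`

// canApprove reports whether user appears in the approvers list.
func canApprove(o Owners, user string) bool {
	for _, a := range o.Approvers {
		if a == user {
			return true
		}
	}
	return false
}

func main() {
	var o Owners
	if err := yaml.Unmarshal([]byte(sample), &o); err != nil {
		panic(err)
	}
	fmt.Println("carol can approve:", canApprove(o, "carol")) // false, reviewer only
	fmt.Println("alice can approve:", canApprove(o, "alice")) // true
}
```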

This entire system is a Kubernetes-specific setup and invention; other open source projects do not work this way. Due to the rate at which pull requests are merged into the main Kubernetes repository, merge automation is required. The merge rate has passed the point where manual merging is an option.

GitHub Testing Roll-up Results

Tide is a part of the Kubernetes infrastructure that enables automation to merge pull requests. It reports status alongside testing tools such as Jenkins, TravisCI, and CircleCI. Where tide is different is that its status is not about tests passing or failing or whether the pull request is "passing". Instead, it is about the merge automation. This is a context change for developers.

Consider projects such as Rails or React. When you visit their listings of pull requests (e.g., Rails here and React here) you can see the status of testing next to each pull request. At a glance a maintainer can see whether tests are passing or failing for a set of pull requests.

Kubernetes is different. For example, see the pull requests for the kubernetes and test-infra repositories. Here the status is almost always the yellow dot, which indicates it's waiting on results to come in, or the red X, noting that something has failed.

Many of the pull requests on a Kubernetes repository are passing all the tests but still report that they are waiting on results. This is because of tide and the merge automation. Tide doesn't report a positive status until the pull request has been reviewed and approved to merge, along with passing all tests.
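
The following is a toy sketch of that behavior (it assumes nothing about tide's actual code or configuration): the roll-up only turns positive once every required status context is passing and the review ("lgtm") and approval ("approved") labels are both present.

```go
// Illustrative sketch only: a toy model of the merge-readiness rule
// described above, not tide's actual logic.
package main

import "fmt"

// PullRequest captures only the pieces the rule below looks at: labels
// applied by reviewers/approvers and the state of each required status
// context (e.g. individual test jobs).
type PullRequest struct {
	Labels   map[string]bool
	Statuses map[string]string // context name -> "success", "pending", "failure"
}

// mergeReady mirrors the behavior described above: even with every test
// green, the roll-up stays pending until review and approval have happened.
func mergeReady(pr PullRequest) (string, bool) {
	for ctx, state := range pr.Statuses {
		if state != "success" {
			return fmt.Sprintf("waiting on %s (%s)", ctx, state), false
		}
	}
	if !pr.Labels["lgtm"] || !pr.Labels["approved"] {
		return "tests green, but not yet reviewed and approved", false
	}
	return "ready for the merge queue", true
}

func main() {
	pr := PullRequest{
		Labels:   map[string]bool{"lgtm": true},
		Statuses: map[string]string{"unit-tests": "success", "e2e": "success"},
	}
	msg, ok := mergeReady(pr)
	fmt.Println(ok, msg) // false: all tests pass, but the approved label is missing
}
```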

This changes the way pull request dashboards and rolled-up testing status are used on Kubernetes repositories compared to the way they are used in other GitHub projects, and even in projects on other systems that have copied this style, like GitLab.

Use of Bazel

Kubernetes uses the Bazel build tooling in numerous repositories. This toolchain, which came out of Google in a similar way to Kubernetes, has a number of benefits for some projects but is not in common use.

There are a few things to look at with Bazel.

First, the list of users is fairly small and many of the projects using Bazel have come out of Google. Many new contributors to Kubernetes are not coming from Google or these other projects, so there is a lack of familiarity. Those inside Google are familiar with it, but Bazel is a place where Google does things a little differently from many others.

Second, when building large projects it's useful to heavily cache build objects. This can make a huge difference in time and resource usage for builds and test runs. Many languages don't have built-in features to help with this, so outside tools such as Bazel can fill the gap. This is especially true for large projects like Kubernetes.

But Go has the ability to generate these objects itself, and they can be cached. This can even be done in tools like TravisCI, CircleCI, and Jenkins. Seeing this as a need, the Go toolchain developers built caching in; the generated objects just need to be saved and restored between test runs.
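
As a small illustration (a sketch that assumes Go 1.10 or later, where the build cache was added), the snippet below asks the toolchain where its build cache lives; that directory is what a CI configuration would save and restore between runs.

```go
// Sketch: print the Go build cache location (Go 1.10+). A CI job would
// persist this directory between runs so compiled packages are reused
// rather than rebuilt from scratch.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	out, err := exec.Command("go", "env", "GOCACHE").Output()
	if err != nil {
		panic(err)
	}
	fmt.Println("cache this directory between CI runs:", strings.TrimSpace(string(out)))
}
```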

Conclusions

Any conclusions here are targeted at contributor experience, in particular for those who are not full time and part of the "in crowd". The conclusions are weakly held and should be debated.

  1. Review how often Bazel failures negatively impact test runs. How often do Bazel configuration failures cause test runs to fail, and how much time is lost correcting them?
  2. Do a people cost/benefit analysis of Bazel usage. Is Bazel's caching faster than caching Go build objects? Is caching a benefit at all? For some repositories, especially large ones, there may be a difference; for some of the small ones the difference may be minimal.
  3. Remove ways Kubernetes deviates from standard conventions and processes, and build tools that are additive rather than convention changing. For example, the way testing status is rolled up on pull requests differs between Kubernetes and other projects.
  4. Add at-a-glance documentation for how Kubernetes tooling is layered on or enhances GitHub. For first-time contributors this may be an automated post to the pull request along with pointers to more information. Probot already has a tool that leaves a comment for first-time contributors to a repo, which could be leveraged.

Thanks for reading this and have a wonderful day.
