After many months of work, we deployed GitHub to production using Ruby 2.7 in July. For those who aren’t familiar with GitHub’s stack, we’ve been running on Ruby since the beginning. Many years ago, we ran GitHub on a fork of Ruby (and Rails!), and while that hasn’t been the case for some time, that experience taught us how important it is to keep up with new releases.
Ruby 2.7 is a unique upgrade because the Ruby Core team has deprecated how keyword arguments behave. Starting with this release, Ruby warns when an options hash is passed to a method that expects keyword arguments, and future versions of Ruby will no longer accept that call at all. At GitHub, we’re committed to running deprecation-free on both Ruby and Rails to prevent falling behind on future upgrades. It’s important to identify major changes early so we can evolve the application when necessary.
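A minimal illustration of the deprecated pattern, using a hypothetical create_repository method rather than anything from GitHub’s codebase. Passing a trailing hash to a method that takes keyword arguments still works on Ruby 2.7 but emits a deprecation warning, while explicitly splatting the hash with ** works on Ruby 2.6, 2.7, and 3.x alike.

```ruby
# Hypothetical method that expects keyword arguments.
def create_repository(name:, visibility: :public)
  "#{name} (#{visibility})"
end

options = { name: "octo-app", visibility: :private }

# Deprecated call: Ruby 2.6 silently converts the trailing hash into keywords,
# Ruby 2.7 still converts it but warns "Using the last argument as keyword
# parameters is deprecated", and Ruby 3.0 raises ArgumentError instead.
#   create_repository(options)

# Forward-compatible call: explicitly splat the hash into keyword arguments.
puts create_repository(**options) # prints "octo-app (private)"
```

Changes like this one are backward compatible, so they can be merged before the new Ruby version ships.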
In order to run Ruby 2.7 deprecation-free, we had to fix over 11k warnings. Fixing that many warnings, some of which came from external libraries, takes a lot of coordination and teamwork. To succeed, we needed a solid strategy for sharing the work.
Just like we did with our Rails upgrade, we set up our application to be dual-bootable in both Ruby 2.6 and Ruby 2.7 by using an environment variable. This made it easy for us to make backward-compatible changes, merge them into the main branch, and avoid maintaining a long-running branch for the upgrade. It also made it easier for other engineering teams who needed to make changes to get their systems running on the new Ruby version. Given how large our application is (over 400k lines!) and how many changes go in daily (hundreds of PRs!), this drastically simplified our upgrade process.
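A minimal sketch of an environment-variable switch for dual booting. The post doesn’t name the variable, so RUBY_NEXT and the helper names below are hypothetical; the idea is that a CI job or local boot can request the new Ruby and fail fast if the interpreter doesn’t match.

```ruby
require "rubygems" # Gem::Version (already loaded in most Ruby processes)

# Hypothetical switch name; the post only says "an environment variable".
NEXT_RUBY_ENV_VAR = "RUBY_NEXT"
NEXT_RUBY_VERSION = Gem::Version.new("2.7")

# True when the environment asks for the new Ruby.
def next_ruby_requested?(env = ENV)
  env[NEXT_RUBY_ENV_VAR] == "1"
end

# Returns true on the new Ruby and false on the old one, and raises if the
# interpreter doesn't match what the environment requested, so a
# misconfigured CI job can't silently test the wrong version.
def verify_ruby_version!(env = ENV, running = RUBY_VERSION)
  on_next = Gem::Version.new(running) >= NEXT_RUBY_VERSION
  if next_ruby_requested?(env) != on_next
    raise "#{NEXT_RUBY_ENV_VAR}=#{env[NEXT_RUBY_ENV_VAR].inspect}, but running Ruby #{running}"
  end
  on_next
end
```

Code that genuinely must differ between versions can branch on the returned value, while version-neutral fixes go straight to the main branch.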
Once we had the build running, we weren’t quite ready to ask other teams to help fix warnings yet. Because Ruby warnings are simply strings in the test output, we needed to capture the deprecations and turn them into lists for each team to fix.
To accomplish this, we monkey patched Ruby’s Warning module.
The patch stores each deprecation warning, along with the path of the test that triggered it, in a WarningCollector object, which writes the warnings to a file and then processes them.
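A minimal sketch of this kind of patch, assuming Ruby 2.7 or newer. Ruby routes interpreter warnings through Warning.warn, so extending the Warning module with our own warn lets us record deprecations before passing them on. The WarningCollector here is a simplified, hypothetical stand-in (including a current_test attribute that a test-framework hook would set), not GitHub’s actual class.

```ruby
# Hypothetical, simplified collector: keeps each deprecation warning together
# with the test that was running when it fired.
module WarningCollector
  @records = []

  class << self
    attr_reader :records
    attr_accessor :current_test # assumption: set by a before-each test hook

    def record(message)
      @records << { warning: message.chomp, test: current_test }
    end
  end
end

module CaptureDeprecations
  # Ruby 3.0+ may also pass a category: keyword; Ruby 2.7 passes only the message.
  def warn(message, **kwargs)
    WarningCollector.record(message) if message.include?("deprecated")
    # Still print the warning, forwarding category: only when Ruby supplied it.
    kwargs.empty? ? super(message) : super
  end
end

# Ruby 2.7.2+ hides deprecation warnings by default, so turn them back on.
Warning[:deprecated] = true if Warning.respond_to?(:[]=)
Warning.extend(CaptureDeprecations)
```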
The WarningCollector#process method stores all the warnings in a file called warnings.txt. We then parse the warnings using our CODEOWNERS file and turn them into files that correspond to each owning team.
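A minimal sketch of the processing step, assuming each warning starts with a repository-relative path (path:line: warning: ...). Real CODEOWNERS files use gitignore-style patterns; the matcher below handles only a simplified subset (directory prefixes, path globs, and bare file-name patterns) but keeps the rule that the last matching line wins. All method names are hypothetical.

```ruby
require "fileutils"

# Parses CODEOWNERS text into [pattern, owners] pairs, skipping blanks and comments.
def parse_codeowners(text)
  lines = text.each_line.map(&:strip).reject { |line| line.empty? || line.start_with?("#") }
  lines.map do |line|
    pattern, *owners = line.split(/\s+/)
    [pattern, owners]
  end
end

# Simplified subset of CODEOWNERS matching; as in CODEOWNERS, the last matching rule wins.
def owners_for(path, rules)
  rule = rules.reverse.find do |pattern, _owners|
    glob = pattern.delete_prefix("/")
    if glob.end_with?("/")        # directory rule, anchored at the repo root in this sketch
      path.start_with?(glob)
    elsif pattern.include?("/")   # path glob such as app/api/*.rb
      File.fnmatch?(glob, path, File::FNM_PATHNAME)
    else                          # bare pattern such as * or *.rb matches the file name anywhere
      File.fnmatch?(glob, File.basename(path))
    end
  end
  rule ? rule[1] : ["unowned"]
end

# One "warning<TAB>test" line per record.
def format_lines(records)
  records.map { |r| "#{r[:warning]}\t#{r[:test]}\n" }.join
end

# Writes every warning to warnings.txt, then one file per owning team.
def process_warnings(records, codeowners_text, out_dir)
  FileUtils.mkdir_p(out_dir)
  File.write(File.join(out_dir, "warnings.txt"), format_lines(records))

  rules = parse_codeowners(codeowners_text)
  by_team = Hash.new { |hash, team| hash[team] = [] }
  records.each do |record|
    path = record[:warning][/\A([^:]+):\d+:/, 1] # file that emitted the warning
    owners = path ? owners_for(path, rules) : ["unowned"]
    owners.each { |team| by_team[team] << record }
  end

  by_team.each do |team, team_records|
    file_name = team.delete_prefix("@").tr("/", "-") + ".txt" # @github/api -> github-api.txt
    File.write(File.join(out_dir, file_name), format_lines(team_records))
  end
  by_team
end
```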
Once we had processed all the warnings, we opened issues for the owning teams with easy-to-follow directions for booting the application on the new Ruby version. Each warning report listed the file emitting the warning, the warning itself, and the test suites that triggered it.
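A hedged sketch of how such a report could be assembled: group a team’s records by warning, then list the distinct tests that triggered each one. The record shape (warning and test keys) and the report layout are assumptions for illustration, not GitHub’s exact format.

```ruby
# Assumed record shape: { warning: "path:line: warning: message", test: "test/..._test.rb" }
def format_report(records)
  records.group_by { |r| r[:warning] }.sort_by(&:first).map do |warning, group|
    file = warning[/\A([^:]+):\d+:/, 1] || "(unknown file)"
    message = warning.sub(/\A[^:]+:\d+:\s*(warning:\s*)?/, "")
    tests = group.map { |r| r[:test] }.compact.uniq.sort # each test listed once
    lines = ["File: #{file}", "Warning: #{message}", "Triggered by:"]
    lines.concat(tests.map { |t| "  - #{t}" })
    lines.join("\n")
  end.join("\n\n") + "\n"
end
```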
This process helped us avoid duplicating work across teams and made it simple to determine ownership and status of each warning.
We tracked warning counts in the Ruby 2.7 CI build to ensure that new code wasn’t introducing new warnings. After a few months of coordinating with 40 teams, upgrading 30+ gems, and fixing 11k warnings, our CI build was 100% warning-free. We replaced unmaintained gems with maintained ones. Once we had fixed the warnings, we altered our monkey patch to raise errors in Ruby 2.7, which ensured that all new code going into the GitHub codebase was warning-free.
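A hedged sketch of the stricter variant, assuming it is loaded only in the test and CI environments: once the backlog is cleared, the Warning monkey patch can raise instead of record, so any new deprecation fails the build. DeprecationWarningError and RaiseOnDeprecation are hypothetical names.

```ruby
# Hypothetical error raised when code triggers a deprecation warning.
class DeprecationWarningError < StandardError; end

module RaiseOnDeprecation
  # Ruby 3.0+ may also pass a category: keyword; Ruby 2.7 passes only the message.
  def warn(message, **kwargs)
    raise DeprecationWarningError, message.chomp if message.include?("deprecated")
    # Other warnings are printed as usual, forwarding category: only when present.
    kwargs.empty? ? super(message) : super
  end
end

# Ruby 2.7.2+ hides deprecation warnings by default, so turn them back on.
Warning[:deprecated] = true if Warning.respond_to?(:[]=)
# Intended for the test/CI environment only, never production.
Warning.extend(RaiseOnDeprecation)
```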
Benefits of Upgrading to 2.7
You may be reading this and wondering why it’s worth investing so much engineering time and effort in a Ruby upgrade. If you’ve been writing Ruby for a while, you’re likely aware of how difficult this particular upgrade is. It’s been a topic of conversation in the Ruby community since before its release in December 2019. Regardless of how hard this upgrade was, we saw an impressive improvement in performance. The Ruby Core team is well on its way to fulfilling the Ruby 3x3 promise that Ruby 3.0 will be three times faster than Ruby 2.0.
First, we saw a drop in the time it takes the application to boot in production mode, where the entire application is eager loaded: average boot time fell from ~90 seconds to ~70 seconds. That’s a 20-second drop, or roughly 22%.
[Graph: average production boot time dropping from ~90 seconds to ~70 seconds after the upgrade]
Faster boots mean faster deploys, which means you get our features, bug fixes, and performance improvements faster as well!
In addition to faster boots, we saw object allocations drop from ~780k to ~668k, a reduction of roughly 14%. Object allocations consume memory and create work for the garbage collector, so it’s important to lower these numbers whenever possible.
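The post doesn’t say exactly how boot time and allocations were measured. One simple way to take both readings for a block of code is a monotonic clock plus GC.stat(:total_allocated_objects), which counts every object Ruby has allocated so far; the measure helper below is hypothetical.

```ruby
# Runs the block once and reports wall-clock seconds and objects allocated.
def measure
  allocated_before = GC.stat(:total_allocated_objects)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  allocated = GC.stat(:total_allocated_objects) - allocated_before
  { seconds: elapsed.round(3), allocations: allocated }
end

# In a Rails app, measure { Rails.application.eager_load! } would time eager loading.
result = measure { 10_000.times.map { |i| "string #{i}" } }
puts result
```

Running the same measurement under both Rubies in a dual-boot setup gives a like-for-like comparison.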
Aside from the performance benefits of upgrading, ensuring you stay on the most recent version of your languages and frameworks helps keep your application healthy. Through this process we found a lot of unowned code that was no longer used in the application and deleted it. We also took this opportunity to remove or replace unmaintained gems in our application.
For gems that were maintained, we gave back to the community by sending patches for any that were emitting warnings in our application, including Rails, rails-controller-testing, capybara, factory_bot, view_component, posix-spawn, github-ds, ruby-kafka, and many others. GitHub believes strongly in supporting the open source community, and upgrades are one of many ways that we do so directly.
There are risks to deploying any major upgrade, but at GitHub we’ve designed processes that reduce this risk drastically.
For Ruby and Rails upgrades, we run dual builds until we’re sure all the tests are passing and the code is stable. In addition, every team that works on the core product click-tests its area of the codebase in a staging environment to ensure there are no obvious issues with the new version.
Rolling out the upgrade is a big deal, so we do it carefully: we gradually increase the percentage of traffic running on the new version and verify that each deployment is error-free in Sentry and regression-free in Datadog. For this deploy, we first rolled out to 2% of traffic and quickly saw a new frozen string exception. Thanks to our process, we were able to roll back quickly, and fewer than 10 users saw an error, all in a single endpoint.
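The post doesn’t say which string caused the exception. As one hedged illustration of this class of bug: Ruby 2.7 made nil.to_s, true.to_s, and false.to_s return frozen strings, so code that appends to the result of nil.to_s starts raising FrozenError. Duplicating the string first avoids it. Both method names are hypothetical.

```ruby
# Breaks on Ruby 2.7+: nil.to_s returns a frozen "" that can't be appended to.
# (It also mutates the caller's string when given a non-nil title.)
def draft_title(title)
  label = title.to_s
  label << " (draft)" # FrozenError when title is nil
end

# Works on every version: dup returns an unfrozen copy that is safe to mutate.
def safe_draft_title(title)
  label = title.to_s.dup
  label << " (draft)"
end

puts safe_draft_title(nil)       # prints " (draft)"
puts safe_draft_title("Roadmap") # prints "Roadmap (draft)"
```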
Once we had a fix for the frozen string exception, we restarted the rollout process and again deployed to 2% of traffic. We let this one sit for 15 minutes before going to the next step: 30% of Kubernetes partitions. Again, we waited about 15 minutes, and after verifying there was no regression, we deployed to another 30%, for a total of 60% of Kubernetes partitions.
Next, we deployed to 30% of our non-Kubernetes deployment partitions. These deploys take longer because they need to compile Ruby. Waiting 15 minutes for Ruby to compile is a bit nerve-wracking, but everything went smoothly. From there, we did a full production deploy and merged the upgrade 30 minutes later. Overall, the entire deploy took about two hours.
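The rollout sequence above can be written down as data plus a loop, as a hedged sketch. The deploy, healthy, rollback, and wait hooks are hypothetical placeholders for real deploy tooling and for the Sentry and Datadog checks; soak times are filled in only where the post states them.

```ruby
Stage = Struct.new(:name, :soak_minutes)

# Stages as described in the post; nil means no soak time was stated.
ROLLOUT = [
  Stage.new("2% of traffic", 15),
  Stage.new("30% of Kubernetes partitions", 15),
  Stage.new("60% of Kubernetes partitions", nil),
  Stage.new("30% of non-Kubernetes partitions", nil),
  Stage.new("full production", 30), # upgrade merged after 30 minutes
].freeze

# Advances stage by stage; on the first failed health check it rolls back and stops.
# All four hooks are hypothetical callables supplied by the caller.
def roll_out(stages, deploy:, healthy:, rollback:, wait: ->(_minutes) {})
  stages.each do |stage|
    deploy.call(stage)
    wait.call(stage.soak_minutes) if stage.soak_minutes
    next if healthy.call(stage)

    rollback.call(stage)
    return [:rolled_back, stage.name]
  end
  [:completed, stages.last.name]
end
```

Keeping the stages as data makes the plan easy to review before a deploy and to reuse for the next upgrade.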
At GitHub, we’ve invested in building out processes for deploying Ruby and Rails upgrades so that we can be confident they carry the lowest possible risk. We had no downtime while deploying the Ruby upgrade, and customer impact was almost zero.
Was it worth it?
For any company wondering whether this upgrade is worth it, the answer is: 100%. Even without the performance improvements, falling behind on Ruby upgrades has drastic negative effects on the stability of your codebase. Upgrading Ruby supports your application’s health, improves performance, fixes language and framework bugs, and helps guide the future of the language!
At GitHub, not only do we believe in the open source community, we believe that a strong foundation is the first step to a stable, resilient, and functioning application. Running on the most recent version of Ruby helps us do just that. We’re looking forward to Ruby 2.8 and beyond. Happy upgrading!