@mattfarina
Last active February 21, 2023 17:31
Large Projects on GitHub

This document attempts to capture some detail about large projects on GitHub. That includes looking at what tools they are using, what processes they use, and how much interaction they have.

This document is intended to provide a high-level overview rather than dig into all of the details.

Projects

The projects that will be looked at are:

  1. Ruby on Rails: With over 3,400 contributors and many more users, Rails is one of the more popular and active projects on GitHub.
  2. Homebrew: The macOS package manager had so many contributions and so many contributors (over 6,000) that the project reworked its repository structure to become more maintainable.
  3. Bootstrap: The web UI library, started at Twitter, is one of the more popular UI libraries in use. With close to 1,000 contributors, it has dealt with about the same number of contributors as Kubernetes. One thing that makes Bootstrap interesting is its fork count, which ranks #2 on all of GitHub.
  4. React: The UI library from Facebook has over 1,100 contributors. The project also started at about the same time as Kubernetes.
  5. Cloud Foundry: The popular PaaS is similar to Kubernetes in the way they organize the codebase. Leveraging multiple organizations and repositories for microservices is similar to the direction Kubernetes is going.

When comparisons with Kubernetes are drawn against projects that are primarily developed in a single repository, the Kubernetes repository is used. When comparisons are drawn against projects that use multiple repositories, the comparison is to the Kubernetes organization. The document will attempt to note which applies when comparisons arise.

Note that other infrastructure-heavy projects (e.g., Apache Mesos and OpenStack) tend to run their daily tasks on their own infrastructure rather than on GitHub. For these projects, GitHub is a mirror of the codebase.

Some quick stats on Rails, at the time the document was written:

  • 3,442 contributors
  • Peak contributions in a period were about 200 and only happened on occasion.
  • Many change sets are small. For example, numerous developers have just shy of 50 commits with only hundreds or a couple thousand lines of code changed.
  • 710 pull requests are open, 20,028 are closed
  • 360 open issues, 10,654 are closed
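
As a rough illustration of how the open and closed counts above relate, the following sketch computes the share of pull requests and issues that are still open. The numbers are copied from the list above; the dictionary layout and the `open_share` helper are hypothetical names introduced only for this example.

```python
# Counts copied from the Rails stats listed above (a snapshot, not live data).
RAILS = {
    "pull requests": {"open": 710, "closed": 20_028},
    "issues": {"open": 360, "closed": 10_654},
}


def open_share(open_count: int, closed_count: int) -> float:
    """Return the fraction of all items (open + closed) that are still open."""
    return open_count / (open_count + closed_count)


for kind, counts in RAILS.items():
    total = counts["open"] + counts["closed"]
    share = open_share(counts["open"], counts["closed"])
    print(f"{kind}: {total:,} total, {share:.1%} open")
```

Under these figures, a little over 3% of both pull requests and issues were open at the time of the snapshot.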

Rails uses a minimal set of CI tools: Travis CI and Code Climate. The Travis configuration and the tests it runs are checked in to the Rails git repository alongside the codebase. In addition to testing, the Probot stale issue closer is tied into the pull requests.

For reference, Probot is similar to Kubernetes' Prow. Probot is a service run by GitHub. The issue closer provided by Probot is configured per repository, whereas the current Kubernetes one is configured for an entire organization.

While Rails has a high number of contributors, those contributions typically come in smaller chunks compared to Kubernetes, and the number of pull requests processed per week (~25) is lower than for the Kubernetes repository (~145). Rails as a project has been around longer than Kubernetes and has grown its contributor base over time. The sizes of the codebases are also quite different: Rails is about 17% the size of Kubernetes.
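
To put the weekly pull request figures in perspective, here is a small sketch that turns the two approximate rates quoted above into a ratio and a rough annualized count. The constants come from the paragraph above; the 52-week year and the `throughput_ratio` helper are assumptions made for this illustration.

```python
# Approximate weekly pull request rates quoted in the text above.
RAILS_PRS_PER_WEEK = 25
KUBERNETES_PRS_PER_WEEK = 145
WEEKS_PER_YEAR = 52  # assumption: a steady rate over a full year


def throughput_ratio(numerator: float, denominator: float) -> float:
    """Return how many times larger the first rate is than the second."""
    return numerator / denominator


ratio = throughput_ratio(KUBERNETES_PRS_PER_WEEK, RAILS_PRS_PER_WEEK)
rails_per_year = RAILS_PRS_PER_WEEK * WEEKS_PER_YEAR
kubernetes_per_year = KUBERNETES_PRS_PER_WEEK * WEEKS_PER_YEAR

print(f"Kubernetes processes about {ratio:.1f}x as many pull requests per week")
print(f"annualized: Rails ~{rails_per_year:,}, Kubernetes ~{kubernetes_per_year:,}")
```

Since both inputs are approximations, the result is best read as "roughly six times" rather than as a precise figure.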

Rails' usage of CI tools is in line with how projects in general use GitHub. The project does not deviate from standard practices.

Homebrew is broken into several repositories. The package manager itself is in one repository, the standard formulae that people can install are in another, some test automation scripts are in another, formulae for specific platforms (e.g., PHP) are in separate repositories, and the website is in yet another. Different repositories have different purposes.

Homebrew-core, the repository with the most contributors at over 6,700, consists mostly of small configuration files detailing how to install applications. These short Ruby files are called formulae. The repository has over 105,000 commits, can have several hundred pull requests merged in a month, and leverages some continuous integration customizations.

The continuous testing for the formulae happens in a Jenkins cluster. The scripts for that are in another repository. The Jenkins cluster enables testing on various macOS versions and supported configurations. The system is also able to make Bottles, which are packaged binary distributions of applications for different macOS versions and architectures, so that users can download them rather than build locally.

The way Jenkins interacts with GitHub is via standard CI channels.

The brew application, which manages the formulae, has had almost 590 contributors. For CI testing, brew leverages Travis CI in a standard setup.

It's worth noting that, because cloning is part of the Homebrew workflow, Homebrew has generated so many clones against GitHub that GitHub worked with Homebrew to optimize the way it interacts with git. In addition, GitHub has done work on its own infrastructure to support projects with the number of clones Homebrew has. Homebrew is an example of a project that GitHub adapted to in order to support it.

Like Ruby on Rails, Homebrew uses the Probot stale issues closer.

Bootstrap is, possibly, the most widely used web UI framework. In addition to being widely downloaded and used, Bootstrap has had 975 contributors.

There are a few places where Bootstrap is different from the Kubernetes repo:

  1. The codebase is far smaller
  2. Over the past 6 years there have been fewer than 18,000 commits. Kubernetes can see that many commits in a single year
  3. One committer, the project's lead, has nearly 1/3 of all commits to the project

For testing, Bootstrap leverages Travis CI in a standard setup.

In less than five years' time, React has become one of the more popular UI frameworks. While it started out on the web, it has moved into device UIs as well.

Some quick stats on React, at the time the document was written:

  • 1,163 contributors
  • 54 pull requests are open, 6,426 are closed
  • 315 open issues, 5,202 are closed
  • The top 10 contributors, by commit, have 3,950 commits, accounting for about 41% of the total commits. For reference, the top 10 Kubernetes contributors account for approximately 14% of the commits
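
The top-10 figure above implies an approximate total commit count for React, which can be backed out with a short calculation. This is a sketch only: the inputs come from the list above, the half-percentage-point band is an assumption about how "about 41%" was rounded, and `implied_total` is a hypothetical helper name.

```python
# Figures quoted in the React stats above.
TOP10_COMMITS = 3_950
TOP10_SHARE = 0.41       # "about 41%", so treated as a rounded value
ROUNDING_BAND = 0.005    # assumption: rounded to the nearest percentage point


def implied_total(part: int, share: float) -> float:
    """Return the total implied by a part and the share of the total it represents."""
    return part / share


estimate = implied_total(TOP10_COMMITS, TOP10_SHARE)
low = implied_total(TOP10_COMMITS, TOP10_SHARE + ROUNDING_BAND)
high = implied_total(TOP10_COMMITS, TOP10_SHARE - ROUNDING_BAND)

print(f"implied total commits: ~{estimate:,.0f} (between {low:,.0f} and {high:,.0f})")
```

Under these assumptions the React repository had somewhere between about 9,500 and 9,800 commits at the time of writing.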

For CI testing, React leverages CircleCI in a standard setup along with a typical pull request workflow.

Cloud Foundry is similar to Kubernetes in several regards.

  1. The project leverages multiple GitHub organizations
  2. There's a scheduler (Diego) that is an approximate competitor to Kubernetes
  3. Deals with Cloud Native Applications
  4. Is backed by big enterprises via a non-profit foundation

While Cloud Foundry is not a direct competitor in terms of features and market, there is overlap.

Multiple Organizations and Repositories

Because Cloud Foundry leverages multiple GitHub organizations and multiple repositories within each organization, looking at pull requests, issue counts, and stars doesn't offer a good comparison to Kubernetes.

The main Cloud Foundry GitHub organization has 345 repositories. Other organizations can have many repositories as well. For example, cloudfoundry-incubator has 188 repositories.

Even individual projects can be broken into multiple repositories. Diego and Garden are two examples of that. There are even repositories just to hold the design information and discussion.

Continuous Integration Testing

Concourse is a CI toolchain, similar to Travis CI and CircleCI, that came out of the Cloud Foundry community. The Cloud Foundry community operates an instance of Concourse and uses it for testing on numerous Cloud Foundry repositories, leveraging standard GitHub CI workflows.

Although there are numerous repositories, typically used as microservices in a larger application, integration tests are run. A dashboard, powered by Concourse, can be seen at https://release-integration.ci.cf-app.com.

When Concourse is used on a repository, that integration works using typical GitHub workflows.

Conclusions?

This document is designed to fuel conversation rather than drive immediate conclusions.

Where further detail is needed or there are corrections, they can be added as comments, or the document may be revised with more detail.

Appendix: Notes on Kubernetes

Because this is being looked at through the eyes of Kubernetes development, it can be useful to provide some context and details about Kubernetes.

Lines of code for Kubernetes:

--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 Go                   10160      3010475       326333       526328      2157814
 JSON                   154       335193            8            0       335185
 HTML                    67       267185         3929            1       263255
 YAML                   746        34279          774         1847        31658
 Markdown               585        39322         9563            0        29759
 Bourne Shell           313        40362         5311        10497        24554
 JavaScript              19        13806         1559         2913         9334
 Protobuf                88        22437         3748        10655         8034
 Plain Text              25         4454          319            0         4135
 Assembly                36         3996          292           35         3669
 Python                  21         4185          771          722         2692
 Makefile                85         4035          534         1710         1791
 CSS                      4         1468            8            5         1455
 Perl                     8         1128          142          139          847
 C/C++ Header             2         5613          401         4371          841
 Autoconf                16          669           10           45          614
 Java                     2          318           47           71          200
 XML                      3          141           18           24           99
 C                        4          164           33           37           94
 Ruby                     1           70           12            1           57
 Toml                     3           91           18           24           49
 PHP                      1           41            6            0           35
 INI                      3           36            6            0           30
 ASP.NET                  4           18            0            0           18
 SQL                      1            8            1            0            7
--------------------------------------------------------------------------------
 Total                12351      3789494       353843       559425      2876226
--------------------------------------------------------------------------------

Excluding vendored dependencies, Kubernetes is:

--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 Go                    6329      1448459       153478       194565      1100416
 HTML                    67       267185         3929            1       263255
 JSON                   143       217972            5            0       217967
 Bourne Shell           296        38942         5155        10139        23648
 YAML                   648        23050          371         1690        20989
 Markdown               376        20337         4448            0        15889
 JavaScript              19        13806         1559         2913         9334
 Protobuf                53        16751         2935         8664         5152
 Python                  20         4059          757          708         2594
 Makefile                61         3467          444         1479         1544
 CSS                      4         1468            8            5         1455
 Autoconf                16          669           10           45          614
 Plain Text              10          281           16            0          265
 Java                     2          318           47           71          200
 XML                      3          141           18           24           99
 C                        2          104           18           26           60
 Ruby                     1           70           12            1           57
 PHP                      1           41            6            0           35
 INI                      2           24            4            0           20
 ASP.NET                  4           18            0            0           18
 SQL                      1            8            1            0            7
--------------------------------------------------------------------------------
 Total                 8058      2057170       173221       220331      1663618
--------------------------------------------------------------------------------
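
The two tables above can be compared directly to see how much of the Kubernetes repository is vendored code and how dominant Go is. The sketch below uses only the "Code" column values from the Total and Go rows of each table; the variable names are hypothetical and chosen for this illustration.

```python
# "Code" column values copied from the Total and Go rows of the tables above.
ALL_CODE = {"total": 2_876_226, "go": 2_157_814}   # including vendored dependencies
OWN_CODE = {"total": 1_663_618, "go": 1_100_416}   # excluding vendored dependencies

# Lines of code that exist only because of vendored dependencies.
vendored_lines = ALL_CODE["total"] - OWN_CODE["total"]
vendored_share = vendored_lines / ALL_CODE["total"]

# Share of Go in each view of the repository.
go_share_all = ALL_CODE["go"] / ALL_CODE["total"]
go_share_own = OWN_CODE["go"] / OWN_CODE["total"]

print(f"vendored code: {vendored_lines:,} lines ({vendored_share:.1%} of all code)")
print(f"Go share including vendored code: {go_share_all:.1%}")
print(f"Go share excluding vendored code: {go_share_own:.1%}")
```

By these numbers, roughly 42% of the code lines in the repository come from vendored dependencies, and Go makes up about three quarters of all code but closer to two thirds once vendored code is excluded.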