@SpencerKaiser
Last active June 10, 2020 21:50
GitHub Assessment - Spencer Kaiser

Prompt One

Company: Acme computers
Version control platform(s): Many GitHub Enterprise instances installed throughout the company by different teams. Acme Computers is trying to standardize on GitHub Enterprise and consolidate their GitHub usage onto a single instance. The company has many instances of other Git hosting solutions installed as well. Some are fully supported applications. Other instances are on machines under people's desks.
Customer requests:

  • Shrink large repository: Acme wants GitHub to help them shrink the large repository to a more manageable size that is performant for common Git operations. The large repo is a project that is high visibility with an aggressive roadmap. They request that we help them within the month. It's a large, monolithic repository.
  • Consolidate instances: Acme wants you to tell them the best way to move all the other teams, using GitHub Enterprise or other Git solutions, onto their consolidated GitHub Enterprise instance. They have asked you to give them five or six bullet points about how you would approach that initiative, both technically and culturally.
  • Migrate an SVN repo: The customer has one SVN repository that hasn't migrated over to a Git solution. They would like help moving this one large repository over. The team has a trunk based development pattern with this repository and is unfamiliar with Git.

Acme Computers + GitHub

Shrink Large Repository

There are quite a few things that can lead to an increase in repository size, so in order to make the most impactful recommendation, it would be very helpful to explore the repo directly. In lieu of that, the sections below contain a few ideas that might help get the repo's size under control.

Git Large File Storage (LFS)

If you require large files to be tracked in your repo, consider using git-lfs:

Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.

Git LFS provides a wide range of benefits, so it's definitely worth considering if the repo size is partly due to large files being tracked. If you'd like to learn more, see the instructions for installing Git LFS and the instructions for configuring Git LFS on GitHub Enterprise Server. Note that moving files that are already committed into LFS would also likely require rewriting parts of your repo's Git history.
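To get a first idea of which files might be candidates for LFS, a small script can list the largest files in a checkout. The following is a minimal sketch, not a definitive tool: the function name `find_large_files` and the size threshold are assumptions for illustration, and it only inspects the current working tree, not large files that exist solely in older commits.

```python
import os


def find_large_files(root, min_bytes):
    """Return (size, relative_path) pairs for files of at least min_bytes, largest first."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own object store; only the working tree is inspected here.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symlinks so broken links do not raise errors
            size = os.path.getsize(path)
            if size >= min_bytes:
                results.append((size, os.path.relpath(path, root)))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Hypothetical threshold: report anything of 50 MiB or more in the current directory.
    for size, path in find_large_files(".", 50 * 1024 * 1024):
        print(f"{size:>12}  {path}")
```

File types that show up repeatedly in this list (videos, datasets, binaries) are the ones worth discussing as LFS patterns.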

git gc

Git features a command, git gc, which will run "housekeeping" tasks against your repo that might slim down the overall size. A nice writeup about the potential value and potential pitfalls of git gc can be found on Stack Overflow.
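To see whether git gc actually made a difference, it can help to compare the repo's on-disk size before and after. The sketch below assumes you capture the text output of `git count-objects -v` yourself; the helper names `parse_count_objects` and `disk_usage_kib` are hypothetical, and the sample numbers are made up for illustration.

```python
def parse_count_objects(output):
    """Parse the 'key: value' lines printed by `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def disk_usage_kib(stats):
    """Loose objects ('size') plus packfiles ('size-pack'), both reported in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0)


if __name__ == "__main__":
    # Illustrative output only; real values come from running the command in your repo.
    sample = "count: 12\nsize: 48\nin-pack: 150000\npacks: 3\nsize-pack: 2097152\n"
    print(disk_usage_kib(parse_count_objects(sample)), "KiB")
```

Running this on output captured before and after git gc gives a simple, comparable number for the discussion.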

Delete old and unused branches

Make sure your repo does not have a large number of unused branches. The Branches tab in your repo gives a high-level view of your current branches, including the last activity for each and whether they have any related pull requests. You can use this view to quickly remove branches from the remote repo, but team members will still need to clean up stale branches in their own local clones; this post from Stack Overflow provides a potential solution, but please read the instructions carefully before using the suggested commands.
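If the number of branches is too large to review by hand, a short script can shortlist candidates for deletion. This is a sketch under stated assumptions: it expects lines in the form produced by `git for-each-ref --format='%(committerdate:short) %(refname:short)' refs/heads`, and the function name `stale_branches`, the 180-day cutoff, and the protected branch names are all hypothetical defaults. It only reports names; it does not delete anything.

```python
from datetime import date, timedelta


def stale_branches(ref_lines, today, max_age_days=180, keep=("main", "master")):
    """Return branch names whose last commit is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for line in ref_lines:
        line = line.strip()
        if not line:
            continue
        # Each line looks like "2019-01-15 feature/old-thing".
        day, name = line.split(" ", 1)
        if name in keep:
            continue  # never suggest long-lived branches for deletion
        if date.fromisoformat(day) < cutoff:
            stale.append(name)
    return stale


if __name__ == "__main__":
    lines = ["2019-01-15 feature/old", "2020-06-01 feature/new", "2018-03-02 master"]
    print(stale_branches(lines, today=date(2020, 6, 10)))
```

The resulting list is best treated as a starting point for a conversation with each branch's owner rather than as an automatic delete list.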

Git Submodules (Not recommended)

Although I'm not recommending this approach, it's worth noting that some teams have had success migrating a monolithic application into modules and breaking those modules out into their own repos. This process is usually time intensive and, given the one-month timeframe, is probably not the right approach to take.

Long-Term Solutions

There are several steps your team can take in order to reduce the future likelihood of other repositories becoming too large to manage.

Leverage GitHub Packages

Although not currently available on GitHub Enterprise Server, GitHub Packages provides development teams with the option of storing dependencies outside of your repo, which can reduce the repo size and potentially reduce build time in a CI environment. If a migration to GitHub Enterprise Cloud is a potential option for the future, it might be worth considering how your teams could leverage GitHub Packages.

Reducing Git Push Limits

Within your GHE instance, you can modify the settings for the maximum file size to prevent files exceeding certain limits from being pushed. A modification to this setting might help keep future repos to a more manageable size.

Squash Commits

Another option to help streamline your commit history while also reducing size is to utilize the Squash and merge commit strategy within your repos. Not only will this help reduce the number of commits in your repo's history, but it will also prevent temporarily added files from being permanently included in your repo.

Consolidate instances

Depending on the source of your data, the migration to a consolidated GHE instance might have a different approach:

  • Migration from other GHE Server instances - GitHub provides some excellent documentation on how to migrate content from one GitHub Enterprise Server instance to another: Migrating user, organization, and repository data. This process allows for the migration of repos and other related data such as Teams:

    In a migration, everything revolves around a repository. Most data associated with a repository can be migrated. For example, a repository within an organization will migrate the repository and the organization, as well as any users, teams, issues, and pull requests associated with the repository.

  • Migration from non-GHE Sources - Depending on the hosting solution for the non-GHE repos there may be a way to automate the process of migrating repos via a script that utilizes the GHE API. The API allows for the creation of Org Repos, which could be utilized by iterating over all existing repos, setting a remote for those repos, and pushing the repo to its new destination.
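As a rough illustration of the scripted approach for non-GHE sources, the sketch below builds a migration plan without executing anything. It pairs the "create an organization repository" API request (`POST /orgs/{org}/repos`) with the Git commands to run for each repo. The function name `migration_plan`, the host names, and the choice of `git clone --mirror` plus `git push --mirror` (a variation on adding a remote and pushing) are assumptions for illustration; authentication and error handling are left out.

```python
def migration_plan(org, repo_names, source_base, target_base):
    """Describe, per repo, the API request and Git commands a migration script would run."""
    plan = []
    for name in repo_names:
        plan.append({
            # Request that would create the empty destination repo in the org.
            "create_repo": {
                "method": "POST",
                "path": f"/orgs/{org}/repos",
                "body": {"name": name, "private": True},
            },
            # A mirror clone copies all branches and tags; the mirror push sends them on.
            "git_commands": [
                ["git", "clone", "--mirror", f"{source_base}/{name}.git"],
                ["git", "-C", f"{name}.git", "push", "--mirror",
                 f"{target_base}/{org}/{name}.git"],
            ],
        })
    return plan


if __name__ == "__main__":
    # Hypothetical hosts and names, for illustration only.
    steps = migration_plan("acme", ["billing", "inventory"],
                           "https://old-git.example.com/scm",
                           "https://ghe.example.com")
    for step in steps:
        print(step["create_repo"]["path"], step["create_repo"]["body"]["name"])
        for cmd in step["git_commands"]:
            print("  " + " ".join(cmd))
```

Reviewing a dry-run plan like this with each team before running it is an easy way to catch naming collisions or repos that should not be moved.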

From a cultural perspective, I have a few recommendations:

  • GitHub Workshop - In order to best prepare teams for the transition, we recommend hosting a workshop led by members of our team which will dive into common GitHub functionality and provide your teams with the info they need to hit the ground running when the migration is complete. This will also be a great opportunity for us to share some of the features our customers love most, some of which they may not have previously had access to if they were running an older version of GHE Server.
  • Communicate the Purpose - Communicate the primary reasons behind the need to consolidate and how it will benefit both the company and your development teams. As you transition to one supported instance, that instance is likely to see an increase in stability, your development teams will see better consistency between projects (previous instances may have been running different versions of GHE), and this will also allow the team supporting your GHE instance to focus more time on tasks that benefit the community as a whole. This is also an excellent opportunity to provide reassurance to your dev teams that their experience isn't likely to change substantially and, if anything, their access to new features and functionality is likely to increase.
  • Set Expectations - Provide your development community with clear expectations for when the migration is going to take place, what impacts they can expect during that time, and who they can reach out to if they need support.

Migrate an SVN repo

From a technical perspective, this migration should be straightforward. GitHub Importer allows for seamless migration from Subversion to GitHub.

From a cultural perspective, bringing your teams up to speed with GitHub in a safe environment will hopefully make the transition as smooth as possible. Prior to making the transition, introducing your teams to the GitHub Learning Lab would be a great way to help prepare them for what to expect. The Learning Lab includes courses which focus on how to use git as well as courses on how to make the most of GitHub itself.

Prompt Two

Company: Dunder Mifflin Technologies
Version control platform(s): They currently use Gerrit, out-of-the-box Git, Subversion, and Team Foundation Server.
Customer requests:

  • Help us modernize our practices: Dunder Mifflin is worried they are falling behind their industry. They have lots of legacy software and development patterns that were created 20 years ago. They have found it incredibly difficult to change any aspect of their SDLC because of their infrastructure, processes, and long-tenured team members who are resistant to change.
  • Help us release more often: Dunder Mifflin releases software four times a year. They are shipping largely web-based applications. They want to release more frequently, but they are unsure of the best first steps. What areas would you explore with the customer to help them move this goal forward?
  • Commit/merge/deploy permissions: Dunder Mifflin has expressed concern about moving away from Gerrit. They have asked how they can control repository access, merging, and deployment permissions within GitHub, and what aspects of their desired security setup can be enforced programmatically.

Dunder Mifflin + GitHub

Help us modernize our practices

This is a subject many companies struggle with and it's something that requires both technical and cultural change internally. Fortunately, people tend to gravitate towards things that make their day-to-day as simple and enjoyable as possible. At GitHub we stand behind our product and how it impacts the lives of both technical and non-technical users. In order to help show your team why GitHub will help improve how they work, we recommend hosting a workshop led by members of our team in order to demonstrate the features our platform has to offer.

In addition to using GitHub, providing your employees with best-in-class tools can create significant drive and motivation. An excellent way to do this is to host an internal hackathon; it provides employees with a chance to test-drive new tech without committing to making significant changes. An internal hackathon could also encourage adoption of GitHub by requiring the submissions be made through GitHub and by basing a small portion of the score on how the team uses GitHub (documentation, CI/CD, etc.).

Lastly, encourage your hiring managers to seek talent that fits the model you strive for, not the model you currently have. As you begin to bring new talent into the company, they will inevitably help disrupt the stagnant SDLC processes in place and push for more modern tech.

Help us release more often

One of the fastest ways to increase your delivery speed is to increase your reliance on CI/CD instead of manual testing and QA. In my opinion, GitHub has created the new gold standard for CI/CD in the form of GitHub Actions. Not only is there a marketplace where your teams can find existing, open source solutions that meet your needs, but it also allows your teams to create modular CI/CD implementations to solve your unique concerns. These actions can be used in conjunction with Branch Protection to identify issues earlier in the SDLC and help prevent uncaught bugs from making it to production. By relying more heavily on CI/CD, your teams will gain confidence in their toolchain and in their deployments. As that confidence grows, they will become increasingly comfortable deploying more frequently.

In addition to CI/CD, encouraging Code Reviews can have a tremendous impact on the quality of the code you deliver and help you deliver more consistently. The Pull Request process provides a means for making quick suggestions and feedback, improving the quality of code before it's merged into your primary branch. Pull Requests also help with team bonding and knowledge transfer.

Commit/merge/deploy permissions

As you begin to rely more and more on CI/CD, questions around access to code and deployments begin to fall into two primary groups: Repo Access and Automated Deployments.

Repo Access

One of the best ways to control repo access at scale is through the use of Teams. With Teams, you can quickly provide large groups of users with a specified level of access to multiple repos. In addition, Team membership can be managed through the GitHub API, so you can write software that automatically updates Team membership as employees join and leave the company.
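As a sketch of what that automation might look like, the function below compares a Team's current members with the desired list (for example, from an HR system) and returns the API calls needed to reconcile them. The function name `team_membership_changes` and the sample usernames are hypothetical. The paths follow the team membership endpoints of the GitHub REST API (`PUT` and `DELETE` on `/orgs/{org}/teams/{team_slug}/memberships/{username}`); older GHE Server versions may address teams by ID instead, so please check the API documentation for your version.

```python
def team_membership_changes(org, team_slug, current_members, desired_members):
    """Return (method, path) pairs that would bring the team in line with desired_members."""
    current, desired = set(current_members), set(desired_members)
    calls = []
    # New joiners: add anyone who should be on the team but is not yet.
    for user in sorted(desired - current):
        calls.append(("PUT", f"/orgs/{org}/teams/{team_slug}/memberships/{user}"))
    # Leavers: remove anyone on the team who is no longer on the desired list.
    for user in sorted(current - desired):
        calls.append(("DELETE", f"/orgs/{org}/teams/{team_slug}/memberships/{user}"))
    return calls


if __name__ == "__main__":
    for method, path in team_membership_changes(
        "dunder-mifflin", "sales-devs",
        current_members=["jim", "dwight", "ryan"],
        desired_members=["jim", "dwight", "pam"],
    ):
        print(method, path)
```

Computing the difference first, rather than re-adding everyone, keeps the number of API calls small and makes each run easy to audit.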

Automated Deployments

As mentioned above, Branch Protection and Pull Requests are two great methods of ensuring the quality of code that makes it to your primary branch. Ideally, deployments aren't something manually triggered by an individual, but instead reliant on several levels of code validation. Before code is merged into a protected branch (and subsequently deployed), you can require a certain number of pull request reviews and even require reviews from specific people known as code owners. In addition, you can mandate that certain requirements are met via Status Checks. In other words, you can make quality gates like test coverage or security scans a requirement before code can be merged, even if the code has already been reviewed and approved.
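These rules can also be applied programmatically rather than through the UI. The sketch below builds a request body for the branch protection endpoint (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`) that requires approving reviews, code owner reviews, and passing status checks. The function name `branch_protection_payload` and the check names are assumptions for illustration; sending the request and authenticating are left out.

```python
import json


def branch_protection_payload(required_checks, approvals=1, require_code_owners=True):
    """Build a branch protection request body with review and status check requirements."""
    return {
        # Status checks (e.g. tests, security scans) that must pass before merging.
        "required_status_checks": {"strict": True, "contexts": list(required_checks)},
        # Apply the same rules to repository administrators.
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "required_approving_review_count": approvals,
            "require_code_owner_reviews": require_code_owners,
        },
        # None (JSON null) means no restriction on who may push to the branch.
        "restrictions": None,
    }


if __name__ == "__main__":
    # Hypothetical check names; use the names your CI system actually reports.
    payload = branch_protection_payload(["ci/tests", "security/scan"], approvals=2)
    print(json.dumps(payload, indent=2))
```

Keeping a payload like this in a script makes it possible to apply the same protections consistently across many repos.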
