@joshjohanning
Forked from robandpdx/rewriting-history.md
Created March 27, 2024 15:32
Rewriting repository history

Sometimes a history rewrite is required before a repository can be migrated to github.com. The following conditions force a rewrite:

  • objects larger than 100 MB
  • commits larger than the 2 GB push limit
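To check whether a repository hits the object size limit listed above, one option is to list every blob in history together with its size. The following is a minimal sketch, not a definitive tool: it assumes `git` is on the PATH, treats the limit as 100 MiB, and the function names `parse_large_blobs` and `list_large_blobs` are made up for illustration.

```python
import subprocess

LIMIT_BYTES = 100 * 1024 * 1024  # assumption: the 100 MB limit interpreted as 100 MiB


def parse_large_blobs(batch_check_lines, limit=LIMIT_BYTES):
    """Return (size, sha, path) for every blob larger than `limit`, largest first.

    Each input line is expected to look like "<type> <sha> <size> <path>".
    """
    large = []
    for line in batch_check_lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, tags, and malformed lines
        size = int(parts[2])
        if size > limit:
            path = parts[3] if len(parts) > 3 else ""
            large.append((size, parts[1], path))
    return sorted(large, reverse=True)


def list_large_blobs(repo_dir, limit=LIMIT_BYTES):
    """Walk all reachable objects in `repo_dir` and report oversized blobs."""
    objects = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    checked = subprocess.run(
        ["git", "-C", repo_dir, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_large_blobs(checked.splitlines(), limit)
```

Calling `list_large_blobs("path/to/repo.git")` returns the blobs that would need to be removed or moved to LFS. The path reported is the one under which the object was first encountered during the walk.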

Even when a rewrite is not required for your repository to migrate to github.com, you may still want to rewrite history for other reasons:

  • migrate large objects to LFS
  • clean up previous mistakes or bad practices that caused repo bloat
  • remove secrets from repo history

Be aware that rewriting repository history will cause the PR metadata to become outdated! PRs reference commit SHAs. When history is rewritten, every commit from the point of the rewrite onward gets a new SHA, so PRs end up referencing SHAs that no longer exist in the repository. This can be fixed ONLY during migration!

Fixing PR metadata during migration

Tools that rewrite repository history, like git-lfs and git-filter-repo, can output a file mapping old SHAs to new SHAs during their rewrite operations. This file can be used to find and replace old SHAs with new SHAs in the repository metadata. To do this, open the migration archive and, for each old SHA/new SHA pair, run a find and replace on every *.json file in the archive. After the find and replace operations complete, repackage the migration archive as it was before and continue with the import into github.com.
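The find and replace described above could be scripted roughly as follows. This is a sketch under stated assumptions: the mapping file is taken to be whitespace-separated `old new` pairs of 40-character SHAs, one per line (the layout git-filter-repo uses for its commit-map, as far as I know; verify against your tool's actual output), and entries whose new SHA is all zeros are treated as removed commits and skipped. The function names are hypothetical.

```python
import re
from pathlib import Path

SHA_RE = re.compile(r"\b[0-9a-f]{40}\b")
NULL_SHA = "0" * 40  # assumption: marks a commit that was dropped by the rewrite


def load_sha_map(map_path):
    """Read 'old new' SHA pairs, ignoring headers and malformed lines."""
    mapping = {}
    for line in Path(map_path).read_text().splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        old, new = fields
        if not (SHA_RE.fullmatch(old) and SHA_RE.fullmatch(new)):
            continue  # e.g. a header line such as "old new"
        if new != NULL_SHA and new != old:
            mapping[old] = new
    return mapping


def remap_text(text, mapping):
    """Replace every full-length SHA found in `mapping`; leave the rest alone."""
    return SHA_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def remap_json_files(archive_dir, mapping):
    """Rewrite all *.json files under `archive_dir` in place; return changed paths."""
    changed = []
    for path in sorted(Path(archive_dir).rglob("*.json")):
        original = path.read_text(encoding="utf-8")
        updated = remap_text(original, mapping)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Matching every 40-character hex string once and looking it up in a dictionary avoids one pass over each file per SHA pair, which matters when the map has many thousands of entries. Abbreviated SHAs in the metadata, if any, are not handled by this sketch.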

Rewriting history during migration

While history rewrites can be done before migration, via a local rewrite and force push, perhaps the best option is to rewrite history during the migration process. The migration archive contains the git repo in its entirety, so the process would be:

  • open the migration archive
  • rewrite the history
  • fix PR metadata as described above
  • repackage the migration archive
  • continue with the import into github.com
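The open and repackage steps in the list above might look like this. It is a minimal sketch that assumes the migration archive is a gzipped tarball and that the import only needs the same top-level layout back; `unpack_archive` and `repack_archive` are illustrative names, not part of any migration tool.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, work_dir):
    """Extract a .tar.gz archive into `work_dir` and return that directory."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(work_dir)
    return work_dir


def repack_archive(work_dir, output_path):
    """Pack the contents of `work_dir` into a new .tar.gz at `output_path`.

    Entries are added relative to `work_dir` so the top-level layout
    matches what was extracted. `output_path` must live outside `work_dir`.
    """
    work_dir = Path(work_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        for entry in sorted(work_dir.iterdir()):
            tar.add(entry, arcname=entry.name)
    return Path(output_path)
```

Between the two calls, run the history rewrite against the git repository inside the extracted directory and then fix the SHAs in the metadata files. Comparing the file listing of the original and repackaged archives is a cheap sanity check before importing.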

If using GEI for your migration, first download the migration archive from your blob storage and follow the steps above to rewrite history and fix the metadata. Then continue the import with GEI by re-uploading the modified archive to your blob storage and passing the undocumented --git-archive-url and --metadata-archive-url arguments to the gei migrate-repo command.

Recommendation and guidance regarding rewriting repository history

If you ever plan to rewrite repository history, migration is the time to do it, because only then do you have the opportunity to fix the PR metadata. Script the entire migration process, including export, history rewrites, fixing PR metadata, and import, and rehearse it until the desired results are confirmed.
