If you have an internal code base under source control, for instance one hosted on premises, you may have checked in some secrets in your configuration files. We all know we should have encrypted those or kept them outside of the checked-in source in some way, but perhaps it seemed better to accrue some technical debt and worry about it later. Okay, now it's later.
The following procedure rewrites the repository’s history with all the secrets overwritten.
Use scoop (we're on Windows) to install git-filter-repo.
Create a fresh bare clone as a source:
git clone --bare my-repo.git
Create a new working copy as a target:
git clone my-repo my-repo-new
Change directory into the working copy.
Find relevant commits where you've replaced those secrets:
git log --oneline --since=2020-03-01 --author=ThatGuy | Select-String my-tracking-id
Create patch files for the relevant commits:
git log -1 -p -m <commit-hash> > ..\<commit-hash>.patch
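Open the patch files and cut-and-paste all the secrets into an expressions.txt
file, one per line. Each line is matched as literal text unless it starts with
regex:, so use that prefix where a pattern is more appropriate, e.g. regex:\bPassword[0-9]*\b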
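Before rewriting anything, it can help to dry-run the expressions against a few sample strings. The following is a minimal sketch, not git-filter-repo's own code: it only approximates how an expressions file is read (literal lines, `regex:` lines, an optional `==>` replacement, `***REMOVED***` as the default) and ignores `glob:` lines. The sample secrets and the names `parse_expressions` and `redact` are hypothetical.

```python
import re

# Hypothetical contents of expressions.txt, for illustration only.
EXPRESSIONS = r"""
hunter2
regex:\bPassword[0-9]*\b
"""

DEFAULT_REPLACEMENT = "***REMOVED***"


def parse_expressions(text):
    """Turn expressions-file text into (compiled_pattern, replacement) pairs.

    Approximation: handles literal lines and 'regex:' lines, with an
    optional '==>' separator supplying a custom replacement.
    """
    rules = []
    for line in text.splitlines():
        line = line.rstrip("\r\n")
        if not line:
            continue
        pattern, sep, replacement = line.partition("==>")
        if not sep:
            replacement = DEFAULT_REPLACEMENT
        if pattern.startswith("regex:"):
            compiled = re.compile(pattern[len("regex:"):])
        else:
            # Literal lines must match exactly, so escape regex metacharacters.
            compiled = re.compile(re.escape(pattern))
        rules.append((compiled, replacement))
    return rules


def redact(text, rules):
    """Apply every rule in order and return the redacted text."""
    for compiled, replacement in rules:
        text = compiled.sub(replacement, text)
    return text


if __name__ == "__main__":
    rules = parse_expressions(EXPRESSIONS)
    print(redact("user=admin;pwd=Password123;token=hunter2", rules))
```

If a sample string you expected to be redacted comes back unchanged, the expression needs adjusting before you run the real rewrite.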
Now we are ready to rewrite history:
git filter-repo --replace-text .\expressions.txt --source <dir_clone_bare> --target <dir_clone_wc>
Check the commits again to ensure all the secrets have been replaced (in this case with ***REMOVED***).
It's likely that you will be working on this before asking your team to take a break
while you swap over the repos. That means you will want to update the bare repo
for the final version. That's why we used a bare repo as the source, with a separate
target repo, rather than rewriting in place on a freshly cloned working copy.
Watch how you do this update, as git fetch --all isn't what you want. A bare clone
has no fetch refspec mapping the remote's heads onto its own branches, so a plain
fetch does not update them. You need to pass an explicit refspec, such as
git fetch origin *:*, to make sure you update every branch.
The working copy with the rewritten history can now be pushed up to a new origin and shared.
So, then the inevitable happens. There's a missing file. If you are on a
case-insensitive system like Windows you may have had two versions of a file checked
in that varied only in case. git-filter-repo will see two files that are the
same and select one, not necessarily the one you wanted. This is easy enough to
fix with a git revert of the offending commit followed by a git commit with
the corrected file(s).
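To spot these collisions before they bite, you can group the tracked paths by their case-folded form. This is a minimal sketch under a few assumptions: the function names are hypothetical, and the paths are read with `git ls-tree -r --name-only` (which also works in a bare repo, where `git ls-files` has no index to read).

```python
import subprocess
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [sorted(set(group)) for group in groups.values() if len(set(group)) > 1]


def list_tracked(repo_dir, ref="HEAD"):
    """List every file path in the tree of `ref` (NUL-separated for safety)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "ls-tree", "-r", "--name-only", "-z", ref],
        capture_output=True, check=True, encoding="utf-8", errors="replace",
    )
    return [path for path in result.stdout.split("\0") if path]


if __name__ == "__main__":
    import sys
    # Usage: python case_collisions.py <path-to-repo>
    for group in case_collisions(list_tracked(sys.argv[1])):
        print(" <-> ".join(group))
```

Run it against the original repo; each printed group is a set of files that a case-insensitive file system would treat as the same file.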
The following are some useful tools for finding out what has gone wrong.
- You can check how many files are in the before and the after repos:
git ls-files | wc -l
- Pickaxe will find changes in a given string throughout your repo:
git log -Sthingtolookfor
- You can alter the width for the filenames in a log message to better see the whole path:
git log --stat=200
- Find out which branches a commit appears in:
git branch --all --contains <commit-hash>
Now we found, and we don't know why, that some recent commits hadn't made it across. You can compare the history by creating a file with the recent logs from the before and the after repos and then diffing those with, for example, meld.
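If you would rather script the comparison than eyeball a diff, something like the following can list the commits present in the old repo but absent from the new one. It is a sketch under stated assumptions: commit hashes change during the rewrite, so commits are matched on author date plus subject, which `--replace-text` should leave alone since it only touches file contents. The function names are hypothetical. Commits that became empty after the rewrite may have been pruned and will show up as missing too.

```python
import subprocess


def read_log(repo_dir):
    """Return one 'author-date|subject' string per commit on any ref."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--all",
         "--date=iso-strict", "--pretty=format:%ad|%s"],
        capture_output=True, check=True, encoding="utf-8", errors="replace",
    )
    return result.stdout.splitlines()


def missing_entries(before, after):
    """Return the entries of `before` that never appear in `after`, in order."""
    seen = set(after)
    return [entry for entry in before if entry not in seen]


if __name__ == "__main__":
    import sys
    # Usage: python compare_logs.py <old-repo-dir> <new-repo-dir>
    old_log = read_log(sys.argv[1])
    new_log = read_log(sys.argv[2])
    for entry in missing_entries(old_log, new_log):
        print(entry)
```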
You then have a couple of options for pulling commits across. You can either create a patch file for the relevant commit(s) and apply that or you can fetch the whole of the old repo's commits and cherry-pick from there.
Using a patch file:
git --git-dir=../<old_repo>/.git format-patch -k -1 --stdout <commit-hash> | git am -3 -k
Using a remote, fetching then cherry-picking (the ~1 makes the range include the first missing commit):
git remote add my-repo ..\dir-my-repo\.git
git fetch my-repo
git cherry-pick <first-missing-sha>~1..<last-missing-sha>
The patch file method is ideal if you don't want to have all those extra commits in your .git folder. You can just as easily create a patch file for a range of commits, too. If you do have the remote you can always remove it later and purge your local repo.
The cherry-pick method is a little better if you have a slightly more complicated
set of commits. For instance, in our project there were a number of merge commits that
were empty as the develop branch was already merged into the feature branch before
the feature branch was merged into the develop branch. When you are going through
the cherry-pick process it will regularly barf and ask you if you want to skip or
add the empty commit or reset (git reset is probably what you want).
You will also have to look at some merges when git can't automatically handle the
merge, and perhaps check against the original repo history to know what to do.
Note that when doing a cherry-pick the side you are merging in ("theirs") is the
commit coming from the remote.