This playbook outlines the steps to use the BFG Repo-Cleaner (BFG) tool to rewrite Git repository history and remove sensitive data, such as passwords, credentials, or other private information that may have been accidentally committed.
- URL of the Git repository to be cleaned
- Name of the main branch (e.g., master, main, etc.)
- (Optional) List of sensitive strings or patterns to be removed (e.g., API keys, passwords)
- Ask for Sensitive Data
  - Ask the user whether they have a list of sensitive strings or patterns that need to be removed from the repository.
  - If the user does not provide a list, proceed to the next step with a general cleanup.
- Clone the Repository
  - Clone the full repository to the local machine, without the --mirror flag:
    git clone <repo_url>
- Install the BFG Tool
  - Install Java and jq:
    sudo apt -y install default-jre-headless jq
  - Look up the latest BFG version and store it in a shell variable:
    latestVersion=$(curl -s "https://search.maven.org/solrsearch/select?q=a:bfg" | jq -r '.response.docs[0].latestVersion'); export latestVersion
  - Download that release and save it as bfg.jar, the file name the later steps use:
    wget -O bfg.jar "https://repo1.maven.org/maven2/com/madgag/bfg/$latestVersion/bfg-$latestVersion.jar"
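The version lookup can come back empty, or as the literal string `null` that jq prints when the field is missing, and the download URL would then be malformed. A minimal sketch of a guard, assuming a hypothetical helper name `bfg_jar_url` (it is not part of BFG or Maven):

```shell
# Hypothetical helper: build the Maven Central download URL for a BFG version.
# Rejects an empty value or "null", which is what a failed lookup produces.
bfg_jar_url() {
  version="$1"
  case "$version" in
    ''|null)
      echo "error: could not determine BFG version" >&2
      return 1
      ;;
  esac
  printf 'https://repo1.maven.org/maven2/com/madgag/bfg/%s/bfg-%s.jar\n' "$version" "$version"
}
```

It could be used as `url=$(bfg_jar_url "$latestVersion") && wget -O bfg.jar "$url"`, so that the download is skipped when the lookup failed.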
- Identify Sensitive Data and create replace.txt
  - If the user provided a list of sensitive strings or patterns in the initial prompt, search the full history for them:
    git log --all --full-history --pretty=format:"%H" | xargs git show --pretty=format:"" --stdin | grep -E '<user_provided_pattern>'
  - If the user did not provide a list, search with a general pattern instead:
    git log --all --full-history --pretty=format:"%H" | xargs git show --pretty=format:"" --stdin | grep -Ei '(AWS|amazonaws|access|secret|password|passwd|api|token|credential|auth|oauth|bearer|encryption|client|private|cert|ssl|ssh|jwt|key|username|user|uname|email|mail|database|db|connection|conn|url|endpoint|config|cfg)'
  - Note down only the actual sensitive values that need to be removed from the repository, not the names of the variables that hold them.
  - Create replace.txt with the sensitive strings, one per line:
    echo -e '<sensitive_credential_string>' > replace.txt
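Because `echo -e` interprets backslash sequences, a secret that happens to contain `\n` or `\t` would be written to replace.txt incorrectly. A minimal sketch that writes each value verbatim with `printf` instead; `write_replace_file` is a hypothetical helper name, and the values in the usage note are placeholders:

```shell
# Hypothetical helper: write each argument after the first to the given file,
# one per line. printf '%s\n' copies values verbatim (no backslash escapes).
write_replace_file() {
  out="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "error: no sensitive strings given" >&2
    return 1
  fi
  printf '%s\n' "$@" > "$out"
}
```

For example, `write_replace_file replace.txt 'example-password' 'example\token'` produces a two-line file. BFG treats each line as a literal string by default and replaces matches with `***REMOVED***`.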
- Remove Sensitive Patterns with BFG command
  - Replace sensitive strings in the repository:
    java -jar bfg.jar --replace-text replace.txt
- Verify the Change after running BFG command
  - For each pattern in replace.txt, verify that the sensitive data has been removed from the rewritten history by repeating the search:
    git log --all --full-history --pretty=format:"%H" | xargs git show --pretty=format:"" --stdin | grep -E '<user_provided_pattern>'
  - The search should return no matches. If it still does, re-run the BFG tool with the correct pattern.
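Checking every entry by hand gets error-prone once replace.txt has more than a few lines. A sketch of a helper that reads a history dump on standard input and fails if any string from the file still appears; `history_is_clean` is a hypothetical name, and it assumes replace.txt holds plain literal strings only (no `regex:` prefixes or `==>` replacements):

```shell
# Hypothetical helper: read text on stdin and succeed only if none of the
# literal strings listed in the given file (one per line) occur in it.
history_is_clean() {
  patterns="$1"
  # -F: treat patterns as fixed strings; -f: read them from a file;
  # -q: report only through the exit status, so secrets are not echoed.
  if grep -q -F -f "$patterns"; then
    echo "sensitive data still present in history" >&2
    return 1
  fi
  return 0
}
```

One possible invocation is `git log --all --full-history -p | history_is_clean replace.txt`, run from inside the repository with replace.txt stored where the command can read it.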
- Update the Repository
  - Move any files created for the cleanup process (such as bfg.jar and replace.txt) outside the repository and make sure they are never committed.
  - After running the BFG tool, expire the reflog and garbage-collect so that the old objects are actually dropped:
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
  - Create a new branch based on the cleaned-up commit:
    git checkout --orphan <new_cleaned_branch_name>
  - Create a new commit on that branch with the message "Cleaned sensitive data".
- Share the Clean Repository
  - Create a ZIP archive of the cleaned repository and share it via the messaging interface.
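The branch and commit actions in the Update the Repository step, which come before the archive is created, can be sketched as one helper. This assumes it runs inside the cleaned working tree with a Git user identity already configured; `commit_clean_snapshot` is a hypothetical name and the branch name in the usage note is a placeholder.

```shell
# Hypothetical helper: create an orphan branch (one with no parent history)
# and commit the current working tree to it as a single commit.
commit_clean_snapshot() {
  branch="$1"
  git checkout --orphan "$branch" &&
  git add -A &&
  git commit -m "Cleaned sensitive data"
}
```

Calling `commit_clean_snapshot cleaned-main` leaves a branch holding exactly one commit. Since `git add -A` stages every untracked file, helper files such as replace.txt need to be moved out of the repository first.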
- The sensitive data has been removed from the repository history.
- The cleaned repository is available for further use or sharing.
- The original repository remains unchanged, and the cleaned repository can be used as a replacement.
- Use only the BFG tool for cleaning sensitive data from the repository.
- If a sensitive value still appears in the latest commit after running BFG, leave it in place: BFG protects the contents of the latest commit by default, and the value may still be in use by the codebase.
- Clone the repository only once in the beginning.
- Do general cleanup if not provided with a list of sensitive strings or patterns in the initial prompt.
- Always create a fresh clone of the repository when using the BFG tool to avoid potential conflicts or issues.
- Never push anything to GitHub.
- Always keep a backup of the original repository before running the BFG tool.
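The backup advice can be sketched as a plain copy of the fresh clone, taken before BFG runs. `backup_repo` is a hypothetical helper name; it assumes the clone is a local directory and refuses to overwrite an existing backup.

```shell
# Hypothetical helper: copy a cloned repository to "<repo>.backup"
# (including its .git directory) before any history rewriting happens.
backup_repo() {
  repo="${1%/}"            # strip a trailing slash, if any
  backup="${repo}.backup"
  if [ ! -d "$repo" ]; then
    echo "error: $repo is not a directory" >&2
    return 1
  fi
  if [ -e "$backup" ]; then
    echo "error: $backup already exists" >&2
    return 1
  fi
  cp -a "$repo" "$backup"
}
```

Keeping the copy next to the clone rather than inside it means it is not picked up by later `git add` commands or by the ZIP archive.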