SVN repositories often contain committed binary files and secrets. This data should not end up in the target Git repository.
To migrate an SVN repository to Git, I use the following workflow:
- Use SubGit to do the initial translation of Subversion revisions to Git commits
- Inspect the resulting Git repo for unwanted/sensitive data
- Adjust the SubGit configuration file to keep unwanted data out at the import stage
- Use BFG Repo-Cleaner to remove sensitive data from the history of text files
- Ignore unnecessary files by using excludePath in repo.git/subgit/config
- Use the following command to find the heaviest files in the history:
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
- Use the following command to find binary files in the history:
git log --all --numstat \
| grep '^-' \
| cut -f3 \
| $(command -v gsed || echo sed) -r 's|(.*)\{(.*) => (.*)\}(.*)|\1\2\4\n\1\3\4|g' \
| sort -u
- Edit repo.git/subgit/config accordingly, e.g.:
excludePath = *.png
excludePath = *.jpeg
excludePath = *.jar
excludePath = *.zip
excludePath = *.gzip
excludePath = public/api/examples/**
excludePath = log/**/*
excludePath = tmp/**/*
excludePath = vendor/assets/bower_components/**/*
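The binary-file listing can be turned into candidate excludePath lines. This is a minimal sketch, assuming one file path per line on stdin; suggest_exclude_paths is a hypothetical helper name, not part of SubGit. Treat the output as a starting point to review by hand, since an extension-wide pattern may also drop files you want to keep.

```shell
# Hypothetical helper: reads file paths (one per line) on stdin and prints
# one "excludePath = *.ext" line per distinct file extension.
# Files without an extension and dotfiles such as .gitignore are skipped.
suggest_exclude_paths() {
  awk -F/ '{ print $NF }' \
    | awk -F. 'NF > 1 && $NF != "" && !(NF == 2 && $1 == "") { print "excludePath = *." $NF }' \
    | sort -u
}
```

To use it, pipe the output of the binary-file command into suggest_exclude_paths in place of the final sort -u.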
- Do not create empty Git commits; change repo.git/subgit/config:
[translate]
createEmptyGitCommits = false
- Optionally, do not create .gitattributes (see SO answer); change repo.git/subgit/config:
[translate]
eols = false
otherProperties = false
- Create and maintain authors.txt, e.g.:
first-last@company.example = First Last <first-last@company.example>
- Use the following commands to list contributors in the source SVN repo:
svn log --quiet | grep "^r" | awk '{print $3}' | sort | uniq
# or, from the already imported Git repo:
git shortlog --summary --numbered --email
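The usernames printed by svn log can seed authors.txt. This is a minimal sketch, assuming every plain username maps to an address at the placeholder domain company.example; authors_skeleton is a hypothetical helper name, and the display names still have to be filled in by hand.

```shell
# Hypothetical helper: reads SVN usernames (one per line) on stdin and prints
# "username = username <email>" lines in the authors.txt format shown above.
# The optional first argument is the mail domain (default: company.example).
# Usernames that already look like an address are used as the email unchanged.
authors_skeleton() {
  sort -u | awk -v d="${1:-company.example}" 'NF {
    email = (index($0, "@") ? $0 : $0 "@" d)
    printf "%s = %s <%s>\n", $0, $0, email
  }'
}
```

For example: svn log --quiet | grep "^r" | awk '{print $3}' | authors_skeleton > authors.txt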
- Remove secrets:
  - Use tools like gittyleaks to detect sensitive data in the repo history
  - Remove secrets using BFG: bfg --replace-text ../sensitive.txt
Example sensitive.txt composed of results from gittyleaks:
pa$$w0rd
an0ther_passw0rd
password: qwerty==>password: ***REMOVED***
By default, BFG replaces matched substrings with ***REMOVED***; a line of the form match==>replacement sets a custom replacement instead.
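After BFG has run, it is worth checking that the literal strings from sensitive.txt no longer appear anywhere in history. This is a minimal sketch, assuming the entries are literals; bfg_literals is a hypothetical helper name, and lines using BFG's regex: or glob: prefixes are skipped because they need a separate check.

```shell
# Hypothetical helper: prints the literal match strings from a BFG
# --replace-text file, dropping "==>replacement" suffixes, blank lines,
# and regex:/glob: entries.
bfg_literals() {
  grep -v -e '^regex:' -e '^glob:' "$1" | sed 's/==>.*$//' | grep -v '^$'
}

# Possible usage inside the cleaned clone (no output means no matches):
#   bfg_literals ../sensitive.txt > /tmp/literals.txt
#   git log --all -p | grep -F -f /tmp/literals.txt
```

Note that BFG by default leaves the contents of the latest commit untouched, so a match reported there means the secret has to be removed from the current files first.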
When authors.txt, the SubGit config, and sensitive.txt are ready, perform the final import:
subgit configure https://svn.repo/repo \
&& cp authors.txt repo.git/subgit/authors.txt \
&& cp config repo.git/subgit/config \
&& subgit import repo.git \
&& git clone repo.git repo-git && cd repo-git \
&& bfg --replace-text ../sensitive.txt . \
&& git reflog expire --expire=now --all && git gc --prune=now --aggressive
After this, you can additionally check the size of the imported Git repo with du -sh .git
and decide whether to exclude more files or reduce history.
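Because the final command chain copies and reads several prepared files, a small pre-flight check can catch a missing one before a long import starts. This is a minimal sketch; check_inputs is a hypothetical helper name, and the files to check are passed as arguments.

```shell
# Hypothetical helper: succeeds only if every named file exists and is
# non-empty; otherwise prints the offending names to stderr and fails.
check_inputs() {
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, prefix the import chain with: check_inputs authors.txt config sensitive.txt && subgit configure ...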