@gavvvr
Last active September 22, 2021 14:59
Subversion to Git migration checklist

The workflow

SVN repositories often contain committed binary files and secrets. This data should not end up in the target Git repository.

To migrate an SVN repository to Git, I use the following workflow:

  • Use SubGit to do the initial translation of Subversion revisions into Git commits
  • Inspect the resulting Git repo for unwanted or sensitive data
  • Adjust the SubGit configuration file to exclude unwanted data at the import stage
  • Use BFG Repo-Cleaner to remove sensitive data from the history of text files

The Checklist

  • Ignore unnecessary files by using excludePath in repo.git/subgit/config

  • Use the following command to find the heaviest files in the history:

git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

  • Use the following command to list the binary files in the history (git log --numstat prints - instead of line counts for binary files; GNU sed is required):

git log --all --numstat \
    | grep '^-' \
    | cut -f3 \
    | $(command -v gsed || echo sed) -r 's|(.*)\{(.*) => (.*)\}(.*)|\1\2\4\n\1\3\4|g' \
    | sort -u
  • Edit repo.git/subgit/config accordingly, e.g.:
excludePath = *.png
excludePath = *.jpeg
excludePath = *.jar
excludePath = *.zip
excludePath = *.gzip
excludePath = public/api/examples/**
excludePath = log/**/*
excludePath = tmp/**/*
excludePath = vendor/assets/bower_components/**/*
  • To avoid creating empty Git commits, set the following in repo.git/subgit/config:
[translate]
        createEmptyGitCommits = false
  • Optionally, to avoid generating .gitattributes (see SO answer), set the following in repo.git/subgit/config:
[translate]
     eols = false
     otherProperties = false
  • Create and maintain authors.txt, e.g.:
first-last@company.example = First Last <first-last@company.example>
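  • Use one of the following commands to list contributors, either from the source SVN repo or from the already translated Git repo:
# in an SVN working copy
svn log --quiet | grep "^r" | awk '{print $3}' | sort | uniq
# or, in the translated Git repo
git shortlog --summary --numbered --email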
  • Remove secrets:

  • Use tools like gittyleaks to detect sensitive data in repo history

  • Remove the secrets with BFG: bfg --replace-text ../sensitive.txt. An example sensitive.txt composed from gittyleaks results:

pa$$w0rd
an0ther_passw0rd
password: qwerty==>password: ***REMOVED***

By default, BFG replaces matched substrings with ***REMOVED***; a line of the form old==>new sets a custom replacement instead.
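
For the authors.txt step in the checklist above, the contributor list can be turned into a starting skeleton instead of typing every line by hand. This is a minimal sketch, not part of SubGit: the function name authors_skeleton and the default mail domain company.example are assumptions made for illustration, and the generated display names are just the SVN usernames, so they still need manual correction.

```shell
# Hypothetical helper: read SVN usernames (one per line) from stdin and
# print one "svn_user = Name <email>" line per unique user.
# The mail domain is a placeholder assumption; pass the real one as $1.
authors_skeleton() {
    domain="${1:-company.example}"
    # sort -u removes duplicates; the awk NF guard skips blank lines
    sort -u | awk -v d="$domain" 'NF { printf "%s = %s <%s@%s>\n", $0, $0, $0, d }'
}
```

A possible use is svn log --quiet | grep "^r" | awk '{print $3}' | authors_skeleton > authors.txt, followed by editing the real names and addresses in the resulting file.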


When authors.txt, SubGit config and sensitive.txt are ready, perform the final import:

subgit configure https://svn.repo/repo \
&& cp authors.txt repo.git/subgit/authors.txt \
&& cp config repo.git/subgit/config \
&& subgit import repo.git \
&& git clone repo.git repo-git && cd repo-git \
&& bfg --replace-text ../sensitive.txt . \
&& git reflog expire --expire=now --all && git gc --prune=now --aggressive

Once the import finishes, check the size of the resulting Git repo with du -sh .git and decide whether to exclude more files or shorten the history.
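
It may also be worth confirming that the literal strings listed in sensitive.txt no longer occur anywhere in the rewritten history. The following is a rough sketch under assumptions: the helper names bfg_literals and check_secrets_gone are made up for illustration, only literal entries are checked (regex: and glob: entries are skipped), and git log -S is used as a plain pickaxe search. Note that BFG protects the contents of the HEAD commit by default, so a secret still present in HEAD would be reported here.

```shell
# Hypothetical helper: print the literal search strings from a BFG
# replace-text file, dropping the "==>replacement" part and skipping
# regex:/glob: entries.
bfg_literals() {
    grep -v -e '^regex:' -e '^glob:' "$1" | sed 's/==>.*$//' | grep -v '^$'
}

# Hypothetical helper: run inside the cleaned repository.
# Prints every literal that some commit still adds or removes and
# returns non-zero if at least one is found.
check_secrets_gone() {
    found=0
    while IFS= read -r secret; do
        [ -n "$secret" ] || continue
        # -S matches commits that change the number of occurrences of the string
        if [ -n "$(git log --all --oneline -S"$secret" --)" ]; then
            printf 'still present: %s\n' "$secret"
            found=1
        fi
    done <<EOF
$(bfg_literals "$1")
EOF
    return "$found"
}
```

From inside repo-git this could be called as check_secrets_gone ../sensitive.txt; an empty output and a zero exit status suggest that the listed literals are gone from all refs.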

gavvvr commented Feb 27, 2021

Today I Learned that:

  • BFG is more or less superseded now that git-filter-repo exists
  • git-sizer can be used instead of shell scripts to find huge files in the history
