Migrate From SVN To GIT

Migrating From SVN to Git

This gist details the following:

  1. Converting a Subversion (SVN) repository into a Git repository
  2. Purging the resultant Git repository of large files

Migrating from SVN to Git is roughly split into three steps:

  1. Retrieve a list of SVN commit usernames
  2. Match SVN usernames to email addresses
  3. Migrate to Git using git-svn clone command

Step 1: Retrieve A List Of SVN Commit Usernames

An SVN commit records only the author's username. Git records more: at a minimum, every Git commit author has both a name and an email address. Since SVN does not store email addresses, each username has to be matched to one by hand.

To do that, first build a list of every username that appears in the SVN history. Run the following from inside a checked-out SVN working copy (or add the repository URL after svn log -q). It writes each distinct username to a file called authors.txt:

svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt
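The awk program keeps only the lines that start with r (the revision lines), takes the second |-separated field (the username), trims the surrounding spaces, and prints it in the user = user <user> form; sort -u then drops duplicates. Because it only reads text, it can be tried without an SVN server by feeding it a few hand-written lines in the svn log -q layout. The function name svn_log_to_authors and the sample usernames below are made up for illustration.

```shell
# svn_log_to_authors: hypothetical wrapper around the awk program from Step 1.
# Reads `svn log -q` output on stdin; prints one "user = user <user>" line per distinct user.
svn_log_to_authors() {
  awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u
}

# Hand-written sample in the `svn log -q` layout: a dashed separator line,
# then "rN | user | date" for each revision.
printf '%s\n' \
  '------------------------------------------------------------------------' \
  'r3 | jwilkins | 2017-08-30 10:00:00 +0000 (Wed, 30 Aug 2017)' \
  '------------------------------------------------------------------------' \
  'r2 | alice | 2017-08-29 09:00:00 +0000 (Tue, 29 Aug 2017)' \
  '------------------------------------------------------------------------' \
  'r1 | jwilkins | 2017-08-28 08:00:00 +0000 (Mon, 28 Aug 2017)' \
  '------------------------------------------------------------------------' |
  svn_log_to_authors
# alice = alice <alice>
# jwilkins = jwilkins <jwilkins>
```

jwilkins appears in two revisions but only once in the output, which is why authors.txt ends up with one line per person.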

Step 2: Match SVN usernames to email addresses

Each line of authors.txt has this form:

jwilkins = jwilkins <jwilkins>

Edit every line so that the part after the = holds the person's full name and email address, like this:

jwilkins = John Albin Wilkins <johnalbin@example.com>
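Before running the clone, it can help to check that no placeholder entries are left. The following is a minimal sketch, assuming that every finished entry ends in <address-containing-an-@>; the function name check_authors_file is hypothetical and not part of git or svn. It prints the lines that still look unfinished (for example jwilkins = jwilkins <jwilkins>) and returns a non-zero status if there are any.

```shell
# check_authors_file: hypothetical helper, not part of git or svn.
# Assumes a finished entry looks like: user = Full Name <name@example.com>
# Returns 0 if every non-blank line is finished, 1 if some are not, 2 if the file is unreadable.
check_authors_file() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 2
  fi
  # grep -v prints every line that matches NEITHER pattern:
  #   1) ends in <...@...> (a finished entry)
  #   2) is blank
  if grep -v -e '<[^>]*@[^>]*>[[:space:]]*$' -e '^[[:space:]]*$' "$1"; then
    echo "The lines above still need a real name and email address." >&2
    return 1
  fi
  return 0
}
```

Usage: check_authors_file authors.txt && echo "authors.txt looks complete"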

Step 3: Migrate To Git Using git-svn clone Command

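Create a folder to hold the Git repository, change into it, and run the command below, replacing <svn_repo> with the URL of the SVN repository. --stdlayout tells git svn that the repository uses the standard trunk/, branches/ and tags/ layout. The git svn subcommand is packaged separately on some Linux distributions (on Debian or Ubuntu: sudo apt install git-svn).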

git svn clone --stdlayout --authors-file=path/to/authors.txt <svn_repo>

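This step can take a long time on a large repository, because git svn fetches the SVN history one revision at a time. When it finishes, you have a Git repository containing the converted history.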
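Once the clone finishes, one way to confirm that the authors file was applied is to list the distinct author names and emails Git recorded. An entry whose email has no @ (such as jwilkins <jwilkins>) points to a line in authors.txt that was left with its placeholder value. A minimal sketch, run from inside the new repository; list_git_authors is a hypothetical name:

```shell
# list_git_authors: hypothetical helper. Run inside the converted repository.
# Prints each distinct "Name <email>" pair recorded as a commit author on any branch or tag.
list_git_authors() {
  git log --all --format='%an <%ae>' | sort -u
}
```

Usage: list_git_authors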

Find And Purge Large Files From Git History

Git hosts can be stricter than SVN about large files: GitHub, for example, rejects pushes that contain files larger than 100 MB. To push a converted repository to such a host, you may first need to purge these files from the Git history.

Step 1: Determine The Files That Are Large

Change into the newly created Git repository and run:

git rev-list --objects --all | sort -k 2 > allfileshas.txt
git gc && git verify-pack -v .git/objects/pack/pack-*.idx | grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt

This will result in two files:

  1. allfileshas.txt - every object SHA in the repository, followed by its path where it has one
  2. bigobjects.txt - the blobs in the pack file, sorted by size, largest first

To combine the two files into a list of file names sorted by size, largest first:

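head -n 50 bigobjects.txt | cut -d ' ' -f 1 | while read -r SHA; do echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | awk '{print $1, $3, $7}'; done > bigtosmall.txt

NOTE: The loop greps both files once per object, so if it processes every line of bigobjects.txt it can run for a very long time on a large repository. head -n 50 limits it to the 50 largest blobs; raise the number if you need more.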

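The resulting file, bigtosmall.txt, has one line per blob: its SHA, its size in bytes, and its path, from largest to smallest. (The awk step keeps only the first word of each path, so paths containing spaces are cut short.)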
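The loop above runs two greps per object. As an alternative technique (not the one this gist uses), git cat-file --batch-check can report the type and size of every object in a single pass, which avoids the repeated greps and keeps paths that contain spaces intact. A sketch, assuming a Git version whose --batch-check format supports the %(rest) atom; list_big_blobs is a hypothetical name, and it prints the same SHA, size, path columns as bigtosmall.txt:

```shell
# list_big_blobs: hypothetical helper. Run inside the Git repository.
# Prints "SHA size path" for the N largest blobs reachable from any ref
# (default N = 50), largest first, like bigtosmall.txt.
#  - rev-list prints "SHA path" for every reachable object (commits have no path)
#  - cat-file echoes the text after the SHA back through %(rest), i.e. the path
#  - awk keeps blobs only and strips the first three fields, so paths with spaces survive
list_big_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { path = $0; sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path); print $2, $3, path }' |
    sort -k 2,2 -n -r |
    head -n "${1:-50}"
}
```

Usage: list_big_blobs 20 > bigtosmall.txt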

Step 2: Purge The Files From The Git History

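Choose the files (or whole directories) in bigtosmall.txt that you want to purge. Then run the following for each one, replacing MY-BIG-DIRECTORY-OR-FILE with the path of the file or directory to purge. This rewrites history on every branch and tag, so commits from the first one that contained the path onward get new SHAs: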

git filter-branch -f --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all
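git filter-branch keeps the pre-rewrite history reachable through backup refs under refs/original/ and through the reflog, so the purged blobs still take up space in .git until those references are removed and the repository is garbage-collected. A minimal sketch of that cleanup, assuming you have already checked the rewritten history and no longer need the backup (the cleanup cannot be undone); expire_old_history is a hypothetical name:

```shell
# expire_old_history: hypothetical helper. Run inside the repository AFTER git filter-branch.
# Irreversible: it throws away the pre-rewrite history that filter-branch kept as a backup.
expire_old_history() {
  # Delete the refs/original/* backup refs that filter-branch leaves behind.
  git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
  # Expire every reflog entry so the old commits are no longer referenced.
  git reflog expire --expire=now --all
  # Remove the now-unreachable objects and repack what remains.
  git gc --prune=now --aggressive
}
```

Usage: expire_old_history, then check the size of .git again.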

madhu-onchip Aug 30, 2017

Could you explain this in more detail?

saurabhperiwal Sep 1, 2017

Has anyone successfully migrated from SVN to Git using the above process?

pnixon Sep 8, 2017

Worked for me. One thing it doesn't mention is that you need to install git-svn: sudo apt install git-svn

edcasillas Feb 22, 2018

Tried to follow the guide but got stuck at the first step. While trying to create the list of committers, I get this:

awk : The term 'awk' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:14
+ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2);  ...
+              ~~~
    + CategoryInfo          : ObjectNotFound: (awk:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

yagnendra Feb 25, 2018

Has anyone successfully migrated SVN to Git using the above process? Please let me know.

tarrynn Mar 8, 2018

Worked by doing the first 3 steps. Nice one!

Mexicoder Jul 10, 2018

For anyone whose cmd doesn't recognize "awk", go here: http://gnuwin32.sourceforge.net/packages/gawk.htm.
Download the setup you want and install it.
Then update your PATH variable; the directory you need should be "C:\Program Files (x86)\GnuWin32\bin".
Here is the Stack Overflow post I followed to do it: https://stackoverflow.com/a/21930462/5919289

chanhlt190290 Jul 18, 2018

Works for me. Thanks a lot!