Migrating SVN/R-Forge packages to git/github
Migration \Mi*gra"tion, n. [L. migratio: cf. F. migration]
1: The movement of persons or groups from one country or locality to another.
2: The passage of software developers from one platform, language or environment to another for the purpose of feeding, breeding or enhanced health of their offspring.
I have ~ 16 R packages I maintain or contribute to. I have a Ubuntu linux workstation and several Windows machines, but for R work and package development, I mostly use Windows, because most things work more easily there.
In the past, I've maintained the repositories for these on R-Forge, and used the eclipse / StatET IDE for development, testing, and submission of new versions to CRAN directly from R-Forge. This has generally worked fairly well, but I've come to want to switch to using git for version control and github for my package repositories. As well, RStudio offers an increasingly attractive IDE for R package development, and I've been using RStudio more and more.
Why do this? It violates the Lazy Developer's Golden Rule: "If it ain't broke, don't fix it" (phrase attributed to Bert Lance in a different setting) Well, R-Forge does have some advantages:
- In theory, it automatically builds and checks R packages on Linux and Windows platforms, and for the both the current and devel releases.
- It provides a Submit to CRAN link on the R packages page for each project, but that doesn't appear if the package failed R CMD check.
- Each package automatically gets a
pkg/directory as well as a
www/directory, in case you want to also maintain a parallel set of web pages related to a package. I haven't used this much, but it was useful for the
Lahmanpackage, whose R-Forge pages are at http://lahman.r-forge.r-project.org/.
However, I've found that package building and checking on R-Forge is often extremely slow (one day or more) and the checking for the devel releases has been disabled for some time. R-Forge also offers some tools for collaborative work (but only for project members), an email list, bug-tracking, but these have seemed (to me) quite hard to use.
git/github on the other hand (particularly with RStudio), offers the following significant advantages:
- git (once you understand how it works) is a far better tool for version management and collaborative work
- much easier collaboration with other users and developers: people can edit your code or documentation and easily create a pull request that you can view and act on, and inline discussions about pull requests can be initiated in case the maintainer is not happy with proposed pull request
- Hadley Wickham's devtools package makes it extremely simple to develop and test R packages within R itself and it is tuned to github.
- github provides related wiki and gh-pages sites, as well as providing gist's for creating coding examples that are not necessarily package related
- github repositories may be treated as public or private, depending on the purpose (R packages are generally public given that R and its extensions are open-source)
- feature requests and bug tracking are user friendly and also integrate directly with the git work-flow (e.g., using
git commit -a -m 'closes issue #24'will automatically close the respective issue #24, and send an email to the proposer)
- integration with open source testing machines such as travis-ci allow the execution of user defined shell and make files. These are useful for checking R packages with multiple R versions (including the latest development), and executing internal tests that are defined with the testthat or testit packages
- ... more features ???
How to migrate?
I posted a query to R-Help, and Yihui Xie replied:
In the past Github allows one to import from an existing SVN repository automatically via its web interface, but I just checked it and it seems to have gone. What is left is: https://help.github.com/articles/importing-from-subversion . Perhaps it is best for you to do the conversion under Linux and then work under Windows.
This document suggests using svn2git,
a ruby package that is a wrapper for
git svn. Unfortunately, I could
never get this to work for me with R-Forge projects. I got errors
no matter what options I tried. Instead, I ended up using
git svn directly, which was designed to provide
bidirectional operation between a Subversion repository (e.g., R-Forge) and git
(e.g., a local git-based repository)
This means that I did the migration on my Ubuntu linux machine, first installing
git-svn package, which is an add-on for
sudo aptitude install git-svn
Then, the plan for migrating one R package from SVN/R-Forge to git/github consists of the following steps:
- Create an empty github project for the package. github suggests to create a
README.mdfile for the package, so it can be cloned locally from github.
But don't do this now, because it will create problems when you first try to push your converted git repo to github: it will be considered non-fast-forward, since there is content on github not in your local repo.
git svn cloneto copy the existing SVN repository with its history to a local git repo on linux.
- Fixup the directory structure to accord with github conventions, as described in the section Repository directory structures.
- Setup git to track the remote github repository using
git add remote.
- Push the local repository to github using
git push -u origin master
At this point, I can setup eclipse/StatET or RStudio to work with the new github repository, and abandon further work on the R-Forge repository --- but only if no one working on the package project updates the R-Forge repo.
Initial migration from R-Forge to git
The general form to use with
git svn clone is
git svn clone svn+ssh://firstname.lastname@example.org/svnroot/packagename/pkg/
By default, this will import all revisions in the history of the project,
and may take a long time. If you don't want this, you can use the
git svn clone -r 100:HEAD svn+ssh://email@example.com/svnroot/packagename/pkg/
to include only the revisions from rev. 100 forward. But then you will not be able
git blame to find out when an earlier problem was introduced, and you should
probably also use
git svn rebase
to update the local repository to HEAD.
Here is an example of using
git svn clone for one package,
For testing purposes, I did this in
tmp/, first creating a folder,
euclid: /tmp % mkdir tableplot euclid: /tmp % cd tableplot
git svn clone, making sure to use the URL pointing to the
of the R-Forge project.
euclid: /tmp/tableplot % git svn clone svn+ssh://firstname.lastname@example.org/svnroot/tableplot/pkg/ Initialized empty Git repository in /tmp/tableplot/pkg/.git/ email@example.com's password: r1 = 3612d5bd4d900f0a8a23527f31fcfd3b885b61c9 (refs/remotes/git-svn) A R/utility.R A R/cellgram.R A R/tableplot.R A R/make.specs.R A R/make.specs0.R A DESCRIPTION A data/NEO.n.RData ... r13 = ba56a0a752e151ac4de56e9f5cfc0bb5fb3fe93f (refs/remotes/git-svn) A .Rbuildignore M DESCRIPTION M man/tableplot-package.Rd r14 = a57ed845d62270b2ff0f9bd619bab02bf68d4cd0 (refs/remotes/git-svn) Checked out HEAD: svn+ssh://firstname.lastname@example.org/svnroot/tableplot/pkg r14 euclid: /tmp/tableplot % ls -la total 24 drwxr-xr-x 4 friendly staff 4096 Oct 28 22:23 . drwxrwxrwt 17 root root 12288 Oct 28 22:17 .. drwxr-xr-x 8 friendly staff 4096 Oct 28 22:23 pkg
Note that the package is now in
/tmp/tableplot/pkg/. I'll fix that up as described
below, then move the package to my
~/R/projects tree for subsequent work.
Repository directory structures
In migrating from svn/R-Forge to git/gihub, it is important to consider the differences in directory structure in the remote repositories and the implications this has for how you refer to them using references to their locations using svn or git, and when setting them up in IDEs like eclipse/StatET or RStudio.
When a new project is created on R-Forge, the following (initially empty) directory structure is created at the project root:
pkg/ www/ README
pkg/ is to be filled with the normal content of an R package, e.g.,
pkg/ data/ demo/ inst/ man/ R/ DESCRIPTION NAMESPACE
In normal operation, when I checkout a new R package project in eclipse / StatET,
File -> Import... -> SVN -> Checkout projects from SVN -> Create a new repository location and then give the URL in one of the following forms
where the first form is for anonymous access and the second for developer access
using SSH with password or other authentication. One key thing here is that
eclipse looks at the folder structure and offers the choice to checkout either the
root structure (
www/). I generally checkout only the
R CMD tools work at this level.
On github, on the other hand, R packages do not have that outer structure, and simply look like the contents of an ordinary package:
data/ demo/ inst/ man/ R/ DESCRIPTION NAMESPACE
This means that to migrate from SVN/R-Forge to git/github, it is necessary to
convert only the
pkg/ directory, but then adjust that using git to move
the contents up one level (removing the
pkg/ folder). That is,
after migrating from svn to git as described below, execute
% git mv pkg/* pkg/.[a-zA-Z]* . % rmdir pkg/
mv pkg/* . alone won't move 'dot' files like
mv pkg/.* . will complain about wanting to
mv pkg/.. to the current
Then, create an empty
README.md file, and update the local repo with git:
touch README.md git add --all git commit -m `fixup initial migration from svn'
Connect to github and push
Once the local repository is in shape, you need to tell git to track
and sync with the github repository.
git remote add sets up the
remote github repository to be tracked. The form of the URL can be
in HTTPS form, i.e.,
or SSH form,
package directory, the commands take the following form for me:
git remote add origin email@example.com/friendly/package.git git push -u origin master
Now, point your browser to the github repo, e.g.,
https://github.com/friendly/tableplot, to see what is there. At this
point, you will probably want to create/edit the
to exclude any files in the local repo you don't want under git control
.Rhistory, etc.) and create/edit the
README.md file for
the package page on github.
Putting it all together
It is relatively simple to write a shell or
perl script or R function
that uses system commands to perform these operations. For example, the
Appendix lists an R function,
rforge2git() that carries out these steps.
It allows you to specify the source
svn.repo and/or the remote
and does the appropriate things.
Here is an example of its use, just for the
git svn clone related steps:
> source("~/R/functions/rforge2git.R") > setwd('/tmp') > svn.repo='svn://svn.r-forge.r-project.org/svnroot/tableplot' > rforge2git(svn.repo) Initialized empty Git repository in /tmp/tableplot/.git/ A www/index.php A README ... r13 = 45279467a6f9b02aaaf48294cee53300eb4989d4 (refs/remotes/git-svn) A pkg/.Rbuildignore M pkg/DESCRIPTION M pkg/man/tableplot-package.Rd r14 = d5c0eaac6a29e66c37a54acd043cd66baa226068 (refs/remotes/git-svn) Checked out HEAD: svn://svn.r-forge.r-project.org/svnroot/tableplot r14 [master 12724da] fixup initial migration from svn 60 files changed, 806 deletions(-)
Contents of the
euclid: /tmp % ll -a tableplot total 56 drwxr-xr-x 8 friendly staff 4096 Nov 4 10:19 . drwxrwxrwt 17 root root 12288 Nov 4 10:19 .. drwxr-xr-x 2 friendly staff 4096 Nov 4 10:19 data drwxr-xr-x 2 friendly staff 4096 Nov 4 10:19 demo -rw-r--r-- 1 friendly staff 845 Nov 4 10:19 DESCRIPTION drwxr-xr-x 9 friendly staff 4096 Nov 4 10:19 .git drwxr-xr-x 2 friendly staff 4096 Nov 4 10:19 inst drwxr-xr-x 2 friendly staff 4096 Nov 4 10:19 man -rw-r--r-- 1 friendly staff 135 Nov 4 10:19 NAMESPACE -rw-r--r-- 1 friendly staff 123 Nov 4 10:19 NEWS drwxr-xr-x 2 friendly staff 4096 Nov 4 10:19 R -rw-r--r-- 1 friendly staff 10 Nov 4 10:19 .Rbuildignore -rw-r--r-- 1 friendly staff 0 Nov 4 10:19 README.md
Updating a git/github repo from SVN/R-Forge
In the transition between using SVN/R-Forge and git/github, it may happen that you or other collaborators continue to make changes in the R-Forge SVN repository. For example, someone may still be using eclipse/StatET or RStudio, with the remote repository set to the R-Forge path.
Here's where things can get a little tricky, and dealing with this situation is
In general, it is possible to create a local branch which tracks an svn repo
(from R-Forge) such that you can merge changes from your git repo in and then commit them to svn (using
git svn dcommit), or vice-versa: merge changes in the svn repo back to git. But doing so probably requires some black-belt git skills.
In the latter case (merge svn changes to git), you can try to fix this up as shown below:
- Fetch the latest commits from svn
- Rebase the local repo to include the commits from svn
- Push the changes back to github
git svn fetch git svn rebase git push
This won't work nicely if you have modified the directory structure of the
git svn clone for the
pkg/ folder as described above. There are
some options for the
git svn fetch step that will take account of this,
but they are not described here. Additional useful information for using git with
R-forge can be found at Cameron Bracken's blog post:
Using Git with R-Forge ....