@ramiro
Created April 30, 2012 01:09
Importing Django branches from SVN to Git{,Hub}?

Importing Django branches (and merges) from SVN to our GitHub repo

Versions used:

$ svn --version
svn, version 1.6.17 (r1128011)
compiled Nov 20 2011, 03:42:58
...
$ git --version
git version 1.7.10

tl;dr version

$ wget https://raw.github.com/brosner/django-git-authors/master/authors.txt
$ wget https://www.djangoproject.com/m/data/django-svn.svndump.bz2
$ bunzip2 --stdout django-svn.svndump.bz2 > django-svn.svndump
$ svnadmin create svn-repo
$ svnadmin load svn-repo < django-svn.svndump

(wait a few hours for the entire SVN history to be replayed)

Important: In the next command, note the trailing django/ in the SVN repository path.

$ git svn init \
      --rewrite-root=http://code.djangoproject.com/svn \
      --trunk=trunk \
      --branches=branches/releases \
      --branches=branches/features \
      --branches=branches/soc2009 \
      --branches=branches/soc2010 \
      --branches=branches/attic \
      --branches={0.90-bugfixes,0.91-bugfixes,0.95-bugfixes,0.96-bugfixes} \
      file://`pwd`/svn-repo/django/ \
      django-dry-run

$ cd django-dry-run
$ git svn fetch --quiet --authors-file=../authors.txt

(wait a few hours for the entire SVN repository history to be imported)

Track the remote branches, taking the chance to rename them:

$ git checkout --track -b stable/0.90.X remotes/0.90-bugfixes
$ git checkout --track -b stable/0.91.X remotes/0.91-bugfixes
$ git checkout --track -b stable/0.95.X remotes/0.95-bugfixes
$ git checkout --track -b stable/0.96.X remotes/0.96-bugfixes
$ git checkout --track -b stable/1.0.X remotes/1.0.X
$ git checkout --track -b stable/1.1.X remotes/1.1.X
$ git checkout --track -b stable/1.2.X remotes/1.2.X
$ git checkout --track -b stable/1.3.X remotes/1.3.X
$ git checkout --track -b stable/1.4.X remotes/1.4.X

$ git checkout --track -b soc2009/i18n-improvements remotes/i18n-improvements
$ git checkout --track -b soc2009/model-validation remotes/model-validation
$ git checkout --track -b soc2009/multidb remotes/multidb
$ git checkout --track -b soc2009/admin-ui remotes/admin-ui
$ git checkout --track -b soc2009/http-wsgi-improvements remotes/http-wsgi-improvements
$ git checkout --track -b soc2009/test-improvements remotes/test-improvements
$ git checkout --track -b soc2010/app-loading remotes/app-loading
$ git checkout --track -b soc2010/query-refactor remotes/query-refactor
$ git checkout --track -b soc2010/test-refactor remotes/test-refactor

$ git checkout --track -b attic/boulder-oracle-sprint remotes/boulder-oracle-sprint@11505
$ git checkout --track -b attic/full-history remotes/full-history
$ git checkout --track -b attic/generic-auth remotes/generic-auth
$ git checkout --track -b attic/gis remotes/gis@11507
$ git checkout --track -b attic/i18n remotes/i18n@11508
$ git checkout --track -b attic/magic-removal remotes/magic-removal@11509
$ git checkout --track -b attic/multi-auth remotes/multi-auth@11510
$ git checkout --track -b attic/multiple-db-support remotes/multiple-db-support
$ git checkout --track -b attic/new-admin remotes/new-admin@11512
$ git checkout --track -b attic/newforms-admin remotes/newforms-admin@11514
$ git checkout --track -b attic/per-object-permissions remotes/per-object-permissions
$ git checkout --track -b attic/queryset-refactor remotes/queryset-refactor@11516
$ git checkout --track -b attic/schema-evolution remotes/schema-evolution
$ git checkout --track -b attic/schema-evolution-ng remotes/schema-evolution-ng
$ git checkout --track -b attic/search-api remotes/search-api
$ git checkout --track -b attic/sqlalchemy remotes/sqlalchemy
$ git checkout --track -b attic/unicode remotes/unicode@11521

# Get back to trunk
$ git checkout master

Fork the official Django GitHub repository in your account using the GitHub Web UI.

Push the branches to it:

$ git remote add mine https://ramiro@github.com/ramiro/django.git
$ git push mine $(git branch |grep -v ^\*\ master)

Long version

  1. We will be roughly following Adrian's instructions from http://www.holovaty.com/writing/django-github/

  2. Get the latest authors.txt author map:

    $ wget https://raw.github.com/brosner/django-git-authors/master/authors.txt
    
  3. Get an SVN dump from our Open Data page (https://code.djangoproject.com/wiki/OpenData):

    $ wget https://www.djangoproject.com/m/data/django-svn.svndump.bz2
    
  4. Prepare it:

    $ bunzip2 --stdout django-svn.svndump.bz2 > django-svn.svndump
    
  5. Create and populate the SVN repo:

    $ svnadmin create django-svn
    $ svnadmin load django-svn < django-svn.svndump
    

    (wait a few hours for the entire SVN history to be replayed)

  6. Init the Git repo and git-svn metadata/configuration.

    Important: Note the trailing django/ in the SVN repository path.

    $ git svn init \
          --rewrite-root=http://code.djangoproject.com/svn \
          --trunk=trunk \
          --branches=branches/releases \
          --branches=branches/features \
          --branches=branches/soc2009 \
          --branches=branches/soc2010 \
          --branches=branches/attic \
          --branches={0.90-bugfixes,0.91-bugfixes,0.95-bugfixes,0.96-bugfixes} \
          file:///path/to/local/SVN/repo/django/ \
          django-dry-run
    

    This will create a [svn-remote ...] section in the .git/config file similar to this:

    [svn-remote "svn"]
            url = file:///path/to/local/SVN/repo
            fetch = django/trunk:refs/remotes/trunk
            branches = django/branches/releases/*:refs/remotes/*
            branches = django/branches/features/*:refs/remotes/*
            branches = django/branches/attic/*:refs/remotes/*
            branches = django/branches/soc2009/*:refs/remotes/*
            branches = django/branches/soc2010/*:refs/remotes/*
            branches = django/branches/{0.90-bugfixes,0.91-bugfixes,0.95-bugfixes,0.96-bugfixes}:refs/remotes/*
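To make the relationship between the command-line flags and the config lines concrete, here is a small sketch that reproduces the pattern visible in the section above. The function name `branches_line` is made up for illustration. It only mimics the output format and is not how git-svn itself writes the file.

```shell
# Hypothetical helper: print the config line that corresponds to one
# --branches=<dir> flag, given the project prefix inside the SVN repo.
branches_line() {
    prefix=$1   # e.g. "django", the trailing component of the repo path
    dir=$2      # e.g. "branches/releases"
    printf 'branches = %s/%s/*:refs/remotes/*\n' "$prefix" "$dir"
}

branches_line django branches/releases
branches_line django branches/attic
```

Comparing this against the real .git/config is a quick way to spot a forgotten trailing django/ in the repository path, since the prefix would then be missing from every line.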
    
  7. Perform the actual SVN -> Git cloning:

    $ cd django-dry-run
    $ git svn fetch --quiet --authors-file=../authors.txt
    

    This took approximately three and a half hours.

  8. Verify branches:

    Check that trunk is known as master:

    $ git branch
    * master
    

    Check that remote branches were created for the SVN branches:

    $ git branch -r
    0.90-bugfixes
    0.90-bugfixes@3590
    0.91-bugfixes
    0.91-bugfixes@3571
    0.95-bugfixes
    0.95-bugfixes@4358
    0.96-bugfixes
    0.96-bugfixes@6603
    1.0.X
    1.1.X
    1.2.X
    1.3
    1.3.X
    1.4.X
    admin-ui
    app-loading
    boulder-oracle-sprint
    boulder-oracle-sprint@11505
    full-history
    full-history@11500
    full-history@11501
    generic-auth
    generic-auth@11506
    gis
    gis@11507
    http-wsgi-improvements
    i18n
    i18n-improvements
    i18n@11508
    magic-removal
    magic-removal@11509
    model-validation
    multi-auth
    multi-auth@11510
    multidb
    multiple-db-support
    multiple-db-support@11511
    new-admin
    new-admin@11512
    newforms-admin
    newforms-admin@11514
    per-object-permissions
    per-object-permissions@11515
    query-refactor
    queryset-refactor
    queryset-refactor@11516
    schema-evolution
    schema-evolution-ng
    schema-evolution-ng@11518
    schema-evolution@11517
    search-api
    search-api@11519
    sqlalchemy
    sqlalchemy@11520
    test-improvements
    test-refactor
    trunk
    unicode
    unicode@11521
    

    Branches that got moved to the Attic are represented by two (or more) remote branches. The extra ones are suffixed with '@revnumber' and represent points in those branches' histories right before they were moved under branches/attic/. The branch without such a suffix in its name is the terminal one, after the move. This distinction could be important in the next step.

    Also, the GSoC 2009 i18n-improvements, model-validation and multidb branches weren't merged using SVN's merge facilities. Instead, a mentor merged them manually in a local working copy and then made a plain commit.
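The '@revnumber' naming convention described above can be applied mechanically. The sketch below is one possible way to pick, for a given branch, the highest-numbered '@revnumber' entry from `git branch -r` style output, falling back to the plain name when there is none. The function name `pre_move_ref` is hypothetical, and the sample input is a hand-copied excerpt of the listing above, not live output.

```shell
# Hypothetical helper: read a remote-branch listing on stdin and print the
# highest-numbered "<branch>@<rev>" entry, or "<branch>" if there is none.
pre_move_ref() {
    awk -v b="$1" '
        { sub(/^[ \t]+/, "") }              # drop leading indentation
        index($0, b "@") == 1 {             # entry starts with "<branch>@"
            rev = substr($0, length(b) + 2) + 0
            if (rev > max) { max = rev; found = 1 }
        }
        END { if (found) print b "@" max; else print b }
    '
}

# Excerpt of the listing above, used as sample input.
listing='  full-history
  full-history@11500
  full-history@11501
  gis
  gis@11507
  multidb'

printf '%s\n' "$listing" | pre_move_ref full-history
printf '%s\n' "$listing" | pre_move_ref multidb
```

Whether the highest-numbered entry is really the right starting point for a given branch still needs a manual look at its history.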

  9. Create local branches from the remote ones.

    I think that for branches that:

    • Got merged back to trunk (see section at the end of this document) and
    • Later were moved to the Attic (see above)

    we need to use the 'branchname@revnumber' branch instead of the 'branchname' one. This will provide for an easier and more realistic scenario later when we try to convert these merges into Git merges.

    # Release maintenance branches
    $ git checkout --track -b releases/1.0.X remotes/1.0.X
    $ git checkout --track -b releases/1.1.X remotes/1.1.X
    $ git checkout --track -b releases/1.2.X remotes/1.2.X
    $ git checkout --track -b releases/1.3.X remotes/1.3.X
    $ git checkout --track -b releases/1.4.X remotes/1.4.X
    
    # Branches that got merged back into trunk
    $ git checkout --track -b boulder-oracle-sprint remotes/boulder-oracle-sprint@11505
    $ git checkout --track -b gis remotes/gis@11507
    $ git checkout --track -b i18n remotes/i18n@11508
    $ git checkout --track -b magic-removal remotes/magic-removal@11509
    $ git checkout --track -b multi-auth remotes/multi-auth@11510
    $ git checkout --track -b new-admin remotes/new-admin@11512
    $ git checkout --track -b newforms-admin remotes/newforms-admin@11514
    $ git checkout --track -b queryset-refactor remotes/queryset-refactor@11516
    $ git checkout --track -b unicode remotes/unicode@11521
    $ git checkout --track -b soc2009/i18n-improvements remotes/i18n-improvements
    $ git checkout --track -b soc2009/model-validation remotes/model-validation
    $ git checkout --track -b soc2009/multidb remotes/multidb
    
    # Branches for GSoC student work, abandoned
    $ git checkout --track -b soc2009/admin-ui remotes/admin-ui
    $ git checkout --track -b soc2009/http-wsgi-improvements remotes/http-wsgi-improvements
    $ git checkout --track -b soc2009/test-improvements remotes/test-improvements
    
    $ git checkout --track -b soc2010/app-loading remotes/app-loading
    $ git checkout --track -b soc2010/query-refactor remotes/query-refactor
    $ git checkout --track -b soc2010/test-refactor remotes/test-refactor
    
    # Abandoned branches
    $ git checkout --track -b attic/full-history remotes/full-history
    $ git checkout --track -b attic/generic-auth remotes/generic-auth
    $ git checkout --track -b attic/multiple-db-support remotes/multiple-db-support
    $ git checkout --track -b attic/per-object-permissions remotes/per-object-permissions
    $ git checkout --track -b attic/schema-evolution remotes/schema-evolution
    $ git checkout --track -b attic/schema-evolution-ng remotes/schema-evolution-ng
    $ git checkout --track -b attic/search-api remotes/search-api
    $ git checkout --track -b attic/sqlalchemy remotes/sqlalchemy
    
    # The future
    $ git checkout --track -b features/py3k remotes/py3k
    
    # Get back to trunk
    $ git checkout master
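Typing the checkout commands above by hand is error-prone, so they could also be generated. This is a minimal sketch, assuming the only renaming needed is an optional prefix plus dropping the '@revnumber' suffix. `local_branch_name` is a made-up helper, and the loop prints the commands instead of running them.

```shell
# Hypothetical helper: build a local branch name from a prefix and a remote
# ref, dropping any "@<rev>" suffix (e.g. "gis@11507" -> "gis").
local_branch_name() {
    prefix=$1
    remote=$2
    name=${remote%@*}          # strip the shortest trailing "@..." part
    if [ -n "$prefix" ]; then
        printf '%s/%s\n' "$prefix" "$name"
    else
        printf '%s\n' "$name"
    fi
}

# Print (rather than run) the checkout commands for a few branches.
for remote in 1.3.X 1.4.X; do
    echo "git checkout --track -b $(local_branch_name releases "$remote") remotes/$remote"
done
echo "git checkout --track -b $(local_branch_name '' gis@11507) remotes/gis@11507"
```

Printing first makes it easy to review the names before piping the output to a shell.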
    
  10. THIS STEP ISN'T NEEDED ANYMORE -- We avoid it by using the --rewrite-root=http://code.djangoproject.com/svn command line option when running git svn init in step 6 above.

    Fix the SVN repo URLs.

    This will correct the commit IDs so they are identical to the ones in our official GitHub repository:

    $ git filter-branch --msg-filter \
        "sed \"s|^git-svn-id: file:///path/to/local/SVN/repo/django|git-svn-id: http://code.djangoproject.com/svn/django|g\"" -- --all
    Rewrite xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (nnnnn/17593)
    Ref 'refs/heads/attic/full-history' was rewritten
    ...
    Ref 'refs/remotes/unicode@11521' was rewritten
    

    Note

    Consider using the -d switch for filter-branch and point it to a RAM disk.

    Warning

    Note that git filter-branch doubles the number of commits in the repository as it creates the modified ones but doesn't delete the original ones. It seems the original commits can be removed by a do-nothing git filter-branch run or by cloning the repository to another one.
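To see what the message filter above does without touching a repository, the same sed substitution can be tried on a single commit message. The message text, the revision number in the trailer and the all-zero UUID below are placeholders invented for this sketch, not values taken from the real repository.

```shell
# Sample commit message. The summary line, revision and UUID are placeholders.
msg='Fixed a typo in the docs.

git-svn-id: file:///path/to/local/SVN/repo/django/trunk@17593 00000000-0000-0000-0000-000000000000'

# Same substitution as the --msg-filter above, applied to one message.
rewrite_svn_id() {
    sed 's|^git-svn-id: file:///path/to/local/SVN/repo/django|git-svn-id: http://code.djangoproject.com/svn/django|g'
}

printf '%s\n' "$msg" | rewrite_svn_id
```

Only the git-svn-id trailer changes, but because the message is part of what Git hashes, every rewritten commit gets a new ID.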

  11. Perform some basic sanity checks, e.g. compare

    • The commit ID
    • The full commit message

    of this migrated commit: https://github.com/django/django/commit/ddc5d59c6a547f76797d99510df8c3cec61e5f89 and the corresponding commit in our django-dry-run Git repo. They should be identical.
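One way to find the corresponding commit is through the SVN revision recorded in the git-svn-id trailer of the commit message. The sketch below extracts that revision from a message read on stdin. `svn_rev_of_message` is a hypothetical helper, and the sample message (including its all-zero UUID) is invented for illustration.

```shell
# Hypothetical helper: print the SVN revision recorded in the git-svn-id
# trailer of a commit message read from stdin.
svn_rev_of_message() {
    sed -n 's|^git-svn-id: [^ ]*@\([0-9][0-9]*\) .*$|\1|p'
}

# Invented sample message; the UUID is a placeholder.
sample='Some commit message.

git-svn-id: http://code.djangoproject.com/svn/django/trunk@5519 00000000-0000-0000-0000-000000000000'

printf '%s\n' "$sample" | svn_rev_of_message
```

In a real clone the message could come from `git log -1 --format=%B <commit>`, and `git svn find-rev r<N>` should go the other way, from a revision number to a commit ID.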

  12. Play with trying to get SVN branch merges as nice Git merges.

    Unfortunately, it seems grafts created in the .git/info/grafts file can't be transferred to other repos. Incorporating them permanently into the repository so they can be transferred changes the hashes of all subsequent commits (in Git, the hash of a commit is calculated from, among other items, the IDs of its parents).

    Some useful links:

  13. Push to GitHub for review.

Which branches were merged into trunk and when?

AKA SVN archeology.

  1. Boulder Oracle sprint
    • Branch name: boulder-oracle-sprint
    • Merged in: r5519 (https://code.djangoproject.com/changeset/5519) -- 06/23/07 11:16:00
    • Merge commit: ac64e91a0cadc57f4bc5cd5d66955832320ca7a1
    • Parent commit in trunk: 553a20075e6991e7a60baee51ea68c8adc520d9a
    • Parent commit in branch: 0cb8e31823b2e9f05c4ae868c19f5f38e78a5f2e
  2. GIS
    • Branch name: gis
    • Merged in: r8219 (https://code.djangoproject.com/changeset/8219) -- 08/05/08 15:13:06
    • Merge commit: 79e68c225b926302ebb29c808dda8afa49856f5c
    • Parent commit in trunk: d0f57e7c7385a112cb9e19d314352fc5ed5b0747
    • Parent commit in branch: aa239e3e5405933af6a29dac3cf587b59a099927
  3. i18n (hugo's original work)
    • Branch name: i18n
    • Merged in: r1068 (https://code.djangoproject.com/changeset/1068) -- 11/04/05 01:59:46
    • Merge commit: 5cf8f684237ab5addaf3549b2347c3adf107c0a7
    • Parent commit in trunk: cb45fd0ae20597306cd1f877efc99d9bd7cbee98
    • Parent commit in branch: e27211a0deae2f1d402537f0ebb64ad4ccf6a4da
  4. Magic removal
    • Branch name: magic-removal
    • Merged in: r2809 (https://code.djangoproject.com/changeset/2809) -- 05/01/06 22:31:56
    • Merge commit: f69cf70ed813a8cd7e1f963a14ae39103e8d5265
    • Parent commit in trunk: d5dbeaa9be359a4c794885c2e9f1b5a7e5e51fb8
    • Parent commit in branch: d2fcbcf9d76d5bb8a661ee73dae976c74183098b
  5. multi-auth
    • Branch name: multi-auth
    • Merged in: r3226 (https://code.djangoproject.com/changeset/3226) -- 06/28/06 13:37:02
    • Merge commit: aab3a418ac9293bb4abd7670f65d930cb0426d58
    • Parent commit in trunk: 4ea7a11659b8a0ab07b0d2e847975f7324664f10
    • Parent commit in branch: adf4b9311d5d64a2bdd58da50271c121ea22e397
  6. Alex's Multiple DB support (GSoC 2009) (MANUAL SVN MERGE)
    • Branch name: multidb
    • Merged in: r11952 (https://code.djangoproject.com/changeset/11952) -- 12/22/09
    • Merge commit: ff60c5f9de3e8690d1e86f3e9e3f7248a15397c8
    • Parent commit in trunk: 7ef212af149540aa2da577a960d0d87029fd1514
    • Parent commit in branch: 45b4288bb66a3cda401b45901e85b645674c3988
  7. rjwittams' first admin refactoring
    • Branch name: new-admin
    • Merged in: r1434 (https://code.djangoproject.com/changeset/1434) -- 11/25/05 18:20:09
    • Merge commit: 9dda4abee1225db7a7b195b84c915fdd141a7260
    • Parent commit in trunk: 4fe5c9b7ee09dc25921918a6dbb7605edb374bc9
    • Parent commit in branch: 3a7c14b583621272d4ef53061287b619ce3c290d
  8. Second admin refactoring
    • Branch name: newforms-admin
    • Merged in: r7967 (https://code.djangoproject.com/changeset/7967) -- 07/18/08 20:54:34
    • Merge commit: a19ed8aea395e8e07164ff7d85bd7dff2f24edca
    • Parent commit in trunk: dc375fb0f3b7fbae740e8cfcd791b8bccb8a4e66
    • Parent commit in branch: 42ea7a5ce8aece67d16c6610a49560c1493d4653
  9. Malcolm's QuerySet refactor
    • Branch name: queryset-refactor
    • Merged in: r7477 (https://code.djangoproject.com/changeset/7477) -- 04/26/08 23:50:16
    • Merge commit: 9c52d56f6f8a9cdafb231adf9f4110473099c9b5
    • Parent commit in trunk: c91a30f00fd182faf8ca5c03cd7dbcf8b735b458
    • Parent commit in branch: 4a5c5c78f2ecd4ed8859cd5ac773ff3a01bccf96
  10. Unicode
    • Branch name: unicode
    • Merged in: r5609 (https://code.djangoproject.com/changeset/5609) -- 07/04/07 09:11:04
    • Merge commit: 953badbea5a04159adbfa970f5805c0232b6a401
    • Parent commit in trunk: 4c958b15b250866b70ded7d82aa532f1e57f96ae
    • Parent commit in branch: 5664a678b29ab04cad425c15b2792f4519f43928
  11. model validation (GSoC 2009) (MANUAL SVN MERGE)
    • Branch name: model-validation
    • Merged in: r12098 (https://code.djangoproject.com/changeset/12098) -- 09/11/09 18:23:55
    • Merge commit: 471596fc1afcb9c6258d317c619eaf5fd394e797
    • Parent commit in trunk: 4e89105d64bb9e04c409139a41e9c7aac263df4c
    • Parent commit in branch: 3e9035a9625c8a8a5e88361133e87ce455c4fc13
  12. i18n-improvements (GSoC 2009) (MANUAL SVN MERGE)
    • Branch name: i18n-improvements
    • Merged in: r11964 (https://code.djangoproject.com/changeset/11964) -- 12/22/09 14:58:49
    • Merge commit: 9233d0426537615e06b78d28010d17d5a66adf44
    • Parent commit in trunk: 6632739e94c6c38b4c5a86cf5c80c48ae50ac49f
    • Parent commit in branch: 18e151bc3f8a85f2766d64262902a9fcad44d937
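The data collected above maps directly onto the grafts file mentioned in step 12 of the long version, where each line has the form "commit parent1 parent2". The following sketch formats one such line after checking that every ID is a full 40-character SHA-1. `graft_line` and `is_sha1` are made-up helper names, and the caveat from step 12 still applies: grafts stay local to the repository.

```shell
# Hypothetical helper: succeed only for a full, lowercase 40-hex-digit ID.
is_sha1() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Hypothetical helper: format one grafts line, "<commit> <parent>...".
graft_line() {
    for id in "$@"; do
        is_sha1 "$id" || { echo "not a full SHA-1: $id" >&2; return 1; }
    done
    echo "$*"
}

# Boulder Oracle sprint merge (r5519), IDs copied from the list above:
# merge commit, parent in trunk, parent in branch.
graft_line \
    ac64e91a0cadc57f4bc5cd5d66955832320ca7a1 \
    553a20075e6991e7a60baee51ea68c8adc520d9a \
    0cb8e31823b2e9f05c4ae868c19f5f38e78a5f2e
```

Appending the printed line to .git/info/grafts in the dry-run repository would make the merge commit show both parents locally.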