@fsmv
Last active April 14, 2023 19:17
How to merge multiple git repos into one repo, and preserve history

Merging multiple git repos into one

tl;dr:

  1. Import other repos as branches
  2. Move the files in the other repos into subdirectories for each project
  3. Rebase the other repo branches onto the master
  4. Merge everything into master in order

Note: placeholders are in angle brackets.

  1. See the tips section

  2. Create a new folder and git repo: mkdir <repo> && cd <repo> && git init

  3. Make an empty initial commit: git commit --allow-empty

    This is useful so that you can rebase the branches for the other repos onto this commit later.

    My Message:

    Initial commit to move separate server repos into one repo together
    
    I had started this project with separate git repos for each of the
    servers involved.  I'm changing my mind, they should all be in one
    repository together.
    
    Now they will all be packages under the folder "<repo>"
    
    I'm going to import the separate repos as branches and merge them
    together in the order I did the commits into the master branch. I will
    also fix import paths as I merge them together.
    
    This will create one master branch with a commit history in order and
    make it so every commit compiles correctly.
    
  4. Create a remote for the external repo: git remote add <imported_repo> ~/go/src/<imported_repo>/.git

  5. Copy the branch into the new repo: git fetch <imported_repo> && git checkout -b <imported_repo> remotes/<imported_repo>/master

    Thanks: https://stackoverflow.com/questions/9767381/importing-one-git-repo-as-a-branch-into-another-git-repo

  6. Delete the remote: git remote remove <imported_repo>

  7. Move the files in the imported branch into a subdirectory

    After much searching and trying different series of rebases to make this happen, I found this in the git help filter-branch man page:

    To move the whole tree into a subdirectory, or remove it from there:
    
        git filter-branch --index-filter \
                'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
                        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                                git update-index --index-info &&
                 mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
    

    So just replace newsubdir in the sed expression with the subdirectory name, and change HEAD at the end to the imported branch:

    git filter-branch --index-filter \
            'git ls-files -s | sed "s-\t\"*-&<imported_repo>/-" |
                    GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                            git update-index --index-info &&
             mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' <imported_repo>
    
  8. Rebase the imported branch onto the master branch: git checkout master && git rebase master <imported_repo>

    Goes from:

    init_commit (master)
    
    A -- B -- C (<imported_repo>)
    

    To:

                           /-- A -- B -- C (<imported_repo>)
    init_commit (master) -/
    
  9. Repeat for the rest of the repos. They should all be branched off of the empty commit on master.

  10. Merge commits into master in order of date, fixing import paths

    git checkout master then git merge --no-ff --no-commit --log <commit hash>

    You can then edit the files before merging them to fix things like import path changes due to moving the project into a subdir.

    git commit to finalize the merge
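The steps above can be sketched end to end on throwaway repos. This is only a minimal sketch under some assumptions: the two source repos (alpha and beta), their file names, and the demo identity are hypothetical; the branch name is forced to master; sed is assumed to understand \t (GNU sed does); and the merges are committed directly instead of using --no-commit, because there are no import paths to fix here.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning

work=$(mktemp -d)
cd "$work"

# Two toy source repos (hypothetical names), one commit each.
for name in alpha beta; do
    git init -q "$name"
    git -C "$name" symbolic-ref HEAD refs/heads/master
    echo "$name" > "$name/main.txt"
    git -C "$name" add main.txt
    git -C "$name" commit -q -m "Add $name"
done

# Steps 2-3: new repo with an empty initial commit.
git init -q combined
cd combined
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"

for name in alpha beta; do
    # Steps 4-6: import the repo as a branch, then drop the remote.
    git remote add "$name" "$work/$name/.git"
    git fetch -q "$name"
    git checkout -q -b "$name" "remotes/$name/master"
    git remote remove "$name"

    # Step 7: move the branch's whole tree into a subdirectory.
    git filter-branch -f --index-filter \
        'git ls-files -s | sed "s-\t\"*-&'"$name"'/-" |
            GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                git update-index --index-info &&
         mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' "$name" >/dev/null

    # Step 8: replay the branch on top of the empty commit.
    git rebase -q master "$name"
done

# Step 10: merge each branch into master with a real merge commit.
git checkout -q master
for name in alpha beta; do
    git merge -q --no-ff --log -m "Merge $name" "$name"
done

# List the files that ended up on master.
git ls-files
```

Because the sed expression uses - as its delimiter, a subdirectory name containing - would need a different delimiter character.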

Tips:

  • Set up a nice commit log graph printing alias. This is all about manipulating the tree so you should be able to see it.

    Git has built in alias commands!

    Set up git tree to see a graph of the entire repository: git config --global alias.tree "log --graph --all --date=rfc --pretty=format:'%C(auto)%h%d %Cgreen%an <%ae>%n%CresetAuthor Date: %Cred%ar %Cblue(%ad) %n%CresetCommit Date: %Cred%cr %Cblue(%cd) %n%n%Creset%s%n'"

    The above sets up an alias git tree for git log --graph --all .... Check out git help log and search for (by typing /search term) %h to find the list of message format modifiers.

    Set up git hist, which is the same thing but only shows the current branch: git config --global alias.hist "log --graph --date=rfc --pretty=format:'%C(auto)%h%d %Cgreen%an <%ae>%n%CresetAuthor Date: %Cred%ar %Cblue(%ad) %n%CresetCommit Date: %Cred%cr %Cblue(%cd) %n%n%Creset%s%n'"

    git hist is just the same as git tree but with --all removed

    Thanks: https://stackoverflow.com/questions/1057564/pretty-git-branch-graphs

  • This is a lot of operations to do to a git repo, so make sure you make backups when you get the repo into a good state. Just copy the entire project dir (or just the .git folder if you want).

    cp -R project project.bak

  • RTFM! The git help <command> pages are good!

  • Commits have two sets of dates, usernames, and emails: one for the AUTHOR and one for the COMMITTER. Some commands, like rebase, only change the committer date.

  • When you use git filter-branch, it saves a backup of the original refs in .git/refs/original/. So, if you run filter-branch more than once on the same repo, you will have to add -f to overwrite the backup (or just delete it: rm -rf .git/refs/original).

  • git show <commit hash> shows you the diff that commit applies, as well as commit metadata.

  • git branch -m <new_name> renames the current branch
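To see the author/committer split from the tips above, here is a small sketch that makes one commit with deliberately different author and committer fields and prints both. The names alice and bob, the example.com addresses, and the timestamps are all made up for illustration.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical identities and timestamps, chosen so the two sets differ.
GIT_AUTHOR_NAME=alice GIT_AUTHOR_EMAIL=alice@example.com \
GIT_AUTHOR_DATE='@1500000000 +0000' \
GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com \
GIT_COMMITTER_DATE='@1600000000 +0000' \
    git commit -q --allow-empty -m "demo"

# %an, %ae, %ad are the author fields; %cn, %ce, %cd are the committer fields.
git log -1 --format='author:    %an <%ae> %ad%ncommitter: %cn <%ce> %cd'
```

Running the same git log line before and after a rebase is a quick way to check which of the two dates moved.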

Bonus: fixing name, email, and timezone

  • My git user.name and user.email were set up wrong on my install. To change the username and email of the old commits:

    git filter-branch --env-filter '
        GIT_AUTHOR_NAME="Andrew Kallmeyer"
        GIT_COMMITTER_NAME="Andrew Kallmeyer"
        GIT_AUTHOR_EMAIL="fsmv@sapium.net"
        GIT_COMMITTER_EMAIL="fsmv@sapium.net"
    '
    
  • My system timezone was set to UTC even though I live in the Pacific time zone (PDT, UTC-7). To change the timezone in the old commits:

    Thanks: https://stackoverflow.com/questions/30960157/change-timezone-for-all-commits-in-git-history

    git filter-branch --env-filter '
        GIT_AUTHOR_DATE=$(echo "$GIT_AUTHOR_DATE" | sed -e "s/+0000/-0700/g")
        GIT_COMMITTER_DATE=$(echo "$GIT_COMMITTER_DATE" | sed -e "s/+0000/-0700/g")
    '
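If the history also contains commits by other people, a variation on the two filters above is to rewrite the identity only where the old email matches, and to do the timezone fix in the same pass. This is a sketch on a throwaway repo, not the commands I actually ran: wrong@example.com, right@example.com, "Right Name", and the commit_as helper are hypothetical.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
# Fallback identity for git's own bookkeeping (hypothetical).
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical helper: commit_as <name> <email> <message>
commit_as() {
    GIT_AUTHOR_NAME=$1 GIT_AUTHOR_EMAIL=$2 \
    GIT_COMMITTER_NAME=$1 GIT_COMMITTER_EMAIL=$2 \
    GIT_AUTHOR_DATE='@1500000000 +0000' \
    GIT_COMMITTER_DATE='@1500000000 +0000' \
        git commit -q --allow-empty -m "$3"
}
commit_as wrong wrong@example.com "my commit"
commit_as other other@example.com "someone else's commit"

git filter-branch -f --env-filter '
    # Only touch commits that carry the old (wrong) email.
    if [ "$GIT_AUTHOR_EMAIL" = "wrong@example.com" ]; then
        GIT_AUTHOR_NAME="Right Name"
        GIT_AUTHOR_EMAIL="right@example.com"
    fi
    if [ "$GIT_COMMITTER_EMAIL" = "wrong@example.com" ]; then
        GIT_COMMITTER_NAME="Right Name"
        GIT_COMMITTER_EMAIL="right@example.com"
    fi
    # Change only the zone label; the timestamp itself is untouched.
    GIT_AUTHOR_DATE=$(echo "$GIT_AUTHOR_DATE" | sed -e "s/+0000/-0700/")
    GIT_COMMITTER_DATE=$(echo "$GIT_COMMITTER_DATE" | sed -e "s/+0000/-0700/")
' HEAD >/dev/null

# Newest commit first: email, then raw author date.
git log --format='%an <%ae> %ad' --date=raw
```

The sed substitution only swaps the offset, so the commits keep the same instant in time and are just displayed in the new zone.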
    

fsmv commented Apr 16, 2020

To make edits to all commits in the history

git filter-branch --index-filter "#COMMAND (including git commands)" -f -- --all

Ex:

git filter-branch --tree-filter "find ./ -name '*.asm' -exec sed -i '1s|^|; Provided under the MIT License: http://mit-license.org/\n|' {} \;" -f -- --all HEAD

Thanks https://stefanolsen.com/posts/to-rewrite-git-history-add-and-edit-files-back-in-time/
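The sed -i call in the example above relies on GNU sed behavior. Here is a sketch of the same kind of edit written with a temp file instead, run on a throwaway repo so the result can be checked. The file names, file contents, header text, and demo identity are made up for illustration.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning

repo=$(mktemp -d)
cd "$repo"
git init -q

# Two commits, each adding one hypothetical .asm file.
echo "mov ax, 1" > a.asm
git add a.asm
git commit -q -m "Add a.asm"
echo "mov bx, 2" > b.asm
git add b.asm
git commit -q -m "Add b.asm"

# Prepend a header line to every .asm file in every commit.
# The filter runs once per commit on a fresh checkout of that commit's tree.
git filter-branch -f --tree-filter '
    find . -name "*.asm" | while read -r f; do
        { echo "; Provided under the MIT License"; cat "$f"; } > "$f.tmp" &&
            mv "$f.tmp" "$f"
    done
' -- --all >/dev/null

# The first commit's copy of a.asm now starts with the header too.
git show HEAD~1:a.asm
```

Each commit is filtered from its original tree, so the header is added once per file per commit rather than stacking up across the history.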


fsmv commented Apr 14, 2023

Here's how to copy (rename) an annotated tag, keeping its message and date, adding ".0" to the end of the name, and signing it:

GIT_COMMITTER_DATE="$(git tag -l --format="%(taggerdate)" $TAG)" git tag -s -a -m "$(git tag -l --format='%(contents)' $TAG)" $TAG.0 "$TAG^{}"

Just set TAG=v0.8 to copy v0.8 to v0.8.0 (you can modify the line if you want to change the new tag name). Remove -s if you don't sign tags.
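To check that the copy kept the message and date, here is a sketch on a throwaway repo. It differs from the line above in two ways, both just assumptions for the demo: it leaves out -s so no signing key is needed, and it reads the date with %(taggerdate:raw) so the value is passed back in git's internal timestamp format. The identity, timestamp, and tag message are made up.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "release"

# An annotated tag with a fixed, hypothetical tagger date.
GIT_COMMITTER_DATE='@1500000000 +0000' git tag -a -m "Version 0.8" v0.8

# Copy v0.8 to v0.8.0, reusing its message and tagger date (unsigned).
TAG=v0.8
GIT_COMMITTER_DATE="$(git tag -l --format='%(taggerdate:raw)' "$TAG")" \
    git tag -a -m "$(git tag -l --format='%(contents)' "$TAG")" \
    "$TAG.0" "$TAG^{}"

# Both tags should show the same date and subject.
git for-each-ref \
    --format='%(refname:short) %(taggerdate:raw) %(contents:subject)' refs/tags
```

The old tag is still there afterwards; git tag -d "$TAG" would remove it if a rename is what you want.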
