@SKempin
Last active April 17, 2024 03:47
Git Subtree basics

Git Subtree Basics

If you hate git submodule, then you may want to give git subtree a try.

Background

When you use a subtree, you add it to an existing repository by giving the URL of another repository and a branch or tag. The add command copies all of that repository's code and files into the main repository locally; it is not just a reference to a remote repo.

When you stage and commit files for the main repo, the subtree's files are committed along with everything else. Anyone who clones or checks out the main repo gets all the files in one pass, so there is no need to connect to another repo to fetch the subtree's files: they are already part of the main repo.

Adding a subtree

Let's say you already have a git repository with at least one commit. You can add another repository into this repository like this:

  1. Specify you want to add a subtree
  2. Specify the prefix local directory into which you want to pull the subtree
  3. Specify the remote repository URL [of the subtree being pulled in]
  4. Specify the remote branch [of the subtree being pulled in]
  5. Specify you want to squash all the remote repository's [the subtree's] logs

git subtree add --prefix {local directory being pulled into} {remote repo URL} {remote branch} --squash

For example:

git subtree add --prefix subtreeDirectory https://github.com/newfivefour/vimrc.git master --squash

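This imports the contents of the master branch of https://github.com/newfivefour/vimrc.git into the directory subtreeDirectory. Because of --squash, the remote history is collapsed into a single commit in the host repository rather than copied commit by commit.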
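To see what the add command leaves behind without touching GitHub, the sketch below runs it against a throwaway local repository. The `upstream` and `host` paths, the `demo` identity, and the file contents are assumptions made for illustration; only `subtreeDirectory` and the shape of the `git subtree add` call come from the example above.

```shell
set -eu
# Throwaway identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the remote repository (assumption: a local path replaces the URL).
git init -q "$work/upstream"
cd "$work/upstream"
echo 'set number' > vimrc
git add vimrc
git commit -q -m 'Add vimrc'
git branch -M master

# The host repository needs at least one commit before a subtree can be added.
git init -q "$work/host"
cd "$work/host"
echo 'host project' > README
git add README
git commit -q -m 'Initial commit'

# Import upstream's master branch into subtreeDirectory, squashing its history.
git subtree add --prefix subtreeDirectory "$work/upstream" master --squash

# The imported file is an ordinary tracked file in the host repository,
# and the log records where it came from in a git-subtree-dir line.
git ls-files
git log --format=%B | grep git-subtree-dir
```

Nothing in the host's working tree marks `subtreeDirectory` as special; the only trace of its origin is in the commit messages.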


Pull in new subtree commits

If you want to pull in any new commits to the subtree from the remote, issue the same command as above, replacing add with pull:

git subtree pull --prefix subtreeDirectory https://github.com/newfivefour/vimrc.git master --squash
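A minimal local sketch of the pull step, assuming the subtree was added with --squash as above. The `upstream` and `host` paths, the `demo` identity, and the file contents are made up for illustration, and `GIT_MERGE_AUTOEDIT=no` is set only so the merge does not open an editor.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # keep the merge from opening an editor

work=$(mktemp -d)

# Stand-in for the remote repository (assumption: a local path replaces the URL).
git init -q "$work/upstream"
cd "$work/upstream"
echo 'set number' > vimrc
git add vimrc
git commit -q -m 'Add vimrc'
git branch -M master

# Host repository with the subtree already added.
git init -q "$work/host"
cd "$work/host"
echo 'host project' > README
git add README
git commit -q -m 'Initial commit'
git subtree add --prefix subtreeDirectory "$work/upstream" master --squash

# Upstream moves on after the subtree was added.
cd "$work/upstream"
echo 'set hlsearch' >> vimrc
git commit -q -am 'Enable hlsearch'

# Pull the new upstream commit into the host's copy of the files.
cd "$work/host"
git subtree pull --prefix subtreeDirectory "$work/upstream" master --squash
cat subtreeDirectory/vimrc
```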


Updating / Pushing to the subtree remote repository

If you make a change to anything in subtreeDirectory the commit will be stored in the host repository and its logs. That is the biggest change from submodules.

If you now want to update the subtree remote repository with that commit, you must run the same command, excluding --squash and replacing pull with push.

git subtree push --prefix subtreeDirectory https://github.com/newfivefour/vimrc.git master
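The round trip can be tried locally with a bare repository standing in for the subtree's remote. This is a sketch of the simple case only (one host commit touching the subtree); the `seed`, `upstream.git` and `host` paths, the `demo` identity, and the file contents are assumptions for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the subtree's remote: a bare repository, so it accepts pushes.
git init -q "$work/seed"
cd "$work/seed"
echo 'set number' > vimrc
git add vimrc
git commit -q -m 'Add vimrc'
git branch -M master
git clone -q --bare "$work/seed" "$work/upstream.git"

# Host repository with the subtree already added.
git init -q "$work/host"
cd "$work/host"
echo 'host project' > README
git add README
git commit -q -m 'Initial commit'
git subtree add --prefix subtreeDirectory "$work/upstream.git" master --squash

# Change a subtree file; the commit lands in the host repository's history.
echo 'set hlsearch' >> subtreeDirectory/vimrc
git commit -q -am 'Enable hlsearch in vimrc'

# Send that commit back to the subtree's own repository.
git subtree push --prefix subtreeDirectory "$work/upstream.git" master

# The remote's master now carries the change, with paths relative to its root.
git --git-dir="$work/upstream.git" log --oneline master
git --git-dir="$work/upstream.git" show master:vimrc
```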


Subtree issues

  • It isn't readily apparent that part of the main repo is built from a subtree
  • You can't easily list the subtrees in your project
  • You can't, at least easily, list the remote repositories of the subtrees
  • The logs are slightly confusing when you update the host repository with subtree commits, then push the subtree to its host, and then pull the subtree.

Other than that, they're looking nicer than submodules.

Amended from original articles:

  1. https://newfivefour.com/git-subtree-basics.html
  2. https://docs.acquia.com/articles/using-git-subtrees-instead-git-submodules
@aadityataparia

add .md in filename please

@xor2003

xor2003 commented Oct 1, 2019

List subtrees:
Since a subtree must have a folder with the same name in the root folder of the repository, you can run this to get the info you want (in a Bash shell):

git log | grep git-subtree-dir | tr -d ' ' | cut -d ":" -f2 | sort | uniq
Now, this doesn't check whether the folder exists or not (you may delete it and the subtree mechanism won't know), so here's how you can list only the existing subtrees. This will work in any folder in the repository:

git log | grep git-subtree-dir | tr -d ' ' | cut -d ":" -f2 | sort | uniq | xargs -I {} bash -c 'if [ -d $(git rev-parse --show-toplevel)/{} ] ; then echo {}; fi'

@DarinDev1000

DarinDev1000 commented Aug 31, 2020

https://gist.github.com/SKempin/b7857a6ff6bddb05717cc17a44091202#gistcomment-2859188
Add .md to the end of your file name so it formats right

@SKempin
Author

SKempin commented Sep 2, 2020

Done, cheers guys.

@scaprile

scaprile commented Mar 30, 2021

Perhaps you'd like to add that one has to do this on a clean working directory, that is, any files added/modified must be committed. Otherwise a not-so-clear message may show up.
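One way to catch this up front is a small guard run before any subtree command. A sketch only: `require_clean_tree` is a hypothetical helper name, not a git command, and it is deliberately strict in that it also flags untracked files. The throwaway repository and `demo` identity exist just to exercise it.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper (not a git command): succeeds only when nothing is
# modified, staged, or untracked in the current repository.
require_clean_tree() {
    if [ -n "$(git status --porcelain)" ]; then
        echo 'Uncommitted changes found; commit or stash them first.' >&2
        return 1
    fi
}

# Demo in a throwaway repository.
work=$(mktemp -d)
git init -q "$work/host"
cd "$work/host"
echo 'host project' > README
git add README
git commit -q -m 'Initial commit'

require_clean_tree && echo 'clean: safe to run git subtree add/pull'

echo 'local edit' >> README
require_clean_tree || echo 'dirty: commit first'
```

In day-to-day use it would sit in front of the real command, e.g. `require_clean_tree && git subtree pull ...`.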

@SKempin
Author

SKempin commented Apr 1, 2021

"Perhaps you'd like to add that one has to do this on a clean working directory, that is, any files added/modified must be committed. Otherwise a not-so-clear message may show up."

Please feel free to fork

@artu-hnrq

artu-hnrq commented Jul 8, 2021

Nice summary! I'll get back in here, surely.
But the two links at the end are broken.

@DougLeonard

"The logs are slightly confusing when you update the host repository with subtree commits, then push the subtree to its host, and then pull the subtree."

That's an understatement.

See git-alltrees (my work):

https://gitlab.com/douglas.s.leonard/alltrees/-/wikis/home

Scroll down to the second image. That represents the confused histories that can exist.
Scroll up to the first image. That's what alltrees produces instead for each repo, a correct history of the remote edits from the perspective of each involved repo, shown as branches and merges. It has other advantages as well.

It can list subtrees too, but only if you've configured them. Of course someone else pulling the project just sees the whole project, which is the point of subtrees. They might not even have access to the subtree repo you contributed from.

It's a new tool. As far as I can tell it works well. There's room for more development.

@sankartn

How can I change the subtree code to point to the code at a specific commit of the remote repository?

@epcim

epcim commented Jan 11, 2022

To add to the list of other tools: three years back I came up with a similar idea using git worktree.

Something has changed over time: today git treats a worktree in the main repo root as another git repo, and as such as a submodule (which is not bad, as it's basically a submodule with a subtree path). A prototype is here: https://github.com/epcim/git-cross. The idea was to have a single file in the repo to define all dependencies. A worktree is a branch, and as such you should be able to cherry-pick to upstream projects in the worktree.

All-trees above looks promising.

@DougLeonard

@sankartn I think what you want is probably just to add a branch to the desired commit on the remote (or your clone of it), and git subtree add that branch to your project. Of course simply copying in files is always an option to consider.

@sankartn

sankartn commented Jan 12, 2022

@DougLeonard I am already aware of adding branch to a subtree. But I have explained below clearly with an example what I expect now.

Let's say there is a repo called repoB with a branch called main, whose latest commit has the hash commit0.
Now let's say we have a main repo A, into which we have added repo B as a subtree. When we added it, we pointed the subtree at the main branch of remote repo B, so repo A now holds repo B's contents as of commit0.

Now say there are some new commits in repo B: three commits with the hashes commit1, commit2, and commit3.
What I want is for the repoB subtree inside main repo A to hold the contents of commit2 on the main branch of remote repoB. What is the git command to do so?

@DougLeonard

DougLeonard commented Jan 12, 2022

@sankartn

on the remote (repoB or the clone of it that you control):

git tag mytag commit2

on repo A

git subtree -P subdir pull repoB mytag

Alternatively you could probably git fetch the remote branch and git subtree merge the specific commit, I think without tagging it.
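A rough sketch of that fetch-and-merge alternative, assuming the subtree was added with --squash. The `repoA` and `repoB` paths, the `demo` identity, and the file contents are made up for illustration; `commit0` to `commit3` mirror the names in the question, and `GIT_MERGE_AUTOEDIT=no` only keeps the merge from opening an editor.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # keep the merge from opening an editor

work=$(mktemp -d)

# repoB starts at "commit0" on main.
git init -q "$work/repoB"
cd "$work/repoB"
echo 'commit0' > file
git add file
git commit -q -m 'commit0'
git branch -M main

# repoA takes repoB in as a subtree while repoB is at commit0.
git init -q "$work/repoA"
cd "$work/repoA"
echo 'repo A' > README
git add README
git commit -q -m 'Initial commit'
git subtree add --prefix subdir "$work/repoB" main --squash

# repoB gains three more commits; remember the hash of the second one.
cd "$work/repoB"
for n in 1 2 3; do
    echo "commit$n" >> file
    git commit -q -am "commit$n"
    if [ "$n" -eq 2 ]; then commit2=$(git rev-parse HEAD); fi
done

# In repoA: fetch the branch so the commit exists locally, then merge only
# up to commit2 into the subtree.
cd "$work/repoA"
git fetch -q "$work/repoB" main
git subtree merge --prefix subdir --squash "$commit2"
cat subdir/file
```

The fetch matters because `git subtree merge` takes a commit that is already in the local object store, while `git subtree pull` expects a ref name on the remote.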

Subtrees don't actually point to anything though, if you can even say there is such a thing as a subtree.

(edited syntax)

@sankartn

@DougLeonard Thanks a lot for guiding with a very clear explanation. Thumbs up!

@tmillr

tmillr commented Jul 31, 2022

Thanks for this. I'm just wondering how this is any better than submodules when it has several issues and isn't even a part of core git? It's also implemented with hacky solutions, and external changes to the subrepo are not tracked by git fetch nor git status? It seems to me like the only benefit is one less option that you have to pass when doing git clone, and it seems to offer little convenience over simply doing what subtree does, but manually (something like git pull --commit -s subtree remote refspec). I've also heard that rebases can become more work/more confusing with subtrees involved. At least with submodules git status will notify you if your submodule is behind, yeah? Am I missing something?

@KuangJie7

Is there a way to extract folders or files in one git repo (a mono-repo) as a subtree or subtrees of another git repo?

For example, mono-repo A:

mono-repo A
  - .git
  - packages
    - module1
    - module2
    - module3

I want to define A/packages/module1 and A/packages/module3 as subtrees of repo B so that I can track updates from repo A.
In this case, repo B:

repo B
  - .git
  - sub-modules -> attached with repo A
    - module1
    - module3

Then I can make use of source code from repo A in repo B. Any idea how to achieve this?

@chris-hatton

chris-hatton commented Dec 4, 2023

I've been using submodules for years and failing to understand the hate; yes they add a bit of maintenance overhead, but it's easy to see how they work and you get the benefit of a single source-of-truth for each component.
Now that I need to widen out the usage to a larger team, I thought I would give subtrees a spin, but the experience is awful; I can't imagine why these are billed as friendlier than submodules. The pull/push just doesn't work as advertised: I had git telling me the tip was behind HEAD and that's why I couldn't push, even though it wasn't; I ended up having to delete and recreate the subtree several times; and on one occasion the entire history of the consuming repo got pushed into the subtree repo 🤦 I'm not an inexperienced Git user, and no doubt I'm 'holding it wrong' somehow, but friendlier than submodules it is not...
Backing away hard and thinking how to most gently intro submodules to the team instead.
