@tdd
Last active October 2, 2020 18:25
Subtrees investigations

Workbench

We extract our test repos from this small Zip file.

  • main is a "container" repository with its working copy,
  • plugin is a "shared" repository with its working copy,
  • remotes emulates remote bare repos for both, to better resemble regular usage.
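
For readers without the Zip file, a comparable layout can be rebuilt from scratch. This is only a sketch: the file contents, the temporary directory and the single "Initial import" commit per repo are stand-ins, not the actual contents of the archive.

```shell
#!/bin/sh
# Sketch only: rebuilds a layout similar to the workbench, with made-up contents.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

bench=$(mktemp -d)
cd "$bench"
mkdir remotes

for repo in main plugin; do
  # Bare repository standing in for the remote.
  git init -q --bare "remotes/$repo"
  git --git-dir="remotes/$repo" symbolic-ref HEAD refs/heads/master
  # Working copy cloned from it (Git warns that the clone is empty).
  git clone -q "remotes/$repo" "$repo"
  git -C "$repo" symbolic-ref HEAD refs/heads/master
  echo "$repo" > "$repo/README.md"
  git -C "$repo" add README.md
  git -C "$repo" commit -q -m "Initial import"
  git -C "$repo" push -q -u origin master
done

# From the container's working copy, declare the plugin's remote.
cd main
git remote add -f plugin ../remotes/plugin
git branch -r
```

The last command should list both origin/master and plugin/master, which is the state the rest of this document starts from.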

The idea is to use plugin as a subtree of main in a vendor/plugins/demo path, and allow maintenance both ways:

  1. Initial grab of the plugin as a subtree
  2. Upgrade of the centrally-updated plugin from its remote to our subtree
  3. Upstream sharing of local changes to our subtree (assuming these use dedicated commits in our main repo)

(In the following commands, git ci is an alias for git commit and git lg is a tuned git log.)
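
To replay the commands, the two aliases could be defined along these lines. The git lg format is a guess reverse-engineered from the log output shown in this document, not the author's actual configuration.

```shell
# Assumed alias definitions; the original configuration is not shown.
git config --global alias.ci commit
# Graph view with abbreviated hash, ref decorations and subject (guessed format).
git config --global alias.lg "log --graph --abbrev-commit --pretty=format:'%h -%d %s'"
```

Drop --global to scope the aliases to a single repository instead.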

Common setup

Plugin remote

In order to keep later commands reasonably concise, we define a remote for our to-be-subtree plugin, in our local main repo:

git remote add -f plugin ../remotes/plugin

Initial logs

On the main repo:

* b90985a - (HEAD, origin/master, master) Main files for the project, to populate its tree a bit.
* e052943 - Initial import

On the plugin repo:

* fe64799 - (HEAD, origin/master, master) Fix repo name for main project companion demo repo
* 89d24ad - Main files (incl. subdir) for plugin, to populate its tree.
* cc88751 - Initial commit

First approach: git subtree contrib script

We base our work on the not-quite-well-integrated doc.

Initial grab

$ git subtree add -P vendor/plugins/demo plugin master
git fetch plugin master
From ../remotes/plugin
 * branch            master     -> FETCH_HEAD
Added dir 'vendor/plugins/demo'

$ tree vendor/
vendor/
└── plugins
    └── demo
        ├── README.md
        ├── lib
        │   └── index.js
        └── plugin-config.json

$ git lg
*   3d4475e - (HEAD, master) Add 'vendor/plugins/demo/' from commit 'fe6479991d214f4d95ac2ae959d7252a866e01a3'
|\
| * fe64799 - (plugin/master) Fix repo name for main project companion demo repo
| * 89d24ad - Main files (incl. subdir) for plugin, to populate its tree.
| * cc88751 - Initial commit
* b90985a - (origin/master) Main files for the project, to populate its tree a bit.
* e052943 - Initial import

This merges the entire history of our plugin. We could have avoided that by using --squash, but this will still create a "parallel bump" in our graph, as we'll demonstrate in the next command.

Upgrading from the centrally-updated plugin

Let's assume we added a "Semver compatibility" commit to the plugin, and pushed it to its remote. We now upgrade our subtree from it, this time with --squash to avoid conflating the histories too much:

$ git subtree pull -P vendor/plugins/demo plugin master --squash
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../remotes/plugin
 * branch            master     -> FETCH_HEAD
   fe64799..5fcbb84  master     -> plugin/master
Merge made by the 'recursive' strategy.
 vendor/plugins/demo/semver | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 vendor/plugins/demo/semver

$ git lg
*   97bdd7a - (HEAD, master) Merge commit '5b78c452cc7e37d8f8fc631d57f93197b8b00f11'
|\
| * 5b78c45 - Squashed 'vendor/plugins/demo/' changes from fe64799..5fcbb84
* |   3d4475e - Add 'vendor/plugins/demo/' from commit 'fe6479991d214f4d95ac2ae959d7252a866e01a3'
|\ \
| |/
| * fe64799 - Fix repo name for main project companion demo repo
| * 89d24ad - Main files (incl. subdir) for plugin, to populate its tree.
| * cc88751 - Initial commit
* b90985a - (origin/master) Main files for the project, to populate its tree a bit.
* e052943 - Initial import

Note how there is a separate squash commit and a merge commit, and how the squash commit retains its parent. This is true of later upgrades, too:

$ git subtree pull -P vendor/plugins/demo plugin master --squash
remote: Counting objects: 1, done.
remote: Total 1 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (1/1), done.
From ../remotes/plugin
 * branch            master     -> FETCH_HEAD
   5fcbb84..5c1d3d3  master     -> plugin/master
Already up-to-date!
Merge made by the 'recursive' strategy.

$ git lg
*   45f4d3a - (HEAD, master) Merge commit '54ac3a6f180ffaebd633bcba65115e18169ff735'
|\
| * 54ac3a6 - Squashed 'vendor/plugins/demo/' changes from 5fcbb84..5c1d3d3
* |   97bdd7a - Merge commit '5b78c452cc7e37d8f8fc631d57f93197b8b00f11'
|\ \
| |/
| * 5b78c45 - Squashed 'vendor/plugins/demo/' changes from fe64799..5fcbb84
* |   3d4475e - Add 'vendor/plugins/demo/' from commit 'fe6479991d214f4d95ac2ae959d7252a866e01a3'
|\ \
| |/
| * fe64799 - Fix repo name for main project companion demo repo
…

This creates a considerably polluted graph. Also note that this happens regardless of whether our initial git subtree add squashed or not.

Sharing a local subtree change upstream

On the other hand, upstream sharing of local fixes/upgrades is very reliable:

$ date >> vendor/plugins/demo/log
$ git add vendor/plugins/demo/log
$ git ci -m "Local plugin work #1"
[master 43bda00] Local plugin work #1
 1 file changed, 1 insertion(+)

$ date >> main-file-1
$ git ci -am "Container repo work"
[master d869b86] Container repo work
 1 file changed, 1 insertion(+)

$ date >> vendor/plugins/demo/log
$ git ci -am "Local plugin work #2"
[master 4decd9b] Local plugin work #2
 1 file changed, 1 insertion(+)

$ git subtree push -P vendor/plugins/demo plugin master
git push using:  plugin master
-n 1/      10 (0)
-n 2/      10 (0)
-n 3/      10 (0)
-n 4/      10 (1)
-n 5/      10 (1)
-n 6/      10 (2)
-n 7/      10 (2)
-n 8/      10 (3)
-n 9/      10 (4)
-n 10/      10 (5)
Counting objects: 6, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 592 bytes | 0 bytes/s, done.
Total 6 (delta 2), reused 0 (delta 0)
To ../remotes/plugin
   5c1d3d3..d10bb34  d10bb34fbbfbfc98c38c38966cb497ba76f8ad13 -> master

$ git lg -3 plugin/master
* d10bb34 - (plugin/master) Local plugin work #2
* b445598 - Local plugin work #1
* 5c1d3d3 - Further update to the plugin

This is the key strong point of git subtree.

Second approach: manual commands as per Pro Git

This approach manually uses subtree merge strategies, with vanilla Git commands.

Scott recommends creating a local tracking branch for the plugin, totally independent of the main working copy, so we'll do that (although this has issues with .gitignored files, which show up as untracked/conflicting in the plugin's tracking branch and need to be duplicated in .git/info/exclude, for instance).

Initial grab

$ git checkout -b plugin_branch plugin/master
Branch plugin_branch set up to track remote branch master from plugin.
Switched to a new branch 'plugin_branch'

$ git read-tree --prefix=vendor/plugins/demo -u plugin_branch
$ git ci -m "Added plugin as subtree"

Upgrading from the centrally-updated plugin

$ git checkout plugin_branch
Switched to branch 'plugin_branch'
Your branch is up-to-date with 'plugin/master'.

$ git pull
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../remotes/plugin
   fe64799..b0dfe08  master     -> plugin/master
Successfully rebased and updated refs/heads/plugin_branch.

$ git merge --squash -s subtree --no-commit plugin_branch
Squash commit -- not updating HEAD
Automatic merge went well; stopped before committing as requested

$ git ci -m "Updated plugin subtree"
[master aa39e0e] Updated plugin subtree
 1 file changed, 1 insertion(+)
 create mode 100644 vendor/plugins/demo/semver

$ git lg
* aa39e0e - (HEAD, master) Updated plugin subtree
* 183c84c - Added plugin as subtree
* b90985a - (origin/master) Main files for the project, to populate its tree a bit.
* e052943 - Initial import

Note that --no-commit is superfluous here: --squash never creates a commit by itself, so this is always a two-step operation, as above (merge + commit). A non-squashed subtree-strategy merge, such as the plain pull in the third approach, does commit inline.

On the other hand, here we get no pollution at all in our graph: the subtree update is a single commit in our container repo's history, which can be awesome if you wish to keep both graphs entirely separate.

Sharing a local subtree change upstream

$ date >> vendor/plugins/demo/log
$ git add vendor/plugins/demo/log
$ git ci -m "Local plugin work #1"
[master 4019d88] Local plugin work #1
 1 file changed, 1 insertion(+)
 create mode 100644 vendor/plugins/demo/log

$ date >> main-file-1
$ git ci -am "Container repo work"
[master cc1216a] Container repo work
 1 file changed, 1 insertion(+)

$ date >> vendor/plugins/demo/log
$ git ci -am "Local plugin work #2"
[master 8e42115] Local plugin work #2
 1 file changed, 1 insertion(+)

Pro Git fails us at this point. To get the diff between our container branch's subtree and the plugin's tracking branch, it tells us a git diff-tree -p plugin_branch should suffice. Whether we're in the subtree's directory or not, this fails for us:

$ git diff-tree -p plugin_branch
b0dfe082a515ec34e67534935bbc241f071a9df7
diff --git a/semver b/semver
new file mode 100644
index 0000000..a4f4c53
--- /dev/null
+++ b/semver
@@ -0,0 +1 @@
+Mer 27 aoû 2014 16:17:03 CEST

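This is merely the change introduced by plugin_branch's own tip commit (b0dfe08, which added semver at the plugin's root), since a lone commit argument makes git diff-tree compare that commit with its parent; it is not about our subtree changes at all. We get the exact same display if we attempt a git diff-tree -p master when plugin_branch is checked out.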

We couldn't find a single way to get that diff.
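
One candidate worth trying is a plain git diff between two trees: git diff accepts any two tree-ish arguments, and master:vendor/plugins/demo names the tree stored at the subtree's path. The sketch below checks this on a throwaway workbench with made-up contents, not on the repos used in this document.

```shell
#!/bin/sh
# Sketch on a throwaway workbench; repo contents are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

bench=$(mktemp -d)
cd "$bench"

# Stand-in for the plugin repository.
git init -q plugin
git -C plugin symbolic-ref HEAD refs/heads/master
echo '{}' > plugin/plugin-config.json
git -C plugin add plugin-config.json
git -C plugin commit -q -m "Initial commit"

# Container repository vendoring the plugin with read-tree, as in this approach.
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
echo main > main-file-1
git add main-file-1
git commit -q -m "Initial import"
git remote add -f plugin ../plugin
git branch -q plugin_branch plugin/master
git read-tree --prefix=vendor/plugins/demo/ -u plugin_branch
git commit -q -m "Added plugin as subtree"

# Local work inside the subtree.
echo "local fix" >> vendor/plugins/demo/log
git add vendor/plugins/demo/log
git commit -q -m "Local plugin work #1"

# Compare the plugin's root tree with the tree stored under the subtree prefix.
git diff plugin_branch master:vendor/plugins/demo
```

Paths in the resulting patch are relative to the plugin's root (a/log, b/log), so only the subtree's own changes show up.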

On the other hand, Pro Git's reverse-merge seems to work in this case (but has repeatedly not worked in actual situations during our training sessions or in production):

$ git checkout plugin_branch
Switched to branch 'plugin_branch'
Your branch is up-to-date with 'plugin/master'.

$ git merge --squash -s subtree --no-commit master
Squash commit -- not updating HEAD
Automatic merge went well; stopped before committing as requested

$ git diff --staged
diff --git c/log i/log
new file mode 100644
index 0000000..07e3154
--- /dev/null
+++ i/log
@@ -0,0 +1,2 @@
+Mer 27 aoû 2014 16:26:19 CEST
+Mer 27 aoû 2014 16:28:35 CEST

$ git ci -m "Backported subtree changes in main repo"
[plugin_branch b281d04] Backported subtree changes in main repo
 1 file changed, 2 insertions(+)
 create mode 100644 log

$ git push
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 330 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To ../remotes/plugin
   b0dfe08..b281d04  plugin_branch -> master

This is admittedly more work than with git subtree. Also, this fails to recreate the independent commits used to change our subtree in the main repo's branch. On the one hand, this spares us from having to use dedicated commits for the subtree (which remains a best practice, though); on the other hand, this makes upstream shares harder to read/understand.

Third approach: manual commands as per GitHub Help

GitHub advocates a slightly different approach. It still relies on vanilla Git commands. Because it does not concern itself with upstream backports though, it skips the local tracking branch for the plugin, using the fetched remote branch directly. This is pretty much in line with Git's official subtree merge how-to.

Initial grab

$ git merge -s ours --no-commit plugin/master
Automatic merge went well; stopped before committing as requested

$ git read-tree --prefix=vendor/plugins/demo -u plugin/master
$ git ci -m "Added plugin as subtree"
[master e2e81e2] Added plugin as subtree

The ours merge before the read-tree is there to prep our history, as we do not squash here. The resulting graph includes the plugin's entire history, much as our git subtree add did in our first approach, as we had elected not to squash there either.

$ git lg
*   e2e81e2 - (HEAD, master) Added plugin as subtree
|\
| * fe64799 - (plugin/master) Fix repo name for main project companion demo repo
| * 89d24ad - Main files (incl. subdir) for plugin, to populate its tree.
| * cc88751 - Initial commit
* b90985a - (origin/master) Main files for the project, to populate its tree a bit.
* e052943 - Initial import

Had we squashed our initial ours-strategy merge, we'd get a single "Added plugin as subtree" commit. This has no impact on later upgrades of the subtree from upstream. Actually, if we squash, the preamble ours-strategy merge can be skipped entirely: the read-tree will suffice, much as in approach 2.

Upgrading from the centrally-updated plugin

GitHub advocates skipping a step by pulling directly with a subtree merge strategy (yeehaa!), so let's try that:

$ git pull -s subtree plugin master
From ../remotes/plugin
 * branch            master     -> FETCH_HEAD
Merge made by the 'subtree' strategy.
 vendor/plugins/demo/semver | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 vendor/plugins/demo/semver

$ git lg
*   caba3d5 - (HEAD, master) Merge branch 'master' of ../remotes/plugin
|\
| * 07db4f3 - (plugin/master) Semver compatibility
* |   e2e81e2 - Added plugin as subtree
|\ \
| |/
| * fe64799 - Fix repo name for main project companion demo repo
| * 89d24ad - Main files (incl. subdir) for plugin, to populate its tree.
| * cc88751 - Initial commit
* b90985a - (origin/master) Main files for the project, to populate its tree a bit.
* e052943 - Initial import

Note that this maintains "parallel bumps" in our graph, which rather blows. Let's reset and try again with a squash:

$ git pull -s subtree --squash plugin master
From ../remotes/plugin
 * branch            master     -> FETCH_HEAD
Squash commit -- not updating HEAD
Automatic merge went well; stopped before committing as requested

$ git ci -m "Updated subtree"
[master f7d5028] Updated subtree
 1 file changed, 1 insertion(+)
 create mode 100644 vendor/plugins/demo/semver

$ git lg
* f7d5028 - (HEAD, master) Updated subtree
*   e2e81e2 - Added plugin as subtree
|\
| * fe64799 - Fix repo name for main project companion demo repo
| * 89d24ad - Main files (incl. subdir) for plugin, to populate its tree.
| * cc88751 - Initial commit
* b90985a - (origin/master) Main files for the project, to populate its tree a bit.
* e052943 - Initial import

Now that's better.

This requires no rebasing, by the way. Rebasing on pulls here would create an entirely different history, which would be quite awkward if we kept squashing all the way through. As we don't use a local tracking branch on which to graft config here, this would need an explicit --no-rebase param to override otherwise active pull rebasings.

At any rate, we do not have any clear path for merging local subtree changes back to the plugin's upstream.

What we really want

  1. Proper initial grab, probably single-commit. Although git subtree requires only a single command for this, it never creates a single commit, always at least two (the squash commit then an extra merge commit). Both other approaches, when squashing, do this well, even if in multiple steps.
  2. Proper downstream upgrades, probably single-commit. Again, git subtree doesn't deliver. Even in squash mode, it creates two commits and maintains parenthood on the previous injection point, polluting the graph with a parallel line just for our subtree. Both other approaches, when squashing, do this pretty well, even with a single command in the third approach!
  3. Easy upstream backporting. We want the best of the first two approaches: we want to auto-extract the subtree-only commits and replay them "upstream," but since git subtree is messy on points 1 and 2, we'd like to do that without it. Approach 2 works in terms of contents, but does not easily reproduce the commit list, rather a single squashed backport. And approach 3 sidesteps the issue entirely.

It is impossible to use a git subtree push on histories not obtained through git subtree add/pull, especially if you squash, as it won't find its rebase origin point and will refuse to force-push on the upstream (rightly so, too).

Manually obtaining the list of commits on the subtree would require something along the lines of a temporary synthetic branch that we'd git filter-branch --subdirectory-filter vendor/plugins/demo, but finding the proper starting point for that is tricky, and then we'd have to rebase that segment on top of our upstream, which requires a temporary local tracking branch anyway.

Another option would be to cherry-pick all commits pertaining to the subtree, using a subtree merge strategy. The difficulty, again, is to properly figure out the starting point in history.
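
A minimal sketch of that cherry-pick route, on a throwaway workbench with made-up contents. It sidesteps the hard part by assuming the starting point was recorded in a shell variable right after vendoring, and it uses the subtree=<path> strategy option rather than the subtree strategy itself, so the prefix is given explicitly instead of being guessed.

```shell
#!/bin/sh
# Sketch on a throwaway workbench; repo contents and names are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
prefix=vendor/plugins/demo

bench=$(mktemp -d)
cd "$bench"

# Stand-in for the plugin's upstream.
git init -q plugin
git -C plugin symbolic-ref HEAD refs/heads/master
echo '{}' > plugin/plugin-config.json
git -C plugin add plugin-config.json
git -C plugin commit -q -m "Initial commit"

# Container repo vendoring the plugin with read-tree, as in approach 2.
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
echo main > main-file-1
git add main-file-1
git commit -q -m "Initial import"
git remote add -f plugin ../plugin
git read-tree --prefix="$prefix/" -u plugin/master
git commit -q -m "Added plugin as subtree"
start=$(git rev-parse HEAD)   # assumption: we recorded the starting point ourselves

# Two subtree-only commits with unrelated container work in between.
echo "fix 1" >> "$prefix/log"
git add "$prefix/log"
git commit -q -m "Local plugin work #1"
echo more >> main-file-1
git commit -q -am "Container repo work"
echo "fix 2" >> "$prefix/log"
git commit -q -am "Local plugin work #2"

# Commits touching the subtree since the starting point, oldest first.
commits=$(git rev-list --reverse "$start..master" -- "$prefix")

# Replay them on a local branch tracking the plugin; -Xsubtree strips the prefix.
git checkout -q -b plugin_branch plugin/master
git cherry-pick -Xsubtree="$prefix" $commits
git --no-pager log --oneline plugin_branch
```

The container-only commit is skipped by the path-limited rev-list, and plugin_branch ends up with two separate commits that a plain git push could send upstream. Commits that mix subtree and container changes are not handled by this sketch.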

tdd commented Aug 27, 2014

So the more I think about this, the more I see a possible solution for this… I'd write a contrib script (git subdir? git boxtree? git plugin? git vendor?), probably with zsh/bash completion add-ons, that would provide three commands:

  • add for initial grabbing
  • pull for upgrade from upstream
  • push for upstream backporting

It would use formatted remote names and the local config to persist their tracked branch, etc.

It'd be fairly opinionated. Basically, add blah -P prefix url [branch=master] would:

  1. Check for a clean WD
  2. git remote add -f vendor-blah url
  3. git read-tree --prefix=prefix -u vendor-blah/branch
  4. git commit -m "Vendored blah"
  5. git notes add -m "vendor.blah.latest"

Then upgrades from upstream would use pull blah, which would:

  1. Check for a clean WD
  2. git pull -s subtree --no-rebase --squash vendor-blah branch (using the tracked branch info stored in .git/config by add), then git commit, since a squash pull stops before committing
  3. git notes add -m "vendor.blah.latest"
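
The add and pull steps above can be sketched as two shell functions. Everything here is hypothetical: the function names, the vendor.<name>.* config keys and the throwaway repos are made up for illustration, and --allow-unrelated-histories is added on the assumption of a recent Git (2.9 or later), which otherwise refuses to merge a history it has no common ancestor with.

```shell
#!/bin/sh
# Sketch of the proposed add/pull commands; names and config keys are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

ensure_clean() {
  [ -z "$(git status --porcelain)" ] || { echo "Working directory not clean" >&2; return 1; }
}

# vendor_add NAME PREFIX URL [BRANCH]
vendor_add() {
  name=$1 prefix=$2 url=$3 branch=${4:-master}
  ensure_clean
  git remote add -f "vendor-$name" "$url"
  # Persist what later commands need (made-up key names).
  git config "vendor.$name.prefix" "$prefix"
  git config "vendor.$name.branch" "$branch"
  git read-tree --prefix="$prefix/" -u "vendor-$name/$branch"
  git commit -q -m "Vendored $name"
  git notes add -m "vendor.$name.latest"
}

# vendor_pull NAME
vendor_pull() {
  name=$1
  branch=$(git config "vendor.$name.branch")
  ensure_clean
  git pull -q -s subtree --no-rebase --squash --allow-unrelated-histories \
    "vendor-$name" "$branch"
  git commit -q -m "Updated $name"   # a squash pull stops before committing
  git notes add -m "vendor.$name.latest"
}

# Throwaway workbench to exercise both functions.
bench=$(mktemp -d)
cd "$bench"
git init -q plugin
git -C plugin symbolic-ref HEAD refs/heads/master
echo '{}' > plugin/plugin-config.json
git -C plugin add plugin-config.json
git -C plugin commit -q -m "Initial commit"

git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
echo main > main-file-1
git add main-file-1
git commit -q -m "Initial import"

vendor_add demo vendor/plugins/demo ../plugin

# Upstream moves on, then we upgrade.
echo 1.0.0 > ../plugin/semver
git -C ../plugin add semver
git -C ../plugin commit -q -m "Semver compatibility"
vendor_pull demo
git --no-pager log --oneline
```

Each operation lands as a single commit in the container's history. Since the squashed history gives the merge no common ancestor, an upstream edit to a file already vendored may surface as a conflict; the sketch only exercises the new-file case.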

Finally, when backporting local fixes upstream with push blah [commit…]:

  1. If commits are provided, use this list. Otherwise, find the most recent commit with a vendor.blah.latest note and git log from there limiting to the subtree path (stored in local config).
  2. If no vendor-blah-backports branch exists, checkout -b it from vendor-blah/branch (stored in local config), otherwise checkout it and pull it first.
  3. For each commit we want, cherry-pick --strategy=subtree it on our local tracking branch
  4. git push
  5. git checkout -
  6. git notes append -m "vendor.blah.latest" on either the latest commit we cherry-picked, or HEAD.

This should work, I guess. What do you think?

tdd commented Nov 9, 2014

@matthewmccullough @schacon hey guys, remember this talk about "better subtree command"?

I finally got around to writing that script: http://tdd.github.io/git-stree/

Before I start pimping it (be it in my Git trainings or online), I'd love some feedback on the CLI/UX and implementation, if you can spare a few minutes. Issues, PRs, or whatever. I'm sure you can whip up some valuable input even on cursory review.

Thanks a lot!