@shelby3
Created November 27, 2015 23:56

Version Control

Distributed Version Control

Each repository in a distributed version control system (DVCS), such as Git or Mercurial, is a directed acyclic graph (DAG). Each DAG node is identified by a cryptographic hash that uniquely identifies (i.e. hashes or checksums) three things: a comment, the set of changes (a.k.a. ‘commit’ or ‘changeset’) relative to the directly antecedent ‘parent’ changeset(s), and the cryptographic hash of each of those parents. A ‘merge’ changeset has multiple parents, and specifies the changes to each parent such that all those parents result in the same content.
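The way a node's identity depends on its parents can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual object format of Git or Mercurial: the function name `changeset_hash`, the length-prefixed encoding, and the sample comments and changes are all made up for illustration.

```python
import hashlib

def changeset_hash(comment, changes, parent_hashes):
    """Hash the comment, the changes, and every parent hash into one node ID."""
    h = hashlib.sha1()
    for field in (comment, changes, *parent_hashes):
        data = field.encode("utf-8")
        # Length-prefix each field so adjacent fields cannot blur together.
        h.update(str(len(data)).encode("ascii") + b":" + data)
    return h.hexdigest()

# Hypothetical history shaped like Figure 1: D and E share a parent, M merges them.
root = changeset_hash("initial import", "+hello", [])
d = changeset_hash("fix typo", "-helo +hello", [root])
e = changeset_hash("add docs", "+README", [root])
m = changeset_hash("merge D and E", "", [d, e])  # a merge has two parents
```

Because each parent hash is part of the hashed data, changing any ancestor changes the hash of every descendant, which is why rewriting history always produces new changesets.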


Figure 1. ‘M’ is a merge of parents ‘D’ and ‘E’

Branching

A lightweight ‘branch’ is a named pointer to a changeset; when a descendant (‘child’) changeset is created on that branch, the pointer is updated to point to the new changeset. These are not Mercurial’s named branches, which put immutable, and thus irreparable, global names in the changesets. Lightweight branch pointers are exchanged between repositories (in Git at least, with the locally chosen name of each remote repository prepended to its branch names) but are not recorded in any changesets. Thus deleting or renaming a pointer leaves no trace in the changesets.
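As a rough illustration of the pointer model, the following Python sketch keeps changesets and branch names in two separate tables. The short IDs (`a1`, `b2`, `c3`) and the `commit` helper are hypothetical; real systems derive the IDs from hashes.

```python
# Immutable changesets: ID -> list of parent IDs (toy IDs, not real hashes).
changesets = {"a1": [], "b2": ["a1"]}
# Mutable lightweight branches: name -> ID of the changeset it points to.
branches = {"feature": "b2"}

def commit(branch, new_id):
    """Create a child of the branch's current changeset and advance the pointer."""
    changesets[new_id] = [branches[branch]]
    branches[branch] = new_id

commit("feature", "c3")
snapshot = dict(changesets)

# Renaming the branch only touches the pointer table.
branches["experiment"] = branches.pop("feature")
```

After the rename, `changesets` is identical to `snapshot`: nothing in the history records that the pointer was ever called `feature`.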

Merge or rebase?

The debate is whether pulled or pushed changesets should be merged into, or ‘rebased’ onto, the ‘tip’ (a.k.a. ‘top’) of the target branch.


Figure 2. ‘R’ is a rebase of branch ‘E’ to parent ‘D’


Figure 3. feature branch is rebased to master branch
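A rebase can be pictured as replaying the same changes on top of a different parent. The sketch below is a toy model, assuming a simplified ID that hashes only the parent ID and the change text; `node_id` and `rebase` are illustrative names, not Git or Mercurial APIs.

```python
import hashlib

def node_id(changes, parent):
    """Toy changeset ID: depends on the change text and on the parent's ID."""
    return hashlib.sha1(f"{parent}|{changes}".encode("utf-8")).hexdigest()[:12]

def rebase(changes_list, new_base):
    """Replay each change in order on top of new_base; return the new IDs."""
    ids, parent = [], new_base
    for changes in changes_list:
        parent = node_id(changes, parent)
        ids.append(parent)
    return ids

base = node_id("root", "")
master_tip = node_id("master work", base)
feature = rebase(["feature 1", "feature 2"], base)        # branch as first written
rebased = rebase(["feature 1", "feature 2"], master_tip)  # same changes, new parent
```

The rebased changesets carry the same changes but different IDs, so from the DAG's point of view they are new changesets and the originals are left behind.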

The tip of the target branch can be ‘fast-forwarded’ (i.e. ‘merged’ without an additional merge changeset) if the most antecedent of the incoming changesets has been merged or rebased to be a descendant of the (tip of the) target branch.


Figure 4. master branch pointer is fast-forwarded to rebased feature branch
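The fast-forward condition amounts to an ancestry test: the target tip must already be an ancestor of the incoming tip. A minimal Python sketch, assuming a hypothetical DAG stored as a dict of parent lists:

```python
def ancestors(parents, node):
    """Return node plus every changeset reachable through parent links."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def can_fast_forward(parents, target_tip, source_tip):
    """True when moving the target pointer to source_tip needs no merge changeset."""
    return target_tip in ancestors(parents, source_tip)

# A <- B (master) <- R1 <- R2 (rebased feature); E is an unrebased sibling of B.
parents = {"A": [], "B": ["A"], "R1": ["B"], "R2": ["R1"], "E": ["A"]}
branches = {"master": "B", "feature": "R2"}

if can_fast_forward(parents, branches["master"], branches["feature"]):
    branches["master"] = branches["feature"]  # only the pointer moves
```

Here `master` can move to `R2` directly, whereas moving it from `B` to `E` would not be a fast-forward because `B` is not an ancestor of `E`.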

Merging can result in a noisy history that requires special effort to decipher.


Figure 5. visual charts with obfuscating rainbow branch lines due to merging

Combining rebasing with ‘squashing’ (a.k.a. --collapse), which folds the noisy comments of distinct changesets into a single changeset or a minimal number of changesets, creates a sequential, concise, and less cluttered history in the target branch. But modifying changesets alters or discards history that might later be needed for understanding, e.g. if there is collaboration on the branch before a rebase and/or squash. Such collaboration also violates the rule that changesets that exist outside your repository should never be rebased.
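What squashing keeps and what it throws away can be shown with a small Python sketch. It assumes a simplified changeset made of a comment and a list of change lines; the `squash` helper and the sample data are hypothetical.

```python
def squash(changesets, comment):
    """Fold an ordered run of (comment, changes) pairs into one changeset."""
    combined = []
    for _old_comment, changes in changesets:
        combined.extend(changes)
    return (comment, combined)

noisy = [
    ("wip", ["+parse()"]),
    ("oops, fix typo", ["-prase", "+parse"]),
    ("really fix it", ["+tests"]),
]
squashed = squash(noisy, "add parser with tests")
```

The changes survive in order, but the three original comments and the boundaries between them are gone, which is exactly the history that may later be missed.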

Merge and rebase solution

An optimal solution is to retain the original lightweight branch and to rebase and squash only a renamed copy of the branch onto the (tip of the) target branch, which benefits from such concision. Within the comment for at least the first and last of the rebased changeset(s), record the hash of the changeset the source branch points to (at the time of the rebase). Since feature and experimentation branches are typically kept in a remote repository (i.e. lightweight branches within a heavy branch), this will not bloat the (perhaps canonical) repository containing the target branch. In that case, the target branch’s repository forgets which remote repository possesses the referent changeset, unless the remote repository pushes a tag for it (without repetitively pushing all tags).

$ git log -1 --format="%H" my-branch           # print hash of ‘my-branch’ branch, which is always the tip of its only head
   1a5c9ca2bfd5de7d5fd79fb89701ea538af65746
$ git checkout master                          # change working directory to local version of ‘master’ branch
$ git pull [SOURCE master]                     # pull (and update working directory to) current changes for the ‘master’ branch from the default if tracking (or SOURCE, e.g. ‘origin’) remote repository
$ git log -1 --abbrev=12 --format="%h"         # print current working directory short hash (which is tip of its head for ‘master’ branch)
   b56ce7b07c52
$ git checkout -b '(my-branch)@1a5c9ca2bfd5->b56ce7b07c52' # create a branch named to indicate the sources of its fast-forward composition (quoted for the shell; ‘@’ is used because ‘:’ is not allowed in branch names)
$ git commit --allow-empty -m 'rebasing (my-branch):1a5c9ca2bfd5de7d5fd79fb89701ea538af65746' # empty marker changeset recording the source hash
$ git checkout -b temp my-branch               # create temporary branch copy of ‘my-branch’ branch
$ git rebase -i '(my-branch)@1a5c9ca2bfd5->b56ce7b07c52' # replay changeset(s) onto tip of current working directory head for temporary branch from (not including) its DAG common ancestor up to and including ‘my-branch’ branch
$ git checkout '(my-branch)@1a5c9ca2bfd5->b56ce7b07c52'
$ git merge temp                               # merge (fast-forward) temporary branch copy to new branch
$ git branch -d temp                           # delete temporary branch (name)
$ git commit --allow-empty -m 'rebasing (my-branch):1a5c9ca2bfd5de7d5fd79fb89701ea538af65746' # empty marker changeset closing the rebased run
$ git rebase -i master                         # interactively edit the history of the appended changeset(s), possibly deleting the first and last and appending 'rebasing (my-branch):1a5c9ca2bfd5de7d5fd79fb89701ea538af65746' to the comment for the remaining changesets
$ git tag -a '(my-branch)' 1a5c9ca2bfd5de7d5fd79fb89701ea538af65746 -m ''
$ git push SOURCE '(my-branch)'                # push the new tag to the remote repository

$ hg log -B my-branch -l 1 -T '{rev}:{node}\n' # print revision number and hash of ‘my-branch’ bookmark (branch), which may or may not be the tip of its (named branch's) head(s)
   7348:1a5c9ca2bfd5de7d5fd79fb89701ea538af65746
$ hg pull -u -b default [SOURCE]               # pull (and update working directory to) current changes for the ‘default’ (named) branch from the default (or SOURCE) remote repository
$ hg log -r . -l 1 -T '{rev}:{node|short}\n'   # print current working directory revision number and short hash (which is tip of its only head for ‘default’ branch and the tip of the repository¹)
   7401:b56ce7b07c52
$ hg commit -m 'rebasing (my-branch):1a5c9ca2bfd5de7d5fd79fb89701ea538af65746' # marker changeset recording the source hash (needs a pending change; otherwise Mercurial reports ‘nothing changed’)
$ hg graft -r ..7348                           # replay changeset(s) onto tip of current working directory head (which is named branch ‘default’) from (not including) its DAG common ancestor up to and including ‘my-branch’ bookmark
$ hg commit -m 'rebasing (my-branch):1a5c9ca2bfd5de7d5fd79fb89701ea538af65746' # marker changeset closing the grafted run (same caveat as above)
$ hg histedit -r 7402                          # interactively edit the history of the appended changeset(s), possibly deleting the first and last and appending 'rebasing (my-branch):1a5c9ca2bfd5de7d5fd79fb89701ea538af65746' to the comment for the remaining changesets
$ hg bookmark '(my-branch)@1a5c9ca2bfd5->b56ce7b07c52' # create a bookmark (branch) named to indicate the sources of its fast-forward composition (quoted for the shell; ‘@’ because ‘:’ is not allowed in bookmark names; note we've polluted the named branch ‘default’)

Thus feature and experimentation branches can be public, allowing collaboration that employs merging instead of rebasing, and thus avoiding the hell of interleaved merges and rebases. Via the aforementioned hash(es) in the comment(s), the full lineage is not hidden from the pull requests of the rebased and squashed copy.

Successive test merges of (the tip of) the target branch during collaboration can become noisy; and if the rebase to the target branch is done more than once, conflict resolution work is repeated. This can be solved with a rebased and squashed renamed copy of the branch instead of a test merge, i.e. by recursively applying the aforementioned merge and rebase solution (where the parenthesized (my-branch) in the example above would contain the recursively nested history hash->hash in the branch name).

¹ tip of the repository (from comment in Mercurial example above)

Subrepositories

Subrepositories, submodules, subtrees, and subtree merging add complexity that is prone to breaking. The simpler approach, which is more compatible with decentralized VCS (i.e. DVCS), is to include the files for each subrepository in each super-repository. Changes to the subrepository can be merged with a rebase and squash so that history is clean. Record the hash of the remote changeset in the comment for the changeset in the super-repository, to record which remote changeset matches the current copy in the super-repository. To make this changeset reference easier to find, add a file to the super-repository’s copy of each subrepository which contains this hash. Update the hash before rebasing and squashing. Due to the decentralization of DVCS, this hash may not be found at some canonical repository for stable versions of the subrepository; and some maintainer of a canonical super-repository may refuse to merge such a changeset.
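The bookkeeping described above could be automated with a small helper. This Python sketch is only one possible arrangement: the file name `UPSTREAM_HASH`, the comment wording, and the directory `libfoo` are assumptions chosen for illustration, not conventions of Git or Mercurial.

```python
import tempfile
from pathlib import Path

HASH_FILE = "UPSTREAM_HASH"  # hypothetical file name inside each subrepository copy

def record_upstream_hash(subrepo_copy: Path, remote_hash: str) -> str:
    """Store the remote changeset hash in the copy; return a comment for the changeset."""
    (subrepo_copy / HASH_FILE).write_text(remote_hash + "\n")
    return f"update {subrepo_copy.name} to upstream {remote_hash}"

def read_upstream_hash(subrepo_copy: Path) -> str:
    """Look up which remote changeset the current copy corresponds to."""
    return (subrepo_copy / HASH_FILE).read_text().strip()

with tempfile.TemporaryDirectory() as tmp:
    copy = Path(tmp) / "libfoo"  # hypothetical copy of a subrepository
    copy.mkdir()
    comment = record_upstream_hash(copy, "1a5c9ca2bfd5de7d5fd79fb89701ea538af65746")
    recorded = read_upstream_hash(copy)
```

The returned comment would be used for the super-repository changeset, so the hash is recorded both in the history and in the working copy.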

So that super-repositories which themselves include copies of other super-repositories can reuse a mutual copy of each subrepository, place all repositories at the same level of the directory hierarchy, so that relative paths remain consistent. Super-repositories that require different versions of a mutual subrepository will need those versions placed at different relative paths.

Git or Mercurial?

Mercurial suffers from a critical design error: it cannot prioritize lightweight branches as first-class citizens of the repository, because named branches are first-class. This is an infectious cascade that is antithetical to decentralization.

The unavoidable implications of requiring every changeset in the repository (even those in the default branch) to record an immutable branch name in its immutable hash are:

  • deletion of a branch name is impossible to globally assure (even if hg strip removes the implicated changesets on some repositories), because the name infects the changesets on every decentralized copy.
  • branch names share a global namespace, and thus can conflict under decentralized reuse.
  • there is a performance cost to proliferating branch names.
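The first implication follows directly from hashing the branch name together with the rest of the changeset. The Python sketch below contrasts two toy ID schemes; `named_branch_id` and `lightweight_id` are illustrative names, and the hashed layout is an assumption, not the real Mercurial or Git format.

```python
import hashlib

def named_branch_id(branch, parent, changes):
    """Toy ID where the branch name is part of the hashed data."""
    return hashlib.sha1(f"{branch}\0{parent}\0{changes}".encode("utf-8")).hexdigest()[:12]

def lightweight_id(parent, changes):
    """Toy ID where no branch name is hashed; names live outside the changeset."""
    return hashlib.sha1(f"{parent}\0{changes}".encode("utf-8")).hexdigest()[:12]

on_default = named_branch_id("default", "b56ce7b07c52", "+fix")
on_stable = named_branch_id("stable", "b56ce7b07c52", "+fix")
unnamed = lightweight_id("b56ce7b07c52", "+fix")
```

In the first scheme, the same change on a differently named branch is a different changeset, so a name cannot be removed or altered without replacing every changeset that carries it.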

Since creating superfluous named branches is thus undesirable, Mercurial must allow more than one ‘head’ per named branch (and anonymous heads not pointed to by any ‘tag’ or ‘bookmark’) when pushing or pulling changesets which are not fast-forwards. A head is a changeset without any descendants.
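The definition of a head translates directly into code: it is any changeset that no other changeset lists as a parent. A minimal Python sketch over a hypothetical DAG of parent lists:

```python
def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    has_child = {p for parent_list in parents.values() for p in parent_list}
    return sorted(set(parents) - has_child)

# B and C were both committed on top of A, e.g. by a pull that was not a fast-forward.
parents = {"A": [], "B": ["A"], "C": ["A"]}
two_heads = heads(parents)

# A merge changeset M with parents B and C leaves a single head again.
parents["M"] = ["B", "C"]
one_head = heads(parents)
```

Before the merge the graph has two heads, `B` and `C`; afterwards only `M` remains.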

Ongoing attempts to improve Mercurial’s ‘bookmarks’ to be as functionally first-class as Git’s lightweight branches will very likely be complicated by the need to deal with corner-case interactions between the two models of branching that Mercurial conflates. One example of this conflation: pushing a bookmark to a named branch in a way that would create another anonymous head on the remote repository (i.e. the changesets are not a fast-forward) requires hg push --force, because multiple (thus anonymous) heads leave undefined which head the named branch points to, and force everyone to pull both heads.

In Git, ‘detached’ heads are subject to deletion by garbage collection because by definition they are not pointed to by any lightweight branch name. The only branch heads that assuredly exist are those associated with lightweight branch names. A Git branch can point to the same head as another branch, but never point to multiple heads. Avoiding multiple heads by creating new branch names is not undesirable, because Git’s lightweight branch names are not recorded in changesets, so there is no adverse cost to proliferating them. Note that a Mercurial named branch can’t point to the same head as another named branch, which limits decentralized compositional degrees of freedom.

Each Git repository’s set of local branch names can have its own namespace because lightweight branch names are not immutably recorded in changesets. Thus each repository can refer to another repository’s set of local branch names by prepending whatever name the referring repository chooses for the other repository, i.e. repository names are chosen relative to the referrer. Mercurial’s bookmarks (as currently implemented), by contrast, invert the responsibility for unique naming from the referrer to the referent, and thus require each repository to select a name which will be unique among all possible names that any other repository might use. The suggestion by some Mercurial proponents of forcing all remote instances of a matching branch name to be prepended with an unwavering (i.e. absolute instead of relative) repository name is a centralized solution, because the paradigm requires every repository’s name to be globally unique.
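Referrer-relative naming can be sketched as follows. This Python toy model assumes each repository is just a dict of branch names to tips; the `fetch` helper, the repository names, and the shortened hashes are hypothetical.

```python
def fetch(local_refs, remote_name, remote_branches):
    """Record a remote's branches under the name the local repository chose for it."""
    for branch, tip in remote_branches.items():
        local_refs[f"{remote_name}/{branch}"] = tip

carol = {"fix": "1a5c9ca2bfd5"}  # the repository being referred to
alice = {"fix": "b56ce7b07c52"}  # Alice has her own, unrelated ‘fix’ branch
bob = {"fix": "77aa01cc02dd"}    # so does Bob

fetch(alice, "origin", carol)    # Alice calls Carol's repository ‘origin’
fetch(bob, "carol", carol)       # Bob calls the same repository ‘carol’
```

All three repositories reuse the branch name `fix` without conflict, and Alice and Bob never had to agree on what to call Carol's repository.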

Git’s slogan emphasizes decentralization, “--everything-is-local”. Since the definition of mercurial is inappropriate, the name Mercurial must be intended to imply curation— which means (a centralized notion of) taking charge (or control) of organizing a collection (in one chosen form for presentation).

The dual of Mercurial’s inability to assuredly delete a named branch from every decentralized copy is Git’s inability to assure the existence of a recoverable decentralized copy of an accidentally deleted lightweight branch. Whereas Mercurial discourages use of hg strip, Git encourages deletion of abandoned branches. Whereas hg push --force can cause a new anonymous head to be created in the remote repository (and inexplicably for all branches), git push --force can possibly cause a head to become detached on the remote repository, risking the loss of changesets. Commanding --force on Git is more dangerous than on Mercurial.
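The risk of a forced pointer update can be made concrete with a reachability check. In this Python sketch the DAG, the branch table, and the `reachable` helper are hypothetical; it models only which changesets remain reachable from a branch name, not Git's actual garbage collector.

```python
def reachable(parents, tips):
    """Return every changeset reachable from the given tips through parent links."""
    seen, stack = set(), list(tips)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

# Remote history A <- B <- C on master; X is a local changeset that forked from A.
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["A"]}
branches = {"master": "C"}

branches["master"] = "X"  # forced, non-fast-forward update of the pointer
unreachable = set(parents) - reachable(parents, branches.values())
```

After the forced update, `B` and `C` are no longer reachable from any branch name, which is the situation in which changesets can eventually be lost.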

It is not clear that Mercurial’s named branches add any necessary functionality. Git’s lightweight branches don't record where they originated, but that might not be very useful information. Mercurial’s named branches are the only feature not present in Git which prevents roundtrip editing in Mercurial of repositories stored in Git format via Hg-Git. Perhaps Mercurial’s named branches could be beneficial for the “collaboration employing merging” aspect of the aforementioned merge and rebase solution, but it is difficult to articulate real use cases. Rather, it intuitively appears that named branches are anathema to decentralization.
