@leereilly

leereilly/lee.md Secret

Created October 19, 2020 16:41

The open source Git project just released Git 2.29 with features and bug fixes from over ?? contributors, ?? of them new. Last time we caught up with you, Git 2.28 had just been released. One version later, let's take a look at the most interesting features and changes that have happened since then.

Experimental SHA-256 support

Git 2.29 includes experimental support for writing your repository's objects using a SHA-256 hash of their contents, instead of using SHA-1.

What does all of that mean? To explain, let's start from the beginning.

When you add files to a repository, Git copies their contents into blob objects in its local database and creates tree objects that refer to the blobs. Likewise, when you run git commit, Git creates a commit object that refers to the tree representing the committed state. How do these objects "refer" to each other, and how can you identify them when interacting with Git? The answer is that each object is given a unique name, called its object id, based on a hash of its contents. Git uses SHA-1 as its hash algorithm of choice, and depends on the object ids of different objects being unique.
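To make "a hash of its contents" more concrete, here is a minimal Python sketch of how a blob gets its name. Git hashes a header followed by the raw content. The header is the object type, a space, the content length in bytes, and a NUL byte. The object_id helper is our own illustration, not a Git API. It hashes only the bytes you pass it, so trees and commits would need their own serialized contents.

```python
import hashlib


def object_id(obj_type: str, content: bytes, algorithm: str = "sha1") -> str:
    """Hash content the way Git names an object (illustrative helper).

    Git hashes a header of the form "<type> <size-in-bytes>\\0"
    followed by the raw content.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.new(algorithm, header + content).hexdigest()


readme = b"Hello, SHA-256!\n"
sha1_name = object_id("blob", readme, "sha1")      # 40 hex digits
sha256_name = object_id("blob", readme, "sha256")  # 64 hex digits
print(sha1_name)
print(sha256_name)
```

Running git hash-object on a file with the same contents in an ordinary SHA-1 repository should print the same 40-character name.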

Back in this blog post, we estimated that even if you had five million programmers writing one commit every second, you would only have a 50% chance of accidentally generating a collision before the Sun engulfs the Earth. Published attacks exploit weaknesses in SHA-1 to reduce the effort required to generate a collision, but these attacks still cost tens of thousands of dollars to execute, and no published examples target Git.
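That estimate comes from the birthday bound. With N possible hash values, you need roughly the square root of 2·N·ln 2 random names before a collision becomes a coin flip. The sketch below redoes the arithmetic under a deliberately simple assumption: every commit adds exactly one new, uniformly random 160-bit object id. Real commits also write trees and blobs, so read the result as an order-of-magnitude check.

```python
import math

# Birthday-bound approximation: with N possible hash values, about
# sqrt(2 * N * ln 2) random names give a ~50% chance that two collide.
HASH_BITS = 160                   # SHA-1 object ids are 160 bits
COMMITS_PER_SECOND = 5_000_000    # assumption: one new object per commit
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_for_half = math.sqrt(2 * 2**HASH_BITS * math.log(2))
years = n_for_half / COMMITS_PER_SECOND / SECONDS_PER_YEAR
print(f"~{n_for_half:.2e} objects, ~{years:.1e} years")
```

Under those assumptions, the 50% mark lands at roughly nine billion years, the same order of magnitude as the Sun's remaining lifetime.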

As we stated in that earlier blog post, Git (and providers that use it, like GitHub) checks each object it hashes for evidence that the object is part of a colliding pair. This prevents GitHub from accepting both the benign and malicious halves of the pair, since the mathematical tricks required to generate a collision in any reasonable amount of time can be detected and rejected by Git.

Even so, any weakness in a cryptographic hash is a bad sign. Although Git has implemented detections that prevent the known attacks from being carried out, there's no guarantee that new attacks won't be found and used in the future. So the Git project has been preparing a transition plan to begin using a new object format based on a hash with no known attacks: SHA-256.

In Git 2.29, you can try out a SHA-256 enabled repository for yourself:

$ git --version
git version 2.29.0
$ git init --object-format=sha256 repo
Initialized empty Git repository in /home/ttaylorr/repo/.git/
$ cd repo

$ echo 'Hello, SHA-256!' >README.md
$ git add README.md
$ git commit -m "README.md: initial commit"
[master (root-commit) 6e92961] README.md: initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.md

$ git rev-parse HEAD
6e929619da9d82c78dd854dfe237c61cbad9e95148c1849b1f96ada5ee800810

As of version 2.29, Git can operate in either a full SHA-1 or a full SHA-256 mode. Repositories using different object formats cannot currently interoperate with one another, but support for that is planned. It is also important to note that, at the time of writing, no major hosting provider (including GitHub) supports SHA-256-enabled repositories.

In future releases, Git will support interoperating between repositories with different object formats by computing both a SHA-1 and a SHA-256 hash of each object it writes, and storing a translation table between them. This will eventually allow repositories that store their objects using SHA-256 to interact with (sufficiently up-to-date) SHA-1 clients, and vice versa. It will also let converted SHA-256 repositories keep their references to older SHA-1 commits working as normal. For example, suppose I write a commit whose message references an earlier commit by its SHA-1 name. After the repository is converted to SHA-256, Git will still be able to follow that reference by consulting the translation table.
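Here is a toy Python model of that translation table. Everything in it is hypothetical. blob_id and TranslationTable are names we made up, and the table is an in-memory pair of dictionaries rather than anything Git stores on disk. The model also handles only blobs. Blobs are the easy case because both names are hashed from identical bytes. Trees and commits embed other objects' names, so their contents would have to be rewritten for each format before hashing.

```python
import hashlib


def blob_id(content: bytes, algorithm: str) -> str:
    """Name a blob the way Git does: hash "blob <size>\\0" followed by the content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.new(algorithm, header + content).hexdigest()


class TranslationTable:
    """Hypothetical in-memory map between SHA-1 and SHA-256 blob names."""

    def __init__(self) -> None:
        self.sha1_to_sha256: dict[str, str] = {}
        self.sha256_to_sha1: dict[str, str] = {}

    def write_blob(self, content: bytes) -> str:
        # Compute both names at write time and remember the pairing.
        new_name = blob_id(content, "sha256")
        old_name = blob_id(content, "sha1")
        self.sha1_to_sha256[old_name] = new_name
        self.sha256_to_sha1[new_name] = old_name
        return new_name

    def lookup(self, name: str) -> str:
        # A 40-hex-digit name is SHA-1; anything else is treated as SHA-256.
        if len(name) == 40:
            return self.sha1_to_sha256[name]
        return self.sha256_to_sha1[name]


table = TranslationTable()
new_name = table.write_blob(b"Hello, SHA-256!\n")
old_name = table.lookup(new_name)          # SHA-256 name -> SHA-1 name
print(old_name, table.lookup(old_name) == new_name)
```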

For more about SHA-256 in Git, and what some of the future releases might look like, you can read Git's transition plan.

[source, source, source, source, and so, much, more]

Negative refspecs

When you run git fetch origin, all of the branches from the remote origin repository are fetched into your local refs/remotes/origin/ hierarchy. How does Git know which branches to fetch, and where to put them?

The answer is that your configuration file contains one or more "refspecs" for each remote (remember that a "ref" is Git's word for any named point in history: branches, tags, etc.). When you run git clone, it sets up a default refspec to be used when you fetch from your origin repository:

$ git config remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*

This refspec tells Git to fetch what's on the left side of the colon (everything in refs/heads/, i.e., all branches) and to write it into the hierarchy on the right-hand side. The * means "match everything" on the left-hand side and "replace with the matched part" on the right-hand side. The leading + tells Git to update the local refs even when the update is not a fast-forward.
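To see how a single wildcard refspec maps a remote ref onto a local one, here is a simplified Python model. The map_ref helper is hypothetical and handles at most one * per side. It strips the leading + without acting on it, since forcing only matters when an existing local ref is updated.

```python
from typing import Optional


def map_ref(refspec: str, ref: str) -> Optional[str]:
    """Return where `ref` lands under a src:dst refspec, or None if it doesn't match."""
    src, dst = refspec.lstrip("+").split(":", 1)
    if "*" not in src:
        # An exact refspec matches only that one ref.
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if not (ref.startswith(prefix) and ref.endswith(suffix)
            and len(ref) >= len(prefix) + len(suffix)):
        return None
    matched = ref[len(prefix):len(ref) - len(suffix)]  # the part '*' stood for
    return dst.replace("*", matched, 1)


default = "+refs/heads/*:refs/remotes/origin/*"
print(map_ref(default, "refs/heads/main"))    # refs/remotes/origin/main
print(map_ref(default, "refs/tags/v2.29.0"))  # None
```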

You can have multiple refspecs, and they can refer to individual refs. For example, this command instructs Git to additionally fetch the remote's default git notes ref, refs/notes/commits, into a local refs/notes/origin-notes. The --add is important so that we don't overwrite the default refspec that fetches branches:

$ git config --add remote.origin.fetch refs/notes/commits:refs/notes/origin-notes

Refspecs are used by git push, as well. Even if you type only git push origin mybranch, that last mybranch is really a shorthand for refs/heads/mybranch:refs/heads/mybranch. This allows you to express more complicated scenarios. Say you're tagging and want to push all of the tags you have, but you're not quite ready to share the tips of all of your branches. Here, you could write something like:

$ git push origin 'refs/tags/*:refs/tags/*'

Prior to Git 2.29, refspecs could only describe which references you want. So, if you wanted to fetch all branches except one, you'd have to list the rest out as arguments one by one. Of course, that assumes that you know the names of all the other references beforehand, so in practice this would look something like:

$ git ls-remote origin 'refs/heads/*' |
  grep -v 'refs/heads/ref-to-exclude$' |
  awk '{ print $2 ":" $2 }' |
  xargs git fetch origin

to get all refs in refs/heads/* except for refs/heads/ref-to-exclude. Yeesh; there must be a better way.

In Git 2.29, there is: negative refspecs. Now, if a refspec begins with ^, it indicates which references are to be excluded. So, instead of the above, you could write something like:

$ git fetch origin 'refs/heads/*:refs/heads/*' ^refs/heads/ref-to-exclude

and achieve the same result. When a negative refspec is present, the server considers a reference worth sending if it matches at least one positive refspec and does not match any negative refspecs. Negative refspecs behave exactly as you expect, with a couple of caveats:

  • Negative refspecs can contain wildcard patterns, but cannot specify a destination. A destination wouldn't mean anything anyway. You wouldn't want to say ^refs/heads/foo/*:refs/heads/bar/*, which means, literally, "map heads from foo to bar, but don't send me any foo refs to begin with". To exclude every ref matching a wildcard pattern, you'd just write ^refs/heads/foo/*.

  • While positive refspecs can refer to a single object by its object id, negative refspecs cannot.
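The selection rule described above is easy to model: a ref must match at least one positive refspec and no negative one. This Python sketch is our own simplification rather than Git's implementation. matches and wanted are hypothetical helpers, and each pattern may contain at most one *. Only the source side of a positive refspec is consulted, since that is what decides whether a ref is wanted at all.

```python
def matches(pattern: str, ref: str) -> bool:
    """True if `ref` matches `pattern`; a single '*' matches any run of characters."""
    if "*" not in pattern:
        return ref == pattern
    prefix, suffix = pattern.split("*", 1)
    return (ref.startswith(prefix) and ref.endswith(suffix)
            and len(ref) >= len(prefix) + len(suffix))


def wanted(ref: str, refspecs: list[str]) -> bool:
    """Keep a ref if it matches at least one positive refspec and no negative one."""
    # Positive refspecs look like [+]src:dst; only src matters for selection.
    positives = [r.lstrip("+").split(":", 1)[0] for r in refspecs if not r.startswith("^")]
    # Negative refspecs look like ^pattern and have no destination.
    negatives = [r[1:] for r in refspecs if r.startswith("^")]
    return (any(matches(p, ref) for p in positives)
            and not any(matches(n, ref) for n in negatives))


refspecs = ["refs/heads/*:refs/heads/*", "^refs/heads/ref-to-exclude"]
remote_refs = ["refs/heads/main", "refs/heads/ref-to-exclude", "refs/tags/v2.29.0"]
print([r for r in remote_refs if wanted(r, refspecs)])  # ['refs/heads/main']
```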

And of course those negative refspecs work equally well in configuration values. If you always want to fetch every branch except foo, you can just add it to your config:

$ git config --add remote.origin.fetch ^refs/heads/foo

[source]

New git shortlog tricks

While you have almost certainly used (or heard of) git log, the same might not be true of git shortlog. For those who haven't, git shortlog acts a lot like git log, except that instead of displaying commits in a sequence, it groups them by author.

In fact, the Git release notes end with a shortlog of all of the patches in the release, broken out by their author, generated by git shortlog [source]. At the time of writing, they look something like this:

Aaron Lipman (12):
      t6030: modernize "git bisect run" tests
      rev-list: allow bisect and first-parent flags
      cmd_bisect__helper: defer parsing no-checkout flag
      [...]

Adrian Moennich (1):
      ci: fix inconsistent indentation

Alban Gruin (1):
      t6300: fix issues related to %(contents:size)

[...]

In older versions of Git, git shortlog could only group by commit author (the default behavior), or optionally by committer identity (with git shortlog -c). This means credit for a commit goes only to its author or its committer. So, if your project uses the 'Co-authored-by' trailer (like this commit in git/git does), then your co-authors are out of luck: there is no way to tell git shortlog to group commits by co-authors.

...That is, until Git 2.29! In this release, git shortlog learned a new --group argument, to specify how commits are grouped and assigned credit. It takes --group=author (the default behavior from before) and --group=committer (equivalent to git shortlog -c), but it also accepts a --group=trailer:<field> argument.

Passing the latter allows us to group commits by their co-authors, and it also allows for more creative uses. If your project is using the Reviewed-by trailer, you can use git shortlog to see who is reviewing the most patches:

$ git shortlog -ns --group=trailer:reviewed-by v2.28.0.. | head -n5
    40	Eric Sunshine
    10	Taylor Blau
     4	brian m. carlson
     2	Elijah Newren
     1	Jeff King

git shortlog also allows multiple --group=<type> arguments. In that case, each commit is counted under every grouping, but only once per unique person within that commit. So, if you want to see who is contributing the most, whether that individual is the primary author or is listed as a co-author, then you can write:

$ git shortlog -ns --group=author --group=trailer:co-authored-by

...putting authors and co-authors on equal footing. Instead of counting, you can also use the --format option to find other fun ways to show the data. For example:

$ git shortlog --format="...helped %an on %as" --group=trailer:helped-by v2.28.0..v2.29.0
Chris Torek (3):
      ...helped René Scharfe on 2020-08-12
      ...helped René Scharfe on 2020-08-12
      ...helped René Scharfe on 2020-08-12

David Aguilar (1):
      ...helped Lin Sun on 2020-05-07

Denton Liu (1):
      ...helped Shourya Shukla on 2020-08-21

Derrick Stolee (2):
      ...helped Taylor Blau on 2020-08-25
      ...helped Taylor Blau on 2020-09-17

[...]

[source]
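The counting behind git shortlog -ns --group=author --group=trailer:co-authored-by can be sketched in Python. This is a hypothetical model, not git shortlog's code. trailer_values and shortlog_counts are our own helpers, trailer keys are matched case-insensitively, and any "Key: value" line counts as a trailer. A commit credits each distinct person once, even if they appear as both author and co-author.

```python
from collections import Counter


def trailer_values(message: str, key: str) -> list[str]:
    """Return values of 'Key: value' lines whose key matches `key` (case-insensitive).

    Simplification: any such line counts, not only the trailer block at the end.
    """
    values = []
    for line in message.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == key.lower():
            values.append(value.strip())
    return values


def shortlog_counts(commits: list[dict], groups: list[str]) -> Counter:
    """Count commits per person for groups such as "author" or "trailer:co-authored-by"."""
    counts: Counter = Counter()
    for commit in commits:
        people = set()  # a set, so each person is credited once per commit
        for group in groups:
            if group == "author":
                people.add(commit["author"])
            elif group.startswith("trailer:"):
                for value in trailer_values(commit["message"], group[len("trailer:"):]):
                    people.add(value.split(" <")[0])  # drop the "<email>" part
        counts.update(people)
    return counts


commits = [
    {"author": "Alice", "message": "Add feature\n\nCo-authored-by: Bob <bob@example.com>"},
    {"author": "Bob", "message": "Fix typo"},
    {"author": "Alice", "message": "Refactor\n\nCo-authored-by: Alice <alice@example.com>"},
    {"author": "Carol", "message": "Update docs\n\nCo-authored-by: Alice <alice@example.com>"},
]
both = shortlog_counts(commits, ["author", "trailer:co-authored-by"])
print(both.most_common())  # [('Alice', 3), ('Bob', 2), ('Carol', 1)]
```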

Tidbits

git for-each-ref learned a few new tricks in Git 2.29. Since there are a good handful of them, let's start there:

  • git for-each-ref usually outputs the name, type, and object id of each ref, but you can find out a lot more with its --format option. In Git 2.29, it learned some new fields, including contents:size, subject:sanitize, and more consistent :short modifiers to get abbreviated object ids. These last few were contributed by Hariom Verma, a Google Summer of Code student working on the Git project.

    [source, source]

  • git for-each-ref can now take multiple --merged and --no-merged arguments, printing references if they are reachable from at least one --merged argument, and aren't reachable from any --no-merged ones.

    [source]
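The combined --merged/--no-merged rule can be modeled on a toy commit graph. In this Python sketch, reachable and filter_refs are hypothetical helpers. The history is a plain dictionary mapping each commit to its parents, and a simple depth-first walk stands in for Git's own reachability checks.

```python
def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Every commit reachable from `start` by following parent links, including `start`."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def filter_refs(refs: dict[str, str], graph: dict[str, list[str]],
                merged: list[str], no_merged: list[str]) -> list[str]:
    """Keep refs whose tips are reachable from at least one `merged` commit
    (when any are given) and from none of the `no_merged` commits."""
    allowed = set().union(*(reachable(graph, m) for m in merged)) if merged else None
    excluded = set().union(*(reachable(graph, n) for n in no_merged))
    return [name for name, tip in refs.items()
            if (allowed is None or tip in allowed) and tip not in excluded]


# Toy history (child -> parents): A <- B <- C <- E, with D forked from B.
graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C"]}
refs = {"main": "C", "topic": "D", "release": "E", "old": "A"}
print(filter_refs(refs, graph, merged=["E"], no_merged=["B"]))  # ['main', 'release']
```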

Now with all of the git for-each-ref updates out of the way, let's move on to all of the rest of the tidbits:

  • When your git merge results in a merge conflict, you're greeted by a message that looks something like the following (this example is courtesy of Elijah Newren):

    CONFLICT (rename/delete): foo.c deleted in b01dface... Removed
    unnecessary stuff and renamed in HEAD.  Version HEAD of foo.c left
    in tree.
    

    Can you tell what this message means? In versions of Git prior to 2.29, it was ambiguous: did Git remove stuff and rename files in HEAD, or is "Removed unnecessary stuff" the subject of a commit? It turns out that it's the latter, but you had no way of knowing that!

    In 2.29, Git now encloses the subject of a commit in parentheses, making it much clearer which part of the conflict message came from a commit and which part was generated by Git.

    [source]

  • Here's an easy one! Git supports a configuration option called merge.renormalize. In case you're not familiar with the entirety of Git's nearly 5,000-line git-config man page, here's a refresher: merge.renormalize causes Git to check out and check in each stage of a three-way merge. This can be useful if the line endings change between two branches you're working on.

    Previously, git checkout -m (and a few related invocations) did not pick up this configuration, but now they do!

    [source]

  • In our highlights from Git 2.26, we talked about protocol v2, which became the default in that release. Between 2.26 and 2.27, a bug in this new protocol was found and fixed (see 4fa3f00abb for the juicy details). For safety, Git 2.27 went back to protocol "v0" as the default to ease the transition, and protocol v2 was marked experimental in Git 2.28.

    Now that we've had a few chances to iron out any lingering bugs after Git 2.28, protocol v2 is once again the default protocol in Git 2.29.

    [source]

  • git bisect is an incredibly handy tool to use when trying to determine the source of a bug. You mark a "good" and "bad" endpoint, and then Git leads you through a binary search between the endpoints to find the commit that introduced the problem.

    In Git 2.29, git bisect learned a new --first-parent option to modify which commits are traversed between those endpoints. One way to understand this option is to imagine what happens when bisecting through a merge. Before, your bisection would include commits on the branch being merged in, in addition to the merge commit itself.

    Passing --first-parent tells Git to avoid considering commits that are only on the branch being merged in as a potential source of the bug. If your workflow is such that only the merges on the main branch are interesting stopping points (because you primarily work by merging pull requests, and each individual pull request may have work-in-progress commits that might not even build), then --first-parent lets you skip all that.

    [source]

  • Git optionally includes a remote backend for pushing to and pulling from a MediaWiki instance. It is generally unsupported and not compiled by default. A recently discovered vulnerability in it could lead to arbitrary command execution, and that vulnerability has been patched in this release.

    Note that you are only affected by this vulnerability if you use the mediawiki backend against an untrusted MediaWiki instance (chances are if you have to ask yourself whether or not you are affected, you probably aren't).

    [source]

  • Git uses the low-level git index-pack command to receive a git push or process a git fetch. In Git 2.29, git index-pack learned to work more efficiently on multi-core machines, which means all of your pushes and fetches should get faster just by upgrading.

    [source, source]

  • When you create a merge commit, the default message reads something like "Merge $upstream into $dest". Historically, merges into the main branch would omit the "into $dest" part.

    Git 2.29 learned the merge.suppressDest configuration. When you merge into any branch listed in this multi-valued configuration variable, git merge omits the "into $dest" part.

    [source]
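To make the git bisect --first-parent tidbit above more concrete, here is a toy Python model of bisecting along first parents only. The helpers first_parent_chain and first_bad are hypothetical, and the history is a plain dictionary from each commit to its parents. Real git bisect does considerably more (skipped commits, multiple good endpoints, and so on), so treat this as a sketch of the idea.

```python
def first_parent_chain(graph: dict[str, list[str]], bad: str, good: str) -> list[str]:
    """Commits after `good` up to and including `bad`, oldest first, following only
    first parents. Assumes `good` lies on `bad`'s first-parent line."""
    chain = []
    commit = bad
    while commit != good:
        chain.append(commit)
        commit = graph[commit][0]  # ignore second parents (merged-in branches)
    return chain[::-1]


def first_bad(chain: list[str], is_bad) -> str:
    """Binary search for the oldest bad commit; assumes chain[-1] is bad."""
    lo, hi = 0, len(chain) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(chain[mid]):
            hi = mid
        else:
            lo = mid + 1
    return chain[lo]


# Main line: A <- M1 <- M2 <- M3, where M2 merges a pull request (T1, T2).
graph = {
    "A": [], "M1": ["A"],
    "T1": ["M1"], "T2": ["T1"],  # work-in-progress commits on the pull request
    "M2": ["M1", "T2"],          # first parent M1, second parent T2
    "M3": ["M2"],
}
chain = first_parent_chain(graph, bad="M3", good="A")
print(chain)  # ['M1', 'M2', 'M3']; T1 and T2 are never tested
broken = {"M2", "M3"}  # pretend the bug arrived with the merge M2
print(first_bad(chain, lambda c: c in broken))  # M2
```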

The kaboodle

That's just a sample of changes from the latest release. For more, check out the release notes for 2.29, or any previous version in the Git repository.
