The open source Git project just released Git 2.29 with features and bug fixes from over ?? contributors, ?? of them new. Last time we caught up with you, Git 2.28 had just been released. One version later, let's take a look at the most interesting features and changes that have happened since then.
Git 2.29 includes experimental support for writing your repository's objects using a SHA-256 hash of their contents, instead of using SHA-1.
What does all of that mean? To explain, let's start from the beginning.
When you add files to a repository, Git copies their contents into blob objects in its local database, and creates tree objects that refer to the blobs. Likewise, when you run git commit, this creates a commit object that refers to the tree representing the committed state. How do these objects "refer" to each other, and how can you identify them when interacting with Git?
The answer is that each object is given a unique name, called its object id,
based on a hash of its contents. Git uses SHA-1 as its hash algorithm of choice,
and depends on the object ids of different objects to be unique.
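To make "a hash of its contents" concrete, here is a minimal sketch that recomputes a blob's object id by hand. It relies on the fact that Git hashes a short header ("blob", the size in bytes, and a NUL byte) together with the contents. The scratch directory, the sample text, and the use of sha1sum (from GNU coreutils; other systems may ship shasum instead) are assumptions made for illustration.

```shell
set -e

# Scratch directory for the demo (hypothetical; removed when the script exits).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q --object-format=sha1 "$tmp/repo"

printf 'Hello, Git!\n' >"$tmp/README.md"

# Ask Git which object id it would give a blob with these contents.
git_id=$(git -C "$tmp/repo" hash-object --stdin <"$tmp/README.md")

# Recompute it by hand: SHA-1 over "blob <size>", a NUL byte, then the contents.
size=$(wc -c <"$tmp/README.md" | tr -d ' ')
manual_id=$( { printf 'blob %s\0' "$size"; cat "$tmp/README.md"; } | sha1sum | cut -d ' ' -f 1 )

echo "git hash-object: $git_id"
echo "manual SHA-1:    $manual_id"
```

If the two lines printed at the end match, the object id really is nothing more than a hash over the object's type, size, and contents.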
Back in this blog post, we estimated that even if you had five million programmers writing one commit every second, you would only have a 50% chance of accidentally generating a collision before the Sun engulfs the Earth. Some published attacks exploit weaknesses in SHA-1 to reduce the effort required to generate a collision, but these attacks still cost tens of thousands of dollars to execute, and no known examples have been published which target Git.
As we stated in that earlier blog post, Git (and providers that use it, like GitHub) checks each object it hashes for evidence that the object is part of a colliding pair. This prevents GitHub from accepting both the benign and malicious halves of the pair, since the mathematical tricks required to generate a collision in any reasonable amount of time can be detected and rejected by Git.
Even so, any weaknesses in a cryptographic hash are a bad sign. Even though Git has implemented detections that prevent the known attacks from being carried out, there's no guarantee that new attacks won't be found and used in the future. So the Git project has been preparing a transition plan to begin using a new object format with no known attacks: SHA-256.
In Git 2.29, you can try out a SHA-256 enabled repository for yourself:
$ git --version
git version 2.29.0
$ git init --object-format=sha256 repo
Initialized empty Git repository in /home/ttaylorr/repo/.git/
$ cd repo
$ echo 'Hello, SHA-256!' >README.md
$ git add README.md
$ git commit -m "README.md: initial commit"
[master (root-commit) 6e92961] README.md: initial commit
1 file changed, 1 insertion(+)
create mode 100644 README.md
$ git rev-parse HEAD
6e929619da9d82c78dd854dfe237c61cbad9e95148c1849b1f96ada5ee800810
As of version 2.29, Git can operate in either a full SHA-1 or full SHA-256 mode. It is currently not possible for repositories using different object formats to interoperate with one another, but eventual support is planned. It is also important to note that there are no major providers (including GitHub) which support hosting SHA-256-enabled repositories at the time of writing.
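As a rough sketch of what "full SHA-1 or full SHA-256 mode" looks like in practice, the script below creates one repository in each format and hashes the same contents in both. The repository names and the scratch directory are made up for this example; git rev-parse --show-object-format reports which format a repository uses.

```shell
set -e

# Scratch directory for the demo (hypothetical; removed when the script exits).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q --object-format=sha1 "$tmp/sha1-repo"
git init -q --object-format=sha256 "$tmp/sha256-repo"

# Each repository reports the object format it was created with.
sha1_format=$(git -C "$tmp/sha1-repo" rev-parse --show-object-format)
sha256_format=$(git -C "$tmp/sha256-repo" rev-parse --show-object-format)

# The same contents get a different (and differently sized) id in each format.
sha1_id=$(printf 'Hello, SHA-256!\n' | git -C "$tmp/sha1-repo" hash-object --stdin)
sha256_id=$(printf 'Hello, SHA-256!\n' | git -C "$tmp/sha256-repo" hash-object --stdin)

printf '%s: %s (%d hex digits)\n' "$sha1_format" "$sha1_id" "${#sha1_id}"
printf '%s: %s (%d hex digits)\n' "$sha256_format" "$sha256_id" "${#sha256_id}"
```

The two ids have nothing in common, which is why repositories in different formats cannot yet talk to each other without a translation table.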
In future releases, Git will support interoperating between repositories with different object formats by computing both a SHA-1 and SHA-256 hash of each object it writes, and storing a translation table between them. This will eventually allow repositories that store their objects using SHA-256 to interact with (sufficiently up-to-date) SHA-1 clients, and vice-versa. It will also allow converted SHA-256 repositories to have their references to older SHA-1 commits still function as normal (e.g., if I write a commit whose message references an earlier commit by its SHA-1 name, then Git will still be able to follow that reference even after the repository is converted to use SHA-256 by consulting the translation table).
For more about SHA-256 in Git, and what some of the future releases might look like, you can read Git's transition plan.
[source, source, source, source, and so, much, more]
When you run git fetch origin, all of the branches from the remote origin repository are fetched into your local refs/remotes/origin/ hierarchy. How does Git know which branches to fetch, and where to put them?
The answer is that your configuration file contains one or more "refspecs" for each remote (remember that a "ref" is Git's word for any named point in history: branches, tags, etc). When you run git clone, it sets up a default refspec to be used when you fetch from your origin repository:
$ git config remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*
This refspec tells Git to fetch what's on the left side of the colon (everything in refs/heads/; i.e., all branches) and to write them into the hierarchy on the right-hand side. The * means "match everything" on the left-hand side and "replace with the matched part" on the right-hand side.
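Here is a small sketch of that mapping in action. It builds a throwaway "upstream" repository with two branches, clones it, and then lists the refs the default refspec produced. The directory names, branch names, and identity are all hypothetical.

```shell
set -e

# Scratch directory and identity for the demo (hypothetical values).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A stand-in for the remote, with branches "main" and "topic".
git init -q -b main "$tmp/upstream"
git -C "$tmp/upstream" commit -q --allow-empty -m 'initial commit'
git -C "$tmp/upstream" branch topic

git clone -q "$tmp/upstream" "$tmp/clone"

# The refspec that git clone configured for us...
refspec=$(git -C "$tmp/clone" config remote.origin.fetch)
echo "refspec: $refspec"

# ...and the refs it wrote: refs/heads/topic became refs/remotes/origin/topic.
tracking_refs=$(git -C "$tmp/clone" for-each-ref --format='%(refname)' refs/remotes/origin)
echo "$tracking_refs"
```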
You can have multiple refspecs, and they can refer to individual refs. For example, this command instructs Git to additionally fetch any git notes from the remote (the --add is important so that we don't overwrite the default refspec that fetches branches):
$ git config --add remote.origin.fetch refs/notes/commits:refs/notes/origin-notes
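To see what --add does, the sketch below sets up a remote in a scratch repository and lists every refspec afterwards. The URL is a placeholder (nothing is fetched from it), and the paths are made up for this example.

```shell
set -e

# Scratch repository for the demo (hypothetical; removed when the script exits).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q -b main "$tmp/repo"

# "git remote add" writes the default branch refspec; the URL is never contacted.
git -C "$tmp/repo" remote add origin https://example.com/project.git

# --add appends a second value instead of replacing the first.
git -C "$tmp/repo" config --add remote.origin.fetch refs/notes/commits:refs/notes/origin-notes

# --get-all shows every value of a multi-valued setting, one per line.
refspecs=$(git -C "$tmp/repo" config --get-all remote.origin.fetch)
echo "$refspecs"
```

Both the default branch refspec and the new notes refspec should be listed, in that order.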
Refspecs are used by git push, as well. Even if you type only git push origin mybranch, that last mybranch is really a shorthand for refs/heads/mybranch:refs/heads/mybranch. This allows you to express more complicated scenarios. Say you're tagging and want to push all of the tags you have, but you're not quite ready to share the tips of all of your branches. Here, you could write something like:
$ git push origin 'refs/tags/*:refs/tags/*'
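As a quick sanity check of that claim, this sketch pushes with the tag-only refspec into an empty bare repository and then lists what arrived. The repository paths, the tag name v1.0, and the identity are assumptions for the demo.

```shell
set -e

# Scratch directory and identity for the demo (hypothetical values).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# An empty bare repository stands in for the remote.
git init -q --bare "$tmp/remote.git"

git init -q -b main "$tmp/work"
git -C "$tmp/work" commit -q --allow-empty -m 'initial commit'
git -C "$tmp/work" tag v1.0
git -C "$tmp/work" remote add origin "$tmp/remote.git"

# Push every tag, and nothing else.
git -C "$tmp/work" push -q origin 'refs/tags/*:refs/tags/*'

# The remote now has the tag, but no branches.
remote_refs=$(git -C "$tmp/remote.git" for-each-ref --format='%(refname)')
echo "$remote_refs"
```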
Prior to Git 2.29, refspecs could only be used to say which selection of reference(s) you want. So, if you wanted to fetch all branches except one, you'd have to list them out as arguments one by one. Of course, that assumes that you know the names of all the other references beforehand, so in practice this would look something like:
$ git ls-remote origin 'refs/heads/*' |
    grep -v ref-to-exclude |
    awk '{ print $2 ":" $2 }' |
    xargs git fetch origin
to get all refs in refs/heads/* except for refs/heads/ref-to-exclude. Yeesh; there must be a better way.
In Git 2.29, there is: negative refspecs. Now, if a refspec begins with ^, it indicates which references are to be excluded. So, instead of the above, you could write something like:
$ git fetch origin 'refs/heads/*:refs/heads/*' ^refs/heads/ref-to-exclude
and achieve the same result. When a negative refspec is present, the server considers a reference worth sending if it matches at least one positive refspec and does not match any negative refspecs. Negative refspecs behave exactly as you expect, with a couple of caveats:
- Negative refspecs can contain wildcard patterns, but cannot specify the destination. These wouldn't mean anything, anyway (you wouldn't want to say ^refs/heads/foo/*:refs/heads/bar/* which means, literally, "map heads from foo to bar, but don't send me any foo refs to begin with"). To exclude a wildcard refspec, you'd just write ^refs/heads/foo/*.
- While positive refspecs can refer to a single object by its object id, negative refspecs cannot.
And of course those negative refspecs work equally well in configuration values. If you always want to fetch every branch except foo, you can just add it to your config:
$ git config --add remote.origin.fetch ^refs/heads/foo
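Here is a minimal sketch of a configured negative refspec at work, assuming Git 2.29 or newer. It fetches from a throwaway "upstream" repository that has branches main, foo, and bar; the paths, branch names, and identity are hypothetical.

```shell
set -e

# Scratch directory and identity for the demo (hypothetical values).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A stand-in for the remote, with three branches: main, foo, and bar.
git init -q -b main "$tmp/upstream"
git -C "$tmp/upstream" commit -q --allow-empty -m 'initial commit'
git -C "$tmp/upstream" branch foo
git -C "$tmp/upstream" branch bar

git init -q -b main "$tmp/local"
git -C "$tmp/local" remote add origin "$tmp/upstream"

# Always skip refs/heads/foo when fetching from origin.
git -C "$tmp/local" config --add remote.origin.fetch '^refs/heads/foo'
git -C "$tmp/local" fetch -q origin

# main and bar were fetched; foo was left behind.
fetched=$(git -C "$tmp/local" for-each-ref --format='%(refname)' refs/remotes/origin)
echo "$fetched"
```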
[source]
While you have almost certainly used (or heard of) git log, the same might not necessarily be true of git shortlog. For those who haven't, git shortlog acts a lot like git log, except instead of displaying commits in a sequence, it groups them by author.
In fact, the Git release notes end with a shortlog of all of the patches in the release, broken out by their author, generated by git shortlog [source]. At the time of writing, they look something like this:
Aaron Lipman (12):
      t6030: modernize "git bisect run" tests
      rev-list: allow bisect and first-parent flags
      cmd_bisect__helper: defer parsing no-checkout flag
      [...]
Adrian Moennich (1):
      ci: fix inconsistent indentation
Alban Gruin (1):
      t6300: fix issues related to %(contents:size)
[...]
In older versions of Git, git shortlog could only group by commit author (the default behavior), and optionally by the committer identity (with git shortlog -c). This restricts the credit for a commit to that commit's author or committer. So, if your project uses the 'Co-authored-by' trailer (like this commit in git/git does), then your co-authors are out of luck: there is no way to tell git shortlog to group commits by co-authors.
...That is, until Git 2.29! In this release, git shortlog learned a new --group argument, to specify how commits are grouped and assigned credit. It takes --group=author (the default behavior from before) and --group=committer (equivalent to git shortlog -c), but it also accepts a --group=trailer:<field> argument.
Passing the latter allows us to group commits by their co-authors, and it also allows for more creative uses. If your project is using the Reviewed-by trailer, you can use git shortlog to see who is reviewing the most patches:
$ git shortlog -ns --group=trailer:reviewed-by v2.28.0.. | head -n5
40 Eric Sunshine
10 Taylor Blau
4 brian m. carlson
2 Elijah Newren
1 Jeff King
git shortlog also allows multiple --group=<type> arguments, in which case commits are counted once per grouping. So, if you want to see who is contributing the most, whether that individual is the primary author or is listed as a co-author, then you can write:
$ git shortlog -ns --group=author --group=trailer:co-authored-by
...putting authors and co-authors on equal footing. Instead of counting, you can also use the --format option to find other fun ways to show the data. For example:
$ git shortlog --format="...helped %an on %as" --group=trailer:helped-by v2.28.0..v2.29.0
Chris Torek (3):
      ...helped René Scharfe on 2020-08-12
      ...helped René Scharfe on 2020-08-12
      ...helped René Scharfe on 2020-08-12
David Aguilar (1):
      ...helped Lin Sun on 2020-05-07
Denton Liu (1):
      ...helped Shourya Shukla on 2020-08-21
Derrick Stolee (2):
      ...helped Taylor Blau on 2020-08-25
      ...helped Taylor Blau on 2020-09-17
[...]
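If you'd like to try the multiple --group form without digging through a real project's history, here is a small sketch on a throwaway repository (Git 2.29 or newer). Alice authors two commits and Bob is credited as a co-author on one of them; both names and the commit messages are invented for the example. Note the explicit HEAD: when standard input is not a terminal, git shortlog otherwise expects to read a log from it.

```shell
set -e

# Scratch repository and identity for the demo (hypothetical values).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME='Alice Example' GIT_AUTHOR_EMAIL='alice@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q -b main "$tmp/repo"
cd "$tmp/repo"

# The second -m becomes the final paragraph of the message, i.e. its trailer block.
git commit -q --allow-empty -m 'first change' \
    -m 'Co-authored-by: Bob Example <bob@example.com>'
git commit -q --allow-empty -m 'second change'

# Credit each commit to its author and to anyone in a Co-authored-by trailer.
counts=$(git shortlog -ns --group=author --group=trailer:co-authored-by HEAD)
echo "$counts"
```

Alice should be credited with both commits and Bob with one, even though Bob never appears in an author field.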
[source]
git for-each-ref learned a few new tricks in Git 2.29. Since there are a good handful of them, let's start there:
- git for-each-ref usually outputs the name, type, and object id of each ref, but you can find out a lot more with its --format option. Git 2.29 learned some new fields, including contents:size, subject:sanitize, and more consistent :short modifiers to get abbreviated object ids. These last few were contributed by Hariom Verma, a Google Summer of Code student working on the Git project.
- git for-each-ref can now take multiple --merged and --no-merged arguments, printing references if they are reachable from at least one --merged argument, and aren't reachable from any --no-merged ones. [source]
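Both for-each-ref additions above can be tried on a tiny throwaway repository, assuming Git 2.29 or newer. The branch names (main, topic1, topic2), commit messages, and identity below are made up for the sketch.

```shell
set -e

# Scratch repository and identity for the demo (hypothetical values).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q -b main "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m 'first change'

# Two topic branches, each one commit ahead of main.
git checkout -q -b topic1
git commit -q --allow-empty -m 'work on topic1'
git checkout -q -b topic2 main
git commit -q --allow-empty -m 'work on topic2'

# New format fields: the size of the commit message in bytes,
# and a filename-friendly version of the subject line.
size=$(git for-each-ref --format='%(contents:size)' refs/heads/main)
sanitized=$(git for-each-ref --format='%(subject:sanitize)' refs/heads/main)
echo "main: message is $size bytes, sanitized subject is $sanitized"

# Branches reachable from topic1 or topic2, minus anything already in main.
unmerged=$(git for-each-ref --format='%(refname:short)' \
    --merged topic1 --merged topic2 --no-merged main refs/heads)
echo "$unmerged"
```

The last command lists topic1 and topic2 but not main, since main is reachable from itself and so is filtered out by --no-merged.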
Now with all of the git for-each-ref updates out of the way, let's move on to all of the rest of the tidbits:
- When your git merge results in a merge conflict, you're greeted by a message that looks something like the following (this example is courtesy of Elijah Newren):
  CONFLICT (rename/delete): foo.c deleted in b01dface... Removed unnecessary stuff and renamed in HEAD. Version HEAD of foo.c left in tree.
  Can you tell what this message means? In versions of Git prior to 2.29, it was ambiguous: did Git remove stuff and rename files in HEAD, or is "Removed unnecessary stuff" the subject of a commit message? It turns out that it's the latter, but you had no way of knowing that!
  In 2.29, Git will now enclose the subject of a commit message in parentheses, making it much clearer what part of the conflict message came from a commit, and what part was generated by Git.
  [source]
- Here's an easy one! Git supports a configuration option called merge.renormalize. In case you're not familiar with the entirety of Git's nearly 5,000-line man git-config, here's a refresher: merge.renormalize causes Git to check out and check in each stage of a three-way merge. This can be useful if the line endings change between two branches you're working on.
  This configuration used to not get picked up by git checkout -m (and a few related invocations), but now it is! [source]
- In our highlights from Git 2.26, we talked about protocol v2, which became the default in that release. Between 2.26 and 2.27, a bug in this new protocol was found and fixed (see 4fa3f00abb for the juicy details). For safety, Git 2.27 went back to protocol "v0" to ease the transition, and marked the feature experimental in Git 2.28.
  Now that we've had a few chances to iron out any lingering bugs after Git 2.28, protocol v2 is re-enabled as the default protocol in Git 2.29.
  [source]
- git bisect is an incredibly handy tool to use when trying to determine the source of a bug. You mark a "good" and "bad" endpoint, and then Git leads you through a binary search between the endpoints to find the commit that introduced the problem.
  In Git 2.29, git bisect learned a new --first-parent option to modify which commits are traversed between those endpoints. One way to understand this option is to imagine what happens when bisecting through a merge. Before, your bisection would include commits on the branch being merged in, in addition to the merge commit itself.
  Passing --first-parent tells Git to avoid considering commits that are only on the branch being merged in as a potential source of the bug. If your workflow is such that only the merges on the main branch are interesting stopping points (because you primarily work by merging pull requests, and each individual pull request may have work-in-progress commits that might not even build), then --first-parent lets you skip all that. [source]
- Git optionally includes a remote backend for pushing and pulling from a MediaWiki instance. It is generally unsupported and not compiled by default, but a recently discovered vulnerability can lead to arbitrary command execution, which has been patched in this release.
  Note that you are only affected by this vulnerability if you use the mediawiki backend against an untrusted MediaWiki instance (chances are if you have to ask yourself whether or not you are affected, you probably aren't).
  [source]
- Git uses the low-level git index-pack command to receive a git push or process a git fetch. In Git 2.29, git index-pack learned to work more efficiently on multi-core machines, which means all of your pushes and fetches should get faster just by upgrading.
- When you create a merge commit, the default message reads something like "Merge $upstream into $dest". Historically, merges into the main branch would omit the "into $dest" part.
  Git 2.29 learned the merge.suppressDest configuration. Any branch in this multi-valued config variable will cause git merge to omit the "into $dest" part. [source]
That's just a sample of changes from the latest release. For more, check out the release notes for 2.29, or any previous version in the Git repository.