
@treyharris
Last active February 11, 2021 14:22

Why the ident attribute doesn't work in Git like in CVS or SVN

This is a wrinkle that people often don't get about the ident attribute: in CVS or Subversion, it would be legitimate to have a VERSION file containing nothing but $Id$ as a way of tracking that the right push had happened; if you did that with git, you'd get nothing useful. If you run the script below (or just read it and the output that follows), you can see why.

(Please direct comments on this to https://plus.google.com/+TreyHarris/posts/KRgNG3zDk9x.)

How do you do it?

Setting it up in Git is relatively simple. You create (or edit) the file .gitattributes and add a line telling Git to replace $Id$ when found in files that match certain filenames:

*.txt ident

This tells Git to check all files with a .txt extension. Two things to note:

  1. You have to specify the files for which ident substitution will happen; all others are ignored. A catch-all * ident line is accepted, but it makes Git scan every file it checks out for $Id$, so it's better to name just the files that need it.
  2. Unlike CVS or SVN, Git does not modify the files with $Id$ tags at commit time. It only does the substitution when writing out the file in the course of a checkout. In the example, you'll see how I worked around this: after creating and checking in the files, I just deleted them all and used git reset --hard to force Git to recreate them, this time with the $Id$ substitution.
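Both points can be checked with a short sketch. It's written for plain POSIX sh rather than zsh so it runs wherever Git does. It assumes a throwaway repository made with mktemp -d, and the file names and demo commit identity are made up for illustration. git check-attr shows which paths the pattern covers. Deleting a single file and restoring it with git checkout -- is enough to trigger the expansion for that file alone, without resetting everything.

```sh
#!/bin/sh
# Sketch: ident applies only to matching paths, and only when Git writes the file.
set -e

repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q

echo '*.txt ident' > .gitattributes
printf '%s\n' '$Id$' > VERSION.txt
printf '%s\n' '$Id$' > README.md

# Point 1: only paths matching a pattern get the attribute.
git check-attr ident -- VERSION.txt README.md
# VERSION.txt: ident: set
# README.md: ident: unspecified

git add .gitattributes VERSION.txt README.md
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'Add files'

# Point 2: committing doesn't touch the working-tree copy.
cat VERSION.txt                   # still the bare $Id$

# Recreate just this one file from the index; the ident filter runs on the way out.
rm VERSION.txt
git checkout -- VERSION.txt
cat VERSION.txt                   # now $Id: <40-hex blob ID> $
cat README.md                     # still $Id$: no ident attribute
```

Both files contain the same $Id$ line. Only the .txt file matches the pattern, so it is the only one that gets expanded.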

What's wrong with it?

Here's an example script. It uses a little utility library I wrote to produce output like that below, describing each command in the script as it runs. You don't need to download the library to try the script, though, because this version pulls it down from a gist.

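Note that this is a Z shell script. It uses zsh-only syntax such as setopt, =(...) process substitution, and the short for file (...) { ... } loop form, so it won't work with bash or other shells as written.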

The code

#!/bin/zsh

# Make any non-zero return fatal just in case
setopt err_exit

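# Load the show helper, which prints a heading and then echoes and runs each
# command. curl flags: -f fails on HTTP errors, -sS hides the progress meter
# but still reports errors, -L follows redirects.
source =(curl -fsSL https://gist.githubusercontent.com/treyharris/c486c9f8776802e270b7/raw/2d0519826f3b8ca8446f59211ecd0dc738221d3d/showshell.zsh)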

local dir=/tmp/ident-test
local files=5

show "Setting up ${dir}" "
  rm -rf ${dir}
  mkdir ${dir}
  cd ${dir}
"

show "Setting up git" "
  git init

  echo '*.txt ident' > .gitattributes
"

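# One commit per file. This loop isn't wrapped in show, so only git's own
# output appears below. (.gitattributes itself stays untracked.)
for file ({1..${files}}.txt) {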
  echo '$Id$' > ${file}
  git add "${file}"
  git commit -m "Adding ${file}"
}

show "Show we do have ${files} commits for each file:" "
  git log --stat  --oneline --decorate
"

show "Show the current contents" "
  cat *.txt
"

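# Delete the working copies (zsh's * skips dotfiles, so .git and .gitattributes
# survive) and let git reset --hard write them back out, which runs the ident filter.
show "Git doesn't replace the \$Id\$ token until it must write the file" "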
  rm *
  git reset --hard
  cat *.txt
"

The output

And the results are:

==============================================================================================================================================================================================================
Setting up /tmp/ident-test
==============================================================================================================================================================================================================
> rm -rf /tmp/ident-test
> mkdir /tmp/ident-test
> cd /tmp/ident-test
==============================================================================================================================================================================================================
Setting up git
==============================================================================================================================================================================================================
> git init
Initialized empty Git repository in /private/tmp/ident-test/.git/
> git email-work
trey@apcera.com
> echo '*.txt ident'
[master (root-commit) 47e610b] Adding 1.txt
 1 file changed, 1 insertion(+)
 create mode 100644 1.txt
[master 5ffefaf] Adding 2.txt
 1 file changed, 1 insertion(+)
 create mode 100644 2.txt
[master 3262753] Adding 3.txt
 1 file changed, 1 insertion(+)
 create mode 100644 3.txt
[master a5f485b] Adding 4.txt
 1 file changed, 1 insertion(+)
 create mode 100644 4.txt
[master 27b67fa] Adding 5.txt
 1 file changed, 1 insertion(+)
 create mode 100644 5.txt
==============================================================================================================================================================================================================
Show we do have 5 commits for each file:
==============================================================================================================================================================================================================
> git log --stat --oneline --decorate
27b67fa (HEAD, master) Adding 5.txt
 5.txt | 1 +
 1 file changed, 1 insertion(+)
a5f485b Adding 4.txt
 4.txt | 1 +
 1 file changed, 1 insertion(+)
3262753 Adding 3.txt
 3.txt | 1 +
 1 file changed, 1 insertion(+)
5ffefaf Adding 2.txt
 2.txt | 1 +
 1 file changed, 1 insertion(+)
47e610b Adding 1.txt
 1.txt | 1 +
 1 file changed, 1 insertion(+)
==============================================================================================================================================================================================================
Show the current contents
==============================================================================================================================================================================================================
> cat 1.txt 2.txt 3.txt 4.txt 5.txt
$Id$
$Id$
$Id$
$Id$
$Id$
==============================================================================================================================================================================================================
Git doesn't replace the $Id$ token until it must write the file
==============================================================================================================================================================================================================
> rm 1.txt 2.txt 3.txt 4.txt 5.txt
> git reset --hard
HEAD is now at 27b67fa Adding 5.txt
> cat 1.txt 2.txt 3.txt 4.txt 5.txt
$Id: 055c8729cdcc372500a08db659c045e16c4409fb $
$Id: 055c8729cdcc372500a08db659c045e16c4409fb $
$Id: 055c8729cdcc372500a08db659c045e16c4409fb $
$Id: 055c8729cdcc372500a08db659c045e16c4409fb $
$Id: 055c8729cdcc372500a08db659c045e16c4409fb $

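All five $Id$ tags expand to the same sha, 055c872, which doesn't correspond to any commit. It's the ID of the blob holding the file's contents, and the five files have identical contents. That makes the sha not particularly useful: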

$ git show 055c872
$Id$
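The sketch below shows where 055c872 comes from. It uses plain sh again, in a hypothetical throwaway repository, and compares the expanded value with the blob ID Git records for the file. git rev-parse HEAD:1.txt names the blob in the commit's tree, and git hash-object --stdin hashes the raw bytes of a bare $Id$ line. Both should equal the expanded value, while the commit ID should not. The expansion is the blob ID of the file's contents (with $Id$ collapsed), so it is independent of the commit, the file name, and the history.

```sh
#!/bin/sh
# Sketch: the expanded $Id$ is the blob ID of the collapsed file contents.
set -e

repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
echo '*.txt ident' > .gitattributes
printf '%s\n' '$Id$' > 1.txt
git add .gitattributes 1.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'Adding 1.txt'

# Force a checkout so the ident filter expands the token.
rm 1.txt
git checkout -- 1.txt

expanded=$(cut -d' ' -f2 1.txt)                               # "$Id: <id> $" -> <id>
in_commit=$(git rev-parse HEAD:1.txt)                         # blob named by the commit's tree
from_bytes=$(printf '%s\n' '$Id$' | git hash-object --stdin)  # hash of the raw bytes

echo "expanded:   $expanded"
echo "in commit:  $in_commit"
echo "from bytes: $from_bytes"
echo "commit:     $(git rev-parse HEAD)"   # differs: $Id$ never names a commit
```

With Git's default SHA-1 object format, the first three lines should show the same 055c872 value as the run above, since the five files there held exactly this one-line content.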

What good is it, then?

The only thing this sha is useful for is providing a readily visible signature of a single file's contents. That could help when dealing with batch pushes that may partially succeed, but it can't stand in for a version number.
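For the partial-push case, a sketch like the following could compare the $Id$ in each deployed copy with the blob Git expects at that path in a given revision. All of it is illustrative. check_deployed and release are hypothetical helpers, and the deploy directory is assumed to mirror the repository layout. The demo repository at the bottom exists only so the script runs on its own.

```sh
#!/bin/sh
# Sketch: use $Id$ to spot files a batch push didn't update.
set -e

# check_deployed REV DEPLOY_DIR FILE...   (hypothetical helper)
# Prints "ok" or "STALE" for each file; returns 1 if any file is stale.
check_deployed() {
    rev=$1 deploy_dir=$2
    shift 2
    status=0
    for f in "$@"; do
        want=$(git rev-parse "$rev:$f")   # blob ID Git expects at that path
        # Pull the ID out of the first "$Id: <id> $" line, if there is one.
        have=$(sed -n 's/.*\$Id: \([0-9a-f]*\) \$.*/\1/p' "$deploy_dir/$f" 2>/dev/null | head -n 1)
        if [ "$have" = "$want" ]; then
            echo "ok     $f"
        else
            echo "STALE  $f"
            status=1
        fi
    done
    return "$status"
}

# --- demo: a throwaway repo of three router configs and a fake deploy target ---
repo=$(mktemp -d)
deploy=$(mktemp -d)
cd "$repo"
git init -q
echo '*.cfg ident' > .gitattributes

# release LABEL (hypothetical helper): rewrite all configs, commit them, and
# re-check them out so the working copies carry an expanded $Id$.
release() {
    for n in 1 2 3; do
        printf '%s\n' '$Id$' "# router$n config, $1" > "router$n.cfg"
    done
    git add .
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
    rm -- *.cfg
    git checkout -- .
}

release v1
cp -- *.cfg "$deploy"/                   # complete push of v1
release v2
cp router1.cfg router3.cfg "$deploy"/    # push of v2 that never reached router2

check_deployed HEAD "$deploy" router1.cfg router2.cfg router3.cfg || echo "push incomplete"
```

This only tells you whether each file's contents match what Git expects. As above, the IDs say nothing about which commit was pushed.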

Still, while it's definitely not the same as CVS/SVN $Id$, and its scope is much more limited, that doesn't mean it's totally useless.

At a few sites I'm aware of, Git $Id$ was used for non-code things where a “build” consisted basically of pulling files from the Git repo; for example, a repo for versioning and storing network gear configurations. When “build and install” consists entirely of downloading a version from Git, the $Id$ expansion is genuinely useful. This is especially true on network gear where you can't re-download a configuration from the box in a form that can be diffed against the original.

For the networking gear I worked with at a prior job, the device was "configured" by issuing a set of commands, not with a configuration file. So the Git-controlled "configuration file" was actually a dumb batch script.

So, at the beginning of an update, one custom variable (call it $cf_pushing) was set to the expansion of $Id$, and another, $cf_version, was set to ${cf_version}-dirty. At the end of the update, $cf_version was reset to the value of $cf_pushing and $cf_pushing was nulled. This told you not only which version had been pushed most recently, but also whether the push had completed. It was easy to set up monitoring to alarm if a push didn't go smoothly, just by polling $cf_version for the substring dirty. (Of course, in real life you'd probably want to alarm only after X consecutive polls showed dirty, so that a normal push in progress wouldn't set off alarms.)
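The push-and-monitor scheme might be simulated like this. Only cf_pushing, cf_version, and the -dirty suffix come from the description above; everything else is an assumption:

- The device is faked with a local key=value file.
- dev_set, dev_get, push_config, and poll_once are hypothetical helpers.
- The batch scripts are trivial stand-ins.
- The threshold of 3 polls is an arbitrary value for X.

```sh
#!/bin/sh
# Simulation of the cf_pushing / cf_version scheme. The "device" is a local
# key=value file; every helper name here is hypothetical.
set -e

state=$(mktemp)   # stands in for the device's variable store

dev_set() {       # dev_set NAME VALUE: replace NAME's value on the "device"
    grep -v "^$1=" "$state" > "$state.new" || true
    printf '%s=%s\n' "$1" "$2" >> "$state.new"
    mv "$state.new" "$state"
}
dev_get() {       # dev_get NAME: print the value, or nothing if unset
    sed -n "s/^$1=//p" "$state"
}

# push_config ID BATCH_SCRIPT: run a config batch bracketed by the two markers.
push_config() {
    dev_set cf_pushing "$1"
    dev_set cf_version "$(dev_get cf_version)-dirty"
    sh "$2" || return 1    # a failed batch leaves cf_version ending in -dirty
    dev_set cf_version "$(dev_get cf_pushing)"
    dev_set cf_pushing ""
}

# poll_once: one monitoring poll. Alarms only after $threshold consecutive
# dirty readings, so a push that is merely in progress doesn't page anyone.
threshold=3       # the "X consecutive polls" (arbitrary value)
misses=0
poll_once() {
    case $(dev_get cf_version) in
        *dirty*) misses=$((misses + 1)) ;;
        *)       misses=0 ;;
    esac
    if [ "$misses" -ge "$threshold" ]; then
        echo "ALARM: cf_version=$(dev_get cf_version) cf_pushing=$(dev_get cf_pushing)"
    fi
}

# --- demo ---
dev_set cf_version 'never-pushed'
good=$(mktemp); echo 'true' > "$good"      # a batch that succeeds
bad=$(mktemp);  echo 'exit 1' > "$bad"     # a batch that dies part-way
id_a='$Id: placeholder-A $'                # stand-ins for real $Id$ expansions
id_b='$Id: placeholder-B $'

push_config "$id_a" "$good"
poll_once                                  # clean: misses stays at 0
push_config "$id_b" "$bad" || echo "push of B failed"
poll_once; poll_once; poll_once            # third dirty poll in a row raises the alarm
```

After the failed push, the state shows what was being pushed (cf_pushing) and what was last known good plus -dirty (cf_version). That is exactly the pair of facts the scheme was designed to expose.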

@ruoso

ruoso commented May 22, 2015

The ident actually comes from RCS when there wasn't a concept of a working copy, so you would need the ident string in order to know where that file actually came from. It ended up being cargo culted onto CVS (because CVS was just a wrapper around RCS anyway), and later onto SVN (because SVN was a CVS replacement).

One aspect, however, makes the ident string even less useful in git, which is the fact that the file itself doesn't have a history, so if two independent branches get to the same contents before being merged, what would the semantics of that even be?

One more interesting thing, however, is that git archive supports substitutions when generating the output, which is probably going to be more useful if you're trying to create a release tarball...
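Here is a sketch of that alternative, assuming a hypothetical throwaway repository. Mark a VERSION file with the export-subst attribute and put a $Format:...$ placeholder in it. git archive then expands the placeholder using git log --pretty=format: codes. %H is the full commit hash, which is what the original VERSION-file idea was really after. The .gitattributes file is committed because, by default, git archive reads attributes from the tree being archived.

```sh
#!/bin/sh
# Sketch: stamp the commit hash into a release tarball with export-subst.
set -e

repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q

echo 'VERSION export-subst' > .gitattributes
printf '%s\n' '$Format:%H$' > VERSION
git add .gitattributes VERSION
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'Add VERSION'

cat VERSION                                          # still the literal placeholder
git archive --format=tar HEAD | tar -xOf - VERSION   # the commit hash
git rev-parse HEAD                                   # same hash, for comparison
```

Unlike ident, this never changes the working-tree copy. Only the archive carries the commit hash.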
