@masak
Last active September 2, 2024 12:13
How is git commit sha1 formed

Ok, I geeked out, and this is probably more information than you need. But it completely answers the question. Sorry. ☺

Locally, I'm at this commit:

$ git show
commit d6cd1e2bd19e03a81132a23b2025920577f84e37
Author: jnthn <jnthn@jnthn.net>
Date:   Sun Apr 15 16:35:03 2012 +0200

    When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

So that's the sha1 I want to reproduce: d6cd1e2bd19e03a81132a23b2025920577f84e37

When I started investigating, I thought these were the things that went into a commit's hash:

$ git --no-replace-objects cat-file commit HEAD
tree 9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
parent de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
author jnthn <jnthn@jnthn.net> 1334500503 +0200
committer jnthn <jnthn@jnthn.net> 1334500545 +0200

When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

That is:

  • The source tree of the commit (which unravels to all the subtrees and blobs)
  • The parent commit sha1
  • The author info
  • The committer info (right, those are different!)
  • The commit message
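To make those five pieces concrete, here is a rough sketch that rebuilds the same text by hand with printf. The `commit_body` helper is a made-up name for illustration, and the sketch assumes the message is a single line ending in exactly one newline, which is what the output above suggests but does not prove.

```shell
# Values copied from the cat-file output above.
tree=9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
parent=de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
ident='jnthn <jnthn@jnthn.net>'
message="When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program."

# Hypothetical helper: prints the commit body, one header per line,
# then a blank line, then the message (assumed to end in one newline).
commit_body() {
    printf 'tree %s\n' "$tree"
    printf 'parent %s\n' "$parent"
    printf 'author %s %s %s\n' "$ident" 1334500503 +0200
    printf 'committer %s %s %s\n' "$ident" 1334500545 +0200
    printf '\n%s\n' "$message"
}

# Count the bytes of the reconstructed body.
commit_body | wc -c
```

The count comes out as 327 bytes. Note that the author and committer lines differ only in their timestamps here.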

But it turns out there is also a NUL-terminated header that gets prepended to this, containing the word "commit" and the length in bytes of all of the above information:

$ printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c)
commit 327

(No, you can't see the NUL byte.)
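If you want to convince yourself the NUL byte is really there, one way is to pipe the header through `od`. This is just a sketch; the length 327 is hard-coded from the output above rather than recomputed.

```shell
# Dump the header one character at a time; -An drops the offset column.
# The final byte should show up as \0.
printf 'commit %s\0' 327 | od -An -c
```

The header is 11 bytes in total: 7 for "commit " (including the space), 3 for the digits, and 1 for the NUL.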

Put this header and the rest of the information together:

$ (printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c); git --no-replace-objects cat-file commit HEAD)
commit 327tree 9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
parent de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
author jnthn <jnthn@jnthn.net> 1334500503 +0200
committer jnthn <jnthn@jnthn.net> 1334500545 +0200

When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

...and what you get hashes to the right sha1!

$ (printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c); git --no-replace-objects cat-file commit HEAD) | sha1sum
d6cd1e2bd19e03a81132a23b2025920577f84e37  -
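
The steps above can be wrapped into a small function. This is a sketch, not anything Git ships: `git_object_sha1` is a name I made up, and it assumes `sha1sum` and `mktemp` are available. As far as I know the same "type, space, length, NUL" header is used for the other object types too, so the demo uses a blob, which can be checked without a repository.

```shell
# Hypothetical helper (not a Git command): reads object content on stdin
# and prints the sha1 of "<type> <length>\0<content>".
git_object_sha1() {
    gos_type=$1
    gos_tmp=$(mktemp) || return 1
    cat > "$gos_tmp"                        # buffer stdin so it can be counted, then replayed
    gos_size=$(( $(wc -c < "$gos_tmp") ))   # arithmetic expansion drops any padding from wc
    { printf '%s %s\0' "$gos_type" "$gos_size"; cat "$gos_tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$gos_tmp"
}

# A blob containing "hello world" plus a newline (12 bytes).
printf 'hello world\n' | git_object_sha1 blob
# prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Inside a repository that uses sha1 object names, `git --no-replace-objects cat-file commit HEAD | git_object_sha1 commit` should print the same value as `git rev-parse HEAD`. The temporary file is there because a shell variable would lose trailing newlines and any NUL bytes in the content.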

masak commented Apr 14, 2023

@lemanschik Nice. I like it.

Although I should add, I have successfully merged together unrelated repositories where a magic root commit did not sit at the top. The thing that happened to me was that Git gave a weak warning like "warning: these histories seem unrelated", and then went on and merged the repositories anyway, into disjoint history graphs. YMMV.

@lemanschik

@masak I went with my own meta versioning system. I simply store the Git-compatible information as additional metadata, go for content unification, and then hash with SHA-512. I do the same for large assets, since I can version blocks. I do not store the initial files; I store content blobs as fixed-size 20 MB blocks, as this looks like a magic number for performance at present. And since this does not depend on one block per file, it reduces space needs a lot.

I do it a bit like Docker's overlay filesystem implementations, which also do content hashing at the block level, but in a less deterministic and predictable way. When you use Docker + BTRFS, you come near to my feature set.
