How is git commit sha1 formed

Ok, I geeked out, and this is probably more information than you need. But it completely answers the question. Sorry. ☺

Locally, I'm at this commit:

$ git show
commit d6cd1e2bd19e03a81132a23b2025920577f84e37
Author: jnthn <jnthn@jnthn.net>
Date:   Sun Apr 15 16:35:03 2012 +0200

    When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

So that's the sha1 I want to reproduce. d6cd1e2bd19e03a81132a23b2025920577f84e37

When I started my investigations, I thought something like the following went into a commit:

$ git cat-file commit HEAD
tree 9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
parent de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
author jnthn <jnthn@jnthn.net> 1334500503 +0200
committer jnthn <jnthn@jnthn.net> 1334500545 +0200

When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

That is

  • The source tree of the commit (which unravels to all the subtrees and blobs)
  • The parent commit sha1
  • The author info
  • The committer info (right, those are different!)
  • The commit message

But it turns out there is also a NUL-terminated header that gets prepended to this, containing the word "commit" and the length in bytes of all of the above information:

$ printf "commit %s\0" $(git cat-file commit HEAD | wc -c)
commit 327

(No, you can't see the NUL byte.)

Put this header and the rest of the information together:

$ (printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD)
commit 327tree 9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
parent de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
author jnthn <jnthn@jnthn.net> 1334500503 +0200
committer jnthn <jnthn@jnthn.net> 1334500545 +0200

When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

...and what you get hashes to the right sha1!

$ (printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD) | sha1sum
d6cd1e2bd19e03a81132a23b2025920577f84e37  -
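
The same recipe can be sketched in Ruby. This is only an illustration under assumptions: `git_object_id` is a name made up here (it is not part of git or of any library), and it assumes a repository that uses SHA-1 object ids.

```ruby
require 'digest'

# Hypothetical helper (not a git API): hash "<type> <size>\0<body>",
# the same bytes the shell pipeline above feeds to sha1sum.
def git_object_id(type, body)
  body = body.b                              # treat the body as raw bytes
  header = "#{type} #{body.bytesize}\0".b    # the size counts bytes, not characters
  Digest::SHA1.hexdigest(header + body)
end

# Inside a repository this should match `git rev-parse HEAD`:
#   git_object_id("commit", `git cat-file commit HEAD`)
puts git_object_id("blob", "")
```

The type is a parameter because blobs and trees are hashed with the same kind of header; only the word in front of the size changes.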

aredridel commented Apr 18, 2012

Excellent writeup!

(and excellent title)

dmerrick commented Mar 25, 2014

Thanks for this, very informative.

goldfeld commented Apr 24, 2014

This is really cool! But then how about also guessing the tree hash? I'm trying to apply a patch completely out of context, sort of like a patch-rebase, and I need to fabricate what would be a valid hash for the given commit info I have in hand (while changing the original patch timestamp to something that is more recent than the new HEAD commit I'm patching over), so the patch goes through.

icyflame commented Aug 6, 2014

Informative.
Nerd Climate (out of 10) : tending to 10!

pchaigno commented Nov 4, 2014

Thanks for this, it has been very useful!

Like @goldfeld, I'm trying to form the tree hash.
Any idea on how this one is formed?

Perlover commented Jan 29, 2015

Has anybody noticed that git branches and commits look like the Bitcoin blockchain without "Proof of Work"? :)

mohammadg commented Apr 3, 2015

Thanks for this!

colinschoen commented May 3, 2015

Very interesting. Thank you.

domgetter commented May 7, 2015

For those wondering, creating the tree hash is a little more involved. Git will lie to you (a little bit) when you ask for the contents of a tree object.

git cat-file -p HEAD^{tree}

will produce something like

100644 blob f73693a16cdf594532ee4c423a46d32ce3430c4e    blah.txt
040000 tree 86c2509f4c12c5d3bf9a486925ed051666ee2d97    new_dir
100644 blob b5fd817de972cdb092b7dfbeeb1bedb4f05eb218    new_file.txt
100644 blob 0861b9114fba8c82892d89e53f2a34447bd4c9e7    newer_file.txt

But this is not how a tree object is saved before it is compressed. For one, there are no newlines in the uncompressed tree object, but I'm going to add them for output here.

tree 152\0
100644 blah.txt\0f73693a16cdf594532ee4c423a46d32ce3430c4e
40000 new_dir\086c2509f4c12c5d3bf9a486925ed051666ee2d97
100644 new_file.txt\0b5fd817de972cdb092b7dfbeeb1bedb4f05eb218
100644 newer_file.txt\00861b9114fba8c82892d89e53f2a34447bd4c9e7

Okay, this looks a little better, but there's still one more "lie" (and if you count the characters and compare to the 152 I added in the tree header, you can see what it is). Unlike commit objects, tree objects don't store sha1 hashes as hex text. They are packed down to just 20 bytes. Each pair of hex characters is converted to a single byte, which is more like this:

tree 152\0
100644 blah.txt\0\xf7\x36\x93\xa1\x6c\xdf\x59\x45\x32\xee\x4c\x42\x3a\x46\xd3\x2c\xe3\x43\x0c\x4e
40000 new_dir\0\x86\xc2\x50\x9f\x4c\x12\xc5\xd3\xbf\x9a\x48\x69\x25\xed\x05\x16\x66\xee\x2d\x97
100644 new_file.txt\0\xb5\xfd\x81\x7d\xe9\x72\xcd\xb0\x92\xb7\xdf\xbe\xeb\x1b\xed\xb4\xf0\x5e\xb2\x18
100644 newer_file.txt\0\x08\x61\xb9\x11\x4f\xba\x8c\x82\x89\x2d\x89\xe5\x3f\x2a\x34\x44\x7b\xd4\xc9\xe7

So that is what you should be taking the sha1 hash of to create a tree object in git's object store.
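
As a rough sketch of that layout in Ruby: the helper names `tree_body` and `tree_object_id` are invented for this example and are not git APIs. The entries are assumed to be in the order git stores them already, and the repository is assumed to use SHA-1.

```ruby
require 'digest'

# Hypothetical helpers, not git APIs. Each entry is [mode, name, hex_sha1];
# entries are assumed to be in the order git stores them already.
def tree_body(entries)
  entries.map { |mode, name, hex|
    # "<mode> <name>", a NUL, then the hash packed from 40 hex chars to 20 bytes
    "#{mode} #{name}\0".b + [hex].pack("H*")
  }.join.b
end

def tree_object_id(entries)
  body = tree_body(entries)
  Digest::SHA1.hexdigest("tree #{body.bytesize}\0".b + body)
end

entries = [
  ["100644", "blah.txt",       "f73693a16cdf594532ee4c423a46d32ce3430c4e"],
  ["40000",  "new_dir",        "86c2509f4c12c5d3bf9a486925ed051666ee2d97"],
  ["100644", "new_file.txt",   "b5fd817de972cdb092b7dfbeeb1bedb4f05eb218"],
  ["100644", "newer_file.txt", "0861b9114fba8c82892d89e53f2a34447bd4c9e7"],
]
puts tree_body(entries).bytesize   # the size that goes into the "tree <size>\0" header
puts tree_object_id(entries)
```

Note that the directory mode is written as 40000, without the leading zero that git cat-file -p prints.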

Hope that helps!

In Ruby, you would open a file like this:

require 'zlib'
# This opens the new_dir tree object above; run it from .git/objects/86,
# where the loose object file is named after the remaining 38 hex characters.
# File.binread reads the whole file in binary mode.
file = File.binread("c2509f4c12c5d3bf9a486925ed051666ee2d97")
content = Zlib::Inflate.inflate(file)
=> "tree 44\x00100644 sub_dir_file.txt\x00=\xFD\xC5\x9BF\xD2\xAA7*vz\xA1$\xDFq\xB5\xDDs\x10A"

And if you unpack those last 20 bytes to something prettier:

hash = content[-20..-1].unpack("H*").first
=> "3dfdc59b46d2aa372a767aa124df71b5dd731041"
content[0...-20] + hash
=> "tree 44\x00100644 sub_dir_file.txt\x003dfdc59b46d2aa372a767aa124df71b5dd731041"

MUCH better.
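
For a tree with more than one entry, the same unpacking has to be repeated per entry. A minimal sketch, assuming the inflated content has the layout described above; `parse_tree` is a made-up name, not something git or Ruby provides.

```ruby
# Hypothetical helper, not a git API: split an inflated tree object
# ("tree <size>\0" followed by packed entries) into readable pieces.
def parse_tree(content)
  header, body = content.b.split("\0", 2)
  entries = []
  pos = 0
  while pos < body.bytesize
    nul = body.index("\0", pos)                    # end of "<mode> <name>"
    mode, name = body[pos...nul].split(" ", 2)
    hex = body[nul + 1, 20].unpack("H*").first     # 20 raw bytes -> 40 hex chars
    entries << [mode, name, hex]
    pos = nul + 21                                 # skip the NUL and the 20 hash bytes
  end
  [header, entries]
end

# The sample is rebuilt from the values shown above rather than read from disk.
sample = "tree 44\x00100644 sub_dir_file.txt\x00".b +
         ["3dfdc59b46d2aa372a767aa124df71b5dd731041"].pack("H*")
p parse_tree(sample)
```

The loop cannot simply split on NUL bytes, because the packed hashes may contain NUL bytes themselves; it always skips exactly 20 bytes after each name.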

Here's the StackOverflow answer where I learned this: http://stackoverflow.com/questions/14790681/format-of-git-tree-object

Note that he adds in spaces and newlines for output as well.

hmeng-19 commented Jun 25, 2015

That is cool. Thanks.

ytrezq commented Oct 23, 2015

@masak what about the sha1 binary form that is used internally, is the hex form simply base64 encoded?

tmarsteel commented Nov 10, 2015

Thanks :)

@ytrezq: it is base16 encoded: just a hex representation of the binary hash.
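
A small Ruby illustration of the two forms, using only the standard digest library. The input string "hello" is an arbitrary example, not anything from a repository.

```ruby
require 'digest'

raw = Digest::SHA1.digest("hello")    # the 20 raw bytes, as stored inside tree objects
hex = raw.unpack("H*").first          # the 40-character hex form that git prints
puts raw.bytesize
puts hex
puts [hex].pack("H*") == raw          # packing the hex form gives the raw bytes back
```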

danger89 commented Jun 9, 2016

Thanks clear :)

ratzlaff commented Jun 28, 2016

Just used this information today. Thanks!

yeasy commented Jul 27, 2016

@Perlover blockchain is mostly a dynamic chain, while git is a DAG.
However, the content-based addressing idea is quite similar in both!

xtbl commented Sep 1, 2016

Thanks, awesome explanation.

adunkman commented Nov 10, 2016

Just came across this — thanks for the writeup! :D

bittenApple commented Dec 19, 2016

Thanks, very clear.

firogh commented Feb 15, 2017

Cool and thanks.

asterion commented Mar 16, 2017

👍

dalzuga commented Mar 31, 2017

Very nice!

jguevara commented Jul 2, 2017

Thanks, that proves that commit hashes are generated in a predictable and reproducible way. This info is useful for users of tools like subgit, which imports SVN repos into git.

Codeacious commented Dec 15, 2017

Thanks for this; it saved me a lot of effort!

EXORCIST94 commented Jun 26, 2018

Subarashii!!

BillLucky commented Jun 30, 2018

thanks

authmane512 commented Jul 30, 2018

Thanks. It's awesome.

serkanh commented Jul 31, 2018

For those who are on a Mac and don't have sha1sum installed:

(printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD) | openssl sha1

WingTillDie commented Sep 5, 2018

A simple script that verifies the idea in this gist:

# Compare the recomputed sha1 (left) with the one git reports (right).
func(){
    diff -y <( (printf "commit %s\0" $(git cat-file commit "$1" | wc -c); git cat-file commit "$1") | sha1sum | grep -Eo '\w+' ) <(git show "$1" | sed -n 1p | cut -d' ' -f2)
}
func @
func @~