Git, Compression, and Deltas - An explanation

Git Compression of Blobs and Packfiles.

Many users of Git are curious about the lack of delta compression at the object (blob) level when commits are first written. Git defers that efficiency until a packfile is written. At commit time, each loose object is stored zlib-compressed, but as a complete, non-delta copy.

A simple run through a commit sequence, making only the smallest change to an image (stored as uncompressed TIFF to amplify the observable behavior), helps explain this deferred approach to storage efficiency.

The command sequence:

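Create the repo and make an initial commit of an empty README:

$ git init test6
Initialized empty Git repository in /Users/mccm06/Documents/Temp/Scratch/test6/.git/
$ cd test6
$ touch README
$ git add .
$ git commit -m"First-commit"
[master (root-commit) 05e9c3e] First-commit
 0 files changed
 create mode 100644 README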

$ du -c
72	./.git/hooks
8	./.git/info
8	./.git/logs/refs/heads
8	./.git/logs/refs
16	./.git/logs
8	./.git/objects/05
8	./.git/objects/54
8	./.git/objects/e6
0	./.git/objects/info
0	./.git/objects/pack
24	./.git/objects
8	./.git/refs/heads
0	./.git/refs/tags
8	./.git/refs
0	./.git/rr-cache
168	./.git
168	.
168	total

The entire repo and working directory total only 168 blocks. Judging by the file sizes shown later, du is counting 512-byte blocks here, so that is about 84 KB.
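To read the du figures in this walkthrough as kilobytes, the block counts have to be converted. The sketch below assumes 512-byte blocks, which is what the numbers in this session suggest; `blocks_to_kib` and `BLOCK_BYTES` are illustrative names, not part of du or Git.

```shell
# Assumption: du is counting 512-byte blocks (no -k flag was given).
BLOCK_BYTES=512

# Convert a du block count to whole kibibytes.
blocks_to_kib() {
  echo $(( $1 * BLOCK_BYTES / 1024 ))
}

total_kib=$(blocks_to_kib 168)
echo "168 blocks = ${total_kib} KiB"   # prints "168 blocks = 84 KiB"
```

Running `du -k` instead reports sizes in 1024-byte units directly, which avoids the conversion.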

Now copy in the white image:

$ cp ../completely-white.tiff .

And show how large that image is (5,294,996 bytes, about 5.3 MB) in an uncompressed TIFF format:

$ ls -al
total 10344
drwxr-xr-x   5     170 .
drwxrwxr-x   6     204 ..
drwxr-xr-x  15     510 .git
-rw-r--r--   1       0 README
-rw-r--r--   1 5294996 completely-white.tiff

And show the size of the entire repo and working copy:

$ du -c
72	./.git/hooks
8	./.git/info
8	./.git/logs/refs/heads
8	./.git/logs/refs
16	./.git/logs
8	./.git/objects/05
8	./.git/objects/54
8	./.git/objects/e6
0	./.git/objects/info
0	./.git/objects/pack
24	./.git/objects
8	./.git/refs/heads
0	./.git/refs/tags
8	./.git/refs
0	./.git/rr-cache
168	./.git
10512	.
10512	total
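As a cross-check on the units, the growth in the du total can be predicted from the byte size that ls reported. This is a rough sketch: the 4096-byte allocation unit is an assumption about the filesystem (the original does not state it), as is the 512-byte du block.

```shell
file_bytes=5294996     # size of completely-white.tiff from ls -al
alloc_bytes=4096       # assumed filesystem allocation unit
du_block_bytes=512     # assumed du reporting unit

# Round the file up to whole allocation units, then express that in du blocks.
alloc_units=$(( (file_bytes + alloc_bytes - 1) / alloc_bytes ))
predicted_blocks=$(( alloc_units * alloc_bytes / du_block_bytes ))

# Growth actually observed between the two du runs.
observed_blocks=$(( 10512 - 168 ))

echo "predicted: ${predicted_blocks}, observed: ${observed_blocks}"
# prints "predicted: 10344, observed: 10344"
```

The match suggests the du totals in this walkthrough are 512-byte block counts rather than kilobytes.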

Now add that file to the staging area and commit it.

$ git add .
$ git commit -m"White image added"
[master f93724b] White image added
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 completely-white.tiff

$ du -c
72	./.git/hooks
8	./.git/info
8	./.git/logs/refs/heads
8	./.git/logs/refs
16	./.git/logs
8	./.git/objects/05
8	./.git/objects/54
8	./.git/objects/87
56	./.git/objects/90
8	./.git/objects/e6
8	./.git/objects/f9
0	./.git/objects/info
0	./.git/objects/pack
96	./.git/objects
8	./.git/refs/heads
0	./.git/refs/tags
8	./.git/refs
0	./.git/rr-cache
240	./.git
10584	.
10584	total

The file is compressed when saved to its blob in the objects/90 directory and thus occupies only 56 blocks (about 28 KB at 512 bytes per block).
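The same effect can be observed without du by comparing a blob's content size with the size of its loose object file. The following is a minimal sketch in a throwaway repository; `blank.bin` is a hypothetical stand-in for the white TIFF (1 MiB of identical bytes), not the file used above.

```shell
# Work in a scratch repository so nothing else is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Stand-in for the white image: highly compressible, like a blank uncompressed TIFF.
head -c 1048576 /dev/zero > blank.bin
git add blank.bin

# The staged blob's ID; loose objects live at .git/objects/<first 2 chars>/<rest>.
blob=$(git rev-parse :blank.bin)
obj_path=".git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"

content_bytes=$(git cat-file -s "$blob")        # uncompressed content size
stored_bytes=$(( $(wc -c < "$obj_path") ))      # compressed size on disk

echo "content: ${content_bytes} bytes, loose object: ${stored_bytes} bytes"
```

`git cat-file -s` reports the uncompressed size, so the gap between the two numbers is the saving from compressing the individual object.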

Make minor edits to the image:

$ open completely-white.tiff -a Pixelmator 

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   completely-white.tiff
#
no changes added to commit (use "git add" and/or "git commit -a")

$ git add .
$ git commit -m"Dot added to white image"
[master 3015c4a] Dot added to white image
 1 file changed, 0 insertions(+), 0 deletions(-)

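Now, after the second minor alteration, we can see there is an equally sized 56-block (about 28 KB) directory called 31 that contains the blob for the modified image. Git stored a second complete compressed copy rather than a delta against the first.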

$ du -c
72	./.git/hooks
8	./.git/info
8	./.git/logs/refs/heads
8	./.git/logs/refs
16	./.git/logs
8	./.git/objects/05
8	./.git/objects/30
56	./.git/objects/31
8	./.git/objects/47
8	./.git/objects/54
8	./.git/objects/87
56	./.git/objects/90
8	./.git/objects/e6
8	./.git/objects/f9
0	./.git/objects/info
0	./.git/objects/pack
168	./.git/objects
8	./.git/refs/heads
0	./.git/refs/tags
8	./.git/refs
0	./.git/rr-cache
312	./.git
10656	.
10656	total
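A small script can reproduce this behavior with a text file in place of the image. It is only a sketch: `data.txt`, the `commit` helper, and the `loose_size` helper are hypothetical names chosen for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Helper: commit with a throwaway identity so no global config is needed.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Helper: on-disk size in bytes of the loose object with the given ID.
loose_size() {
  echo $(( $(wc -c < ".git/objects/$(echo "$1" | cut -c1-2)/$(echo "$1" | cut -c3-)") ))
}

seq 1 20000 > data.txt
git add data.txt && commit "First version"
first=$(git rev-parse :data.txt)

# A tiny change produces a brand-new blob with a different ID.
echo "one more line" >> data.txt
git add data.txt && commit "Small change"
second=$(git rev-parse :data.txt)

first_bytes=$(loose_size "$first")
second_bytes=$(loose_size "$second")
echo "first blob: ${first_bytes} bytes, second blob: ${second_bytes} bytes"
```

Both blobs should come out at nearly the same size, because each loose object holds the whole file rather than a difference from the previous version.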

Lastly, we will compress the history into a single packfile instead of loose objects:

$ git gc --aggressive
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (9/9), done.
Total 9 (delta 1), reused 0 (delta 0)
$ du -c
72	./.git/hooks
16	./.git/info
8	./.git/logs/refs/heads
8	./.git/logs/refs
16	./.git/logs
8	./.git/objects/info
32	./.git/objects/pack
40	./.git/objects
0	./.git/refs/heads
0	./.git/refs/tags
0	./.git/refs
0	./.git/rr-cache
192	./.git
10536	.
10536	total

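And you can see that the objects directory, which now holds a single packfile, occupies only 40 blocks (about 20 KB at 512 bytes per block) instead of the 168 blocks (about 84 KB) used when every object, including the two nearly identical image blobs, was stored loose in its own object directory. The "delta 1" in the gc output indicates that one object is now stored as a delta against another.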
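The packing step can also be checked with Git's own tooling instead of du. The sketch below uses a hypothetical `data.txt` in a scratch repository, counts loose objects before and after `git gc`, and then counts the deltified entries reported by `git verify-pack -v`.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Two commits of nearly identical content, mirroring the two image versions.
seq 1 20000 > data.txt
git add data.txt && commit "First version"
echo "one more line" >> data.txt
git add data.txt && commit "Small change"

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')

# In verify-pack -v output, deltified objects carry two extra fields
# (delta depth and base object ID), giving 7 fields instead of 5.
deltas=$(( $(git verify-pack -v .git/objects/pack/pack-*.idx | awk 'NF == 7' | wc -l) ))

echo "loose: ${loose_before} -> ${loose_after}, packs: ${packs}, deltas: ${deltas}"
```

At least one delta entry is expected here, since one version of the file can be expressed as a small difference from the other once both are in the same pack.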


@jakesylvestre

Cool!

@nelsonauner

👍 great demo

ghost commented Jun 1, 2016

amazing

@Zedive

Zedive commented Jul 19, 2016

nice work

@holyhan

holyhan commented Jun 9, 2017

I finally understand! Thanks for this gist!

@bhagasbhujang

nice explanation..!

@huttarl

huttarl commented Jun 19, 2018

Thanks for this explanation and demo -- it's helpful.

I think some details here are incorrect, or at least confusing, due to not specifying what block size du is reporting. On some systems at least, du uses a block size of 512 bytes by default. That's why, when you put the 5MB TIFF file in the working directory, the du count went over 10,000.
So if du is using a 512-byte block size, the first total is 84Kb, not 168Kb, for the repo and working directory; and the final combined packfile is only 20Kb, not 40Kb.
The point of your demo still stands, but since you're basing the demo on actual disk usage numbers, I figure you want to report them accurately... especially in relation to the TIFF size, which appears to be reported in bytes by ls -l.

You can use du -k to make du display block counts in kilobyte blocks.

@Shadowisdark

This is lovely for understanding

@matthewmccullough
Author

> The point of your demo still stands, but since you're basing the demo on actual disk usage numbers, I figure you want to report them accurately

@huttarl Thank you for this feedback and improvement to my communication precision.

@alexdowad

Great info. Thanks so much. 👍

ghost commented May 9, 2021

> You can use du -k to make du display block counts in kilobyte blocks.

@huttarl Thanks for your addition. du -h prints the sizes in human readable format.
I tried that du -k option and found that du -h works best for me.

Thanks for this info! :)

@paulpascal

Great !

@specious

I would also check du with the -b flag to see how many bytes the files are in terms of content (disregarding how much "space" they actually use on the device due to how they are stored).
