Git in a Nutshell - from Reuven Lerner's *Better Developers* newsletter, 2018-03-19 edition
This week, let's take a break from Python and talk a little bit
about Git ( http://git-scm.com/ ). I teach Git courses every few
months, and without fail people come into the class saying that
they have been using Git for a few months, and it seems to work
OK so long as they use the list of commands that their boss
provided. But if something happens that isn't on that list, and
if they cannot figure out what to do based on Stack Overflow,
they're sunk.
The goal of my Git course is not only to help them use the
different Git commands, but also to give them insights into
what's happening inside of Git, so that when things go wrong --
or appear to go wrong -- they can fix the problem, rather than
removing their local copy of the repository and cloning again.
Which is what a huge number of people do.
I want to point out that Git is one of the best tools I've ever
used, and has made me a better developer. And yet, I should also
point out that the user interface is exactly what you would
expect from a bunch of kernel hackers whose primary language is
C. The naming of Git features is terrible and inconsistent, the
number of options you can invoke is nearly infinite, and many of
the terms and commands were seemingly chosen because they clashed
with completely different commands used by other version-control
systems.
The thing is, once you understand how Git works, it suddenly
starts to make sense. And that's because Git doesn't do very
much at all: It's a specialized database, containing a very small
number of objects. And part of the genius of Git, in my opinion,
is that you can have a robust and fully operational
version-control system by implementing just a few ideas.
Indeed, you can think of Git as a database that contains just
four types of objects:
* blobs (i.e., file contents)
* trees (i.e., directories)
* tags
* commits
When you say "git commit", you're creating a new commit object.
That object points to a tree, and that tree then points to
additional trees and blobs. Assuming that your commit is not the
first in a repository, then it also points back to its parent.
Let's create and go through a Git repository to see what I'm
talking about. On the command line, I'll create a new directory
and repository:
$ mkdir gitfun
$ cd gitfun
$ git init
Git responds by saying:
Initialized empty Git repository in /Users/reuven/Desktop/gitfun/.git/
Great! We now have a new repository!
Um, but what does that mean? It means that Git has configured a
few things, including the special ".git" directory, under which
things are stored. What is stored there? Well, right now,
there's not much to see. Looking at ".git/objects", which is
where Git stores things, we'll see two subdirectories, but no
actual objects.
$ ls .git/objects
info/ pack/
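If you'd like to confirm that a fresh repository really holds no objects, here is a minimal sketch. The temporary directory and the variable names are my own placeholders, not part of the session above.

```shell
# Scratch repository in a temporary directory (the location is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Count regular files under .git/objects; right after "git init"
# only the empty info/ and pack/ directories should be there.
object_files=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "object files: $object_files"

# Git's own summary of loose objects.
git count-objects
```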
So, let's now create a new file in Git:
$ cat >> test1.txt
This is a test.
And a very good test it is!
$ git add test1.txt
$ git commit -m 'Added test1.txt'
[master (root-commit) 5816544] Added test1.txt
1 file changed, 2 insertions(+)
create mode 100644 test1.txt
In the above shell commands, I created a simple text file. Then
I staged it by using the "add" command -- what, you think that
there should be a "stage" command? But that would deprive
consultants of business opportunities! -- and then committed it
using "git commit".
The moment I did that, Git created a number of different objects.
Each object is identified in Git by a SHA-1 value, computed from
the object's contents plus a short header giving its type and
size. SHA-1 is a hash function that doesn't guarantee that every
file will have a unique hash value, but it's close enough for all
practical purposes. If you had a way to deliberately create a
file with a given SHA-1, then Git would probably break -- but
that's not realistic, so far as I know, so we should be OK.
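To make the hashing a little less magical, here is a rough sketch that asks Git for a blob's ID and then recomputes it by hand. It relies on the fact that a blob's ID is the SHA-1 of the header "blob <size>", a NUL byte, and then the file's bytes. It assumes the "sha1sum" utility is installed (on some systems the equivalent is "shasum"), and the temporary directory and variable names are placeholders of mine.

```shell
# Write the same two lines used for test1.txt into a scratch file.
workdir=$(mktemp -d)
printf 'This is a test.\nAnd a very good test it is!\n' > "$workdir/test1.txt"

# Ask Git for the blob ID. Without -w nothing is stored, and the
# command also works outside a repository.
git_id=$(git hash-object "$workdir/test1.txt")

# Recompute it by hand: SHA-1 over "blob <size>\0" followed by the content.
size=$(wc -c < "$workdir/test1.txt" | tr -d ' ')
manual_id=$( { printf 'blob %s\0' "$size"; cat "$workdir/test1.txt"; } \
  | sha1sum | cut -d' ' -f1 )

echo "git:    $git_id"
echo "manual: $manual_id"
```

The two lines printed at the end should be identical, which is why the same content always lands on the same object ID.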
Git reported above that it created a new commit, and even gave us
the first few digits of its SHA-1, 5816544. We can see this more
clearly, and with a longer name, if we use "git log":
$ git log
commit 58165443eca522ef35bad68964fc09ec000449ef
Author: Reuven Lerner <reuven@lerner.co.il>
Date: Mon Jan 9 00:11:41 2017 +0200
Added test1.txt
We can thus see that our most recent commit has a SHA-1 that
starts with 5816544, and continues until we get a 40-character
SHA-1. But we can use as few as the first four hex digits, so
long as they're unique in our repository.
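Here is a small sketch of abbreviated IDs in action, run in a throwaway repository. The user name, e-mail address, and variable names are placeholders of mine, and your SHA-1 values will differ from the ones in this newsletter.

```shell
# Throwaway repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
printf 'This is a test.\nAnd a very good test it is!\n' > test1.txt
git add test1.txt
git commit -q -m 'Added test1.txt'

# The full 40-character ID, and the abbreviation Git picks for display.
full=$(git rev-parse HEAD)
short=$(git rev-parse --short HEAD)

# Take just the first four hex digits and ask Git to expand them.
# The ^{commit} suffix tells Git we are looking for a commit.
four=$(printf '%s' "$full" | cut -c1-4)
resolved=$(git rev-parse --verify "${four}^{commit}")

echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"
```

In a repository this small, four digits are almost certainly unique. In a large project Git will complain that the short name is ambiguous, and you'll need to supply more digits.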
Where did Git store this object? Inside of .git/objects. But
because our repository might contain lots of objects, we aren't
going to store everything straight inside of .git/objects.
Rather, Git takes the first two characters of the SHA-1, and
uses that as the name of a subdirectory in which to store
objects. For example:
$ ls .git/objects
37/ 58/ 79/ info/ pack/
Our commit object is inside of the "58" directory:
$ ls .git/objects/58
165443eca522ef35bad68964fc09ec000449ef
So as you can see, knowing the SHA-1 of an object allows Git to
find it right away in our filesystem. That's one of the reasons
why Git is so fast; the file's contents tell Git where a file is.
And when the file changes? Then Git will create a new object,
with a new SHA-1, reflecting the hash value of the new contents.
And thus, Git stores separate copies of each version of each
file that you might have written.
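The lookup rule described above can be sketched in a few lines of shell: split the SHA-1 after its first two characters and you have the path of the object file. This only holds for freshly written ("loose") objects like the ones in this walkthrough. The identity settings and variable names are placeholders of mine.

```shell
# Throwaway repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
printf 'This is a test.\nAnd a very good test it is!\n' > test1.txt
git add test1.txt
git commit -q -m 'Added test1.txt'

# Split the commit's SHA-1 into directory name and file name.
commit=$(git rev-parse HEAD)
dir=$(printf '%s' "$commit" | cut -c1-2)
file=$(printf '%s' "$commit" | cut -c3-)
path=".git/objects/$dir/$file"

# The object file should be sitting exactly there.
ls -l "$path"
```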
You might have noticed that Git created two other directories
above, "37" and "79". Why are those there?
Well, because Git didn't just create a commit object. It also
created a tree object (in "37"), which sits between the commit
and one or more trees and blobs, and a blob (in "79") holding the
contents of test1.txt. We can use the low-level Git command
cat-file, along with its "-p" option, to inspect these objects:
$ git cat-file -p 58165443eca522ef35bad68964fc09ec000449ef
tree 37675fc023b0863cd8a702041de28282caa17c1d
author Reuven Lerner <reuven@lerner.co.il> 1483913501 +0200
committer Reuven Lerner <reuven@lerner.co.il> 1483913501 +0200
Added test1.txt
In other words, what are the contents of our commit object? It
points to a tree object (SHA-1 37675f), and records information
about the author and committer (who are generally one and the
same), followed by the commit message. So the message is actually
part of the commit object, which means that if you modify the
message on a commit, you get a totally new commit object with a
new SHA-1.
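You can watch this happen with "git commit --amend", which rewrites the most recent commit. The sketch below rewords a commit in a throwaway repository and compares the IDs before and after. The identity settings, the new wording, and the variable names are all placeholders of mine.

```shell
# Throwaway repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
printf 'This is a test.\nAnd a very good test it is!\n' > test1.txt
git add test1.txt
git commit -q -m 'Added test1.txt'

# Remember the commit and the tree it points to.
commit_before=$(git rev-parse HEAD)
tree_before=$(git rev-parse 'HEAD^{tree}')

# Change nothing but the wording of the commit.
git commit -q --amend -m 'Added test1.txt (reworded)'

commit_after=$(git rev-parse HEAD)
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "commit: $commit_before -> $commit_after"
echo "tree:   $tree_before -> $tree_after"
```

The commit ID changes, because the text is part of what gets hashed. The tree ID stays the same, because the files themselves didn't change.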
Where is this tree object stored? Well, it has a SHA-1. And
look, its SHA-1 starts with 37! What if we look in that
directory? Can you guess what will be there? (I know, it's
obvious when I say it...)
$ ls .git/objects/37
675fc023b0863cd8a702041de28282caa17c1d
And if we get the contents of our tree object, what do we find?
$ git cat-file -p 37675fc023b0863cd8a702041de28282caa17c1d
100644 blob 797f7c1809e83fd6122cb4a247d345e7f5de4f5d    test1.txt
See? Our tree object points to a blob. And if we look at the
blob:
$ git cat-file -p 797f7c1809e83fd6122cb4a247d345e7f5de4f5d
This is a test.
And a very good test it is!
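The walk from commit to tree to blob can also be done without copying SHA-1 values around by hand. This sketch uses Git's revision syntax ("HEAD^{tree}" for the commit's tree, "HEAD:test1.txt" for a file in it) and "git cat-file -t", which reports an object's type. The repository setup and variable names are placeholders of mine.

```shell
# Throwaway repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
printf 'This is a test.\nAnd a very good test it is!\n' > test1.txt
git add test1.txt
git commit -q -m 'Added test1.txt'

# Resolve each link in the chain to its SHA-1.
commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
blob=$(git rev-parse 'HEAD:test1.txt')

# Ask Git what kind of object each one is.
commit_type=$(git cat-file -t "$commit")
tree_type=$(git cat-file -t "$tree")
blob_type=$(git cat-file -t "$blob")

echo "$commit_type $commit"
echo "$tree_type $tree"
echo "$blob_type $blob"

# And the blob's contents are simply the file.
git cat-file -p "$blob"
```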
Now, what happens when I modify test1.txt, and then commit it?
The answer: None of the existing objects are affected. They
stay precisely the way they were before. When we create a new
commit, it becomes our current commit (known as HEAD), and is
the basis for any new commits we make. But the existing
commits remain around... well, basically forever.
For example:
$ cat >> test1.txt
Still a great file, right?
$ git add test1.txt
$ git commit -m 'Added amazing brilliance to our text file'
[master b6c4ec9] Added amazing brilliance to our text file
1 file changed, 1 insertion(+)
Notice that the SHA-1 returned by Git is different from the
previous one. If we look at it:
$ git cat-file -p b6c4ec9
tree eeca41cd12f46cd4c237f28c78b7e11762a0b22b
parent 58165443eca522ef35bad68964fc09ec000449ef
author Reuven Lerner <reuven@lerner.co.il> 1483914372 +0200
committer Reuven Lerner <reuven@lerner.co.il> 1483914372 +0200
Added amazing brilliance to our text file
Notice that our commit, since it isn't the first one in the
system (the "root" commit), has a "parent" field, pointing back
to the commit from which it came. But we still have a tree -- a
different tree object -- and the other standard stuff. Following
the tree along to the new file, we see:
$ git cat-file -p 909f2de7c8a572d91f06b188790416a2c195f0ed
This is a test.
And a very good test it is!
Still a great file, right?
But what if I'm nostalgic for the old version of the file? Is it
gone? Definitely not; Git holds onto it forever. I can even
look it up by its SHA-1:
$ git cat-file -p 797f7c1809e83fd6122cb4a247d345e7f5de4f5d
This is a test.
And a very good test it is!
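Here is the whole two-commit story as one sketch, so you can check for yourself that the second commit points back to the first and that the old blob is still there. The identity settings and variable names are placeholders of mine, and your SHA-1 values will differ from mine.

```shell
# Throwaway repository with a first commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
printf 'This is a test.\nAnd a very good test it is!\n' > test1.txt
git add test1.txt
git commit -q -m 'Added test1.txt'

first=$(git rev-parse HEAD)
old_blob=$(git rev-parse 'HEAD:test1.txt')

# Append a line and commit again.
echo 'Still a great file, right?' >> test1.txt
git add test1.txt
git commit -q -m 'Added amazing brilliance to our text file'

# The new commit's parent, and the new blob for the same file name.
parent=$(git rev-parse 'HEAD^')
new_blob=$(git rev-parse 'HEAD:test1.txt')

# Both versions of the file are still retrievable by SHA-1.
old_lines=$(git cat-file -p "$old_blob" | wc -l | tr -d ' ')
new_lines=$(git cat-file -p "$new_blob" | wc -l | tr -d ' ')

echo "parent of HEAD: $parent"
echo "old blob $old_blob has $old_lines lines"
echo "new blob $new_blob has $new_lines lines"
```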
Now, cat-file isn't the sort of thing you use every day with Git.
But it does let you see that Git manages to do a lot with just a
few objects.
Next time, I'll talk about branches in Git, and how they're far
simpler than you might think. (Unless you already think that
they're simple!) And of course, if you have questions (about Git
or anything else!) that you would like me to address, please
respond to this message. I've been overwhelmed with suggestions
and ideas, so it'll take a while to get to all of them, but I
promise that I will.
Until next week,
Reuven
Sign up for newsletter at: https://lerner.co.il/newsletter/