A Magical Afternoon with Scott Chacon
Hey everybody. Today I'm going to be talking about Git. This is not going to be an introduction to Git, so if you have never used Git before I apologize beforehand. Today we are going to be looking at how Git works by exploring the 'reset' command. How many of you are familiar with the 'reset' command? OK, now how many of you feel really comfortable using it? That's what I thought. So in order to understand 'reset' we need to understand the three trees of Git, so I give you "A Tale of Three Trees".
First of all, a tiny bit about me. My name is Scott Chacon and I work for GitHub. I designed and maintain this site, git-scm.com, which is the official Git homepage. If you're just starting out with Git, this is hopefully a good resource for you. I also wrote gitref.org, which is a reference or cheat sheet of all the commonly used Git commands and the most used options for each command. Finally, I wrote this book, Pro Git, which was published by Apress who were kind enough to let me creative commons license it so you can read it all for free online. It is also largely translated into Spanish, so if you prefer to read it in Spanish you can do so for most of it online. So, there are some resources for you to help you learn Git - git-scm.com, gitref.org, progit.org. Finally, if you have any questions after the talk, you can reach me on Twitter at @chacon. That's enough about me.
So, first off - how Git works. Git is primarily a tool for managing trees. Now when I say 'tree', I don't specifically mean the data structure, I mean a snapshot of content made up of files and subdirectories. What your project looks like at a given point in time - that is a 'tree' for the purposes of this talk. For example, if you look at a filesystem, that is a tree. A collection of files and subdirectories.
Here is an example. If I run the 'tree' command in Linux, it will give me a nice little graph of the files and subdirectory structure of my current directory. Git also stores its data internally this way. Every time you do a commit, it records a snapshot of what your project looks like at that point. The commit object you create points to one tree that is a checksummed snapshot of the root of your project right then so that it can be recreated on someone else's machine at a later time. As we move forward and continue creating commits, we create a series of these snapshots. So that's the primary purpose of Git - to create a series of snapshots of the content of your project. When you run 'git commit', you are creating a snapshot of your project's tree.
So, what are these three trees that Git deals with? The first tree is called the HEAD. Who has typed 'HEAD' on the command line before when using Git? So what is the HEAD? HEAD is simply a pointer to the current branch you're on. When you commit, only the branch that HEAD points at is updated to point to the newly created commit. This means that HEAD indirectly points to the last commit you've done.
Let's take a look at this. If we look at the HEAD file directly, which is found in your dot-git directory, we can see that it's a simple text file that points to a branch. If we open up that branch file we can see that it points to a commit. If we look at that commit with the Git plumbing command 'cat-file', we can see that it has a pointer to a tree object and if we look at that tree object, we get the checksummed snapshot of our last commit. So, we can think of HEAD as representing our last commit. Again, HEAD is our last commit.
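The chain described above can be followed by hand. This is a minimal sketch, assuming a throwaway repository that uses Git's default on-disk ref storage; the file name and the identity settings are placeholders that are not from the talk.

```shell
# Throwaway repository; file.txt and the identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo "v1" > file.txt
git add file.txt && git commit -q -m "first commit"

cat .git/HEAD                        # a text file naming the current branch
branch=$(git symbolic-ref HEAD)      # the same ref, e.g. refs/heads/<branch>
cat ".git/$branch"                   # the branch file holds a commit SHA
commit=$(git rev-parse HEAD)
git cat-file -p "$commit"            # includes a "tree <sha>" line
tree=$(git rev-parse "HEAD^{tree}")
git cat-file -p "$tree"              # the snapshot: mode, type, SHA and path per entry
```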
The second tree Git works with is called the Index. Technically, this is not actually kept in a tree data structure, it is simply a flattened manifest of all the files in the project, but essentially it stores the same data - a checksummed snapshot of your project. The Index is often referred to as the 'staging area'. We can inspect the Index with the plumbing command 'ls-files -s' which shows us the same basic project. If we run 'git commit', this is the tree that will be stored in the next commit. The Index is the next proposed commit.
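To see the Index as a flat manifest, something like the following should work. It is only a sketch in a throwaway repository, and file.txt is a placeholder name.

```shell
# Throwaway repository; file.txt is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
echo "v1" > file.txt
git add file.txt

# One line per staged file: mode, blob SHA, stage number, then the path.
git ls-files -s
# The SHA in that line is the checksum of the staged content.
blob=$(git hash-object file.txt)
```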
The third and final tree Git is concerned with is the Working Directory. This is simply the actual files on your filesystem. All three of these trees should be roughly the same. When you initially clone or checkout your project, Git will make them all the same. The HEAD is kept in .git/HEAD, the index in the .git/index file and the working directory is just your normal files on disk. They all should represent the same tree initially.
So those are Git's three trees - the HEAD, the index and the working directory. When you checkout a branch, Git moves the HEAD there, populates the index with the same tree and then checks out the same tree into your working directory. When you modify files with your editor, your work tree changes. When you run 'git add' to stage modifications, Git updates your index to match your work tree. When you run 'git commit', Git updates HEAD to match your index by creating a new commit that matches your index and moving the branch to point to it.
So, the roles the three trees play are: HEAD is your last commit and the parent of your next commit, Index is your proposed next commit tree and your working directory is your sandbox.
When you run 'git status', what you are asking Git to do is compare these three trees. If you see a file path in green, that means that the contents of that file are different between the last commit and the proposed next commit - the HEAD and the index. If you see a file path in red, that means the contents of that file differ between the next commit, your index, and what is in your working directory.
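The two comparisons that status makes can also be run separately with 'git diff'. This is a sketch under assumptions: a throwaway repository, with staged.txt and unstaged.txt as hypothetical file names chosen for illustration.

```shell
# Throwaway repository; file names and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo "v1" > staged.txt && echo "v1" > unstaged.txt
git add . && git commit -q -m "first commit"

echo "v2" > staged.txt && git add staged.txt   # this change reaches the index
echo "v2" > unstaged.txt                       # this change stays in the working directory

git diff --cached --name-only   # HEAD vs index: the paths status shows in green
git diff --name-only            # index vs working directory: the paths status shows in red
```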
Let's look at an example. Let's say we have a directory with one file in it. When we run 'git init', Git will create a couple of things - a repository and a branch that points to nothing. When we run 'git add' on our file, it makes an index that now looks like our working directory. When we commit, Git creates a new commit object that points at a tree with that one file in it and points your branch to it.
Now all our trees are the same and 'git status' will show you no output. If we now edit the file again and we run 'git status' what will we see? That file will now be red, because the index and the working directory are different.
If we now run 'git add' on the file it will update our index. If we run 'git status' now what will we see? We will see that file in green, because our index and our HEAD are different. If we commit now Git will create a new commit pointing to that new tree and move our branch up. If we run 'git status' now, what will you see? You will see no output because all the trees are the same.
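The walkthrough above can be replayed in a scratch repository. This sketch uses 'git status --porcelain', which prints two status columns per path (index first, working directory second) instead of colors; the file name and identity are placeholders.

```shell
# Throwaway repository; file.txt and the identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo "v1" > file.txt
git add file.txt && git commit -q -m "first commit"
clean=$(git status --porcelain)       # empty: all three trees match

echo "v2" > file.txt
edited=$(git status --porcelain)      # " M file.txt": index and working directory differ

git add file.txt
staged=$(git status --porcelain)      # "M  file.txt": HEAD and index differ

git commit -q -m "second commit"
committed=$(git status --porcelain)   # empty again: the trees match
```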
So this is how Git works. Now, let's take a look at the 'git reset' command. There are 2 different forms of the reset command, one with a file path limiter and another without. First we'll look at the form with a file path. If you simply give 'git reset' a path it acts as the opposite of the 'git add' command. That is, where 'git add' updates your index with the content in the working directory, 'git reset' will update the index entry for that file with the content from your last commit. This effectively 'unstages' it by setting the content of what that file would be in your next commit to be what it was in your last commit - unchanged. You can run 'git add file' and 'git reset file' to stage and unstage your file changes.
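Staging and unstaging a single file might look like this. It is a sketch in a throwaway repository; file.txt and the identity are placeholders.

```shell
# Throwaway repository; file.txt and the identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo "v1" > file.txt
git add file.txt && git commit -q -m "first commit"

echo "v2" > file.txt
git add file.txt                 # stage: index entry now matches the working directory
git diff --cached --name-only    # lists file.txt
git reset -q -- file.txt         # unstage: index entry is copied back from HEAD
git diff --cached --name-only    # lists nothing
cat file.txt                     # the working directory still holds the edit
```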
You can also use 'reset' to reset to an older version of the file. Let's say we have three commits. Since reset with a path updates the index, you can pull content into the index from an earlier commit than the last one by just putting some older commit SHA in the reset command. The two dashes mean "no more options, file paths are coming now". You can see the index revert - this means that if you commit now, you will commit a revert of that file without actually changing content in your working directory. It also means that if you run a status, you'll see the file both in red and green - you have changes in your index that haven't been committed but they also don't match your working directory.
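Pulling an older version of one file into the index could be sketched as follows. The three-commit history, file.txt and the identity are assumptions made for illustration.

```shell
# Throwaway repository with three commits; names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
for v in v1 v2 v3; do
  echo "$v" > file.txt
  git add file.txt && git commit -q -m "commit $v"
done

first=$(git rev-parse HEAD~2)     # SHA of the oldest commit
git reset -q "$first" -- file.txt # copy that commit's file.txt into the index only

git show :file.txt                # index content: the oldest version
cat file.txt                      # working directory: still the newest version
git status --porcelain            # both columns set: staged and unstaged differences
```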
The second and more interesting form of reset is to just give it a commit - this could be a SHA or a branch or anything that resolves to a commit. When you call reset like this, Git will do three things. The option you give it determines where in this process Git stops.
The first thing Git will do is to move the branch that you're currently on to the target commit. This is important: it moves your branch. If you specify '--soft', it will stop here - it will only move your branch, it will not touch anything else.
The second thing it will do is copy the tree that HEAD is now pointing at to your index, essentially resetting your staging area. This is the default, but you can be specific by specifying '--mixed'.
Finally, if you provide the '--hard' option, reset will continue to then copy your index into your working directory. Note that this is the only way to make reset unsafe. If you do not specify '--hard', reset will under no circumstances modify any file in your working directory. This is the only way you can lose work. This is the main reason that people are afraid of reset and ignore the other options because they don't know what else might be dangerous, but nothing else is.
So, let's look at each option.
First we have '--soft', which simply stops at the first step of any 'reset' run, right after it moves the branch HEAD points at to another commit. So, if we have three commits and we run 'git reset --soft HEAD~' that essentially un-does our last 'git commit'. The tilde means "the parent of my last commit". It moves our branch back to where it was before we committed, but you can see it leaves our index and working directory how they are right now. This is exactly what your three trees should have looked like right before you ran your last 'git commit' command. So, if you accidentally run 'git commit' and realize that you didn't mean to, this is how you undo it.
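A soft reset can be tried safely in a scratch repository. This is a sketch only; the three-commit history, file.txt and the identity are placeholders.

```shell
# Throwaway repository with three commits; names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
for v in v1 v2 v3; do
  echo "$v" > file.txt
  git add file.txt && git commit -q -m "commit $v"
done

parent=$(git rev-parse HEAD~)   # where the branch should end up
git reset --soft HEAD~          # step 1 only: move the branch

git rev-parse HEAD              # now equals $parent
git status --porcelain          # "M  file.txt": the last commit's content is still staged
cat file.txt                    # working directory unchanged
```

Running 'git commit' at this point would recreate a commit with the same content as the one that was undone.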
Next we'll look at '--mixed', which can be left off because it's the default. It does exactly the same thing, but it resets your staging area after it moves the branch. Again, it doesn't touch anything in your working directory - it simply un-does your last 'git commit' command AND all the 'git add' commands you had done. If you run 'git status' you'll see this file in red.
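The same experiment with the default mode might look like this. Again a sketch, with a placeholder three-commit history, file name and identity.

```shell
# Throwaway repository with three commits; names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
for v in v1 v2 v3; do
  echo "$v" > file.txt
  git add file.txt && git commit -q -m "commit $v"
done

parent=$(git rev-parse HEAD~)
git reset -q --mixed HEAD~      # steps 1 and 2: move the branch, then reset the index

git show :file.txt              # index now holds the parent commit's content
cat file.txt                    # working directory still holds the newest content
git status --porcelain          # " M file.txt": an unstaged modification
```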
Finally, let's look at '--hard'. It does both of the last two things, but then further copies the index into your working directory - completely removing any changes there. Now you've undone your last commit, all the git adds AND all the actual file modifications.
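And the hard variant, which is the one that overwrites files. This sketch should only be run in a scratch repository like the one it creates; the history, file name and identity are placeholders.

```shell
# Throwaway repository with three commits; names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
for v in v1 v2 v3; do
  echo "$v" > file.txt
  git add file.txt && git commit -q -m "commit $v"
done

parent=$(git rev-parse HEAD~)
git reset -q --hard HEAD~       # all three steps: branch, index and working directory

cat file.txt                    # the file on disk now has the parent commit's content
git status --porcelain          # empty: all three trees match the parent commit
```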
So, why the fuck would I ever want to do this? Well, there are a couple of reasons.
First, unstaging changes. 'git reset' without specifying a commit will reset your index. If you give it a file path it will only reset that entry in your index. If you have staged a bunch of stuff and you want to reset your staging area, use 'git reset'.
You can also undo your last commit, as I mentioned. 'git reset HEAD~' will move your branch back and reset your index, undoing all of your last commit work.
If you want to undo your last commit but keep your staging work, you can do that with '--soft' - it will move your branch back but not touch your index.
Next, we can use reset to squash commits. If we soft reset our branch back to an older point and then commit again, it will rewrite our history to look like we just had one commit. So, first we reset soft back to the first commit we want to keep, then we commit again. You can see that we have abandoned these two commits and now our history looks like we are smarter than we really are. I use this all of the time now - being comfortable with this command makes you much less scared of doing commits constantly because you can always go back and squash them up easily. Instead of waiting until your code works perfectly to commit, commit all the time with 'work in progress' messages and then before you push, squash them all up.
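Squashing with a soft reset could be sketched like this. The three 'work in progress' commits, the file name, the commit messages and the identity are all made up for illustration.

```shell
# Throwaway repository with three commits; names and messages are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
for v in v1 v2 v3; do
  echo "$v" > file.txt
  git add file.txt && git commit -q -m "work in progress $v"
done

git reset --soft HEAD~2             # move the branch back to the first commit we keep
git commit -q -m "finished feature" # the index still holds the latest content

git rev-list --count HEAD           # 2: the two later commits became one
cat file.txt                        # content is the same as before the squash
```

The abandoned commits are no longer on the branch, so this is best done before pushing, as the talk suggests.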
So, that is 'reset'. You are now all reset experts - here is your certificate of achievement.
There are three trees that Git is concerned with. The HEAD is the last commit and the parent of your next commit. The index is your proposed next commit. The working directory is your sandbox. If you learn these roles and how to manipulate them effectively with tools like reset, you will feel much more comfortable using Git.
Thank you!