kazimuth/git.md

## git.md

      
    Let's talk about how git works. There are lots of tutorials explaining how to use git, and they're all great; this is just trying to give you an intuition as to why it works the way it does.
There are lots of 'version control systems', and they're all different. I'm using git here for two reasons: 1, it's ubiquitous right now; and 2, I understand it pretty well. No value judgement of any other tool.
(Everything here is true for a given value of true. It's true enough to use; if you need to know deeper specifics, check out the git tutorial site, which is wonderful.)
So, what is git?
Let's say you've worked in a group on some sort of computer-based project before. You have a folder full of (vital) files:
- Exciting-Project-2007/
    - howitworks.txt
    - images/
        - smile.jpg
        - flowchart.png
    - problemstatement.pdf
    - brainstorm.docx

And inevitably, everyone working on the project ends up with a slightly different version of the folder. Things are emailed, edits are made, etc.
- Exciting-Project-2007/
    - howitworks-james.txt
    - howitworks2.txt
    - howitworks2EDIT.txt
    - images/
        - smile.jpg
        - grin.jpg
        - flowchart.png
        - flowchart.psd
    - problemstatement.pdf
    - challenge statement 2.pdf
    - brainstorm.docx
    - brainstormJANEDONTEDIT.docx

and so on. Whenever anyone needs to make a change, they do it locally, then send everyone a copy of the file, or maybe a zip of the folder, and everyone has to figure out how to combine the new files with their work.
Git does this for you.
Git, at its core, is a little folder containing a bunch of zip files. Each zip file is a snapshot of the project folder at some specific point in time.
- git-storage-folder/
    - 2007-01-05-11am-jane.zip
    - 2007-01-04-7pm-jane.zip
    - 2007-01-03-5:30pm-james.zip
    - 2007-01-03-5pm-jane.zip
    - ...etc

Git gives you tools to create and share these snapshots with the other people working on your project.
Now, just that alone would be plenty helpful, but honestly you could do it yourself. Why is git better than a carefully-managed dropbox folder?
Well, git stores a little extra information along with each zip file.
zip     2007-01-03-5:30pm-james.zip
date    2007-01-03-5:30pm
author  james
note    'James added grin.jpg because we wanted things to be friendlier'
parent  2007-01-03-5pm-jane

When it was created, who created it, maybe some notes about why changes were made. That all seems pretty reasonable. The most important part, though, is that 'parent' line. 'Parent' gives the name of the snapshot that came immediately before this one. If you go to the 'parent', you can find out what its parent is - and so on, until you find the first version of the folder. (You can generally use lots of genealogical terms with git - ancestors, children, etc.)
    * 2007-01-05-11am-jane.zip
    |
    |  (has parent)
    v
    * 2007-01-04-7pm-jane.zip
    |
    |  (has parent)
    v
    * (A bunch more zips...)
    |
    |  (has parent)
    v
    * 2007-01-01-1pm-jane.zip (first snapshot!)
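
If it helps to see the parent chain as code, here's a minimal Python sketch. The `Snapshot` class and the `history` function are made up for illustration (git's real storage looks nothing like this), and the chain skips the "bunch more zips" in the middle. The only idea taken from above is following 'parent' until you hit the first snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """One 'zip file' plus the extra information stored with it (hypothetical model)."""
    name: str
    author: str
    note: str = ""
    parent: Optional["Snapshot"] = None  # None marks the first snapshot


def history(snapshot: Snapshot) -> list[str]:
    """Follow 'parent' links back to the first snapshot, newest first."""
    names = []
    current: Optional[Snapshot] = snapshot
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


# A shortened chain: the zips in between are left out.
first = Snapshot("2007-01-01-1pm-jane", "jane", "first snapshot!")
base = Snapshot("2007-01-03-5pm-jane", "jane", parent=first)
grin = Snapshot(
    "2007-01-03-5:30pm-james",
    "james",
    "James added grin.jpg because we wanted things to be friendlier",
    parent=base,
)

for name in history(grin):
    print(name)
```

Calling `history` on any snapshot gives you every ancestor it has, which is all a "log" really needs.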

You can also have multiple snapshots have the same parent:
    *   *   *
     \  |  /
      \ v /
       >*<
        |
        |
        v

This could happen when, for instance, everyone goes home for the weekend and works on their own stuff, which happens a lot. Then, of course, on Monday morning you have the problem of combining all the changes everyone made, which is a huge headache and takes an hour. Right?
Git can do that for you.
Combining snapshots like this is called a 'merge', and git has a relatively simple process for figuring out how to do it. First, it walks back through the parents of the snapshots it's combining until it finds a common ancestor that both snapshots share.
    * 2007-01-04-7pm-jane   * 2007-01-03-5:30pm-james
    |                       |
    |   ---------------------
    v   v
    * 2007-01-03-5pm-jane (base)
    |
    v

That ancestor is called the 'base'.
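
Finding the base can be sketched in a few lines of Python. This assumes history is stored as a plain dictionary from snapshot name to a list of parent names, and `ancestors` and `find_base` are hypothetical helpers, not git commands. Git's real search copes with trickier histories (there can be more than one candidate base), so treat this as the simple case only.

```python
from collections import deque
from typing import Optional


def ancestors(parents: dict[str, list[str]], name: str) -> set[str]:
    """Return `name` plus every snapshot reachable through 'parent' links."""
    seen: set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def find_base(parents: dict[str, list[str]], a: str, b: str) -> Optional[str]:
    """Walk back from `b`, nearest parents first, until we reach an ancestor of `a`."""
    reachable_from_a = ancestors(parents, a)
    queue = deque([b])
    visited: set[str] = set()
    while queue:
        current = queue.popleft()
        if current in reachable_from_a:
            return current
        if current in visited:
            continue
        visited.add(current)
        queue.extend(parents.get(current, []))
    return None  # the two snapshots share no history at all


# Shortened history: zips between the base and the first snapshot are left out.
parents = {
    "2007-01-04-7pm-jane": ["2007-01-03-5pm-jane"],
    "2007-01-03-5:30pm-james": ["2007-01-03-5pm-jane"],
    "2007-01-03-5pm-jane": ["2007-01-01-1pm-jane"],
    "2007-01-01-1pm-jane": [],
}

print(find_base(parents, "2007-01-04-7pm-jane", "2007-01-03-5:30pm-james"))
```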
Then, git goes to each snapshot you're merging, and figures out what changes have happened to it since the base snapshot. It ends up with a bunch of lists of changes.
changes from base to 2007-01-04-7pm-jane:           changes from base to 2007-01-03-5:30pm-james:
    - add 'challenge statement 2.pdf'                   - add 'grin.jpg'
    - change line 5 of 'howitworks.txt' to say:         - change line 3 of 'howitworks.txt' to say:
        'Hello, how are you?'                               'is an interactive greeter application.'
    - change line 7 of 'howitworks.txt' to say:         - change line 7 of 'howitworks.txt' to say:
        'Then, we set everything on fire.'                  'then we activate our oxygenation mixin.'

Any changes that don't conflict with each other, it collects:
changes from base to NEW_MERGED_SNAPSHOT:
    - add 'challenge statement 2.pdf'
    - add 'grin.jpg'
    - change line 3 of 'howitworks.txt' to say:
        'is an interactive greeter application.'
    - change line 5 of 'howitworks.txt' to say:
        'Hello, how are you?'

If it runs into places where both people have modified something, it yells at you in BIG RED TEXT to make your own decision about them. Eventually, git has a final list of changes:
changes from base to NEW_MERGED_SNAPSHOT:
    - add 'challenge statement 2.pdf'
    - add 'grin.jpg'
    - change line 3 of 'howitworks.txt' to say:
        'is an interactive greeter application.'
    - change line 5 of 'howitworks.txt' to say:
        'Hello, how are you?'
    - change line 7 of 'howitworks.txt' to say:
        'Then, we set everything on fire with our oxygenation mixin.'

Then, it applies all of the changes it's found to the base, and creates a new snapshot, with two parents.
      *
     / \
    v   v
    *   *
    |   |
    v   v

And there you have it! Pretty simple.
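
Here's the same merge as a rough Python sketch. It follows the simplified model in this walkthrough, where a change list is a dictionary keyed by (file, line number). That's an assumption for illustration, and so is the `merge_changes` function: real git compares file contents rather than keeping lists like this.

```python
def merge_changes(ours: dict, theirs: dict) -> tuple[dict, dict]:
    """Combine two change lists; return (merged, conflicts)."""
    merged = {}
    conflicts = {}
    for key in ours.keys() | theirs.keys():
        if key in ours and key in theirs and ours[key] != theirs[key]:
            # Both people changed the same thing differently: a human decides.
            conflicts[key] = (ours[key], theirs[key])
        else:
            merged[key] = ours[key] if key in ours else theirs[key]
    return merged, conflicts


# Keys are (file, line number); None means the whole file was added.
jane = {
    ("challenge statement 2.pdf", None): "(new file)",
    ("howitworks.txt", 5): "Hello, how are you?",
    ("howitworks.txt", 7): "Then, we set everything on fire.",
}
james = {
    ("grin.jpg", None): "(new file)",
    ("howitworks.txt", 3): "is an interactive greeter application.",
    ("howitworks.txt", 7): "then we activate our oxygenation mixin.",
}

merged, conflicts = merge_changes(jane, james)

# The one conflict (line 7) gets resolved by hand, as in the final list above.
final = dict(merged)
final[("howitworks.txt", 7)] = "Then, we set everything on fire with our oxygenation mixin."
```

Everything in `merged` went through without anyone being asked; only the entries in `conflicts` need a person.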
One thing to note is that git calls snapshots 'commits', for some reason. You don't really need to be committed to make them; you should actually make them pretty often.
It also doesn't give commits easy-to-remember names. They tend to have names like '4949363b0cd1a861ab75e7bb7d1f1b56e4801330' - doesn't exactly roll off the tongue. The names are generated based on the precise state of the folder they store. There are some really interesting reasons for this - it makes quietly changing history practically impossible - but I'm going to resist talking about them now. You should ask me about them later.
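
A small taste of why, as a Python sketch. The `snapshot_name` function is hypothetical and the format it hashes is invented; real git hashes its own object format. The idea it borrows from git is that the name is a SHA-1 hash covering the contents, the extra information, and the parent's name, so touching an old snapshot changes the name of everything built on top of it.

```python
import hashlib


def snapshot_name(contents: bytes, author: str, note: str, parent_name: str) -> str:
    """Derive a 40-character hex name from a snapshot and its extra information.

    Simplified on purpose: this is not git's real object format.
    """
    h = hashlib.sha1()
    for part in (contents, author.encode(), note.encode(), parent_name.encode()):
        # A length prefix keeps neighbouring parts from blurring together.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


first = snapshot_name(b"folder, version 1", "jane", "first snapshot!", "")
second = snapshot_name(b"folder, version 2", "jane", "more work", first)

# Quietly edit the first snapshot and rebuild the second on top of it.
edited_first = snapshot_name(b"folder, version 1 (edited)", "jane", "first snapshot!", "")
second_again = snapshot_name(b"folder, version 2", "jane", "more work", edited_first)

print(second != second_again)  # the child's name changed too
```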
Anyway, if commits are confusingly named, how do we keep track of them? Well, git has a really useful feature called 'references', or 'refs'.
A ref is, basically, a little labeled arrow pointing into history. People will generally have a ref pointing at the current 'official' version of the project called 'master'. If you have a series of commits that diverges from the main version to add a new feature, you should make a ref pointing at the most recent version with a memorable name, like 'bananas'.
Refs that move are called, confusingly, 'branches'; refs that don't are called 'tags'.
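
As a sketch, refs are just a lookup table from memorable names to commit names. The short commit names (`c1`, `c2`, `c3`), the tag name `v1.0`, and the `commit` helper are all made up for illustration; real names would be the long hashes from earlier.

```python
parents = {"c1": []}       # commit name -> parent commit names
branches = {"master": "c1"}
tags = {"v1.0": "c1"}      # a tag keeps pointing where it was put


def commit(branch: str, new_name: str) -> None:
    """Add a commit on top of a branch, then move the branch ref to it."""
    parents[new_name] = [branches[branch]]
    branches[branch] = new_name


# Start a feature branch at the current 'master', then do some work on it.
branches["bananas"] = branches["master"]
commit("bananas", "c2")
commit("bananas", "c3")

print(branches)  # 'bananas' moved forward; 'master' did not
print(tags)      # the tag never moves
```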
...TODO finish this