BrianWill/understanding_git.md

## understanding_git.md

      
Before getting into Git, let's establish a few ideas common to all version control systems:

### Diffs between versions of a file

Given two versions of a file, we can find the diff(erence) between them, line by line. For example, given these two versions of a file:

Version A:

```
Roses are red
Violets are green
And so are you
```

Version B:

```
Roses are red
Violets are blue
Sugar is sweet
And so are you
```

...we can express the diff from A to B as:

```diff
-Violets are green
+Violets are blue
+Sugar is sweet
```

...where `+` starts each added line and `-` starts each removed line. (Note that when a line is modified, we express that as removing the line and adding a replacement line.)
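
One common way to compute a line-by-line diff like this is through a longest common subsequence (LCS) of the two versions' lines. Lines in the LCS are unchanged, and every other line was either removed from A or added in B. The Python sketch below is a minimal, quadratic-time illustration of that idea. It is not Git's own diff implementation, which uses more efficient algorithms and a richer output format. `line_diff` is a hypothetical helper name, and it marks unchanged lines with a leading space.

```python
def line_diff(a: list[str], b: list[str]) -> list[str]:
    """Diff from a to b: ' ' = unchanged, '-' = removed, '+' = added (LCS-based sketch)."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk both files, using the table to decide which side to advance.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(" " + a[i])  # part of the LCS: unchanged
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append("-" + a[i])  # dropping a[i] does not shorten the LCS
            i += 1
        else:
            out.append("+" + b[j])  # b[j] is not in the LCS: it was added
            j += 1
    out += ["-" + line for line in a[i:]]  # leftover lines of a were removed
    out += ["+" + line for line in b[j:]]  # leftover lines of b were added
    return out


version_a = ["Roses are red", "Violets are green", "And so are you"]
version_b = ["Roses are red", "Violets are blue", "Sugar is sweet", "And so are you"]

for line in line_diff(version_a, version_b):
    if line[0] != " ":  # show only the changed lines, as in the text
        print(line)
```

Running it prints the same three changed lines as the diff shown earlier.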

### Merging two versions of a file

Given two versions of a file, we may wish to reconcile their differences into one new version. An algorithm can often do this reasonably well if we also have a common ancestor of the two versions (a version of the file from which they both derive). By comparing each of the two versions with the common ancestor, the algorithm can see what changes each version represents (relative to the common ancestor) and then combine those changes into one file. For example:

Version A:

```
Violets are blue
And so are you
```

Version B:

```
Roses are red
Violets are blue
Sugar is sweet
And so are you
```

Version C (the common ancestor):

```
Roses are red
Violets are blue
And so are you
```

- Compared to version C, version A has removed the line `Roses are red`.
- Compared to version C, version B has added the line `Sugar is sweet`.

These changes relative to C do not conflict, so we get this automatic merger:

Version D (the merger of A and B using common ancestor C):

```
Violets are blue
Sugar is sweet
And so are you
```

However, in some cases the changes in the two versions relative to the common ancestor may conflict, and a human will then have to resolve the differences. In the following example, versions A and B edit the same line in different ways, so the merge tool injects nasty lines of `<<<<<<<<<<<<<<`, `================`, and `>>>>>>>>>>>>>` to mark the conflict:

Version A:

```
Roses are green
And so are you
```

Version B:

```
Roses are brown
And so are you
```

Version C (the common ancestor):

```
Roses are red
And so are you
```

Version D (the merger of A and B using common ancestor C):

```
<<<<<<<<<< Version A <<<<<<<<<<
Roses are green
================================
Roses are brown
>>>>>>>>>> Version B >>>>>>>>>>
And so are you
```

As the human user, it is now our responsibility to edit the file into the state we want. This generally means picking which of the two conflicting versions to keep and deleting the unwanted lines, e.g.:

Version D' (after fixing the merge conflict by hand):

```
Roses are green
And so are you
```
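
The three-way merge described above can be sketched in a few dozen lines of Python. This is a simplified approach in the spirit of the classic `diff3` tool, assumed here for illustration. It is not taken from Git, whose merge machinery is considerably more careful. The sketch first finds the ancestor lines that survive unchanged in both versions and treats them as fixed anchors. It then compares the chunks that lie between consecutive anchors. A chunk changed on only one side takes that side's version, and a chunk changed differently on both sides becomes a conflict. `merge3` and `lcs_matches` are hypothetical names. The seven-character markers follow the style Git writes, not the longer illustrative markers shown above.

```python
def lcs_matches(x: list[str], y: list[str]) -> dict[int, int]:
    """Map index in x -> index in y for the lines of one longest common subsequence."""
    n, m = len(x), len(y)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    matches, i, j = {}, 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            matches[i] = j
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return matches


def merge3(base, a, b, label_a="Version A", label_b="Version B"):
    """Merge a and b using their common ancestor base. Returns (lines, conflict_count)."""
    match_a, match_b = lcs_matches(base, a), lcs_matches(base, b)
    # Anchors: ancestor lines kept unchanged in BOTH versions, plus an end sentinel.
    anchors = [(o, match_a[o], match_b[o]) for o in range(len(base))
               if o in match_a and o in match_b]
    anchors.append((len(base), len(a), len(b)))

    merged, conflicts = [], 0
    o0 = a0 = b0 = 0
    for o1, a1, b1 in anchors:
        # The chunk of each file between the previous anchor and this one.
        chunk_o, chunk_a, chunk_b = base[o0:o1], a[a0:a1], b[b0:b1]
        if chunk_a == chunk_o:      # only B changed this region (or neither did)
            merged += chunk_b
        elif chunk_b == chunk_o:    # only A changed this region
            merged += chunk_a
        elif chunk_a == chunk_b:    # both sides made the identical change
            merged += chunk_a
        else:                       # both changed it differently: conflict
            conflicts += 1
            merged += [f"<<<<<<< {label_a}", *chunk_a,
                       "=======", *chunk_b, f">>>>>>> {label_b}"]
        if o1 < len(base):
            merged.append(base[o1])  # the shared anchor line itself
        o0, a0, b0 = o1 + 1, a1 + 1, b1 + 1
    return merged, conflicts


ancestor = ["Roses are red", "Violets are blue", "And so are you"]
version_a = ["Violets are blue", "And so are you"]
version_b = ["Roses are red", "Violets are blue", "Sugar is sweet", "And so are you"]
print(merge3(ancestor, version_a, version_b))
# (['Violets are blue', 'Sugar is sweet', 'And so are you'], 0)

lines, conflicts = merge3(["Roses are red", "And so are you"],
                          ["Roses are green", "And so are you"],
                          ["Roses are brown", "And so are you"])
print("\n".join(lines))  # one conflict region, followed by the unchanged last line
```

Both examples from this section behave as described. The first merge is clean, and the second produces one conflict that a human must resolve.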

### Merging two versions of a directory

If we want to merge two versions of a directory and all their contents, again, it's best if we can use a common ancestor to combine two sets of differences. For example, say we have three versions of one directory:

Directory listing in version A:

```
cat.txt
dog.txt
```

Directory listing in version B:

```
cat.txt
dog.txt
bird.txt
gorilla.bat
```

Directory listing in version C (the common ancestor):

```
cat.txt
dog.txt
bird.txt
```

- Compared to version C, version A deleted `bird.txt`.
- Compared to version C, version B added `gorilla.bat`.

So we get this automatic merger:

Directory listing in version D (the merger of A and B using common ancestor C):

```
cat.txt
dog.txt
gorilla.bat
```

Once we have the merged directory listing, we must also merge together the files common to both versions A and B using their respective common ancestors in C. In this case:

- merge `cat.txt` of A with `cat.txt` of B (using common ancestor `cat.txt` of C)
- merge `dog.txt` of A with `dog.txt` of B (using common ancestor `dog.txt` of C)

Each file merge might produce conflicts which we will have to resolve by hand.
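
The directory-level step can be sketched with plain set operations. This Python sketch assumes each listing is a flat set of file names, which is a hypothetical simplification that ignores subdirectories. `merge_listing` is an illustrative name. The sketch deliberately ignores two cases a real tool must handle: one side deleting a file that the other side modified, and both sides adding a file with the same name but different contents.

```python
def merge_listing(a: set[str], b: set[str], base: set[str]) -> tuple[set[str], set[str]]:
    """Three-way merge of two directory listings using their common ancestor.

    Returns (merged listing, names whose contents still need a three-way merge).
    """
    merged = set()
    for name in a | b | base:
        if name in base:
            # Existed in the ancestor: keep it only if neither side deleted it.
            if name in a and name in b:
                merged.add(name)
        else:
            # Absent from the ancestor, so at least one side added it.
            merged.add(name)
    # Files present in A, B, and the ancestor need their contents merged.
    return merged, a & b & base


listing_a = {"cat.txt", "dog.txt"}
listing_b = {"cat.txt", "dog.txt", "bird.txt", "gorilla.bat"}
listing_c = {"cat.txt", "dog.txt", "bird.txt"}

merged, to_merge = merge_listing(listing_a, listing_b, listing_c)
print(sorted(merged))    # ['cat.txt', 'dog.txt', 'gorilla.bat']
print(sorted(to_merge))  # ['cat.txt', 'dog.txt']
```

The second result lists the files whose contents still need a line-by-line merge. These are the same two merges listed above.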

### How Git stores data

A Git repo (repository) stores a few kinds of things:

- A commit stores a snapshot of the state of a directory and its contents (including subdirectories). Once created, a commit is never modified and (usually) never deleted. A commit can point to other commits which are its parents: the versions of the directory state from which this commit derives. For example, if commit A has parent B, then A is a derivative version created from B. If commit A has parents B and C, then A represents the merger of B and C. The first commit created in a repo is usually the only commit with no parent.
- A ref (reference) is simply a named pointer to a commit. Refs come in two kinds: tags and branches. A tag is meant to identify one particular commit in a fixed way, such as to designate a commit with a particular version number, e.g. `v3.2alpha`, `v0.13`, etc. A branch is meant to denote a commit and all of that commit's ancestors. When you create a new commit A derived from parent B, the branch you currently have checked out (which pointed to B) is automatically updated to point to A; other branches and tags pointing to B are left alone. Branches should generally point only to 'tips' a.k.a. 'heads' (commits with no children).
- The working directory is not part of the repo but rather where we view and edit the files of our project. The repo itself is usually stored in the working directory, in a subdirectory called `.git`. When a Git repo is hosted on a server, we generally don't need or want it to have a working directory; a Git repo with no working directory is said to be 'bare'.
- We create a commit by first staging changes in the index, a.k.a. the staging area. For example, if I want the next commit to include modifications I've made to a file in my working directory, I tell the index to add the file's changes. If I want the next commit to remove a file that exists in the previous commit, I tell the index to remove the file. In other words, the index holds the state the next commit will record, and so determines what changes that commit will represent relative to its parent(s).
- Inside the `.git` repo directory, a text file called `config` stores various options which control how Git operates on the repo.
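
To make these pieces concrete, here is a toy in-memory model in Python. Everything in it is hypothetical and far simpler than Git's real object storage. A `Commit` is an immutable snapshot (a sorted tuple of path/content pairs) plus a tuple of parents. Branches and tags are plain name-to-commit dictionaries, and the index is a dictionary of staged file contents. The working directory is not modeled. The demo shows that committing moves only the checked-out branch, while a tag keeps pointing at its commit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, eq=False)  # frozen: never modified; eq=False: compared by identity
class Commit:
    """An immutable snapshot of the directory plus links to its parent commits."""
    snapshot: tuple[tuple[str, str], ...]  # sorted (path, content) pairs
    parents: tuple["Commit", ...]
    message: str


@dataclass
class Repo:
    commits: list[Commit] = field(default_factory=list)        # every stored commit
    branches: dict[str, Commit] = field(default_factory=dict)  # branch name -> commit
    tags: dict[str, Commit] = field(default_factory=dict)      # tag name -> commit
    head: str = "main"                                         # checked-out branch
    index: dict[str, str] = field(default_factory=dict)        # path -> staged content

    def add(self, path: str, content: str) -> None:
        self.index[path] = content  # stage a new or modified file

    def rm(self, path: str) -> None:
        self.index.pop(path, None)  # stage a removal

    def commit(self, message: str) -> Commit:
        parent = self.branches.get(self.head)
        parents = (parent,) if parent is not None else ()  # first commit: no parent
        # The index persists between commits, so it holds the full next snapshot.
        new = Commit(tuple(sorted(self.index.items())), parents, message)
        self.commits.append(new)
        self.branches[self.head] = new  # only the checked-out branch moves
        return new


def ancestors(commit: Commit) -> list[Commit]:
    """The commit plus everything reachable through its parents: what a branch denotes."""
    result, seen, stack = [], set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            result.append(c)
            stack.extend(c.parents)
    return result


repo = Repo()
repo.add("cat.txt", "meow")
c1 = repo.commit("add cat")
repo.tags["v0.1"] = c1  # a tag: a fixed name for one commit
repo.add("dog.txt", "woof")
c2 = repo.commit("add dog")
repo.rm("cat.txt")
c3 = repo.commit("remove cat")

print([c.message for c in ancestors(repo.branches["main"])])
# ['remove cat', 'add dog', 'add cat']
print(dict(c2.snapshot))  # {'cat.txt': 'meow', 'dog.txt': 'woof'}
```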

- In centralized version control systems, such as Subversion and CVS, a single repo lives on a central server, and each user has their own working directory on their local machine.
- In distributed version control systems, like Git, each user has their own repo, and commits are copied between repos. Typically, users on a project coordinate through a central repo on a server: a user makes new commits on their local machine and then copies them to the repo on the server; to get the work of others, a user copies commits from the repo on the server to their own local repo.

### Git operations

Once you understand how Git stores data, the only hard part in learning to use Git is remembering exactly how the commands affect your repo and working directory. Some commands affect only the working directory, some only the commits, some only the index, and some only the refs. However, several commands affect a mix of all of these things, and some commands affect multiple repos. Here's a quick rundown of the most essential commands:

- To create a new, empty repo, we use the `git init` command.
- To copy a repo, we can simply copy the `.git` directory, but more commonly we use the `git clone` command, which copies the repo and also conveniently sets certain configuration options in the new copy.
- We can copy commits from another repo into our own using the `git fetch` command. If the other repo is on another machine running a Git server, we can fetch over the network.
- To copy commits in the other direction (from our repo to another), we use the `git push` command. If the other repo is on another machine running a Git server, we can push over the network. (The `git push` command not only copies commits; it also modifies certain branches and tags.)
- To stage changes in the index, we use the `git add` and `git rm` commands.
- To create a commit from the changes we've staged, we use the `git commit` command.
- To merge another commit into the current branch, we use `git merge`. If there are no conflicts, Git creates the merge commit itself (or, when the current branch's commit is an ancestor of the other commit, simply moves the branch forward, a "fast-forward"). If there are conflicts, Git stops and leaves the partially merged result in the working directory; once we've fixed the conflicts, we stage the changes (using `git add`) and make the commit (using `git commit`).
- To set our working directory's state to match the state represented by an existing commit, we use the `git checkout` command.
- To create, modify, and delete branches, we use `git branch`.
- The `git pull` command runs `git fetch` and then integrates the fetched commits into the current branch, typically by merging them.
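
Fetch and push amount to copying commits between repos and then updating refs. The toy Python sketch below models only that. `Repo`, `Commit`, `reachable`, `fetch`, `push`, and the `remote_tracking` dictionary are all hypothetical names, and real Git transfers objects over its own protocols. In the sketch, fetch records the remote's branch tips under names like `origin/main` and leaves local branches alone, loosely mirroring Git's remote-tracking branches. Push refuses to move a remote branch unless the new tip descends from the old one, mirroring Git's default rejection of non-fast-forward pushes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, eq=False)  # eq=False: commits are compared by identity
class Commit:
    message: str
    parents: tuple["Commit", ...] = ()


@dataclass
class Repo:
    commits: set[Commit] = field(default_factory=set)                 # commits stored here
    branches: dict[str, Commit] = field(default_factory=dict)         # local branches
    remote_tracking: dict[str, Commit] = field(default_factory=dict)  # e.g. "origin/main"


def reachable(tip: Commit) -> set[Commit]:
    """tip plus all of its ancestors."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(c.parents)
    return seen


def fetch(local: Repo, remote: Repo, remote_name: str = "origin") -> int:
    """Copy commits that local lacks and record the remote's branch tips."""
    copied = 0
    for name, tip in remote.branches.items():
        missing = reachable(tip) - local.commits
        local.commits |= missing
        copied += len(missing)
        # Local branches are untouched; only the record of the remote's tips changes.
        local.remote_tracking[f"{remote_name}/{name}"] = tip
    return copied


def push(local: Repo, remote: Repo, branch: str, remote_name: str = "origin") -> None:
    """Copy a local branch's commits to remote and move the remote's branch."""
    tip = local.branches[branch]
    old = remote.branches.get(branch)
    # Refuse unless the remote's current tip is an ancestor of (or equal to) the new tip.
    if old is not None and old not in reachable(tip):
        raise ValueError(f"push of {branch} rejected: not a fast-forward")
    remote.commits |= reachable(tip)  # copy any commits the remote lacks
    remote.branches[branch] = tip
    local.remote_tracking[f"{remote_name}/{branch}"] = tip


server = Repo()
first = Commit("first")
server.commits.add(first)
server.branches["main"] = first

alice = Repo()
print(fetch(alice, server))  # prints 1
second = Commit("second", (first,))  # Alice's new work on top of "first"
alice.commits.add(second)
alice.branches["main"] = second
push(alice, server, "main")

bob = Repo()
print(fetch(bob, server))  # prints 2
print(bob.remote_tracking["origin/main"].message)  # prints second
```

In this model, `git pull` would correspond roughly to a `fetch` followed by merging the fetched `origin/main` tip into the current branch.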