I noticed myself passing around similar copies of the same git description to friends and coworkers enough times that I finally decided to write it up for a wider audience, though mostly so I would only have to do it one last time.
First off, this recommendation is for novice git users, particularly those who are coming from a CVS or Subversion background.
This workflow is based on the Branching workflow described in the Pro Git book (which I recommend that you find some time to read). In my experience, I've found that the distributed model just doesn't make sense in a corporate environment where people want there to be a single canonical origin repository.
This is a work in progress. It's currently divided up into two sections, one describing what I hope is an easy walkthrough of recommended workflows, the other delving into much more detail about some of the more interesting, powerful, and helpful things git lets you do with your code history. I'm not completely happy with how different the two sections are, so there is a good chance I will be splitting things up into multiple documents.
We'll start with some basic information. Experienced git users will note that I avoid rebase and related practices that keep the history clean and flat. This is because I've found that a lot of developers get it into their heads that rebase is a Scary and Dangerous Thing, and just the thought of it hinders their ability to become comfortable with git in general.
- Read the Git - SVN Crash Course
- Install a git config file borrowed from someone who knows git. I'll even recommend my .gitconfig, which is based on a lot of help and advice from former coworkers _cxreg and _codon.
- If you use my file, the main things you'll get are some svn-like shortcuts (e.g. st, ci), default merge tool of vimdiff (see the link at the end of this document if you prefer a visual merge tool), and a handy "new" alias that will list committed-but-unpushed changes. There is also some nice bashrc stuff in the repo one level up if you want to check it out.
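The contents of my .gitconfig aren't reproduced here, but as a rough sketch, aliases along these lines can be set up by hand. The definitions below are assumptions for illustration (in particular the `new` alias, which here simply compares against the upstream branch), not a copy of the real file. They are set inside a throwaway repository so nothing global is touched.

```shell
# Hypothetical aliases, configured per-repository in a scratch repo.
# Swap `git config` for `git config --global` to keep them for real.
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"

git config alias.st status
git config alias.ci commit
# Assumed definition: commits on the current branch not yet on its upstream.
git config alias.new 'log --oneline @{upstream}..HEAD'

st_alias=$(git config --get alias.st)
new_alias=$(git config --get alias.new)
echo "st  -> $st_alias"
echo "new -> $new_alias"
```

With these in place, `git st` and `git ci` behave like their svn namesakes, and `git new` only works on a branch that tracks an upstream.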
So here's how I do a basic simple fix:
git clone $REPO    # or `git pull` in an existing clone
... work ...
git pull
... no conflicts ...
git commit -a
git push

If the pull touches files you have modified but not committed, git will stop and complain. This is because git-pull actually triggers a merge, and git refuses to merge over uncommitted changes that it would have to overwrite. For this reason, a better workflow might be:
... work ...
git stash
git pull
git stash pop
... conflicts! ...
git mergetool    # or just fix manually and git-add the files
git stash drop   # more on this below
git commit -a
git push

The stash tool is pretty useful if you are working on a bunch of stuff you're not ready to commit, and want to go off and do something else. It's a stack, so you can pop off different layers, etc. I haven't had much opportunity to use it beyond what I described above, but even that much is really helpful at times.
Note that stash leaves a lot to be desired if there are conflicts. For one, your code stays in the stash stack so you have to manually drop it once you've fixed things.
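To see that behaviour for yourself, here is a small sketch that runs in a scratch repository. The file name and contents are made up, and a plain local commit stands in for the change that git-pull would have brought in.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original" > notes.txt
git add notes.txt
git commit -q -m "initial"

# An uncommitted local edit, stashed away before "pulling".
echo "my local edit" > notes.txt
git stash -q

# Stand-in for `git pull`: someone else changed the same line.
echo "upstream edit" > notes.txt
git commit -q -am "upstream change"

# The pop hits a conflict, so git keeps the stash entry around.
git stash pop >/dev/null 2>&1 || true
stash_count=$(git stash list | wc -l | tr -d ' ')
echo "stash entries after conflicted pop: $stash_count"

# Resolve by hand (here: keep the local edit), then drop the stash manually.
echo "my local edit" > notes.txt
git add notes.txt
git stash drop -q
stash_after_drop=$(git stash list | wc -l | tr -d ' ')
echo "stash entries after drop: $stash_after_drop"
```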
The conflict resolution process is actually the same if you committed before the pull, but without the annoyances of dealing with the stash:
... work ...
git commit -a
git pull
... conflicts! ...
git mergetool
git commit
You could also use rebase in this situation, but I'll avoid talking about that here. Rebase is one of the greatest tools available for the git workflow but it can be a bit scary to new users.
It only took me working through one conflict with vimdiff (a tool I'd never used before) before I felt comfortable. However, if you don't feel comfortable on the command line, I would recommend you take a look at this blog post on integrating git with a visual merge tool.
WARNING: get in the habit of always running mergetool from the repository root, not a subdirectory. Some versions of git can mark things in parent directories as resolved without showing you the diffs.
With anything that's going to take more than a few minutes, I always branch. Not only is it easier to keep track of things (bug number in the branch name) but it makes merges and conflict resolution a bit easier.
Creating and working in your branch is easy:
... clean working tree ...
git checkout master            # just in case
git pull
git checkout -b $BRANCHNAME    # Create branch $BRANCHNAME and switch to it
... work ...
git diff
... read ...
git add $FILE1
git add -p $FILE2              # I will let you look up -p
git diff --cached
... make sure only what I want is staged for commit ...
git commit                     # could also use commit -v instead of diff above
... more work ...
git add $FILE3
git commit -v
At this point, your work should be done and you want to merge your work to the master branch. You basically have two options; both start by getting back to master:
git checkout master
git pull    # in case someone else pushed in new changes
Now you can either merge your two changes as-is to master, like this:
git merge $BRANCHNAME
This command will merge and commit your changes to your local copy of master. git-log will show both of the commits from your branch (and things will get weird if you revert them later and then try to re-merge, because it knows you already applied them).
Or you can squash the merge so it looks more like an svn merge -- one big diff:
git merge --squash $BRANCHNAME
This does not commit because what it's really doing is creating and applying a patch. It lets you view your changes with git-diff and give your changes a new commit message, summarizing the branch. This technique is great for small changes, but as you know with svn, can get messy if you are working on a large project and want to merge things to master in multiple stages. I'd recommend using this for medium sized bug fixes, but not for projects.
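Here is a minimal sketch of that behaviour in a scratch repository. The branch name `bugfix` and the file contents are placeholders. It checks that master has not moved after the squash, that the combined change is sitting in the index, and that the commit you eventually make has a single parent.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this article
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"

# Two commits on a topic branch.
git checkout -q -b bugfix
echo "fix part 1" >> app.txt
git commit -q -am "A"
echo "fix part 2" >> app.txt
git commit -q -am "B"

git checkout -q master
before=$(git rev-parse HEAD)
git merge --squash bugfix >/dev/null
after=$(git rev-parse HEAD)                 # unchanged: nothing committed yet
staged=$(git diff --cached --name-only)     # the squashed change is only staged

git commit -q -m "Fix bug (squashed from bugfix)"
set -- $(git rev-list --parents -n 1 HEAD)  # prints: commit parent...
parents=$(($# - 1))
echo "staged: $staged, parents of new commit: $parents"
```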
Once your work is committed to your local copy of master, it's just a single push command to send it to the canonical/origin repository:

git push
Once your branch has been merged and tests stable for a few days/weeks (whatever matches your deployment cycle), you can just delete it:
git branch -d $BRANCHNAME
Git will remember the history of your work, and if you didn't --squash it will even remember that a branch existed (and show it on the history graph).
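One wrinkle worth knowing about: git-branch -d only deletes a branch whose commits are reachable from somewhere else, and a squashed branch's commits are not, so git refuses and you need -D instead. The sketch below shows both cases in a scratch repository; the branch and file names are invented for the example.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"

# Case 1: a normally merged branch can be deleted with -d.
git checkout -q -b merged-fix
echo "one" >> app.txt
git commit -q -am "fix one"
git checkout -q master
git merge -q --no-edit merged-fix
git branch -d merged-fix >/dev/null

# Case 2: a squash-merged branch is "not fully merged" as far as git can tell.
git checkout -q -b squashed-fix
echo "two" >> app.txt
git commit -q -am "fix two"
git checkout -q master
git merge -q --squash squashed-fix
git commit -q -m "fix two (squashed)"
if git branch -d squashed-fix >/dev/null 2>&1; then
  squashed_deleted=yes
else
  squashed_deleted=no
fi
git branch -D squashed-fix >/dev/null   # force the delete

remaining=$(git branch --list | wc -l | tr -d ' ')
echo "-d worked on squashed branch: $squashed_deleted, branches left: $remaining"
```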
You might have noticed that I didn't say anything about pushing your branch to the origin server. If you want to share your branch with other developers, you can do the following:
git checkout $BRANCH
git push --set-upstream origin $BRANCH
This basically tells git to push your branch to origin, with branch name $BRANCH (for simplicity, I would avoid naming the origin branch differently than your local name), and tells your local branch to track origin/$BRANCH in case you want to pull or push again.
Once your branch has been pushed to origin, other developers just need to run the following in order to see and interact with your work (they can even push changes back to it):
git fetch
git checkout $BRANCH
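If you want to try the whole round trip without a real server, the following sketch uses a local bare repository as a stand-in for origin and two clones as stand-ins for two developers. The names `alice`, `bob`, and `bug-1234` are made up for the example.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# A bare repository plays the part of the canonical origin.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# First developer: create master, then push a topic branch.
git clone -q origin.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "base"
git push -q origin master

git checkout -q -b bug-1234
echo "fix" >> app.txt
git commit -q -am "fix bug 1234"
git push -q --set-upstream origin bug-1234 >/dev/null
upstream=$(git rev-parse --abbrev-ref 'bug-1234@{upstream}')
cd ..

# Second developer: fetch and check out the shared branch by name.
git clone -q origin.git bob
cd bob
git fetch -q
git checkout -q bug-1234
bob_branch=$(git rev-parse --abbrev-ref HEAD)
bob_upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "alice tracks $upstream; bob is on $bob_branch tracking $bob_upstream"
```

Because the second checkout finds exactly one remote branch with that name, git creates the local branch and sets up tracking on its own.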
Files in git have 3 stages: local changes, staged (sometimes called the index), and committed. If you accidentally staged (e.g. with git-add) something that you didn't mean to, you can unstage it with git-reset:

git add $FILE
git reset $FILE
If you have some local changes that you want to blow away (something you would have used svn-revert for) you need to use git-checkout:
vim $FILE
git checkout $FILE
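As a quick sanity check of both commands, this sketch walks one file through the three stages in a scratch repository. The file name and contents are placeholders, and `--` is added before the path only to keep git from mistaking it for a branch name.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"

# Local change, then staged by accident.
echo "v2" > file.txt
git add file.txt
staged_before=$(git diff --cached --name-only)

# Unstage: the index goes back to the committed version, the edit stays on disk.
git reset -q -- file.txt
staged_after=$(git diff --cached --name-only)
content_after_reset=$(cat file.txt)

# Discard the local edit entirely (the svn-revert equivalent).
git checkout -- file.txt
content_after_checkout=$(cat file.txt)

echo "after reset: $content_after_reset, after checkout: $content_after_checkout"
```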
If you ever get hopelessly lost (and don't care about losing any uncommitted or unpushed work), please take a look at git-reset to get you back to a clean up-to-date copy of the origin repository. Just run:

git checkout master
git fetch
git reset --hard origin/master
Lastly, git has excellent documentation. For every git command, you can run:
git help COMMAND # or man git-COMMAND
If it falls short anywhere, it's that it assumes you have read more of it than anyone has (other than Linus). Sometimes it refers to terminology about its internals that makes some of the instructions difficult to follow, but I've found it far more helpful than the man pages for CVS or SVN ever were. Not to mention all of the online books and tutorials (yes, even Pro Git).
Now that you can find your way around a basic workflow, it's time to start thinking about keeping the history clean. This is where git-rebase comes into play.
When I first started using git, despite having some coworkers with expert git skills, it took me a long time to understand how it actually tracks changes. One day while someone was trying to explain the same thing to me for the half-dozenth time, I had an epiphany. So I always include this description up front when talking to people who want to dive deeper into git.
- CVS tracks revisions against individual files -- there is no "revision R" but there is "info.txt, revision R". There are some tricks you can play to keep things relatively in sync across the entire repository, but it's actually quite difficult to go back later and identify a specific group of changes committed together.
- Subversion tracks revisions as a cross-section slice of the entire repository. Not only does "revision R" exist but it exists on every single branch of the entire repository (true branching doesn't exist in Subversion; you only have the convention of calling a particular directory branches/). This works well enough, but I found that it was one of the largest barriers for my own understanding of Git.
Git doesn't track revisions, it tracks objects. As it happens, some of those objects happen to be revisions. Unlike CVS and SVN, git references its commits as a checksum (a.k.a. sha) built from a variety of pieces of meta-data. Among that meta-data are the identity of the parent commit(s) (multiple in the case of merges), the commit message, author, etc. Git doesn't have revisions, it has commits that happen to know where they exist in relation to all of your other commits. From those relationships (and a bunch of other information you can read about in that Pro Git link in the previous paragraph), git is able to build a representation of your code as it exists for any specific commit.
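You can look at one of these objects directly. The sketch below makes two commits in a scratch repository (the file name and messages are arbitrary) and then prints the raw second commit with git-cat-file, which shows the tree, parent, author, committer, and message that feed into its checksum.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > info.txt
git add info.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo "more" >> info.txt
git commit -q -am "second"

# The raw commit object: tree, parent, author, committer, blank line, message.
git cat-file -p HEAD

obj_type=$(git cat-file -t HEAD)
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "type: $obj_type"
echo "parent recorded in the commit: $parent"
```

The parent line is part of the hashed content, which is why the same change attached to a different parent ends up with a different checksum.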
This is where I had my epiphany: unlike Subversion commits, which look like one long ever-progressing timeline, your git commits relate to each other as a web. This is very powerful, but it can also get very messy. And this is where rebase comes into play: you can move your commits around this web by changing their parents.
Rebase, or re-base, is the process of moving and re-attaching the base of a git branch (or even a single commit, in the case of git commit --amend). This looks something like:

o---1---2---3---4 (master)
 \---A---B (topic)

o---1---2---3---4 (master)
                 \---A---B (topic)
Rebase literally works by rewinding your commits until it reaches a common ancestor (in this case, o) and then replaying them one by one as if they had been committed directly on the new base (4).
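A small experiment makes the "new parent, new commit" part concrete. This sketch builds a one-commit topic branch in a scratch repository (all names are placeholders), rebases it onto an updated master, and compares the commit's checksum and parent before and after.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "o" > base.txt
git add base.txt
git commit -q -m "o"

# Commit A on a topic branch that starts at o.
git checkout -q -b topic
echo "a" > a.txt
git add a.txt
git commit -q -m "A"

# Meanwhile master moves on.
git checkout -q master
echo "1" > one.txt
git add one.txt
git commit -q -m "1"
new_base=$(git rev-parse master)

git checkout -q topic
old_a=$(git rev-parse topic)
old_parent=$(git rev-parse 'topic^')

git rebase -q master   # replay A on top of the new master

new_a=$(git rev-parse topic)
new_parent=$(git rev-parse 'topic^')
echo "A before: $old_a (parent $old_parent)"
echo "A after:  $new_a (parent $new_parent)"
```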
Why would you want to do this? Look at what your history graph looks like when you try to merge these two scenarios:
No rebase:

o---1---2---3 (master)
 \---A---B (topic)

o---1---2---3---3B---
 \---A---B---/

With rebase:

o---1---2---3 (master)
             \---A---B (topic)

o---1---2---3---A---B---
The un-rebased merge causes git to create a new commit that references both parents (3 and B), and your history graph now has a bump in it to track the branch. Because the rebased version was already linear, it really just flattens things out when you merge your work back to master. The end result is a much cleaner history. This will make even more sense when we start talking about merge conflicts later on.
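The difference shows up if you count parents on the resulting tip commit. The sketch below sets up both scenarios side by side in one scratch repository; the branch names `merged`, `linear`, and `topic-rebased` are invented so both outcomes can coexist.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Number of parents of a commit: rev-list prints "commit parent parent...".
count_parents() { set -- $(git rev-list --parents -n 1 "$1"); echo $(($# - 1)); }

echo "o" > base.txt; git add base.txt; git commit -q -m "o"

git checkout -q -b topic
echo "a" > a.txt; git add a.txt; git commit -q -m "A"
echo "b" > b.txt; git add b.txt; git commit -q -m "B"

git checkout -q master
echo "1" > one.txt; git add one.txt; git commit -q -m "1"

# A rebased copy of the topic branch.
git branch topic-rebased topic
git checkout -q topic-rebased
git rebase -q master

# No rebase: merging needs a new commit with two parents.
git checkout -q -b merged master
git merge -q --no-edit topic
merged_parents=$(count_parents merged)

# With rebase: the merge is a plain fast-forward.
git checkout -q -b linear master
git merge -q --ff-only topic-rebased
linear_parents=$(count_parents linear)

git log --graph --oneline merged
git log --graph --oneline linear
echo "tip parents without rebase: $merged_parents, with rebase: $linear_parents"
```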
At this point, I feel that we need to take a quick glance backwards at squashed merges. If we still assume that there are no merge conflicts, this is what the graph would look like if you had used a squashed merge instead of rebasing and merging normally:

o---1---2---3---4--- (master)
 \---A---B (topic)
While the code at this new 4 commit will look identical to either of the earlier scenarios, the squashed merge acts as if you had applied a patch from an outside source. It has no history that commit 4 was generated from a diff of A+B against 3. On the other hand, in many ways you are left with an even cleaner branch history. This is why it's great for small and medium sized amounts of work.
Note: The main reason why I think it doesn't work for larger projects is specifically because it erases history. Imagine that your predecessor was asked to implement a new method, which he/she does in commit A, and the PM then asked for some non-trivial change, which was implemented at commit B. Now imagine what happens when the PGM comes to you and asks for the first request to be re-implemented. If your predecessor had squashed the merge, the original functionality would never have existed in the repository history. On the other hand, if he or she had merged normally you would just have to go back and look at commit A (and save yourself a lot of time re-implementing something someone else had already done).
When you merge, a new commit needs to be created to track those changes. Imagine something like this, where a developer needs to pull in changes from master in order to continue work on a particular topic:
o---1---2--- (master)
 \       \
  A-------2A---B--- (topic)
Even if there are no conflicts, git needs to create a commit in order to track when you last synchronized (so it doesn't try to pull in the same changes more than once). If there is a conflict, commit 2A suddenly starts to look a lot more like clutter in the history of your branch, especially if you need to start looking backward to find out why a particular piece of code was created (you'll end up finding pertinent lines of code modified under the highly informative "sync with master at commit 2" instead of "implement feature N"). If you merge your topic branch back to master, you end up with something fairly ugly like this:
o---1---2--------2B--- (master)
 \       \      /
  A-------2A---B (topic)

Now assume that the merge at 2A above resulted in a conflict. Let's watch what happens when you rebase onto 2 instead of merging it:
          A---B--- (rebased-topic)
         /
o---1---2--- (master)
 \
  (A-0)
What happened? Remember that git keeps track of each commit as a relationship with its parents. When you rebase, those parents change so your commit (A, in this case) is basically reinvented in its new place on the tree.
Where is the conflict? Remember the part about rebase re-playing your commits in the order they happen? When you resolve a conflict during a rebase, you're essentially editing the original commit as it would have happened if you had just decided to start your topic branch at commit 2 in the first place. The original A-0 commit will eventually be garbage collected and you'll never have to think about it again.
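Here is a sketch of that conflict scenario in a scratch repository. The file name and its contents are made up, the conflict is resolved by simply keeping the topic branch's value, and GIT_EDITOR is set to `true` only so the script never opens an editor. Afterwards the rewritten A sits on top of master, while the original commit is still in the object database until garbage collection removes it.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "o"

# Commit A on the topic branch changes the line...
git checkout -q -b topic
echo "color = blue" > settings.txt
git commit -q -am "A"
old_a=$(git rev-parse HEAD)

# ...and so does commit 2 on master.
git checkout -q master
echo "color = green" > settings.txt
git commit -q -am "2"
master_tip=$(git rev-parse master)

# Rebase stops at the conflict while replaying A.
git checkout -q topic
git rebase master >/dev/null 2>&1 || true

# Resolve (keep the topic's value), stage it, and continue the replay.
echo "color = blue" > settings.txt
git add settings.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

new_a=$(git rev-parse topic)
new_parent=$(git rev-parse 'topic^')
old_type=$(git cat-file -t "$old_a")   # the original A-0 has not been collected yet
echo "A-0: $old_a (still a $old_type)"
echo "A:   $new_a (parent $new_parent)"
```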
When you merge back to master, your tree will again be clean:

o---1---2---A---B--- (master)
Let's step through the workflow of the merge example from above:
git checkout master
git pull
git checkout -b topic
... work ...
git commit -am "A"
git checkout master
git pull
git checkout topic
git merge master    # Conflict at 2A
git mergetool
git commit
... work ...
git commit -am "B"
git checkout master
git merge topic
git push
Now look at the workflow for rebase:
git checkout master
git pull
git checkout -b topic
... work ...
git commit -am "A"
git fetch
git rebase origin/master    # Conflict at 2A
git mergetool
git rebase --continue
... work ...
git commit -am "B"
git checkout master
git pull --rebase           # need to do this to pull in commits 1 and 2
git merge topic
git push

Besides calling rebase instead of merge, there are only a few differences:

- You don't need to actually check out your local copy of master in order to rebase. Because git-pull triggers a merge, you need to actually be "using" the master branch in order to babysit it in case something goes wrong. When you use git-rebase, you can work directly against the up to date branch data from origin rather than your local copy -- everything will catch up later.
- When you fix a merge conflict, you commit. When you fix a conflict during rebase, you tell it to continue re-playing your commits.
- git-pull has a --rebase parameter that tells it to rebase your local master rather than merge it. It's not actually necessary here because we don't have any commits on our local master, but it's a good habit to get into so you can avoid dealing with merging differences between your local master and origin/master.
As far as your actual work is concerned, these are all quite trivial. From a higher level, both workflows look like:
- checkout master and make sure it's up to date
- create branch
- do some work
- synchronize with master and call git-mergetool when there is a problem
- merge branch back to master and push it up to origin.
Given the benefits of rebase, combined with the fact that you're not really doing any extra work, it's definitely a tool worth learning.
Let's revisit the basic workflow from the end of the first section:
... work ...
git commit -a
git pull --rebase
... conflicts! ...
git mergetool
git rebase --continue

The only differences are that we've added the now-familiar --rebase to git-pull and that we use git-rebase --continue instead of committing the resolved merge. The high level workflow stays the same.
And now for the branch-based bugfix workflow:
... clean working tree ...
git checkout master            # just in case
git pull --rebase              # just in case
git checkout -b $BRANCHNAME
... work ...
git add $FILE1
git add -p $FILE2
git commit -v                  # make sure only what I want is staged for commit
... more work ...
git add $FILE3
git commit -v

# And now the rebase and merge
git fetch
git rebase origin/master
... conflicts! ...
git mergetool
git rebase --continue
git checkout master
git pull --rebase
git merge $BRANCHNAME
git push
I hope this has been informative. If you would like to learn more or see other people's take on things, check out the list below, which I hope to add to over time: