@aijaz
Last active May 5, 2023 22:33
Notes for my Advanced Git class

Advanced Git

by Aijaz Ansari

Gogo Con 5/3/2023

Introduction

What you will learn

  • correct mental model necessary to understand git
  • how to manipulate commits and branches
  • intricacies of reset, switch, merge, rebase
  • how to fix errors after the fact
  • how to debug and test using git
  • how to solve problems using git

Intended audience

  • Developers
    • what you can do with git, and how to do it
  • Their managers
    • why process is beneficial

Notes

  • This workshop does not include stuff that's easy to Google
  • Get your hands dirty
  • Interrupt at any time

The SHA

  • git hash-object <file> to get the hash of a file
  • If even a single byte changes, the SHA changes completely
  • Two objects with identical SHAs are identical
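A quick way to see these properties, as a sketch: hash the same bytes twice and a one-byte variation once. The strings are made-up sample data, and --stdin makes git hash-object read standard input instead of a file.

```shell
# Same bytes in, same hash out.
h1=$(printf 'hello\n' | git hash-object --stdin)
h2=$(printf 'hello\n' | git hash-object --stdin)

# Change a single byte and the hash is completely different.
h3=$(printf 'hellp\n' | git hash-object --stdin)

printf '%s\n%s\n%s\n' "$h1" "$h2" "$h3"
```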

Clean Start

Global configs are stored in ~/.gitconfig. Local configs are stored in <DIR>/.git/config.

# backup your existing .gitconfig
cp ~/.gitconfig ~/.gitconfig_pre_ggs 

# Add global configs that are useful, IMHO
git config --global alias.cat 'cat-file -p'
git config --global alias.ind 'ls-files -s'
git config --global log.date 'format:%F %R'
git config --global log.abbrevCommit yes
git config --global core.abbrev 4
git config --global alias.l 'log --graph --pretty="%h %C(blue)%aN %C(yellow)%cd% %C(auto)%d%Creset %s"'
git config --global alias.ll '!git --no-pager l'

Faking the user name, email, and times

/bin/rm -rf .git 
git init
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
export GIT_COMMITTER_DATE='2020/03/24 09:00:00 MST'
export GIT_AUTHOR_DATE='2020/03/24 09:00:00 MST'
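Why pin these values? A rough sketch, assuming throwaway directories from mktemp -d: once the identity and both dates are fixed, the same content committed in two separate repos gets the same commit hash, which makes a walkthrough reproducible. The make_commit helper is hypothetical. Identity is set here through Git's environment variables rather than git config, and the date uses the 'YYYY-MM-DD HH:MM:SS -0700' form.

```shell
# Pin everything that would otherwise vary between runs.
export GIT_AUTHOR_NAME='FirstName LastName' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_AUTHOR_DATE='2020-03-24 09:00:00 -0700'
export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"

# Hypothetical helper: create a fresh repo, commit one file, print the commit hash.
make_commit() {
    dir=$(mktemp -d)
    git -C "$dir" init -q
    echo v1 > "$dir/f1"
    git -C "$dir" add f1
    git -C "$dir" commit -q -m v1
    git -C "$dir" rev-parse HEAD
}

first=$(make_commit)
second=$(make_commit)
printf '%s\n%s\n' "$first" "$second"
```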

Internals

  • The Index is a staging area that contains the next commit
  • Git stores its data in .git
  • Git stores objects in .git/objects
  • find .git/objects -type f to find all objects
  • git add <file> adds <file> to the Index
  • git add . recursively adds all unstaged files to the Index
  • git cat is our alias to show the contents of an object
  • git ind is our alias to show the contents of the Index
  • Every commit has the following objects:
    • One or more blob objects
    • A tree object
    • A commit object
  • A commit tree may refer to blobs that were created for previous commits.
  • Any two files that have the same contents are guaranteed to have the same SHA hash
  • In practice, no two commits will have the same hash, because all of the following data (along with the hashes of the tree and of any parent commits) is part of the hashed content:
    • Committer Name
    • Committer Email
    • Author Name
    • Author Email
    • Commit Date
    • Author Date
    • Commit Message
  • A tree object contains blobs or other tree objects
  • After a commit, the index is not emptied. It still matches the tree that was just committed, which is the starting point for the next commit.
  • A single file may have some changes staged and some unstaged.
  • git status -s shows two columns for file status
    • The first column is for staged changes
    • The second column is for unstaged changes
  • git diff shows the difference between the Index and the Working Directory
  • git diff --staged shows the difference between HEAD and the Index
  • If there are unstaged changes, git diff will create an object before running the diff
  • Git will keep orphaned objects around for a week or two before automatically running the garbage collector.
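The object model in the list above can be poked at directly. A minimal sketch, assuming a throwaway repo from mktemp -d and made-up file names f1 and f2; it spells out git cat-file rather than relying on the git cat alias.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

# Two files with identical contents, one commit.
echo 'same contents' > f1
echo 'same contents' > f2
git add .
git commit -q -m v1

# Each piece of the commit is an object with its own type.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:f1)

# Identical contents share a single blob.
blob_f1=$(git rev-parse HEAD:f1)
blob_f2=$(git rev-parse HEAD:f2)

# The index is not emptied by the commit: it still lists both files.
index_entries=$(git ls-files -s | grep -c .)
echo "$commit_type $tree_type $blob_type $index_entries"
```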

Commits

# set up the mailmap
echo "FN <you@example.com>" >> ~/.mailmap

# tell Git to use the mailmap file
git config --global mailmap.file ~/.mailmap
$ git ll
* 4878 FN 2019-07-26 17:22  (HEAD -> main) v2
* d47b FN 2019-07-26 10:13  v1
$ 

Colloquially: HEAD points to main. main points to commit 4878. More strictly, main is a simple ref that refers to commit 4878. HEAD is a symbolic ref that refers to main.

HEAD always points to the current commit.

You can check if a ref is a symbolic ref this way: git symbolic-ref HEAD

The contents of .git/HEAD are the referent of HEAD, that is, what HEAD points to (*HEAD).

# check to see what commit an expression evaluates to
git rev-parse main^

main^ should be read as "the parent of main"
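To see the two kinds of ref side by side, here is a small sketch in a throwaway repo (the mktemp -d directory and the file f1 are assumptions for illustration). The first branch is named main explicitly so the result does not depend on your defaults.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo v1 > f1 && git add f1 && git commit -q -m v1
v1=$(git rev-parse HEAD)
echo v2 > f1 && git commit -q -a -m v2

head_file=$(cat .git/HEAD)              # raw contents of the symbolic ref
head_target=$(git symbolic-ref HEAD)    # the ref that HEAD refers to
parent=$(git rev-parse 'main^')         # the commit that is the parent of main
echo "$head_file"
```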

Detached HEAD

If HEAD is not a symbolic ref, but a simple ref, it is 'detached'. It doesn't point to a simple ref, but rather to a commit directly. If you create a commit off a detached HEAD, you will lose it if HEAD moves, because nothing will point to it.
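A sketch of detaching HEAD, committing, and keeping that commit reachable. The repo location, the file names, and the branch name rescue are made up for illustration, and git switch assumes Git 2.23 or later.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo v1 > f1 && git add f1 && git commit -q -m v1
echo v2 > f1 && git commit -q -a -m v2

git switch -q --detach 'main^'          # HEAD now names a commit directly
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi

echo experiment > f2 && git add f2 && git commit -q -m experiment
orphan=$(git rev-parse HEAD)            # no branch points at this commit yet

git branch rescue                       # a branch keeps the commit reachable
git switch -q main                      # moving HEAD no longer strands it
rescued=$(git rev-parse rescue)
echo "detached=$detached"
```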

git log --all shows all branches, not just the current branch.

$ git config --global  alias.all  'l --all'
$ git config --global  alias.alll  'll --all'

git commit -a stages all modified and deleted tracked files and then creates the commit.

git branch safe f790 creates a branch named safe that's a simple ref to commit f790.

git switch -c safe creates a new branch named safe that's a simple ref to whatever HEAD points to. Then it checks out that branch.

git add --interactive can be used to interactively stage hunks. I prefer using a GUI app like Tower for this.

Reset

git reset <commit> does one to three things.

git reset --soft <commit> moves what HEAD points to to the commit specified.

git reset --mixed <commit> is the default if no -- option is specified. It does what reset --soft does and also copies the tree from the new HEAD to the Index.

git reset --hard <commit> does what --mixed does and also copies the tree from the new HEAD to the working directory. Any changes in the working directory are lost.

The default value of <commit> is HEAD.

If HEAD is detached, git reset works as above with one exception: Instead of moving what HEAD points to, it moves HEAD only. This makes sense because when detached, HEAD doesn't point to any branch.

git reset <file> has only one flavor, similar in behavior to the --mixed case. It copies the version of the file from HEAD to the Index, effectively unstaging any staged changes to the file.

In the more generic case, git reset <commit> <file> copies the version of <file> from <commit> to the Index.

git reset --soft followed by git commit can be used to squash multiple commits into a single one.
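The squash mentioned above, as a minimal sketch: three small commits in a throwaway repo (directory and file names are assumptions), then the last two folded into one.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

for i in 1 2 3; do
    echo "line $i" >> f1
    git add f1
    git commit -q -m "v$i"
done

# Move the branch back two commits. --soft leaves the Index and the working
# directory alone, so the Index still holds the tree of v3.
git reset -q --soft 'HEAD~2'
git commit -q -m 'v2 and v3 squashed'

commit_count=$(git rev-list --count HEAD)   # v1 plus the squashed commit
line_count=$(grep -c '' f1)                 # all three lines are still there
echo "$commit_count commits, $line_count lines"
```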

Checkout/Switch

git switch <commit> is always working-directory-safe. You will never lose unstaged changes with git switch <commit>.

switch always moves HEAD, never what HEAD points to.

switch can be used to reattach a detached HEAD. If the Index and WD are different from HEAD, and you're attaching HEAD to the same commit you're already on, the Index and WD will not be overwritten. See Merge Strategies for more information.

git checkout [<commit>] <file> updates the index and working directory with the version of file from commit. commit defaults to HEAD. This is not working-directory-safe. It's like one would expect git reset --hard <commit> <file> to work, if such a thing existed.

Use restore instead of checkout.

git restore --staged f1 unstages f1 - It copies f1 from HEAD to the staging area.

git restore f1 unedits f1 - It copies f1 from the staging area to the working directory.

git restore --source cccc f1 copies f1 from commit cccc to the working directory.
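A short sketch of the restore commands in sequence, with git status -s showing the change move from the staged column to the unstaged column and then disappear. The throwaway repo and file contents are assumptions, and git restore needs Git 2.23 or later.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo v1 > f1 && git add f1 && git commit -q -m v1

echo 'version 2' > f1
git add f1
staged=$(git status -s)            # first column: the change is staged

git restore --staged f1            # copy f1 from HEAD to the staging area
unstaged=$(git status -s)          # second column: the change is unstaged

git restore f1                     # copy f1 from the staging area to the working directory
clean=$(git status -s)             # nothing left to report
contents=$(cat f1)
printf '[%s]\n[%s]\n[%s]\n' "$staged" "$unstaged" "$clean"
```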

Merging

  • Proper merging should respect both content and history.
  • Many tools use git log --first-parent to display logs. This is why preservation of history is important.
  • Fast-forward merges can litter the logs with minutiae of incremental feature changes. In such cases, --no-ff would be desirable.

Merge Strategies

With a normal merge, conflicts must be resolved by the developer.

A recursive merge strategy has two options: -X ours and -X theirs.

With -X ours, if there is a conflict, the conflicting hunks are resolved using the version from the branch from which merge was invoked. Non-conflicting changes from both branches are still merged.

With -X theirs, if there is a conflict, the conflicting hunks are resolved using the version from the branch being merged in.
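A sketch of -X theirs on a deliberate conflict: both branches change the same line of f1, and the merge resolves it in favor of feature without stopping. The repo, branch, and file names are illustrative assumptions.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo base > f1 && git add f1 && git commit -q -m A
git switch -q -c feature
echo feature > f1 && git commit -q -a -m C
git switch -q main
echo main > f1 && git commit -q -a -m B

# Both sides changed the same line, so a plain merge would stop with a conflict.
git merge -q -X theirs -m merge feature
resolved=$(cat f1)
echo "$resolved"
```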

Git also supports a -s ours merge strategy. With -s ours the branch being merged in is NEVER considered. There are never any conflicts because the resulting merge commit will have a tree identical to the tree of the commit from which merge was invoked.
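To check the claim about -s ours, this sketch compares the tree before and after the merge. Same illustrative setup as you would use for any conflicting pair of branches; all names are assumptions.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo base > f1 && git add f1 && git commit -q -m A
git switch -q -c feature
echo feature > f1 && git commit -q -a -m C
git switch -q main
echo main > f1 && git commit -q -a -m B

before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours -m merge feature   # feature's content is never considered
after=$(git rev-parse 'HEAD^{tree}')

# The merge is still recorded in history: feature is the second parent.
second_parent=$(git rev-parse 'HEAD^2')
feature_tip=$(git rev-parse feature)
```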

There is no corresponding -s theirs merge strategy. There are two ways to simulate one. The first does not respect history. The second one does.

Let's assume you want to merge feature into main with feature overriding main (like a -s theirs).

A----B (HEAD -> main)
|
+----C (feature)

Merge -s theirs strategy 1

git switch feature
git merge -m merge -s ours main
git switch main
git merge -m merge feature

This results in the wrong first and second parent of the merge commit:

A----B
|     XD (HEAD -> main, feature)
+----C

Best Merge -s theirs strategy

Given that:

$ git cat HEAD | grep tree
tree f4a198ba1240ce2951057eecd2ceabbc6fc8641d
$ git rev-parse HEAD^{tree}
f4a198ba1240ce2951057eecd2ceabbc6fc8641d
$ # commit^{tree} refers to the tree of that commit.

Then:

git commit-tree -p main -p feature -m "merge" feature^{tree} 
# creates a commit where the first parent is main,
# and the second parent is feature, 
# and the tree is the tree belonging to the feature commit,
# and finally returns the hash of the newly-created commit

Creates a Commit like this:

A----B (HEAD -> main) 
|     >E ([tree from feature])
+----C (feature)
$ git commit-tree -p main -p feature -m "merge" feature^{tree}
1c79a0d7f13d9e4f400aa86a6a4473c39f410ed7
$ git reset --hard 1c79a0d7f13d9e4f400aa86a6a4473c39f410ed7

All of this can be combined into one snippet:

git reset --hard \
        $(git commit-tree -p main -p feature -m "merge" feature^{tree})
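An end-to-end sketch of the snippet above in a throwaway repo, followed by checks that the parents are in the right order and that the tree really is feature's. The setup (one conflicting file, branches main and feature) is an assumption for illustration.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo base > f1 && git add f1 && git commit -q -m A
git switch -q -c feature
echo feature > f1 && git commit -q -a -m C
git switch -q main
echo main > f1 && git commit -q -a -m B
main_before=$(git rev-parse main)

# Build the merge commit by hand, then move main (and the working directory) to it.
merge_commit=$(git commit-tree -p main -p feature -m merge 'feature^{tree}')
git reset -q --hard "$merge_commit"

first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
feature_tip=$(git rev-parse feature)
merged_tree=$(git rev-parse 'HEAD^{tree}')
feature_tree=$(git rev-parse 'feature^{tree}')
contents=$(cat f1)
```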

Rebasing

Rebasing replays the changes introduced by one or more commits onto a different branch. It gives those commits a new base, or rebases them.

Rebasing changes history. It should not be used for commits pushed upstream.

A typical use of rebase is to move your branch's changes onto the tip of the upstream branch after the upstream branch has changed. See slides for more info.

In my opinion, rebasing should be done for individual commits, and merging should be done for major feature work. This lets future you know what happened.
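A minimal sketch of that typical case, with main standing in for the upstream branch. The branch and file names are made up, and the commits touch different files so the replay cannot conflict.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo a > a && git add a && git commit -q -m A
git switch -q -c feature
echo c > c && git add c && git commit -q -m C
old_feature=$(git rev-parse feature)
git switch -q main
echo b > b && git add b && git commit -q -m B   # main moves on without feature

git switch -q feature
git rebase -q main                      # replay C on top of B

new_base=$(git rev-parse 'feature^')    # C's parent is now the tip of main
main_tip=$(git rev-parse main)
new_feature=$(git rev-parse feature)    # a new commit: history was rewritten
```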

Fixing Errors

Commits that appear 'lost' remain reachable through the reflog for a while (by default, reflog entries expire after 30 to 90 days), until git's garbage collection prunes them. Use git reflog to see the reflog.

Use git commit --amend to replace the last commit's tree with the one in the Index. The commit message can also be amended. This only works for the last commit.
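A sketch tying amend and the reflog together: amending creates a new commit, and the one it replaced is still reachable as HEAD@{1}. The repo and the commit messages are illustrative assumptions.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo v1 > f1 && git add f1 && git commit -q -m 'v1 with a tpyo'
old=$(git rev-parse HEAD)

git commit -q --amend -m 'v1'           # same tree, corrected message
new=$(git rev-parse HEAD)

# The replaced commit has not gone anywhere yet: the reflog still records it.
previous=$(git rev-parse 'HEAD@{1}')
echo "old=$old new=$new"
```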

Use git rebase -i to make major changes to several earlier commits. You can change commit messages and trees. You can also delete commits. As with rebasing in general, this should not be done for commits that have been pushed upstream.

Use git rebase -i --root to rebase the root of the repo. This is done in the slides to show how to squash the first n commits into a single commit.

Debugging

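Use git grep to search tracked files in the working directory for the pattern specified. Pass a commit, as in git grep <pattern> <commit>, to search that commit's tree instead.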

Use git log --grep to search commit messages for the pattern specified.

Use git log -G to list commits where the commit adds or removes lines matching the pattern specified.

git bisect

git bisect runs a binary search on the branch. You specify a 'bad' commit, and a 'good' commit, and git bisects the branch and moves HEAD to the center. Then you mark HEAD as 'good' or 'bad', and continue. This continues until git finds the commit that changed from good to bad.

You start a bisect with git bisect start and end it with git bisect reset.

If you have a command that exits 0 for good and non-zero for bad (any code from 1 to 127 except 125, which tells bisect to skip the commit), you can automate the bisect with git bisect run <command>. This way you can run test suites to pinpoint the commit that broke the build. See slides for more information.
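An automated bisect in miniature, as a sketch. The six commits, the file f1, and the word bad standing in for a bug are all made up; the test command is a one-liner rather than a real test suite.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

# Six commits; the "bug" (the word bad in f1) appears in the fourth.
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then echo "bad $i" > f1; else echo "ok $i" > f1; fi
    git add f1
    git commit -q -m "v$i"
    if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
    if [ "$i" -eq 4 ]; then culprit=$(git rev-parse HEAD); fi
done

# Start with HEAD as the bad commit and v1 as the good one.
git bisect start HEAD "$good" >/dev/null

# The command exits 0 (good) when f1 has no "bad" in it, 1 (bad) when it does.
git bisect run sh -c '! grep -q bad f1' >/dev/null

first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```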

Remotes

When you clone a repo, remote tracking branches are created. A local branch for the current branch (usually main) is also created.

git pull is a git fetch followed by a git merge.

You can choose to git pull --rebase instead. This is essentially a git fetch followed by a git rebase.
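A sketch of what git pull does, spelled out as its two steps. It uses two local directories as a stand-in for a real remote; the directory names upstream and clone are assumptions, and identity is set through environment variables so both repos can commit.

```shell
export GIT_AUTHOR_NAME='FirstName LastName' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo v1 > "$work/upstream/f1"
git -C "$work/upstream" add f1
git -C "$work/upstream" commit -q -m v1

# Cloning creates the remote tracking branch origin/main and a local main.
git clone -q "$work/upstream" "$work/clone"
tracking=$(git -C "$work/clone" rev-parse --abbrev-ref 'main@{upstream}')

# New work lands upstream after the clone.
echo v2 > "$work/upstream/f1"
git -C "$work/upstream" commit -q -a -m v2

# git pull, one step at a time.
git -C "$work/clone" fetch -q origin
git -C "$work/clone" merge -q origin/main

upstream_head=$(git -C "$work/upstream" rev-parse HEAD)
clone_head=$(git -C "$work/clone" rev-parse HEAD)
```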

Git Toolkit

diff

# compare 2 versions of file
git diff main:file server:file
# or 
git diff main server -- file
# or
git diff main..server file

# diff server:file against the working directory
git diff server -- file
# or
git diff server file

# git help diff for more examples

show a version of a file

git show commit:file
# or
git cat commit:file

# show formats data in a way that it
# thinks makes sense

# cat tries to dump the data in its
# original format. cat is our alias.

# git help show
# git help cat-file 

git notes

# add notes to any commit anywhere in 
# your commit history

git notes append HEAD^

# git log shows the notes
# git log --grep includes notes in search

# git help notes
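A non-interactive sketch of notes: the -m option supplies the note text on the command line instead of opening an editor. The repo and the note text are made up for illustration.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo v1 > f1 && git add f1 && git commit -q -m v1
commit_before=$(git rev-parse HEAD)

git notes add -m 'Reviewed by FN' HEAD
git notes append -m 'Deployed to staging' HEAD

note=$(git notes show HEAD)
log_output=$(git log -1)                # the notes appear under the commit message
commit_after=$(git rev-parse HEAD)      # adding notes does not rewrite the commit
printf '%s\n' "$note"
```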

The BFG repo-cleaner

A tool that cleanses bad data from your repo history:

  • log files
  • passwords
  • secret keys

https://rtyley.github.io/bfg-repo-cleaner/

Change log Generator

$ cat git-change-log 
#!/bin/bash

# "$@" passes every argument through intact, even ones containing spaces
git log "$@" | grep -e '^ *!' | sed -e 's/^ *!/*/'
$

This script prints only the log lines that start with ! (after any leading spaces), replacing the leading ! with *.

If you get into the habit of putting change-log comments in your git logs this way, this script can automate generation of the change log.

When you type in git foo, git searches your path for a file named git-foo. If that exists, it invokes that file. Therefore, if you save this script in your path, you can invoke it like this:

git change-log build1..build3

This would display the change log for all commits reachable from build3 but not from build1, that is, everything after build1 up to and including build3.

If you invoke the command with

git change-log build1..build3 --reverse

then it will list the commits oldest first, so that the change log entries appear in chronological order.
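To try the pipeline without installing the script, here is a sketch that runs it inline against a throwaway repo. The commit messages are made up; two of the three commits carry a ! line in their body.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo v1 > f1 && git add f1
git commit -q -m 'Add login' -m '!Added login page'
echo v2 > f1
git commit -q -a -m 'Fix logout' -m '!Fixed logout bug'
echo v3 > f1
git commit -q -a -m 'Refactor internals'    # no change-log line here

# Same pipeline as the script, oldest commit first.
changelog=$(git log --reverse | grep -e '^ *!' | sed -e 's/^ *!/*/')
printf '%s\n' "$changelog"
```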

git flow

A branching model to keep you disciplined

  • all work is done on the feature branch
  • feature merges into develop
  • develop merges into release
  • release merges into main
  • hotfix branches created when necessary


git cherry-pick

Applies the changes from the specified commits, creating a new commit for each.

git cherry-pick commit
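A sketch of picking a single commit off another branch. The branch feature and the files c1 and c2 are made up; the two feature commits touch different files, so the second can be applied without the first.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo a > a && git add a && git commit -q -m A
git switch -q -c feature
echo c1 > c1 && git add c1 && git commit -q -m C1
echo c2 > c2 && git add c2 && git commit -q -m C2
git switch -q main

git cherry-pick feature >/dev/null      # apply only the tip of feature (C2)

picked_subject=$(git log -1 --format=%s)
picked=$(git rev-parse HEAD)            # a new commit, not feature's own
feature_tip=$(git rev-parse feature)
has_c2=$(git ls-files c2)               # C2's change came along
has_c1=$(git ls-files c1)               # C1's change did not
```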

