Advanced Git

by Aijaz Ansari

Gogo Con 5/3/2023

Introduction

What you will learn


correct mental model necessary to understand git
how to manipulate commits and branches
intricacies of reset, switch, merge, rebase
how to fix errors after the fact
how to debug and test using git
how to solve problems using git

Intended audience


Developers

what you can do with git, and how to do it


Their managers

why process is beneficial


Notes


This workshop does not include stuff that's easy to Google
Get your hands dirty
Interrupt at any time

The SHA


git hash-object <file> to get the hash of a file
If even a single byte changes, the SHA changes completely
Two objects with identical SHAs are identical
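A quick way to check these properties yourself. This is only a sketch: the scratch directory and the file names a.txt, b.txt, and c.txt are made up for illustration.

```shell
set -e                       # stop at the first failing command
dir=$(mktemp -d)
cd "$dir"

printf 'hello\n' > a.txt
printf 'hello\n' > b.txt     # same bytes as a.txt
printf 'hellp\n' > c.txt     # exactly one byte differs

sha_a=$(git hash-object a.txt)
sha_b=$(git hash-object b.txt)
sha_c=$(git hash-object c.txt)

echo "a.txt $sha_a"
echo "b.txt $sha_b"          # same hash as a.txt
echo "c.txt $sha_c"          # a completely different hash
```

The file name plays no part in the hash; only the contents do.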

Clean Start

Global configs are stored in ~/.gitconfig.
Local configs are stored in <DIR>/.git/config.
# backup your existing .gitconfig
cp ~/.gitconfig ~/.gitconfig_pre_ggs 

# Add global configs that are useful, IMHO
git config --global alias.cat 'cat-file -p'
git config --global alias.ind 'ls-files -s'
git config --global log.date 'format:%F %R'
git config --global log.abbrevCommit yes
git config --global core.abbrev 4
git config --global alias.l 'log --graph --pretty="%h %C(blue)%aN %C(yellow)%cd% %C(auto)%d%Creset %s"'
git config --global alias.ll '!git --no-pager l'

Faking the user name, email, and times

/bin/rm -rf .git 
git init
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
export GIT_COMMITTER_DATE='2020/03/24 09:00:00 MST'
export GIT_AUTHOR_DATE='2020/03/24 09:00:00 MST'
Internals


The Index is a staging area that contains the next commit
Git stores its data in .git
Git stores objects in .git/objects
find .git/objects -type f to find all objects
git add <file> adds <file> to the Index
git add . recursively adds all new and modified files under the current directory to the Index
git cat is our alias to show the contents of an object
git ind is our alias to show the contents of the Index
Every commit has the following objects:

One or more blob objects
A tree object
A commit object


A commit tree may refer to blobs that were created for previous commits.
Any two files that have the same contents are guaranteed to have the same SHA hash
No two different commits will have the same hash because all of the following data is part of the hash (along with the hashes of the tree and of the parent commits):

Committer Name
Committer Email
Author Name
Author Email
Commit Date
Author Date
Commit Message


A tree object contains blobs or other tree objects
After a commit, the index is not emptied. It contains the contents of the next commit.
A single file may have some changes staged and some unstaged.
git status -s shows two columns for file status

The first column is for staged changes
The second column is for unstaged changes


git diff shows the difference between the Index and the Working Directory
git diff --staged shows the difference between HEAD and the Index
If there are unstaged changes, git diff will create an object before running the diff
Git will keep orphaned (unreachable) objects around for a while (two weeks by default) before the garbage collector prunes them.
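The points above can be tried out in a throwaway repo. This is a sketch, not part of the workshop materials: the file names f1 and f2 and the scratch directory are assumptions, and it uses plain git cat-file rather than the git cat alias so it runs without the global config.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo 'same contents' > f1
cp f1 f2                                 # f2 has the same bytes as f1
git add f1 f2
git commit -q -m 'v1'

commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_f1=$(git rev-parse HEAD:f1)
blob_f2=$(git rev-parse HEAD:f2)
blob_type=$(git cat-file -t "$blob_f1")
echo "$commit_type $tree_type $blob_type"
echo "$blob_f1 $blob_f2"                 # one blob serves both files

# Stage one change to f1, then make a second change without staging it
echo 'staged change' >> f1
git add f1
echo 'unstaged change' >> f1
status=$(git status -s)                  # first column: staged, second: unstaged
echo "$status"

staged=$(git diff --staged)              # HEAD vs Index
unstaged=$(git diff)                     # Index vs Working Directory
```

Here git status -s reports M in both columns for f1, because the file has one staged and one unstaged change.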

Commits

# set up the mailmap
echo "FN <you@example.com>" >> ~/.mailmap

# tell Git to use the mailmap file
git config --global mailmap.file ~/.mailmap
$ git ll
* 4878 FN 2019-07-26 17:22  (HEAD -> main) v2
* d47b FN 2019-07-26 10:13  v1
$ 
Colloquially: HEAD points to main. main points to commit 4878. More
strictly, main is a simple ref that refers to commit 4878. HEAD is a
symbolic ref that refers to main.
HEAD always points to the current commit.
You can check if a ref is a symbolic ref this way: git symbolic-ref HEAD
The contents of .git/HEAD are the referent of HEAD, that is, what HEAD
points to (*HEAD).
# check to see what commit an expression evaluates to
git rev-parse main^
main^ should be read as "the first parent of main"
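To see the two kinds of refs side by side, here is a small sketch in a scratch repo. The repo location and the file f1 are assumptions made for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo 'v1' > f1
git add f1
git commit -q -m 'v1'
first=$(git rev-parse HEAD)
echo 'v2' >> f1
git commit -q -a -m 'v2'

head_file=$(cat .git/HEAD)               # ref: refs/heads/main
head_target=$(git symbolic-ref HEAD)     # refs/heads/main
main_commit=$(git rev-parse main)        # the commit main refers to
head_commit=$(git rev-parse HEAD)        # same commit, reached through main
parent=$(git rev-parse 'main^')          # the v1 commit

echo "$head_file"
echo "$head_target"
echo "$main_commit $head_commit $parent"
```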
Detached HEAD

If HEAD is not a symbolic ref, but a simple ref, it is 'detached'. It
doesn't point to a simple ref, but rather to a commit directly. If you
create a commit off a detached HEAD, you will lose it if HEAD moves,
because nothing will point to it.
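A sketch of how a commit made on a detached HEAD gets stranded, and how to give it a name again. The branch name rescued and the files f1 and f2 are made up for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'v1' > f1
git add f1
git commit -q -m 'v1'
echo 'v2' >> f1
git commit -q -a -m 'v2'

git switch -q --detach 'main^'           # HEAD now refers to a commit directly
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "$state"

echo 'experiment' > f2
git add f2
git commit -q -m 'experiment'            # a commit that no branch refers to
orphan=$(git rev-parse HEAD)

git switch -q main                       # HEAD moves away from the new commit
on_branches=$(git branch --contains "$orphan")   # empty: no branch has it

git branch rescued "$orphan"             # a simple ref keeps the commit alive
rescued=$(git rev-parse rescued)
```

If you did not note the hash before moving HEAD, git reflog still lists it for a while.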
git log --all shows all branches, not just the current branch.
$ git config --global  alias.all  'l --all'
$ git config --global  alias.alll  'll --all'
git commit -a stages all modified and deleted tracked files and then
creates the commit.
git branch safe f790 creates a branch named safe that's a simple ref
to commit f790
git switch -c safe creates a new branch named safe that's a simple
ref to whatever HEAD points to. Then it checks out that branch.
git add --interactive can be used to interactively stage hunks. I prefer
using a GUI app like Tower for this.
Reset

git reset <commit> does one to three things.
git reset --soft <commit> moves what HEAD points to to the commit specified.
git reset --mixed <commit> is the default if no -- option is
specified. It does what reset --soft does and also copies the tree from
the new HEAD to the Index.
git reset --hard <commit> does what --mixed does and also copies the
tree from the new HEAD to the working directory. Any changes in the
working directory are lost.
The default value of  <commit> is HEAD.
If HEAD is detached, git reset works as above with one exception:
Instead of moving what HEAD points to, it moves HEAD only. This makes
sense because when detached, HEAD doesn't point to any branch.
git reset <file> has only one flavor: similar in behavior to the
--mixed case. It copies the version of the file from HEAD to the
Index, effectively unstaging any staged changes of the file.
In the more generic case, git reset <commit> <file> copies the version
of <file> from <commit> to the Index.
git reset --soft followed by git commit can be used to squash multiple
commits into a single one.
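The three flavors, plus the squash trick, can be tried in a scratch repo. This is a sketch under assumptions: the file f1 and the commit messages are invented for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
for n in 1 2 3 4; do
  echo "line $n" >> f1
  git add f1
  git commit -q -m "v$n"
done

# --soft: only the branch moves; Index and Working Directory still hold v4
git reset -q --soft HEAD~3
staged_after_soft=$(git diff --staged --name-only)
git commit -q -m 'v2 to v4, squashed'
count=$(git rev-list --count HEAD)       # v1 plus the squashed commit
last_line=$(tail -n 1 f1)                # the v4 contents survived

# --mixed (the default): the Index is reset too, the edit stays on disk
echo 'line 5' >> f1
git add f1
git reset -q
after_mixed=$(git status -s)             # unstaged modification only

# --hard: the Working Directory is overwritten as well; the edit is lost
git reset -q --hard
after_hard=$(git status -s)              # nothing left to report
```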
Checkout/Switch

git switch <branch> is always working-directory-safe. You will
never lose unstaged changes with git switch <branch>. To switch to a
commit that is not a branch, use git switch --detach <commit>.
switch always moves HEAD, never what HEAD points to.
switch can be used to reattach a detached HEAD. If the Index and WD
are different from HEAD, and you're attaching HEAD to the same commit
you're already on, the Index and WD will not be overwritten. See Merge
Strategies for more information.
git checkout [<commit>] <file> updates the index and working directory
with the version of file from commit. commit defaults to HEAD. This is
not working-directory-safe. It behaves the way one would expect
git reset --hard <commit> <file> to work, if such a thing existed.
Use restore instead of checkout.
git restore --staged f1 unstages f1 - It copies f1 from HEAD to the staging area.
git restore f1 unedits f1 - It copies f1 from the staging area to the working directory.
git restore --source cccc f1  copies f1 from commit cccc to the working directory.
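A sketch of the three restore forms in a scratch repo. The file f1 and its contents are made up, and HEAD^ stands in for the cccc commit above.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'v1' > f1
git add f1
git commit -q -m 'v1'

echo 'edited' >> f1
git add f1                               # the edit is now staged
git restore --staged f1                  # HEAD -> Index; the edit stays on disk
s1=$(git status -s)                      # unstaged modification
git restore f1                           # Index -> Working Directory; edit gone
s2=$(git status -s)                      # clean
c1=$(cat f1)

echo 'v2' >> f1
git commit -q -a -m 'v2'
git restore --source 'HEAD^' f1          # v1's f1 -> Working Directory only
c2=$(cat f1)
s3=$(git status -s)                      # the Index still holds v2
```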
Merging


Proper merging should respect both content and history.
Many tools use git log --first-parent to display logs. This is why preservation of history is important.
Fast-forward merges can litter the logs with minutiae of incremental feature changes. In such cases, --no-ff would be desirable.
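A sketch of what --no-ff buys you, in a scratch repo. The branch name feature and the file names are assumptions for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'base' > f1
git add f1
git commit -q -m 'base'

git switch -q -c feature
for n in 1 2; do
  echo "step $n" >> f2
  git add f2
  git commit -q -m "feature step $n"
done

git switch -q main
git merge -q --no-ff -m 'Merge feature' feature   # force a merge commit

second_parent=$(git rev-parse 'HEAD^2')
feature_tip=$(git rev-parse feature)
all_commits=$(git rev-list --count HEAD)                  # base, 2 steps, merge
main_line=$(git rev-list --count --first-parent HEAD)     # base, merge

git log --first-parent --pretty=%s       # the incremental steps are hidden
```

Without --no-ff this merge would fast-forward, and the two step commits would appear directly in the first-parent history of main.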

Merge Strategies

With a normal merge, conflicts must be resolved by the developer.
The recursive merge strategy has two options: -X ours and -X theirs.
With -X ours, if there is a conflict, the conflicting hunks are resolved
using the version from the branch from which merge was invoked.
With -X theirs, if there is a conflict, the conflicting hunks are resolved
using the version from the branch being merged in.
Git also supports a -s ours merge strategy. With -s ours the branch
being merged in is NEVER considered. There are never any conflicts
because the resulting merge commit will have a tree identical to the tree
of the commit from which merge was invoked.
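The difference between -X theirs and -s ours shows up in a scratch repo with one conflicting file. This is a sketch: the branch names try-x-theirs and try-s-ours and the file contents are invented for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'base' > f1
git add f1
git commit -q -m 'A'

git switch -q -c feature
echo 'feature version' > f1
git commit -q -a -m 'C'
git switch -q main
echo 'main version' > f1
git commit -q -a -m 'B'                  # f1 now conflicts with feature

git switch -q -c try-x-theirs main
git merge -q -X theirs -m 'merge -X theirs' feature
x_theirs=$(cat f1)                       # the conflict went to feature

git switch -q -c try-s-ours main
git merge -q -s ours -m 'merge -s ours' feature
s_ours=$(cat f1)                         # feature's content was ignored
tree_after=$(git rev-parse 'HEAD^{tree}')
tree_main=$(git rev-parse 'main^{tree}')

echo "$x_theirs"
echo "$s_ours"
```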
There is no corresponding -s theirs merge strategy. There are two ways
to simulate one. The first does not respect history. The second one does.
Let's assume you want to merge feature into main with feature overriding
main (like a -s theirs):
A----B (HEAD -> main)
|
+----C (feature)

Merge -s theirs strategy 1

git switch feature
git merge -m merge -s ours main
git switch main
git merge -m merge feature
This results in the wrong first and second parent of the merge commit:
A----B
|     XD (HEAD -> main, feature)
+----C

Best Merge -s theirs strategy

Given that:
$ git cat HEAD | grep tree
tree f4a198ba1240ce2951057eecd2ceabbc6fc8641d
$ git rev-parse HEAD^{tree}
f4a198ba1240ce2951057eecd2ceabbc6fc8641d
$ # commit^{tree} refers to the tree of that commit.
Then:
git commit-tree -p main -p feature -m "merge" feature^{tree} 
# creates a commit where the first parent is main,
# and the second parent is feature, 
# and the tree is the tree belonging to the feature commit,
# and finally returns the hash of the newly-created commit
Creates a Commit like this:
A----B (HEAD -> main) 
|     >E ([tree from feature])
+----C (feature)

$ git commit-tree -p main -p feature -m "merge" feature^{tree}
1c79a0d7f13d9e4f400aa86a6a4473c39f410ed7
$ git reset --hard 1c79a0d7f13d9e4f400aa86a6a4473c39f410ed7
All of this can be combined into one snippet:
git reset --hard \
        $(git commit-tree -p main -p feature -m "merge" feature^{tree})
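To convince yourself that the result has the right parents and the right tree, the whole thing can be replayed in a scratch repo. This is a sketch: the file f1 and its contents are assumptions, and the commit messages follow the diagram above.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'base' > f1
git add f1
git commit -q -m 'A'
git switch -q -c feature
echo 'feature version' > f1
git commit -q -a -m 'C'
git switch -q main
echo 'main version' > f1
git commit -q -a -m 'B'

old_main=$(git rev-parse main)
merge=$(git commit-tree -p main -p feature -m 'merge' 'feature^{tree}')
git reset -q --hard "$merge"             # main now refers to the new commit

p1=$(git rev-parse 'HEAD^1')             # first parent: the old main (B)
p2=$(git rev-parse 'HEAD^2')             # second parent: feature (C)
feature_tip=$(git rev-parse feature)
tree_now=$(git rev-parse 'HEAD^{tree}')
tree_feature=$(git rev-parse 'feature^{tree}')
content=$(cat f1)
echo "$content"
```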
Rebasing

Rebasing replays the changes introduced by one or more commits onto a
different branch. It gives those commits a new base, or rebases them.
Rebasing changes history. It should not be used for commits pushed
upstream.
A typical use of rebase is to move the changes on your branch onto the
tip of the upstream branch after the upstream branch has changed. See
slides for more info.
In my opinion, rebasing should be done for individual commits, and merging
should be done for major feature work. This lets future you know what
happened.
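A sketch of that typical use in a scratch repo, with a local main standing in for the upstream branch. The branch and file names are assumptions for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'base' > f1
git add f1
git commit -q -m 'A'

git switch -q -c feature
echo 'my change' > f2
git add f2
git commit -q -m 'C'
git switch -q main
echo 'their change' >> f1
git commit -q -a -m 'B'                  # main moved on after feature forked

old_c=$(git rev-parse feature)
git switch -q feature
git rebase -q main                       # replay C on top of B
new_c=$(git rev-parse feature)
new_parent=$(git rev-parse 'feature^')
main_tip=$(git rev-parse main)

git log --pretty=%s                      # C, B, A: a linear history
```

The replayed commit has a new hash because its parent changed, which is why rebasing commits that others already have causes trouble.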
Fixing Errors

Commits that appear 'lost' will be in the reflog for a few weeks (by
default, reflog entries for unreachable commits expire after 30 days),
until git runs its garbage collection. Use git reflog to see the reflog.
Use git commit --amend to replace the last commit's tree with the one at
the Index. The commit message can also be amended. This only works for the
last commit.
Use git rebase -i to make major changes to several earlier commits. You
can change commit messages and trees. You can also delete commits. Like
with rebasing, this should not be done for commits that have been pushed
upstream.
Use git rebase -i --root to rebase the root of the repo. This is done in the slides to show how to squash the first n commits into a single commit.
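A sketch of amending a commit and then recovering a 'lost' one through the reflog, in a scratch repo. The file f1 and the commit messages are made up for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'v1' > f1
git add f1
git commit -q -m 'v1'
echo 'v2' >> f1
git commit -q -a -m 'v2 with a typpo'

before=$(git rev-parse HEAD)
git commit -q --amend -m 'v2'            # a new commit replaces the old one
after=$(git rev-parse HEAD)
previous=$(git rev-parse 'HEAD@{1}')     # the reflog still knows the old one

git reset -q --hard 'HEAD^'              # oops: v2 is no longer on the branch
lost=$(git rev-parse 'HEAD@{1}')         # where HEAD was one move ago
git reset -q --hard "$lost"              # and v2 is back
recovered=$(git rev-parse HEAD)

git reflog | head -n 5
```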
Debugging

Use git grep to grep the current commit's tree for the pattern
specified.
Use git log --grep to grep the logs for the pattern specified.
Use git log -G to list commits where the commit adds or removes lines
matching the pattern specified.
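The three searches answer different questions, which a scratch repo makes visible. This is a sketch: the word needle, the file f1, and the commit messages are invented for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'alpha' > f1
git add f1
git commit -q -m 'add alpha'
echo 'needle' >> f1
git commit -q -a -m 'second change'
echo 'alpha' > f1                        # drops the needle line again
git commit -q -a -m 'fix typo'

# git grep: what is in the tree right now?
in_tree=$(git grep -l alpha)
if git grep -q needle; then needle_now=yes; else needle_now=no; fi

# git log --grep: which commit messages match?
by_message=$(git log --grep='typo' --pretty=%s)

# git log -G: which commits added or removed a matching line?
by_diff=$(git log -G needle --pretty=%s)

echo "$in_tree / $needle_now"
echo "$by_message"
echo "$by_diff"
```

Note that needle is gone from the current tree, yet git log -G still finds both the commit that added it and the one that removed it.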
git bisect

git bisect runs a binary search on the branch. You specify a 'bad'
commit, and a 'good' commit, and git bisects the branch and moves HEAD to
the center. Then you mark HEAD as 'good' or 'bad', and continue. This
continues until git finds the commit that changed from good to bad.
You start a bisect with git bisect start and end it with git bisect reset.
If you have a command that exits 0 for good and non-zero for bad (any code from 1 to 127 except 125, which tells git to skip that commit), you can automate the bisect with git bisect run <command>. This way you can run test suites to pinpoint the commit that broke the build. See slides for more information.
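A sketch of an automated bisect in a scratch repo with eight commits, where the fifth one plants a line reading BUG. The file app.txt, the BUG marker, and the grep-based check are all made up; a real check would be your test suite.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
for n in 1 2 3 4 5 6 7 8; do
  echo "change $n" >> app.txt
  if [ "$n" -eq 5 ]; then echo 'BUG' >> app.txt; fi
  git add app.txt
  git commit -q -m "c$n"
  if [ "$n" -eq 5 ]; then culprit=$(git rev-parse HEAD); fi
done

# bad commit first, then a known good one
git bisect start HEAD HEAD~7 >/dev/null

# the check exits 0 (good) when BUG is absent and 1 (bad) when it is present
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

found=$(git rev-parse refs/bisect/bad)   # bisect leaves this at the first bad commit
git bisect reset >/dev/null 2>&1         # back to where we started

git log -1 --pretty='%h %s' "$found"
```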
Remotes

When you clone a repo, remote tracking branches are created. A local
branch for the current branch (usually main) is also created.
git pull is a git fetch followed by a git merge.
You can choose to git pull --rebase instead. This is essentially a git fetch followed by a git rebase.
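A sketch of the fetch-then-merge sequence using a local directory as the remote. The directory names origin-repo and clone are assumptions for the example.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)
cd "$work"
git init -q origin-repo
cd origin-repo
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'v1' > f1
git add f1
git commit -q -m 'v1'
cd ..

git clone -q origin-repo clone           # creates origin/main and a local main

cd origin-repo
echo 'v2' >> f1
git commit -q -a -m 'v2'                 # upstream moves ahead of the clone
upstream=$(git rev-parse main)
cd ../clone

stale=$(git rev-parse origin/main)       # still v1 until we fetch
git fetch -q origin                      # step 1 of a pull: update origin/main
fetched=$(git rev-parse origin/main)
git merge -q origin/main                 # step 2 of a pull: merge it into main
local_main=$(git rev-parse main)
```

Replacing the last step with git rebase origin/main gives the equivalent of git pull --rebase.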
Git Toolkit

diff

# compare 2 versions of file
git diff main:file server:file
# or 
git diff main server -- file
# or
git diff main..server file

# diff server:file against the working directory
git diff server -- file
# or
git diff server file

# git help diff for more examples
show a version of a file

git show commit:file
# or
git cat commit:file

# show formats data in a way that it
# thinks makes sense

# cat tries to dump the data in its
# original format. cat is our alias.

# git help show
# git help cat-file 
git notes

# add notes to any commit anywhere in 
# your commit history

git notes append HEAD^

# git log shows the notes
# git log --grep includes notes in search

# git help notes
The BFG repo-cleaner

A tool that cleanses bad data from your repo history:

log files
passwords
secret keys

https://rtyley.github.io/bfg-repo-cleaner/
Change log Generator

$ cat git-change-log 
#!/bin/bash

# pass all arguments through to git log, preserving quoting
git log "$@" | grep -e '^ *!' | sed -e 's/^ *!/*/'
$
This script prints only the log lines that start with !, replacing the
leading ! with *.
If you get into the habit of putting change-log comments in your git logs
this way, this script can automate generation of the change log.
When you type in git foo, git searches your path for a file named
git-foo. If that exists, it invokes that file. Therefore, if you save
this script in your path, you can invoke it like this:
git change-log build1..build3
This would display the change log for all changes introduced after
commit build1, up to and including commit build3.
If you invoke the command with
git change-log build1..build3 --reverse
then it will reverse the order of the commits, preserving the
chronological order of the series of change log entries.
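A sketch of the convention in action, running the same pipeline directly in a scratch repo instead of installing the script. The commit messages and the file f1 are invented for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'

echo 'one' >> f1
git add f1
git commit -q -m 'Add parser' -m '! Added a parser for config files'
echo 'two' >> f1
git commit -q -a -m 'Refactor internals'           # no change-log line
echo 'three' >> f1
git commit -q -a -m 'Fix crash' -m '! Fixed crash on empty input'

# same pipeline as the script, oldest commit first
changelog=$(git log --reverse | grep -e '^ *!' | sed -e 's/^ *!/*/')
echo "$changelog"
```

Only the lines marked with ! make it into the change log; the refactoring commit is left out.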
git flow

A branching model to keep you disciplined

all work is done on the feature branch
feature merges into develop
develop merges into release
release merges into main
hotfix branches created when necessary

Git Flow
git cherry-pick

apply the changes introduced by the specified commits,
recording a new commit for each
git cherry-pick commit
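A sketch of picking a single commit from another branch in a scratch repo. The branch name feature and the files fix.txt and wip.txt are made up for the example.

```shell
set -e                                   # stop at the first failing command
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name 'FirstName LastName'
git config user.email 'you@example.com'
echo 'base' > f1
git add f1
git commit -q -m 'A'

git switch -q -c feature
echo 'fix' > fix.txt
git add fix.txt
git commit -q -m 'fix'
echo 'wip' > wip.txt
git add wip.txt
git commit -q -m 'wip'

git switch -q main
echo 'main work' >> f1
git commit -q -a -m 'B'

original=$(git rev-parse 'feature^')     # the fix commit on feature
git cherry-pick "$original" >/dev/null   # apply only that change to main
picked=$(git rev-parse HEAD)
subject=$(git log -1 --pretty=%s)

echo "$original -> $picked ($subject)"
```

main now has the fix but not the work in progress, and the picked commit is a new commit with its own hash.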
Resources and links (in no particular order)


Pro Git
Learn Git Branching
Simulating git merge -s theirs
Why you should stop using Git rebase
Git Bisect Debugging with Feature Branches
Git First-Parent -- Have your messy history and eat it too
Maintaining a consistent linear history for git log --first-parent
Interactive rebase in Git Tower
Interactive rebase from the command line
Git Search and find
The Advanced Git Kit