The majority of blogs and documentation about git are one of two things:
- technically correct while being very difficult for newbies to digest,
OR
- technically incorrect "guides" which help people learn, but miss many CS fundamentals and details and/or advise learners to "avoid/ignore" "advanced" features which software engineers use daily on the job.
This workshop serves to instruct on the industry usage of git using methods and theory that are agreed-upon by most senior engineers, and to do so while covering CS and SWE fundamentals. As such, this workshop will address the features of git which many blogs state are "too hard to use" such as rebasing, as these git workflows are considered a fundamental practice of intermediate git operation when collaborating on teams.
git is a command line tool for version tracking created by Linus Torvalds. GitHub is not git. GitHub is a website that provides a GUI (Graphical User Interface) for interacting with git features and functionality, but it is not necessary to use GitHub in order to use git.
git is the most popular version control tool used by developers today. In previous eras, Subversion (SVN) filled a similar role.
An online version of `man git` help: https://git.github.io/htmldocs/git.html
A great instructional site with awesome graphics. (SCM stands for Source Code Management).
https://git.github.io/htmldocs/gitworkflows.html
https://git.github.io/htmldocs/howto-index.html
https://news.ycombinator.com/item?id=3762710
https://www.kernel.org/pub/software/scm/git/docs/user-manual.html
- Avoid merge commits, as they are noisy.
- Don't rebase history that other engineers are working on, or you may make them sad, angry, or worse.
Remember Google Maps? Dijkstra's "shortest path" algorithms continue to serve us.
Princeton Algorithms Class (available as an MOOC on Coursera site with Sedgewick, highly recommended):
https://algs4.cs.princeton.edu/lectures/42DirectedGraphs.pdf
Blog posts from industry engineers that I've collected for this workshop. These blog posts don't seem to say anything outwardly incorrect (a few blogs out there try to make git seem really simple, and by doing so, summarize the features improperly).
From the Jayway post: functional programming helps with concurrent resource access (and multithreading), and with collaboration.
"If ... someone was in the process of iterating through that [old, now-mutated] list, they now get a nice exception."
- So, it's pretty clear that functional data structures are great for concurrent accesses of the same memory!
- It's also really clear that functional data structures are great for multithreaded applications!
- For use cases where you want to update data without affecting what "other people" (and things) are doing.
- This would apply directly to collaborating with multiple people on a code repository.
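A minimal sketch of that idea in git terms, assuming git is installed and a POSIX shell is available. The temporary repository, the file name `list.txt`, and the demo identity are all hypothetical. It suggests that committing a changed file adds a new commit rather than altering the old one, so anyone still reading the old commit sees the old content.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

echo "v1" > list.txt
git add list.txt
git commit -q -m "first version of the list"
old_commit="$(git rev-parse HEAD)"

# "Mutate" the file and commit again: this adds a new commit, the old one is untouched
echo "v2" > list.txt
git commit -q -am "second version of the list"

old_content="$(git show "$old_commit:list.txt")"   # what a reader of the old commit still sees
new_content="$(git show HEAD:list.txt)"
echo "old commit: $old_content, new commit: $new_content"
```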
A commit is a snapshot of the current state of files plus metadata (commit hash, commit message, author, time, pointer to parent...)
C +---> B +---> A
C is master
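To look at that metadata directly, one option is `git cat-file -p`, which prints the raw commit object. The sketch below assumes git is installed; the temporary repository, `notes.txt`, and the demo identity are hypothetical. It builds the A, B, C history from the drawing and prints the commit that `master` labels.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

# Build the history C -> B -> A from the drawing
for msg in A B C; do
  echo "$msg" >> notes.txt
  git add notes.txt
  git commit -q -m "$msg"
done

# Print the raw commit object for C: tree (the snapshot), parent, author, committer, message
commit_object="$(git cat-file -p master)"
echo "$commit_object"
```

The `tree` line is the pointer to the file snapshot, and the `parent` line is the pointer back to B.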
When you make a new commit D, git does the following:
- moves current branch pointer
- makes master point to D, the new / current commit
- C, together with its history C -> B -> A, is the "parent" of our new commit D. That parent is `master^`, and D is `master`.
D +---> C +---> B +---> A
D is master
C is master^
and C is the parent of D
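The drawing above can be checked with `git rev-parse`, which resolves a name such as `master` or `master^` to a commit hash. A small sketch, assuming git is installed; the temporary repository and the empty commits are only there for illustration.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

# History C -> B -> A (empty commits keep the demo short)
for msg in A B C; do
  git commit -q --allow-empty -m "$msg"
done
c_hash="$(git rev-parse master)"        # master labels C

git commit -q --allow-empty -m "D"
d_hash="$(git rev-parse master)"        # master has moved to D
parent_of_d="$(git rev-parse "master^")" # the parent of master

echo "master  = $d_hash"
echo "master^ = $parent_of_d"
git log --format=%s master              # subjects from newest to oldest
```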
- Make the drawings correct! Arrows point "back in time" ... at the previous commit. Check this helpful graphic out
- The git workflow will look at the "previous commit" as the "parent" of the current commit. Check this out. You'll notice in one of the drawings that "the current commit is `master` and the previous commit is `master^`, the parent of `master`." That really makes sense!
- A commit can have many labels, including its hash, pointer to previous commit (its parent), author, time, etc.
- HEAD is a label for "currently active commit".
- A branch name is just a label for a commit. You can `git checkout 0289789c` or `git checkout branch-name`, right? Try it.
- So you can check out either a branch-name, OR a commit by its hash.
- So it makes sense to conclude that a branch-name is related to a commit.
- Hence, a branch-name is pretty much a label for a commit. Checking out a branch name or a commit makes the label `HEAD` point at that commit.
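One way to see the "label" idea is `git show-ref --heads`, which lists every branch next to the commit hash it points at. The sketch below assumes git is installed; the repository and the branch name `feature` are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "first commit"
git branch feature                      # a second label on the same commit
commit_hash="$(git rev-parse HEAD)"

git show-ref --heads                    # each line: <commit hash> <ref name>

# Checking out by branch name or by hash lands on the same commit
git checkout -q feature
via_branch="$(git rev-parse HEAD)"
git checkout -q "$commit_hash"
via_hash="$(git rev-parse HEAD)"
echo "via branch: $via_branch"
echo "via hash:   $via_hash"
```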
The `.git` directory in your repo root will have a text file called HEAD that shows you what branch you're checked out on.
- `git checkout master`
- Navigate to the `.git` directory and `cat HEAD`. You'll see something like:
$ cd ~/arepo
$ ls -al
$ cd .git
$ cat HEAD
ref: refs/heads/master
$
- Looks like you're on master!
- Checkout a branch
- git "scoots" the HEAD pointer along. Now, "you're at HEAD."
- The details:
- In another terminal window, `git checkout -b new-branch`.
- In the first terminal window (where you're in the `.git` directory), run `cat HEAD` again. You'll see something like:
$ cat HEAD
ref: refs/heads/new-branch
- Clearly your HEAD is at new-branch. Recall that as you make new commits, `HEAD` and the checked-out branch (`master` before, `new-branch` now) will "scoot along." Recall also that `HEAD` and `branch-name` are just labels for commits.
- Does "normal workflow" include "going to .git directory and doing a cat on HEAD text file to see refs?" Nope. I'm just providing an in-depth way to examine "what is going on under the hood with git." Getting into the `.git` directory is a part of understanding git like an engineer.
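For day-to-day use, `git symbolic-ref HEAD` reports the same information as the HEAD text file without leaving the working directory. The sketch below assumes git is installed and uses a hypothetical throwaway repository. It also checks which labels move when a commit is made on `new-branch`.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "first commit"
before="$(git symbolic-ref HEAD)"       # the ref HEAD points at while on master

git checkout -q -b new-branch
after="$(git symbolic-ref HEAD)"        # HEAD now points at the new branch

git commit -q --allow-empty -m "work on new-branch"
head_hash="$(git rev-parse HEAD)"
new_branch_hash="$(git rev-parse new-branch)"
master_hash="$(git rev-parse master)"

echo "before: $before"
echo "after:  $after"
echo "HEAD and new-branch moved together; master stayed at $master_hash"
```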
- Check out a commit
- git does NOT "scoot" a branch label along. HEAD points straight at the commit, so now "you're NOT on a branch."
- Repeat the process above, but check out a specific commit by HASH. (Use `git log` to grab a hash).
- Once you check out that commit, git will warn you that you're in detached HEAD state, and this:
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
... amongst other things, is what's going on.
Why can we make commits without affecting other branches? Well, because in Detached Head State, HEAD points directly at a commit instead of at a branch label. New commits still move HEAD, but no branch label scoots along with them, so the changes that you make while in Detached Head State don't get saved to any branch, unless you do stuff (such as creating a branch at that commit). But committing while in Detached Head State isn't really a common workflow.
If you `cat` the HEAD text file to see local refs, you'll see:
$ cat HEAD
987654321
~/arepo/.git ((HEAD detached at 987654321)*) $
Note the listing of a commit hash rather than a branch is another indication that you're in Detached Head State.
- These labels "scoot along" as you make new commits ([see helpful graphics](https://medium.com/girl-writes-code/git-is-a-directed-acyclic-graph-and-what-the-heck-does-that-mean-b6c8dec65059)).
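The detached HEAD behavior can be sketched in a throwaway repository, assuming git is installed. The repository and the branch name `rescue` are hypothetical. The script detaches HEAD, makes an experimental commit, confirms `master` did not move, and then keeps the experiment by giving it a branch label.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
master_before="$(git rev-parse master)"

# Check out the older commit by hash: HEAD is now detached
git checkout -q "$(git rev-parse "master^")"
if git symbolic-ref -q HEAD >/dev/null; then
  state="on a branch"
else
  state="detached"
fi

# Commit while detached: HEAD moves, but no branch label follows it
git commit -q --allow-empty -m "experiment"
experiment="$(git rev-parse HEAD)"
master_after="$(git rev-parse master)"

# "Doing stuff" to keep the work: put a branch label on the commit
git checkout -q -b rescue
echo "state was: $state; master unchanged; rescue -> $experiment"
```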
See the git SCM site for more details on branching and merging
- `git branch`: The concept of branching is fairly simple and has been covered, see the above link for details.
- `git merge`: We're going to discuss how merging works on a team using GitHub. There are indeed CLI strategies and strategy options for merging, `ours`, `theirs`, et al. But you probably won't use those on a team.
Instead, you'll be following this kind of workflow.
- Workshop team activity: Pull, commit, push, Github PR, merge
- Pull down master from remote (more on `git pull` later and why it can be a problem)
- Install packages (think of this as static and/or dynamic linking and loading, if you're a compiler person)
- Create a new branch off of master
- Now your HEAD is at the new branch
- Write code locally, save it, and commit it.
- Push your branch up to the remote repo.
- Request a PR review from your teammates.
- Address issues. Push up new commits as needed.
- After approval, merge the branch.
- That merge will create what is called a "merge commit." That's a commit that happens when you merge stuff. Merge commits create extra "noise" in the commit history of your repository. That's why a lot of engineers use `git fetch` followed by `git rebase` or `git merge` instead of using `git pull`. More on Stack Overflow about git pull
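To see what a merge commit looks like, the sketch below merges two diverged branches in a throwaway repository. It assumes git is installed; the branch name `feature` and the empty commits are hypothetical stand-ins for real work. The resulting commit has two parents, which is the extra "noise" in the graph.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "base"
git checkout -q -b feature
git commit -q --allow-empty -m "your feature work"
git checkout -q master
git commit -q --allow-empty -m "teammate's work"

# The branches have diverged, so this cannot fast-forward: git records a merge commit
git merge -q --no-edit feature

merge_count="$(git rev-list --merges --count master)"
parents="$(git log -1 --format=%P master)"   # two hashes, one per merged line of history
echo "merge commits on master: $merge_count"
git log --graph --format=%s master
```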
Great, that was pretty easy.
Use cases in which that may not suffice:
- What about if you have a branch for longer than a day? (a long term feature development branch, for example). Tons of commits will happen to master by the team between the time you pull down changes, and the time you push up your branch for a PR.
- What about if your teammate merges some code into master that you need for your local branch/work?
In that case, you want to "update the master commit" that YOUR branch branches off of.
You need to "update" your local working repository and "sync" it with the latest version of master.
Remember how I mentioned all those blog posts by developers that tell you to "just ignore git rebase because it's hard"?
Yeah, that's what we are going to do: `git rebase`.
- From your branch: `git pull -r origin master`
- What's this do? A rebase pull, instead of a `fetch && merge` pull like above. `-r` stands for rebase.
- We are syncing up with the remote set as `origin`, and the `master` branch.
- This will pull in the latest changes from master and then "replay" YOUR local commits "on top of" the master branch commits.
- Like this: (note that this is not syntactically correct, just conceptual):
$ git log
your commit yesterday 982374328
comment rad stuff
your commit the day before yesterday 47328472831798
comment even radder stuff
your teammate's commit today
comment super awesome stuff
your teammate's commit yesterday
comment super super great stuff
... See? It's no longer just "sequential."
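The conceptual log above can be reproduced locally. This is a sketch under stated assumptions: git is installed, a local bare repository stands in for the real `origin`, and the directory names (`me`, `teammate`), file names, and commit messages are hypothetical.

```shell
set -eu
work="$(mktemp -d)"   # throwaway workspace (hypothetical)
cd "$work"
git init -q --bare origin.git                       # stand-in for the shared remote
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# Your clone: create the base commit and publish master
git init -q me
cd me
git symbolic-ref HEAD refs/heads/master
git config user.name "You"
git config user.email "you@example.com"
git remote add origin "$work/origin.git"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git push -q origin master

# Your teammate lands a commit on master in the meantime
cd "$work"
git clone -q "$work/origin.git" teammate
cd teammate
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "your teammate's commit"
git push -q origin master

# Back on your branch: commit, then rebase-pull the latest master
cd "$work/me"
git checkout -q -b feature
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "your commit"
git pull -q -r origin master

history="$(git log --format=%s)"
echo "$history"   # your commit is replayed on top of your teammate's commit
```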
- Workshop team activity: Interactive rebase, resolve merge conflicts
- Do not do this on a public or shared branch. Avoid making people sad, angry, or worse.
- You may also need to create a fresh branch based on upstream.
- You may also run into a conflict, which you'll need to resolve using `git rebase --continue`, `git rebase --skip`, and commits, all intertwined together, as covered here in git scm and here on Hacker News.
- See also: interactive rebase
- Don't be scared of git rebase
- Make sure you force push your branch up, and remember, other people can't be working on it, or they'll be sad, angry, or worse after you rewrite history!
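A rebase conflict can be rehearsed safely in a throwaway repository before trying it on real work. The sketch below assumes git is installed; the file `config.txt`, its contents, and the branch name `feature` are hypothetical. Both branches change the same line, the rebase stops, and the conflict is resolved with `git add` followed by `git rebase --continue`.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

echo "color = red" > config.txt
git add config.txt
git commit -q -m "base"

git checkout -q -b feature
echo "color = blue" > config.txt
git commit -q -am "feature: blue"

git checkout -q master
echo "color = green" > config.txt
git commit -q -am "master: green"

# Replay feature on top of master: both sides edited the same line, so git stops
git checkout -q feature
if git rebase master >/dev/null 2>&1; then
  status="clean"
else
  status="conflict"
fi

# Resolve by hand (here we simply decide on the final content), stage it, and continue
echo "color = blue" > config.txt
git add config.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1   # GIT_EDITOR=true accepts the commit message as-is

echo "rebase status was: $status"
git log --format=%s
```

`git rebase --abort` is the escape hatch if the resolution goes wrong; it returns the branch to where it was before the rebase started.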
- Workshop team activity: Do a hotfix
from git scm: "At this stage, you’ll receive a call that another issue is critical and you need a hotfix. You’ll do the following:
- Switch to your production branch.
- Create a branch to add the hotfix.
- After it’s tested, merge the hotfix branch, and push to production.
- Switch back to your original story and continue working.
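The four hotfix steps can be sketched as follows, assuming git is installed and that `master` plays the role of the production branch. The branch names `story` and `hotfix`, the file names, and the version strings are hypothetical, and the "push to production" part is left out because it depends on your remote.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway demo repository (hypothetical)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"

echo "1.0" > app.txt
git add app.txt
git commit -q -m "release 1.0"

# You are in the middle of a story when the call comes in
git checkout -q -b story
echo "wip" > story.txt
git add story.txt
git commit -q -m "story: work in progress"

git checkout -q master            # 1. switch to the production branch
git checkout -q -b hotfix         # 2. create a branch for the hotfix
echo "1.0.1" > app.txt
git commit -q -am "hotfix: critical issue"
git checkout -q master
git merge -q hotfix               # 3. merge it (a fast-forward here, so no merge commit)
git branch -q -d hotfix
git checkout -q story             # 4. back to the original story

current_branch="$(git symbolic-ref --short HEAD)"
echo "on $current_branch; master has $(git show master:app.txt)"
```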
- Semantic versioning semver.org
- `npm run release:patch` / major / minor et al
- verify that semver bump worked. look at Changelog if one exists (hopefully you made one!)
- git tags to deploy to CI (`git push && git push --tags` is one option)
- Build artifacts covered by CI
- What is continuous integration
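The tagging step can be tried without a real CI server. This sketch assumes git is installed and uses a local bare repository as a stand-in for the remote; the version numbers, directory names, and commit messages are hypothetical, and whatever `npm run release:patch` does in your project is not reproduced here.

```shell
set -eu
work="$(mktemp -d)"   # throwaway workspace (hypothetical)
cd "$work"
git init -q --bare origin.git     # stand-in for the real remote

git init -q app
cd app
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of local defaults
git config user.name "Workshop Demo"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

git commit -q --allow-empty -m "feature complete"
git tag -a v1.0.0 -m "release 1.0.0"      # annotated tag following semver
git commit -q --allow-empty -m "fix a bug"
git tag -a v1.0.1 -m "release 1.0.1"      # patch bump

git push -q -u origin master              # first push sets the upstream
git push -q && git push -q --tags         # the option mentioned above

latest="$(git describe --tags)"           # most recent tag reachable from HEAD
remote_tags="$(git ls-remote --tags origin)"
echo "latest tag: $latest"
echo "$remote_tags"
```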
http://ericsink.com/vcbe/html/basics_clone.html
https://github.com/psas/psas-git-workshop
https://github.com/git-tips/tips#track-upstream-branch
https://longair.net/blog/2009/04/16/git-fetch-and-merge/
https://stackoverflow.com/questions/292357/what-is-the-difference-between-git-pull-and-git-fetch
https://www.raywenderlich.com/74258/git-tutorial-intermediate
https://www.learnenough.com/git-tutorial Note: this site says to ignore git rebase....we will most definitely not be following that advice. Merge commits are noisy and to be avoided.