@misho-kr
Last active January 3, 2023 09:21
Summary of "Introduction to version control with Git" from Datacamp.Org

Discover the importance of version control when working on data science projects and explore how to use Git to track files, compare differences, modify and save files, undo changes, and allow collaborative development through the use of branches. The course introduces the structure of a repository, shows how to create new repositories and clone existing ones, explains how Git stores data, and covers the skills needed to handle conflicting files.

By George Boorman, Analytics and Data Science Curriculum Manager, DataCamp

Resources: Git Cheatsheet

1. Introduction to Git

Learn what version control is and why it is essential for data projects. Discover what Git is and how to use it for a version control workflow.

  • Version control is a group of systems and processes to manage changes made to documents, programs, and directories
  • Why is version control important?
  • Git is not GitHub, but it's common to use Git with GitHub
  • Benefits of Git
  • Using Git
    • Repository
    • Staging and committing
    • Comparing with diff
$ git --version
$ git status
$ git add .
$ git commit -m "initial commit"
$ git diff -r HEAD filename
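
The staging, committing, and diffing steps above can be tried end to end in a throwaway repository. This is a minimal sketch, not part of the course material: the temporary directory, the file name `report.md`, and the user identity are all illustrative assumptions.

```shell
set -eu
# Hypothetical scratch repository; path and file names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'        # local identity so the commit succeeds anywhere
git config user.email 'user@example.com'

echo 'draft' > report.md
git add report.md                          # stage the new file
git commit -q -m 'initial commit'          # save the staged snapshot

echo 'second line' >> report.md            # modify the tracked file
git status --short                         # prints " M report.md" (modified, not staged)
git --no-pager diff HEAD report.md         # compare the working copy with the last commit
```

The diff output marks the appended line with a leading `+`; nothing is recorded in history until the change is staged and committed again.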

2. Making changes

Examine how Git stores data, learn essential commands to compare files and repositories at different times, and understand the process for restoring earlier versions of files in your data projects.

  • The commit structure - metadata, tree, blob
  • Git log and hash
  • What changed between two commits?
  • Unstaging a file and restoring last version of file
  • Customizing the log output
  • Cleaning a repository
$ git log -2
$ git show c27fa856
$ git annotate report.md
$ git reset HEAD summary_statistics.csv
$ git checkout -- summary_statistics.csv
$ git checkout .
$ git log --since='Apr 2 2022' --until='Apr 11 2022'
$ git clean -n && git clean -f
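
As a rough illustration of unstaging, restoring, and cleaning, the sketch below walks through those commands in a scratch repository. The directory, the extra file `temp.txt`, and the file contents are assumptions made for the example.

```shell
set -eu
# Hypothetical scratch repository; names and contents are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'

echo 'mean,median' > summary_statistics.csv
git add summary_statistics.csv
git commit -q -m 'add summary statistics'

echo 'oops' >> summary_statistics.csv
git add summary_statistics.csv               # stage an unwanted change
git reset -q HEAD summary_statistics.csv     # unstage it; the file on disk is still modified
git checkout -- summary_statistics.csv       # discard the modification, restoring the committed version

echo 'scratch' > temp.txt                    # an untracked file
git clean -n                                 # dry run: lists what would be removed
git clean -f                                 # actually delete untracked files
git --no-pager log -1 --oneline              # history is unchanged by any of the above
```

Note that `git checkout -- <file>` and `git clean -f` permanently discard uncommitted work, which is why the dry run with `-n` comes first.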

3. Git workflows

Tips and tricks for configuring Git to make you more efficient! Discover branches, identify how to create and switch to different branches, compare versions of files between branches, merge branches together, and deal with conflicting files across branches.

  • Levels of settings - local repo, global and system
  • Ignoring specific files
  • Branches
    • Creating, reporting, merging
    • The difference between branches
    • Switch between branches
    • Handling conflicts
$ git config --list
$ git config --global user.name 'John Smith'
$ git config --global alias.ci 'commit -m'
$ git checkout -b report
$ git diff main summary-statistics
$ git checkout destination
$ git merge source
$ git mergetool
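
The settings levels and ignore rules can be explored without touching the global configuration. The sketch below is an assumption-laden example: it sets the identity and the `ci` alias at the local (per-repository) level only, and the `*.log` pattern and file names are hypothetical.

```shell
set -eu
# Hypothetical scratch repository; settings here are local, so the global config is untouched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'          # no --global flag: applies to this repository only
git config user.email 'user@example.com'
git config alias.ci 'commit -m'              # local version of the alias shown above

printf '*.log\n' > .gitignore                # illustrative pattern: ignore all .log files
echo 'noise' > debug.log
echo 'notes' > notes.txt
git add .                                    # debug.log is skipped because it matches .gitignore
git ci 'add notes and ignore rules'          # expands to: git commit -m '...'

git --no-pager config --local --list         # show only the repository-level settings
git ls-files                                 # tracked files: .gitignore and notes.txt
```

Local settings override global ones, and global settings override system ones, so a per-repository value like this takes precedence when the same key is set at several levels.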

$ cat merge.txt
<<<<<<< HEAD
this is some content to mess with
content to append
=======
totally different content to merge later
>>>>>>> new_branch_to_merge_later
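
A conflict like the one above can be reproduced deliberately. The following is a sketch under stated assumptions: the scratch directory and commit messages are made up, and resolving with `git checkout --ours` (keeping the current branch's version) is only one possible resolution; editing the file by hand or using `git mergetool` are alternatives.

```shell
set -eu
# Hypothetical scratch repository used to provoke a conflict on purpose.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main" on any Git version
git config user.name 'Example User'
git config user.email 'user@example.com'

echo 'this is some content to mess with' > merge.txt
git add merge.txt
git commit -q -m 'base version'

git checkout -q -b new_branch_to_merge_later
echo 'totally different content to merge later' > merge.txt
git commit -q -am 'rewrite on the branch'

git checkout -q main
echo 'content to append' >> merge.txt
git commit -q -am 'append on main'

git merge new_branch_to_merge_later || true  # the merge stops with a conflict (non-zero exit)
cat merge.txt                                # shows the <<<<<<<, =======, >>>>>>> markers

git checkout --ours merge.txt                # one possible resolution: keep the version from main
git add merge.txt                            # mark the conflict as resolved
git commit -q -m 'resolve conflict, keeping the main version'
```

Everything between `<<<<<<< HEAD` and `=======` comes from the current branch, and everything between `=======` and `>>>>>>>` comes from the branch being merged.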

4. Collaborating with Git

Introduction to remote repositories and how to work with them to synchronize content between the cloud and your local computer. Create new repositories, clone existing ones, and discover a workflow to minimize the risk of conflicts between local and remote repositories.

  • Creating repos
  • Remote repos
  • Collaborating on Git projects
    • Fetching from a remote
    • Synchronizing content
    • Pulling from a remote
    • Pushing to a remote
    • Resolving a conflict
$ git init
$ git init mental-health-workspace

$ git remote -v
$ git clone path-to-project-directory
$ git clone https://github.com/datacamp/project
$ git remote add george https://github.com/george_datacamp/repo

$ git fetch origin main
$ git merge origin/main
$ git pull origin main
$ git push remote local_branch
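
The fetch, merge, and push cycle can be rehearsed offline by letting a local bare repository stand in for a hosted remote. This is a sketch only: the directory names `remote.git`, `alice`, and `bob`, and the user identity, are hypothetical.

```shell
set -eu
# A local bare repository plays the role of the hosted remote; all names are illustrative.
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main   # remote's default branch is "main"

git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name 'Alice Example'
git config user.email 'alice@example.com'
git remote add origin "$work/remote.git"     # register the remote under the name "origin"
echo 'v1' > report.md
git add report.md
git commit -q -m 'first version'
git push -q origin main                      # publish the local branch to the remote

cd "$work"
git clone -q remote.git bob                  # a second collaborator clones the project
cd bob
git remote -v                                # the clone already has "origin" configured

cd "$work/alice"
echo 'v2' >> report.md
git commit -q -am 'second version'
git push -q origin main

cd "$work/bob"
git fetch -q origin main                     # download new commits; working files are unchanged
git merge -q origin/main                     # fast-forward the local branch to the fetched commits
```

`git pull origin main` performs the fetch and the merge in one step. Pulling before starting work and before pushing keeps the local branch close to the remote one, which reduces the chance of conflicts.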