@tuxdna
Last active Oct 25, 2020
Git clone without deep history

Git references

Git clone without deep history

Cloning without deep history (a shallow clone) makes cloning faster when we only need the latest code.

With full history

First run

[tuxdna@valley tmp]$ time git clone git@github.com:tuxdna/jcalltracer.git jct1
Cloning into 'jct1'...
remote: Enumerating objects: 246, done.
remote: Counting objects: 100% (246/246), done.
remote: Compressing objects: 100% (132/132), done.
remote: Total 246 (delta 99), reused 246 (delta 99), pack-reused 0
Receiving objects: 100% (246/246), 7.72 MiB | 587.00 KiB/s, done.
Resolving deltas: 100% (99/99), done.

real	0m22.817s
user	0m0.452s
sys	0m0.143s

Second run

[tuxdna@valley tmp]$ time git clone git@github.com:tuxdna/jcalltracer.git jct2
Cloning into 'jct2'...
remote: Enumerating objects: 246, done.
remote: Counting objects: 100% (246/246), done.
remote: Compressing objects: 100% (132/132), done.
remote: Total 246 (delta 99), reused 246 (delta 99), pack-reused 0
Receiving objects: 100% (246/246), 7.72 MiB | 555.00 KiB/s, done.
Resolving deltas: 100% (99/99), done.

real	0m22.583s
user	0m0.499s
sys	0m0.179s

Without full history

First run

[tuxdna@valley tmp]$ time git clone --depth 1 git@github.com:tuxdna/jcalltracer.git jct3
Cloning into 'jct3'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 50 (delta 0), reused 40 (delta 0), pack-reused 0
Receiving objects: 100% (50/50), 7.68 MiB | 941.00 KiB/s, done.

real	0m17.563s
user	0m0.429s
sys	0m0.153s

Second run

[tuxdna@valley tmp]$ time git clone --depth 1 git@github.com:tuxdna/jcalltracer.git jct4
Cloning into 'jct4'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 50 (delta 0), reused 40 (delta 0), pack-reused 0
Receiving objects: 100% (50/50), 7.68 MiB | 1017.00 KiB/s, done.

real	0m16.735s
user	0m0.383s
sys	0m0.107s

Verification

Verify that the contents are the same using a recursive diff over the source code:

[tuxdna@valley tmp]$ diff -r jct1/src/ jct4/src/
[tuxdna@valley tmp]$ for f in bin java src test-src; do diff -r jct1/$f jct4/$f; done
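
The loop above compares the top-level directories one by one. As a sketch, the same check can be done in a single pass by comparing the whole checkouts and skipping the .git directory, which is expected to differ between a full and a shallow clone. The helper name same_checkout is hypothetical, and the -x (exclude) option is assumed to be available, as it is in GNU diff.

```shell
# same_checkout DIR_A DIR_B
# Hypothetical helper: succeeds (exit status 0) when both working trees
# hold identical files, ignoring Git metadata.
same_checkout() {
    # -r: recurse into subdirectories
    # -q: only report which files differ, not the differences themselves
    # -x .git: skip anything named .git
    diff -rq -x .git "$1" "$2"
}
```

Usage, with the directory names from the runs above: same_checkout jct1 jct4 && echo "working trees match".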

Check history

Only the last commit is present in the depth=1 clone:

[tuxdna@valley tmp]$ cd jct4/
[tuxdna@valley jct4]$ git log --format=oneline | wc -l
1

All commits are present in the full clone:

[tuxdna@valley jct4]$ cd ../jct1
[tuxdna@valley jct1]$ git log --format=oneline | wc -l
40

Check contents:

[tuxdna@valley tmp]$ ls jct4/
bin  java  LICENSE  Makefile  README.md  src  test-src  TODO

Summary

Overall:

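  • Full clone has 246 objects vs 50 objects (in depth=1 clone)
  • Full clone transfers 7.72 MiB vs 7.68 MiB (in depth=1 clone), so the download size is nearly the same for this repo
  • Full clone takes 22.82 seconds (first run) vs 16.74 seconds (in depth=1 clone, second run)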

Full clone

Receiving objects: 100% (246/246), 7.72 MiB | 587.00 KiB/s, done.

real	0m22.817s
user	0m0.452s
sys	0m0.143s

Clone with depth = 1

Receiving objects: 100% (50/50), 7.68 MiB | 1017.00 KiB/s, done.

real	0m16.735s
user	0m0.383s
sys	0m0.107s

This is useful for faster automated builds with git repos.
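
If the full history turns out to be needed later, a shallow clone can be deepened in place instead of cloning again. The following is a minimal sketch that builds a throwaway three-commit repository in a temporary directory (a stand-in for a real remote, so all names and paths here are assumptions), clones it with --depth 1, and then fetches the rest of the history with git fetch --unshallow. It assumes a Git version that supports git rev-parse --is-shallow-repository (2.15 or later).

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Build a throwaway "origin" repository with three commits.
git init -q origin
git -C origin config user.email "demo@example.com"
git -C origin config user.name "Demo"
for i in 1 2 3; do
    echo "revision $i" > origin/file.txt
    git -C origin add file.txt
    git -C origin commit -q -m "commit $i"
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q --depth 1 "file://$work/origin" shallow
cd shallow

is_shallow=$(git rev-parse --is-shallow-repository)   # true or false
shallow_count=$(git rev-list --count HEAD)            # commits visible now

# Fetch the remaining history and turn this into a full clone.
git fetch -q --unshallow
full_count=$(git rev-list --count HEAD)

echo "shallow=$is_shallow commits_before=$shallow_count commits_after=$full_count"
```

For an automated build this step is usually unnecessary; it matters when a tool needs older commits, for example to run git log or git blame.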

Creating a git repo from an existing SVN repo

Pulling SVN repo history into a git repository:

git svn clone https://example.com/svn/project_name

When the SVN repo is huge, the above command may fail partway through. In that case, change into the partial clone and run git svn fetch to resume; repeat until the full history has been cloned:

cd project_name
until git svn fetch; do echo "Retry again in 2 seconds..."; sleep 2; done
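
The loop above retries forever, which can hide a permanent failure such as wrong credentials. A bounded variant might look like the sketch below; retry is a hypothetical helper, and the attempt limit and delay are arbitrary values chosen for illustration.

```shell
# retry MAX DELAY COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
# Note: POSIX sh has no local variables, so max, delay and attempt are global.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in $delay seconds..." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Usage inside the partial clone: retry 50 2 git svn fetch.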

You may also encounter this error:

Retry again in 2 seconds...
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
error: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log.
Automatic cleanup will not be performed until the file is removed.

warning: There are too many unreachable loose objects; run 'git prune' to remove them.

gc --auto: command returned error: 255

Retry again in 2 seconds...

In that case, first prune the repo and then run git garbage collection. Let's first take a look at the size of the checkout:

$ du -sh .git
15.0G	.git

That is a big repository indeed, so let's prune it first:

$ git prune
Checking connectivity: 613666, done.

Check size:

$ du -sh .git
15.0G	.git

There is no change in size, so let's run git gc. This is a resource-intensive operation and may take anywhere from minutes to hours.

$ git gc
Counting objects: 613666, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (231609/231609), done.
Writing objects: 100% (613666/613666), done.
Total 613666 (delta 429315), reused 431082 (delta 302787)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 613666, done.

Check size:

$ du -sh .git
6.0G	.git

That is a huge reduction in the size of the repo: from 15G down to 6G.

Let's try git gc once more:

$ git gc
Counting objects: 613666, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (105081/105081), done.
Writing objects: 100% (613666/613666), done.
Total 613666 (delta 429315), reused 613666 (delta 429315)
Checking connectivity: 613666, done.

$ du -sh .git
6.0G	.git

Size is still the same, so apparently that is the best that git gc can do.
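
The steps above can be bundled into one function. This is only a sketch: housekeeping is a hypothetical helper name, and removing gc.log is included because the error message quoted earlier asks for it. It assumes a Git version that supports git rev-parse --absolute-git-dir (2.13 or later).

```shell
# housekeeping [REPO]
# Hypothetical helper: prune unreachable loose objects, clear the stale
# gc.log that blocks automatic cleanup, then repack with git gc.
# Prints the size of the .git directory before and after.
housekeeping() {
    repo=${1:-.}
    git_dir=$(git -C "$repo" rev-parse --absolute-git-dir) || return 1

    du -sh "$git_dir"              # size before
    git -C "$repo" prune           # drop unreachable loose objects
    rm -f "$git_dir/gc.log"        # the gc error asks for this file to be removed
    git -C "$repo" gc --quiet      # repack; may take a long time on big repos
    du -sh "$git_dir"              # size after
}
```

Usage: run housekeeping from inside the repository, or pass its path, for example housekeeping project_name.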

Most recent contributors across a project

$ find . -name "*.scala" -exec git log -1 -- "{}" \; | grep 'Author:' | grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b' | sort | uniq -c | sort -nr
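
The pipeline above takes the last commit of every matching file, extracts the author email with a regular expression, and counts how often each email appears. A variation, sketched below, asks Git for the email directly with the %ae format placeholder and lists only tracked files via git ls-files, so no regular expression is needed. recent_authors is a hypothetical helper name; the sketch assumes it is run from the repository root and that file names contain no characters that git ls-files would quote.

```shell
# recent_authors PATHSPEC
# Hypothetical helper: for every tracked file matching PATHSPEC, print the
# author email of its most recent commit, then count emails, most frequent first.
recent_authors() {
    git ls-files -- "$1" | while IFS= read -r f; do
        git log -1 --format='%ae' -- "$f"
    done | sort | uniq -c | sort -nr
}
```

Usage: recent_authors '*.scala'. The quotes keep the shell from expanding the pattern so that Git matches it against paths in all subdirectories.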

Moving Files from one Git Repository to Another, Preserving History

Reference: http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/

git remote add r1 https://gist.github.com/tuxdna/40fedb45887e11f98e5279e842bdbf69
git fetch r1
git pull r1 master --allow-unrelated-histories
# resolve conflicts
git commit
git push
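
To see what the commands above do without touching a real project, the following sketch merges two throwaway repositories that share no history. Everything here is an assumption made for illustration: the repository names, the file names, and the helper make_repo. It uses git fetch followed by git merge --allow-unrelated-histories, which is what git pull does in two steps.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# make_repo DIR FILE: hypothetical helper that creates a one-commit repository.
make_repo() {
    git init -q "$1"
    git -C "$1" config user.email "demo@example.com"
    git -C "$1" config user.name "Demo"
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "add $2"
}

make_repo "$work/source" from-source.txt
make_repo "$work/target" from-target.txt

cd "$work/target"
git remote add r1 "$work/source"
git fetch -q r1

# Merge whichever branch the source repository has checked out.
src_branch=$(git -C "$work/source" symbolic-ref --short HEAD)
git merge -q --allow-unrelated-histories -m "merge history from r1" "r1/$src_branch"

# Both original commits plus the merge commit are now reachable from HEAD.
merged_commits=$(git rev-list --count HEAD)
echo "commits after merge: $merged_commits"
```

Because the two repositories touch different files, the merge completes without conflicts; with overlapping paths, the "resolve conflicts" step from the commands above applies.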