A Git clone without deep history (a shallow clone) makes cloning faster when we only need the latest code.
Full clone, first run:
[tuxdna@valley tmp]$ time git clone git@github.com:tuxdna/jcalltracer.git jct1
Cloning into 'jct1'...
remote: Enumerating objects: 246, done.
remote: Counting objects: 100% (246/246), done.
remote: Compressing objects: 100% (132/132), done.
remote: Total 246 (delta 99), reused 246 (delta 99), pack-reused 0
Receiving objects: 100% (246/246), 7.72 MiB | 587.00 KiB/s, done.
Resolving deltas: 100% (99/99), done.
real 0m22.817s
user 0m0.452s
sys 0m0.143s
Full clone, second run:
[tuxdna@valley tmp]$ time git clone git@github.com:tuxdna/jcalltracer.git jct2
Cloning into 'jct2'...
remote: Enumerating objects: 246, done.
remote: Counting objects: 100% (246/246), done.
remote: Compressing objects: 100% (132/132), done.
remote: Total 246 (delta 99), reused 246 (delta 99), pack-reused 0
Receiving objects: 100% (246/246), 7.72 MiB | 555.00 KiB/s, done.
Resolving deltas: 100% (99/99), done.
real 0m22.583s
user 0m0.499s
sys 0m0.179s
Clone with --depth 1, first run:
[tuxdna@valley tmp]$ time git clone --depth 1 git@github.com:tuxdna/jcalltracer.git jct3
Cloning into 'jct3'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 50 (delta 0), reused 40 (delta 0), pack-reused 0
Receiving objects: 100% (50/50), 7.68 MiB | 941.00 KiB/s, done.
real 0m17.563s
user 0m0.429s
sys 0m0.153s
Clone with --depth 1, second run:
[tuxdna@valley tmp]$ time git clone --depth 1 git@github.com:tuxdna/jcalltracer.git jct4
Cloning into 'jct4'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 50 (delta 0), reused 40 (delta 0), pack-reused 0
Receiving objects: 100% (50/50), 7.68 MiB | 1017.00 KiB/s, done.
real 0m16.735s
user 0m0.383s
sys 0m0.107s
Verify that the contents are the same using a recursive diff over the source code (no output means no differences):
[tuxdna@valley tmp]$ diff -r jct1/src/ jct4/src/
[tuxdna@valley tmp]$ for f in bin java src test-src; do diff -r jct1/$f jct4/$f; done
Only the last commit is present in the depth=1 clone:
[tuxdna@valley tmp]$ cd jct4/
[tuxdna@valley jct4]$ git log --format=oneline | wc -l
1
All commits are present in the full clone:
[tuxdna@valley jct4]$ cd ../jct1
[tuxdna@valley jct1]$ git log --format=oneline | wc -l
40
Check contents:
[tuxdna@valley tmp]$ ls jct4/
bin java LICENSE Makefile README.md src test-src TODO
Overall:
- The full clone transfers 246 objects vs 50 objects for the depth=1 clone
- The full clone takes 22.82 seconds vs 16.74 seconds for the depth=1 clone (about 27% less time)
Full clone
Receiving objects: 100% (246/246), 7.72 MiB | 587.00 KiB/s, done.
real 0m22.817s
user 0m0.452s
sys 0m0.143s
Clone with depth = 1
Receiving objects: 100% (50/50), 7.68 MiB | 1017.00 KiB/s, done.
real 0m16.735s
user 0m0.383s
sys 0m0.107s
Shallow clones are useful for speeding up automated builds that only need the latest revision of a git repo.
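The comparison above can be reproduced offline with a small sketch like the one below. It is only an illustration: the scratch directory, the three dummy commits and the demo identity are all made up, and a file:// URL is used because git ignores --depth when cloning from a plain local path. The last step shows git fetch --unshallow, which deepens a shallow clone if a build later turns out to need the full history.

```shell
#!/bin/sh
# Sketch: compare a full clone with a --depth 1 clone of a throwaway local repo.
set -eu

work=$(mktemp -d)                          # hypothetical scratch directory
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
cd "$work"

# file:// forces the normal transport; with a plain local path git ignores --depth.
git clone -q "file://$work/origin" full
git clone -q --depth 1 "file://$work/origin" shallow

full_count=$(git -C full rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
is_shallow=$(git -C shallow rev-parse --is-shallow-repository)
echo "full=$full_count shallow=$shallow_count is_shallow=$is_shallow"

# If the build later needs the full history, the shallow clone can be deepened.
git -C shallow fetch -q --unshallow
unshallowed_count=$(git -C shallow rev-list --count HEAD)
echo "commits after --unshallow: $unshallowed_count"
```

git rev-parse --is-shallow-repository needs a reasonably recent git (2.15 or later); on older versions the presence of a .git/shallow file indicates the same thing.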
Clone an SVN repository with git svn:
git svn clone https://example.com/svn/project_name
When the SVN repo is huge, the above command might fail partway through. In that case, go into the partial clone and run git svn fetch to resume the operation, retrying until the full history is cloned:
cd project_name
until git svn fetch; do echo "Retry again in 2 seconds..."; sleep 2; done
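The loop above retries forever, which is fine when you are watching it. For unattended use, a bounded variant may be safer. The sketch below is one possible shape; retry_with_limit is a hypothetical helper name, not something that ships with git, and the attempt limit and delay are arbitrary.

```shell
#!/bin/sh
# Sketch: retry a command with a fixed delay, giving up after a maximum
# number of attempts. retry_with_limit is a hypothetical helper, not part of git.
retry_with_limit() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in $delay seconds..." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Inside the partial clone it would be invoked as: retry_with_limit 50 2 git svn fetch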
You may also encounter this error:
Retry again in 2 seconds...
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
error: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log.
Automatic cleanup will not be performed until the file is removed.
warning: There are too many unreachable loose objects; run 'git prune' to remove them.
gc --auto: command returned error: 255
Retry again in 2 seconds...
In that case, first prune the repo and then run git garbage collection. Let's first take a look at the size of the checkout:
$ du -sh .git
15.0G .git
Well, that is a big repository indeed, so let's prune it first:
$ git prune
Checking connectivity: 613666, done.
Check size:
$ du -sh .git
15.0G .git
There is no change in size, so let's run git gc. This is a resource-intensive operation and it may take anywhere from minutes to hours.
$ git gc
Counting objects: 613666, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (231609/231609), done.
Writing objects: 100% (613666/613666), done.
Total 613666 (delta 429315), reused 431082 (delta 302787)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 613666, done.
Check size:
$ du -sh .git
6.0G .git
That is a huge reduction in the size of the repo: from 15G down to 6G!
Let's try git gc once more:
$ git gc
Counting objects: 613666, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (105081/105081), done.
Writing objects: 100% (613666/613666), done.
Total 613666 (delta 429315), reused 613666 (delta 429315)
Checking connectivity: 613666, done.
$ du -sh .git
6.0G .git
The size is still the same, so apparently that is the best that git gc can do.
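The prune-then-gc sequence above can be wrapped into a script that records the numbers before and after. The sketch below runs against a throwaway repository (the temp directory, dummy commits and demo identity are made up) rather than the 15G clone. It also removes .git/gc.log first, since the error message says automatic cleanup will not run until that file is gone.

```shell
#!/bin/sh
# Sketch: record loose-object count and .git size before and after prune + gc.
set -eu

repo=$(mktemp -d)                          # throwaway repo standing in for the real clone
git init -q "$repo"
cd "$repo"
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
for i in 1 2 3 4 5; do
    seq 1 2000 > data.txt                  # mostly identical content across revisions
    echo "revision $i" >> data.txt
    git add data.txt
    git commit -q -m "commit $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
size_before=$(du -sk .git | cut -f1)

rm -f .git/gc.log      # the error message asks for this file to be removed
git prune              # drop unreachable loose objects
git gc --quiet         # repack everything that is reachable

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')
size_after=$(du -sk .git | cut -f1)
echo "loose objects: $loose_before -> $loose_after (packs: $packs_after)"
echo "size in KiB:   $size_before -> $size_after"
```

git count-objects -v is a cheap way to see how many loose objects are left without walking the whole .git directory, which matters on a repository of this size.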
Count, per author email, how many .scala files that author was the last to commit to:
$ find . -name "*.scala" -exec git log -1 "{}" \; | grep 'Author:' | grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' | sort | uniq -c | sort -nr
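A possible variant of the one-liner above asks git for the author email directly with --format=%ae, which avoids the email regex. This is a sketch rather than a drop-in replacement: it uses git ls-files, so it only looks at tracked files, whereas find also visits untracked ones. The demo repository, the commit_as helper and the example.com addresses are made up for illustration.

```shell
#!/bin/sh
# Sketch: per-author count of .scala files, by the author of each file's last commit.
set -eu

repo=$(mktemp -d)                  # throwaway repo with two made-up authors
git init -q "$repo"
cd "$repo"
git config user.name "Demo"

commit_as() {                      # hypothetical helper: commit one file as a given email
    echo "object $2" > "$2"
    git add "$2"
    git -c user.email="$1" commit -q -m "add $2"
}
commit_as alice@example.com A.scala
commit_as alice@example.com B.scala
commit_as bob@example.com C.scala

# One "git log -1" per tracked .scala file, printing only the author email.
report=$(git ls-files -z -- '*.scala' |
    xargs -0 -n 1 git log -1 --format=%ae -- |
    sort | uniq -c | sort -nr)
echo "$report"
```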
Reference: http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/
git remote add r1 https://gist.github.com/tuxdna/40fedb45887e11f98e5279e842bdbf69
git fetch r1
git pull r1 master --allow-unrelated-histories
# resolve conflicts
git commit
git push
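The steps above can be tried out safely on two throwaway repositories before running them against real ones. In the sketch below everything is local and made up (the temp directory, the make_repo helper, the file names). It uses git fetch followed by git merge instead of git pull, so the result does not depend on pull configuration. The --allow-unrelated-histories flag is needed because git 2.9 and later refuse to merge branches that share no common ancestor.

```shell
#!/bin/sh
# Sketch: merge a second, unrelated repository into the current one, keeping its history.
set -eu

work=$(mktemp -d)                  # hypothetical scratch directory

make_repo() {                      # hypothetical helper: create a one-commit repository
    git init -q "$work/$1"
    git -C "$work/$1" config user.email "demo@example.com"   # placeholder identity
    git -C "$work/$1" config user.name "Demo"
    echo "content of $2" > "$work/$1/$2"
    git -C "$work/$1" add "$2"
    git -C "$work/$1" commit -q -m "add $2"
}
make_repo main_repo main.txt
make_repo other_repo other.txt

cd "$work/main_repo"
# The default branch may be master or main, depending on the local git config.
other_branch=$(git -C "$work/other_repo" symbolic-ref --short HEAD)

git remote add r1 "$work/other_repo"
git fetch -q r1
git merge -q --allow-unrelated-histories -m "merge r1" "r1/$other_branch"

commit_count=$(git rev-list --count HEAD)
file_list=$(git ls-files | sort | tr '\n' ' ')
echo "commits=$commit_count files=$file_list"
```

Here the two repositories touch different files, so the merge is clean; with overlapping paths git stops for conflict resolution, followed by git commit as in the steps above.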