mhimanshu0101/gitpull.md

## gitpull.md

      
## Why is cloning a new Git repo so slow?

Cloning can take far longer than the repo size suggests: a 100 MB repo may take around 5-10 minutes to clone even on a 100 MB/s connection, where the raw transfer alone would need about a second.
To understand why, we need to look at how Git stores file changes and how it fetches a repo over the network.
Git stores a full snapshot of every file you change in a commit, not a diff.
Suppose your repo has 2,000 commits with 20 files changed in each: that is 2,000 × 20 = 40,000 snapshots, plus the number of files already in the repo.
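
The estimate above can be checked with a little shell arithmetic. This is only a rough sketch: the variable names are made up for illustration, and a real repository also stores tree and commit objects, so its true object count will be higher.

```shell
# Rough estimate of file snapshots, using the numbers from the text.
commits=2000            # commits in the history
files_per_commit=20     # files changed in each commit
snapshots=$((commits * files_per_commit))
echo "estimated snapshots: $snapshots"

# Inside a real clone, Git can list every object reachable from any ref:
#   git rev-list --all --objects | wc -l
```

To compare against an actual repository, run the commented command inside a clone; `git count-objects -v` also reports how many objects are loose and how many are packed.
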
When you run git clone, it internally uses git fetch.
Remote URLs are generally of two types, HTTP(S) or SSH, and both run over TCP.
TCP opens every connection with a three-way handshake (SYN, SYN-ACK, ACK): your device has to ask the server whether it can connect before any data is sent.
Suppose the one-way latency to the Git server is 100 ms: delivering the three handshake segments takes about 300 ms, and TLS or SSH setup adds further round trips on top.
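
The 300 ms figure can be reproduced with shell arithmetic, assuming the 100 ms is one-way latency and each of the three handshake segments crosses the network once. The variable names are illustrative, not part of any tool.

```shell
one_way_ms=100                      # assumed one-way latency to the server
handshake_ms=$((3 * one_way_ms))    # SYN + SYN-ACK + ACK, one crossing each
first_data_ms=$((2 * one_way_ms))   # client may send data once SYN-ACK arrives
echo "handshake segments delivered after ${handshake_ms} ms"
echo "client can start sending after ${first_data_ms} ms"
```

Under the same assumption, the client does not have to wait for its final ACK to arrive before sending, so the practical delay before the first request is closer to one round trip.
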
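Git does not open a connection per file, however: with the smart protocol that practically every server uses today, even a repo with 50,000 files is sent as a single packfile over one connection (a few requests for HTTP). Only the legacy "dumb" HTTP protocol fetches objects through many separate requests.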
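Git also builds and compresses the packfile on the server side (the "Counting objects" and "Compressing objects" phases in the clone output), and the client then has to index it and resolve deltas, which adds time on top of the transfer itself.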

## How can you speed up a Git clone?
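If your repo has a long history, make a shallow clone that fetches only the latest commit (REPO_URL is a placeholder for your remote):
git clone --depth=1 REPO_URL
then fetch the rest of the history when you need it:
git fetch --unshallow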
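
The shallow-clone workflow above can be tried end to end on a throwaway repository. The sketch below is an illustration only: the directory names, commit messages, and demo identity are invented, and it uses a file:// URL because Git ignores --depth for plain local paths.

```shell
# Build a small throwaway repository with three commits (illustrative data).
work=$(mktemp -d)
git init -q "$work/src"
for i in 1 2 3; do
  echo "change $i" > "$work/src/file.txt"
  git -C "$work/src" add file.txt
  git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done

# Shallow clone: only the most recent commit is transferred.
git clone -q --depth=1 "file://$work/src" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

# Later, fetch the rest of the history.
git -C "$work/shallow" fetch -q --unshallow
full_count=$(git -C "$work/shallow" rev-list --count HEAD)

echo "commits after shallow clone: $shallow_count"
echo "commits after unshallow:     $full_count"
rm -rf "$work"   # clean up the throwaway directories
```

On a real remote the saving comes from the first step: the server only has to pack the objects for one commit instead of the whole history.
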

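Else:
Create a Git bundle on a machine that already has the repo (FILE is a placeholder for the bundle's path):
git bundle create FILE --all
then copy the bundle over and clone from it:
git clone FILE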
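
The bundle approach can be sketched the same way on a throwaway repository. Everything named here (the temporary directory, repo.bundle, the demo identity) is hypothetical; in practice the bundle would be built once and then copied to wherever the clone happens.

```shell
# Build a small throwaway repository with two commits (illustrative data).
work=$(mktemp -d)
git init -q "$work/src"
for i in 1 2; do
  echo "change $i" > "$work/src/file.txt"
  git -C "$work/src" add file.txt
  git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done

# Pack every ref and its full history into a single file.
git -C "$work/src" bundle create "$work/repo.bundle" --all >/dev/null 2>&1

# Check that the bundle is valid and complete (run from inside a repository).
git -C "$work/src" bundle verify "$work/repo.bundle" >/dev/null 2>&1 && bundle_ok=yes

# Clone straight from the bundle file instead of from a server.
git clone -q "$work/repo.bundle" "$work/copy"
bundle_count=$(git -C "$work/copy" rev-list --count --all)

echo "bundle valid: $bundle_ok"
echo "commits cloned from bundle: $bundle_count"
rm -rf "$work"   # clean up the throwaway directories
```

Because the bundle is an ordinary file, it can be served from static storage, and the Git server does not have to build a fresh pack for every clone. After cloning, point the origin remote at the real server and run git fetch to pick up newer commits.
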

If you own the Git server, take a look at the pack compression documentation (the pack.* options in git config, such as pack.compression and pack.threads).