@indradhanush
Created February 27, 2023 08:33
Sourcegraph: Where do we ensure that uncloned repositories are cloned? And how often? And why?

Where do we ensure that uncloned repositories are cloned? And how often? And why?

Modes of cloning a repo

gitserver may clone a repo when it receives a request to one of the following endpoints:

  1. /exec or /archive: If any of the allowed exec commands (gitCmdAllowList) is "executed" on a repo, we may clone the repo in maybeStartClone
  2. /repo-update: If a request to update a repo is received and the repo is not cloned, we clone it
  3. /repo-clone: If a request to clone the repo is received
  4. /search: If a search is made in a repo and the repo is not cloned

Finally, gitserver may also reclone a repo as part of the cleanup jobs (more on this in the next section).
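
The endpoint list above can be summed up in a small sketch. This is not gitserver's actual code: mayTriggerClone and cloneEndpoints are hypothetical names, and the sketch assumes that a repo which is already cloned is never cloned again by these endpoints. It also ignores the gitCmdAllowList check that applies to /exec and /archive.

```go
package main

import "fmt"

// cloneEndpoints lists the endpoints named above that can trigger a clone.
var cloneEndpoints = map[string]bool{
	"/exec":        true,
	"/archive":     true,
	"/repo-update": true,
	"/repo-clone":  true,
	"/search":      true,
}

// mayTriggerClone is a hypothetical helper, not gitserver's real code.
// Assumption: an already cloned repo is not cloned again here.
func mayTriggerClone(endpoint string, cloned bool) bool {
	return !cloned && cloneEndpoints[endpoint]
}

func main() {
	// "/other" is a made-up endpoint that is not in the list above.
	for _, ep := range []string{"/exec", "/repo-update", "/other"} {
		fmt.Printf("%s may clone an uncloned repo: %v\n", ep, mayTriggerClone(ep, false))
	}
}
```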

Frequency and reclone reasons

  1. How often the cleanup jobs attempt to clone / reclone a repo is controlled by a git config value named sourcegraph.recloneTimestamp. The cleanup job uses it to decide whether the repo needs to be recloned, based on hardcoded repo TTL values:
    • repoTTL: 45 days - reclone because the repo is marked as "old" (if recloning is cheaper than running git-gc)
    • repoTTLGC: 2 days - reclone because git-gc issues have been seen in this repo for longer than this period and the repo has not been recloned for at least the last 2 days
    • sgmRetries: reclone if sg maintenance had too many retries. This is effectively dead code at this point, since sgm is disabled by default