@threepointone
Last active September 9, 2020 07:57

(This is a draft for something I'm writing internally but figured it would be useful for everyone.)

tl;dr -

  • make sure git --version returns 2.27.0 or higher.
  • git clone --filter=blob:none --sparse <repo> --depth=1
  • cd <repo> (the directory the clone just created)
  • git sparse-checkout set <path> <path> <...path>
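
If you'd rather script the version check in the first step, here's a minimal sketch. The helper name version_ge is made up for this example, and it assumes your sort supports -V (version sort), which GNU coreutils and recent BSD/macOS versions do.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on `sort -V` putting the smaller version first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# `git --version` prints something like "git version 2.27.0";
# the third whitespace-separated field is the version number.
installed=$(git --version | awk '{print $3}')
required=2.27.0

if version_ge "$installed" "$required"; then
    echo "git $installed is new enough"
else
    echo "git $installed is older than $required; please upgrade" >&2
fi
```

A plain string comparison would get this wrong (it would rank 2.9 above 2.27), which is why the sketch leans on version sort.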

So. You've just joined a new product team, you've got a fresh laptop, and you're ready to write some code. You head to the GitHub/Bitbucket/internal git hosting page, and notice the codebase is HUGE. There could be many reasons for this -

  • It could be a so-called 'monorepo', hosting many applications and dependencies, being worked on by many teams concurrently.
  • It could have a long history, possibly spanning decades, and thousands and thousands of commits.
  • It could be holding a number of large files, like movies or large .psd/.ai/etc asset files, heavy on graphics/audio/video.
  • [more reasons?]

Now you could run git clone <path> and head off for a couple of hours to get introduced to office gossip and terrible coffee, but you're smarter than that. If only there were a way to:

  • check out just a slice of the codebase, with only the folders you're interested in.
  • fetch only the latest code, since you're not interested in having the past history of the codebase on your local machine.

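Drum roll... This is totally doable! The git feature is called a 'sparse checkout'. Git has supported sparse checkouts for years, but the dedicated git sparse-checkout command that makes them practical was introduced in January this year (2020). This GitHub post goes into some detail and is a recommended read. Combined with --filter=blob:none, which puts off downloading file contents until a checkout actually needs them, and --depth <n>, which fetches only the last n commits of history, we can get only the part of the repository we want.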
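
To see the effect without touching a real codebase, here's a minimal sketch that builds a throwaway repository locally and clones it with the same flags as the tl;dr. The folder names (apps/web, apps/api, libs/ui), the demo identity, and the temp directory layout are all made up for illustration, and the two uploadpack settings are only there so that a local file:// remote will serve a filtered clone.

```shell
#!/bin/sh
set -eu

# Build a tiny stand-in "monorepo" so the demo needs no network access.
work=$(mktemp -d)
origin="$work/origin"
mkdir -p "$origin/apps/web" "$origin/apps/api" "$origin/libs/ui"
echo "web" > "$origin/apps/web/index.txt"
echo "api" > "$origin/apps/api/index.txt"
echo "ui"  > "$origin/libs/ui/index.txt"
echo "readme" > "$origin/README"

git -C "$origin" init -q
# Let this local "server" answer filtered (partial clone) requests.
git -C "$origin" config uploadpack.allowFilter true
git -C "$origin" config uploadpack.allowAnySHA1InWant true
git -C "$origin" add .
git -C "$origin" -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
echo "more" >> "$origin/README"
git -C "$origin" -c user.name=demo -c user.email=demo@example.com commit -q -am "second"

# Same flags as the tl;dr, pointed at the local repository.
git clone -q --filter=blob:none --sparse --depth=1 "file://$origin" "$work/clone"

# Ask for two of the three folders.
git -C "$work/clone" sparse-checkout set apps/web libs/ui

# Inspect the result: the selected paths, and how much history came along.
git -C "$work/clone" sparse-checkout list
git -C "$work/clone" rev-list --count HEAD
```

After this runs, apps/web and libs/ui are on disk while apps/api is not, and the clone holds a single commit even though the origin has two. Whether root-level files such as README stay in the working tree depends on your git version and on whether cone mode is in use, so the sketch doesn't rely on that.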

(inb4; mercurial fans will love to point out that sparse profiles have been a thing with hg for a long time now, but I'd like to remind them that svn had it for years before that, so phbbt.)

NB: It's worth noting the disclaimer on this page, "THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE." Keep an eye out for any changes in syntax/commands, and we'll make a note to keep this article up to date as we learn of any changes.

NB2: The existing documentation seems to be broken/incorrect, but my above snippet (under tl;dr) works.

Further ideas:

  • Some approximation of sparse 'profiles', so devs don't have to type folder names themselves; they could read them from a plain text file. Different teams could have separate text files with folder lists in them.
  • Some kind of helper to verify that no dependent folders are missing from the checkout, either by calculating a dependency graph or by some other means.
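
The 'profiles' idea could be approximated with a few lines of shell. This is only a sketch: the function names read_profile and apply_profile, the profile file format (one folder per line, with '#' comments allowed), and the example folder names are all assumptions. It relies on the --stdin option of git sparse-checkout set, which reads a newline-delimited list of paths.

```shell
#!/bin/sh
# read_profile FILE: print the folder paths listed in FILE, one per line,
# skipping blank lines and lines that start with '#'. (Hypothetical helper.)
read_profile() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# apply_profile FILE: check out the folders named in the profile.
# Needs to be run from inside a sparse clone. (Hypothetical helper.)
apply_profile() {
    read_profile "$1" | git sparse-checkout set --stdin
}

# Example profile for a made-up web team.
profile=$(mktemp)
cat > "$profile" <<'EOF'
# Folders the web team needs
apps/web

libs/ui
EOF

# Show what would be handed to git; prints apps/web and libs/ui.
read_profile "$profile"
```

Each team could keep its own profile file in the repository root, and a new joiner would run apply_profile on it right after the sparse clone.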