@toraritte
Created August 1, 2022 02:02
Re-imagining the documentation for Nix's `builtins.fetchGit`

See the original builtins.fetchGit docs, and also the comment in Nix issue #5128.


builtins.fetchGit referenceToGitRepo -> storeResult

Fetch a Git repo.

0. Pseudo Types

referenceToGitRepo = URL | path | gitArgs

URL :: string = httpURL | httpsURL | ftpURL | fileURL
  Supported code hosting services: GitHub, GitLab, SourceHut.

httpURL = string
  Needs to conform to the http:// URI scheme (see RFC 9110, section 4.2).

httpsURL = string
  Needs to conform to the https:// URI scheme (see RFC 9110, section 4.2).

ftpURL = string
  Needs to conform to the ftp:// URI scheme (see RFC 1738, section 3.2).

webLikeURL :: string = httpURL | httpsURL | ftpURL

fileURL :: string = "file://" + fileURLPathPart
fileURLPathPart = string
  fileURLPathPart is a shorthand for fileURL (i.e., it will be prefixed with "file://" during evaluation); therefore, both need to conform to the file:// URI scheme (see the path syntax of RFC 8089).

path = Nix path | fileURLPathPart

gitArgs :: attribute set =
  { url :: (URL | path);
    [ name :: string ? "source" ];
    [ ref :: gitReference ? "HEAD" ];
    [ rev :: gitFullCommitHash ? <ref dereferenced> ];
    [ submodules :: boolean ? false ];
    [ shallow :: boolean ? false ];
    [ allRefs :: boolean ? false ];
  }

webLikeGitArgs :: attribute set =
  (Erlang-y)  gitArgs#{ url :: webLikeURL; }
  (Haskell-y) gitArgs { url :: webLikeURL; }

pathLikeGitArgs :: attribute set =
  (Erlang-y)  gitArgs#{ url :: (path | fileURL); }
  (Haskell-y) gitArgs { url :: (path | fileURL); }

webLike = webLikeURL | webLikeGitArgs
  Argument that is a URL, or has a url member, conforming to the http://, https://, or ftp:// URI scheme.

pathLike = path | fileURL | pathLikeGitArgs
  Argument that resolves, or has a member that resolves, to a file system path.

gitReference = string
  Needs to be a valid Git reference.

gitFullCommitHash = string
  Has to be a full SHA-1 object name (for now), i.e., a 40-character hexadecimal string, that refers to an existing commit in the repo.

storeResult :: attribute set =
  { lastModified :: ?;
    lastModifiedDate :: ?;
    narHash :: ?;
    outPath :: nixStorePath;
    rev :: gitFullCommitHash;
    revCount :: ?;
    shortRev :: ?;
    submodules :: boolean;
  }
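Two of the string-level constraints above can be checked mechanically. Below is a minimal POSIX shell sketch; it is not part of Nix. The helper names `to_file_url` and `is_full_commit_hash` are hypothetical. `to_file_url` assumes that a fileURLPathPart is an absolute path. `is_full_commit_hash` checks only the shape of the string (40 hexadecimal characters), not whether the commit exists in the repo.

```shell
# Hypothetical helper (not part of Nix): turn a fileURLPathPart into a
# fileURL by prefixing "file://"; values that already carry the scheme are
# passed through. Assumption: only absolute paths are accepted.
to_file_url() {
  case "$1" in
    file://*) printf '%s\n' "$1" ;;
    /*)       printf 'file://%s\n' "$1" ;;
    *)        echo "not an absolute path or file:// URL: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed if $1 has the shape of a gitFullCommitHash,
# i.e. exactly 40 hexadecimal characters. Whether the commit actually exists
# in the repo is NOT checked.
is_full_commit_hash() {
  [ "${#1}" -eq 40 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
}

to_file_url /home/nix_user/clones/nix   # file:///home/nix_user/clones/nix
is_full_commit_hash be4654c344f0745d4c2eefb33a878bd1f23f6b40 && echo "looks like a full commit hash"
```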

1. Behaviour

builtins.fetchGit behaves differently when called with pathLike or webLike arguments.
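The kind of argument decides which of the two semantics applies. Below is a minimal shell sketch of that dispatch. It assumes the argument (or its url attribute) is available as a plain string, and the function name `fetchgit_semantics` is hypothetical. The shell never sees Nix path literals, so a path such as ~/clones/nix has to be passed in its expanded, absolute form.

```shell
# Hypothetical classifier mirroring the webLike / pathLike split:
# $1 is a URL string or an absolute file system path.
fetchgit_semantics() {
  case "$1" in
    http://*|https://*|ftp://*) echo webLike ;;   # webLikeURL
    file://*|/*)                echo pathLike ;;  # fileURL or fileURLPathPart
    *)                          echo unknown ;;   # not covered by the pseudo types
  esac
}

for arg in "https://github.com/NixOS/nix" \
           "file:///home/nix_user/clones/nix" \
           "/home/nix_user/clones/nix"; do
  printf '%-40s %s\n' "$arg" "$(fetchgit_semantics "$arg")"
done
```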

1.1 "Web-like" semantics

These sections describe the behaviour of builtins.fetchGit when called with webLike arguments:

1.1.1 webLikeURL type argument

NOTE

The file:// URI scheme is omitted on purpose, and is discussed in section 1.2 "Path-like" semantics.

Table 1.1.1-1 builtins.fetchGit string

String format                            Outcome
webLikeURL   httpURL    "http://..."     The latest commit (or HEAD) of the repo's default
             httpsURL   "https://..."    branch (typically called main or master) will be
             ftpURL     "ftp://..."      fetched.

HTTPS examples with the supported code hosting sites:

builtins.fetchGit "https://github.com/NixOS/nix"
builtins.fetchGit "https://git.sr.ht/~rycee/configurations"
builtins.fetchGit "https://gitlab.com/rycee/home-manager"

1.1.2 webLikeGitArgs type argument

NOTE

The gitArgs attributes rev and ref are discussed in detail in later sections, but they need to be addressed here as well because of the significant role they play in determining the call results.

Table 1.1.2-1 builtins.fetchGit attribute set

Columns: gitArgs attributes (url attribute: mandatory; rev attribute: optional; ref attribute: optional); Outcome; Example argument; Example resolved to full webLikeGitArgs attribute set

Row 1
  url: webLikeURL
  rev: omitted [1.1.2-1]
  ref: omitted (or default value of HEAD used)
  Outcome: Same as builtins.fetchGit webLikeURL (see Table 1.1.1-1 above)
  Example argument:
    { url = "https://github.com/nixos/nix"; }
  Example resolved to full webLikeGitArgs attribute set:
    {
      url = "https://github.com/nixos/nix";
      name = "source";
      ref = "HEAD";
      rev = "<SHA-1 commit hash of HEAD>";
      submodules = false;
      shallow = false;
      allRefs = false;
    }

Row 2
  url: webLikeURL
  rev: present
  ref: ignored [1.1.2-2]
  Outcome: Fetch repo at rev commit
  Example argument:
    { url = "https://github.com/nixos/nix";
      rev = "be4654c344f0745d4c2eefb33a878bd1f23f6b40";
    }
  Example resolved to full webLikeGitArgs attribute set:
    { url = "https://github.com/nixos/nix";
      name = "source";
      ref = "";
      rev = "be4654c344f0745d4c2eefb33a878bd1f23f6b40";
      submodules = false;
      shallow = false;
      allRefs = false;
    }

Row 3
  url: webLikeURL
  rev: omitted [1.1.2-1]
  ref: present
  Outcome: Fetch repo at ref branch / tag
  Example argument:
    { url = "https://github.com/nixos/nix";
      ref = "refs/tags/2.10.3";
    }
  Example resolved to full webLikeGitArgs attribute set:
    { url = "https://github.com/nixos/nix";
      name = "source";
      ref = "refs/tags/2.10.3";
      rev = "309c2e27542818b74219d6825e322b8965c7ad69";
      submodules = false;
      shallow = false;
      allRefs = false;
    }

[1.1.2-1]: See section 3.3 rev

[1.1.2-2]: See section 3.4 ref
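The decision logic of Table 1.1.2-1 reduces to an order of precedence: rev first, then ref, then the default HEAD. The hypothetical shell function below models only that precedence and fetches nothing. An empty string stands for an omitted attribute.

```shell
# Hypothetical model of Table 1.1.2-1. Arguments: rev and ref of a
# webLikeGitArgs set, where "" means the attribute was omitted.
weblike_target() (
  rev=$1 ref=$2
  if [ -n "$rev" ]; then
    echo "commit $rev"        # rev present: ref is ignored
  elif [ -n "$ref" ]; then
    echo "ref $ref"           # only ref present: fetch that branch / tag
  else
    echo "ref HEAD"           # neither: latest commit of the default branch
  fi
)

weblike_target "" "refs/tags/2.10.3"
```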

1.2 "Path-like" semantics

Calls with pathLike arguments attempt to fetch a repo in a directory on a local or remote file system. The target repo may be a project under active development, so its status and state may need to be determined before the repo is copied to the Nix store.

1.2.1 Git repository characteristics

That is, characteristics that builtins.fetchGit cares about.

1.2.1.1 Status

The status of a Git repo is

  • dirty, if there are modified tracked files and/or staged changes. (Untracked content does not count.)

  • clean, if the output of git diff-index HEAD is empty. (If there are only untracked files in git status, the repo is clean.)
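The status check can be reproduced with plain Git. Below is a minimal sketch that assumes git is on PATH. `make_demo_repo` and `repo_status` are hypothetical helpers, not Nix internals. The sketch runs `git update-index --refresh` first, so that diff-index does not report files whose stat information changed but whose content did not.

```shell
# Hypothetical helper: throwaway repo with one commit on branch "main".
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)

# Print "dirty" or "clean" for the repo at $1, following the definition above.
repo_status() (
  cd "$1" || exit 2
  git update-index -q --refresh >/dev/null 2>&1   # ignore stat-only changes
  if git diff-index --quiet HEAD --; then echo clean; else echo dirty; fi
)

repo=$(make_demo_repo)
repo_status "$repo"                 # clean
touch "$repo/untracked.txt"
repo_status "$repo"                 # still clean: untracked files do not count
echo change >> "$repo/README"
repo_status "$repo"                 # dirty
```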

1.2.1.2 State

The state of a Git repo is the specific commit that the HEAD reference points to (directly or indirectly) at the moment when the repo is fetched.

HEAD points to the commit directly if the repo is in a "detached HEAD" state. It points to the commit indirectly if HEAD points to another reference (e.g., a branch) that in turn points to the commit, as shown in the figure below.

[Figure 1.2.1.2-1. State of a Git repo: a visualized Git repo showing two branches, where HEAD points to a commit that is tagged and is also the head of a branch. LEGEND: orange label = branch, blue label = tag]
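`git symbolic-ref` can tell the direct case from the indirect one, because it fails when HEAD is detached. Below is a minimal sketch; `repo_state` and `make_demo_repo` are hypothetical helpers. Checking out a tag with `git checkout` detaches HEAD, so in this sketch a tag checkout shows up as the direct case.

```shell
# Hypothetical helper: throwaway repo with one commit on branch "main".
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)

# Print how HEAD dereferences to a commit in the repo at $1.
repo_state() (
  cd "$1" || exit 2
  commit=$(git rev-parse HEAD)
  if ref=$(git symbolic-ref -q HEAD); then
    echo "HEAD -> $ref -> $commit"   # indirect: HEAD names a branch
  else
    echo "HEAD -> $commit"           # direct: detached HEAD
  fi
)

repo=$(make_demo_repo)
repo_state "$repo"                   # HEAD -> refs/heads/main -> <SHA-1 commit hash>
git -C "$repo" tag v1
git -C "$repo" checkout -q v1        # checking out a tag detaches HEAD
repo_state "$repo"                   # HEAD -> <SHA-1 commit hash>
```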

1.2.2 Argument of type Nix path, fileURL, or fileURLPathPart

Table 1.2.2-1

STATE           STATUS: dirty                      STATUS: clean                   De-reference process
on BRANCH       Copy directory contents verbatim   Fetch repo at HEAD of BRANCH    HEAD -> refs/heads/BRANCH -> <SHA-1 commit hash>
at TAG          Copy directory contents verbatim   Fetch repo at TAG               HEAD -> refs/tags/TAG -> <SHA-1 commit hash>
detached HEAD   Copy directory contents verbatim   Fetch repo at HEAD              HEAD -> <SHA-1 commit hash>

In fact, the three "STATE" rows could easily be collapsed into one, because Git branches and tags are only labels for a Git object. What matters to fetchGit is the specific commit at the end of the de-reference process.
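Only the final commit matters, so the three STATE rows of Table 1.2.2-1 collapse into a single check. Below is a hypothetical shell sketch of the collapsed table. `make_demo_repo` and `pathlike_outcome` are illustrative names, and this is not a claim about how Nix implements it. A dirty repo is copied verbatim; a clean one is fetched at whatever commit HEAD finally resolves to.

```shell
# Hypothetical helper: throwaway repo with one commit on branch "main".
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)

# Collapsed Table 1.2.2-1 for the repo at $1.
pathlike_outcome() (
  cd "$1" || exit 2
  git update-index -q --refresh >/dev/null 2>&1
  if git diff-index --quiet HEAD --; then
    echo "fetch commit $(git rev-parse HEAD)"   # clean: branch, tag or detached HEAD alike
  else
    echo "copy directory contents verbatim"     # dirty
  fi
)

repo=$(make_demo_repo)
pathlike_outcome "$repo"
```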

Example calls:

  • via Nix path:
    builtins.fetchGit ~/clones/nix

  • via fileURL:
    builtins.fetchGit "file:///home/nix_user/clones/nix"

  • via fileURLPathPart:
    builtins.fetchGit "/home/nix_user/clones/nix"

1.2.3 pathLikeGitArgs type argument

This means one of the following:

  • via { url :: Nix path }:
    builtins.fetchGit { url = ~/clones/nix; ... }

  • via { url :: fileURLPathPart }:
    builtins.fetchGit { url = "/home/nix_user/clones/nix"; ... }

  • via { url :: fileURL }:
    builtins.fetchGit { url = "file:///home/nix_user/clones/nix"; ... }

The following table takes advantage of the fact that state is simply determined by the current value of the HEAD reference:

NOTE

The gitArgs attributes rev and ref are discussed in detail in later sections, but they need to be addressed here as well because of the significant role they play in determining the call results.

Table 1.2.3-1.

Columns: STATUS; gitArgs attributes (url attribute: mandatory; rev attribute: optional; ref attribute: optional); Outcome; Example argument

In every row, the url attribute is a Nix path | fileURLPathPart | fileURL. (See examples at the top of this section.)

Row 1
  STATUS: dirty
  rev: omitted [1.2.3-1]
  ref: omitted (or default value of HEAD used)
  Outcome: Copy directory contents verbatim
  Example argument: { url = ~/clones/nix; }

Row 2
  STATUS: clean
  rev: omitted [1.2.3-1]
  ref: omitted (or default value of HEAD used)
  Outcome: Fetch repo at HEAD
  Example argument: { url = ~/clones/nix; }

Row 3
  STATUS: ignored [1.2.3-2]
  rev: present
  ref: ignored [1.2.3-3]
  Outcome: Ignore changes (if any) and fetch repo at rev commit
  Example argument:
    { url = ~/clones/nix;
      rev = "be4654c344f0745d4c2eefb33a878bd1f23f6b40";
    }

Row 4
  STATUS: ignored [1.2.3-2]
  rev: omitted [1.2.3-1]
  ref: present
  Outcome: Ignore changes (if any) and fetch repo at ref tag / branch
  Example argument:
    { url = ~/clones/nix;
      ref = "refs/tags/2.10.3";
    }

[1.2.3-1]: See section 3.3 rev

[1.2.3-2]: When ref or rev is present, the intention is probably to fetch a known past state from the repo's history. The most recent changes are therefore not relevant, and neither is the status of the repo.

[1.2.3-3]: See section 3.4 ref
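Table 1.2.3-1 can also be read as an order of precedence in which status only matters when both rev and ref are omitted. The hypothetical shell function below models just that precedence; it does not inspect or fetch any repo. An empty string stands for an omitted attribute.

```shell
# Hypothetical model of Table 1.2.3-1. Arguments: status ("dirty" or
# "clean"), rev, ref; "" means the attribute was omitted.
pathlike_args_outcome() (
  status=$1 rev=$2 ref=$3
  if [ -n "$rev" ]; then
    echo "fetch commit $rev"                   # status and ref are ignored
  elif [ -n "$ref" ]; then
    echo "fetch ref $ref"                      # status is ignored
  elif [ "$status" = dirty ]; then
    echo "copy directory contents verbatim"
  else
    echo "fetch HEAD"
  fi
)

pathlike_args_outcome dirty "" ""
pathlike_args_outcome dirty "" refs/tags/2.10.3
```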

As a corollary, here are some tips:

  • If you need to fetch a local repo, calling builtins.fetchGit with ref (branch or tag) or rev (commit hash) will make sure that the repo is fetched with predictable content, ignoring any changes that may have been made since you last touched it.

  • If you are packaging a project under active development and want to test changes without committing, you'll probably want to call builtins.fetchGit with { url = ...; } or with one of the argument types specified in section 1.2.2 Argument of type Nix path, fileURL, or fileURLPathPart.

3. gitArgs attributes

Reminder:

gitArgs :: attribute set =
  { url :: (URL | path);
    [ name :: string ? "source" ];
    [ ref :: gitReference ? "HEAD" ];
    [ rev :: gitFullCommitHash ? <ref dereferenced> ];
    [ submodules :: boolean ? false ];
    [ shallow :: boolean ? false ];
    [ allRefs :: boolean ? false ];
  }

3.1 url (mandatory)

Description This attribute is covered extensively in section 1. Behaviour (specifically, in sections 1.1.2 webLikeGitArgs type argument and 1.2.3 pathLikeGitArgs type argument).
Type string
Default value none

3.2 name (optional)

Description The name part of the Nix store path to which the Git repo's content will be copied.
Type string
Default value "source"

Examples:

nix-repl> builtins.fetchGit { url = ./.; }
{ ...; outPath = "/nix/store/zwp1brk7ndhls3br4hk4h9xhpii17zs5-source"; ...; }

nix-repl> builtins.fetchGit { url = ./.; name = "miez"; }
{ ...; outPath = "/nix/store/<hash>-miez"; ...; }

3.3 rev (optional)

Description The rev attribute is used to refer to a specific commit by its full SHA-1 Git object name (a 40-character hexadecimal string), more colloquially called the commit hash.
Type string
Additional constraints 40-character hexadecimal SHA-1 string
Default value The dereferenced value of the Git reference held by the ref attribute. (See next section.)

Sections 1.1.2 webLikeGitArgs type argument and 1.2.3 pathLikeGitArgs type argument in 1. Behaviour describe the prevailing behaviour of builtins.fetchGit when the rev attribute is used.

NOTE

Specifying the rev attribute renders the ref attribute irrelevant, whether or not it is included in the input attribute set. See next section for more.
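The default value of rev (the dereferenced ref) can be observed with `git ls-remote`, which lists the object each ref of a repo points to. Below is a minimal sketch against a local repo reached through a file:// URL, so it needs no network. `resolve_ref` and `make_demo_repo` are hypothetical helpers, and this does not claim to be how Nix resolves refs internally. For annotated tags, ls-remote also prints a peeled "<ref>^{}" line, which this sketch ignores.

```shell
# Hypothetical helper: throwaway repo with one commit on branch "main".
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)

# Hypothetical helper: print the object name that ref $2 currently points to
# in the repo at URL $1 (first matching line only).
resolve_ref() {
  git ls-remote "$1" "$2" | awk 'NR == 1 { print $1 }'
}

repo=$(make_demo_repo)
resolve_ref "file://$repo" refs/heads/main
```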

3.4 ref (optional)

Description The ref attribute accepts a Git reference that is present in the target repo.
Type string
Additional constraints See Git reference syntax
Default value "HEAD"

WARNING

By default, the ref value is prefixed with refs/heads/. As of Nix 2.3.0, it will not be prefixed with refs/heads/ if ref starts with refs/.
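The prefixing rule in the warning can be expressed as a two-branch pattern match. The hypothetical `expand_ref` helper below models only the rule as stated (Nix 2.3.0 and later). It does not cover how Nix treats the default value "HEAD", which this sketch would naively turn into refs/heads/HEAD.

```shell
# Hypothetical helper modelling only the prefixing rule stated above:
# refs starting with "refs/" are used as-is, anything else is treated as a
# branch name under refs/heads/.
expand_ref() {
  case "$1" in
    refs/*) printf '%s\n' "$1" ;;
    *)      printf 'refs/heads/%s\n' "$1" ;;
  esac
}

expand_ref main               # refs/heads/main
expand_ref refs/tags/2.10.3   # refs/tags/2.10.3
```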

3.4.1 ref attribute ignored when the rev attribute is provided

The rev attribute (i.e., the commit hash) has higher specificity. A reference in ref needs to be resolved, and its value may change over time. A commit hash, however, always points to the same commit object, and thus to the same state of the repo, during the lifetime of a Git repo. (TODO: right?)
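The difference in specificity can be observed in a throwaway repo: after a new commit, the branch resolves to a different commit, while the previously recorded hash still names the same commit object. Below is a minimal sketch; `make_demo_repo` is a hypothetical helper. One caveat, which is standard Git behaviour rather than something this sketch tests: a commit that becomes unreachable after a history rewrite can eventually be removed by garbage collection.

```shell
# Hypothetical helper: throwaway repo with one commit on branch "main".
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)

repo=$(make_demo_repo)
pinned=$(git -C "$repo" rev-parse refs/heads/main)   # what a rev would pin
echo more >> "$repo/README"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q -am second
moved=$(git -C "$repo" rev-parse refs/heads/main)    # the ref now resolves elsewhere
echo "main was $pinned, is now $moved"
git -C "$repo" cat-file -t "$pinned"                 # the pinned commit is still there
```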

4. Examples

TODO: Re-work original examples


TODO/NOTE: Stopping here for now to wait for the resolution of the comment on Nix issue #5128

@toraritte
Author

Quick notes (expect this to be edited):

  • The in-docs links (e.g., links to types such as "string", "attribute set", etc.) are just empty ones for now.

  • The inspiration for the "pseudo types" section comes from the Erlang docs. Erlang is also a dynamically typed language, but it has a "sub-language" for function type specifications, and each function's doc starts with these, making the descriptions unambiguous.

improvement ideas:

  • Make "Types" section collapsible and add hover tooltip to expand type mentions in the descriptive portion of this reference.

  • Link footnotes and in-doc references

  • Make typespec notes less prominent (italics doesn't seem to do much. CSS?)

...and questions:

  • How to treat ancillary texts?
  • Levels of detail?

@toraritte
Author

How do Git revisions and references relate to each other?:

TL;DR

A Git reference (i.e., the alternative name of a Git object) can be a Git revision (i.e., a Git object query) [example-1], but not vice versa [example-2], because:

  • (MANY-TO-ONE) A ref can only stand for a single Git object, but each Git object can have many refs.

  • (ONE-TO-MANY) A rev can resolve to one or more Git objects.

[example-1]: git log main
[example-2]: git log main..topic (even though the revision main..topic uses two references, main and topic).

Git references (refs)

Many-to-one relationship between:

┌─────────┐             ┌──────┐
│  Git    │ *         1 │  Git │
│reference├────────────►│object│
└─────────┘             └──────┘

A Git reference points to a single Git object [2], and multiple Git references can point to the same Git object.

To belabor the point:

[1]: Or "alias", "pointer", "label", etc.
[2]: There are 4 Git object types: tree, blob, commit, and tag.
[3]: For now, at least.

For example:

[~/my-project]$ git cat-file --batch-check --batch-all-objects
10d5ab2b502faadff680c6904cbd60d7a8b5d0af tree 34
11f61d01b7af5c657c13109777a577ef6a3d3a7a tree 34
1d41fcffd528c1ee950b630d939407fe5f3b22d0 tree 34
40267b7fcf0d4490a45e0d70618a5d7b63895a60 blob 25
5a6bdceda9ae20b80fed214776b4423f522f2d01 tree 68
5b76730490981c045b186fd9651f91f0492c5b07 blob 12
5f45e9c854941c72deb9d36fb3e95e4feb4d698f commit 234
64a77169fe44d06b082cbe52478b3539cb333d45 tree 34
6692c9c6e231b1dfd5594dd59b32001b70060f19 commit 237
740481b1d3ce7de99ed26f7db6687f83ee221d67 blob 50
83cb3ab54ca122d439bdd9997a21f399cac69692 blob 16
864333c0eccabdaba6df27166ac616c922569b47 blob 42
abb08192ed875ef73fa66029994aa2f6700befd0 commit 231
c277976fce0b2b32b954a66d4345730b5b08f1db commit 230
e67cb07f9ddb0ecd0f88fcf36093d8d8bf928b75 commit 175
e95dd8284a84af5418c0dcf9cbdc0b1061624907 blob 25

[~/my-project]$ git show-ref --head --dereference
5f45e9c854941c72deb9d36fb3e95e4feb4d698f HEAD
c277976fce0b2b32b954a66d4345730b5b08f1db refs/heads/main
5f45e9c854941c72deb9d36fb3e95e4feb4d698f refs/heads/topic
c277976fce0b2b32b954a66d4345730b5b08f1db refs/remotes/origin/main
5f45e9c854941c72deb9d36fb3e95e4feb4d698f refs/remotes/origin/topic
e95dd8284a84af5418c0dcf9cbdc0b1061624907 refs/tags/balabab
e95dd8284a84af5418c0dcf9cbdc0b1061624907 refs/tags/lofa
5f45e9c854941c72deb9d36fb3e95e4feb4d698f refs/tags/miez
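The many-to-one relationship in the show-ref output above can be reproduced in a throwaway repo. In this minimal sketch, a branch and a lightweight tag are created at the same commit as main, and every line that `git show-ref --head` prints then carries the same object name. `make_demo_repo` is a hypothetical helper.

```shell
# Hypothetical helper: throwaway repo with one commit on branch "main".
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)

repo=$(make_demo_repo)
git -C "$repo" branch topic        # second branch at the same commit
git -C "$repo" tag miez            # lightweight tag at the same commit
git -C "$repo" show-ref --head     # HEAD, main, topic and miez: one object name
```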


Git revisions (revs)

One-to-many relationship between

┌────────┐              ┌──────┐
│  Git   │ 1          * │  Git │
│revision├──────────────┤object│
└────────┘              └──────┘

A Git revision is a Git object query that resolves to one or more Git objects.

A Git revision is a string of characters conforming to a special notation syntax (or "revision query system") that is used to unambiguously select one or more Git objects [2].

This is akin to how database systems (e.g., PostgreSQL) use a query language (e.g., SQL): here, Git is the database system and the revision syntax is the query language. The analogy seems especially apt given that revisions can refer to a range of Git objects too.

For example, given this commit history,

* ebc9079 (HEAD -> main) karikittyom
* 982b806 edes
* ccccccc tyukom
* bbbbbbb megis van
* aaaaaaa egy felpenzem

the revision aaaaaaa..ccccccc will return commits bbbbbbb and ccccccc:

$ git log aaaaaaa..ccccccc

commit cccccccccccccccccccccccccccccccccccccccc
Author: toraritte
Date:   Mon Jan 9 03:29:24 2023 +0000

    tyukom

commit bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Author: toraritte
Date:   Mon Jan 9 03:29:24 2023 +0000

    megis van  
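The aaaaaaa..ccccccc example can be replayed in a throwaway repo. The first commit plays the role of aaaaaaa, and two empty commits reuse the example's messages. `make_demo_repo` and `demo_commit` are hypothetical helpers. The range selects the two later commits but not its left endpoint.

```shell
# Hypothetical helpers: throwaway repo with one commit ("init") on "main",
# and an empty commit with message $2 in repo $1.
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)
demo_commit() {
  git -C "$1" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q --allow-empty -m "$2"
}

repo=$(make_demo_repo)
a=$(git -C "$repo" rev-parse HEAD)                              # plays aaaaaaa
demo_commit "$repo" "megis van"; b=$(git -C "$repo" rev-parse HEAD)
demo_commit "$repo" "tyukom";    c=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" log --format='%h %s' "$a..$c"   # lists tyukom and megis van, not init
```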

The connection between refs and revs

Git references are simply labels for specific Git objects, but there are plenty of times when one would like to carry out operations on other objects as well. The only way to do it without revisions is to manually find them and then list all the SHA-1 hashes of the Git objects involved.

The revision notation is a query system to reach any Git object (or a range of them) in a repo by traversing its directed acyclic graph (DAG).
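To see graph traversal from a reference, compare queries that start at the branch main and walk back through parents with `:/<text>`, which searches commit messages without a starting reference. Below is a minimal sketch on a three-commit history; `make_demo_repo` and `demo_commit` are hypothetical helpers.

```shell
# Hypothetical helpers: throwaway repo with one commit ("init") on "main",
# and an empty commit with message $2 in repo $1.
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)
demo_commit() {
  git -C "$1" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q --allow-empty -m "$2"
}

repo=$(make_demo_repo)
demo_commit "$repo" second
demo_commit "$repo" third
git -C "$repo" log -1 --format=%s main        # the anchor itself: third
git -C "$repo" log -1 --format=%s 'main^'     # one parent step back: second
git -C "$repo" log -1 --format=%s 'main~2'    # two first-parent steps back: init
git -C "$repo" log -1 --format=%s ':/init'    # no anchor: message search
```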

The fundamental building blocks of relative [5] revision queries are

where references serve as starting points to begin traversing the graph.

[5]: The use of "relative" is important here, because there are also :/<text> and :[<n>:]<path>, which require no anchor.

At least, every notation from the gitrevisions docs boils down to the above conclusion:

  • <describeOutput>, e.g. v1.7.4.2-679-g3bee7fb
    git describe "finds the most recent tag that is reachable from a commit". Tags are Git references, and git describe already has its own revision-esque notation for its results.

  • [<branchname>]@{upstream}, e.g. master@{upstream}, @{u}
    Branch names are Git references, and the rest is the revision query notation.

  • <rev>^{<type>}, e.g. v0.99.8^{commit}
    Here, the ^{<type>} suffix means to "dereference the object at <rev> recursively" until an object of type <type> is found, and <rev> itself starts from a tag or a <sha1>.
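The ^{<type>} dereference is easiest to see with an annotated tag. The tag name resolves to a tag object, and appending ^{commit} peels it down to the commit the tag points to. Below is a minimal sketch that reuses the v0.99.8 name from the gitrevisions example; `make_demo_repo` is a hypothetical helper.

```shell
# Hypothetical helper: throwaway repo with one commit on branch "main".
make_demo_repo() (
  dir=$(mktemp -d)
  git init -q "$dir" && git -C "$dir" symbolic-ref HEAD refs/heads/main
  echo hello > "$dir/README" && git -C "$dir" add README
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m init
  echo "$dir"
)

repo=$(make_demo_repo)
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    -c tag.gpgsign=false tag -a v0.99.8 -m "annotated tag"
git -C "$repo" cat-file -t v0.99.8               # tag: the tag object itself
git -C "$repo" cat-file -t 'v0.99.8^{commit}'    # commit: after dereferencing
```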
