I’m interested in this topic from a specific perspective which is probably relevant to the reader: I view platform configuration mainly from the perspective of nixpkgs, which is a multi-platform package repository that has first-class (?) support for cross-compilation. Here the package set itself must have an idea of the involved platform(s) and pass that information on to the configure scripts et cetera of the packaged software.
I started researching this topic more seriously when refactoring code in cabal2nix (rather distribution-nixpkgs) that “parses” Nix’s `system` strings.
Seemingly everyone has slightly different jargon for these weird little strings with the dashes in them that describe a kind of system people are running:
- LLVM/clang calls them LLVM Triples.
- autotools calls them system type (or name) or target triplet, quite confusingly as it is not only used to describe the target platform. Occasionally they are also referred to as configuration (or configuration name (of the system)), e.g. in `config.sub` and `config.guess`. This is mirrored in nixpkgs where the autotools system type is stored in the `config` attribute of the platform attribute set.
- Nix calls (a specific subset of) them `system`. Note that `system` is not what nixpkgs would call a platform; in fact the latter contains the former in the `system` attribute.
We’re going to have to settle for an umbrella term for this document which doesn’t appear in this list. Something with “triple” or “triplet” would be great as people usually understand this term, but actually only LLVM’s triples are triples at all — autotools’ “triplets” can have from 2 to 4 components.
For this document we are going to call everything that is a thing with dashes that describes a system a “platform string” because ultimately that’s what we can say about them for sure.
The autoconf manual keeps the description of target triplets relatively short and vague. The main documented points are:
- The form of the triplet is “`cpu-vendor-os`, where os can be `system` or `kernel-system`” (taken verbatim from section 14.1 of the autoconf manual).
- Configure scripts should look at triplets by using shell globbing (which reinforces the point that they are, first and foremost, strings). For example, `i?86-*-*` checks for 32-bit x86 or `*-*-linux*` for something with a Linux kernel.
- The primary source of truth for triplets is autoconf, mostly in the form of the `config.guess` script that works out (or tries to) which target triplet is appropriate for the machine it’s running on.
- Additionally, the set of triplets is subject to change: autotools may start supporting new ones at any time, with very few constraints on what they might look like.
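To get a feel for how such glob checks behave, here is a minimal sketch using Python's `fnmatch` module. It only approximates the `case` patterns a configure script would use in shell, and the helper name `matches` is made up for this illustration.

```python
from fnmatch import fnmatchcase


def matches(triplet: str, pattern: str) -> bool:
    """Match a platform string against a shell-style glob, case-sensitively."""
    return fnmatchcase(triplet, pattern)


# The two patterns quoted above, applied to fully spelled out triplets.
print(matches("i686-pc-linux-gnu", "i?86-*-*"))
print(matches("x86_64-pc-linux-gnu", "*-*-linux*"))

# A two component shorthand has only one dash, so a pattern that expects
# a vendor field between two dashes cannot match it.
print(matches("x86_64-linux", "*-*-linux*"))
```

The last call hints at why such patterns are usually applied to the expanded form of a triplet rather than to whatever shorthand the user typed.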
So the most canonical version of a target triplet is actually a quadruplet!
Even weirder, though, is that they don’t even have to have three components or more.
It is perfectly legal to omit the vendor, yielding e.g. `x86_64-linux` which is expanded to `x86_64-pc-linux-gnu`, and the OS, e.g. `riscv64` defaults to `riscv64-unknown-none`. This creates further ambiguity, as `none` is both a valid vendor and OS type, for example (see also the section on `riscv-none-elf`).
The key consequence of this is the following: You can’t split the target triplet into its components without knowledge about the possible values for each. Assuming you know that something is a valid autotools target triplet (which is already quite the assumption in some cases):
- If the triplet has four components, you’re golden, split at the dashes. However, I’m not certain whether autotools may allow additional dashes in some components in the future, e.g. in the OS part, or to allow `x86_64-v2` et cetera as CPU parts.
- If the triplet has three components it may either be `cpu-kernel-system` or `cpu-vendor-os`.
- If the triplet has two, the first component is the CPU part and the second one is “usually, but not always the OS”; instead it can also be a vendor, so it is either `cpu-os` or `cpu-vendor`. Additionally, there is (at the moment) one two component target triplet which is a special alias for `mips-dec-ultrix4.2`, namely `decstation-3100`.
- If the triplet has one, it is `cpu`, usually implying OS `none`. Alternatively, it can be a “single-component [shorthand] not valid as part of multi-component configurations”.
This means, effectively, that you can’t split it correctly unless you are autoconf yourself, since the most common case, three components, is ambiguous: You need to know what strings are legal for the vendor, OS, kernel and system part, and even worse, some strings like `none` may be valid for both the OS and vendor part.
So the options you are left with are: Limit yourself to a subset of all triplets and reject everything else, or reimplement the 1860 line `config.sub` shell script.
The former option sort of works for downstream projects (e.g. nixpkgs does it), but it’s still problematic: the vendor part is not restricted to a fixed set of legal values, so you may come to grief if you misinterpret something as the vendor part that isn’t.
Additionally, note that splitting the triplet is not all `config.sub` / autoconf does: After identifying the parts, it normalizes them and fills in the missing pieces according to non-trivial rules, e.g. it will supplement `ibm` as vendor to `s390x-*` if it is omitted. Consequently, it would be very beneficial to run the target triplet through `config.sub` first when you have to parse one. This will also guarantee that the resulting triplet has three or four components with the second always being `vendor`. However, this is often not possible, as you would need to ship the script somehow and be able to shell out to it.
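Where shelling out is an option, the call itself is short. The following is a sketch under assumptions: the function name `canonicalize` is made up, the default path `./config.sub` is a placeholder for wherever you keep a copy of the script, and a POSIX `sh` is expected on the `PATH`.

```python
import subprocess


def canonicalize(triplet: str, config_sub: str = "./config.sub") -> str:
    """Return whatever config.sub prints for the given triplet.

    `config_sub` is an assumed path to a copy of the script. If the script
    exits with a non-zero status (e.g. because it rejects the input),
    subprocess.CalledProcessError is raised.
    """
    result = subprocess.run(
        ["sh", config_sub, triplet],  # run via sh so the file need not be executable
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `canonicalize("x86_64-linux")` with a real `config.sub` in the working directory should then yield the expanded form described earlier.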
In conclusion, autotools target triplets are strings consisting of multiple string components joined by dashes, but only autoconf knows how exactly. They can be ambiguous and, since autotools is the ultimate, but changing source of truth, it is hard to untangle the string correctly in all cases.
In general, though, it’ll make sense to humans what a target triplet describes. You also won’t run into trouble if you either obtain the target triplet by running `config.guess` on the machine you are interested in or are told it by your device’s vendor or cross toolchain distributor.
If you are dealing with a lot of hypothetical triplets (e.g. as a nixpkgs maintainer), the `config.sub` script will help you get a sense of how autoconf interprets a specific input string. If you’re feeling brave, read its source (both `config.sub` and `config.guess` are available as part of the `gnu-config` nixpkgs package or in the `build-aux` directory of the autoconf source tree).
The main takeaway to be had here is to treat target triplets as opaque strings as much as possible. The autoconf developers actively discourage something as profane as using globbing on the triplet, recommending probing for the specific property instead (e.g. check for Linux headers instead of `*-linux*`). The target triplets are subject to change, so you will only be on the safe side if you treat them as implementation details of autoconf.
Fundamentally, a `system` is some kind of string which the Nix daemon uses to decide whether it can build a particular derivation. Every derivation, at the core, looks like this:
derivation {
  name = "my-derivation";
  system = "i686-linux";
  /* instructions how to build the derivation */
}
The `system` string the daemon uses is picked in Nix’s configuration script. If it is equal to a derivation’s one, it’ll happily build it, otherwise it will try to defer the task to a remote builder.
Nix’s configuration script uses autoconf, so the ultimate source of truth is autoconf, mediated by these normalizations:
- In `host_cpu`,
  - `i*86` is transformed to `i686`
  - `amd64` is transformed to `x86_64` (autoconf does this by itself nowadays)
  - `armv{6,7}` is transformed to `armv{6,7}l`
- In `host_os`, `linux-{musl,gnu}*` are transformed to plain `linux`
- Version numbers attached are dropped (e.g. the `0.3` in `gnu0.3`)
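Spelled out as code, the normalizations listed above might look roughly like this. It is a Python sketch of the rules as described here, not a transcription of Nix's actual configure script; the function name `nix_system` is hypothetical, and joining the two parts with a dash is included so the result is a complete string.

```python
import re
from fnmatch import fnmatchcase


def nix_system(host_cpu: str, host_os: str) -> str:
    """Apply the listed normalizations to autoconf's CPU and OS parts."""
    if fnmatchcase(host_cpu, "i*86"):
        cpu = "i686"
    elif host_cpu == "amd64":
        cpu = "x86_64"
    elif host_cpu in ("armv6", "armv7"):
        cpu = host_cpu + "l"
    else:
        cpu = host_cpu

    if fnmatchcase(host_os, "linux-gnu*") or fnmatchcase(host_os, "linux-musl*"):
        os_part = "linux"
    else:
        # Drop a trailing version number such as the 0.3 in gnu0.3.
        os_part = re.sub(r"[0-9.]+$", "", host_os)

    return f"{cpu}-{os_part}"


print(nix_system("i386", "linux-gnu"))
print(nix_system("armv7", "linux-gnueabihf"))
print(nix_system("x86_64", "darwin21.6.0"))
```

Note how the libc and ABI information in `linux-gnueabihf` is simply thrown away by the `linux-*` rule.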
The final `system` is obtained by joining the autoconf CPU and OS parts by a dash, so it is a more predictable variant of the two component autotools target triplet, forming either `cpu-os` or `cpu-kernel-system`, although the latter is very rare for Nix itself.
Nix is not the most portable software in the world, so the number of `system` strings actually generated by the configure script is limited:
- `*-linux` (probably mainly with `aarch64`, `i686`, `x86_64`, `armv6`, `armv7` and other common CPUs in use today)
- `x86_64-darwin`, `aarch64-darwin`
- `i686-netbsd`, `x86_64-netbsd` (maybe more CPUs?)
- `x86_64-freebsd` (maybe other CPUs?)
- `x86_64-openbsd` (although the port never made it into the main Nix source tree, so extremely rare)
However, these are not the only `system` strings in use today: nixpkgs not only receives `system` strings from Nix via `builtins.currentSystem`, but also interprets `system` strings passed in by the user as cross compilation targets. Among these are systems Nix will probably never run on, like `avr-none`. As a result, a world of Nix `system` strings exists somewhat disconnected from autoconf.
Additionally, due to the transformation of `linux-{musl,gnu}*` to just `linux`, no three component Nix systems exist in practice and support for them is probably poor.
Don’t.
I wouldn't fret too much about the vendor field. It is basically a comment.
And `system` is `<libc>(e?abi<abi>)?`. Yeah, it's a mess.
That's a bit pessimistic. You can do the split as long as you have an enumeration of valid kernels, and you require that a kernel name be present in the input. At least for nixpkgs that isn't a very tall order.
I think this conclusion is unsupported. With the very minor restriction above (no kernel-less triplets allowed) and a list of supported kernels, we're able to parse triplets/quadruplets/quintuplets. Without the ability to parse, there is no way to make `lib/systems/predicates.nix` work, and that isn't something we can give up.
... and that lossy transformation is a big problem for nixpkgs.
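The kernel-enumeration approach suggested in the comments can be sketched as follows. This is a rough Python illustration, not how nixpkgs implements it: `KNOWN_KERNELS` is a deliberately short, hypothetical list, the function name is made up, and kernel names with version suffixes attached are not handled.

```python
# Hypothetical and incomplete; a real list would be much longer.
KNOWN_KERNELS = ("linux", "darwin", "freebsd", "netbsd", "openbsd", "windows")


def parse_with_kernel(platform: str) -> dict:
    """Split a platform string by locating a known kernel name in it.

    Whatever sits between the CPU and the kernel is treated as the vendor,
    whatever follows the kernel as the system (libc/ABI) part.
    """
    parts = platform.split("-")
    # The first component is always the CPU, so start looking at index 1.
    for index in range(1, len(parts)):
        if parts[index] in KNOWN_KERNELS:
            return {
                "cpu": parts[0],
                "vendor": "-".join(parts[1:index]) or None,
                "kernel": parts[index],
                "system": "-".join(parts[index + 1:]) or None,
            }
    raise ValueError(f"no known kernel in {platform!r}")


print(parse_with_kernel("x86_64-unknown-linux-gnu"))
print(parse_with_kernel("aarch64-linux-musl"))
```

The price of the restriction shows up with inputs such as `riscv64-none-elf`, which this sketch rejects outright because no kernel name is present.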