sternenseemann/platforms.org Secret

## platforms.org

      
            
          
    WIP: The wonders of build target configuration

Background

I’m interested in this topic from a specific perspective which is probably
  relevant to the reader: I view platform configuration mainly through the
  lens of nixpkgs, a multi-platform package repository with
  first-class (?) support for cross-compilation. Here the package set
  itself must have an idea of the involved platform(s) and pass that information
  on to the configure scripts et cetera of the packaged software.
I started researching this topic more seriously when refactoring code in
  cabal2nix (rather distribution-nixpkgs) that “parses” Nix’s system
  strings.
Terminology

Seemingly everyone has slightly different jargon for these weird little strings
  with the dashes in them that describe the kind of system people are running:

  LLVM/clang calls them LLVM Triples.
  autotools calls them system type (or name) or target triplet — quite confusingly as
    it is not only used to describe the target platform. Occasionally they are also
    referred to as configuration (or configuration name (of the system)), e.g.
    in config.sub and config.guess. This is mirrored in nixpkgs where the
    autotools system type is stored in the config attribute of the platform
    attribute set.
  Nix calls (a specific subset of) them system. Note that system is
    not what nixpkgs would call a platform — in fact the latter contains
    the former in the system attribute.

We’re going to have to settle for an umbrella term for this document which
  doesn’t appear in this list. Something with “triple” or “triplet” would be
  great as people usually understand this term, but actually only LLVM’s
  triples are triples at all — autotools’ “triplets” can have from 2 to 4
  components.
For this document we are going to call everything that is a thing with dashes
  that describes a system a “platform string” because ultimately that’s what
  we can say about them for sure.
The Schools of Thought

autotools’ target triplets

The autoconf manual keeps the description of target triplets
  relatively short and vague. The main documented points are:

  The form of the triplet is
    “cpu-vendor-os, where os can be system or kernel-system”
    (taken verbatim from section 14.1 of the autoconf manual).
  Configure scripts should look at triplets by using shell globbing
    (which reinforces the point that they are, first and foremost, strings).
    For example, i?86-*-* checks for 32-bit x86 and *-*-linux* for something
    with a Linux kernel.
  The primary source of truth for triplets is autoconf, mostly in the form
    of the config.guess script that works out (or tries to) which target
    triplet is appropriate for the machine it’s running on.
  Additionally, the set of triplets is subject to change: autotools may
    start supporting new ones at any time, with very few constraints on
    what they might look like.
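
To make the globbing point concrete, here is a minimal sketch in Python that mimics what a configure script’s case statement does with the two patterns mentioned above. The helper names (triplet_matches, is_x86_32, has_linux_kernel) are made up for illustration, and fnmatch is only an approximation of shell pattern matching, not what autoconf itself uses.

```python
from fnmatch import fnmatchcase


def triplet_matches(triplet, pattern):
    """Case-sensitive, shell-style glob match of a whole platform string."""
    return fnmatchcase(triplet, pattern)


def is_x86_32(triplet):
    # Hypothetical helper: the i?86-*-* pattern from the text.
    return triplet_matches(triplet, "i?86-*-*")


def has_linux_kernel(triplet):
    # Hypothetical helper: the *-*-linux* pattern from the text.
    return triplet_matches(triplet, "*-*-linux*")


if __name__ == "__main__":
    for t in ["i686-pc-linux-gnu", "x86_64-pc-linux-gnu", "x86_64-linux"]:
        print(t, is_x86_32(t), has_linux_kernel(t))
```

Note that the two-component x86_64-linux does not match *-*-linux*, since the pattern demands two dashes before linux. Patterns like these only behave as intended on triplets that have already been expanded to their full form.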

So the most canonical version of a target triplet is actually a quadruplet!
  Even weirder, though, is that they don’t even have to have three components or more.
  It is perfectly legal to omit the vendor, yielding e.g. x86_64-linux, which
  is expanded to x86_64-pc-linux-gnu, or even the OS as well: riscv64 defaults
  to riscv64-unknown-none. This creates further ambiguity, as none is both
  a valid vendor and OS type, for example (see also the section on riscv-none-elf).
The key consequence of this is the following: You can’t split the target triplet into
  its components without knowledge about the possible values for each.
  Assuming you know that something is a valid autotools target triplet (which is already
  quite the assumption in some cases):

  If the triplet has four components, you’re golden, split at the dashes. However,
    I’m not certain whether autotools may allow additional dashes within a component
    in the future, e.g. in the OS part, or to allow x86_64-v2 et cetera as CPU parts.
  If the triplet has three components, it may be either cpu-kernel-system or
    cpu-vendor-os.
  If the triplet has two components, the first one is the CPU part and the second one
    is “usually, but not always the OS”: it can also be a vendor,
    so it is either cpu-os or cpu-vendor. Additionally, there is (at the moment)
    one two-component target triplet which is a special alias for mips-dec-ultrix4.2,
    namely decstation-3100.
  If the triplet has one component, it is the cpu, usually implying the OS none.
    Alternatively, it can be a “single-component [shorthand] not valid as part of
    multi-component configurations”.
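
The case distinction above can be written down as a small Python sketch. It deliberately does not try to parse anything: it only counts dashes and reports which layouts remain possible, which is all you can say without knowing the legal values for each part. The function name possible_layouts and the layout labels are invented for this illustration.

```python
# Layouts a triplet may have, keyed by its number of dash-separated components.
# This mirrors the list in the text; it is not derived from config.sub itself.
LAYOUTS = {
    1: ["cpu", "single-component shorthand"],
    2: ["cpu-os", "cpu-vendor", "special alias"],
    3: ["cpu-vendor-os", "cpu-kernel-system"],
    4: ["cpu-vendor-kernel-system"],
}


def possible_layouts(triplet):
    """Return every layout the triplet could have, judging by dashes alone."""
    count = len(triplet.split("-"))
    # Anything with more than four components is not covered by the text,
    # so we report no known layout instead of guessing.
    return LAYOUTS.get(count, [])


if __name__ == "__main__":
    for t in ["riscv64", "x86_64-linux", "x86_64-linux-gnu", "x86_64-pc-linux-gnu"]:
        print(t, possible_layouts(t))
```

Only the four-component case yields a single answer; every other length leaves at least two candidates.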

This means, effectively, that you can’t split it correctly unless you are autoconf
  yourself, since the most common case, three components, is ambiguous: You need to know
  what strings are legal for the vendor, OS, kernel and system part, and, even worse,
  some strings like none may be valid for both the OS and the vendor part.
So the options you are left with are: Limit yourself to a subset of all triplets and
  reject everything else, or reimplement the 1860 line config.sub shell script.
  The former option sort of works for downstream projects (e.g. nixpkgs does it),
  but it’s still problematic: the vendor part is not restricted to a fixed
  set of legal values, so you may still come to grief by misinterpreting
  something as the vendor part that isn’t.
Additionally, note that splitting the triplet is not all config.sub / autoconf
  does: After identifying the parts, it normalizes them and fills in the missing
  pieces according to non-trivial rules, e.g. it will supplement ibm as
  vendor to s390x-* if it is omitted. Consequently, it would be very beneficial
  to run the target triplet through config.sub first when you have to parse
  one. This will also guarantee that the resulting triplet has three or four
  components with the second always being vendor. However, this is often
  not possible, as you would need to ship the script somehow and be able
  to shell out to it.
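
If you are in the lucky position of being able to shell out, the canonicalization step could look roughly like the following Python sketch. It assumes a copy of config.sub is available at a path you supply (the default ./config.sub is only a placeholder) and relies on config.sub taking the triplet as its sole argument and printing the canonical form to standard output. The function name canonicalize is made up.

```python
import subprocess


def canonicalize(triplet, config_sub="./config.sub"):
    """Run a triplet through config.sub and return the canonical form.

    config_sub is the path to a copy of the script; the default is a
    placeholder. Raises subprocess.CalledProcessError if config.sub
    rejects the input (it exits with a non-zero status in that case).
    """
    result = subprocess.run(
        ["sh", config_sub, triplet],  # config.sub is a plain sh script
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Going by the expansion described earlier, canonicalize("x86_64-linux") should come back as x86_64-pc-linux-gnu, but the exact result depends on the version of config.sub you ship.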
In conclusion, autotools target triplets are strings consisting of multiple
  string components joined by dashes, but only autoconf knows how exactly.
  They can be ambiguous and, since autotools is the ultimate, but changing source
  of truth, it is hard to untangle the string correctly in all cases.
In general, though, it’ll make sense to humans what a target triplet describes.
  You also won’t run into trouble if you either obtain the target triplet by
  running config.guess on the machine you are interested in or are told
  which one to use by your device’s vendor or cross toolchain distributor.
If you are dealing with a lot of hypothetical triplets (e.g. as a nixpkgs
  maintainer), the config.sub script will help you get a sense of how
  autoconf interprets a specific input string. If you’re feeling brave,
  read its source (both config.sub and config.guess are available as
  part of the gnu-config nixpkgs package or in the build-aux directory
  of the autoconf source tree).
The main takeaway to be had here is to treat target triplets as opaque
  strings as much as possible. The autoconf developers actively discourage
  something as profane as using globbing on the triplet, recommending
  probing for the specific property instead (e.g. check for linux headers
  instead of *-linux*). The target triplets are subject to change, so
  you will only be on the safe side if you treat them as implementation
  details of autoconf.
Nix’s systems

Fundamentally, a system is some kind of string which the Nix daemon uses to
  decide whether it can build a particular derivation. Every derivation, at the
  core, looks like this:
derivation {
  name = "my-derivation";
  system = "i686-linux";

  /* instructions how to build the derivation */
}
The system string the daemon uses is picked in Nix’s configuration script.
  If it is equal to a derivation’s system, it’ll happily build it; otherwise
  it will try to defer the task to a remote builder.
Nix’s configuration script uses autoconf, so the ultimate source of truth
  is autoconf, mediated by these normalizations:

  In host_cpu,
      i*86 is transformed to i686
      amd64 is transformed to x86_64 (autoconf does this by itself nowadays)
      armv{6,7} is transformed to armv{6,7}l
  In host_os,
      linux-{musl,gnu}* are transformed to plain linux
      Version numbers attached are dropped (e.g. the 0.3 in gnu0.3)

The final system is obtained by joining the autoconf CPU and OS parts
  with a dash, so it is a more predictable variant of the two-component
  autotools target triplet, forming either cpu-os or cpu-kernel-system,
  although the latter is very rare for Nix itself.
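
The normalizations listed above are simple enough to sketch in Python. This is my reading of the rules as described here, not a transcription of Nix’s configure script: in particular, I’m assuming the version number is always a trailing run of digits and dots, and the function names (normalize_cpu, normalize_os, nix_system) are invented.

```python
import re
from fnmatch import fnmatchcase


def normalize_cpu(host_cpu):
    """Apply the host_cpu rewrites described in the text."""
    if fnmatchcase(host_cpu, "i*86"):
        return "i686"
    if host_cpu == "amd64":
        return "x86_64"
    if host_cpu in ("armv6", "armv7"):
        return host_cpu + "l"
    return host_cpu


def normalize_os(host_os):
    """Apply the host_os rewrites described in the text."""
    if host_os.startswith(("linux-gnu", "linux-musl")):
        return "linux"
    # Assumption: version numbers only ever appear as a trailing suffix.
    return re.sub(r"[0-9.]+$", "", host_os)


def nix_system(host_cpu, host_os):
    """Join the normalized autoconf CPU and OS parts with a dash."""
    return f"{normalize_cpu(host_cpu)}-{normalize_os(host_os)}"


if __name__ == "__main__":
    print(nix_system("i586", "linux-gnu"))
    print(nix_system("i686", "gnu0.3"))
```

The inputs are meant to be autoconf’s host_cpu and host_os variables, i.e. the already canonicalized parts, so the function never has to guess where the vendor ends.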
Nix is not the most portable software in the world, so the number of system
  strings actually generated by the configure script is limited:

  *-linux (probably mainly with aarch64, i686, x86_64, armv6, armv7
    and other common CPUs in use today)
  x86_64-darwin, aarch64-darwin
  i686-netbsd, x86_64-netbsd (maybe more CPUs?)
  x86_64-freebsd (maybe other CPUs?)
  x86_64-openbsd (although the port never made it into the main Nix source tree,
    so extremely rare)

However, these are not the only system strings in use today: nixpkgs not only
  receives system strings from Nix via builtins.currentSystem, but also interprets
  system strings passed in by the user as cross compilation targets. Among these
  are systems Nix will probably never run on, like avr-none. As a result, a world
  of Nix system strings exists somewhat disconnected from autoconf.
Additionally, due to the transformation of linux-{musl,gnu}* to just linux,
  no three component Nix systems exist in practice and support for them is
  probably poor.
nixpkgs’ platforms

LLVM Triples

Parsing Platform Strings

Don’t.
Case studies

riscv-none-elf