@tayloraswift
Last active January 12, 2022 21:44
Swift integer literals

introduction

Swift allows you to initialize any type from an integer literal in source code by conforming it to the ExpressibleByIntegerLiteral protocol. Currently, only Int8, Int16, Int32, Int64, Int, and their unsigned counterparts are supported as bootstrap types when the compiler emits code passing source code literals to an ExpressibleByIntegerLiteral.init(integerLiteral:) implementation. This effectively caps the maximum supported integer literal width at 64 bits.

This proposal aims to unlock arbitrary-precision integer literals in a way that:

  • decouples the compile-time overflow checking from the storage of the integer value actually passed,
  • eliminates the need for magic underscored protocols,
  • does not resort to inlining or constexpr-like ideas,
  • obsoletes the components of ExpressibleByFloatLiteral that overlap with ExpressibleByIntegerLiteral,
  • makes it easier to reason about the behavior of textual literal protocols such as ExpressibleByExtendedGraphemeClusterLiteral,

as well as laying foundations for features that we may want Swift to have in the future, including but not limited to:

  • lossless decimal types,
  • arbitrary-precision float types,
  • significant figures, and
  • hex color literals.

terms

  • literal A pre-parsed representation of a source code string (e.g. (base: .hex, words: [0xabcd], places: 4) for "0xabcd").

  • bootstrap type The type of the value the compiler supplies to initialize a literal-expressible type. Currently, a bootstrap value is the argument passed to an ExpressibleByIntegerLiteral.init(integerLiteral:) implementation. For integer literals today (but not literals in general), bootstrap values are always known to the caller at compile time.

  • expressed type A type conforming to ExpressibleByIntegerLiteral. It is initialized from a bootstrap value.

motivation

overview of current implementation

Right now, ExpressibleByIntegerLiteral generally works like this:

// module-A.swift 

// Foo and its members must be public to be visible from module B.
public struct Foo:ExpressibleByIntegerLiteral 
{
    public typealias IntegerLiteralType = UInt8 
    public init(integerLiteral:UInt8)
    {
        // ...
    }
}
// module-B.swift 
import struct A.Foo 

func foo() -> Foo
{
    255 as Foo
}

Key things to note:

Module B does not know anything about A.Foo’s ExpressibleByIntegerLiteral conformance, other than that it has one.

That is, the body of B.foo is equivalent to:

Foo.init(integerLiteral: 255 as UInt8) // as Foo

When you get an overflow error like:

error: integer literal '256' overflows when stored into 'Foo'
    256 as Foo
    ^

the compiler did not actually know how to check whether 256 was a valid Foo. It only knew how to decompose 256 as Foo into Foo.init(integerLiteral: 256 as UInt8), and pattern-match the 256 as UInt8 bootstrap expression. The compiler only knows how to pattern-match a fixed set of builtin bootstrap types conforming to _ExpressibleByBuiltinIntegerLiteral, which is why you cannot use ExpressibleByIntegerLiteral to implement compile-time overflow checking for, say, 24-bit values.
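As a rough illustration of that limitation, here is a minimal sketch of a 24-bit type under the current system. UInt24, its storage property, and init(exactly:) are hypothetical names invented for this example. The closest available bootstrap type is UInt32, so the compiler can only enforce the 32-bit bound, and the 24-bit bound has to be checked at run time.

```swift
// Hypothetical 24-bit unsigned integer type; not part of the standard library.
struct UInt24:ExpressibleByIntegerLiteral
{
    static let max:UInt32 = 0x00ff_ffff

    let storage:UInt32

    // Failable helper so the range check can be exercised without trapping.
    init?(exactly value:UInt32)
    {
        guard value <= Self.max
        else
        {
            return nil
        }
        self.storage = value
    }

    // The compiler only verifies that the literal fits in UInt32.
    // Values from 2^24 through 2^32 - 1 are only caught here, at run time.
    init(integerLiteral:UInt32)
    {
        guard let value = UInt24(exactly: integerLiteral)
        else
        {
            fatalError("integer literal '\(integerLiteral)' overflows when stored into 'UInt24'")
        }
        self = value
    }
}

let white:UInt24 = 0xff_ffff            // fits in 24 bits
// let bad:UInt24 = 0x100_0000          // compiles, but traps at run time
// let worse:UInt24 = 0x1_0000_0000     // rejected at compile time: overflows UInt32
```

The two commented-out lines show the gap: only the second is diagnosed by the compiler.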

The ExpressibleByIntegerLiteral conformance for A.Foo in module A does not know anything about the literal written in the B.foo method, other than its bootstrap value.

For example, B.foo could have written 255, 0255, 0xff, or even 0b1111_1111.

This information could often be valuable to us. For example, an RGB hex color type might want to only accept hexadecimal literals exactly 6 digits (24 bits) long. The numeric values 0xff11ff (hot pink) and 0x00ff11ff are equivalent, but the second form is ambiguous, because it’s not clear if the final ff is the alpha component or the blue component. This forces things like CSS builders to resort to awkward APIs like (r: 0xff, g: 0x11, b: 0xff) instead of the more natural 0xff11ff.

Moreover, in many decimal-related applications, the number of leading zeroes is significant. Having a way to preserve digit count information will allow us to reuse parts of our integer literal system when designing lossless decimal types in the future.
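The loss of spelling information described above can be observed directly in current Swift. The sketch below uses a hypothetical Probe type (the name and its value property are made up for this example) that simply records the bootstrap value it receives.

```swift
// Hypothetical type that records whatever bootstrap value it is handed.
struct Probe:ExpressibleByIntegerLiteral
{
    let value:UInt32

    init(integerLiteral:UInt32)
    {
        // Only the numeric value arrives here. The base, any leading
        // zeroes, and any digit separators have already been discarded.
        self.value = integerLiteral
    }
}

// Four different spellings, one bootstrap value.
let spellings:[Probe] = [255, 0255, 0xff, 0b1111_1111]

// A 6-digit and an 8-digit hex literal that the initializer cannot tell apart.
let rgb:Probe  = 0xff11ff
let rgba:Probe = 0x00ff11ff
```

Because rgb.value and rgba.value are identical, a color type built this way has no means of rejecting the ambiguous 8-digit form.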

proposed solution

We should scrap the magic _ExpressibleByBuiltinIntegerLiteral protocol, and replace it with an officially-supported IntegerLiteral protocol. (Only standard library types can conform to _ExpressibleByBuiltinIntegerLiteral, so the ABI impact should be limited.)

protocol IntegerLiteral
{
    associatedtype Base where Base:IntegerLiteralBase
    init(_:Base, words:[UInt], places:Int)
}

The precise definition of words and places is not important to this proposal, and can be bikeshed at a later date. More important is the Base type and the IntegerLiteralBase protocol.

why do we need Base?

If all we cared about was preserving the numeric base information, we could simply have the standard library vend a concrete SwiftBase enumeration like:

enum SwiftBase 
{
    case binary, octal, decimal, hexadecimal 
}

Then we could omit the associatedtype Base. We don’t, because having Base as an associatedtype requirement has a lot of advantages:

  • Associated types are always known at compile time. This means the compiler can use Base to implement compile-time overflow checking. This also means the compiler could statically forbid expressed types from being written in certain bases in accordance with the Base type.
  • Base can conform to protocols (besides IntegerLiteralBase), and these protocol conformances can be orthogonal to each other. The compiler can vend the set of static checks it knows how to do as marker protocols, and use these conformances to perform compile-time validation. Types conforming to ExpressibleByIntegerLiteral can then opt-in to different kinds of compile-time validation through their Base type.
  • Base effectively decouples the compile-time validation from the physical bootstrap type used to transfer data from source code to an ExpressibleByIntegerLiteral initializer. This will enable us to, for example, implement 24-bit overflow checking without having to add an Int24 integer type to the standard library.

why do we need IntegerLiteralBase?

We need IntegerLiteralBase because we want to have some way of passing along the original literal’s base to the ExpressibleByIntegerLiteral implementation. Users cannot use the Base associated type to enable arbitrary numeric bases, because that would change the syntax of the language. So we need to have some way of translating the four known Swift bases (binary, octal, decimal, hexadecimal) into an instance of an ExpressibleByIntegerLiteral type’s Base.

protocol IntegerLiteralBase 
{
    static 
    var binary:Self 
    {
        get 
    }
    static 
    var octal:Self 
    {
        get 
    }
    static 
    var decimal:Self 
    {
        get 
    }
    static 
    var hexadecimal:Self 
    {
        get 
    }
}

Base types that don’t support certain numeric bases can implement those requirements as Never.
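To make the requirements concrete, here is a minimal sketch of two conforming Base types. Radix, HexOnly, and the hexadecimal(of:) helper are hypothetical names, not part of the proposal. HexOnly shows one possible reading of "implement those requirements as Never": the unsupported getters never return. The sketch assumes Swift 5.3 or later, where enum cases can satisfy static protocol requirements.

```swift
// The protocol as proposed, written compactly.
protocol IntegerLiteralBase
{
    static var binary:Self { get }
    static var octal:Self { get }
    static var decimal:Self { get }
    static var hexadecimal:Self { get }
}

// Hypothetical base that accepts every spelling. The enum cases
// themselves witness the four static requirements.
enum Radix:IntegerLiteralBase
{
    case binary, octal, decimal, hexadecimal
}

// Hypothetical base that only admits hexadecimal literals. The other
// getters have bodies of type Never, so they can never produce a value.
struct HexOnly:IntegerLiteralBase
{
    static var hexadecimal:Self { HexOnly() }

    static var binary:Self  { fatalError("binary literals are not supported") }
    static var octal:Self   { fatalError("octal literals are not supported") }
    static var decimal:Self { fatalError("decimal literals are not supported") }
}

// Generic code (standing in for the compiler) can translate a known
// Swift base into a value of any conforming Base type.
func hexadecimal<B>(of _:B.Type) -> B where B:IntegerLiteralBase
{
    B.hexadecimal
}
```

In this sketch the rejection in HexOnly happens at run time. The proposal's intent is that the compiler could diagnose it statically from the Base type.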

why do we need IntegerLiteral?

The data flow proposed here has three steps:

  1. At compile time, the compiler lexes a (Builtin.Base, words:[UInt], places:Int) tuple from source code. All fields are known, including the Builtin.Base value and the Self.IntegerLiteral.Base type (but not the Self.IntegerLiteral.Base value).
  2. At run time, instantiate a Self.IntegerLiteral.Base value from the Builtin.Base value, and then instantiate the Self.IntegerLiteral bootstrap value.
  3. At run time, call Self.init(integerLiteral:) with the newly-constructed Self.IntegerLiteral bootstrap value.
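The three steps above can be sketched in ordinary Swift, with a generic function standing in for the code the compiler would emit. Everything beyond the two proposed protocols is an assumption made for illustration: BuiltinBase stands in for Builtin.Base, LiteralExpressible stands in for the revised ExpressibleByIntegerLiteral, and Radix, RawLiteral, HexColor, and emit are hypothetical. The sketch also assumes that words holds the least-significant word first and that places counts digits after the prefix, which the proposal leaves undecided.

```swift
protocol IntegerLiteralBase
{
    static var binary:Self { get }
    static var octal:Self { get }
    static var decimal:Self { get }
    static var hexadecimal:Self { get }
}
protocol IntegerLiteral
{
    associatedtype Base where Base:IntegerLiteralBase
    init(_:Base, words:[UInt], places:Int)
}
// Stand-in for the revised ExpressibleByIntegerLiteral.
protocol LiteralExpressible
{
    associatedtype Literal:IntegerLiteral
    init(integerLiteral:Literal)
}
// Stand-in for the compiler's internal Builtin.Base.
enum BuiltinBase
{
    case binary, octal, decimal, hexadecimal
}

enum Radix:IntegerLiteralBase
{
    case binary, octal, decimal, hexadecimal
}
// A bootstrap type that keeps everything the lexer saw.
struct RawLiteral:IntegerLiteral
{
    let base:Radix
    let words:[UInt]
    let places:Int

    init(_ base:Radix, words:[UInt], places:Int)
    {
        self.base = base
        self.words = words
        self.places = places
    }
}
// An expressed type that insists on exactly six hexadecimal digits.
struct HexColor:LiteralExpressible
{
    let rgb:UInt

    init(integerLiteral literal:RawLiteral)
    {
        precondition(literal.base == .hexadecimal && literal.places == 6,
            "expected a 6-digit hexadecimal literal")
        self.rgb = literal.words.first ?? 0
    }
}

// Plays the role of compiler-emitted code. Step 1 (lexing) has already
// produced the three arguments.
func emit<T>(_ builtin:BuiltinBase, words:[UInt], places:Int) -> T
    where T:LiteralExpressible
{
    // Step 2: translate the builtin base into a Base value, then build
    // the bootstrap value.
    let base:T.Literal.Base
    switch builtin
    {
    case .binary:       base = T.Literal.Base.binary
    case .octal:        base = T.Literal.Base.octal
    case .decimal:      base = T.Literal.Base.decimal
    case .hexadecimal:  base = T.Literal.Base.hexadecimal
    }
    let literal = T.Literal(base, words: words, places: places)
    // Step 3: hand the bootstrap value to the expressed type.
    return T(integerLiteral: literal)
}

// What `0xff11ff as HexColor` might lower to under these assumptions.
let pink:HexColor = emit(.hexadecimal, words: [0xff11ff], places: 6)
```

Here the digit-count check runs at run time inside HexColor. Under the proposal, the same information would be available to the compiler through the Base type.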

The intermediate step is useful because many conforming types do not actually need an arbitrary-precision words vector, and may not care about the places count. The standard library can provide IntegerLiteral conformances for all the concrete standard library integer types (much like it currently does for _ExpressibleByBuiltinIntegerLiteral, but with less compiler magic), and downstream users can simply piggyback off of those.

More importantly, it preserves source-compatibility with existing ExpressibleByIntegerLiteral implementations.

The intermediate step can also improve performance when crossing module boundaries. A third-party library might only need an IntegerLiteral type of Int, and it is much more efficient for the compiler to pass the library an Int value generated from a standard library Int:IntegerLiteral implementation that it knows how to constant-fold. This also has the upside of making third-party libraries less dependent on inlining.

source compatibility

This proposal is designed to be backwards-compatible with all existing ExpressibleByIntegerLiteral implementations.

binary compatibility

Replacing the _ExpressibleByBuiltinIntegerLiteral associated type constraint with IntegerLiteral will break ABI. However, only standard library types can conform to _ExpressibleByBuiltinIntegerLiteral, so the ABI impact should be limited.

binary resilience

This proposal is unlikely to harm or improve standard library binary resilience. The IntegerLiteral abstraction layer minimizes overhead from bigint traffic, which will make third-party libraries less dependent on inlining. This will improve the binary resilience of the Swift ecosystem in the long run.

future directions

The proposed changes to ExpressibleByIntegerLiteral can be used to implement lossless decimal literals. The ExpressibleByFloatLiteral.FloatLiteralType associated type can then be replaced with conformances to DecimalLiteral (analogous to IntegerLiteral) on the concrete types Float, Float80, and Double, which models their relationship to FloatLiteralType much better than the _ExpressibleByBuiltinFloatLiteral protocol.

Decoupling literals from bootstrap values will make it straightforward for us to enable implementing ExpressibleByStringLiteral and ExpressibleByExtendedGraphemeClusterLiteral initializers that operate on raw UTF-8 data. This will increase the amount of constant-folding the compiler is able to perform, since the raw UTF-8 data is known to the caller at compile time.

alternatives considered

This proposal is an alternative to StaticBigInt, pitched here.
