Unicode integer literals
  • Proposal: SE-XXXX
  • Author: Kelvin Ma (@taylorswift)
  • Review manager:
  • Status: Awaiting review
  • Implementation:
  • Threads: 1

Introduction

Character processing is everywhere, particularly when working with ASCII. However, Swift's support for it is awkward and suboptimal. This proposal seeks to improve character processing support by allowing Swift integers to be written as single-quoted character literals, which is much more consistent with other languages and removes a point of friction in string processing code.

let niceInteger: Int8 = 'E'

Motivation

In C, 'a' is a character literal with the integer value 97. Swift has no such equivalent, requiring awkward spellings like ("a" as Unicode.Scalar).value, which may need additional casts to reach the right integer type, or UInt8(ascii: "a") in the case of UInt8. Alternatives, like spelling out the values in hex or decimal directly, are even worse. This harms the readability of code and is one of the sore points of string processing in Swift.
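For concreteness, the spellings mentioned above look like this in current Swift. This is only an illustrative sketch; the constant names are invented for the example.

```swift
// Current Swift: three ways to obtain the ASCII value of "a".
// The constant names are illustrative only.

// Via Unicode.Scalar; `value` is a UInt32, so further conversion is often needed.
let scalarValue: UInt32 = ("a" as Unicode.Scalar).value

// Via the UInt8 convenience initializer.
let byte: UInt8 = UInt8(ascii: "a")

// Int8 has no equivalent initializer, so the conversions stack up.
let signedByte: Int8 = Int8(UInt8(ascii: "a"))

print(scalarValue, byte, signedByte)  // prints "97 97 97"
```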

There are many examples where this causes harm. First, consider this simple table in C:

static char const hexcodes[16] = {
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 
    'a', 'b', 'c', 'd', 'e', 'f'
};

It has to be written like this in Swift:

// what do these numbers mean???
let hexcodes: [UInt8] = [
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102
] 
    
// This is the best we can do right now while still showing the ASCII letter form.
let hexcodes = [
    UInt8(ascii: "0"), UInt8(ascii: "1"), UInt8(ascii: "2"), UInt8(ascii: "3"),
    UInt8(ascii: "4"), UInt8(ascii: "5"), UInt8(ascii: "6"), UInt8(ascii: "7"),
    UInt8(ascii: "8"), UInt8(ascii: "9"), UInt8(ascii: "a"), UInt8(ascii: "b"),
    UInt8(ascii: "c"), UInt8(ascii: "d"), UInt8(ascii: "e"), UInt8(ascii: "f")
]    

UInt8 has a convenience initializer for converting from ASCII, but if you're working with other types like Int8 (common when dealing with C APIs that take char), it is much more awkward. Consider scanning through a char* buffer as an UnsafeBufferPointer<Int8>:

for scalar in int8buffer {
    switch scalar {
    case Int8(UInt8(ascii: "a")) ... Int8(UInt8(ascii: "f")):
        // lowercase hex letter
        break
    case Int8(UInt8(ascii: "A")) ... Int8(UInt8(ascii: "F")):
        // uppercase hex letter
        break
    case Int8(UInt8(ascii: "0")) ... Int8(UInt8(ascii: "9")):
        // hex digit
        break
    default:
        // something else
        break
    }
}
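A self-contained version of the same scan in current Swift might look like the sketch below. It hoists the conversions into named constants, which is one possible workaround rather than anything this proposal prescribes; HexClass and classify(_:) are hypothetical names invented for the example.

```swift
// Hypothetical helper: HexClass and classify(_:) are names invented for this sketch.
enum HexClass { case lowercaseLetter, uppercaseLetter, digit, other }

func classify(_ unit: Int8) -> HexClass {
    // Hoist the verbose conversions so the switch itself stays readable.
    let lowerA = Int8(UInt8(ascii: "a")), lowerF = Int8(UInt8(ascii: "f"))
    let upperA = Int8(UInt8(ascii: "A")), upperF = Int8(UInt8(ascii: "F"))
    let zero = Int8(UInt8(ascii: "0")), nine = Int8(UInt8(ascii: "9"))

    switch unit {
    case lowerA...lowerF: return .lowercaseLetter
    case upperA...upperF: return .uppercaseLetter
    case zero...nine: return .digit
    default: return .other
    }
}

// Usage: classify a few ASCII code units given as raw values.
let classes = [Int8(97), 70, 48, 103].map(classify)
```

Even with the constants hoisted, six lines of setup are needed before the switch can be read, which is the friction the proposed literal syntax is meant to remove.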

Proposed solution

Let's do the obvious thing here, and allow 'x' to be a literal corresponding to the Unicode representation of the contained character value. The standard library will adopt this syntax on integer types, Character, Unicode.Scalar, and types like UTF16.CodeUnit. The default literal type for var x = 'a' will be Character.

With these changes, the above code can be written much more naturally:

let hexcodes: [UInt8] = [
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 
    'a', 'b', 'c', 'd', 'e', 'f'
] 

for scalar in int8buffer {
    switch scalar {
    case 'a' ... 'f':
        // lowercase hex letter
        break
    case 'A' ... 'F':
        // uppercase hex letter
        break
    case '0' ... '9':
        // hex digit
        break
    default:
        // something else, perhaps an extended UTF-8 digit.
        break
    }
}

Choice of single quotes

Use of single quotes for character/scalar literals is heavily precedented in other languages, including C, Objective-C, C++, Java, and Rust, although different languages have slightly differing ideas about what a “character” is. We choose to use single quote syntax specifically because it reinforces the notion that strings and character values are different: the former is a sequence, the latter is a scalar (and "integer-like"). Character types also don't support string literal interpolation, which is another reason to move away from double quotes.

One significant corner case is worth mentioning: some methods may be overloaded on both Character and String. This design allows natural user-side syntax for differentiating between the two.
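To illustrate the corner case in current Swift, the sketch below uses a pair of hypothetical overloads (describe(_:) is invented for this example). Because a double-quoted literal defaults to String, reaching the Character overload today requires an explicit type annotation.

```swift
// Hypothetical overloads, invented for this sketch.
func describe(_ value: String) -> String { return "String" }
func describe(_ value: Character) -> String { return "Character" }

// A double-quoted literal defaults to String, so the String overload is chosen.
let plain = describe("a")

// Selecting the Character overload currently needs an explicit annotation.
let annotated = describe("a" as Character)

print(plain, annotated)  // prints "String Character"
```

Under the proposed syntax, the second call would presumably be spelled describe('a') with no annotation.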

Single quotes in Swift, a historical perspective

In Swift 1.0, we wanted to reserve single quotes for some yet-to-be-determined syntactical purpose. However, today, pretty much all of the things that we once thought we might want to use single quotes for have already found homes in other parts of the Swift syntactical space. For example, syntax for multi-line string literals uses triple quotes ("""), and string interpolation syntax uses standard double quote syntax.

Current proposals for raw string literals use r-prefixes (r"). For regex literals, most people seem to prefer slashes (/), but they could also fall into the same syntax as raw strings.

At this point, it is clear that the early syntactic conservatism was unwarranted. We do not foresee another use for this syntax, and given the strong precedent in other languages for characters, it is natural to use it.

Existing double quote initializers for characters

We propose deprecating the double quote initializer for Character and Unicode scalar types and slowly migrating them out of Swift.

let c2 = 'f'               // preferred
let c1 : Character = "f"   // deprecated

Overflow checking

Just as the compiler and standard library work together to detect overflowing integer literals, overflowing character literals will be statically diagnosed:

let a: Int16 = 128 // ok
let b: Int8 = 128  // error: integer literal '128' overflows when stored into 'Int8' 

let c: Int16 = 'Ƥ' // ok
let d: Int8  = 'Ƥ' // error: character literal 'Ƥ' overflows when stored into 'Int8' 
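The numbers behind this example can be checked in current Swift. The sketch below uses a runtime check with init(exactly:), which is only a stand-in for the compile-time diagnostic described above; the constant names are invented for the example.

```swift
// 'Ƥ' is U+01A4, whose scalar value is 420.
let scalar: Unicode.Scalar = "Ƥ"
let codePoint: UInt32 = scalar.value

// init(exactly:) returns nil when the value cannot be represented.
let asInt16 = Int16(exactly: codePoint)  // 420 fits in Int16
let asInt8 = Int8(exactly: codePoint)    // nil: 420 is outside Int8's range
```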

Detailed Design

TODO (from clattner): This needs to be significantly expanded. We need to have a 'CharacterLiteralType = Character', describe the lexer changes, describe the actual protocol and changes to the standard library. This section should talk about the additive pieces of this proposal, not the deprecations.

This proposal will add ExpressibleByUnicodeScalarLiteral conformance to all Swift integer types. It will also slightly change the Swift lexer to interpret single-quoted tokens as Unicode.Scalar literals.
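As a rough sketch of what such a conformance involves, the example below implements ExpressibleByUnicodeScalarLiteral on a hypothetical wrapper type, ByteUnit, rather than on UInt8 itself, so that it does not retroactively conform a standard library type. Double quotes stand in for the proposed single quotes, and the range check happens at runtime, whereas the proposal calls for a static diagnostic. None of this is the proposal's actual implementation.

```swift
// ByteUnit is a hypothetical wrapper type invented for this sketch.
struct ByteUnit: ExpressibleByUnicodeScalarLiteral, Equatable {
    let value: UInt8

    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        // Runtime check only: traps if the scalar does not fit in UInt8.
        // The proposal describes a compile-time diagnostic instead.
        precondition(scalar.value <= UInt32(UInt8.max), "scalar does not fit in UInt8")
        value = UInt8(scalar.value)
    }
}

// Double quotes stand in for the proposed single-quote syntax.
let e: ByteUnit = "E"
print(e.value)  // prints "69"
```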

Source compatibility

This proposal could be done in a way that is strictly additive, but we feel it is best to deprecate the existing double quote initializers for characters, and the UInt8.init(ascii:) initializer.

Here is a specific sketch of a deprecation policy:

  1. Continue accepting these in Swift 4 mode with no change.
  2. Introduce the new syntax support into Swift 4.2 (if there is time).
  3. Swift 5 mode would start producing deprecation warnings (with a fixit to change double quotes to single quotes).
  4. The Swift 4 to 5 migrator would change the syntax (by virtue of applying the deprecation fixits).
  5. Swift 6 would not accept the old syntax.

Effect on ABI stability

No effect as this is an additive change. Heroic work could be done to try to prevent the UInt8.init(ascii:) initializer and other to-be-deprecated conformances from being part of the ABI. This seems unnecessary though.

Effect on API resilience

None.

Alternatives considered

Why not make this apply to Character too?

TODO (from clattner): I really don't think this is a good idea. We should have a consistent character model.

Lots of people suggested this in the pre-pitch thread for this proposal. Because Character literals and Unicode.Scalar literals would then look exactly the same (we would have to make Character take single quotes too), there would be zero sugaring benefit gained from this.

Some people like the idea of making the quote type indicate the “length” (1 or greater than 1) of the literal object, in which case Character would take single quotes as well. This notational change is orthogonal to this proposal and can be done separately in another proposal.

griotspeak commented Jun 26, 2018

Isn't another alternative something along the lines of ux0, u0, or u"A"?
