beccadax/NNNN-fix-expressible-by-string-interpolation.md

## NNNN-fix-expressible-by-string-interpolation.md

      
    Raw
  

              NNNN-fix-expressible-by-string-interpolation.md
            
          
    Fix ExpressibleByStringInterpolation


Proposal: SE-NNNN
Authors: Brent Royal-Gordon
Review Manager: TBD
Status: First draft

Introduction

String interpolation is a simple and powerful feature for expressing
complex, runtime-created strings, but the current design of the
ExpressibleByStringInterpolation protocol is so defective that it
shipped deprecated in Swift 3. We propose a new, often-source-compatible
design which distinguishes between literal and interpolated segments and
tightens up the typing of literal segments.
Motivation

ExpressibleByStringInterpolation allows users of String-like types
to express those types as literals, even when the data is partially
dynamic. Broadly speaking, there are three kinds of use cases for it:


Types which represent unconstrained plain text, like Swift.String
itself.


Types which represent machine-readable code fragments, like
SQLKit.SQLStatement or this blog post's SanitizedHTML.
In these types, it's often unsafe to insert interpolated data
verbatim; it must be escaped or passed out-of-band.


Types which represent human-readable text with some sort of special
structure, like this gist's LocalizableString.


Unfortunately, Swift 3's ExpressibleByStringInterpolation doesn't do
a very good job of supporting the more sophisticated use cases like 2
and 3. This is because of four major defects in the protocol's design:


All literal segments come in as String, not Self.


Conforming types cannot easily tell which segments are literals and
which are interpolations.


Types supporting interpolation cannot easily constrain the types
interpolated into them.


Formatting is not integrated into interpolation.


This proposal addresses 1 and 2; 3 and 4 require more creativity and
will be left for a separate proposal.
Technical background

When the Swift compiler parses a string literal, it divides it into
chunks called segments, which are either literal segments
(containing verbatim text) or interpolated segments (containing
an expression to be evaluated and inserted into the text). Like all
literals, string literals are then translated into calls to
initializers in a corresponding literal protocol, but the segmentation
makes string literals a little more complicated than most.
The degenerate case—one literal segment, no interpolated segments:
"Hello, world!"
Is handled very simply. Swift marks the expression as belonging to a
type conforming to ExpressibleByStringLiteral, then translates the
literal into a call to that protocol's init(stringLiteral:)
initializer:
.init(stringLiteral: "Hello, world!")
And type inference will later determine the concrete type of the
expression, defaulting to Swift.StringLiteralType (String by
default) if the context does not constrain it.
If there is more than one segment, however:
"Hello, \(name)!"
The story gets more complicated. First, the literal segments undergo
the processing described above; second, each segment is wrapped in a
call to init(stringInterpolationSegment:); and finally, all of the
calls are wrapped in a single call to init(stringInterpolation:),
which concatenates them all into a single instance:
.init(stringInterpolation:
	.init(stringInterpolationSegment: .init(stringLiteral: "Hello, ")),
	.init(stringInterpolationSegment: name),
	.init(stringInterpolationSegment: .init(stringLiteral: "!"))
)
There are two important wrinkles here. The first is that
init(stringInterpolationSegment:)'s parameter is an unconstrained
generic type—that is, Swift has no reason to prefer any particular
type for that parameter. The second is that
init(stringInterpolation:)'s parameter type is Self..., so the
init(stringInterpolationSegment:) calls themselves will always
return the same type as that initializer.
Defect 1: Literal segments come in as String

Because init(stringInterpolationSegment:)'s parameter is an
unconstrained generic type, Swift will always default to String.
This is a problem for types like SQLKit.SQLStatement, which needs
to tell the difference between SQLStatement data (which is
treated as code) and String data (which is passed out-of-band using
a placeholder). It would also be a problem for types which want to
represent String data in a different way, like a hypothetical
ASCIIString type.
Defect 2: Literals and interpolations are indistinguishable

init(stringInterpolationSegment:) is called for both literal and
interpolated segments, and—partially due to defect 1—there is no way
to determine at that stage which you are processing at a given time.
There is a hack you can use to work around this. As currently
implemented, the Swift parser always generates a literal segment first,
and always alternates interpolation and literal segments. That means
that even-indexed segments are always literal, and odd-indexed segments
are always interpolated. If init(stringInterpolationSegment:) merely
stores its parameter away, init(stringInterpolation:) can correctly
interpret it in the context of the other segments.
Exploiting this trick makes it possible to do sophisticated things with
interpolation, but it distorts the design of the type, sometimes forces
the type to permit otherwise invalid states, and essentially requires
init(stringInterpolationSegment:) to create "wrong" instances,
trusting init(stringInterpolation:) to correct them later.
It also relies on an undocumented parser quirk that could easily be
changed in the future. There are no tests in Swift 3 to ensure this
behavior does not change, and doing so might speed up string literal
construction.
Defect 3: Cannot constrain interpolated types

Types like SQLKit.SQLStatement only support certain certain types
being interpolated—String, Data, Date, Bool, BinaryIntegers,
FloatingPoints, and Decimals, but not UIViews, Arrays,
or Users. Even LocalizableString, if implemented using
String(format:) for Objective-C compatibility, can only handle
CVarArg interpolations. ExpressibleByStringInterpolation cannot
easily express these kind of constraints.
Like defect 2, there is a hack that sort of works around this. Swift
does not only use the specific init<T>(stringInterpolationSegment: T)
implementation listed in the protocol; it will actually consider all of
its overloads as well. So you can implement the required initializer,
but mark it @available(*, unavailable: 0.0), and then overload it
without the annotation for the types you want to support. The compiler
will emit an error if you try to use the unconstrained generic
overload. Exploiting this, however, is a pretty dirty trick.
Defect 4: No formatting syntax

When interpolating a value into a string, it's often necessary to
control details of how it's formatted—think of the radix of an
integer or the precision of a floating-point value. Swift's
standard way to do this is to write an expression which generates
a formatted string, but this adds extra verbiage to the
interpolated expression and (partially due to defect 3) limits
the ability of types to override formatting.
When considering this problem, keep in mind that "formatting" is a very
broad category of actions. For instance, SQLStatement might want to
support inserting raw SQL from a String; that's a form of formatting.
If SanitizedHTML wanted to wrap interpolated Dates in HTML5
<time> tags, that would be a form of formatting too.
Proposed solution

We defer working on defects 3 and 4 for a future proposal; these
require more design work.
We address defects 1 and 2 by making ExpressibleByStringInterpolation
a refinement of ExpressibleByStringLiteral and removing the
init(stringInterpolationSegment:) call around literal segments. The
"Hello, \(name)!" example above becomes:
.init(stringInterpolation:
	.init(stringLiteral: "Hello, "),
	.init(stringInterpolationSegment: name),
	.init(stringLiteral: "!")
)
This fixes both problems at once: literal segments are processed

by Self.init(stringLiteral:) instead of String.init(stringLiteral:),
and only interpolated segments are processed by
Self.init(stringInterpolationSegment:).
Detailed design

A prototype of this design is available in this branch.
In the constraint generator, we constrain all literal segments'
types to equal the type of the InterpolatedStringLiteralExpr itself.

Draft note: This will complicate a constraint system which was radically
simplified in 21ee10b, apparently to improve compile times;
not having access to the underlying bug, I can't tell if this might
cause a regression. I'd appreciate input from someone involved in the
original fix, or who can at least see rdar://problem/29389887.

In the constraint applier, we only wrap interpolated segments, not
literal segments, in init(stringInterpolationSegment:) calls.
Finally, we update the _ExpressibleByStringInterpolation protocol in the
standard library to make it require ExpressibleByStringLiteral conformance.
(This is actually a sensible design anyway: you need this conformance to
support literals without any interpolations.)
-public protocol _ExpressibleByStringInterpolation {
+public protocol _ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
We also update its documentation to describe the new semantics; see
this commit for precise proposed wording.
And...that's it. This change is surprisingly surgical.
Source compatibility

Strictly speaking, since ExpressibleByStringInterpolation is
currently deprecated, source compatibility is not a concern. However,
this design is source-compatible with most ExpressibleByStringInterpolation
conformances in the wild. Most conforming types also conform to
ExpressibleByStringLiteral, and it doesn't disrupt the
segment-counting trick used in Swift 3 conformances.
Effect on ABI stability

This change breaks the current ExpressibleByStringInterpolation ABI,
but brings it closer to a stage where it could be frozen, an important
goal for ABI stability.
Effect on API resilience

There isn't a ton we could do to this API in the future without
breaking compatibility; in particular, we can't narrow the types
accepted by init(stringInterpolationSegment:), because there's
no way to add an associated type which can be used to constrain its
unconstrained T type parameter.
That's why we do not yet propose removing the deprecation on
ExpressibleByStringInterpolation. Once we decide we're happy with
this initializer's parameter type, we can do so.
Alternatives considered

We considered a more radical redesign of both string literals and string
interpolation protocols, along the lines of:
protocol ExpressibleByStringLiteral: ExpressibleByExtendedGraphemeClusterLiteral {
  associatedtype StringLiteralSegmentType: _ExpressibleByBuiltinStringLiteral
  
  init(stringLiteral segments: Self...)
  init(stringLiteralSegment string: StringLiteralSegmentType)
}

protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  init<T>(stringInterpolationSegment expr: T)
}
The idea would be to allow more consistent ASTs and code generation:
instead of the current situation, where StringLiteralExprs sometimes
stand alone and other times are nested within
InterpolatedStringLiteralExprs, every string literal would be
segmented, and it'd just be a question of whether all of those segments
were StringLiteralExprs or not. It would also
allow the compiler to generate segments for its own purposes—perhaps a
multiline string literal would be easier to split into one segment per
line. And it would make implementing string interpolation a little more
consistent; as things are, the value returned by init(stringLiteral:)
may or may not pass through init(stringInterpolation:), and
init(stringLiteral:) has no way to know.
On the other hand, it would generate more calls and require types like
StaticString to unnecessarily support concatenating multiple segments
together. Those are very concrete harms compared to the abstract
benefits of such a design.
We also considered renaming the initializers in
ExpressibleByStringInterpolation:
protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  init(stringInterpolationSegments segments: Self...)
  init<T>(stringInterpolation expr: T)
}
This would give init(stringInterpolation:) a name more similar to
init(stringLiteral:), but that doesn't seem like a particularly
large benefit.
Future directions

We are considering a simple formatting system in which initializers on
an associated type are used to express formatting.
ExpressibleByStringInterpolation would be modified to add an
associated type:
protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  associatedtype StringInterpolationSegmentType = String

  init(stringInterpolation segments: Self...)
  init(stringInterpolationSegment string: StringInterpolationSegmentType)
}
And the \() syntax would be parsed as a parameter list, rather than
a single expression, which would be used to look up an initializer on
the StringInterpolationSegmentType:
"Hello, \(name)!"

.init(stringInterpolation:
	.init(stringLiteral: "Hello, "),
	.init(stringInterpolationSegment: .init(name)),
	.init(stringLiteral: "!")
)
Therefore, by adding additional parameters and using parameter labels,
you could adjust the formatting of a value:
"Commit \(id, radix: 16)" as String

String(stringInterpolation:
	String(stringLiteral: "Commit "),
	String(stringInterpolationSegment: String.StringInterpolationSegmentType(id, radix: 16)),
	String(stringLiteral: "!")
)
Types which just wanted String's standard formatting behavior would
use String as their StringInterpolationSegmentType, but types which
wanted custom formatting could use a different type:
// assume SQLStatement's StringInterpolationSegmentType is SQLStatement.

"SELECT \(raw: columnName) FROM users WHERE name = \(userName)" as SQLStatement

SQLStatement(stringInterpolation:
	SQLStatement(stringLiteral: "SELECT "),
	SQLStatement(stringInterpolationSegment: SQLStatement.StringInterpolationSegmentType(raw: columnName)),
	SQLStatement(stringLiteral: " FROM users WHERE name = "),
	SQLStatement(stringInterpolationSegment: SQLStatement.StringInterpolationSegmentType(userName)),
	SQLStatement(stringLiteral: "")
)
This has not been prototyped; there are open questions about
self-interpolations, and it's not clear how well the type checker would
handle the complexity. We will propose this separately once we've
firmed it up a little.