
@johnno1962
Forked from erica/oldraw.md
Last active July 2, 2018 12:21

"Raw" mode string literals

Introduction

In most computer languages, string literals use escaping characters (generally \) to represent non-printing characters, to escape the string's own delimiter, or (in the case of Swift) to allow interpolation of expressions into a string. By contrast, "raw" strings, which are available in many languages, forgo escaping and pass the literal's contents through exactly as written. Raw strings suspend normal escaping and interpolation rules. They allow you to establish string literals including quotes and backslashes that normally require escape sequences.

This proposal introduces raw strings to the Swift Programming Language using a simple mechanism that respects existing language conventions and adds Rust-inspired delimiters. This design applies to both single and multi-line strings to represent any raw source.

This proposal has been extensively revised based on the Core Team feedback for SE-0200. It was discussed on the Swift online forums and tuned by a dedicated working group with frequently updated toolchains.

Background

Raw strings and their design have been discussed in the following Evolution forum threads:

Background: Escape Sequences

Normal string literals may include the following special character sequences:

  • The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark) and \' (single quotation mark)

  • An arbitrary Unicode scalar, written as \u{n}, where n is a 1–8 digit hexadecimal number with a value equal to a valid Unicode code point

The backslash escape tells the compiler that a sequence should combine for a special literal.

In raw strings, escapes are neither required nor recognized. In a raw string, the sequence \\\n represents three backslashes followed by the letter n, not a backslash followed by a line feed.
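To make the difference concrete, here is a minimal sketch in current Swift, using only conventional literals. The constant names are illustrative. It shows what the compiler makes of the sequence \\\n today, and what you must type by hand to get the four characters a raw string would keep as written.

```swift
// Conventional literal: the compiler combines "\\" into one backslash
// and "\n" into one line feed, so only two characters remain.
let interpreted = "\\\n"
assert(interpreted.count == 2)
assert(Array(interpreted) == ["\\", "\n"])

// To keep three backslashes followed by the letter n, every backslash
// has to be doubled by hand.
let handEscaped = "\\\\\\n"
assert(handEscaped.count == 4)
assert(Array(handEscaped) == ["\\", "\\", "\\", "n"])

print(handEscaped)
```

The second literal takes seven typed characters to express four, which is the overhead raw strings are meant to remove.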

Motivation

Raw strings are used for non-trivial content that cannot be satisfactorily maintained or read in escaped form, and that should be co-located in source files rather than moved to an external text file.

Hand-escaped strings require time and effort to transform source material to an escaped form. It is difficult to validate the process to ensure the escaped form properly represents the original text. This task is also hard to automate as it may not pick up intended nuances, such as recognizing embedded dialog quotes.

Escaping actively interferes with inspection. Developers should be able to inspect and modify raw strings in-place without removing that text from source code. This is especially important when working with precise content such as code sources and regular expressions.

Pre-escaped source should not be interpreted by the Swift compiler. The already escaped source should be maintained as presented so it can be used, for example, when contacting web-based services.

Finally, raw strings are transportable. They allow developers to cut and paste content both from and to the literal string. This allows testing, reconfiguration, and adaptation of raw content without the hurdles of escaping and unescaping that limit development.

Examples

Raw string literals may include characters normally used for escaping (such as the backslash \ character) and characters normally requiring escaping (such as a double quote "). For example, consider the following multiline string. It represents code to be output at some point in the program execution:

let separators = """
    public static var newlineSeparators: Set<Character> = [
        // [Zl]: 'Separator, Line'
        "\u{2028}", // LINE SEPARATOR

        // [Zp]: 'Separator, Paragraph'
        "\u{2029}", // PARAGRAPH SEPARATOR
    ]
    """

Unescaped backslash literals cause the Unicode escape sequences to be evaluated and replaced in-string. This produces the following result:

public static var newlineSeparators: Set<Character> = [
    // [Zl]: 'Separator, Line'
    "
", // LINE SEPARATOR

    // [Zp]: 'Separator, Paragraph'
    "
", // PARAGRAPH SEPARATOR
]

To preserve the intended text, each backslash must be escaped, for example \\u{2029}. This is a relatively minor edit but if the code is being copied in and out of the source to permit testing and modification, then each hand-escaped cycle introduces the potential for error.
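The effect can be checked in current Swift with a cut-down version of the example. This is a small sketch with illustrative constant names: the first literal lets the compiler evaluate the escape, and the second doubles the backslash so the text survives.

```swift
// The escape is evaluated: the literal contains the actual
// U+2028 LINE SEPARATOR scalar and no backslash.
let evaluated = """
    "\u{2028}", // LINE SEPARATOR
    """

// The backslash is escaped by hand: the literal contains the six
// characters \u{2028} exactly as they appear in the source text.
let preserved = """
    "\\u{2028}", // LINE SEPARATOR
    """

assert(evaluated.unicodeScalars.contains("\u{2028}"))
assert(!evaluated.unicodeScalars.contains("\\"))
assert(!preserved.unicodeScalars.contains("\u{2028}"))
assert(preserved.hasPrefix("\"\\u{2028}\""))

print(preserved)
```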

Single-line string literals may similarly be peppered with backslashes to preserve their original intent, as in the following examples.

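// Quoted Text
// Intended content (does not compile as written):
let quote = "Alice: "How long is forever?" White Rabbit: "Sometimes, just one second.""
// Escaped form that conventional literals require:
let quote = "Alice: \"How long is forever?\" White Rabbit: \"Sometimes, just one second.\""

// and

// Regular Expression
// Intended content (does not compile as written):
let ucCaseCheck = "enum\s+.+\{.*case\s+[:upper:]"
// Escaped form that conventional literals require:
let ucCaseCheck = "enum\\s+.+\\{.*case\\s+[:upper:]"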

Escaping blurs readability and interferes with inspection, especially in the latter example, where the content contains secondary escape sequences. Using a raw form ensures the expression can be read and updated as needed in the form that will be passed by the literal string.

Candidates

A good candidate for raw strings:

  • Is non-trivial.
  • Is obscured by escaping. Escaping actively harms code review and validation.
  • Is already escaped. Escaped material should not be pre-interpreted by the compiler.
  • Requires easy transport between source and code in both directions, whether for testing or just updating source.

The following example is a poor case for using a raw string:

let path = "C:\\AUTOEXEC.BAT"

The example is trivial and the escaping is not burdensome. It's unlikely that the string contents will require any further modification or re-use in a raw form.

Utility

Raw strings are most valuable for the following scenarios.

Metaprogramming: Use cases include code-producing-code. This incorporates utility programming and building test cases without escaping. Apps may generate color scheme type extensions (in Swift, ObjC, for SpriteKit/SceneKit, literals, etc.) or date formatters, perform language-specific escaping, create markup, and more.

Escaping complicates copying and pasting from working code into your source and back. When you're talking about code, and using code, having that code be formatted as an easily updated raw string is especially valuable.

Examples of popular apps that perform these tasks include Kite Compositor and PaintCode. Any utility app that outputs code would benefit in some form.

Regular expressions: While regex in general is a much larger problem than raw strings, it is a primary (if not the primary) use case for many Swift developers. Adding raw strings to Swift now helps support the development of regular expressions down the line. It is not unreasonable to imagine an ExpressibleByRawStringLiteral protocol playing a role in regex design.

Pedagogy: Not all Swift learning takes place in the playground and not all code described in Swift source files uses the Swift programming language.

Code snippets extend beyond playground-only solutions for many applications. Students may be presented with source code, which may be explained in-context within an application or used to populate text edit areas as a starting point for learning.

Removing escaped snippets to external files makes code review harder. Escaping (or re-escaping) code is a tedious process, which is hard to inspect and validate.

Data Formats and Domain Specific Languages: It's useful to incorporate short sections of unescaped or pre-escaped JSON and XML. It may be impractical to use external files and databases for each inclusion. Doing so reduces the ease of inspection, maintenance, and updates.

Windows paths: Windows uses backslashes to delineate descent through a directory tree: e.g., C:\Windows\All Users\Application Data. The more complex the path, the more intrusive the escapes.

Status

"Raw-mode" strings were first discussed during the SE-0168 Multi-Line String literals review and postponed for later consideration. This proposal focuses on raw strings to allow the entry of single and multi-line string literals.

The first iteration of SE-0200 proposed adopting Python's model, using r"...raw string...". The proposal was returned for revision with the following feedback:

The review of SE-0200: "Raw" mode string literals ran from March 16…26, 2018. The proposal is returned for revision, and should be further discussed as a pitch to coalesce further before coming up for review again.

During the review discussion, a few issues surfaced with the proposal, including:

The proposed r"..." syntax didn’t fit well with the rest of the language. The most-often-discussed replacement was #raw("..."), but the Core Team felt more discussion (as a pitch) is necessary.

The proposal itself leans heavily on regular expressions as a use case for raw string literals. Several reviewers remarked that the motivation wasn’t strong enough to justify the introduction of new syntax in the language, so a revised proposal will need additional motivating examples in other domains.

To move forward, the new raw string design must provide a suitable Swift-appropriate syntax that works within the language's culture and conventions.

Existing Art

The following links explore the existing art in other languages. We were inspired by the Rust raw string RFC discussion when researching these features.

| Syntax | Language(s) | Possible in Swift? | Swifty? |
| --- | --- | --- | --- |
| 'Hello, world!' | Bourne shell, Perl, PHP, Ruby, Windows PowerShell | Yes | Yes if Rust-style multiplicity allows incorporating ' into raw strings. May be too narrow a use-case to burn '. |
| q(Hello, world!) | Perl (alternate) | Maybe (depends on delimiter) | No |
| %q(Hello, world!) | Ruby (alternate) | No (% is a valid prefix operator) | No |
| @"Hello, world!" | C#, F# | Yes (but would be awful for Obj-C switchers) | No |
| R"(Hello, world!)" | C++11 | Yes | No |
| r"Hello, world!" | D, Python | Yes | No |
| r#"Hello, world!"# | Rust | Yes | Would need to drop the opening r and maybe change the delimiter from #. |
| raw"Hello, world!" | Scala | Yes | No |
| `Hello, world!` | D, Go, `...` | No (conflicts with escaped identifiers) | No, needs Rust multiplicity |
| ``...`` | Java, any number of ` | No (conflicts with escaped identifiers) | Yes |

Design

We determined that a Rust-like approach, using an as-needed repeated delimiter, offers the greatest flexibility with the smallest typical footprint.

In Rust, you may add as many pound signs as needed before and after the raw string to disambiguate the end of a raw string.

Rust developers assured us that even one pound sign was unusual and more than one almost never needed.

Leading Backslash

The r"..." syntax failed to fit with Swift's design aesthetics. Instead, we chose to use a leading backslash, Swift's existing "escape" symbol. Under this design, a raw string looks like this:

\"This is a raw string"

\"""
    This is also a 
    raw string
    """

Both forms resemble existing string literals and the leading backslash suggests escaping.

Ignoring Escape Sequences

Raw strings allow you to eliminate escape sequences to present text as intended for use:

\"c:\windows\system32" // vs. "c:\\windows\\system32"
\"\d{3} \d{3} \d{4}" // vs "\\d{3} \\d{3} \\d{4}"

The following example terminates with backslash-r-backslash-n:

\"a raw string containing \r\n" 
// vs "a raw string containing \\r\\n"

The same raw behavior is extended to multi-line strings:

\"""
    a raw string containing \r\n
    """
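For comparison, this is roughly how a digit-group pattern of this kind has to be written and used in current Swift. It is a sketch only: it assumes Foundation's NSRegularExpression is the regex engine, and the sample text and constant names are made up for illustration.

```swift
import Foundation

// Escaped form required today; the pattern the regex engine actually
// receives is \d{3} \d{3} \d{4} (17 characters).
let pattern = "\\d{3} \\d{3} \\d{4}"
assert(pattern.count == 17)

let regex = try! NSRegularExpression(pattern: pattern)

// Hypothetical input text for the demonstration.
let text = "Call 555 123 4567 today"
let searchRange = NSRange(text.startIndex..., in: text)
let match = regex.firstMatch(in: text, options: [], range: searchRange)

// Convert the NSRange of the match back to a Swift substring.
let matched = match
    .flatMap { Range($0.range, in: text) }
    .map { String(text[$0]) }
assert(matched == "555 123 4567")
```

The source text and the pattern handed to the engine differ by three backslashes, which is the gap a raw literal would close.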

Incorporating Escape Sequences

Raw strings allow you to incorporate already-escaped text. For example, you can paste static data without having to worry about re-escaping a JSON message:

\"""
	[
		{
			"id": "12345",
			"title": "A title that \"contains\" \\\""
		}
	]
	"""

Without raw strings, this would be silently un-escaped to yield an invalid JSON message. Even if you did remember to escape, this process would be error-prone and difficult to maintain.
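The failure mode can be reproduced in current Swift. The sketch below assumes Foundation's JSONSerialization as the parser and uses a compact, single-line version of the message, with its title key correctly quoted. The helper parses(_:) and the constant names are hypothetical.

```swift
import Foundation

// Pre-escaped JSON pasted into a conventional literal: the compiler
// consumes the escapes, leaving bare quotes inside the title value.
let pastedAsIs = """
    [{"id": "12345", "title": "A title that \"contains\" \\\""}]
    """

// The same message with every backslash doubled by hand so that the
// JSON escapes survive compilation.
let handEscaped = """
    [{"id": "12345", "title": "A title that \\"contains\\" \\\\\\""}]
    """

// Hypothetical helper: true when the text parses as JSON.
func parses(_ text: String) -> Bool {
    (try? JSONSerialization.jsonObject(with: Data(text.utf8))) != nil
}

assert(!parses(pastedAsIs))   // silently corrupted by un-escaping
assert(parses(handEscaped))   // valid, but hard to read and maintain
```

Nothing in the first literal is flagged at compile time; the problem only surfaces when the message is parsed.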

Custom Delimiters

A raw string is normally terminated by " or """ for single and multi-line strings. These default string delimiters can no longer be escaped for inclusion.

We follow Rust's example to override this behavior and permit embedded quotes by creating custom delimiters. Just add an arbitrary number of pound signs before the first quote. Match these after the final quote:

\#"a string with "double quotes" in it"#

\##"a string that needs "# in it"##

\###"""
	a string with 
	"""
	in it
	"""###

These custom delimiters enable you to embed " and """ within the string, ensuring the raw string can represent all strings including embedded ones.

There is also space in Swift to allow the custom delimiter syntax to be used with conventional strings:

#"Hello "World""#
#"""
	print("""
		Hello \(what)
		""")
	"""#

This example is a conventional string in all ways other than the opening and closing delimiters. The interpolation sequence in this example will be evaluated. The leading backslash's absence signifies this is a non-raw string.

  • Custom delimiters ensure you can use elements that normally terminate strings within the string literal without escaping them.
  • This syntax uses one or more pound sign delimiters to adapt either raw or conventional strings.
  • A leading \ means a raw string literal is being defined.
  • # means custom delimiters are in use.
  • The number of leading pound signs matches the number of trailing pound signs.
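The matching rule summarized above implies a simple way to choose how many pound signs a single-line literal needs: one more than the longest run of pound signs that directly follows any embedded double quote. The function below is a hypothetical helper written to illustrate that reasoning. It is not part of the proposal or of any toolchain.

```swift
/// Hypothetical helper: the smallest number of pound signs that lets a
/// single-line literal hold `content` without an embedded quote being
/// mistaken for the closing delimiter.
func poundsNeeded(for content: String) -> Int {
    let chars = Array(content)
    var needed = 0
    for (index, char) in chars.enumerated() where char == "\"" {
        // Count the pound signs immediately following this quote.
        var run = 0
        var next = index + 1
        while next < chars.count && chars[next] == "#" {
            run += 1
            next += 1
        }
        // The delimiter must be longer than any such run.
        needed = max(needed, run + 1)
    }
    return needed
}

assert(poundsNeeded(for: "plain text") == 0)
assert(poundsNeeded(for: "a string with \"double quotes\" in it") == 1)
assert(poundsNeeded(for: "a string that needs \"# in it") == 2)
```

The last two results agree with the single-line examples in this section, which use one and two pound signs respectively.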

Discoverability and Recognition

There are two questions of developer approach: discoverability ("how do I do a raw string in Swift") and recognition ("Why do some strings in Swift start with \ or #?"). Both are relatively easy to search for.

When presented to developers unfamiliar with the raw string syntax, we felt that \ used an existing semantic cue to indicate "escaping". We do not believe it is overly burdensome to search the web for:

  • "Why is there a backslash before the quote in Swift strings?"
  • "What do #/pound/number/etc signs mean in Swift strings?"
  • "How do I use raw strings in Swift?"

Implementation

Changes are largely confined to the file lib/Parse/Lexer.cpp and involve a slight modification to the main lexer loop Lexer::lexImpl() to adapt the processing of tokens starting with # and \ to look for the presence of what could be a custom delimited string. If one is detected, Lexer::lexStringLiteral() is called with modified arguments. Targeted changes to Lexer::lexCharacter() and Lexer::getEncodedStringSegment() bypass processing of the escape character \ when selected.

A new RawString flag in Token.h carries the string escaping mode from the parsing to code generation phases of compiling.
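As a rough illustration of the detection step, the toy scanner below recognizes a single-line literal with an optional leading backslash and optional pound signs. It is a sketch in Swift and not the compiler's C++ implementation. ScannedLiteral and scanLiteral(_:) are hypothetical names, and multi-line literals and interpolation are ignored.

```swift
// Toy model of the proposed delimiter detection; not the real lexer.
struct ScannedLiteral: Equatable {
    var isRaw: Bool      // a leading backslash was present
    var pounds: Int      // number of pound signs in the delimiter
    var body: String     // text between the delimiters, unprocessed
}

func scanLiteral(_ source: String) -> ScannedLiteral? {
    let chars = Array(source)
    var i = 0

    // A leading backslash selects raw mode.
    let isRaw = i < chars.count && chars[i] == "\\"
    if isRaw { i += 1 }

    // Count the opening pound signs, then require the opening quote.
    var pounds = 0
    while i < chars.count && chars[i] == "#" { pounds += 1; i += 1 }
    guard i < chars.count, chars[i] == "\"" else { return nil }
    i += 1
    let bodyStart = i

    while i < chars.count {
        // Conventional strings still honor escapes: skip the next character.
        if !isRaw && chars[i] == "\\" { i += 2; continue }
        if chars[i] == "\"" {
            // The literal ends at a quote followed by `pounds` pound signs.
            let end = i + 1 + pounds
            if end <= chars.count && chars[(i + 1)..<end].allSatisfy({ $0 == "#" }) {
                return ScannedLiteral(isRaw: isRaw, pounds: pounds,
                                      body: String(chars[bodyStart..<i]))
            }
        }
        i += 1
    }
    return nil  // unterminated literal
}

// The sources below are written with conventional escaping.
assert(scanLiteral("\\\"c:\\windows\"")
    == ScannedLiteral(isRaw: true, pounds: 0, body: "c:\\windows"))
assert(scanLiteral("\\##\"needs \"# in it\"##")
    == ScannedLiteral(isRaw: true, pounds: 2, body: "needs \"# in it"))
assert(scanLiteral("#\"Hello \"World\"\"#")
    == ScannedLiteral(isRaw: false, pounds: 1, body: "Hello \"World\""))
```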

Source compatibility

This is a purely additive change. The syntax proposed is not currently valid Swift.

Effect on ABI stability

None.

Effect on API resilience

None.

Alternatives considered

We excluded several designs from this proposal.

Excluding single quotes and backticks

Although ' may not be used for single character literals, there are some ongoing and important explorations into their use. Burning them on raw strings, a fairly niche use, is inadvisable.

Similarly, while backticks preserve the meaning of "code voice" and "literal", as you are used to in markdown, they would conflict with escaped identifiers.

We decided to stick with double quotes as currently used in single-line and multi-line Swift strings.

Using "raw" and "rawString"

The original design r"..." was rejected in part for not being Swifty, that is, not taking on the look and feel and characteristics of existing parts of the language. Similar approaches like raw"..." and #raw"..." carry the same issues. Leading text is distracting and competes for attention with the content of the string that follows.

In our samples, we concluded that both the leading backslash and any pound signs did not overwhelm string content.

Using user-specified delimiters

We felt user-specified delimiters overly complicated the design space, were harder to discover and use, and were generally un-Swifty. The pound sign is rarely used and a minor burden on the syntax.

@rawString(delimiter: @) \@"Hello"@ // no

We also rejected a standalone raw string attribute for being wordy and heavy, especially for short literals.
