@karwa
Last active February 16, 2023 19:55

WebURL Key-Value Pairs

In order to make it easier to read/write key-value pairs from URL components, WebURL 0.5.0 will include a new KeyValuePairs type. The current formParams view will be deprecated and removed in the next minor release.

Overview

URLs are universal identifiers, meaning they are able to subsume any other kind of identifier and express its information using a common set of components. Some of these components have a defined structure - for example, the URL's path can be used to express hierarchical information, and that idea of a hierarchy allows the standard to define operations which all URLs can support, such as partial paths (../foo) which are resolved relative to a base location.

Other components, such as the URL's query and fragment, are opaque strings with no standardized internal structure or operations. This opacity is what makes URLs so flexible, because applications can freely decide what they need to put there - it is perfectly valid to place a JSON or XML document in the query, or binary data in the fragment, etc.

One structure chosen by many applications is a list of key-value pairs. For instance, a search website might encode the user's search query and other filters and options in the URL's query component. These pairs are sometimes referred to as "query parameters":

                             key   value
                              │ ┌────┴────┐
https://www.google.com/search?q=hello+world&start=2&num=5&safe=on&as_rights=cc_publicdomain
                              └─────┬─────┘ └──┬──┘ └─┬─┘ └──┬──┘ └────────────┬──────────┘
                                  pair       pair   pair   pair              pair

This convention started with HTML, as a simple way for browsers to encode data in a form and submit it to a web-server. Due to the ubiquity of HTML and the fact that key-value pairs are a very versatile way of encoding information, this convention has since spread beyond just submitting forms, and is used in other URL components and outside of the browser.

It is important to note that this is a loose convention, and many aspects of what query parameters mean or how they are used remain application-defined. For example:

  • HTML forms do not have an encoding for arrays, so applications have invented their own formats. The following are all quite popular:

    foo=1,2,3                   // Single pair, value is a comma-separated list
    foo=1&foo=2&foo=3           // Multiple pairs with the same key
    foo[0]=1&foo[1]=2&foo[2]=3  // Indexes in square brackets
    foo[]=1&foo[]=2&foo[]=3     // Empty square brackets
    
  • HTML forms do not have an encoding for nested key-value pairs, which are required to encode objects. Again, applications have invented their own ways of encoding this, including:

    person[name]=John&person[age]=24               // Properties in square brackets
    person.name=John&person.age=24                 // Dot syntax
    person={%22name%22:%22John%22%2C%22age%22:24}  // JSON values below top level
    
  • The nature of the list of pairs is not always clear.

    In HTML, form content is serialized as a list of key-value pairs in the same order as their <input> tags appear in the document. There is not necessarily any logical relation between input elements with the same key name; it is simply a list of pairs, like an Array<(String, String)>.

    <!-- Navigates to http://localhost:8080/test?a=foo&a=bar&a=baz -->
    <form method="get" action="http://localhost:8080/test">
      <input name="a" value="foo"/>
      <input name="a" value="bar"/>
      <input name="a" value="baz"/>
      <input type="submit" />
    </form>

    Many applications also use this list-based interpretation. For instance, in the example below, chart pairs are used as a kind of separator, and the color pairs are interpreted in the context of the previously-specified chart. There is no relationship between the two pairs named color - instead, the position of pairs within the list is more important than whether they share the same key.

    chart=APPL&color=blue&chart=GOOGL&color=green
    └─────────┬─────────┘ └──────────┬──────────┘
           context                context
    

    Other applications may treat the list as a kind of multimap, with 1 key being associated with N values. This is the opposite interpretation - what matters is the relative order of pairs with the same key, and pairs with different keys can be reordered without changing meaning.

    chart=APPL&color=blue&chart=GOOGL&color=green
    chart=APPL&chart=GOOGL&color=blue&color=green
               ^^^^^^^^^^^^^^^^^^^^^^
    
    In a multimap interpretation, these strings have the same contents:
    - chart: [APPL, GOOGL]
    - color: [blue, green]
    

    Yet other applications expect only 1 value to be associated with a key, and interpret the list as if it were a kind of Dictionary. If a key happens to be present multiple times, they only care about the first entry, and all others are ignored. When associating a value with a key, they remove all other pairs with the same key.

    So essentially there are 3 common usage patterns for query strings: a list (Array), map (Dictionary), and multimap.
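To make the three interpretations concrete, here is a minimal sketch in plain Swift (no WebURL involved). The `parsePairs` helper is hypothetical and skips percent-decoding; it only shows how one string can be read as a list, a map, or a multimap.

```swift
// Hypothetical helper: splits a raw key-value string into (key, value) pairs.
// Percent-decoding is deliberately omitted to keep the sketch short.
func parsePairs(_ string: String) -> [(key: String, value: String)] {
  string.split(separator: "&").map { pair -> (key: String, value: String) in
    let parts = pair.split(separator: "=", maxSplits: 1, omittingEmptySubsequences: false)
    let key = String(parts[0])
    let value = parts.count > 1 ? String(parts[1]) : ""
    return (key: key, value: value)
  }
}

let pairs = parsePairs("chart=APPL&color=blue&chart=GOOGL&color=green")

// 1. List: every pair is kept, in its original position.
let asList = pairs.map { "\($0.key)=\($0.value)" }

// 2. Map: one value per key; here the first occurrence wins.
let asMap = Dictionary(pairs.map { ($0.key, $0.value) }, uniquingKeysWith: { first, _ in first })

// 3. Multimap: each key is associated with all of its values, in order.
let asMultimap = Dictionary(grouping: pairs, by: { $0.key })
  .mapValues { group in group.map { $0.value } }

print(asList)                      // ["chart=APPL", "color=blue", "chart=GOOGL", "color=green"]
print(asMap["color"] ?? "")        // blue
print(asMultimap["color"] ?? [])   // ["blue", "green"]
```

The list keeps positional context, the map discards the second color entirely, and the multimap keeps both colors but loses which chart each belonged to.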

Given all of this variation, I'd like WebURL to offer a view which interprets an opaque URL component as an abstract list of key-value pairs. It will allow creating, reading, and modifying pairs in any opaque URL component, offer a variety of encoding options, and will support manipulating the component as either a list, map, or multimap.

It should be an excellent API for working with query parameters - but should not be limited to the query, or to HTML form-encoding, which is why we call it the KeyValuePairs API.

Issues with the current API

WebURL's current query parameters API is a port of JavaScript's URLSearchParams class. This was done because it was well-specified (the JavaScript URL API is described in the URL Standard itself) and quick to implement, although it includes a number of limitations and unfortunate design decisions which JavaScript developers often struggle with. Some of the standard's editors would like to see a new JS API one day.

Form-Encoding

The current API only supports form-encoding (from HTML forms; also known as application/x-www-form-urlencoded). This consists of two things:

  • The overall concept of a list of key-value pairs, separated by delimiter characters (& and =).
  • The way in which keys and values are encoded within a string.

The former is fine; the latter has serious issues which make it unsuitable in many contexts outside of HTML forms.

To understand why, let's talk about escaping. In Swift code, we can write string literals by enclosing text in double-quotes. If the text itself contains a double-quote, we have to escape it - either as \" or \u{0022}:

"I said "hello!""                // Error - we have to escape the internal " characters
"I said \"hello!\""              // OK
"I said \u{0022}hello!\u{0022}"  // Also fine.

Similarly, URLs have characters with special meaning (like / and ?), and if we want to use one of those characters in the content of a URL component, we have to escape it. Moreover, applications are able to define their own subcomponents, with internal delimiter characters of their choice (like the & and = chosen by HTML forms), which requires them to escape other uses of those delimiters (e.g. if our key name contains an & or =, we have to escape it). Some characters, such as spaces, newlines, and other control characters, must always be escaped.

The way in which we escape characters in URLs is called percent-encoding - we write %XX, where XX is a hexadecimal byte value. So for example, a forward slash (/) has byte value 47 (0x2F), and we escape it as %2F. Spaces have byte value 32 (0x20), so we escape them as %20 - the string "hello world" would be escaped as "hello%20world".

Except in HTML forms. Form-encoding allows + to be used as a shorthand for %20, meaning the above may also be written "hello+world". And since it gives "+" custom meaning, actual plus signs in the content need to be escaped with percent-encoding.
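As a rough illustration of the two escaping styles, the sketch below percent-encodes a string byte by byte. The `escape` function and its `spaceAsPlus` parameter are hypothetical names, and leaving only ASCII alphanumerics unescaped is a conservative assumption made for this example, not a rule taken from any standard.

```swift
/// Hypothetical helper: percent-encodes `text`, leaving only ASCII alphanumerics unescaped.
/// If `spaceAsPlus` is true, spaces are written with the "+" shorthand instead of "%20".
func escape(_ text: String, spaceAsPlus: Bool) -> String {
  let hexDigits = Array("0123456789ABCDEF")
  var result = ""
  for byte in text.utf8 {
    let isAlphanumeric =
      (byte >= 0x30 && byte <= 0x39) || (byte >= 0x41 && byte <= 0x5A) || (byte >= 0x61 && byte <= 0x7A)
    if isAlphanumeric {
      result.append(Character(UnicodeScalar(byte)))
    } else if byte == 0x20 && spaceAsPlus {
      result.append("+")
    } else {
      // Write "%XX", where XX is the byte value in hexadecimal.
      result.append("%")
      result.append(hexDigits[Int(byte >> 4)])
      result.append(hexDigits[Int(byte & 0x0F)])
    }
  }
  return result
}

print(escape("hello world", spaceAsPlus: false))  // hello%20world
print(escape("hello world", spaceAsPlus: true))   // hello+world
print(escape("1+1", spaceAsPlus: true))           // 1%2B1
print(escape("/", spaceAsPlus: false))            // %2F
```

Note that once "+" stands for a space, a literal plus sign has to be written as %2B, as the third line shows.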

🕰 Historical note

This shorthand was described in an early, informational RFC (page 7) as a way of making it easier to hand-write URLs. It only ever applied to the URL's query component (never the path, fragment, or other components), and was removed from the standards-track RFC submitted 6 months later. It managed to stay around as an application-level extension in HTML and became part of HTML 2.0, the first version with a formal specification.

This shorthand is a problem. Since it is an application-level option, everybody who processes a URL needs to know in advance that it is in effect, and that the "+" they see in "hello+world" is actually an escaped space character, rather than a literal plus sign. Some libraries never use the shorthand (e.g. Foundation), while others always use it (e.g. JavaScript's URLSearchParams); neither gives users a choice. This can cause interoperability issues, as the next section demonstrates.

Foundation.URL and JavaScript

As mentioned above, Foundation.URL does not encode spaces with HTML forms' "+" shorthand, instead escaping them with regular percent-encoding (which is perfectly fine; remember, the shorthand is an HTML feature, not a URL feature). JavaScript's URLSearchParams class decodes these percent-encoded spaces correctly, but it also decodes "+" characters as spaces. Foundation, not anticipating that it is preparing content for a system which assumes form encoding, does not escape "+" characters.

The result is that URLs we create with Foundation are misinterpreted by JS:

// Swift
import Foundation

var urlComponents = URLComponents(string: "http://example/")!
urlComponents.queryItems = [
  URLQueryItem(name: "formula", value: "1 + 1 = 2")
]
print(urlComponents.url!)
// "http://example/?formula=1%20+%201%20%3D%202"
//                              ^

// JavaScript

var url = new URL("http://example/?formula=1%20+%201%20%3D%202");
for (const [key, value] of url.searchParams) {
  console.log("key: " + key);
  console.log("value: " + value);
}

// key: formula
// value: 1   1 = 2
// ❗️      ^^^

Likewise, since JS uses the shorthand when it writes content (with no option to disable it), and Foundation does not know to expect it, URLs we create in JS are misinterpreted by Foundation:

// JavaScript

var url = new URL("http://example/");
url.searchParams.append("message", "hello world");
console.log(url.href);
// "http://example/?message=hello+world"
//                               ^

// Swift
import Foundation

var urlComponents = URLComponents(string: "http://example/?message=hello+world")!
for pair in urlComponents.queryItems! {
  print("key:", pair.name)
  print("value:", pair.value!)
}

// key: message
// value: hello+world
// ❗️          ^

Non-HTML Applications

So what's the answer to this? How can we ensure these systems interoperate?

Foundation could produce more robust URLs by escaping plus signs in certain contexts - even though it is not technically required by any URL standard, in practice it can help interoperability (Spoiler alert: WebURL's new API will escape them).

However, that still wouldn't help it correctly interpret URLs prepared using form-encoding. Perhaps it should just always interpret plus signs as spaces, as JavaScript does? Well, it turns out that many applications use key-value pairs, but explicitly opt out of the "+" shorthand. For example, mailto: URLs:

Software creating 'mailto' URIs likewise has to be careful to encode any reserved characters that are used. HTML forms are one kind of software that creates 'mailto' URIs. Current implementations encode a space as '+', but this creates problems because such a '+' standing for a space cannot be distinguished from a real '+' in a 'mailto' URI. When producing 'mailto' URIs, all spaces SHOULD be encoded as %20, and '+' characters MAY be encoded as %2B. Please note that '+' characters are frequently used as part of an email address to indicate a subaddress, as for example in bill+ietf@example.org.

RFC 6068

Indeed, popular mail applications (such as Apple's Mail.app) interpret "+" characters in mailto: URLs as being literal plus signs. When pasting the following address into a browser's URL bar, the CC field is populated by someone+swift@example.com, not someone swift@example.com, and the subject is "1+1".

mailto:user@example.com?cc=someone+swift@example.com&subject=1+1

The result is that, when using JavaScript's URLSearchParams to process a mailto: URL, it gives us incorrect values:

// key: cc
// value: someone swift@example.com
// ❗️            ^

// key: subject
// value: 1 1
// ❗️      ^

But Foundation's results match the RFC:

// key: cc
// value: someone+swift@example.com
// ✅            ^

// key: subject
// value: 1+1
// ✅      ^

What I'm trying to illustrate here is that neither of these APIs is wrong. Even though they both seem to support "query parameters", they are actually using different, incompatible encodings (both justifiable, but used in different contexts), and neither offers any options in how they encode or decode key-value pairs.
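The difference between the two interpretations comes down to a single decoding choice. The following sketch is a hand-written decoder, not the behavior of Foundation or URLSearchParams themselves; `unescape`, `hexValue` and the `plusAsSpace` flag are hypothetical names used only for illustration.

```swift
/// Hypothetical helper: returns the numeric value of an ASCII hex digit, or nil.
func hexValue(_ byte: UInt8) -> UInt8? {
  switch byte {
  case 0x30...0x39: return byte - 0x30        // "0"..."9"
  case 0x41...0x46: return byte - 0x41 + 10   // "A"..."F"
  case 0x61...0x66: return byte - 0x61 + 10   // "a"..."f"
  default: return nil
  }
}

/// Hypothetical helper: decodes percent-encoded text.
/// If `plusAsSpace` is true, an unescaped "+" is treated as a space (form-encoding).
func unescape(_ text: String, plusAsSpace: Bool) -> String {
  let input = Array(text.utf8)
  var output: [UInt8] = []
  var i = 0
  while i < input.count {
    let byte = input[i]
    if byte == 0x25, i + 2 < input.count,
       let high = hexValue(input[i + 1]), let low = hexValue(input[i + 2]) {
      output.append(high << 4 | low)  // "%XX" becomes a single byte
      i += 3
    } else if byte == 0x2B && plusAsSpace {
      output.append(0x20)             // "+" becomes a space
      i += 1
    } else {
      output.append(byte)
      i += 1
    }
  }
  return String(decoding: output, as: UTF8.self)
}

print(unescape("someone+swift@example.com", plusAsSpace: false))  // someone+swift@example.com
print(unescape("someone+swift@example.com", plusAsSpace: true))   // someone swift@example.com
print(unescape("1%20+%201%20%3D%202", plusAsSpace: false))        // 1 + 1 = 2
```

The same input yields different values depending on the flag, which is why the choice needs to be made by whoever knows how the string was produced.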

Restrictive Character Set

Form-encoding has an incredibly restrictive encode-set, which escapes all characters except ASCII alphanumerics and *-._. As a result, popular JavaScript libraries such as qs (used by Express.js and others, >295M downloads/month) include options to only escape the value portions of key-value pairs, and normalize-url (>120M downloads/month) tries to selectively unescape bits of the query after using URLSearchParams.

The URL standard does not require such restrictive escaping, and it is not inherently required by the key-value pairs structure. We'd like to allow applications to opt for a less-restrictive encode-set.

🔖 In Summary

Whilst the broad concept of a list of key-value pairs is fine, the specific way application/x-www-form-urlencoded encodes keys and values is a problem. It includes shorthands which can lead to interoperability issues, has a needlessly restrictive character set, and is incompatible with other applications which use key-value strings in URLs. WebURL should provide an API for key-value pairs, but it should not be limited to form-encoding.

Limited to the Query

WebURL's current API is limited to reading and writing key-value pairs in the URL's query component, but it is perfectly reasonable to encode key-value pairs in other URL components, as well.

For instance, Media fragments make use of key-value pairs:

http://www.example.com/example.ogv#track=french&t=10,20
                                   └─┬─┘ └─┬──┘ │ └─┬─┘
                                    key  value key value

As do PDF fragment identifiers:

http://example.com/example.pdf#page=105&zoom=200
                               └┬─┘ └┬┘ └┬─┘ └┬┘
                               key value key value

The OAuth 2.0 implicit grant flow also used key-value pairs in the fragment:

http://example.com/cb#access_token=2YotnFZFEjr1zCsicMWpAA&state=xyz&token_type=example&expires_in=3600
                      └────┬─────┘ └─────────┬──────────┘ └─┬─┘ └┬┘
                          key              value           key   value  [...]

Note: The implicit flow is no longer recommended, because returning access tokens in URLs is inherently insecure (e.g. they might be logged, stored in history, etc). Nonetheless, it is perfectly valid to encode less sensitive data in the fragment; the implicit flow would not be more secure if the key-value pairs were in the query rather than the fragment.

Related WHATWG/URL Issue: Consider adding support for fragment parameters

Beyond the fragment, there is also evidence of applications encoding key-value strings in opaque paths. We haven't mentioned these so far, but they are a kind of URL where the path component is non-hierarchical and can also be used to store an opaque string. See WebURL.hasOpaquePath for more information.

FoxyProxy is an example:

proxy:host=foo.com&port=999&action=add
      └┬─┘ └──┬──┘ └┬─┘ └┬┘
      key   value  key value  [...]

Spotify, as well - although it uses an unusual schema where all the delimiters are the same:

spotify:user:<username>:playlist:<id> 
spotify:search:<text>

In addition to opaque paths, the URL standard allows for opaque hostnames. MongoDB connection strings are an example of a URL-like format where the hostname has internal structure, in that case consisting of a comma-separated list of hosts. These too could be viewed as a list of pairs, perhaps with the host as the "key" and port number as the "value".

mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017/?replicaSet=myRepl
          └────────────┬───────────┘ └────────────┬───────────┘ └────────────┬───────────┘
                    element                    element                    element

Unfortunately, MongoDB connection strings are not valid URLs, because : is reserved by the URL syntax itself within the hostname, so it is not available as a subcomponent delimiter there. However, if another delimiter were chosen, it would be a valid URL, and we could interpret the hostname as a list of key-value pairs:

// Uses '$' between the hostname and port. Now it's a valid URL.

otherdb://otherdb0.example.com$27017,otherdb1.example.com$27017,otherdb2.example.com$27017/?replicaSet=myRepl
                              ^                          ^                          ^

🔖 In Summary

A list of key-value pairs is a simple, versatile way to encode information, which makes it a useful structure for custom URL schemes in basically every component. We've seen how they can be used to add client-side options in media and PDF fragments, structured deep-links in FoxyProxy and Spotify, and they even seem like a useful structure in opaque hostnames.

With WebURL, I want to empower Swift applications to easily produce or consume any kind of URL, including inventing their own schemes, as these applications do. WebURL should provide an expressive, efficient, and well-tested implementation for manipulating key-value strings - it should not be limited to form-encoding, and it should not be limited to the URL's query component.

Lack of Positional APIs

JavaScript's URLSearchParams class (and WebURL's current API, which is based on it) focuses on the map and multimap usage patterns, and provides insufficient APIs for working with key-value strings at their fundamental level - as a list of pairs. The following table contains a brief summary of the URLSearchParams API.

API                  Description
-------------------  ------------------------------------------------------------
get(key)             Returns the first value associated with a key
set(key, value)      Replaces the value in the first occurrence of the key, and erases all other occurrences
has(key)             Indicates whether a key is present at all
delete(key)          Erases all occurrences of a key
getAll(key)          Returns all values associated with a key, in order
append(key, value)   Inserts a key-value pair at the end of the list
sort()               Sorts the list by key name
init(S)              Constructs a URLSearchParams from a sequence of tuples
iteration            Iterates through all key-value pairs in the list, in order

Considering our chart example from earlier:

chart=APPL&color=blue&chart=GOOGL&color=green

With this API, we cannot identify a particular key-value pair (e.g. the color pair for chart 'APPL'), or modify or remove only that pair. There is also no ability to insert new pairs at a particular location in the list (e.g. before chart=GOOGL, so still within the context of the preceding chart). APIs such as sort() cannot be limited to a particular range of the list, so there is no ability to sort a particular chart's context. In short, this API lacks indexes, and index-based functionality such as slices and positional modifications.
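To show what index-based operations make possible, here is a small sketch which models the list as a plain Swift Array of tuples. It does not use WebURL, and the "style" pair is a made-up example; the point is only that indexes let us target one specific pair or position.

```swift
// The chart example, modelled as a plain array of (key, value) tuples.
var pairs: [(key: String, value: String)] = [
  (key: "chart", value: "APPL"), (key: "color", value: "blue"),
  (key: "chart", value: "GOOGL"), (key: "color", value: "green"),
]

// Find where the second chart's context begins...
if let secondChart = pairs.firstIndex(where: { $0.key == "chart" && $0.value == "GOOGL" }) {
  // ...and insert a (hypothetical) pair before it, so it stays within the 'APPL' context.
  pairs.insert((key: "style", value: "dashed"), at: secondChart)
}

// Modify only the first "color" pair; the second one is left untouched.
if let firstColor = pairs.firstIndex(where: { $0.key == "color" }) {
  pairs[firstColor].value = "red"
}

let serialized = pairs.map { "\($0.key)=\($0.value)" }.joined(separator: "&")
print(serialized)
// chart=APPL&color=red&style=dashed&chart=GOOGL&color=green
```

Neither operation can be expressed with get, set, delete, or append alone, since those address pairs by key rather than by position.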

Indexes are also useful for multimap usage patterns. For instance, it is not uncommon to specify multiple sort criteria using multiple pairs, all with the key sort:

http://example/?q=test&sort=fieldA&sort=fieldB&anotherParam
                       ^^^^        ^^^^

Now imagine we want to insert a new field to sort by, before the existing fields. We would need to getAll existing values, delete them from the string, then append the new fields one-by-one. In JavaScript:

// JavaScript
var url = new URL("http://example/?q=test&sort=fieldA&sort=fieldB&anotherParam");

var existing = url.searchParams.getAll("sort");
url.searchParams.delete("sort");
url.searchParams.append("sort", "newField");
for (const val of existing) {
  url.searchParams.append("sort", val);
}

console.log(url.href);
// "http://example/?q=test&anotherParam=&sort=newField&sort=fieldA&sort=fieldB"
//                                       ^^^^^^^^^^^^^

This requires a lot of code and moves sort around in the list. It is also inefficient, requiring us to decode and re-encode every one of those sort fields, even the ones we do not change. With indexes, it could be much better:

// Swift

var url = WebURL("http://example/?q=test&sort=fieldA&sort=fieldB&anotherParam")!

let sortIdx = url.queryParams.firstIndex(where: { $0.key == "sort" }) ?? url.queryParams.endIndex
url.queryParams.insert(key: "sort", value: "newField", at: sortIdx)

print(url)
// "http://example/?q=test&sort=newField&sort=fieldA&sort=fieldB&anotherParam"
//                         ^^^^^^^^^^^^^

JavaScript developers also struggle with this lack of expressivity (example: URLSearchParams delete all vs delete one). Using Swift's Collection model, we can do better, and expose a rich set of operations which allow developers to express operations like the above more naturally, more precisely, and which can be implemented with greater efficiency.

Re-encoding

Conceptually, JavaScript's URLSearchParams class (and WebURL's current API, which is based on it) works by splitting the entire key-value string into a list of (key, value) pairs, each of which is decoded using form-encoding. Modifications are then performed on that internal list, and when serializing, each key and value in that list is encoded to form the overall key-value string.

While it may seem like a simple mental model, it has a number of issues. Firstly, because applications can define their own subcomponent delimiters, percent-decoding is a destructive process. In other words, we can't say that a literal $ and %24 are necessarily equivalent - perhaps the escaping is intentional, and by removing it, or escaping previously-unescaped characters, we actually corrupt the information. Previous URL standards described it thus:

Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data.

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters.

So this model essentially disallows applications from introducing subcomponent delimiters. That may be fine for HTML forms, but as we've seen, applications in the wild like to use square brackets for arrays and nested key-value pairs (foo[bar]=baz, etc), as well as dot-notation (foo.bar=baz) - so they are defining subcomponents.
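The ordering problem described in the quoted text can be seen with a short experiment. This sketch uses Foundation's `removingPercentEncoding` and a made-up input string in which one value contains an escaped "&" and "=".

```swift
import Foundation

// One value contains an escaped "&" and an escaped "=".
let raw = "note=a%26b%3Dc&flag=1"

// Split on the delimiters first, then decode each piece: the data survives.
let splitThenDecode = raw.split(separator: "&").map { pair -> [String] in
  pair.split(separator: "=", maxSplits: 1).map { String($0).removingPercentEncoding ?? String($0) }
}
print(splitThenDecode)
// [["note", "a&b=c"], ["flag", "1"]]

// Decode first, then split: the escaped data is mistaken for delimiters.
let decoded = raw.removingPercentEncoding ?? raw
let decodeThenSplit = decoded.split(separator: "&").map { pair -> [String] in
  pair.split(separator: "=", maxSplits: 1).map { String($0) }
}
print(decodeThenSplit)
// [["note", "a"], ["b", "c"], ["flag", "1"]]
```

The same reasoning applies to application-defined delimiters such as square brackets or dots: once they have been decoded, an escaped bracket in the data can no longer be told apart from one that was meant as structure.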

Moreover, this model decodes and re-encodes the entire list for any modification. In the following example, we append a query parameter, and it doesn't just append content to the string - it also re-encodes the existing 'subject' parameter using HTML forms quirks.

// JavaScript
// From https://github.com/whatwg/url/issues/573

let a = new URL('mailto:test@example.com?subject=Important%20business');
//                                                        ^^^

a.searchParams.append('body', 'hello');
console.log(a.href);
// "mailto:test@example.com?subject=Important+business&body=hello"
// ❗️                                        ^

In practice, this issue comes up frequently in JavaScript.

WebURL should replace this model with one that leaves the contents of pairs intact unless they are explicitly being written to.
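As a rough sketch of that idea, appending a pair only needs to write the new content; everything already in the string can be left byte-for-byte as it was. The `appendPair` function is hypothetical, operates on a plain String rather than a WebURL, and assumes the key and value have already been escaped.

```swift
/// Hypothetical helper: appends a pair to a raw key-value string.
/// Existing content is never decoded or re-encoded.
/// `key` and `value` are assumed to be already escaped for this sketch.
func appendPair(key: String, value: String, to rawString: inout String) {
  if !rawString.isEmpty {
    rawString.append("&")
  }
  rawString.append("\(key)=\(value)")
}

var query = "subject=Important%20business"
appendPair(key: "body", value: "hello", to: &query)
print(query)
// subject=Important%20business&body=hello
```

Unlike the URLSearchParams result shown earlier, the existing %20 is preserved rather than being rewritten as "+".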

Unicode Awareness

This is a somewhat niche issue. The key-lookup functions of URLSearchParams (and WebURL's existing API) do not consider Unicode canonical equivalence.

As background - the word "café" may be encoded in at least two ways - "cafe\u{0301}" and "caf\u{00E9}"; the former contains a plain ASCII "e" code-point followed by U+0301 COMBINING ACUTE ACCENT, whilst the latter uses a single combined U+00E9 LATIN SMALL LETTER E WITH ACUTE code-point. The former is known as the "decomposed" form because we've split the single accented é character into base and combining code-points, and the latter is the "precomposed" form, because we use pre-joined characters.

The Equatable conformance for Swift's String type uses Unicode canonical equivalence, so both forms of this character, and both of the above strings, compare as ==. This has secondary effects, as it means that "cafe\u{0301}" and "caf\u{00E9}" may not coexist as keys in a Swift Dictionary - if either one is present, we can successfully look it up or modify the entry using the other form.

// Swift

let dict = ["cafe\u{0301}" : "present"]
dict["caf\u{00E9}"]  // "present"
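The difference between canonical equivalence and code-point equality can be checked directly with the standard library. This is just a small demonstration using the two spellings from above:

```swift
let decomposed = "cafe\u{0301}"   // "e" followed by U+0301 COMBINING ACUTE ACCENT
let precomposed = "caf\u{00E9}"   // U+00E9 LATIN SMALL LETTER E WITH ACUTE

// String's == uses canonical equivalence, so these are equal...
print(decomposed == precomposed)  // true

// ...even though their code-points differ.
print(decomposed.unicodeScalars.elementsEqual(precomposed.unicodeScalars))  // false
print(decomposed.unicodeScalars.count, precomposed.unicodeScalars.count)    // 5 4

// Both contain the same number of user-perceived characters.
print(decomposed.count, precomposed.count)  // 4 4
```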

In order to provide a truly Swift-native key-value pairs experience, key lookup must also be Unicode aware. Otherwise, lookup and iteration will produce inconsistent results:

// Swift
// If key lookup were not Unicode-aware:

url.queryParams["caf\u{00E9}"]  // nil
url.queryParams.firstIndex { $0.key == "caf\u{00E9}" }  // ❗️ not-nil

Even though it seems to be rare to have Unicode keys, the above (and current) situation may cause an unacceptable amount of confusion. Users provide the key to search for as a String, so keys should be matched in a way that is consistent with String.==.

It is important to note that using canonical equivalence in key lookup APIs means we are more lenient than JavaScript at detecting matches (we will consider pairs to match which JavaScript does not). If this causes an undesirable match, it is trivial to implement a JavaScript-style code-point lookup function:

// Swift

extension WebURL.KeyValuePairs {
  func exactCodePointLookup(_ key: String) -> Index? {
    firstIndex { $0.key.unicodeScalars.elementsEqual(key.unicodeScalars) }
  }
}

Proposed Solution

A new type will be added, WebURL.KeyValuePairs, which is a view of a URL component as a list of key-value pairs. It is parameterized by a Schema type:

extension WebURL {
  public struct KeyValuePairs<Schema> where Schema : KeyValueStringSchema { /* ... */ }
}

extension WebURL.KeyValuePairs: Collection {}
extension WebURL.KeyValuePairs: Sendable where Schema: Sendable {}
extension WebURL.KeyValuePairs: CustomStringConvertible {}

To obtain a view, use the keyValuePairs function on a WebURL value, specifying the URL component and schema. There is also a closure-based API which provides an inout KeyValuePairs for modifying the list of pairs.

extension WebURL {

  public func keyValuePairs<Schema>(
    in component: KeyValuePairsSupportedComponent, schema: Schema
  ) -> KeyValuePairs<Schema>

  public mutating func withMutableKeyValuePairs<Schema, Result>(
    in component: KeyValuePairsSupportedComponent, schema: Schema,
    _ body: (inout KeyValuePairs<Schema>) throws -> Result
  ) rethrows -> Result
}

// Ideally, this would be a non-frozen enum :(
public struct KeyValuePairsSupportedComponent {
  public static var query: Self { get }
  public static var fragment: Self { get }
}

// Example: Reading and modifying PDF fragment identifiers.

var url = WebURL("http://example.com/example.pdf#page=105&zoom=200")!
let kvps = url.keyValuePairs(in: .fragment, schema: .percentEncoded)

for pair in kvps {
  print(pair.key, "--", pair.value)
  // page -- 105
  // zoom -- 200
}

print(kvps["page"])
// "105"

url.withMutableKeyValuePairs(in: .fragment, schema: .percentEncoded) { kvps in
  kvps["zoom"] = "350"
}
// "http://example.com/example.pdf#page=105&zoom=350"
//                                               ^^^

For convenience, the queryParams property is a shorthand for the above where component: .query, schema: .formEncoded. It includes a _modify accessor, so it allows one-off modifications without requiring a closure scope:

// Example: Reading query parameters.

let url = WebURL("http://example/search?q=quick+recipes&start=10&limit=20")!

let (searchQuery, start, limit) = url.queryParams["q", "start", "limit"]
// ("quick recipes", "10", "20")

// Example: Modifying query parameters.

var url = WebURL("http://example/search")!

url.queryParams["q"] = "quick recipes"
// "http://example/search?q=quick+recipes"
//                       ^^^^^^^^^^^^^^^^

url.queryParams += [
  ("start", "10"),
  ("limit", "20")
]
// "http://example/search?q=quick+recipes&start=10&limit=20"
//                                       ^^^^^^^^^^^^^^^^^^

Let's examine how it resolves the issues identified in the previous section.

Flexible Encoding with Good Defaults

WebURL.KeyValuePairs is parameterized by a KeyValueStringSchema object, which allows customizing various aspects of the encoding. It supports application/x-www-form-urlencoded (form-encoding), but is not limited to it.

public protocol KeyValueStringSchema {

  // - Delimiters.

  /// Whether a given ASCII code-point is a delimiter between key-value pairs.
  ///
  /// Schemas may accept more than one delimiter between key-value pairs.
  /// For instance, some systems must allow both semicolons and ampersands between pairs:
  ///
  /// ```
  /// key1=value1&key2=value2;key3=value3
  ///            ^           ^
  /// ```
  ///
  /// The default implementation returns `true` if `codePoint` is equal to ``preferredPairDelimiter``.
  ///
  func isPairDelimiter(_ codePoint: UInt8) -> Bool

  /// Whether a given ASCII code-point is a delimiter between a key and a value.
  ///
  /// Schemas may accept more than one delimiter between keys and values.
  /// For instance, a system may decide to accept both equals-signs and colons between keys and values:
  ///
  /// ```
  /// key1=value1&key2:value2
  ///     ^           ^
  /// ```
  ///
  /// The default implementation returns `true` if `codePoint` is equal to ``preferredKeyValueDelimiter``.
  ///
  func isKeyValueDelimiter(_ codePoint: UInt8) -> Bool

  /// The delimiter to write between key-value pairs when they are inserted in a string.
  /// [...]
  ///
  var preferredPairDelimiter: UInt8 { get }

  /// The delimiter to write between the key and value when they are inserted in a string.
  /// [...]
  ///
  var preferredKeyValueDelimiter: UInt8 { get }

  // - Escaping.

  /// Whether the given ASCII code-point should be percent-encoded.
  ///
  /// Some characters which occur in keys and values must be escaped
  /// because they have a reserved purpose for the key-value string or URL component:
  ///
  /// - Pair delimiters, as determined by ``WebURL/KeyValueStringSchema/isPairDelimiter(_:)-5cnsh``.
  /// - Key-value delimiters, as determined by ``WebURL/KeyValueStringSchema/isKeyValueDelimiter(_:)-4o6s0``.
  /// - The percent sign (`%`), as it is required for URL percent-encoding.
  /// - Any other code-points which must be escaped in the URL component.
  ///
  /// All other characters are written to the URL component without escaping.
  ///
  /// If this function returns `true` for a code-point, that code-point will _also_ be considered reserved and,
  /// if it occurs within a key or value, will be escaped when written to the key-value string.
  ///
  /// The default implementation returns `false`, so no additional code-points are escaped.
  ///
  func shouldPercentEncode(ascii codePoint: UInt8) -> Bool

  /// Whether a non-escaped plus sign (`+`) is decoded as a space.
  ///
  /// Some key-value strings support encoding spaces using the plus sign (`+`),
  /// as a shorthand alternative to percent-encoding them.
  /// This property must return `true` in order to accurately decode such strings.
  ///
  /// Examples of key-value strings using this shorthand are `application/x-www-form-urlencoded`
  /// ("form encoded") strings, as used by HTML forms.
  ///
  /// ```
  /// Encoded: name=Johnny+Appleseed
  /// //                  ^
  /// Decoded: (key: "name", value: "Johnny Appleseed")
  /// //                                   ^
  /// ```
  ///
  /// Other key-value strings give no special meaning to the plus sign.
  /// This property must return `false` in order to accurately decode _those_ strings.
  ///
  /// Examples of key-value strings in which plus signs are defined to simply mean literal plus signs
  /// are the query components of `mailto:` URLs.
  ///
  /// ```
  /// Encoded: cc=bob+swift@example.com
  /// //             ^
  /// Decoded: (key: "cc", value: "bob+swift@example.com")
  /// //                              ^
  /// ```
  ///
  /// Unfortunately, you need to know in advance which interpretation is appropriate
  /// for a particular key-value string.
  ///
  var decodePlusAsSpace: Bool { get }

  /// Whether spaces should be encoded as plus signs (`+`).
  ///
  /// Some key-value strings support encoding spaces using the plus sign (`+`),
  /// as a shorthand alternative to percent-encoding them.
  ///
  /// If this property returns `true`, the shorthand will be used.
  /// Otherwise, spaces will be percent-encoded, as all other disallowed characters are.
  ///
  /// ```
  /// Pair:  (key: "text", value: "hello, world")
  ///
  /// True:  text=hello,+world
  /// //                ^
  /// False: text=hello,%20world
  /// //                ^^^
  /// ```
  ///
  /// The default value is `false`. Use of the shorthand is **not recommended**,
  /// as the receiving system may not know that these encoded values require special decoding logic.
  /// The version without the shorthand can be accurately decoded by any system,
  /// even without that prior knowledge.
  ///
  /// If this property returns `true`, ``WebURL/KeyValueStringSchema/decodePlusAsSpace`` must also return `true`.
  ///
  var encodeSpaceAsPlus: Bool { get }
}
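To make the delimiter requirements a bit more concrete, here is a minimal, self-contained sketch of how a parser could use them to split a key-value string. `MiniSchema`, `AmpersandOrSemicolon`, and `splitPairs` are hypothetical names invented for this illustration. They are not part of WebURL, and the real implementation works on the URL's storage rather than on a `String`.

```swift
// Hypothetical stand-in for the delimiter requirements of KeyValueStringSchema.
protocol MiniSchema {
  var preferredPairDelimiter: UInt8 { get }
  var preferredKeyValueDelimiter: UInt8 { get }
  func isPairDelimiter(_ codePoint: UInt8) -> Bool
  func isKeyValueDelimiter(_ codePoint: UInt8) -> Bool
}

extension MiniSchema {
  // Defaults mirror the documented behavior: only the preferred delimiter is recognized.
  func isPairDelimiter(_ codePoint: UInt8) -> Bool { codePoint == preferredPairDelimiter }
  func isKeyValueDelimiter(_ codePoint: UInt8) -> Bool { codePoint == preferredKeyValueDelimiter }
}

// A schema which reads both "&" and ";" between pairs, but only writes "&".
struct AmpersandOrSemicolon: MiniSchema {
  var preferredPairDelimiter: UInt8 { UInt8(ascii: "&") }
  var preferredKeyValueDelimiter: UInt8 { UInt8(ascii: "=") }
  func isPairDelimiter(_ codePoint: UInt8) -> Bool {
    codePoint == UInt8(ascii: "&") || codePoint == UInt8(ascii: ";")
  }
}

// Splits a key-value string into (encoded key, encoded value) pairs.
// No percent-decoding is performed here.
func splitPairs(_ string: String, schema: some MiniSchema) -> [(key: String, value: String)] {
  return string.utf8
    .split(whereSeparator: { schema.isPairDelimiter($0) })
    .map { pair -> (key: String, value: String) in
      // The first key-value delimiter ends the key; later ones belong to the value.
      let parts = pair.split(
        maxSplits: 1, omittingEmptySubsequences: false,
        whereSeparator: { schema.isKeyValueDelimiter($0) }
      )
      let key = String(decoding: parts[0], as: UTF8.self)
      let value = parts.count > 1 ? String(decoding: parts[1], as: UTF8.self) : ""
      return (key: key, value: value)
    }
}

let pairs = splitPairs("key1=value1&key2=value2;key3=a=b", schema: AmpersandOrSemicolon())
for pair in pairs {
  print(pair.key, "-", pair.value)
}
```

Because the delimiters are ASCII, splitting at the UTF-8 level never cuts through a multi-byte character.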

Defining a custom key-value string schema is a "pro feature". Most users will never have to do it, because WebURL ships with two built-in schemas - .formEncoded and .percentEncoded:

extension KeyValueStringSchema where Self == FormCompatibleKeyValueStringSchema {

  /// A key-value string compatible with form-encoding.
  ///
  /// **Specification:**
  ///
  /// - Pair delimiter: `&`
  /// - Key-value delimiter: `=`
  /// - Decode `+` as space: `true`
  /// - Encode space as `+`: `false`
  /// - Characters which do not require escaping:
  ///   - ASCII alphanumerics
  ///   - U+002A (\*), U+002D (-), U+002E (.), and U+005F (\_)
  ///
  /// This schema is used to read `application/x-www-form-urlencoded` content,
  /// such as that produced by HTML forms or Javascript's `URLSearchParams` class.
  ///
  /// Unlike Javascript's `URLSearchParams`, spaces in inserted key-value pairs
  /// are written using regular percent-encoding, rather than as plus signs.
  /// This removes a potential source of ambiguity for other systems processing the string.
  ///
  public static var formEncoded: Self { ... }
}

extension KeyValueStringSchema where Self == PercentEncodedKeyValueStringSchema {

  /// A key-value string which uses regular percent-encoding, rather than form-encoding.
  ///
  /// **Specification:**
  ///
  /// - Pair delimiter: `&`
  /// - Key-value delimiter: `=`
  /// - Decode `+` as space: `false`
  /// - Encode space as `+`: `false`
  /// - Characters which do not require escaping:
  ///   - All characters allowed by the URL component, except `+`.
  ///
  /// The differences between this schema and ``WebURL/FormCompatibleKeyValueStringSchema`` are:
  ///
  /// - Plus signs are interpreted simply as literal plus signs, rather than as encoded spaces.
  ///
  /// - All characters allowed in the URL component (except `+`) may be written without escaping.
  ///   `+` characters are escaped in order to avoid ambiguity on systems that use form encoding.
  ///
  public static var percentEncoded: Self { ... }
}

The main difference between these schemas is whether they interpret "+" characters already present in the URL component as escaped spaces:

  • If the content was prepared with the forms shorthand (e.g. by HTML, or JavaScript's URLSearchParams), use the .formEncoded schema.
  • Otherwise (e.g. for mailto: URLs, media fragments, content prepared with Foundation's URLQueryItem), choose .percentEncoded.

let url = WebURL("mailto:user@example.com?cc=someone+swift@example.com&subject=1+1")!

for pair in url.keyValuePairs(in: .query, schema: .formEncoded /* 👈 */) {
  print(pair)
  // ❗️ (key: "cc", value: "someone swift@example.com")
  //                               ^
  // ❗️ (key: "subject", value: "1 1")
  //                              ^
}

for pair in url.keyValuePairs(in: .query, schema: .percentEncoded /* 👈 */) {
  print(pair)
  // ✅ (key: "cc", value: "someone+swift@example.com")
  // ✅ (key: "subject", value: "1+1")
}
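The difference between the two interpretations comes down to a single branch in the decoder. The following is a rough sketch using only the standard library; `decodeComponent` is a hypothetical helper written for this illustration, not WebURL's actual decoder.

```swift
// Hypothetical helper: percent-decodes a component,
// optionally treating a non-escaped "+" as a space.
func decodeComponent(_ encoded: String, plusAsSpace: Bool) -> String {
  func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    default: return nil
    }
  }
  let input = Array(encoded.utf8)
  var output: [UInt8] = []
  var i = 0
  while i < input.count {
    let byte = input[i]
    if byte == UInt8(ascii: "%"), i + 2 < input.count,
       let hi = hexValue(input[i + 1]), let lo = hexValue(input[i + 2]) {
      // A valid escape sequence: "%" followed by two hex digits.
      output.append((hi << 4) | lo)
      i += 3
    } else if byte == UInt8(ascii: "+"), plusAsSpace {
      // The forms shorthand. Only applies to "+" signs which are not escaped.
      output.append(UInt8(ascii: " "))
      i += 1
    } else {
      output.append(byte)
      i += 1
    }
  }
  return String(decoding: output, as: UTF8.self)
}

print(decodeComponent("1+1", plusAsSpace: true))    // "1 1"
print(decodeComponent("1+1", plusAsSpace: false))   // "1+1"
print(decodeComponent("1%2B1", plusAsSpace: true))  // "1+1"
```

Note that an escaped plus sign (`%2B`) decodes to a literal "+" under both interpretations, which is why escaping it is the unambiguous choice when writing.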

When it comes to modifying the URL component, neither of these schemas uses the forms shorthand in data it writes (even .formEncoded does not use it), and both opt to escape "+" characters in keys and values. This is a good default, as it means that both systems which expect form-encoding (such as JavaScript's URLSearchParams) and those which do not (such as Foundation) will accurately decode the contents.

✅ To reiterate - for writing content, you can use either schema and the contents will be accurately decoded by both JavaScript's URLSearchParams and Foundation's URLComponents.

// Content prepared with WebURL.KeyValuePairs, using either .formEncoded or .percentEncoded schemas.

var url = WebURL("http://example")!
url.withMutableKeyValuePairs(in: .query, schema: .formEncoded /* or .percentEncoded */) { kvps in
  kvps.append(key: "formula", value: "1 + 1 = 2")
}
// "http://example/?formula=1%20%2B%201%20%3D%202"

// ✅ Foundation can understand that.

import Foundation

let c = URLComponents(string: "http://example/?formula=1%20%2B%201%20%3D%202")!
for pair in c.queryItems! {
  print("key:", pair.name)
  print("value:", pair.value!)
  // key: formula
  // value: 1 + 1 = 2
}

// ✅ JavaScript can understand that.

var url = new URL("http://example/?formula=1%20%2B%201%20%3D%202");
for (const [key, value] of url.searchParams) {
  console.log("key: " + key);
  console.log("value: " + value);
  // key: formula
  // value: 1 + 1 = 2
}
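For the curious, the writing side can be sketched in a few lines. This is a simplified illustration of the form-compatible rule listed earlier (only ASCII alphanumerics and `*`, `-`, `.`, `_` are written without escaping). `escapeComponent` is a hypothetical name, and the real encoder in WebURL is more involved.

```swift
// Hypothetical helper: percent-encodes every byte outside the small set of
// characters which form-encoding allows without escaping.
// Spaces become "%20" (never "+"), and "+" becomes "%2B".
func escapeComponent(_ text: String) -> String {
  var result = ""
  for byte in text.utf8 {
    switch byte {
    case UInt8(ascii: "a")...UInt8(ascii: "z"),
         UInt8(ascii: "A")...UInt8(ascii: "Z"),
         UInt8(ascii: "0")...UInt8(ascii: "9"),
         UInt8(ascii: "*"), UInt8(ascii: "-"), UInt8(ascii: "."), UInt8(ascii: "_"):
      result.unicodeScalars.append(Unicode.Scalar(byte))
    default:
      let hex = String(byte, radix: 16, uppercase: true)
      result += "%" + (hex.count == 1 ? "0" : "") + hex
    }
  }
  return result
}

print(escapeComponent("1 + 1 = 2"))
// "1%20%2B%201%20%3D%202"
```

Since the output contains neither a bare "+" nor a bare space, a decoder gets the same result whether or not it applies the forms shorthand.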

The only other difference between .formEncoded and .percentEncoded is that the latter allows more characters to be used without escaping, and thus allows subcomponent delimiters to be defined.

// '.formEncoded' is more restrictive about which characters can be used without escaping.
// It uses the same rules as JS's URLSearchParams.

var url = WebURL("http://example")!
url.withMutableKeyValuePairs(in: .query, schema: .formEncoded /* 👈 */) { kvps in
  kvps += [
    ("$someValue(x)", "42"),
    ("$someValue(y)", "99"),
  ]
}
// "http://example/?%24someValue%28x%29=42&%24someValue%28y%29=99"
//                  ^^^         ^^^ ^^^    ^^^         ^^^ ^^^

// '.percentEncoded' is more lenient.

var url = WebURL("http://example")!
url.withMutableKeyValuePairs(in: .query, schema: .percentEncoded /* 👈 */) { kvps in
  kvps += [
    ("$someValue(x)", "42"),
    ("$someValue(y)", "99"),
  ]
}
// "http://example/?$someValue(x)=42&$someValue(y)=99"
//                  ^         ^ ^    ^         ^ ^

In the future, it would be nice to extend KeyValueStringSchema to allow customizing more aspects of the encoding process. For example, it is occasionally necessary to produce or decode content using pre-Unicode encodings such as SHIFT-JIS.

Supports Query and Fragment (and more later)

The initial version of WebURL.KeyValuePairs supports key-value strings in the query and fragment components. Unlike other libraries (and the current WebURL API), which deal solely with key-value pairs in the query, the new implementation supports reading and writing media fragments, PDF fragment identifiers, OAuth callback URLs, and more.

// Example: Reading and writing a media fragment

var url = WebURL("http://www.example.com/example.ogv#track=french&t=10,20")!

let (track, t) = url.keyValuePairs(in: .fragment, schema: .percentEncoded)["track", "t"]
// ("french", "10,20")

url.withMutableKeyValuePairs(in: .fragment, schema: .percentEncoded) {
  $0["t"] = "0:02:00"
}
// "http://www.example.com/example.ogv#track=french&t=0:02:00"
//                                                  ^^^^^^^^^

// Example: Custom schema in the fragment

struct CommaSeparated: KeyValueStringSchema {
  var preferredPairDelimiter: UInt8     { UInt8(ascii: ",") }
  var preferredKeyValueDelimiter: UInt8 { UInt8(ascii: ":") }
  var decodePlusAsSpace: Bool { false }

  func shouldPercentEncode(ascii codePoint: UInt8) -> Bool {
    PercentEncodedKeyValueStringSchema().shouldPercentEncode(ascii: codePoint)
  }
}

var url = WebURL("http://example")!
url.withMutableKeyValuePairs(in: .fragment, schema: CommaSeparated()) { kvps in
  kvps += [
    ("foo", "bar"),
    ("baz", "qux"),
    ("flag", "true")
  ]
}
// "http://example/#foo:bar,baz:qux,flag:true"
//                 ^^^^^^^^^^^^^^^^^^^^^^^^^^

// Example: Reading from an OAuth 2.0 implicit grant callback URL

let oauth = WebURL("http://example.com/cb#access_token=2YotnFZFEjr1zCsicMWpAA&state=xyz&token_type=example&expires_in=3600")!
let (token, timeout) = oauth.keyValuePairs(in: .fragment, schema: .formEncoded)["access_token", "expires_in"]
// ("2YotnFZFEjr1zCsicMWpAA", "3600")

// Example: Creating an OAuth 2.0 implicit grant callback URL

var callbackURL = WebURL("http://example.com/cb")!
callbackURL.withMutableKeyValuePairs(in: .fragment, schema: .formEncoded) { kvps in
  kvps += [
    ("access_token", "some token"),
    ("state", "foobar"),
    ("token_type", "xyz"),
    ("expires_in", "1800"),
  ]
}
// "http://example.com/cb#access_token=some%20token&state=foobar&token_type=xyz&expires_in=1800"
//                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Other components, such as opaque hostnames and paths, could be supported in the future by extending KeyValuePairsSupportedComponent.

Great Positional APIs

WebURL.KeyValuePairs conforms to Collection. That means each key-value pair is assigned an Index, which uniquely identifies its position in the list. Slicing, and operations such as map, compactMap, reduce, filter, and contains(where:), are built-in. This already provides a lot of very natural, efficient APIs for parsing key-value strings.

let url = WebURL("http://example/?chart=APPL&color=green&chart=GOOGL&color=blue")!

struct ChartInfo {
  var name: String
  var color: String?
}

enum ParsingError: Error {
  case noChartContext
  case unknownKey(String)
}

let parsed = try url.queryParams.reduce(into: [ChartInfo]()) { charts, kvp in
  switch kvp.key {
  case "chart":
    charts.append(ChartInfo(name: kvp.value))
  case "color":
    guard !charts.isEmpty else { throw ParsingError.noChartContext }
    charts[charts.count - 1].color = kvp.value
  default:
    throw ParsingError.unknownKey(kvp.key)
  }
}
// [
//    ChartInfo(name: "APPL", color: "green"),
//    ChartInfo(name: "GOOGL", color: "blue"),
// ]

The Element type is a slice of the key-value string, meaning it shares storage with the URL it comes from and decodes its key and value properties on-demand. This is a somewhat unusual choice, but most of the time when searching or filtering the list, users only care about the key and we can achieve significant performance improvements by not eagerly decoding the value. The value also tends to be much larger than the key, and is more likely to require percent-decoding, so it's more expensive to decode and store in a String.

Also, since it is a slice, we can provide the encoded key/value exactly as they appear in the URL string. That is important if subcomponent delimiters are being used, as it lets us distinguish escaped from non-escaped characters.

extension WebURL.KeyValuePairs {

  /// A slice of a URL component, containing a single key-value pair.
  ///
  /// Use the ``key`` and ``value`` properties to access the pair's decoded components.
  ///
  /// ```swift
  /// let url = WebURL("http://example/?name=Johnny+Appleseed&age=28")!
  /// //                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ///
  /// for kvp in url.queryParams {
  ///   print(kvp.key, "-", kvp.value)
  /// }
  ///
  /// // Prints:
  /// // "name - Johnny Appleseed"
  /// // "age - 28"
  /// ```
  ///
  /// Because these key-value pairs are slices, they share storage with the URL and decode the key/value on-demand.
  ///
  public struct Element {

    /// The key component, as decoded by the schema.
    ///
    public var key: String { get }

    /// The value component, as decoded by the schema.
    ///
    public var value: String { get }

    /// The key component, as written in the URL (without decoding).
    ///
    public var encodedKey: String { get }

    /// The value component, as written in the URL (without decoding).
    ///
    public var encodedValue: String { get }
  }
}

extension WebURL.KeyValuePairs.Element: Sendable where Schema: Sendable {}
extension WebURL.KeyValuePairs.Element: CustomStringConvertible {}
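The slicing idea can be illustrated with plain `Substring`s. The sketch below is only an approximation of the design: `PairSlice` and `pairSlices(in:)` are hypothetical names, it assumes `&` and `=` delimiters with no forms shorthand, and it leans on Foundation's `removingPercentEncoding` for decoding.

```swift
import Foundation

// Hypothetical model of a key-value pair which shares storage with its source string.
struct PairSlice {
  // Stored: slices of the original string, exactly as written.
  let encodedKey: Substring
  let encodedValue: Substring

  // Computed: decoded on demand, each time they are accessed.
  var key: String { String(encodedKey).removingPercentEncoding ?? String(encodedKey) }
  var value: String { String(encodedValue).removingPercentEncoding ?? String(encodedValue) }
}

func pairSlices(in query: String) -> [PairSlice] {
  return query.split(separator: "&").map { pair -> PairSlice in
    let parts = pair.split(separator: "=", maxSplits: 1, omittingEmptySubsequences: false)
    return PairSlice(encodedKey: parts[0], encodedValue: parts.count > 1 ? parts[1] : "")
  }
}

let slices = pairSlices(in: "name=Johnny%20Appleseed&age=28")

// Searching only decodes keys. The (larger) value of "name" is never decoded.
if let age = slices.first(where: { $0.key == "age" }) {
  print(age.value)         // "28"
}
print(slices[0].encodedValue)  // "Johnny%20Appleseed"
```

The search at the end touches every key but decodes only one value, which is the saving described above.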

KeyValuePairs also comes with a full complement of positional mutation APIs. They are based on APIs you may be used to from RangeReplaceableCollection or Array, so they will feel familiar, but these methods return new indexes, so they compose much better than RangeReplaceableCollection's methods do.

Keys and values are to be provided without any percent-encoding. The KeyValuePairs view automatically escapes the strings to preserve their values exactly as given.

extension WebURL.KeyValuePairs {

  // - Appending.

  /// Inserts a collection of key-value pairs at the end of this list.
  ///
  /// This function is equivalent to the `+=` operator.
  ///
  @discardableResult
  public mutating func append(
    contentsOf newPairs: some Collection<(some StringProtocol, some StringProtocol)>
  ) -> Range<Index>

  /// Inserts a collection of key-value pairs at the end of this list.
  ///
  /// This operator is equivalent to the ``append(contentsOf:)-84bng`` function.
  ///
  public static func += (
    lhs: inout WebURL.KeyValuePairs<Schema>,
    rhs: some Collection<(some StringProtocol, some StringProtocol)>
  )

  /// Inserts a key-value pair at the end of this list.
  ///
  @inlinable
  @discardableResult
  public mutating func append(
    key: some StringProtocol,
    value: some StringProtocol
  ) -> Index

  // - Inserting.

  /// Inserts a collection of key-value pairs at a given location.
  ///
  /// The new pairs are inserted before the pair currently at `location`.
  ///
  @discardableResult
  public mutating func insert(
    contentsOf newPairs: some Collection<(some StringProtocol, some StringProtocol)>,
    at location: Index
  ) -> Range<Index>

  /// Inserts a key-value pair at a given location.
  ///
  /// The pair is inserted before the pair currently at `location`.
  ///
  @discardableResult
  public mutating func insert(
    key: some StringProtocol, value: some StringProtocol,
    at location: Index
  ) -> Range<Index>

  // - Removing.

  /// Removes the key-value pairs at the given locations.
  ///
  @discardableResult
  public mutating func removeSubrange(_ bounds: some RangeExpression<Index>) -> Index

  /// Removes the key-value pair at a given location.
  ///
  @discardableResult
  public mutating func remove(at location: Index) -> Index
 
  /// Removes all key-value pairs in the given range which match a predicate.
  ///
  public mutating func removeAll(
    in bounds: some RangeExpression<Index>,
    where predicate: (Element) -> Bool
  )

  /// Removes all key-value pairs which match a predicate.
  ///
  public mutating func removeAll(where predicate: (Element) -> Bool)
 
  // - Replacing.

  /// Replaces the key-value pairs at the given locations.
  ///
  /// The number of new pairs does not need to equal the number of pairs being replaced.
  ///
  @discardableResult
  public mutating func replaceSubrange(
    _ bounds: some RangeExpression<Index>,
    with newPairs: some Collection<(some StringProtocol, some StringProtocol)>
  ) -> Range<Index>

  /// Replaces the 'key' portion of the pair at a given location.
  ///
  @discardableResult
  public mutating func replaceKey(
    at location: Index,
    with newKey: some StringProtocol
  ) -> Index

  /// Replaces the 'value' portion of the pair at a given location.
  ///
  @discardableResult
  public mutating func replaceValue(
    at location: Index,
    with newValue: some StringProtocol
  ) -> Index
}

Again, the goal is that by providing a comprehensive set of APIs, operations can be expressed naturally and implemented efficiently. Here are some examples:

// Example: Building a query string by appending.

var url = WebURL("https://example.com/convert")!
url.queryParams += [
 ("amount", "200"),
 ("from",   "USD"),
 ("to",     "EUR"),
]
// "https://example.com/convert?amount=200&from=USD&to=EUR"
//                              ^^^^^^^^^^^^^^^^^^^^^^^^^^

// Example: Inserting a sort field at a particular location.

var url = WebURL("http://example/students?class=12&sort=name")!

// Find the location of the existing 'sort' key,
// insert an additional 'sort' key before it.
let sortIdx = url.queryParams.firstIndex(where: { $0.key == "sort" }) ?? url.queryParams.endIndex
url.queryParams.insert(key: "sort", value: "age", at: sortIdx)

// "http://example/students?class=12&sort=age&sort=name"
//                                   ^^^^^^^^

Recall that the above example was incredibly awkward to perform using the JavaScript URLSearchParams API.

// Example: Replacing a sort field in-place.

var url = WebURL("http://example/students?class=10&sort=age&sort=name")!

url.withMutableKeyValuePairs(in: .query, schema: .percentEncoded) { kvps in
  if let match = kvps.firstIndex(where: { $0.key == "sort" && $0.value == "age" }) {
    kvps.replaceValue(at: match, with: "age[desc]")
  }
}
// "http://example/students?class=10&sort=age[desc]&sort=name"
//                                   ^^^^^^^^^^^^^^

// Example: Faceted search.
// This is like toggling a filter on a search UI.

extension WebURL.KeyValuePairs {
  mutating func toggleFacet(name: String, value: String) {
    if let i = firstIndex(where: { $0.key == name && $0.value == value }) {
      removeAll(in: i..., where: { $0.key == name && $0.value == value })
    } else {
      append(key: name, value: value)
    }
  }
}

var url = WebURL("http://example.com/products?brand=A&brand=B")!

url.queryParams.toggleFacet(name: "brand", value: "C")
// "http://example.com/products?brand=A&brand=B&brand=C"
//                                              ^^^^^^^

url.queryParams.toggleFacet(name: "brand", value: "B")
// "http://example.com/products?brand=A&brand=C"
//                                    ^^^

// Example: Filtering tracking parameters.

let trackingKeys: Set<String> = [
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", /* ... */
]
  
var url = WebURL("http://example/p?sort=new&utm_source=swift.org&utm_campaign=example&version=2")!

url.queryParams.removeAll { trackingKeys.contains($0.key) }
// "http://example/p?sort=new&version=2"

The next example is my personal favorite.

// Example: Take some data from a document (e.g. a configuration file)
// and convert it to URL query parameters -- in one line.

let document = """
- foo: bar
- baz: quxes & quxettes
- formula: 1 + 1 = 2
"""

var url = WebURL("http://example/p")!
url.queryParams += document.matches(of: /- (?<key>.+): (?<value>.+)/).lazy.map { ($0.key, $0.value) }

// "http://example/p?foo=bar&baz=quxes%20%26%20quxettes&formula=1%20%2B%201%20%3D%202"

The reason I really like this example is not because I think it's an especially common thing to do, but because it really shows the advantages of using generics:

String.matches returns an Array of Regex matches, but that's no problem, because the KeyValuePairs API accepts generic Collections, so we can lazy-map it, dig out the parts of the regex match we need (the captures), and feed that directly to the += operator. And those regex captures? Those are exposed using Substring (not String), but that's also not a problem, because the API accepts generic Collections of generic StringProtocol values.

Other than the lazy map, we don't need to perform any conversions. This code is very efficient - but more importantly, it's really easy to read.

To illustrate that point, let's compare to the equivalent using Foundation's URLComponents:

import Foundation

var url = URLComponents(string: "http://example/p")!
url.queryItems = document.matches(of: /- (?<key>.+): (?<value>.+)/).map {
//                                                                 ^ - copy
  URLQueryItem(name: String($0.key), value: String($0.value))
//                   ^^^^^^                 ^^^^^^ - copies
}
// "http://example/p?foo=bar&baz=quxes%20%26%20quxettes&formula=1%20+%201%20%3D%202"

Foundation's URLQueryItem only works with String, so the key and value from each match need to be copied into individual allocations. Furthermore, URLComponents.queryItems only works with Array, so we also need to make an eager copy of the entire list of matches.

WebURL's version is cleaner at the point of use and more efficient. The way it encodes values also avoids the ambiguity found in Foundation's and JavaScript's output, so it produces a more robust, interoperable result. It's a big improvement across the board.

No Surprise Re-Encoding

WebURL.KeyValuePairs does not follow the URLSearchParams conceptual model, which eagerly decodes all pairs into an internal list. Instead, opaque indexes refer to regions of the underlying URL string, and those regions are modified in-place. Changes only affect regions of the string referred to by the given index or range of indexes.

Recall that previously, we saw how appending a value using JavaScript's URLSearchParams would re-encode all existing key-value pairs and introduce undesirable escaping in them. JavaScript libraries often try to reduce escaping after using that API, but at the same time, altering escaping can be a destructive process, so this API and the processes around it are all flawed.

The following example shows how WebURL approaches a similar situation - we start with a query, which is escaped as necessary by URL rules (so "$" and "()" are allowed without escaping). Then, we append a pair using the queryParams property, which escapes characters according to HTML forms rules (so "$" and "()" must be escaped). The new value is encoded using forms rules, but, importantly, the existing content is not modified.

var url = WebURL("http://example/?$someValue(x)=42&$someValue(y)=99")!

url.queryParams.append(key: "$someValue(z)", value: "-1")
// "http://example/?$someValue(x)=42&$someValue(y)=99&%24someValue%28z%29=-1"
//                                                    ^^^^^^^^^^^^^^^^^^^

Unicode Aware Map/MultiMap APIs

WebURL.KeyValuePairs also includes convenient key-based APIs to support applications which view the list of pairs as a map or multi-map.

extension WebURL.KeyValuePairs {

  // - Finding Values By Key

  /// Returns the first value associated with the given key.
  ///
  subscript(some StringProtocol) -> String? { get set }

  /// Returns the first value associated with each of the given keys.
  ///
  subscript(some StringProtocol, some StringProtocol) -> (String?, String?) { get }

  /// Returns the values associated with each of the given keys.
  ///
  subscript(some StringProtocol, some StringProtocol, some StringProtocol) -> (String?, String?, String?) { get }

  /// Returns the values associated with each of the given keys.
  ///
  subscript(some StringProtocol, some StringProtocol, some StringProtocol, some StringProtocol) -> (String?, String?, String?, String?) { get }

  /// Returns all values associated with the given key.
  ///
  func allValues(forKey: some StringProtocol) -> [String]

  // - Modifying Values With a Given Key

  /// Associates a value with a given key.
  ///
  mutating func set(key: some StringProtocol, to: some StringProtocol) -> Index
}

These APIs all accept a key as a generic StringProtocol and perform Unicode-aware key matching (equivalent to .first { $0.key == someKey }). As far as I am aware, no other URL library does this, but it seems the most appropriate fit for Swift, and matches how Strings work elsewhere in the language.

// Example: Unicode-aware key lookup.
// We encode a precomposed character, but lookup works even if we provide the decomposed version. 

var url = WebURL("http://example")!

url.queryParams.append(key: "caf\u{00E9}", value: "present")
// "http://example/?caf%C3%A9=present"

print(url.queryParams["caf\u{00E9}"])   // ✅ "present"
print(url.queryParams["cafe\u{0301}"])  // ✅ "present"
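The behavior above falls out of how Swift's String comparison works: `==` compares by Unicode canonical equivalence rather than byte-for-byte. Here is a small standalone demonstration using only the standard library. The `firstValue(forKey:in:)` helper and the array of decoded pairs are hypothetical stand-ins for the view's key lookup.

```swift
// Hypothetical helper: finds the first value whose key is canonically equivalent to `key`.
func firstValue(forKey key: String, in pairs: [(key: String, value: String)]) -> String? {
  return pairs.first(where: { $0.key == key })?.value
}

let precomposed = "caf\u{00E9}"   // "é" as a single scalar (U+00E9)
let decomposed  = "cafe\u{0301}"  // "e" followed by a combining acute accent (U+0301)

// The two strings are equal as Strings, even though their UTF-8 bytes differ.
let sameString = (precomposed == decomposed)
let sameBytes = precomposed.utf8.elementsEqual(decomposed.utf8)
print(sameString, sameBytes)  // true false

let decodedPairs = [(key: precomposed, value: "present")]
print(firstValue(forKey: decomposed, in: decodedPairs) ?? "not found")  // "present"
```

A byte-wise comparison of the encoded keys would miss the second lookup, because `cafe%CC%81` and `caf%C3%A9` are different byte sequences.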

Unlike looking up values in a Dictionary, key lookup here has O(n) complexity - ultimately this is just a view over a URL component; not a hash table. It would be quite expensive to look up multiple keys in that way, so we offer batched lookups which can search for 1-4 keys in a single traversal.

Not only does this perform better than independent lookups, it is also quite convenient, thanks to Swift's ability to destructure the result into multiple local variable bindings.

let url = WebURL("http://example/search?q=indian+recipes&sort=time&limit=20")!

let (searchQuery, sort, limit) = url.queryParams["q", "sort", "limit"]
// ("indian recipes", "time", "20")

// Batched lookup of N keys vs N independent lookups:

// NonEncoded = ASCII keys, no percent-encoding
// Encoded    = Unicode keys, must be decoded
// LongKeys   = Decoded key length > 15 bytes. Small-string optimization cannot be used.

name                                      batched     ratio    independent  ratio    
----------------------------------------------------------------------------------
QueryParameters.Get.NonEncoded            479.000 ns  1.00x     461.000 ns  1.00x  
QueryParameters.Get2.NonEncoded           696.000 ns  1.45x     873.000 ns  1.89x
QueryParameters.Get3.NonEncoded           796.000 ns  1.66x    1161.000 ns  2.52x
QueryParameters.Get4.NonEncoded           912.000 ns  1.90x    1519.000 ns  3.30x

QueryParameters.Get.Encoded               683.000 ns  1.00x     609.000 ns  1.00x
QueryParameters.Get2.Encoded              960.000 ns  1.40x    1112.000 ns  1.83x
QueryParameters.Get3.Encoded             1154.000 ns  1.69x    1634.000 ns  2.68x
QueryParameters.Get4.Encoded             1265.000 ns  1.85x    2142.000 ns  3.52x

QueryParameters.Get.LongKeys.NonEncoded   947.000 ns  1.00x     862.000 ns  1.00x
QueryParameters.Get2.LongKeys.NonEncoded 1331.000 ns  1.40x    1728.000 ns  2.00x

QueryParameters.Get.LongKeys.Encoded     1251.000 ns  1.00x    1122.000 ns  1.00x
QueryParameters.Get2.LongKeys.Encoded    1725.000 ns  1.38x    2245.000 ns  2.00x
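The reason batching wins is simply that the list is walked once instead of once per key. A simplified sketch of a two-key lookup over already-decoded pairs might look like the following. `firstValues(forKeys:_:in:)` is a hypothetical function for illustration, and the real implementation operates on the URL string rather than an array.

```swift
// Hypothetical helper: finds the first value for each of two keys in a single traversal.
func firstValues(
  forKeys key0: String, _ key1: String,
  in pairs: [(key: String, value: String)]
) -> (String?, String?) {
  var result: (String?, String?) = (nil, nil)
  for pair in pairs {
    // Only the first match for each key is recorded.
    if result.0 == nil, pair.key == key0 { result.0 = pair.value }
    if result.1 == nil, pair.key == key1 { result.1 = pair.value }
    // Stop early once every key has been found.
    if result.0 != nil, result.1 != nil { break }
  }
  return result
}

let searchPairs = [
  (key: "q", value: "indian recipes"),
  (key: "sort", value: "time"),
  (key: "limit", value: "20"),
  (key: "sort", value: "name"),
]
let (sort, query) = firstValues(forKeys: "sort", "q", in: searchPairs)
print(sort ?? "nil", "-", query ?? "nil")  // "time - indian recipes"
```

Each pair's key is examined once and compared against every requested key, so the per-pair cost grows with the number of keys but the traversal cost does not. That is consistent with the sub-linear ratios in the batched column above.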

Future Work

Normalization

Users frequently want to normalize key-value strings in order to maximize cache hits (e.g. sorting pairs, if it is known that order doesn't matter). We should offer a normalization API which works within a given range, and offers things like Unicode normalization in addition to sorting. Given that the goal is to maximize cache hits, we may also want to offer sorting options which work similarly to JavaScript.
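As a rough idea of what the sorting part could look like, here is a sketch over decoded pairs. It assumes the JavaScript-compatible behavior means what URLSearchParams.sort() does: order by the key's UTF-16 code units, and keep pairs with equal keys in their original relative order. `sortedByKey` is a hypothetical function; nothing like it exists in WebURL yet.

```swift
// Hypothetical helper: sorts pairs by key, comparing UTF-16 code units,
// and preserves the relative order of pairs with equal keys.
func sortedByKey(_ pairs: [(key: String, value: String)]) -> [(key: String, value: String)] {
  return pairs.enumerated()
    .sorted { a, b in
      let (lhs, rhs) = (a.element.key.utf16, b.element.key.utf16)
      if lhs.elementsEqual(rhs) {
        // Equal keys: fall back to the original position, so the sort is stable.
        return a.offset < b.offset
      }
      return lhs.lexicographicallyPrecedes(rhs)
    }
    .map { $0.element }
}

let unsorted = [
  (key: "b", value: "1"),
  (key: "a", value: "2"),
  (key: "b", value: "0"),
  (key: "a", value: "1"),
]
for pair in sortedByKey(unsorted) {
  print(pair.key, "=", pair.value)
}
// a = 2
// a = 1
// b = 1
// b = 0
```

Note that comparing code units is not the same as comparing Swift Strings with `<`, so an API like this would probably need to make the comparison strategy an explicit option.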

Non-Unicode Text

Sometimes key-value pairs contain non-Unicode text (e.g. SHIFT-JIS, Latin-1). There is a particular way to encode that, which is not supported right now, but we could add hooks to KeyValueStringSchema for reading and writing that kind of content.

Pre-escaped Keys

Typically, when you write some key-value pairs (e.g. using append), the content is escaped for you. However, applications which define custom subcomponent delimiters (e.g. using [] for arrays or . for object properties) need more precise control over escaping. More specifically, they need to be able to perform their own percent-encoding when content conflicts with their subcomponent delimiters (e.g. if a key contains [], or contains a dot), and when they write that content, it should not be escaped a second time.

The idea is to offer a special scope to do that work in:

struct BracketsEncoder: PercentEncodeSet {
  func shouldPercentEncode(ascii codePoint: UInt8) -> Bool {
    codePoint == UInt8(ascii: "[") || codePoint == UInt8(ascii: "]") || codePoint == UInt8(ascii: "%")
  }
}

extension WebURL.KeyValuePairs {

  mutating func appendArray(_ name: String, _ value: [String]) {
    let escapedName = name.percentEncoded(using: BracketsEncoder()) + "[]"
    writingPreEscaped([.key]) {  // <-- This function is what we'll add. Maybe with a better name :)
      $0.append(contentsOf: value.lazy.map { (escapedName, $0) })
    }
  }

  func getArray(_ name: String) -> [String] {
    let arrayName = name + "[]"
    return compactMap { kvp in
      kvp.key == name || kvp.key == arrayName ? kvp.value : nil
    }
  }
}

var url = WebURL("http://example/")!
url.queryParams.appendArray("foo[bar]", ["hello", "world"])

// "http://example/?foo%5Bbar%5D[]=hello&foo%5Bbar%5D[]=world"
//                     ^^^   ^^^            ^^^   ^^^
// These brackets are part of the key name, so they are escaped.
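The escaping step on its own is simple enough to show without any library support. The sketch below mimics what the BracketsEncoder set above is meant to do, using a hypothetical standalone function `escapeBrackets` rather than WebURL's percent-encoding API.

```swift
// Hypothetical helper: escapes only "[", "]", and "%" in a key,
// leaving all other characters for the key-value pairs view to handle.
func escapeBrackets(_ key: String) -> String {
  var result = ""
  for scalar in key.unicodeScalars {
    switch scalar {
    case "[": result += "%5B"
    case "]": result += "%5D"
    case "%": result += "%25"  // Keeps existing "%" signs from being read as escapes.
    default:  result.unicodeScalars.append(scalar)
    }
  }
  return result
}

// Brackets belonging to the name are escaped; the "[]" array marker is appended afterwards.
let arrayKey = escapeBrackets("foo[bar]") + "[]"
print(arrayKey)  // "foo%5Bbar%5D[]"
```

The percent sign has to be part of the set, otherwise a name such as "100%5B" would be indistinguishable from an escaped "100[" when read back.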

It's complicated because within that scope, it breaks the symmetry between the keys you write and keys you read. Maybe that's okay, but more investigation is needed to say for sure.

There's also the question of what to do if your subcomponent delimiters are not valid for a given schema (e.g. all of those JavaScript libraries which use [] despite it being banned by form-encoding). I don't have a good answer for that right now.
