@jmschonfeld
Created March 14, 2023 22:52

Predicate Archiving

Revision history

  • v1 Initial version

Introduction/Motivation

Now that we have introduced the base Predicate type along with its related APIs in the Swift Predicates pitch, we'd like to expand upon the serialization capabilities of Predicate. Providing the ability for safe and secure serialization is a critical feature of Predicate since predicates are commonly passed between processes for evaluation in an out-of-process database or even between hosts for communicating with a remote server. In our previous proposal we mentioned that Predicate will be Codable. We'd like to add conformance to Codable and CodableWithConfiguration (as well as expand upon how predicates will be encoded into an archive) in order to support the full range of situations where developers may want to write predicates into an archive.

Proposed solution and example

We will introduce a PredicateCodableConfiguration type that allows clients to specify which types and keypaths they expect to find in an archive. Clients can construct an allowlist and provide it at encode/decode time:

var configuration = PredicateCodableConfiguration.standardConfiguration
configuration.allowType(Message.self, identifier: "MyApp.Message")
configuration.allowType(Person.self, identifier: "MyApp.Person")
configuration.allowKeyPath(\Message.sender, identifier: "MyApp.Message.sender")
configuration.allowKeyPath(\Person.firstName, identifier: "MyApp.Person.firstName")
configuration.allowKeyPath(\Person.lastName, identifier: "MyApp.Person.lastName")
	
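struct MyRequest : Codable {
    let predicate: Predicate<Message>

    // Declared explicitly: CodingKeys is not synthesized when both encode(to:) and init(from:) are written by hand
    enum CodingKeys : CodingKey {
        case predicate
    }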
    
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(predicate, forKey: .predicate, configuration: configuration)
    }
    
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        predicate = try container.decode(Predicate<Message>.self, forKey: .predicate, configuration: configuration)
    }
}

Additionally, we will provide some conveniences to allow for simpler creation of the configuration. Developers can choose to omit the identifier for allowed types, in which case we will automatically use the fully qualified name of the type:

var configuration = PredicateCodableConfiguration.standardConfiguration
configuration.allowType(Person.self) // The identifier will be the fully qualified name, ex. "MyApp.Person"

Developers can also choose to allow all listed properties on a particular type rather than listing them all explicitly:

var configuration = PredicateCodableConfiguration.standardConfiguration
configuration.allowType(Message.self)
configuration.allowType(Person.self)
configuration.allowKeyPathsForPropertiesProvided(by: Message.self, recursive: true) // Includes all keypaths provided by Message as well as those provided by any of the Value types for each property

And finally, a predicate's input types are always implicitly included in the allowlist using their fully qualified names, so the caller does not need to specify Message.self as an allowed type in this instance.

Detailed design

Predicate Conformances

We propose adding the following conformances to Predicate:

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Predicate : Codable {
	public func encode(to encoder: Encoder) throws
	public init(from decoder: Decoder) throws
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Predicate : CodableWithConfiguration {
	public typealias EncodingConfiguration = PredicateCodableConfiguration
	public typealias DecodingConfiguration = PredicateCodableConfiguration
	
	public func encode(to encoder: Encoder, configuration: EncodingConfiguration) throws
	public init(from decoder: Decoder, configuration: DecodingConfiguration) throws
}

The CodableWithConfiguration conformance allows callers to specify a configuration to use when encoding/decoding, while the standard Codable conformance behaves as if PredicateCodableConfiguration.standardConfiguration had been provided.

PredicateCodableConfiguration

We also propose adding the following PredicateCodableConfiguration type which will be used to construct the information provided to Predicates for encoding/decoding:

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public struct PredicateCodableConfiguration : Sendable, CustomDebugStringConvertible {
	/// The list of keypaths and types allowed by default for Predicate
	public static let standardConfiguration: Self
	
	/// Creates an empty configuration that does not allow any types or keypaths
	public init()
	
	public mutating func allowType(_ type: Any.Type, identifier: String? = nil)
	public mutating func disallowType(_ type: Any.Type)
	
	public mutating func allowPartialType(_ type: Any.Type, identifier: String)
	public mutating func disallowPartialType(_ type: Any.Type)
	
	public mutating func allowKeyPath(_ keyPath: AnyKeyPath & Sendable, identifier: String)
	public mutating func disallowKeyPath(_ keyPath: AnyKeyPath & Sendable)
	
	public mutating func allowKeyPathsForPropertiesProvided<T: PredicateCodableKeyPathProviding>(by type: T.Type, recursive: Bool = false)
	public mutating func disallowKeyPathsForPropertiesProvided<T: PredicateCodableKeyPathProviding>(by type: T.Type, recursive: Bool = false)
	
	public mutating func allow(_ other: Self)
}

PredicateCodableKeyPathProviding

We propose adding a new protocol that will allow a type to provide a list of keypaths (and identifiers) for properties that can be encoded into a predicate. Conformance to this protocol does not indicate that the listed keypaths are always allowed, but rather that this type can be provided to the convenience PredicateCodableConfiguration.allowKeyPathsForPropertiesProvided(by:) function.

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public protocol PredicateCodableKeyPathProviding {
    /// A dictionary mapping String identifiers to PartialKeyPaths. The string identifiers match what would be provided to the allowKeyPath(_:identifier:) API.
    static var predicateCodableKeyPaths: [String : PartialKeyPath<Self>] { get }
}
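
As an illustration, a conformance might look like the sketch below. To keep it compilable without the proposed Foundation API, it declares a local stand-in protocol (KeyPathProviding, a hypothetical name) with the same requirement. The Person type, its properties, and the identifiers are likewise assumptions made for the example.

```swift
// Hypothetical local stand-in that mirrors the proposed requirement, so this
// sketch compiles without the new Foundation API.
protocol KeyPathProviding {
    static var predicateCodableKeyPaths: [String: PartialKeyPath<Self>] { get }
}

// Example model type (an assumption for illustration).
struct Person {
    var firstName: String
    var lastName: String
    var internalNotes: String = "" // deliberately not listed below
}

extension Person: KeyPathProviding {
    // Only single-component keypaths are listed, each under a stable identifier.
    static var predicateCodableKeyPaths: [String: PartialKeyPath<Person>] {
        [
            "MyApp.Person.firstName": \Person.firstName,
            "MyApp.Person.lastName": \Person.lastName,
        ]
    }
}
```

Listing a property here only makes it eligible for the convenience API. Under the proposal, nothing is allowed until a configuration opts in.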

We don't intend for this to be a widely used protocol like Codable or Equatable that would require an audit of all types in the SDK. Instead, we're providing this for API authors to add conveniences for interfacing types commonly used with Predicate and the PredicateCodableConfiguration type. For example, we won't be conforming every Foundation type to this protocol, but the developer of a framework like Spotlight might wish to conform their database item type to this protocol if that type will be commonly used with predicates.

Allowing KeyPaths

When you allow a keypath in your configuration, there are a few key pieces of behavior to keep in mind:

  • Predicate only supports single-component keypaths. Developers must only provide keypaths with one component to these functions (and the macro and predicate implementations also enforce this). Unfortunately, there is no way to express this restriction in the API declaration, so if the developer passes a multi-component keypath to the PredicateExpressions.KeyPath initializer or to any PredicateCodableConfiguration function, we will fatalError with a description of the issue.
  • Allowing a KeyPath<Root, Value> also implicitly allows its Root and Value in the archive, since a predicate that includes the keypath necessarily uses the Root and Value types as well.

The Standard Configuration

PredicateCodableConfiguration.standardConfiguration contains a variety of standard types and keypaths that can be used in predicates without needing to specify them explicitly. These types are:

  • Any types explicitly specified as one of the input types of the predicate being decoded
  • All PredicateExpression types defined within Foundation (declared as a concrete list within the implementation)
  • String
  • Substring
  • Character
  • Int
  • Bool
  • Double
  • Array of any allowed element type
  • Set of any allowed element type
  • Dictionary of any allowed key and value type
  • OrderedSet of any allowed element type
  • Optional of any allowed wrapped type

The default list of allowed keypaths includes the \.count, \.isEmpty, \.first, and \.last properties on all applicable concrete types in the list above. Clients can add their own types/keypaths to this list by retrieving the PredicateCodableConfiguration.standardConfiguration and mutating the configuration via the allow/disallow APIs.

Note: The contents of this list are implicitly part of the API contract. Items can be added later, but removing an item from the list could cause binary compatibility issues for clients relying on the implicit inclusion of that item.

API Support for Third-Party Predicates

Predicate is designed to allow third party developers to create their own predicate types that can be serialized. In order to facilitate this and ensure developers can easily create safe and secure methods of encoding/decoding their custom predicates, we will provide custom utility functions that other predicate types can use to encode/decode their expression trees. These custom functions behave similarly to their Codable/CodableWithConfiguration counterparts, but are specifically for predicate expressions and produce/read both the expression tree and the structure tree in the archive so that developers do not need to create and reconstruct the structure themselves.

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension KeyedEncodingContainer {
    mutating func encodePredicateExpression<T: PredicateExpression<Bool> & Encodable, each Input>(_ expression: T, forKey key: Self.Key, variables: (repeat PredicateExpressions.Variable<each Input>), predicateConfiguration: PredicateCodableConfiguration) throws
    mutating func encodePredicateExpressionIfPresent<T: PredicateExpression<Bool> & Encodable, each Input>(_ expression: T?, forKey key: Self.Key, variables: (repeat PredicateExpressions.Variable<each Input>), predicateConfiguration: PredicateCodableConfiguration) throws
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension KeyedDecodingContainer {
    func decodePredicateExpression<each Input>(forKey key: Self.Key, inputs: (repeat each Input.Type), predicateConfiguration: PredicateCodableConfiguration) throws -> (expression: any PredicateExpression<Bool>, variables: (repeat PredicateExpressions.Variable<each Input>))
    func decodePredicateExpressionIfPresent<each Input>(forKey key: Self.Key, inputs: (repeat each Input.Type), predicateConfiguration: PredicateCodableConfiguration) throws -> (expression: any PredicateExpression<Bool>, variables: (repeat PredicateExpressions.Variable<each Input>))?
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension UnkeyedEncodingContainer {
    mutating func encodePredicateExpression<T: PredicateExpression<Bool> & Encodable, each Input>(_ expression: T, variables: (repeat PredicateExpressions.Variable<each Input>), predicateConfiguration: PredicateCodableConfiguration) throws
    mutating func encodePredicateExpressionIfPresent<T: PredicateExpression<Bool> & Encodable, each Input>(_ expression: T?, variables: (repeat PredicateExpressions.Variable<each Input>), predicateConfiguration: PredicateCodableConfiguration) throws
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension UnkeyedDecodingContainer {
    func decodePredicateExpression<each Input>(inputs: (repeat each Input.Type), predicateConfiguration: PredicateCodableConfiguration) throws -> (expression: any PredicateExpression<Bool>, variables: (repeat PredicateExpressions.Variable<each Input>))
    func decodePredicateExpressionIfPresent<each Input>(inputs: (repeat each Input.Type), predicateConfiguration: PredicateCodableConfiguration) throws -> (expression: any PredicateExpression<Bool>, variables: (repeat PredicateExpressions.Variable<each Input>))?
}

Developers can use these functions to implement the CodableWithConfiguration functions for their custom predicate type. For example:

// Example code (not proposed additional API)

extension PredicateCodableConfiguration {
    static let spotlightConfiguration: Self = {
        // Add any implicitly allowed types/keypaths for SpotlightPredicates, for example:
        var config = Self.standardConfiguration // Allow everything in the standard Predicate allowlist
        config.allowType(CSPerson.self) // Allow standard Spotlight types
        config.allowPartialType(PredicateExpressions.ModifiedWithin<PredicateExpressions.Value<Int>, PredicateExpressions.Value<Int>>.self, identifier: "Spotlight.ModifiedWithin") // Allow any uses of the PredicateExpressions.ModifiedWithin partial type (to allow this operator to be used in the predicates); the identifier string is illustrative
        return config
    }()
}

struct SpotlightPredicate<each Input> : CodableWithConfiguration {
    typealias EncodingConfiguration = PredicateCodableConfiguration
    typealias DecodingConfiguration = PredicateCodableConfiguration
    
    let variables: (repeat PredicateExpressions.Variable<each Input>)
    let expression: any SpotlightPredicateExpression<Bool>
    
    func encode(to encoder: Encoder, configuration: EncodingConfiguration) throws {
        var configuration = configuration
        configuration.allow(.spotlightConfiguration)
        var container = encoder.unkeyedContainer()
        try container.encodePredicateExpression(expression, variables: variables, predicateConfiguration: configuration)
    }
    
    init(from decoder: Decoder, configuration: DecodingConfiguration) throws {
        var configuration = configuration
        configuration.allow(.spotlightConfiguration)
        let container = try decoder.unkeyedContainer()
        let decoded = try container.decodePredicateExpression(inputs: (repeat (each Input).self), predicateConfiguration: configuration)
        self.variables = decoded.variables
        guard let decodedExpression = decoded.expression as? any SpotlightPredicateExpression<Bool> else {
            // Throw a decoding error if the decoded expression is not supported by SpotlightPredicate
            throw DecodingError.dataCorruptedError(in: container, debugDescription: "Decoded expression is not supported by SpotlightPredicate")
        }
        self.expression = decodedExpression
    }
}

Choosing Codable vs. CodableWithConfiguration

As mentioned previously, Predicate (and third party predicates) offer a few options when archiving with regard to what is allowed to be serialized to/from the archive. Developers can:

  1. Use Predicate's default Codable implementation (providing no configuration) or provide PredicateCodableConfiguration.standardConfiguration - these have equivalent behavior and will only allow the predicate to contain the specified input types, the standard predicate expressions defined by Foundation, and a select list of common standard library/Foundation types and keypaths. Developers with very simple predicates can use this conformance as a convenience.
  2. Provide a custom configuration via Predicate's CodableWithConfiguration - this allows the predicate to contain anything listed in the provided configuration. Developers with more complex predicates and/or expanding/restricting the list of allowed types from the standard configuration can use this conformance.

Third party predicates can choose to provide either (or both) of these conformances. For example, a SpotlightPredicate would likely provide a Codable conformance that allows, by default, all types/keypaths in its declared spotlightConfiguration alongside a CodableWithConfiguration conformance to allow adding custom keypaths/types to that list.

Encoded Format

While not strictly API, it's also important to discuss the format that a predicate will take when encoded to an archive. A predicate must encode three pieces of information:

  1. The variables that represent its input parameters (these are trivially Codable since the variables are just represented by unique UInts)
  2. The expression tree contained within the predicate (these are also required to be Codable)
  3. A structure to describe the layout of the expression

The key to Predicate's encoded format is the third item, the structure of the expression, which is used at decode time to construct a Swift type that we can provide to the decoder in order to decode the expression tree itself. Predicate uses the provided PredicateCodableConfiguration to create this structure, using the configured identifiers to represent the components of the expression type. For example, a developer may wish to encode the following predicate:

#Predicate<Message> { message in
	message.content.count == 10
}

If the developer encoded this predicate using the coding configuration listed in the proposed solution section (with \Message.content additionally allowed under the identifier "MyApp.Message.content"), the archived predicate would look like the following:

{
  "expression" : [
    {
      "root" : {
        "root" : {
          "key" : 5
        },
        "identifier" : "MyApp.Message.content"
      },
      "identifier" : "Swift.String.count"
    },
    10
  ],
  "variables" : [
    {
      "key" : 5
    }
  ],
  "structure" : {
    "args" : [
      {
        "args" : [
          {
            "args" : [
              {
                "args" : [
                  "MyApp.Message"
                ],
                "identifier" : "Foundation.PredicateExpressions.Variable"
              },
              "Swift.String"
            ],
            "identifier" : "Foundation.PredicateExpressions.KeyPath"
          },
          "Swift.Int"
        ],
        "identifier" : "Foundation.PredicateExpressions.KeyPath"
      },
      {
        "args" : [
          "Swift.Int"
        ],
        "identifier" : "Foundation.PredicateExpressions.Value"
      }
    ],
    "identifier" : "Foundation.PredicateExpressions.Equal"
  }
}

In this archived format, the "variables" key contains the identifier for the input variable, the "expression" key contains the encoded format of the wrapped expression itself (the result of directly encoding the Equal<KeyPath<KeyPath<Variable<Message>, String>, Int>, Value<Int>> value), and the "structure" key contains a tree of identifiers representing the type of the expression. Note that some basic types and keypaths such as Int or \String.count are provided by default for Predicate and do not need to be manually specified in the provided configuration.
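
To make the role of the "structure" tree more concrete, here is a minimal sketch of how a reader could walk such a tree and check every identifier against an allowlist before touching the expression payload. It is a toy model, not Foundation's decoder: StructureNode and unknownIdentifiers(inStructureJSON:allowed:) are hypothetical names, and the real implementation resolves identifiers to Swift types rather than comparing strings.

```swift
import Foundation

// Toy model of the "structure" tree: a node is either a bare identifier
// (e.g. "Swift.Int") or a generic identifier with nested "args".
enum StructureNode: Decodable {
    case leaf(String)
    case generic(identifier: String, args: [StructureNode])

    private enum CodingKeys: String, CodingKey {
        case identifier, args
    }

    init(from decoder: Decoder) throws {
        // A bare JSON string is a leaf; anything else must be an object.
        if let name = try? decoder.singleValueContainer().decode(String.self) {
            self = .leaf(name)
            return
        }
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let identifier = try container.decode(String.self, forKey: .identifier)
        let args = try container.decode([StructureNode].self, forKey: .args)
        self = .generic(identifier: identifier, args: args)
    }

    // Every identifier mentioned anywhere in the tree.
    var identifiers: Set<String> {
        switch self {
        case .leaf(let name):
            return [name]
        case .generic(let identifier, let args):
            var result: Set<String> = [identifier]
            for arg in args { result.formUnion(arg.identifiers) }
            return result
        }
    }
}

// Returns the identifiers in the structure JSON that the allowlist does not cover.
func unknownIdentifiers(inStructureJSON json: String, allowed: Set<String>) throws -> Set<String> {
    let root = try JSONDecoder().decode(StructureNode.self, from: Data(json.utf8))
    return root.identifiers.subtracting(allowed)
}
```

A decoder built along these lines would refuse the archive whenever the returned set is non-empty, which matches the allowlist-first ordering the proposal describes.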

Impact on existing code

This API is additive only and does not impact existing code.

Alternatives considered / Future directions

Make KeyPath itself Codable

We considered adding a Codable conformance directly to KeyPath itself (rather than implementing the Codable behavior in PredicateExpressions.KeyPath). However, we decided to limit the scope of the KeyPaths that we can encode to just those supported by Predicate (those with only one component and no subscript arguments). Due to this limitation, we decided to encode KeyPaths as part of the PredicateExpressions.KeyPath type specifically for Predicate's use cases leaving the door open for a future, general solution to encoding arbitrary KeyPaths.

@CodableKeyPath attribute

We also considered using the runtime metadata attributes feature in order to provide an attribute that can be used to annotate which properties can or cannot have their keypaths encoded. However, this approach had a few limitations including:

  1. A large API audit overhead - we did not feel it was worthwhile to annotate almost every property across the standard library with this attribute
  2. Some properties' keypaths are not statically safely Codable - the attribute only allows for an all-or-nothing approach where some properties may only be safe to use in certain contexts
  3. Limitations of the runtime metadata attributes API - namely, attributes are not currently supported on properties of generic types (ex. Array<T>.count)

For those reasons, we felt that a @CodableKeyPath attribute did not provide the full solution that we are seeking. Rather than providing multiple incomplete avenues for specifying the codability of a property's keypath, we decided to focus on providing one fully supported mechanism via the PredicateCodableConfiguration. If in the future we're able to remove some of these limitations (for example by supporting attributes on generic types and creating a system in which standard library properties are implicitly annotated) then this approach leaves room for adding an attribute like this in the future.

Implicitly allow stored property based key paths

This proposal requires that keypaths are specified explicitly (either by the caller with an allowKeyPath call or via the type with a PredicateCodableKeyPathProviding conformance complemented by an allowKeyPathsForPropertiesProvided(by:) call). We previously investigated implicitly including all stored properties on a type (while still requiring that the type itself be specified explicitly). However, this relied on the type's reflection data to extract the list of properties. Unfortunately, the reflection metadata does not include access level information (leading to the inclusion of private properties) and only includes stored properties (not computed properties). These limitations left this solution incomplete and potentially dangerously attractive to use, so we decided to pivot to the current approach using the PredicateCodableKeyPathProviding protocol.

Denylist style APIs instead of allowlist

Rather than providing an allowlist-like configuration, we could instead provide a denylist-like configuration that only requires listing which types/properties should not be encoded/decoded. However, we chose to strongly avoid any strictly denylist-based configuration due to the large number of security issues that we've seen with NSCoding and NSPredicate. Prior Objective-C APIs that allow the decoding of arbitrary data subject only to a denylist have historically led to many security vulnerabilities, with ever-changing denylists to catch bad actors. Instead, we think it's important that we lead this API with a strict requirement that types/properties must be specified by the encoder/decoder (aside from a few slight conveniences) rather than relying on the decoder to validate the contents after decoding potentially malicious data.
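
The difference between the two policies shows up with an identifier that nobody anticipated when the lists were written. The following toy comparison works over plain string identifiers. AllowlistPolicy, DenylistPolicy, and all identifiers are hypothetical and do not reflect how Foundation models its configuration.

```swift
// Toy comparison of the two policies over plain string identifiers
// (hypothetical names; not how Foundation models its configuration).
struct AllowlistPolicy {
    var allowed: Set<String>
    func permits(_ identifier: String) -> Bool { allowed.contains(identifier) }
}

struct DenylistPolicy {
    var denied: Set<String>
    func permits(_ identifier: String) -> Bool { !denied.contains(identifier) }
}

let allowlist = AllowlistPolicy(allowed: ["Swift.Int", "MyApp.Person"])
let denylist = DenylistPolicy(denied: ["MyApp.KnownDangerousType"])

// A type that neither list mentions:
let surprise = "ThirdParty.NewlyDiscoveredGadget"

let allowlistResult = allowlist.permits(surprise) // unlisted, so rejected
let denylistResult = denylist.permits(surprise)   // unlisted, so accepted
```

The allowlist fails closed on the unknown identifier, while the denylist accepts it until someone updates the list.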

Macro support for PredicateCodableKeyPathProviding

In the future, we may decide to add a macro to support conformance to PredicateCodableKeyPathProviding. A macro, having visibility into the access level of each property, could use an attached annotation to synthesize conformance to the protocol by listing all public properties (or perhaps providing an option to specify a subset of properties to list). This would allow developers to more easily adopt this protocol. We're not proposing this new macro as part of this proposal since only a small subset of developers will actually need to conform to this protocol and we think there are a fair number of design questions about what that macro should look like. In the short term, developers can still conform to this protocol manually for types expected to be commonly used with predicates.

Always allow .standardConfiguration types within Predicate

A previous design did not require a caller to manually begin with or allow the .standardConfiguration in order to include the default list of allowed types/keypaths when decoding a predicate. However, it is important that some processes that are more secure have the ability to specify a configuration that is a restricted subset of the types included in the .standardConfiguration. For that reason, we made the decision that clients must explicitly allow (or begin from) the .standardConfiguration when providing a coding configuration to predicate so that clients who don't can create a more restrictive list.
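
The consequence of making the standard list opt-in can be sketched with a toy stand-in for the configuration. ToyConfiguration is a hypothetical type that tracks identifiers only, and its standard list is a small subset chosen for illustration. It mirrors the shape of init(), standardConfiguration, and allow(_:) from the proposal without depending on the new API.

```swift
// Toy stand-in for the configuration (hypothetical; identifiers only).
struct ToyConfiguration {
    private(set) var identifiers: Set<String>

    // Mirrors `init()`: nothing is allowed until the caller opts in.
    init() { identifiers = [] }

    // Mirrors `standardConfiguration` with a tiny subset of its contents.
    static let standard: ToyConfiguration = {
        var config = ToyConfiguration()
        for id in ["Swift.String", "Swift.Int", "Swift.Bool", "Swift.Double"] {
            config.allow(id)
        }
        return config
    }()

    mutating func allow(_ identifier: String) { identifiers.insert(identifier) }
    mutating func allow(_ other: ToyConfiguration) { identifiers.formUnion(other.identifiers) }
    func permits(_ identifier: String) -> Bool { identifiers.contains(identifier) }
}

// A security-sensitive process starts empty and allows only what it needs.
var restricted = ToyConfiguration()
restricted.allow("Swift.Int")

// A typical client opts in to the standard list and extends it.
var extended = ToyConfiguration()
extended.allow(.standard)
extended.allow("MyApp.Person")
```

Had the standard list always been included, the restricted configuration above could not be expressed, because there would be no way to end up with fewer entries than the default.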
