Skip to content

Instantly share code, notes, and snippets.

@davedelong
Last active April 3, 2018 15:33
Show Gist options
  • Save davedelong/faf5a37cf429a9534e873725cd67b056 to your computer and use it in GitHub Desktop.
Save davedelong/faf5a37cf429a9534e873725cd67b056 to your computer and use it in GitHub Desktop.
Proposal: ContainmentSet

Adding the ContainmentSet Protocol to Swift

  • Proposal: SE-TBD
  • Author(s): Dave DeLong
  • Review manager: TBD
  • Status: TBD

Introduction

A new public protocol named ContainmentSet unifies set semantics.

This proposal was discussed on-forum in the Pitch: ContainmentSet thread.

Motivation

A major source of "why doesn't this just work?" frustration rises from the turbulent edge of Cocoa and Swift. Consider the disparity between CharacterSet and Set<Character>. The former, bridged from Foundation, is an algorithmic set describing Unicode.Scalar values. The latter is an actual Set of extended grapheme clusters.

Being a struct, Set<Element> cannot perform simple set-like operations. It cannot be inverted like a CharacterSet. This is a common operation to perform when parsing a string, for example. Given a CharacterSet that describes allowed characters, the inversion describes disallowed characters.

let newlines = CharacterSet.newlines
let charactersThatAreNotNewlines = newlines.inverted

Set<Element> is a Collection. Inverting it is impossible. Set<Character> cannot know the entire domain of Character values over which it should iterate.

Inverting CharacterSet over a Swift type is also impractical. It is limited to Unicode.Scalar values and cannot, for example, contain emoji. Most emoji are composed of multiple Unicode.Scalar values.

Swift needs a way to describe an abstract set, which can test containment but does not promise iteration support.

Detailed Design

public protocol ContainmentSet {
    associatedtype ContainedElement
    
    /// Returns a truth value indicating whether an element is a
    /// member of a conforming instance.
    /// Where possible, this should be O(1).
    func contains(_ element: ContainedElement) -> Bool
    
    /// Returns a union of two instances of the conforming type
    /// This is a logical-or operation.
    func union(_ other: Self) -> Self
    
    /// Returns the intersection (possibly empty) of two instances
    /// of the conforming type.
    /// This is a logical-and operation.
    func intersection(_ other: Self) -> Self
    
    /// Returns `self`, removing any items that appear in the union of
    /// `self` and `other`.
    func subtracting(_ other: Self) -> Self
    
    /// Returns the union of items unique to `self` and `other`.
    /// This is a logical exclusive-or operation.
    func symmetricDifference(_ other: Self) -> Self

}

ContainmentSet will also have a protocol extension that defines:

public extension ContainmentSet {
    
    /// Returns a `ContainmentSet` over the associated type domain, 
    /// with the current elements removed.
    /// This is a logical-not operation.
    var inverted: PredicateSet<ContainedElement> { get }
    
    func union<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
    
    func intersection<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
    
    func symmetricDifference<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
    
    func subtracting<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
    
}

The Standard Library will also add a new type, called PredicateSet, which adopts ContainmentSet:

public struct PredicateSet<T>: ContainmentSet {
    
    public typealias ContainedElement = T
    
    /// Initializes a `PredicateSet` with another `ContainmentSet`,
    /// effectively making a type-erased version of the other `ContainmentSet`
    public init<C>(_ set: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
   
    /// Create a `PredicateSet` containing all elements that pass the predicate
    public init(predicate: @escaping (T) -> Bool) 
    
    /// Create a `PredicateSet` that contains everything in the provided `Collection`
    public init<C>(collection: C) where C : Collection, C.Element == ContainedElement
    
    /// Updates `self` by inverting its predicate
    public mutating func invert()
   
    /// Updates `self` by forming a union with `other`
    public mutating func formUnion<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
   
    /// Updates `self` by removing any members that do not appear
    /// in `other`.
    public mutating func formIntersection<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
   
    /// Updates `self` by removing members that appear in `other` and
    /// adding items that appear in `other` that are not in `self`.
    public mutating func formSymmetricDifference<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
   
    /// Updates `self` by removing any members that appear in `other`.
    public mutating func subtract<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
    
}

SetAlgebra will be made to conform to ContainmentSet, and it will lose its direct declarations of union, subtracting, etc, since those will be provided by ContainmentSet.

Since SetAlgebra will be adopting ContainmentSet, this means that Set<Element>, IndexSet, OptionSet, and CharacterSet will all adopt ContainmentSet. No change will be needed to their respective implementations.

Usage in the Standard Library

While ContainmentSet has utility on its own, it is clear that deeper integration with the standard library is warranted.

Since ContainmentSet's core functionality is fundamentally a (Element) -> Bool check, it stands to reason that other instances of (Element) -> Bool checks throughout the library could reasonably be supplemented with analogous ContainmentSet versions.

A survey of the various methods on Sequence, Collection, and MutableCollection indicates the following additions might be appropriate:

extension Sequence {
    func filter<C>(_ isIncluded: C) where C : ContainmentSet, C.ContainedElement == Element
    func prefix<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element
    func first<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element
    func last<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element // assuming SE-0204 passes
    func drop<C>(while: C) where C : ContainmentSet, C.ContainedElement == Element
}

extension Collection {
    func index<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element
    func lastIndex<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element // assuming SE-0204 passes
    func split<C>(maxSplits: Int = default, omittingEmptySubsequences: Bool = default, whereSeparator isSeparator: C) -> [Self.SubSequence] where C : ContainmentSet, C.ContainedElement == Element
}

extension MutableCollection {
    func partition<C>(by belongsInSecondPartition: C) -> Self.Index where C : ContainmentSet, C.ContainedElement == Element
}

Additionally, existing uses of CharacterSet in the standard library can now be expanded to take ContainmentSets instead.

Thus, these methods on StringProtocol:

extension StringProtocol {
    func components(separatedBy separator: CharacterSet) -> [String]
    func addingPercentEncoding(withAllowedCharacters allowedCharacters: CharacterSet) -> String?
    func trimmingCharacters(in set: CharacterSet) -> String
    func rangeOfCharacter(from aSet: CharacterSet, options mask: String.CompareOptions = default, range aRange: Range<Self.Index>? = default) -> Range<Self.Index>?
}

will become:

extension StringProtocol {
    func components<C>(separatedBy separator: C) -> [String] where C : ContainmentSet, C.ContainedElement == Element
    func addingPercentEncoding<C>(withAllowedCharacters allowedCharacters: C) -> String? where C : ContainmentSet, C.ContainedElement == Element
    func trimmingCharacters<C>(in set: C) -> String where C : ContainmentSet, C.ContainedElement == Element
    func rangeOfCharacter<C>(from aSet: C, options mask: String.CompareOptions = default, range aRange: Range<Self.Index>? = default) -> Range<Self.Index>? where C : ContainmentSet, C.ContainedElement == Element
}

Open Questions

Character Classes

CharacterSet has a bunch of nice static properties to define things like alphanumerics, illegal characters, symbols, uppercase letters, etc.

Should PredicateSet provide these same static properties when its contained element is Character?

Operators

ContainmentSet types are naturally usable with operators.

  • Union is expressible as concatenation (+) or logical-or (||)
  • Intersection is expressible using the logical-and operator (&&)
  • Subtraction is expressible using the subtraction operator (-)
  • Symmetric Difference is expressible using the exclusive-or operator (^)
  • Inversion is expressible using the prefix operators for negation (!) and bitwise-not (~)

Should the standard library provide implementations of these operators that work on ContainmentSets?

Expressibility

Under certain circumstances, you may want to create a ContainmentSet using literals:

let everything: PredicateSet<Foo> = true
let someBars: PredicateSet<Bar> = [bar1, bar2, bar3]

Should PredicateSet conform to ExpressibleByBooleanLiteral and ExpressibleByArrayLiteral?

Additionally, you might want shorthand syntax for defining a set based on the characters in a string:

let allowed: PredicateSet<Character> = "0123456789"
let nowAllowed = allowed.inverted

Should PredicateSet conform to ExpressibleByStringLiteral when the contained element is Character?

Source compatibility

While this proposal is strictly additive, it will involve changes to existing standard library methods.

However, since the changes will be to relax the type constraints on the methods (ex: Allowing a ContainmentSet of Unicode.Scalar where previously only a CharacterSet was allowed), there is no source compatibility issue.

Functionality Compatibility

There is a potential for minor functionality changes. To illustrate these, consider the current StringProtocol method:

func trimmingCharacters(in set: CharacterSet) -> String

Since this takes a CharacterSet, it is limited to trimming Unicode.Scalar values off the ends of the receiver. However, if this method is changed to:

func trimmingCharacters<C>(in set: C) -> String where C : ContainmentSet, C.ContainedElement == Element

then it changes to trimming Characters off the ends of the receiver.

Effect on ABI stability

This proposal does not affect ABI stability.

Effect on API resilience

This proposal does not affect ABI resilience.

Alternatives Considered and Rejected

Add this functionality to SetAlgebra directly.

This sounds appealing at first, but SetAlgebra defines methods that are at odds with the non-iterability of ContainmentSet, such as isEmpty or isSuperset(of:).

inverted should be part of the base protocol, and not solely implemented in an extension.

The logical way to do this would have var inverted: Self { get }. However, it's impossible to implement this on Set<Element>, because a Set does not know its entire domain over which to iterate.

Thus, the only rational course is to implement it in a protocol extension, where we can specify an explicit return type of PredicateSet<ContainedElement>.

I did consider adding more associated types to the base protocol, so an adopter of the protocol could define a custom type to return for inversion, unioning, subtraction, etc. However, the utility of this seemed quite minimal, as there are only 4 SetAlgebra types.

Sequence (or Collection) should adopt ContainmentSet.

The fundamental idea of ContainmentSet is to express the notion that you can query a value to see if it contains other values.

From this point of view, it seems natural to make Sequence adopt ContainmentSet. However, it seems unnatural to want to union two arrays. Additionally, if we adopt operators for ContainmentSet operations, then + would become ambiguous: it could mean either concatenation or unioning for two arrays.

Other Names Considered In Lieu of Containment

  • Space
  • Manifold
  • Surface
  • Group
  • Class
  • Domain
  • Container
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment