- Proposal: SE-TBD
- Author(s): Dave DeLong
- Review manager: TBD
- Status: TBD
A new public protocol named ContainmentSet
unifies set semantics.
This proposal was discussed on-forum in the Pitch: ContainmentSet thread.
A major source of "why doesn't this just work?" frustration rises from the turbulent edge of Cocoa and Swift. Consider the disparity between CharacterSet
and Set<Character>
. The former, bridged from Foundation, is an algorithmic set describing Unicode.Scalar
values. The latter is an actual Set
of extended grapheme clusters.
Being a struct, Set<Element>
cannot perform simple set-like operations. It cannot be invert
ed like a CharacterSet
. This is a common operation to perform when parsing a string, for example. Given a CharacterSet
that describes allowed characters, the inversion describes disallowed characters.
let newlines = CharacterSet.newlines
let charactersThatAreNotNewlines = newlines.inverted
Set<Element>
is a Collection
. Inverting it is impossible. Set<Character>
cannot know the entire domain of Character
values over which it should iterate.
Inverting CharacterSet
over a Swift type is also impractical. It is limited to Unicode.Scalar
values and cannot, for example, contain emoji. Most emoji are composed of multiple Unicode.Scalar
values.
Swift needs a way to describe an abstract set, which can test containment but does not promise iteration support.
public protocol ContainmentSet {
associatedtype ContainedElement
/// Returns a truth value indicating whether an element is a
/// member of a conforming instance.
/// Where possible, this should be O(1).
func contains(_ element: ContainedElement) -> Bool
/// Returns a union of two instances of the conforming type
/// This is a logical-or operation.
func union(_ other: Self) -> Self
/// Returns the intersection (possibly empty) of two instances
/// of the conforming type.
/// This is a logical-and operation.
func intersection(_ other: Self) -> Self
/// Returns `self`, removing any items that appear in the union of
/// `self` and `other`.
func subtracting(_ other: Self) -> Self
/// Returns the union of items unique to `self` and `other`.
/// This is a logical exclusive-or operation.
func symmetricDifference(_ other: Self) -> Self
}
ContainmentSet
will also have a protocol extension that defines:
public extension ContainmentSet {
/// Returns a `ContainmentSet` over the associated type domain,
/// with the current elements removed.
/// This is a logical-not operation.
var inverted: PredicateSet<ContainedElement> { get }
func union<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
func intersection<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
func symmetricDifference<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
func subtracting<C: ContainmentSet>(_ other: C) -> PredicateSet<ContainedElement> where C.ContainedElement == ContainedElement
}
The Standard Library will also add a new type, called PredicateSet
, which adopts ContainmentSet
:
public struct PredicateSet<T>: ContainmentSet {
public typealias ContainedElement = T
/// Initializes a `PredicateSet` with another `ContainmentSet`,
/// effectively making a type-erased version of the other `ContainmentSet`
public init<C>(_ set: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
/// Create a `PredicateSet` containing all elements that pass the predicate
public init(predicate: @escaping (T) -> Bool)
/// Create a `PredicateSet` that contains everything in the provided `Collection`
public init<C>(collection: C) where C : Collection, C.Element == ContainedElement
/// Updates `self` by inverting its predicate
public mutating func invert()
/// Updates `self` by forming a union with `other`
public mutating func formUnion<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
/// Updates `self` by removing any members that do not appear
/// in `other`.
public mutating func formIntersection<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
/// Updates `self` by removing members that appear in `other` and
/// adding items that appear in `other` that are not in `self`.
public mutating func formSymmetricDifference<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
/// Updates `self` by removing any members that appear in `other`.
public mutating func subtract<C>(_ other: C) where C : ContainmentSet, ContainedElement == C.ContainedElement
}
SetAlgebra
will be made to conform to ContainmentSet
, and it will lose its direct declarations of union
, subtracting
, etc, since those will be provided by ContainmentSet
.
Since SetAlgebra
will be adopting ContainmentSet
, this means that Set<Element>
, IndexSet
, OptionSet
, and CharacterSet
will all adopt ContainmentSet
. No change will be needed to their respective implementations.
While ContainmentSet
has utility on its own, it is clear that deeper integration with the standard library is warranted.
Since ContainmentSet
's core functionality is fundamentally a (Element) -> Bool
check, it stands to reason that other instances of (Element) -> Bool
checks throughout the library could reasonably be supplemented with analogous ContainmentSet
versions.
A survey of the various methods on Sequence
, Collection
, and MutableCollection
indicates the following additions might be appropriate:
extension Sequence {
func filter<C>(_ isIncluded: C) where C : ContainmentSet, C.ContainedElement == Element
func prefix<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element
func first<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element
func last<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element // assuming SE-0204 passes
func drop<C>(while: C) where C : ContainmentSet, C.ContainedElement == Element
}
extension Collection {
func index<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element
func lastIndex<C>(in: C) where C : ContainmentSet, C.ContainedElement == Element // assuming SE-0204 passes
func split<C>(maxSplits: Int = default, omittingEmptySubsequences: Bool = default, whereSeparator isSeparator: C) -> [Self.SubSequence] where C : ContainmentSet, C.ContainedElement == Element
}
extension MutableCollection {
func partition<C>(by belongsInSecondPartition: C) -> Self.Index where C : ContainmentSet, C.ContainedElement == Element
}
Additionally, existing uses of CharacterSet
in the standard library can now be expanded to take ContainmentSets
instead.
Thus, these methods on StringProtocol
:
extension StringProtocol {
func components(separatedBy separator: CharacterSet) -> [String]
func addingPercentEncoding(withAllowedCharacters allowedCharacters: CharacterSet) -> String?
func trimmingCharacters(in set: CharacterSet) -> String
func rangeOfCharacter(from aSet: CharacterSet, options mask: String.CompareOptions = default, range aRange: Range<Self.Index>? = default) -> Range<Self.Index>?
}
will become:
extension StringProtocol {
func components<C>(separatedBy separator: C) -> [String] where C : ContainmentSet, C.ContainedElement == Element
func addingPercentEncoding<C>(withAllowedCharacters allowedCharacters: C) -> String? where C : ContainmentSet, C.ContainedElement == Element
func trimmingCharacters<C>(in set: C) -> String where C : ContainmentSet, C.ContainedElement == Element
func rangeOfCharacter<C>(from aSet: C, options mask: String.CompareOptions = default, range aRange: Range<Self.Index>? = default) -> Range<Self.Index>? where C : ContainmentSet, C.ContainedElement == Element
}
CharacterSet
has a bunch of nice static properties to define things like alphanumerics, illegal characters, symbols, uppercase letters, etc.
Should PredicateSet
provide these same static properties when its contained element is Character
?
ContainmentSet
types are naturally usable with operators.
- Union is expressible as concatenation (
+
) or logical-or (||
) - Intersection is expressible using the logical-and operator (
&&
) - Subtraction is expressible using the subtraction operator (
-
) - Symmetric Difference is expressible using the exclusive-or operator (
^
) - Inversion is expressible using the prefix operators for negation (
!
) and bitwise-not (~
)
Should the standard library provide implementations of these operators that work on ContainmentSets
?
Under certain circumstances, you may want to create a ContainmentSet
using literals:
let everything: PredicateSet<Foo> = true
let someBars: PredicateSet<Bar> = [bar1, bar2, bar3]
Should PredicateSet
conform to ExpressibleByBooleanLiteral
and ExpressibleByArrayLiteral
?
Additionally, you might want shorthand syntax for defining a set based on the characters in a string:
let allowed: PredicateSet<Character> = "0123456789"
let nowAllowed = allowed.inverted
Should PredicateSet
conform to ExpressibleByStringLiteral
when the contained element is Character
?
While this proposal is strictly additive, it will involve changes to existing standard library methods.
However, since the changes will be to relax the type constraints on the methods (ex: Allowing a ContainmentSet
of Unicode.Scalar
where previously only a CharacterSet
was allowed), there is no source compatibility issue.
There is a potential for minor functionality changes. To illustrate these, consider the current StringProtocol
method:
func trimmingCharacters(in set: CharacterSet) -> String
Since this takes a CharacterSet
, it is limited to trimming Unicode.Scalar
values off the ends of the receiver. However, if this method is changed to:
func trimmingCharacters<C>(in set: C) -> String where C : ContainmentSet, C.ContainedElement == Element
then it changes to trimming Characters
off the ends of the receiver.
This proposal does not affect ABI stability.
This proposal does not affect ABI resilience.
This sounds appealing at first, but SetAlgebra
defines methods that are at odds with the non-iterability of ContainmentSet
, such as isEmpty
or isSuperset(of:)
.
The logical way to do this would have var inverted: Self { get }
. However, it's impossible to implement this on Set<Element>
, because a Set
does not know its entire domain over which to iterate.
Thus, the only rational course is to implement it in a protocol extension, where we can specify an explicit return type of PredicateSet<ContainedElement>
.
I did consider adding more associated types to the base protocol, so an adopter of the protocol could define a custom type to return for inversion, unioning, subtraction, etc. However, the utility of this seemed quite minimal, as there are only 4 SetAlgebra
types.
The fundamental idea of ContainmentSet
is to express the notion that you can query a value to see if it contains other values.
From this point of view, it seems natural to make Sequence
adopt ContainmentSet
. However, it seems unnatural to want to union
two arrays. Additionally, if we adopt operators for ContainmentSet
operations, then +
would become ambiguous: it could mean either concatenation or unioning for two arrays.
- Space
- Manifold
- Surface
- Group
- Class
- Domain
- Container