Skip to content

Instantly share code, notes, and snippets.

@CTMacUser
Created March 26, 2019 03:48
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save CTMacUser/014a9faa6116ccfe9c09aa07e4ef9329 to your computer and use it in GitHub Desktop.
Save CTMacUser/014a9faa6116ccfe9c09aa07e4ef9329 to your computer and use it in GitHub Desktop.
An improvement to the Swift library cancelled due to ABI stability.

Multiple-Pass Sequences and Cycle Detection

During the review process, add the following fields as needed:

Introduction

This proposal introduces two new protocols to the standard library, RepeatableSequence and PersistentSequence. They are placed between Sequence and Collection in the collection hierarchy, to mark that a sequence type supports multiple passes.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

The documentation for Sequence includes the following "Repeated Access" section:

The Sequence protocol makes no requirement on conforming types regarding whether they will be destructively consumed by iteration. As a consequence, don't assume that multiple for-in loops on a sequence will either resume iteration or restart from the beginning:

   for element in sequence {
       if ... some condition { break }
   }

   for element in sequence {
       // No defined behavior
   }

In this case, you cannot assume either that a sequence will be consumable and will resume iteration, or that a sequence is a collection and will restart iteration from the first element. A conforming sequence that is not a collection is allowed to produce an arbitrary sequence of elements in the second for-in loop.

To establish that a type you've created supports nondestructive iteration, add conformance to the Collection protocol.

With the documentation for IteratorProtocol having a similar "Using Multiple Iterators" section:

Whenever you use multiple iterators (or for-in loops) over a single sequence, be sure you know that the specific sequence supports repeated iteration, either because you know its concrete type or because the sequence is also constrained to the Collection protocol.

Obtain each separate iterator from separate calls to the sequence's makeIterator() method rather than by copying. Copying an iterator is safe, but advancing one copy of an iterator by calling its next() method may invalidate other copies of that iterator. for-in loops are safe in this regard.

These notes suggest room for a protocol between Sequence and Collection, one for sequences that shouldn't support indexing or other collection semantics but can still support multiple iteration passes. Using a protocol instead of "knowing the concrete type" allows the creation of algorithms that can safely assume multi-pass semantics on its sequence arguments.

Proposed solution

The size of the solution depends on specifying an unanswered policy on sequences.

Is a sequence allowed to model an unstable underlying sequence? These unstable sequences publish a different string of elements each iteration cycle, even though the sequence's source of truth isn't considered mutated in between calls. This can happen when iterators access an unstable source of truth, like a random number generator, or faked when a Sequence-conforming type is a class type and therefore suspends enforcement of immutability during makeIterator() calls.

If the answer is "yes," then two intermediate protocols are needed, one for allowing multiple iteration passes, and one for flagging that the underlying sequence doesn't change its string of returned elements between iterations. If the answer is "no," then only the later intermediate protocol is needed. Neither protocol adds required members, because they differ from Sequence in restrictions in state sharing between Iterator instances or additionally their underlying sequence's stability.

As a demonstration, methods to detect cycles in (infinite) sequences shall be added. They eagerly find cycles in the data, so if a sequence of values has mini-cycles in its non-repeating prefix and/or the actual loop, they may be fooled.

Detailed design

Add the following protocols:

/// A sequence that supports nondestructive iteration.
///
/// Repeatable sequences can be used in multiple `for`-`in` loops, where each
/// loop iterates from the beginning.  And they can receive multiple methods
/// that iterate through their elements.
///
/// Conforming to the RepeatableSequence Protocol
/// =============================================
///
/// The `RepeatableSequence` protocol adds a constraint on the conforming type
/// and its `Iterator` type, but otherwise imposes no additional requirements
/// over the `Sequence` protocol.  Any return from `makeIterator()`, or one of
/// its copies, must maintain independent iteration state from any separate
/// return (or their copies) from that method.
public protocol RepeatableSequence: Sequence {}

/// A sequence that supports repeated iteration over the same elements.
///
/// A persistent sequence has to produce the same string of returned elements,
/// element count and equivalence for corresponding elements, per call to
/// iteration, as long as the sequence doesn't receive a method that mutates
/// its underlying sequence.  In other words, the string of elements
/// represented by the underlying sequence must be stable.  This enables
/// algorithms that use iteration multiple times and exploit synchronization
/// between the runs.
///
/// Conforming to the PersistentSequence Protocol
/// =============================================
///
/// The `PersistentSequence` protocol adds a constraint on the conforming
/// type, but otherwise imposes no additional requirements over the
/// `RepeatableSequence` protocol.  Between calls to methods that mutate the
/// sequence's underlying string of elements, all returns of `makeIterator()`
/// must generate the same string of elements.
public protocol PersistentSequence: RepeatableSequence {}

For the following types, change their primary conformance from Sequence to PersistentSequence:

  • StrideTo
  • StrideThrough

For the following protocols, change their primary refinement from Sequence to PersistentSequence:

  • Collection

(The way LazySequenceProtocol uses its elements property isn't specified enough to enforce the property allowing multiple passes, so this protocol can neither fully nor conditionally refine RepeatableSequence or PersistentSequence. Conforming types can individually evaluate whether to add conformance for multiple passes, possibly conditionally.)

For the following types, set them to conditionally conform to RepeatableSequence when their wrapped sequences do, and to PersistentSequence too when their wrapped sequences do too.

  • DropFirstSequence
  • DropWhileSequence
  • EnumeratedSequence
  • FlattenSequence
  • JoinedSequence
  • LazyDropWhileSequence
  • LazyFilterSequence
  • LazyMapSequence
  • LazyPrefixWhileSequence
  • LazySequence
  • PrefixSequence
  • UnfoldSequence
  • Zip2Sequence

(Some of these types currently need non-trivial changes to support multiple passes. For FlattenSequence and JoinedSequence, both the wrapped sequence and its element type need to conform. For Zip2Sequence, both component sequences need to conform. IteratorSequence can never conform, since there's no way to make a disconnected copy of the internal iterator.)

Add the following existential sequence types:

/// A type-erased repeatable sequence.
///
/// An instance of `AnyRepeatableSequence` forwards its operations to an
/// underlying base sequence having the same `Element` type, hiding the
/// specifics of the underlying sequence.
public struct AnyRepeatableSequence<Element>: RepeatableSequence {
    public init<S: RepeatableSequence>(_ base: S) where S.Element == Element

    public func makeIterator() -> AnyIterator<Element>
    public var underestimatedCount: Int
}

/// A type-erased persistent sequence.
///
/// An instance of `AnyPersistentSequence` forwards its operations to an
/// underlying base sequence having the same `Element` type, hiding the
/// specifics of the underlying sequence.
public struct AnyPersistentSequence<Element>: PersistentSequence {
    public init<S: PersistentSequence>(_ base: S) where S.Element == Element

    public func makeIterator() -> AnyIterator<Element>
    public var underestimatedCount: Int
}

(Since the new protocols have no new required interface, the internals of the existential wrappers could reuse the implementation of AnySequence.)

Add the following methods:

extension PersistentSequence {

    /// Find the period and initial delay for this sequence, if possible,
    /// using the given predicate as the equivalence test within the given
	/// attempt limit.
    ///
    /// The algorithm may be fooled if either the preface and/or the cyclical
    /// part have mini-cycles.  The algorithm may stop after the minimal
    /// length to find an answer, even if either the actual delay and period
	/// are larger or the sequence isn't truly infinite or periodic.
    ///
    /// The predicate must be an *equivalence relation* over the elements.
    /// That is, for any elements `a`, `b`, and `c`, the following conditions
    /// must hold:
    ///
    /// - `areEquivalent(a, a)` is always `true`.  (Reflexivity)
    /// - `areEquivalent(a, b)` implies `areEquivalent(b, a)`.  (Symmetry)
    /// - If `areEquivalent(a, b)` and `areEquivalent(b, c)` are both `true`,
    ///   then `areEquivalent(a, c)` is also `true`.  (Transitivity)
    ///
    /// - Precondition: `maxAttempts > 0`.
    ///
    /// - Parameter maxAttempts: The maximum number of times the underlying
    ///   sequence will be read before giving up.  If not given, defaults to
    ///   `Int.max`.
    /// - Parameter areEquivalent: A predicate that returns `true` if its two
    ///   arguments are equivalent; otherwise, `false`.
    ///
    /// - Returns: A tuple of the cycle period and count of elements delaying
    ///   the start of the first cycle.
    ///
    /// - Complexity: O(*λ* + *μ*), where *λ* is the cycle length and *μ* is
    ///   the initial delay.
    func detectCycle(maxAttempts: Int = Int.max, by areEquivalent: (Element,
			 Element) throws -> Bool) rethrows -> (delay: Int, period: Int)?

}

extension PersistentSequence where Element: Equatable {

    /// Find the period and initial delay for this sequence, if possible,
    /// within the given attempt limit.
    ///
    /// The algorithm may be fooled if either the preface and/or the cyclical
    /// part have mini-cycles.  The algorithm may stop after the minimal
    /// length to find an answer, even if either the actual delay and period
	/// are larger or the sequence isn't truly infinite or periodic.
    ///
    /// - Precondition: `maxAttempts > 0`.
    ///
    /// - Parameter maxAttempts: The maximum number of times the underlying
    ///   sequence will be read before giving up.  If omitted, defaults to
    ///   `Int.max`.
    ///
    /// - Returns: A tuple of the cycle period and count of elements delaying
    ///   the start of the first cycle.
    ///
    /// - Complexity: O(*λ* + *μ*), where *λ* is the cycle length and *μ* is
    ///   the initial delay.
    func detectCycle(maxAttempts: Int = Int.max) -> (delay: Int, period: Int)?
		{ return detectCycle(maxAttempts: maxAttempts, by: ==) }

}

Looking at Wikipedia's Cycle detection page, detectCycle(maxAttempts: by:) could be implemented by algorithms such as Floyd's cycle-finding algorithm or Brent's algorithm.

Source compatibility

Most changes are additive. Some of the existing sequence types dual-conform to IteratorProtocol, which would have to be removed in order for a type to conform to PersistentSequence.

Effect on ABI stability

The hierarchy for Sequence and Collection is locked because of ABI stability, so proposals involving intermediate protocols are stalled until a new version of the ABI is authorized, including a transition plan.

Effect on API resilience

All changes are additive, modulo concerns in the ABI-stability section above.

Alternatives considered

The main alternative is to do nothing. The new protocols come from filling in the hierarchy than direct need. But their concepts behind them are already implied in the documentation, but this proposal gives them force to be recognized by the compiler instead of by convention, helping infinite sequences and other sequences otherwise not compatible with indexing (or other parts of Collection) support.

Due to the ABI freeze, an alternative is to introduce the concepts here in a parallel protocol hierarchy that users have to additionally conform their types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment