erica/fold.md

## fold.md

      
    Raw
  

              fold.md
            
          
    Reducing a sequence onto its own elements


Proposal: SE-nnnnnnnn
Authors: Erica Sadun, Nate Cook
Review Manager: TBD
Status: Awaiting implementation


Introduction

This proposal introduces sequence folding as an overload to Swift's reduce method. This design simplifies call sites and can eliminate common errors associated with reduce.
Swift-evolution thread: Pitch: Reducing a sequence onto its own elements to simplify code and reduce errors
Motivation

While Swift's reduce method plays an important role in functional coding, many calls to the method can be simplified by tweaking the API. This proposal introduces a folding operator that seeds itself from the elements passed to reduce, removing the need for an explicit value seed.
The Standard Library reduce method combines elements of a sequence by applying a closure:
public func reduce<Result>(
  _ initialResult: Result, 
  _ nextPartialResult: (Result, Self.Element) throws -> Result) rethrows 
  -> Result { ... }
reduce(:_,:_) is used to calculate sums and products, append strings together, search for maxima and minima, and so forth. The call site supplies a seed value (the initialResult) of an arbitrary generic Return type. Each subsequent partial result is built by applying the nextPartialResult closure to the previous result and the next element of the sequence. The reduce method returns after exhausting all elements in the sequence.
The proposed reduce(:_) overload omits the initial seed. It is seeded by the first element of the sequence if one exists. Otherwise it returns nil for empty sequences. Like reduce(:_,:_), it combines sequence elements with a closure:
public func reduce(
    _ nextPartialResult:
    (_ partialResult: Element, _ current: Element) throws -> Element) rethrows 
    -> Element?
The re-architected calls are simpler to read, although they necessarily bring sequence reduction into the Optional space. They eliminate the first argument and form a result from the sequence itself:
let sum: Int = intSequence.reduce(0, +)
let sum: Int? = intSequence.reduce(+)

let product: Int = intSequence.reduce(1, *)
let product: Int? = intSequence.reduce(*)
reduce(_:, _:)'s initial result seed uses a "magic value". It is supplied by the call site, and not the language or the Standard Library. Because of this, it can be technically brittle. As the following sections explore, that initial result can be the source of both semantic and seeding errors.
Semantic Errors

Most functional programmers treat reduce(_:_:) as a monoid-based folding operation. A monoid is an algebraic operation that combines two values to produce another value within the same domain. Each monoid declares an identity that leaves other values unchanged when called with its function.
For example, summing numbers has an identity of 0. Adding 0 to any number returns that number. Similarly, the product identity for numbers is 1, and the and identity for Boolean values is true. When constructing reduce, many Swift developers use the identity to seed the first argument:
let boolean: Bool = booleanSequence.reduce(true, { $0 && $1 })
While Swift's reduce(_:_:) method does not require monoids, it's common to use a monoid to populate reduce's two arguments: the algebraic identity as the initial result and its binary (Element, Element) -> Element function to calculate the next partial result.
The monoid approach ensures that any non-empty sequence can be reduced correctly. The initial value will not affect the operation. Essentially, this best practices approach to reduce(_:,_:) says: "use a first argument that has no effect on the final result." In doing so, it's worth considering eliminating this identity.
When following the monoid patter, any empty sequence returns its identity. The product of no numbers is 1, the the combination of no matrices is the identity matrix, and the greatest common divisor of no 0-based natural numbers is 0, and so forth. Removing that identity means approaching empty sequences in a different manner.
The design of reduce(:_) returns nil in the absense of sequence members rather than an identity. It does this for two reasons:

Some identities are not universal across Swift types, creating a bar to building generic implementations.
Some identities simply cannot be represented in Swift.

Generics and Identities

Swift previously decided that the minimum value of an empty integer array should not be Int.max, even though Int.max is the identity of the min function across Int. We know this because Swift has already adopted an optional approach in the Standard Library for no-element sequences of any Comparable elements, which may not support a min or max property:
/// - Returns: The sequence's minimum element. If the sequence has no
///   elements, returns `nil`.
public func min() -> Self.Element? // ...
Since extreme values are not guaranteed to be present for every Comparable type, the design of the min() and max() methods cannot produce an identity for every sequence fed to them. In the absense of identities, they return nil. Adding a requirement to support an extreme-reporting protocol (for example, Comparable & ExtremeReporting) would narrow the number of types serviced by these features to the detriment of the language.
reduce(:_) follows the no-identity pattern to provide a consistent alternative to reduce(:_:_) when processing empty sequences of any type. It always returns nil, indicating a missing value, instead of the default seed.
let minimumValue: Int = [99, 2, -55, 6]
  .reduce(.max) { $0 > $1 ? $1 : $0 } // -55

let minimumValue: Int? = [99, 2, -55, 6]
  .reduce({ $0 > $1 ? $1 : $0 }) // -55
  
let minimumValue: Int = []
  .reduce(.max) { $0 > $1 ? $1 : $0 } // Specific to `Int`

let minimumValue: T? = emptySequenceOfT
  .reduce({ $0 > $1 ? $1 : $0 }) // nil, regardless of `T`
Unrepresentable Identities

In some cases, it's simply not reasonable to represent an identity in Swift. For example, the intersection identity for sets is the complete set. Consider the following code. It returns a set of strings common to all the sets passed to it. No identity can be used here because a canonical set of all possible strings cannot be constructed within the Swift language. In this case, a solution can be modeled with reduce(_:) but not reduce(_:_:):
// This returns a set of common strings
// e.g., let stringSetSequence: [Set<String>] = 
//   [["now", "is", "the", "time"], ["today", "is", "the", "day"]]
// returns {"is", "the"}

// Not constructable with `reduce(_:_:)`
let result = stringSetSequence
    .reduce(WHAT_GOES_HERE?, { $0.intersection($1) })

// Constructable with `reduce(_:)`
let result = stringSetSequence
    .reduce({ $0.intersection($1) })
Absent an identity that can be specified within the language, there is no other option than to return nil, representing a missing value, which is what the reduce(_:) design does.
Seeding Errors

The first argument of result(_:_:) is prone to error. These initial result errors may be simple call-site typos. The coder may know the correct identity but misstate it in code. For example, they may replace one well-known identity with another, as is common when forming a product, or they may simply type the right identity incorrectly.
These scenarios are easily remedied with adequate tests. For example, [value].reduce(identity, f) should always equal value for values across the domain. Still, these errors are better avoided than fixed.
In other cases, a coder may populate the first argument with the wrong value, not knowing the right one. This is less easily resolved as the person writing the code may write incorrect or insufficient tests as a result of confirmation bias.
Using reduce(:_) eliminates both types of errors because there is no need to supply and validate an identity element in code.
Call-Site Typos

Magic values rely on proper recall and text entry and are a point of coding fragility. This fragility extends to well-known algebraic identities like 0 for addition and 1 for multiplication. In using reduce(:_,:_), a simple brain freeze may introduce errors when seeding the first argument, as shown in the example below. Converting from reduce(:_,:_) to reduce(:_) eliminates this class of errors from the call site:
// common typo
let product: Int = intSequence.reduce(0, *)

// always correct, no magic value
let product: Int? = intSequence.reduce(*)

//  another common typo
let minimumValue: Int = intSequence
  .reduce(.min) { $0 > $1 ? $1 : $0 }
  
// again correct, no constant substitution
let minimumValue: Int? = intSequence
  .reduce({ $0 > $1 ? $1 : $0 })
Incorrect Identities

Eliminating identities is an important benefit when using less common seeds. For example, you can build minimum bounding frames for rectangles using both reduce(:_,:_) and reduce(:_) but specifying the wrong reduce(:_,:_) seed introduces the following value error:
let frames = [
  CGRect(x: 10, y: 10, width: 50, height: 50),
  CGRect(x: 40, y: 80, width: 20, height: 30),
  CGRect(x: -5, y: 10, width: 10, height: 10),
]

frames.reduce(CGRect.zero, { $0.union($1) })
// (x: -5.0, y: 0.0, width: 65.0, height: 110.0), incorrect

frames.reduce(CGRect.null, { $0.union($1) })
// (x: -5.0, y: 10.0, width: 65.0, height: 100.0), correct

frames.reduce({ $0.union($1) })
// (x: -5.0, y: 10.0, width: 65.0, height: 100.0), correct
When the coder selects the more common .zero constant over the less well known .null, the .zero seed pulls the y bounds to 0.0, leaving ten extra points of space preceding the minimum frame. While CGRect.null returns the right results, reduce(:_) uses a simpler approach that cannot be affected by incorrectly chosen identities.
If the developer uses insufficient test cases that wrap the origin, they may not discover the error. Using reduce(:_) gets rid of these errors and removes any resposibility for representing the identity from code.
Detailed design

This design introduces an unseeded variation of reduce(:_,:_). This overload uses the first value of each sequence to form its initial partial result. If the sequence is empty, it returns nil.
While reduce uses a partial result generator with a potentially distinct return type f(Result, Element) -> Result, reduce(:_) constrains its results to the same type as the source element f(Element, Element) -> Element. This change is required as the seed value will always be the sequence element type should a first value exist and nil otherwise.
Although applications of reduce may return a type distinct from the sequence elements, this can often be broken down into (Element) -> Result and (Result, Result) -> Result steps. For example, let stringSum = stringSequence.reduce(0, { $0 + $1.count }) is essentially the same as stringSequence.map({ $0.count }).reduce(0, +).
The design of reduce(:_) uses an iterator to distinguish empty sequences from those with values.
Preliminary Implementation

import Foundation

extension Sequence {
  /// Combines the elements of the sequence using a closure, returning the
  /// result (or nil, for a no-element sequence).
  ///
  /// Use the `reduce(_:)` method to produce a combined value from a
  /// sequence. For example, you can return the sum or product of a
  /// sequence's elements or its minimum or maximum value.
  /// Each `nextPartialResult` closure is called sequentially, accumulating
  /// the value initialized to the first element of the sequence.
  /// This example shows how to find the sum of an array of numbers.
  ///
  ///     let numbers = [1, 2, 3, 4]
  ///     let numberSum = numbers.reduce({ x, y in
  ///         x + y
  ///     })
  ///     // numberSum == 10
  ///
  /// Alternatively:
  ///
  ///     let numberSum = numbers.reduce(+) // 10
  ///     let numberProduct = numbers.reduce(*) // 24
  ///
  /// When `numbers.reduce(_:)` is called, the following steps occur:
  ///
  /// 1. The partial result is initialized from the first sequence member,
  ///    returning nil for an empty sequence. The first number is 1.
  /// 2. The closure is called repeatedly with the current partial result and
  ///    each successive member of the sequence
  /// 3. When the sequence is exhausted, the method returns the last value
  ///    returned from the closure.
  ///
  /// If the sequence has no elements, `reduce` returns `nil`.
  ///
  /// If the sequence has one element, `reduce` returns that element.
  ///
  /// For example, `reduce` can combine elements to calculate the minimum
  /// bounds fitting a set of rectangles defined by an array of `CGRect`
  ///
  ///     let frames = [
  ///       CGRect(x: 10, y: 10, width: 50, height: 50),
  ///       CGRect(x: 40, y: 80, width: 20, height: 30),
  ///       CGRect(x: -5, y: 10, width: 10, height: 10),
  ///     ]
  ///
  ///     frames.reduce({ $0.union($1) })
  ///     // (x: -5.0, y: 10.0, width: 65.0, height: 100.0)

  ///
  /// - Parameters:
  ///   - nextPartialResult: A closure that combines an accumulating value and
  ///     an element of the sequence into a new accumulating value, to be used
  ///     in the next call of the `nextPartialResult` closure or returned to
  ///     the caller.
  ///   - partialResult: The accumulated value of the sequence, initialized as the 
  ///     first sequence member
  ///   - current: The next element of the sequence to combine into the partial result
  /// - Returns: The final accumulated value or if the sequence has
  ///   no elements, returns `nil`.
  ///
  /// - Complexity: O(*n*), where *n* is the length of the sequence.
  @inlineable
  public func reduce(
    _ nextPartialResult:
    (_ partialResult: Element, _ current: Element) throws -> Element) rethrows 
    -> Element? {
    var iterator = makeIterator()
    guard var accumulator = iterator.next() else {
      return nil
    }
    while let element = iterator.next() {
      accumulator = try nextPartialResult(accumulator, element)
    }
    return accumulator
  }
}
Precedent

This design follows the Standard Library precedent for sequence extremes (min, max) by returning nil when a sequence has no elements. This allows seedless calls that apply across many types without having to declare identities for each type and each operation.
Naming

This design overloads reduce to accept a single closure argument. Here is a quick overview of alternate naming options.
Generally speaking, a fold or reduce higher order function processes a data structure to build a return value. An unfold seeds a start value to generate a data structure.
Many languages include up to four styles of reduction: left to right with an initial value, right to left with an initial value, left to right without an initial value, and right to left without an initial value. Swift currently implements just one of these, the left-to-right reduce(:_,:_), which takes an initial value.
The name fold is a term of art, commonly interchanged with reduce in various languages.  Other terms include accumulate, aggregate, compress, and inject. The names are used somewhat interchangably among languages, with a slight tendency towards fold for using an initial value and reduce without.
If the reduce feature were being designed today, this proposal would recommend fold over reduce, and prefer overloading fold for both applications. As reduce already exists, it overloads the existing method.


Language
Fold with Initial Value
Fold without Initial Value


C# 3.0
Aggregate
Aggregate


C++
accumulate


CFML, Clojure, CLisp, D, Java 8+, Perl, Python
reduce
reduce


Elm, Erlang, Standard ML
foldl, foldr


F#
fold, foldBack
reduce, reduceBack


Gosu
fold, reduce


Groovy
inject
inject


Haxe, Rust
fold


JavaScript
reduce, reduceRight
reduce, reduceRight


Kotlin
fold, foldRight
reduce, reduceRight


Logtalk, OCaml
fold_left, fold_right


Oz
FoldL, FoldR


PHP
array_reduce
array_reduce


R
Reduce
Reduce


Ruby
inject, reduce
inject, reduce


Scala
foldLeft, foldRight
reduceLeft, reduceRight


Scheme
fold-left, fold-right
reduce-left, reduce-right


Haskell
foldl, fold
foldl1, foldr1


Xtend
fold
reduce


Source compatibility

This change is purely additive.
Effect on ABI stability

N/A
Effect on API resilience

N/A
Alternatives considered


The forum thread discussed introducing monoids either as a protocol or type, for example as shown here and shown here, allowing calls to sequenceOfInt.fold(Add.self) or ["a", "bc"].reduce(String.join) or frames.reduce(CGRect.union). This may be an avenue worth exploring in the future but its scope lies outside this proposal.


Stephen Celis had a really fun approach for combining keypaths with reduce.


Acknowledgements

Thanks Soroush Khanlou, Tim Vermeulen, Lily Vulcano, Davide De Franceschi, Stephen Cellis, Matthew Johnson, Nevin, Brandon Williams, Tellow Krinkle, David Hart, Peter Tomaselli, Ben Cohen, Lantua, Stephen Cellis, and everyone else who offered their feedback, functional programming experience, and design insights.
Language	Fold with Initial Value	Fold without Initial Value
C# 3.0	Aggregate	Aggregate
C++	accumulate
CFML, Clojure, CLisp, D, Java 8+, Perl, Python	reduce	reduce
Elm, Erlang, Standard ML	foldl, foldr
F#	fold, foldBack	reduce, reduceBack
Gosu	fold, reduce
Groovy	inject	inject
Haxe, Rust	fold
JavaScript	reduce, reduceRight	reduce, reduceRight
Kotlin	fold, foldRight	reduce, reduceRight
Logtalk, OCaml	fold_left, fold_right
Oz	FoldL, FoldR
PHP	array_reduce	array_reduce
R	Reduce	Reduce
Ruby	inject, reduce	inject, reduce
Scala	foldLeft, foldRight	reduceLeft, reduceRight
Scheme	fold-left, fold-right	reduce-left, reduce-right
Haskell	foldl, fold	foldl1, foldr1
Xtend	fold	reduce