Skip to content

Instantly share code, notes, and snippets.

@dlbuckley
Last active February 27, 2019 16:33
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dlbuckley/c15a60aa27ade5dea4cd71b5cbef6d15 to your computer and use it in GitHub Desktop.
Save dlbuckley/c15a60aa27ade5dea4cd71b5cbef6d15 to your computer and use it in GitHub Desktop.
Remove duplicate elements from a Collection

Remove duplicate elements from a Collection

Introduction

This proposal introduces methods to remove duplicate elements from a collection to the standard library whilst preserving the order of elements.

Swift-evolution thread: [Pitch] Remove duplicate elements from a collection

Motivation

A common operation whilst working with collections of data is to remove duplicates (elements that are equatable) and leave only one copy of an element within the collection. Typically this can be done by using a Set. Unfortunately using this method results in the order of the elements being lost.

let originalElements = [1, 2, 3, 4, 5, 1, 4]
let uniqueElements = Set(originalElements)

// uniqueElements == [4, 2, 1, 3, 5]
// Duplicate elements have been removed but the elements are now out of order and the oder will 
// change after each run.

Currently there is no OrderedSet type in Swift which could solve this issue. Instead, if you wish to remove duplicates from a list of elements whilst also preserving their order you have to write a custom solution.

Proposed solution

We propose to introduce new methods on RangeReplaceableCollection to solve this issue with both mutating and nonmutating variations. Other languages use the naming distinct() and distinct(by:) for this operation, but in the case of Swift we want to use a verb to keep in line with the API design guidelines. With that in mind we will be using removeDuplicates() and removingDuplicates() which are mutating and nonmutating respectively.

By using a name that starts with remove the method will be more easily discoverable through method suggestions and easier to understand for non-english speakers.

removeDuplicates()

removeDuplicates() is a new method on RangeReplaceableCollection which is used to mutate a collection to remove duplicates whilst preserving their order. Order is preserved by keeping the first instance of an element and removing any subsequent elements that are equatable to any of the preceding elements.

let numbers = [1, 2, 3, 4, 5, 4, 1, 2, 6]
numbers.removeDuplicates()

// numbers == [1, 2, 3, 4, 5]
// Duplicate elements have been removed and the order has been preserved.

removingDuplicates()

removingDuplicates() is similar to removeDuplicates() but does not mutate the collection in place. Instead it will return a new collection that has the duplicates removed.

let numbers = [1, 2, 3, 4, 5, 4, 1, 2, 6]
let uniqueNumbers = numbers.removingDuplicates()

// numbers == [1, 2, 3, 4, 5, 4, 1, 2, 6]
// uniqueNumbers == [1, 2, 3, 4, 5, 6]
// A new array is returned with the duplicate elements removed and the order preserved.

removeDuplicates(by:)

The removeDuplicates(by:) method is a generic method that can be used for more fine grained control over how duplications are reasoned about. A good use case of this is a collection that doesn't use the equality of the entire object to reason about duplication, but some other property of that object instead.

struct Animal {
    let kind: String
    let name: String
}

let animals = [
    Animal(kind: "Dog", name: "Fido"),
    Animal(kind: "Cat", name: "Tibbles"),
    Animal(kind: "Lion", name: "Simba"),
    Animal(kind: "Dog", name: "Spot"),
]

animals.removeDuplicates { $0.kind }

// [
//   Animal(kind: "Dog", name: "Fido"), 
//   Animal(kind: "Cat", name: "Tibbles"), 
//   Animal(kind: "Lion", name: "Simba")
// ]
// Duplicate elements have been removed based on the `kind` property and the order has been preserved.

removingDuplicates(by:)

The removingDuplicates(by:) method is is similar to removeDuplicates(by:) but does not mutate the collection in place. Instead it will return a new collection that has the duplicates removed.

struct Animal {
    let kind: String
    let name: String
}

let animals = [
    Animal(kind: "Dog", name: "Fido"),
    Animal(kind: "Cat", name: "Tibbles"),
    Animal(kind: "Lion", name: "Simba"),
    Animal(kind: "Dog", name: "Spot"),
]

let uniqueAnimals = animals.removingDuplicates(by: { $0.kind })

// animals == [
//  Animal(kind: "Dog", name: "Fido"),
//  Animal(kind: "Cat", name: "Tibbles"),
//  Animal(kind: "Lion", name: "Simba"),
//  Animal(kind: "Dog", name: "Spot"),
// ]
// uniqueAnimals == [
//   Animal(kind: "Dog", name: "Fido"), 
//   Animal(kind: "Cat", name: "Tibbles"), 
//   Animal(kind: "Lion", name: "Simba")
// ]
// A new array is returned with the duplicate elements removed based on the `kind` property and the order preserved.

Detailed design

Due to the nature of the operation, specialised implementations will be provided to provide the most performant solution based on confrormance to Hashable or to Equatable.

extension RangeReplaceableCollection where Self: MutableCollection {

    /// Removes any duplicate elements where the equality is based on
    /// a value other than the element itself.
    ///
    /// Use this method to remove duplicate elements where their equality
    /// is based on the value returned by the closure. The order of the remaining
    /// elements is preserved.
    ///
    /// This example removes objects based on the equality of one of the
    /// properties of the object.
    ///
    ///     struct Animal {
    ///         let kind: String
    ///         let name: String
    ///     }
    ///
    ///     let animals = [
    ///         Animal(kind: "Dog", name: "Fido"),
    ///         Animal(kind: "Cat", name: "Tibbles"),
    ///         Animal(kind: "Lion", name: "Simba"),
    ///         Animal(kind: "Dog", name: "Spot"),
    ///     ]
    ///
    ///     animals.removeDuplicates(by: { $0.kind })
    ///
    ///     // animals == [
    ///     //   Animal(kind: "Dog", name: "Fido"),
    ///     //   Animal(kind: "Cat", name: "Tibbles"),
    ///     //   Animal(kind: "Lion", name: "Simba")
    ///     // ]
    ///
    /// - Parameter distinctValue: A closure that takes an element of the
    ///   sequence as its argument and returns a value that conforms to
    ///   `Hashable` which will in turn be used to compare if an object
    ///   is a duplicate.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    mutating func removeDuplicates<T: Hashable>(by distinctValue: (Element) -> T)

    /// Removes any duplicate elements where the equality is based on
    /// a value other than the element itself.
    ///
    /// Use this method to remove duplicate elements where their equality
    /// is based on the value returned by the closure. The order of the remaining
    /// elements is preserved.
    ///
    /// This example removes objects based on the equality of one of the
    /// properties of the object.
    ///
    ///     struct Animal {
    ///         let kind: String
    ///         let name: String
    ///     }
    ///
    ///     let animals = [
    ///         Animal(kind: "Dog", name: "Fido"),
    ///         Animal(kind: "Cat", name: "Tibbles"),
    ///         Animal(kind: "Lion", name: "Simba"),
    ///         Animal(kind: "Dog", name: "Spot"),
    ///     ]
    ///
    ///     animals.removeDuplicates(by: { $0.kind })
    ///
    ///     // animals == [
    ///     //   Animal(kind: "Dog", name: "Fido"),
    ///     //   Animal(kind: "Cat", name: "Tibbles"),
    ///     //   Animal(kind: "Lion", name: "Simba")
    ///     // ]
    ///
    /// - Parameter distinctValue: A closure that takes an element of the
    ///   sequence as its argument and returns a value that conforms to
    ///   `Equatable` which will in turn be used to compare if an object
    ///   is a duplicate.
    ///
    /// - Complexity: O(*n2*), where *n* is the length of the collection.
    mutating func removeDuplicates<T: Equatable>(by distinctValue: (Element) -> T)
}

extension RangeReplaceableCollection {

    /// Removes any duplicate elements where the equality is based on
    /// a value other than the element itself.
    ///
    /// Use this method to remove duplicate elements where their equality
    /// is based on the value returned by the closure. The order of the remaining
    /// elements is preserved.
    /// This example removes objects based on the equality of one of the
    /// properties of the object.
    ///
    ///     struct Animal {
    ///         let kind: String
    ///         let name: String
    ///     }
    ///
    ///     let animals = [
    ///         Animal(kind: "Dog", name: "Fido"),
    ///         Animal(kind: "Cat", name: "Tibbles"),
    ///         Animal(kind: "Lion", name: "Simba"),
    ///         Animal(kind: "Dog", name: "Spot"),
    ///     ]
    ///
    ///     let uniqueAnimals = animals.removingDuplicates(by: { $0.kind })
    ///
    ///     // uniqueAnimals == [
    ///     //   Animal(kind: "Dog", name: "Fido"),
    ///     //   Animal(kind: "Cat", name: "Tibbles"),
    ///     //   Animal(kind: "Lion", name: "Simba")
    ///     // ]
    ///
    /// - Parameter distinctValue: A closure that takes an element of the
    ///   sequence as its argument and returns a value that conforms to
    ///   `Hashable` which will in turn be used to compare if an object
    ///   is a duplicate.
    ///
    /// - Returns: A new collection with all duplicates removed.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    func removingDuplicates<T: Hashable>(by distinctValue: (Element) -> T) -> Self

    /// Returns a new collection of the same type with all duplicate elements
    /// removed based on a value other than the element itself.
    ///
    /// Use this method to create a new collection with all duplicate elements
    /// removed based on the value returned by the closure. The order of the
    /// remaining elements is preserved.
    ///
    /// This example removes objects based on the equality of one of the
    /// properties of the object.
    ///
    ///     struct Animal {
    ///         let kind: String
    ///         let name: String
    ///     }
    ///
    ///     let animals = [
    ///         Animal(kind: "Dog", name: "Fido"),
    ///         Animal(kind: "Cat", name: "Tibbles"),
    ///         Animal(kind: "Lion", name: "Simba"),
    ///         Animal(kind: "Dog", name: "Spot"),
    ///     ]
    ///
    ///     let uniqueAnimals = animals.removingDuplicates(by: { $0.kind })
    ///
    ///     // uniqueAnimals == [
    ///     //   Animal(kind: "Dog", name: "Fido"),
    ///     //   Animal(kind: "Cat", name: "Tibbles"),
    ///     //   Animal(kind: "Lion", name: "Simba")
    ///     // ]
    ///
    /// - Parameter distinctValue: A closure that takes an element of the
    ///   sequence as its argument and returns a value that conforms to
    ///   `Equatable` which will in turn be used to compare if an object
    ///   is a duplicate.
    ///
    /// - Returns: A new collection with all duplicates removed.
    ///
    /// - Complexity: O(*n2*), where *n* is the length of the collection.
    func removingDuplicates<T: Equatable>(by distinctValue: (Element) -> T) -> Self
}

extension RangeReplaceableCollection where Self: MutableCollection, Element: Hashable {

    /// Removes any duplicate elements.
    ///
    /// Use this method to remove duplicate elements. The order of the
    /// remaining elements is preserved.
    /// This example removes duplicate numbers from an array.
    ///
    ///     let numbers = [1, 2, 3, 4, 5, 4, 1, 2, 6]
    ///     numbers.removeDuplicates()
    ///     // numbers == [1, 2, 3, 4, 5, 6]
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    mutating func removeDuplicates()
}

extension RangeReplaceableCollection where Element: Hashable {

    /// Returns a new collection of the same type with all duplicate elements
    /// removed.
    ///
    /// Use this method to create a new collection with all duplicate elements
    /// removed. The order of the remaining elements is preserved.
    ///
    /// This example takes an array of numbers and creates a new array from it
    /// with all duplicates removed.
    ///
    ///     let numbers = [1, 2, 3, 4, 5, 4, 1, 2, 6]
    ///     let uniqueNumbers = numbers.removingDuplicates()
    ///     // uniqueNumbers == [1, 2, 3, 4, 5, 6]
    ///
    /// - Returns: A new collection with all duplicates removed.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    func removingDuplicates() -> Self
}

extension RangeReplaceableCollection where Self: MutableCollection, Element: Equatable {

    /// Removes any duplicate elements.
    ///
    /// Use this method to remove duplicate elements. The order of the
    /// remaining elements is preserved.
    /// This example removes duplicate numbers from an array.
    ///
    ///     let numbers = [1, 2, 3, 4, 5, 4, 1, 2, 6]
    ///     numbers.removeDuplicates()
    ///     // numbers == [1, 2, 3, 4, 5, 6]
    ///
    /// - Complexity: O(*n2*), where *n* is the length of the collection.
    mutating func removeDuplicates()
}

extension RangeReplaceableCollection where Element: Equatable {

    /// Returns a new collection of the same type with all duplicate elements
    /// removed.
    ///
    /// Use this method to create a new collection with all duplicate elements
    /// removed. The order of the remaining elements is preserved.
    ///
    /// This example takes an array of numbers and creates a new array from it
    /// with all duplicates removed.
    ///
    ///     let numbers = [1, 2, 3, 4, 5, 4, 1, 2, 6]
    ///     let uniqueNumbers = numbers.removingDuplicates()
    ///     // uniqueNumbers == [1, 2, 3, 4, 5, 6]
    ///
    /// - Returns: A new collection with all duplicates removed.
    ///
    /// - Note: For reduced complexity, ensure `Element` conforms to `Hashable`.
    ///
    /// - Complexity: O(*n2*), where *n* is the length of the collection.
    func removingDuplicates() -> Self
}

Source compatibility

This change is purely additive.

Effect on ABI stability

N/A

Effect on API resilience

N/A

Alternatives considered

distinct and unique names

distinct and unique were possible names but ran into problems when trying to keep in line with Swift API design guidelines for mutating and nonmutating variants. Trying to come up with naming that made sense and was obvious to a user wasn't possible with either distinct or unique, so it was decided that it would be better to continue building upon the already existing remove method names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment