- Proposal: SE-0TBD
- Author(s): Andrew McKnight, Erica Sadun
- Status: TBD
This proposal adds a new `trim()` method to the standard library. It removes leading and trailing whitespace using the Regex and Unicode notion of whitespace.
This proposal was first discussed on the Swift Evolution list in the "Surveying how Swift evolves", "String Hygiene", and "Corner-cases in Character classification of whitespace" threads.
Surveying Swift utility libraries on GitHub revealed many interesting trends. String trimming was by far the most popular third-party customization for Swift.
These are the top 10 function declarations from extensions on String, in their canonical form from swiftc:
- 24: `trim() -> String`
- 13: `substring(from: Int) -> String`
- 12: `substring(to: Int) -> String`
- 11: `isValidEmail() -> Bool`
- 10: `trimmed() -> String`
- 10: `toBool() -> Bool?`
- 10: `height(withConstrainedWidth width: CGFloat, font: UIFont) -> CGFloat`
- 9: `trim()`
- 9: `toDouble() -> Double?`
- 9: `isNumber() -> Bool`
The count for the first of these, 24, is misleading, as trimming also appears in the #5 and #8 spots, including a mutating variation. It appears in many forms further down the list, with 84 methods in just this sample:
- 24: `trim() -> String`
- 10: `trimmed() -> String`
- 9: `trim()`
- 3: `trimPhoneNumberString() -> String`
- 3: `trimNewLine() -> String`
- 3: `trimForNewLineCharacterSet() -> String`
- 2: `trimmedRight(characterSet set: NSCharacterSet = default) -> String`
- 2: `trimmedLeft(characterSet set: NSCharacterSet = default) -> String`
- 1: `trimmingWhitespacesAndNewlines() -> String`
- 1: `trimmedStart(characterSet set: CharacterSet = default) -> String`
- 1: `trimmedRight() -> String`
- 1: `trimmedLeft() -> String`
- 1: `trimmedEnd(characterSet set: CharacterSet = default) -> String`
- 1: `trimWhitespace() -> String`
- 1: `trimPrefix(prefix: String)`
- 1: `trimInside() -> String`
- 1: `trimDuplicates() -> String`
- 1: `trim(trim: String) -> String`
- 1: `trim(_ characters: String) -> String`
- 1: `trim(_ characterSet: CharacterSet) -> <>`
- 1: `stringByTrimmingTailCharactersInSet(_ set: CharacterSet) -> String`
- 1: `sk4TrimSpaceNL() -> String`
- 1: `sk4TrimSpace() -> String`
- 1: `sk4Trim(str: String) -> String`
- 1: `sk4Trim(charSet: NSCharacterSet) -> String`
- 1: `prefixTrimmed(prefix: String) -> String`
- 1: `omTrim()`
- 1: `m_trimmed() -> String`
- 1: `m_trim()`
- 1: `jjs_trimWhitespaceAndNewline() -> String`
- 1: `jjs_trimWhitespace() -> String`
- 1: `jjs_trimNewline() -> String`
- 1: `jjs_emptyOrStringAndTrim(str: String?) -> String`
- 1: `hyb_trimRight(trimNewline: Bool = default) -> String`
- 1: `hyb_trimLeft(trimNewline: Bool = default) -> String`
- 1: `hyb_trim(trimNewline: Bool = default) -> String`
Many developers are solving the same problem in the same way. A function this widely reimplemented is sufficiently universal to justify inclusion in the standard library in Swift 5.
This proposal trailblazes a new area of community-driven design. Because of that, it has had to take several challenges, moving both with and against conventional wisdom, into account in developing its design.
This implementation does not wrap `NSString`'s `trimmingCharacters(in:)` API, ensuring that it can be decoupled from Cocoa and Cocoa Touch for use on other platforms.
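For contrast, this is roughly the Foundation-based route that many of the surveyed extensions appear to wrap. It is a minimal sketch: `foundationTrimmed()` is a hypothetical helper name used only for illustration, and it depends on `import Foundation`, which is the coupling this proposal avoids.

```swift
import Foundation

extension String {
    // Hypothetical helper name, not part of this proposal.
    // It forwards to the Foundation API bridged from NSString.
    func foundationTrimmed() -> String {
        return trimmingCharacters(in: .whitespacesAndNewlines)
    }
}

let padded = "  Hello, world \n"
print(padded.foundationTrimmed()) // Hello, world
```

A standard library method would give the same call-site convenience without the `import Foundation` line.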
This implementation uses `NSString`'s categorization of newlines and whitespace, specifically Unicode General Category Z*, U+000A ~ U+000D, and U+0085. This is not a user-facing detail, and the discussion and implementation of Swift-only standards-based character sets can be resolved at a future time.
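The categorization above can also be expressed against Unicode properties rather than an explicit list. The sketch below assumes a toolchain that provides `Unicode.Scalar.Properties` (Swift 5 or later); `isWhitespaceOrNewline(_:)` is a hypothetical helper, and it classifies Unicode scalars, not `Character` values.

```swift
// Hypothetical helper: true for General Category Z* (Zs, Zl, Zp),
// U+000A ~ U+000D, and U+0085.
func isWhitespaceOrNewline(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.properties.generalCategory {
    case .spaceSeparator, .lineSeparator, .paragraphSeparator:
        return true
    default:
        // These code points are control characters (Cc), not Z*,
        // so they need an explicit range check.
        return (0x000A...0x000D).contains(scalar.value) || scalar.value == 0x0085
    }
}

print(isWhitespaceOrNewline("\u{3000}")) // true: IDEOGRAPHIC SPACE is Zs
print(isWhitespaceOrNewline("\u{0009}")) // false: TAB is outside the listed ranges
```

Note that, read literally, this categorization does not cover U+0009 (CHARACTER TABULATION).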
This implementation offers the simplest tooling and returns a `String` rather than a `Substring`, following `StringProtocol`'s existing art, to best match the community-sourced problem space it is trying to satisfy. `StringProtocol` declares `func trimmingCharacters(in set: CharacterSet) -> String`, which returns a string.
- The API should be as useful and as Swifty as possible, but if it returns substrings, third-party libraries will start implementing `var trimmedAsString` because the API is not giving people the tool that does what they want and need.
- Producing a string is neither the most efficient approach nor the most general, but it provides tooling that expresses the task common to an overwhelming number of use cases.
- A full trimming API that lets you select direction and exclusion set lies outside the scope of this proposal. That full API might be able to work on any bidirectional collection and any element set. Or it might simply cover `StringProtocol`, `UnicodeScalarView`, `String`, and `Substring`.
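To make the trade-off in the points above concrete, here is one possible shape for the more general API that this proposal leaves out of scope. It is only a sketch: the name `trimming(while:)` and its signature are assumptions, not something this proposal defines.

```swift
extension BidirectionalCollection {
    // Hypothetical generic trim; returns a slice rather than a copy.
    func trimming(while predicate: (Element) -> Bool) -> SubSequence {
        // Skip matching elements from the front.
        var start = startIndex
        while start != endIndex, predicate(self[start]) {
            formIndex(after: &start)
        }
        // Walk back from the end, never crossing `start`.
        var end = endIndex
        while end != start {
            let prior = index(before: end)
            guard predicate(self[prior]) else { break }
            end = prior
        }
        return self[start..<end]
    }
}

let slice = "  Hello  ".trimming(while: { $0 == " " })     // Substring
let text = String(slice)                                   // explicit copy to String
let numbers = [0, 0, 1, 2, 0].trimming(while: { $0 == 0 }) // ArraySlice<Int>
print(text, Array(numbers))
```

The extra `String(slice)` step at the call site is the kind of friction the first point anticipates when the result is a `Substring`.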
Quite a lot of the preliminary discussion of this proposal covered whether it was better to use enumerations or option sets to provide an affordance that allows trimming from one side or the other. This proposal uses an enumeration for the following reasons:
- There is no precedent in the standard library for using option set arguments.
- Option set call-site vocabulary is overly large for the needs of the API. You can call the function using static members (for example, `.start`), with set-array notation (for example, `[.start]`), and with raw value initialization.
- Raw value initialization, in particular, permits call sites to use meaningless values that are legal, sanctioning poor call-site hygiene.
- The number of customizations will never be more than 2, and call sites should use either none or one. Calling with no options is preferable to `[.start, .end]` or even `[.end, .start]`.
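The call-site difference described in the list above can be sketched as follows. `TrimOptions` is a hypothetical option set written only for comparison; the `Trimming` enumeration here is a standalone copy of the shape this proposal favors.

```swift
// Hypothetical option-set alternative, shown only for comparison.
struct TrimOptions: OptionSet {
    let rawValue: Int
    static let start = TrimOptions(rawValue: 1 << 0)
    static let end = TrimOptions(rawValue: 1 << 1)
}

// The enumeration shape: exactly one spelling per choice.
enum Trimming { case start, end, full }

// An option set accepts several spellings of the same request...
let a: TrimOptions = .start
let b: TrimOptions = [.start]
let c = TrimOptions(rawValue: 1)

// ...and a raw value that names no option still compiles.
let meaningless = TrimOptions(rawValue: 64)

let direction: Trimming = .start
print(a == b, b == c, meaningless.contains(.start), direction)
```

The first three values are equal, while `meaningless` is a legal value that selects neither side.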
extension String {
    /// The direction from which a string is trimmed, where `full`
    /// (the typical default) trims from the `start` and `end`.
    public enum Trimming { case start, end, full }

    /// Whitespace and newline characters, which are defined as Unicode General
    /// Category Z* (Zl, Zp, Zs), U+000A ~ U+000D, and U+0085.
    public static var whitespaceAndNewlineCharacters: Set<Character> = [
        // [Zl]: Unicode Character Category 'Separator, Line'
        "\u{2028}", // LINE SEPARATOR
        // [Zp]: Unicode Character Category 'Separator, Paragraph'
        "\u{2029}", // PARAGRAPH SEPARATOR
        // [Zs]: Unicode Character Category 'Separator, Space'
        "\u{0020}", // SPACE
        "\u{00A0}", // NO-BREAK SPACE
        "\u{1680}", // OGHAM SPACE MARK
        "\u{2000}", // EN QUAD
        "\u{2001}", // EM QUAD
        "\u{2002}", // EN SPACE
        "\u{2003}", // EM SPACE
        "\u{2004}", // THREE-PER-EM SPACE
        "\u{2005}", // FOUR-PER-EM SPACE
        "\u{2006}", // SIX-PER-EM SPACE
        "\u{2007}", // FIGURE SPACE
        "\u{2008}", // PUNCTUATION SPACE
        "\u{2009}", // THIN SPACE
        "\u{200A}", // HAIR SPACE
        "\u{202F}", // NARROW NO-BREAK SPACE
        "\u{205F}", // MEDIUM MATHEMATICAL SPACE
        "\u{3000}", // IDEOGRAPHIC SPACE
        // U+000A ~ U+000D, and U+0085, per Foundation documentation
        // for
        "\u{000A}",
        "\u{000B}",
        "\u{000C}",
        "\u{000D}",
        "\u{0085}",
    ]

    /// Returns a new string with whitespace removed from one or
    /// both ends of the source string. Whitespace characters
    /// are defined as Unicode General Category Z*,
    /// U+000A ~ U+000D, and U+0085.
    ///
    /// Trimming takes place over the characters of a string,
    /// so that the Unicode grapheme clusters have already been
    /// resolved. The grapheme clustering pass will happen
    /// before escape sequences like `\n` are processed.
    ///
    /// - Parameter trim: The direction from which to trim: `.start`,
    ///   `.end`, or `.full`. If omitted, the string is trimmed
    ///   from both ends.
    /// - Returns: A string trimmed of its whitespace in the
    ///   requested direction.
    public func trimmed(from trim: String.Trimming = .full) -> String {
        // Ensure that this implementation does not rely on the
        // NSString implementation of trimmingCharacters(in: .whitespacesAndNewlines)
        guard !isEmpty else { return String(self[...]) }
        var (trimStart, trimEnd) = (startIndex, index(before: endIndex))
        if [.start, .full].contains(trim) {
            // Find the first non-whitespace character, if any.
            guard let start = indices.first(where: {
                !String.whitespaceAndNewlineCharacters.contains(self[$0])
            }) else { return String(self[endIndex ..< endIndex]) }
            trimStart = start
        }
        if [.end, .full].contains(trim) {
            // Find the last non-whitespace character, if any.
            guard let end = indices.reversed().first(where: {
                !String.whitespaceAndNewlineCharacters.contains(self[$0])
            }) else { return String(self[endIndex ..< endIndex]) }
            trimEnd = end
        }
        return String(self[trimStart ... trimEnd])
    }

    /// Trims a string in place by removing whitespace from one or
    /// both ends of the source string. Whitespace characters
    /// are defined as Unicode General Category Z*,
    /// U+000A ~ U+000D, and U+0085.
    ///
    /// - Parameter trim: The direction from which to trim: `.start`,
    ///   `.end`, or `.full`. If omitted, the string is trimmed
    ///   from both ends.
    public mutating func trim(from trim: String.Trimming = .full) {
        self = self.trimmed(from: trim)
    }
}
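Because trimming operates on `Character` values, membership in the whitespace set is decided per grapheme cluster, not per code point. The sketch below illustrates one consequence using a two-member stand-in for the set (the name `newlines` is hypothetical): a CR-LF pair forms a single `Character`, so a set that lists U+000D and U+000A separately does not contain it.

```swift
// Stand-in for the proposal's set, reduced to the two relevant members.
let newlines: Set<Character> = ["\u{000D}", "\u{000A}"]

let crlf = "\r\n"
print(crlf.count)                // 1: CR-LF is one grapheme cluster
print(crlf.unicodeScalars.count) // 2: but two Unicode scalars

let first: Character = crlf[crlf.startIndex]
print(newlines.contains(first))  // false: the pair is not a listed member
```

Under this reading, text with Windows-style line endings may need the CR-LF pair handled explicitly, which is worth confirming against the final implementation.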
The alternative considered is not adopting this proposal.
This proposal is strictly additive.
This proposal does not affect ABI stability.
This proposal does not affect API resilience.