/*
# Problem
In Swift, it can be cumbersome to work with Unicode characters that are
non-printing, confusable, or have difficulty rendering in the editor.
For example, to generate the "Family: Woman, Girl" emoji:
*/
// Option 1: Unicode Scalar Value Escapes
"\u{1F469}\u{200D}\u{1F467}"
// Option 2: Commented Declaration + Interpolation
let zwj: Character = "\u{200D}" // ZERO WIDTH JOINER
"👩\(zwj)👧"
/*
# Proposed Solution
Add \N{name} escape sequence for named Unicode characters.
*/
"\N{WOMAN}\N{ZERO WIDTH JOINER}\N{GIRL}"
// Consider the 24 Unicode characters
// comprising the Punctuation, Dash [Pd] category,
// such as:
/*
U+002D HYPHEN-MINUS -
U+2010 HYPHEN ‐
U+2011 NON-BREAKING HYPHEN ‑
U+2012 FIGURE DASH ‒
U+2013 EN DASH –
U+2014 EM DASH —
U+2015 HORIZONTAL BAR ―
U+2E3A TWO-EM DASH ⸺
U+2E3B THREE-EM DASH ⸻
*/
// Which of these would you rather find in a code base?
"‒? \u{2012}? or \N{FIGURE DASH}?"
// The \N{} escape sequence is obscure,
// but supported in Python and a few other languages.
// Most notably, though, it's the output you get
// when you call the method `applyingTransform(_:reverse:)`
// with the `.toUnicodeName` transform:
import Foundation
"🍩".applyingTransform(.toUnicodeName, reverse: false) // \N{DOUGHNUT}
"\\N{DOUGHNUT}".applyingTransform(.toUnicodeName, reverse: true) // 🍩

mattt (gist owner and author) commented Nov 6, 2018

AliSoftware commented Nov 6, 2018

I like the idea, but since I dislike String-based APIs and prefer typed constants, what about declaring constants on Character to handle this instead?

import Foundation

extension Character {
  init?(unicodeName: String) {
    if let char = "\\N{\(unicodeName)}".applyingTransform(.toUnicodeName, reverse: true)?.first {
      self = char
    } else {
      return nil
    }
  }
  var unicodeName: String? {
    guard let name = String(self).applyingTransform(.toUnicodeName, reverse: false) else { return nil }
    return String(name.dropFirst(3).dropLast())
  }

  struct Name {
    static let Woman: Character = "\u{1F469}"
    static let ZeroWidthJoiner: Character = "\u{200D}"
    static let Girl: Character = "\u{1F467}"
    static let HyphenMinus: Character = "\u{002D}"
    static let Hyphen: Character = "\u{2010}"
    static let NonBreakingHyphen: Character = "\u{2011}"
    static let FigureDash: Character = "\u{2012}"
    static let EnDash: Character = "\u{2013}"
    static let EmDash: Character = "\u{2014}"
    static let HorizontalBar: Character = "\u{2015}"
    static let TwoEmDash: Character = "\u{2E3A}"
    static let ThreeEmDash: Character = "\u{2E3B}"
    // ...
  }
}

Sure, one could argue that this is a little verbose at the call site:

let test2 = "\(Character.Name.Woman)\(Character.Name.ZeroWidthJoiner)\(Character.Name.Girl)"
// instead of: "\N{WOMAN}\N{ZERO WIDTH JOINER}\N{GIRL}"

But if that really bothers you, you can still alias it to something shorter in places where you need it a lot:

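let N = Character.Name.self
let test3 = "\(N.Woman)\(N.ZeroWidthJoiner)\(N.Girl)"

let stringName = N.Woman.unicodeName // "WOMAN"
// flatMap rather than map: the initializer is failable, so map would yield a nested Character??
let revert = stringName.flatMap(Character.init(unicodeName:))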

My rationale on this alternative is:

  • As the original proposal is String-based, there is no auto-completion (or, if we want it, it has to be built into the IDE, so each IDE would need to add specific support and be updated every time a new Unicode version adds Character names… not ideal)
  • As a consequence, I'd never remember either the name to use or its exact spelling (is it "ZERO-WIDTH JOINER" or "ZERO WIDTH JOINER" or "ZERO-WIDTH-JOINER"?)

As opposed to this, the Character.Name constants:

  • Have auto-completion and check-at-compile-time for free already, without any changes needed at the compiler level or IDE & LSP level
  • Sure, they might be more verbose at the call site, but given how rarely we'd use them in string literals, that's acceptable to me
  • If you really have a string where you use them heavily, you can simply alias the type to a shorter name to make the interpolation shorter… and end up very close to the original proposal (see example above)
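
One caveat on the compile-time checking: the compiler checks the spelling of the constant, not the scalar value written next to it. A minimal sketch of a sanity check for that, assuming Swift 5 or later, where `Unicode.Scalar.Properties.name` from SE-0211 is available in the standard library. The `table` and `mismatches(in:)` names are hypothetical and not part of the proposal above:

```swift
// Hypothetical table pairing each constant's scalar with the Unicode name
// it is supposed to stand for.
let table: [(scalar: Unicode.Scalar, expectedName: String)] = [
    ("\u{1F469}", "WOMAN"),
    ("\u{200D}", "ZERO WIDTH JOINER"),
    ("\u{1F467}", "GIRL"),
    ("\u{2012}", "FIGURE DASH"),
]

/// Returns the expected names of all entries whose scalar does not actually
/// carry that Unicode name.
func mismatches(in table: [(scalar: Unicode.Scalar, expectedName: String)]) -> [String] {
    return table
        .filter { $0.scalar.properties.name != $0.expectedName }
        .map { $0.expectedName }
}

// An empty result means every scalar matches its intended name.
print(mismatches(in: table))
```

Something like this could live in a unit test, so that a typo in a code point is caught by the test run rather than in rendered text.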

AliSoftware commented Nov 6, 2018

Note: as for the naming of the constants (Unicode names are uppercase, with spaces and hyphens, which are invalid in constant names and would need some normalization), there is some precedent for this in SE-0211, which did exactly the same kind of naming normalization for Unicode properties 😉
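
A rough sketch of what such a normalization could look like, assuming Swift 5 or later for `Unicode.Scalar.Properties.name`. The helper `identifier(fromUnicodeName:)` is hypothetical, and the lowerCamelCase result is an assumption based on the Swift API guidelines, not a rule taken from SE-0211. It also ignores edge cases such as two names collapsing to the same identifier:

```swift
/// Hypothetical helper: turns a Unicode character name such as
/// "NON-BREAKING HYPHEN" into a lowerCamelCase identifier.
func identifier(fromUnicodeName name: String) -> String {
    // Split on spaces and hyphens, the separators that are invalid in identifiers.
    let words = name
        .split(whereSeparator: { $0 == " " || $0 == "-" })
        .map { $0.lowercased() }
    guard let first = words.first else { return "" }
    // Capitalize the first letter of every word except the first one.
    let rest = words.dropFirst().map { $0.prefix(1).uppercased() + String($0.dropFirst()) }
    return ([first] + rest).joined()
}

// Look up the official name of U+200D and normalize it.
let joiner: Unicode.Scalar = "\u{200D}"
if let name = joiner.properties.name {
    print(identifier(fromUnicodeName: name)) // zeroWidthJoiner
}
```

A code generator could run a function like this over the Unicode character database to emit the constants, instead of maintaining them by hand.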
