@rbuckton
Created September 22, 2020 03:09
Rough design for improving the native API for strings in ECMAScript
  • Intl.StringInfo

    • .charCodes(s) - Returns an Iterable that yields each code unit within the string s.
    • .codePoints(s) - Returns an Iterable that yields each code point within the string s (essentially just yield* s).
    • .graphemeClusters(s) - Returns an Iterable that yields each grapheme cluster substring within the string s.
    • .codePointCount(s) - Counts the number of code points within the string s.
    • .graphemeClusterCount(s) - Counts the number of grapheme clusters within the string s.
    • .nthCodePoint(s, n) - Gets the n-th code point (as a number) within the string s.
    • .nthGraphemeCluster(s, n) - Gets the n-th grapheme cluster (as a substring) within the string s.
    • .codePointSize(codePoint) - Gets the size, in code units, of the provided code point.
    • .getUnicodeCategory(s, i) - Gets the Unicode general category for the code point at the specified index i in string s.
    • .isControl(s, i) - Returns true if the code point at the specified index i in string s is a control character.
    • .isDigit(s, i) - Returns true if the code point at the specified index i in string s is a decimal digit.
    • .isNumber(s, i) - Returns true if the code point at the specified index i in string s is a Unicode number character.
    • .isLetter(s, i) - Returns true if the code point at the specified index i in string s is a Unicode letter character.
    • .isPunctuation(s, i) - Returns true if the code point at the specified index i in string s is a Unicode punctuation character.
    • .isSeparator(s, i) - Returns true if the code point at the specified index i in string s is a Unicode separator character.
    • .isSymbol(s, i) - Returns true if the code point at the specified index i in string s is a Unicode symbol character.
    • .isWhitespace(s, i) - Returns true if the code point at the specified index i in string s is a Unicode whitespace character.
    • .isLowSurrogate(s, i) - Returns true if the code unit at the specified index i in string s is a low surrogate.
    • .isHighSurrogate(s, i) - Returns true if the code unit at the specified index i in string s is a high surrogate.
    • .isSurrogatePair(s, i) - Returns true if the two code units starting at the specified index i in string s form a surrogate pair.
    • .isUpperCase(s, i) - Returns true if the code point at the specified index i in string s is a Unicode upper-case character.
    • .isLowerCase(s, i) - Returns true if the code point at the specified index i in string s is a Unicode lower-case character.
  • String.prototype:

    • .codePointCount() - Counts the number of code points within this string.
    • .nthCodePoint(n) - Gets the n-th code point (as a number) within this string.
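
To make the intended behavior more concrete, here is a rough userland sketch of a handful of the helpers listed above. None of these functions exist in ECMAScript; the names simply mirror the proposed ones, and the sketch assumes that i is a code unit index (as with String.prototype.codePointAt) and that "grapheme cluster" means what Intl.Segmenter reports with granularity "grapheme". The design itself may intend something different.

```javascript
// Hypothetical approximations of a few proposed Intl.StringInfo helpers.
// These are illustrative only and are not part of any standard.

// Size of a code point in UTF-16 code units: 1 inside the BMP, 2 above it.
function codePointSize(codePoint) {
  return codePoint > 0xffff ? 2 : 1;
}

// The built-in string iterator yields one string per code point
// (a lone surrogate counts as one), so counting iterations is enough.
function codePointCount(s) {
  let count = 0;
  for (const _ of s) count++;
  return count;
}

// n-th code point as a number, or undefined when n is out of range.
// This walks the string from the start, so it is O(n), not O(1).
function nthCodePoint(s, n) {
  let i = 0;
  for (const ch of s) {
    if (i++ === n) return ch.codePointAt(0);
  }
  return undefined;
}

// Grapheme clusters, assuming Intl.Segmenter's "grapheme" granularity.
function* graphemeClusters(s) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  for (const { segment } of segmenter.segment(s)) yield segment;
}

function graphemeClusterCount(s) {
  let count = 0;
  for (const _ of graphemeClusters(s)) count++;
  return count;
}

// Surrogate checks operate on the code unit at index i.
function isHighSurrogate(s, i) {
  const cu = s.charCodeAt(i); // NaN when out of range, so both tests fail
  return cu >= 0xd800 && cu <= 0xdbff;
}

function isLowSurrogate(s, i) {
  const cu = s.charCodeAt(i);
  return cu >= 0xdc00 && cu <= 0xdfff;
}

function isSurrogatePair(s, i) {
  return isHighSurrogate(s, i) && isLowSurrogate(s, i + 1);
}

// Category checks can lean on Unicode property escapes; \p{L} is "Letter".
function isLetter(s, i) {
  const cp = s.codePointAt(i);
  return cp !== undefined && /^\p{L}$/u.test(String.fromCodePoint(cp));
}

// "a", U+1F600 (two code units), then "e" followed by U+0301 (combining acute).
const sample = "a\u{1F600}e\u0301";
console.log(sample.length);                // 5 code units
console.log(codePointCount(sample));       // 4 code points
console.log(graphemeClusterCount(sample)); // 3 grapheme clusters
```

The three counts differ for the same string, which is the gap the proposed API is meant to make explicit.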
@hax
hax commented Sep 22, 2020

I also hope we can have some methods that are code-point-safe but still use code-unit-based indices (for O(1) performance):

  • String.prototype.uItem(i) - behaves like item(i), but returns a length-2 string if index i starts a surrogate pair, in which case uItem(i+1) returns an empty string.
  • String.prototype.uSlice() - behaves like slice(), but applies semantics similar to uItem's for surrogate pairs.
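
One possible reading of these two methods, written as standalone functions rather than prototype extensions, is sketched below. The comment does not fully specify the semantics, so this is an interpretation: it assumes integer indices, treats negative indices the way item() and slice() do, and defines uSlice(s, a, b) so that it equals the concatenation of uItem(s, i) for every i in [a, b). The helpers isHigh and isLow are hypothetical names introduced for the sketch.

```javascript
// Hypothetical helpers: classify a single UTF-16 code unit.
function isHigh(cu) { return cu >= 0xd800 && cu <= 0xdbff; }
function isLow(cu) { return cu >= 0xdc00 && cu <= 0xdfff; }

// True when index i is the trailing half of a well-formed surrogate pair.
function splitsPair(s, i) {
  return i > 0 && i < s.length &&
    isLow(s.charCodeAt(i)) && isHigh(s.charCodeAt(i - 1));
}

// Sketch of uItem: only looks at neighbouring code units, so it stays O(1).
function uItem(s, i) {
  if (i < 0) i += s.length;                 // negative index counts from the end
  if (!(i >= 0 && i < s.length)) return undefined;
  if (splitsPair(s, i)) return "";          // trailing half: owned by index i - 1
  if (isHigh(s.charCodeAt(i)) && isLow(s.charCodeAt(i + 1))) {
    return s.slice(i, i + 2);               // leading half: return the whole pair
  }
  return s[i];                              // BMP character or lone surrogate
}

// Sketch of uSlice: clamp like slice(), then move any boundary that falls
// inside a pair forward by one so a pair is never cut in half.
function uSlice(s, start = 0, end = s.length) {
  const len = s.length;
  let from = start < 0 ? Math.max(len + start, 0) : Math.min(start, len);
  let to = end < 0 ? Math.max(len + end, 0) : Math.min(end, len);
  if (splitsPair(s, from)) from++;          // skip the orphaned trailing half
  if (splitsPair(s, to)) to++;              // keep the pair that starts at to - 1
  return s.slice(from, to);
}

// Indices: 0 = "a", 1 and 2 = U+1F600 (surrogate pair), 3 = "b".
const text = "a\u{1F600}b";
console.log(uItem(text, 1).length);  // 2
console.log(uItem(text, 2).length);  // 0
console.log(uSlice(text, 2));        // "b"
```

Under this reading a pair always belongs to the index of its leading code unit, so slicing at any code unit index never produces a lone surrogate from a well-formed string.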
