@noromanba
Last active March 22, 2018 23:49
"Character Counting" in JavaScript

via

What counts as a "character" in JS, including surrogate pairs and ZWJ sequences? c.f.

on about:blank

document.body.innerHTML = `🍣 <br>
 🍣`
// pseudo-text is `🍣 \n 🍣`, [Sushi] [Space] [Newline] [Space] [Sushi]
<body>
🍣 <br>
 🍣
</body>
// how many characters are in `🍣 \n 🍣`?

// wrong when surrogate pairs (e.g. emoji) appear: .length counts UTF-16 code units
document.body.textContent.length                // -> 7
// also wrong: without the "u" flag, "." matches each half of a surrogate pair
document.body.textContent.match(/./g).length    // -> 6

// what a text editor would typically report?
// ES2015+ "u" flag; "." does not match "\n" but does match " "
document.body.textContent.match(/./ug).length   // -> 4
// ES2015+ spread syntax; iterates the string by code point
[...document.body.textContent]
  .filter(s => !/\n/.test(s)).length            // -> 4
// stricter: exclude all whitespace
[...document.body.textContent]
  .filter(s => /\S/.test(s)).length             // -> 2

// includes line terminators without a hack; alternative to the classic /[\s\S]/ workaround
[...document.body.textContent].length           // -> 5

// ES2018+ "s" flag, a.k.a. the "dotAll" flag; cf.
// http://2ality.com/2017/07/regexp-dotall-flag.html#limitations-of-the-dot-in-regular-expressions
document.body.textContent.match(/./sug).length  // -> 5
document.body.innerText.match(/./sug).length    // -> 4
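
The `textContent` counts above can be reproduced without a DOM. The following is a minimal sketch that assumes the string literal matches the pseudo-text `🍣 \n 🍣`; the helper names `countCodeUnits`, `countCodePoints`, and `countNonWhitespace` are hypothetical and only label the three counting strategies.

```javascript
// Same pseudo-text as above: [Sushi] [Space] [Newline] [Space] [Sushi]
const text = "🍣 \n 🍣";

// UTF-16 code units: each 🍣 (U+1F363) is a surrogate pair, so it counts twice
const countCodeUnits = (s) => s.length;

// Code points: the string iterator yields a surrogate pair as a single item
const countCodePoints = (s) => [...s].length;

// Code points, excluding every whitespace character (spaces and newlines)
const countNonWhitespace = (s) => [...s].filter((c) => /\S/.test(c)).length;

console.log(countCodeUnits(text));      // -> 7
console.log(countCodePoints(text));     // -> 5
console.log(countNonWhitespace(text));  // -> 2
```

Which of these is "the" character count depends on the use case: storage limits usually care about code units or bytes, while a counter shown to users is closer to the code point or non-whitespace count.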

Appendix
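
Counting by code point still over-counts ZWJ sequences, where several emoji joined by U+200D render as one glyph. One possible approach is to count grapheme clusters with `Intl.Segmenter`. This is a sketch that assumes a runtime shipping `Intl.Segmenter` (it postdates the snippets above); `countGraphemes` is a hypothetical helper name.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
// Assumes the runtime provides Intl.Segmenter.
function countGraphemes(s) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(s)].length;
}

// Family emoji: man + ZWJ + woman + ZWJ + girl
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(family.length);           // -> 8 (UTF-16 code units)
console.log([...family].length);      // -> 5 (code points)
console.log(countGraphemes(family));  // -> 1 (grapheme cluster)
```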
