via
What does "character counting" mean in JS, incl. surrogates and ZWJ? cf.
- https://blog.jxck.io/entries/2017-03-02/unicode-in-javascript.html
- https://teppeis.github.io/everything-is-iterator/#1
- #10-#18
document.body.innerHTML = `🍣 <br>
 🍣`
// pseudo-text is `🍣 \n 🍣`, [Sushi] [Space] [Newline] [Space] [Sushi]
<body>
🍣 <br>
 🍣
</body>
// `🍣 \n 🍣`: how many characters is that?
// definitely wrong when surrogate pairs appear (e.g. emoji): counts UTF-16 code units
document.body.textContent.length // -> 7
// also wrong: without the "u" flag, "." matches each surrogate half and skips "\n"
document.body.textContent.match(/./g).length // -> 6
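The two counts above follow from how UTF-16 stores 🍣. A minimal sketch, assuming the same pseudo-text written as a plain string literal (the names `text` and `units` are only for illustration) so it runs without a DOM:

```javascript
// Assumption: same pseudo-text as above, as a plain string instead of document.body.textContent.
const text = "🍣 \n 🍣";

// "🍣" is U+1F363, outside the BMP, so UTF-16 stores it as a surrogate pair.
const units = [0, 1].map(i => "🍣".charCodeAt(i).toString(16));
console.log(units);                            // [ 'd83c', 'df63' ]
console.log("🍣".codePointAt(0).toString(16)); // '1f363'

// length counts code units: 2 + 1 + 1 + 1 + 2
console.log(text.length);             // 7
// "." without "u" matches one code unit at a time and never "\n": 7 - 1
console.log(text.match(/./g).length); // 6
```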
// what a text editor would typically report?
// ES2015+ "u" flag: "." matches by code point; excludes "\n", includes " "
document.body.textContent.match(/./ug).length // -> 4
// ES2015+ spread syntax: iterates the string by code point
[...document.body.textContent]
.filter(s => !/\n/.test(s)).length // -> 4
// stricter: count non-whitespace only
[...document.body.textContent]
.filter(s => /\S/.test(s)).length // -> 2
// includes line terminators, no hack needed; an alternative to the classic /[\s\S]/ hack
[...document.body.textContent].length // -> 5
// ES2018+ "s" flag aka "dotAll" flag c.f.
// http://2ality.com/2017/07/regexp-dotall-flag.html#limitations-of-the-dot-in-regular-expressions
document.body.textContent.match(/./sug).length // -> 5
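// innerText reflects the rendered text (CSS whitespace handling applies), so the count can differ from textContent
document.body.innerText.match(/./sug).length // -> 4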
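None of the counts above handle ZWJ sequences: spread and the "u" flag count code points, so an emoji joined with U+200D still counts as several. A possible sketch using `Intl.Segmenter` to count grapheme clusters instead, assuming a runtime that ships it (recent browsers, Node.js 16+); `countGraphemes` is a hypothetical helper name, not something from the linked articles:

```javascript
// Assumption: Intl.Segmenter is available in the runtime.
// countGraphemes is a hypothetical helper for illustration.
function countGraphemes(str, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // segment() returns an iterable with one entry per grapheme cluster
  return [...segmenter.segment(str)].length;
}

// Family emoji: man + ZWJ + woman + ZWJ + girl
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(family.length);          // 8 (UTF-16 code units)
console.log([...family].length);     // 5 (code points)
console.log(countGraphemes(family)); // 1 (grapheme cluster)

// The pseudo-text from above: sushi, space, newline, space, sushi
console.log(countGraphemes("🍣 \n 🍣")); // 5
```

Whether whitespace and newlines should be included in the count is still a separate decision, as in the `filter` variants above.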