Here are a couple of niche cases where the normalizeString() function could have an effect simply by re-tokenizing the string:
- Non-breaking space characters: \u00A0 renders like an ordinary space but prevents a line break at its position. In JavaScript, \s matches \u00A0, so splitting on generic whitespace and re-joining with plain spaces silently discards the non-breaking behavior. For example:
let str = 'hello' + '\u00A0' + 'world'; // non-breaking space between 'hello' and 'world'
normalizeString(str);
// If the split pattern matches \u00A0, the re-join replaces it with a plain space
- Zero-width characters: Code points such as the zero-width space (\u200B) and zero-width joiner (\u200D) occupy positions in a string but render no glyph. Whether re-tokenizing disturbs them depends on the delimiter: \u200B is not Unicode whitespace, so a split on \s+ leaves it intact, but a normalizer that also strips non-printing characters will silently remove it. For example:
let str = 'hel' + '\u200B' + 'lo'; // zero-width space between 'hel' and 'lo'
normalizeString(str);
// Depending on the split pattern, the zero-width space may survive or be removed
- String "reset": Highly niche, but re-tokenizing and re-joining could be used to deliberately rebuild a string and "reset" certain Unicode state (e.g. clearing the strange double-spacing sometimes seen with Thai or Lao fonts), even though reconstructing the string makes no other visible change.
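Since the actual implementation of normalizeString() isn't shown, here is a minimal runnable sketch that assumes it splits on /\s+/ and re-joins with single spaces (a common normalization pattern); the hypothetical stripZeroWidth() helper is likewise an assumption, included to contrast the two cases above:

```javascript
// Hypothetical implementation -- the real normalizeString() may differ.
// Assumes tokenization on Unicode whitespace and re-joining with ' '.
function normalizeString(str) {
  return str.split(/\s+/).join(' ');
}

// Case 1: non-breaking space. In JavaScript, \s matches \u00A0,
// so the split consumes it and the join substitutes a plain space.
const nbsp = 'hello' + '\u00A0' + 'world';
console.log(normalizeString(nbsp) === 'hello world'); // true: \u00A0 is gone

// Case 2: zero-width space. \u200B is NOT Unicode whitespace,
// so this particular delimiter leaves it untouched.
const zwsp = 'hel' + '\u200B' + 'lo';
console.log(normalizeString(zwsp) === zwsp); // true: still 6 code units long

// A normalizer that also strips zero-width characters would remove it
// (hypothetical helper, not part of the original function):
function stripZeroWidth(str) {
  return str.replace(/[\u200B-\u200D\uFEFF]/g, '');
}
console.log(stripZeroWidth(zwsp) === 'hello'); // true
```

The takeaway is that the effect hinges entirely on the delimiter: \u00A0 falls inside \s and is rewritten, while \u200B only changes if the implementation targets zero-width characters explicitly.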
So in summary: while the normalizeString() function offers little utility for most string-processing needs, re-delimiting and re-joining a string can have a subtle impact on Unicode and other special characters whose meaning depends on precise string boundaries or spacing.