Languages differ in which characters they accept as whitespace. This is a short exploration of the parsers and grammars I found and what they claim to accept.
Oct   Dec  Char  Hex   Key  Esc  Name
\000  0    NUL   \x00  ^@   \0   Null byte
\010  8    BS    \x08  ^H   \b   Backspace
\011  9    HT    \x09  ^I   \t   Horizontal tab <TAB>
\012  10   LF    \x0A  ^J   \n   Line feed <LF> (default Unix newline)
\013  11   VT    \x0B  ^K   \v   Vertical tab <VT>
\014  12   FF    \x0C  ^L   \f   Form feed <FF>
\015  13   CR    \x0D  ^M   \r   Carriage return <CR>
\040  32   " "   \x20            Space <SP>
Code point  Esc     Name
U+2028      \u2028  Line separator <LS>
U+2029      \u2029  Paragraph separator <PS>
U+00A0      \u00A0  No-break space <NBSP>
U+FEFF      \uFEFF  Zero width no-break space <ZWNBSP>
Zs          any other code point in Unicode general category Zs ("Separator, space") <USP>
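The table can be checked against the Unicode character database. A minimal Python sketch using only the standard unicodedata module; TABLE and describe are hypothetical names, and the list is transcribed by hand from the table. It shows that only SP and NBSP among the listed characters have category Zs, so a bare "category Zs" test (the <USP> rule) does not cover ZWNBSP, LS or PS.

```python
import unicodedata

# Code points listed in the table above (transcribed by hand).
TABLE = ["\x00", "\x08", "\t", "\n", "\x0b", "\x0c", "\r", " ",
         "\u2028", "\u2029", "\u00a0", "\ufeff"]

def describe(ch: str) -> tuple[str, str, str, str]:
    """Return (octal, decimal, U+ hex, Unicode general category) for one character."""
    cp = ord(ch)
    return (f"\\{cp:03o}", str(cp), f"U+{cp:04X}", unicodedata.category(ch))

for ch in TABLE:
    print(*describe(ch))

# Only these table entries would be caught by a plain "category Zs" check.
zs = [f"U+{ord(c):04X}" for c in TABLE if unicodedata.category(c) == "Zs"]
print(zs)  # ['U+0020', 'U+00A0']
```

ZWNBSP is category Cf, and LS and PS are Zl and Zp, which is why a spec such as ECMAScript names them explicitly instead of relying on <USP>.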
Rust:
'\x20' | '\x09' | '\x0a' | '\x0d'
Python:
Documented contents of string.whitespace: space, tab, linefeed, return, formfeed, and vertical tab.
(c == ' ' || c == '\t' || c == '\n' || c == '\014')  /* '\014' is octal for form feed (FF) */
(line[i] == '#' || line[i] == '\n' || line[i] == '\r')  /* comment start, or end of line */
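The documented set can be compared with Python's runtime behavior. A quick check, assuming CPython 3: it contrasts the ASCII-only string.whitespace constant with the Unicode-aware str.isspace(). Both are string-level predicates, not the tokenizer; per the language reference, the tokenizer treats space, tab and formfeed as interchangeable token separators.

```python
import string

# string.whitespace is the ASCII-only constant described above.
print(repr(string.whitespace))  # ' \t\n\r\x0b\x0c'

# str.isspace() is Unicode-aware: Zs characters count, and so do characters
# whose bidirectional class is WS, B or S (for example U+2028).
for ch in ["\x0b", "\x0c", "\x08", "\u00a0", "\u2028", "\ufeff"]:
    print(f"U+{ord(ch):04X}", ch.isspace())
```

On CPython this reports VT, FF, NBSP and U+2028 as whitespace, but not backspace or ZWNBSP.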
Ruby:
space, tab, vertical tab, backspace, carriage return, and form feed.
C#:
C.1.1 Line terminators
new-line:
Carriage return character (U+000D)
Line feed character (U+000A)
Carriage return character (U+000D) followed by line feed character (U+000A)
Line separator character (U+2028)
Paragraph separator character (U+2029)
C.1.2 White space
whitespace:
Any character with Unicode class Zs
Horizontal tab character (U+0009)
Vertical tab character (U+000B)
Form feed character (U+000C)
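The C# grammar can be sketched as two predicates. A minimal Python illustration, assuming single-character input; is_cs_whitespace, cs_newline_length and count_lines are hypothetical names, not part of any C# compiler. The key detail is that CR followed by LF must be consumed as one new-line, so that pair is checked before the single-character cases.

```python
import unicodedata

def is_cs_whitespace(ch: str) -> bool:
    """C# whitespace per the grammar above: any Zs character, or HT, VT, FF."""
    return unicodedata.category(ch) == "Zs" or ch in "\t\x0b\x0c"

def cs_newline_length(text: str, i: int) -> int:
    """Length of the C# new-line starting at text[i], or 0 if there is none.

    CR LF is matched first so the pair counts as a single line terminator.
    """
    if text.startswith("\r\n", i):
        return 2
    if i < len(text) and text[i] in "\r\n\u2028\u2029":
        return 1
    return 0

def count_lines(text: str) -> int:
    """Count line terminators in text using the new-line rule above."""
    i = n = 0
    while i < len(text):
        step = cs_newline_length(text, i)
        if step:
            n += 1
            i += step
        else:
            i += 1
    return n

print(count_lines("a\r\nb\rc\nd\u2028e"))  # 4
```

Treating CR and LF separately would count "a\r\nb" as two line breaks instead of one.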
C++:
white-space ::= non-nl-white-space | newline
non-nl-white-space ::= a blank, tab, vertical tab, carriage return, or formfeed character
JavaScript:
ANTLR lexer rule:
WhiteSpaces : [\t\u000B\u000C\u0020\u00A0]+ -> channel(HIDDEN);
ECMAScript specification:
WhiteSpace ::
<TAB>
<VT>
<FF>
<SP>
<NBSP>
<ZWNBSP>
<USP>
LineTerminator ::
<LF>
<CR>
<LS>
<PS>
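The ECMAScript productions read naturally as two predicates, and comparing them with the ANTLR WhiteSpaces rule shows what that rule leaves out. A minimal Python sketch, assuming <USP> means "general category Zs"; all names are hypothetical.

```python
import unicodedata

# WhiteSpace code points named explicitly in the spec production.
JS_NAMED_WS = {"\t", "\x0b", "\x0c", " ", "\u00a0", "\ufeff"}
# LineTerminator code points: <LF>, <CR>, <LS>, <PS>.
JS_LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}

def is_js_whitespace(ch: str) -> bool:
    """Spec WhiteSpace: one of the named characters, or any <USP> (category Zs)."""
    return ch in JS_NAMED_WS or unicodedata.category(ch) == "Zs"

def is_js_line_terminator(ch: str) -> bool:
    return ch in JS_LINE_TERMINATORS

# Characters matched by the ANTLR rule [\t\u000B\u000C\u0020\u00A0].
ANTLR_WS = {"\t", "\x0b", "\x0c", " ", "\u00a0"}

# Spec whitespace (among a few samples) that the ANTLR rule would not match.
samples = ["\ufeff", "\u2003", "\u3000", "\u00a0"]
missing = [f"U+{ord(c):04X}" for c in samples
           if is_js_whitespace(c) and c not in ANTLR_WS]
print(missing)  # ['U+FEFF', 'U+2003', 'U+3000']
```

U+2003 (em space) and U+3000 (ideographic space) are examples of <USP>; the ANTLR rule covers neither them nor ZWNBSP.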
JSON:
WS : [ \t\n\r]+ -> skip ;
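Python's standard json module makes a convenient strict parser for checking this rule: it accepts those four characters around values and rejects others such as form feed, vertical tab or NBSP. This is only an illustration with one implementation, assuming CPython 3.

```python
import json

# JSON allows only space, tab, LF and CR around values.
print(json.loads(" \t\n\r[1,\n 2]\r\n "))  # [1, 2]

# Form feed, vertical tab or NBSP next to a value is rejected.
for bad in ["\x0c1", "1\x0b", "\u00a01"]:
    try:
        json.loads(bad)
    except json.JSONDecodeError as err:
        print(repr(bad), "->", err.msg)
```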
Lua:
WS : [ \t\u000C\r\n]+ -> skip ;
Swift:
WS : [ \n\r\t\u000B\u000C\u0000]+ -> channel(HIDDEN) ;
Java:
WS : [ \t\r\n\u000C]+ -> skip ;
Clojure:
WS : [ \n\r\t\,] ;
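The comma in that class is deliberate: Clojure's reader treats commas as whitespace, so [1, 2, 3] and [1 2 3] read the same. A naive Python sketch that splits flat source on the same character class; split_forms is a hypothetical helper and ignores strings, nesting and comments.

```python
import re

# Character class copied from the rule above: space, LF, CR, tab, comma.
CLOJURE_WS = re.compile(r"[ \n\r\t,]+")

def split_forms(src: str) -> list[str]:
    """Split flat Clojure source on whitespace, dropping empty pieces."""
    return [tok for tok in CLOJURE_WS.split(src) if tok]

print(split_forms("1, 2,3\n:a ,:b"))  # ['1', '2', '3', ':a', ':b']
```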
SQLite:
SPACES : [ \u000B\t\r\n] -> channel(HIDDEN);
SPARQL:
WS : (' ' | '\t' | '\n' | '\r')+ -> skip ;
Scala:
fragment WhiteSpace : '\u0020' | '\u0009' | '\u000D' | '\u000A';
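To compare the character-set rules side by side, a small Python sketch transcribes each set as quoted above and reports what it accepts beyond the common core of space, tab, LF and CR. The transcription is manual and assumes the quoted rules are complete; RULES, NAMED, BASELINE and extras are hypothetical names.

```python
# Whitespace sets transcribed by hand from the rules above.
NAMED = {"\x00": "NUL", "\t": "HT", "\n": "LF", "\x0b": "VT",
         "\x0c": "FF", "\r": "CR", " ": "SP", ",": "comma"}

RULES = {
    "Rust": set(" \t\n\r"),
    "JSON": set(" \t\n\r"),
    "Lua": set(" \t\x0c\r\n"),
    "Swift": set(" \n\r\t\x0b\x0c\x00"),
    "Java": set(" \t\r\n\x0c"),
    "Clojure": set(" \n\r\t,"),
    "SQLite": set(" \x0b\t\r\n"),
    "SPARQL": set(" \t\n\r"),
    "Scala": set(" \t\r\n"),
}

# Space, tab, LF and CR: the core that every rule in RULES accepts.
BASELINE = set(" \t\n\r")

def extras(lang: str) -> list[str]:
    """Names of the characters a rule accepts beyond the baseline, sorted."""
    return sorted(NAMED[c] for c in RULES[lang] - BASELINE)

for lang in RULES:
    print(f"{lang:8} {', '.join(extras(lang)) or '(baseline only)'}")

print([lang for lang, chars in RULES.items() if "\x0b" in chars])  # ['Swift', 'SQLite']
```

Swift stands out by also accepting NUL, only Swift and SQLite accept VT, and Clojure's comma is the one printable addition.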