@leebyron
Last active April 29, 2022 02:52

Languages make various allowances for white-space. This is a short exploration of the parsers I found and what they claim to accept.

Key

Oct  Dec Char  Hex  Key Esc
\000   0  NUL  \x00  ^@ \0 (Null byte)
\010   8   BS  \x08  ^H \b (Backspace)
\011   9   HT  \x09  ^I \t (Horizontal tab)
\012  10   LF  \x0A  ^J \n (Line feed)  (Default UNIX NL)
\013  11   VT  \x0B  ^K    (Vertical tab)
\014  12   FF  \x0C  ^L \f (Form feed)
\015  13   CR  \x0D  ^M \r (Carriage return)
\040  32   " " \x20        (space)
               \u2028       (Line separator character)
               \u2029       (Paragraph separator character)
               \u00A0       (NO-BREAK SPACE <NBSP>)
               \uFEFF       (ZERO WIDTH NO-BREAK SPACE <ZWNBSP>)
               “Zs”         (Any other Unicode “Separator, space” code point <USP>)
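
The “Zs” row in the key covers a whole Unicode category rather than a single character. As a rough sketch, the members can be listed with Python's standard `unicodedata` module. The exact list depends on the Unicode version bundled with the interpreter, so no count is assumed here. Note that U+0020 and U+00A0 are themselves in Zs, while U+FEFF (category Cf), U+2028 (Zl) and U+2029 (Zp) are not, which is why grammars that accept them have to name them separately.

```python
import sys
import unicodedata

# Collect every code point whose general category is Zs ("Separator, space").
# Membership depends on the Unicode database shipped with this Python build.
zs = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

# Print each member as a code point followed by its Unicode name.
for ch in zs:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```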

Whitespace

Rust:

'\x20' | '\x09' | '\x0a' | '\x0d'

Python:

space, tab, linefeed, return, formfeed, and vertical tab.
(c == ' ' || c == '\t' || c == '\n' || c == '\014')
(line[i] == '#' || line[i] == '\n' || line[i] == '\r')

Ruby:

space, tab, vertical tab, backspace, carriage return, and form feed.

C#:

C.1.1 Line terminators
new-line:
Carriage return character (U+000D)
Line feed character (U+000A)
Carriage return character (U+000D) followed by line feed character (U+000A)
Line separator character (U+2028)
Paragraph separator character (U+2029)

C.1.2 White space
whitespace:
Any character with Unicode class Zs
Horizontal tab character (U+0009)
Vertical tab character (U+000B)
Form feed character (U+000C)

C++:

white-space ::= non-nl-white-space | newline
non-nl-white-space ::= a blank, tab, vertical tab, carriage return, or formfeed character

JavaScript:

WhiteSpaces : [\t\u000B\u000C\u0020\u00A0]+ -> channel(HIDDEN);
WhiteSpace ::
<TAB>
<VT>
<FF>
<SP>
<NBSP>
<ZWNBSP>
<USP>

LineTerminator ::
<LF>
<CR>
<LS>
<PS>
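
The two JavaScript productions above can be read as simple character predicates. The sketch below is one possible transcription in Python, with `<USP>` interpreted as "any code point in category Zs" per the key. The function and constant names are made up for illustration. It follows the `WhiteSpace ::` production, so unlike the one-line ANTLR rule quoted first it also accepts `<ZWNBSP>` and `<USP>`.

```python
import unicodedata

# Characters named explicitly in the WhiteSpace production:
# <TAB>, <VT>, <FF>, <SP>, <NBSP>, <ZWNBSP>.
JS_WHITESPACE_NAMED = {"\t", "\x0b", "\x0c", " ", "\u00a0", "\ufeff"}

# Characters in the LineTerminator production: <LF>, <CR>, <LS>, <PS>.
JS_LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}


def is_js_whitespace(ch: str) -> bool:
    """True if ch matches WhiteSpace; <USP> is taken as Unicode category Zs."""
    return ch in JS_WHITESPACE_NAMED or unicodedata.category(ch) == "Zs"


def is_js_line_terminator(ch: str) -> bool:
    """True if ch matches LineTerminator."""
    return ch in JS_LINE_TERMINATORS
```

Under this reading, line feed and carriage return are line terminators but not white space, so a caller that wants "anything skippable" would need to check both predicates.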

JSON:

WS  :   [ \t\n\r]+ -> skip ;

Lua:

WS : [ \t\u000C\r\n]+ -> skip

Swift:

WS : [ \n\r\t\u000B\u000C\u0000]+ -> channel(HIDDEN) ;

Java:

WS  :  [ \t\r\n\u000C]+ -> skip

Clojure:

WS : [ \n\r\t\,] ;

SQLite:

SPACES : [ \u000B\t\r\n] -> channel(HIDDEN);

SPARQL:

WS
    : (' '
    | '\t'
    | '\n'
    | '\r')+ ->skip
    ;

Scala:

fragment WhiteSpace :  '\u0020' | '\u0009' | '\u000D' | '\u000A';
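
To compare the single-rule lexer definitions quoted above (JSON through Scala), one option is to transcribe each character class into a Python set and look at what they share. This is only a sketch: the sets are copied by hand from the rules as quoted here, and the dictionary and helper names are illustrative.

```python
# Character classes transcribed from the lexer rules quoted above.
GRAMMAR_WS = {
    "JSON": set(" \t\n\r"),
    "Lua": set(" \t\x0c\r\n"),
    "Swift": set(" \n\r\t\x0b\x0c\x00"),
    "Java": set(" \t\r\n\x0c"),
    "Clojure": set(" \n\r\t,"),
    "SQLite": set(" \x0b\t\r\n"),
    "SPARQL": set(" \t\n\r"),
    "Scala": set(" \t\r\n"),
}

# Characters every listed grammar accepts, and characters any of them accepts.
common = set.intersection(*GRAMMAR_WS.values())
union = set.union(*GRAMMAR_WS.values())


def show(chars):
    """Render a set of characters as sorted U+XXXX labels."""
    return sorted(f"U+{ord(c):04X}" for c in chars)


print("accepted by all:", show(common))
for name, chars in GRAMMAR_WS.items():
    print(f"{name:8} extras:", show(chars - common))
```

With the sets as transcribed, the shared core is space, tab, line feed and carriage return; the differences are form feed (Lua, Swift, Java), vertical tab (Swift, SQLite), the null byte (Swift) and the comma (Clojure).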