@KamilaBorowska
Created September 5, 2012 17:36
/*
* My evil JavaScript parser tests. Mostly for syntax highlighters. Note that
* to pass these tests, a highlighter only needs to mark regexps correctly.
*
* The following regexps should be found, and no others.
*
* /1/i
* /2; /
* /[//*]/
* / 3 /
* / \//
* /4; /\u0069
* /5/i
* /6/i
* /7/i
* /8/i
* /9; /
*
* Everything in this test is valid ECMAScript. If you don't understand why
* a certain regexp should or should not be matched, read the specification.
*
* 2012 - GlitchMr
*/
// Highlighting any language correctly is not easy. Languages themselves are
// rather... ambiguous. And I'm not talking about Perl and Ruby. Neither
// language is statically parsable - the problem is worse in Perl, but Ruby
// has it too: functions and variables are parsed differently, so a +1 could
// be a variable plus 1 (addition) or a function a() called with argument +1.
//
// But this isn't about those languages - it's simply impossible to parse them
// correctly. The problem is that highlighters often cannot even highlight
// commonly known languages like JavaScript. One of the problems is called
// "automatic semicolon insertion" - many syntax highlighters aren't aware
// that you can write JavaScript without semicolons. The second problem is
// that certain constructs (parentheses/function calls, arrays/object access,
// dictionaries/blocks, regexps/division) are simply ambiguous - the parser
// chooses one of them depending on whether it expects an infix operator or
// an expression.
//
// In this test, regular expressions were chosen because editors usually
// highlight them differently and their syntax is ambiguous.
//
// Please note that even if this test fails it doesn't matter much - usually
// nobody writes code to intentionally break syntax highlighters. Unless
// they want to obfuscate code, but in that case I would use another
// language, such as Python.
//
// Do you want a language that is easy to highlight correctly? Well, try
// Brainfuck then.

// Wrapper function, because 'return' is only valid inside a function
(function () {
// Only three variables :). YAY!
var regular, notreturn, i = {};
// Regular expression in void context
// REGEXP, COMMENT
/1/i//
// DIVIDE BY, -8, DIVIDE BY, i, COMMENT, implied semicolon
/-8/i//
// TYPEOF, REGEXP, COMMENT, implied semicolon
typeof /2; ///
// VARIABLE (notreturn), DIVIDE BY, -9, ";", COMMENT (///)
notreturn /-9; ///
// Mysteriously, many syntax highlighters fail this test, so it makes sense to
// include it. If yours does, that is the only serious failure in this test,
// and one that is likely to show up in real code.
// -1, DIVIDE BY, -2, DIVIDE BY, -3, DIVIDE BY, -4, SEMICOLON
-1 / -2 / -3 / -4;
// An unescaped "/" inside a character class, as in [//*], is valid in ES5
// REGEXP, multiline comment
/[//*]//**/
// SEMICOLON
;
// REGEXP "/ 3 /" (with spaces), DIVIDE BY, REGEXP "/ \//" (a space and an
// escaped "/" character), COMMENT, implied semicolon
/ 3 // / \// //
// IF, "(", REGEXP, COMMENT
if(/4; /\u0069//)
// ")" (end of the if condition), REGEXP, semicolon
)/5/i;
// [][0], multiline comment
[][0]/*
// end of multiline comment, implied semicolon, ++, REGEXP (with i modifier), source property, semicolon
*/++/6/i.source;
// [][0], multiline comment, ++, DIVIDE BY, -10, DIVIDE BY, i.source, SEMICOLON
[][0]/**/++/-10/i.source;
// REGEXP, IN, REGEXP, implied semicolon
/7/i in/8/i
// Return of the regular expression
// RETURN, REGEXP, COMMENT, implied semicolon
return /9; ///
}())
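
// A minimal sketch of the "infix operator or expression" decision described
// in the comments at the top. It is not part of the original test, and the
// name slashStartsRegexp and its token-string argument are hypothetical. It
// guesses from the previous significant token alone, roughly what a simple
// highlighter might do, and it contains no regexp literals or divisions, so
// the list of expected regexps is unchanged.
function slashStartsRegexp(previousToken) {
    // Keywords that can be followed by an expression. An identifier such
    // as "notreturn" is not one of them.
    var keywords = ['return', 'typeof', 'in', 'instanceof', 'void',
        'delete', 'new', 'throw', 'case', 'do', 'else'];
    var first, i;
    // Nothing before the slash: an expression is expected.
    if (previousToken === null) {
        return true;
    }
    for (i = 0; i < keywords.length; i++) {
        if (previousToken === keywords[i]) {
            return true;
        }
    }
    // ")" and "]" usually close an expression, so guess division. The
    // guess is wrong for the ")" that closes an "if" condition.
    if (previousToken === ')' || previousToken === ']') {
        return false;
    }
    // Identifiers and numbers are operands, so an operator must follow.
    first = previousToken.charAt(0);
    if ((first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z') ||
            (first >= '0' && first <= '9') || first === '_' ||
            first === '$') {
        return false;
    }
    // Any other punctuator: guess regexp. Wrong after a postfix "++" or
    // after the "}" that closes an object literal.
    return true;
}
// Usage, assuming the caller has already split the source into tokens:
// slashStartsRegexp('typeof') is true, slashStartsRegexp('notreturn') is
// false, and slashStartsRegexp('++') is true even when "++" is postfix.
// Those wrong guesses are why a real parser has to track grammar context.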