
sclang Tests

TL;DR: Regression tests for the SuperCollider lexer, parser, and interpreter.

Brian Heim, 28 Jan 2017

Overview

This document proposes the form and high-level implementation details of a series of rigorous regression tests for the SuperCollider lexer, parser, and interpreter. Although these are treated as three separate components for the purposes of testing, testing one component through the compiled language naturally involves some testing of the other two.

Definitions

Lexer: essentially, the method yylex() in PyrLexer.cpp. The component that handles character-to-character understanding of an input text, identifying tokens for the parser.

Parser: the method yyparse(), generated by the Bison file lang11d. The component that handles token-to-token understanding of an input text, identifying parse nodes for the interpreter.

Interpreter (or compiler): the component that transforms the AST generated by the parser into an interpreted sequence of byte codes. The code for this component lies in PyrParseNode.cpp.

Approach

There are three general approaches to testing:

  1. Full component stress-testing
  2. Targeted sub-component stress-testing
  3. Pseudorandom testing

Full component stress-testing involves testing the component by passing it input consisting of all possible strings of a given length. In addition to guaranteeing the performance of the component over a clearly bounded input space, this stress-tests garbage handling.

Targeted sub-component stress-testing means testing a specialized behavior of the component, again with all possible strings of a given length, but limiting the alphabet and/or grammar so that a longer string length can be tested.

Pseudorandom testing means traversing a very large input space by passing a random (usually uniformly distributed) subset of the possible input strings. Imagine each string as a number written in the base of the alphabet size, with a maximum value of (alphabet size)^(string length). For example, if the alphabet size is 16 and the desired string length is 6, a good pseudorandom test might start with the string represented by 0x000000, then test every "number" that is a multiple of a stride coprime with the alphabet size, such as 11 or 13. Since a stride coprime with the alphabet size is also coprime with the size of the whole space, stepping by it visits every string exactly once before wrapping around.
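
As a concrete sketch (not part of the original proposal), the traversal could look like this in sclang; the alphabet, stride, and test count here are illustrative assumptions:

(
var alphabet = "0123456789abcdef";
var base = alphabet.size, len = 6;
var total = (base ** len).asInteger;   // 16^6 = 16777216 possible strings
var stride = 13;                       // coprime with 16, hence with 16^6
var index = 0, numTests = 1000;
numTests.do {
    // decode index into its base-16 digits, most significant first
    var str = String.fill(len, { |i|
        alphabet[(index div: (base ** (len - 1 - i)).asInteger) % base]
    });
    str.postln; // the real test would lex/parse/interpret str here
    index = (index + stride) % total;
};
)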

Limits

A reasonable number of inputs to test in any stress test is (in the author's opinion) 10-20 million. This results in the following (soft) limits on alphabet size, assuming there are no limitations on grammar (i.e., all combinations are to be tested):

String Length | 10 million tests | 20 million tests
------------- | ---------------- | ----------------
1             | 10000000         | 20000000
2             | 3162             | 4472
3             | 215              | 271
4             | 56               | 67
5             | 25               | 29
6             | 15               | 16
7             | 10               | 11
8             | 7                | 8
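
For reference, these limits are just the test budget raised to the power 1/length, rounded to the nearest integer. A quick sclang check (an illustrative sketch, not part of the proposal):

(
var budget = 10e6; // or 20e6 for the second column
(1..8).do { |len|
    "length %: max alphabet size %"
        .format(len, budget.pow(len.reciprocal).round.asInteger).postln;
};
)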

Proposed Test Types

Lexer Tests

Alphabet: character literals in the range -128 to 127, excluding 0.

  • Full component test of strings up to 3 characters in length (see the sketch after this list)
  • Sub-component test of literal recognition (symbols, floats, radix floats, etc.) on strings of 4+ characters. (By my reckoning, the longest relevant string worth testing is 7 characters: -1.1e-5)
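
A sketch of the full component test over this alphabet: 255^3 is about 16.6 million inputs, which fills the 10-20 million budget. The recording step is a placeholder, and mapping signed char values onto sclang Chars is an assumption of this sketch:

(
var alphabet = (-128..127).reject { |c| c == 0 }
    .collect { |c| (c % 256).asAscii }; // map signed values to Chars
alphabet.do { |a|
    alphabet.do { |b|
        alphabet.do { |c|
            var str = [a, b, c].as(String);
            var result = try { str.interpret.class } { \error };
            // record result here, e.g. with the run-length encoding
            // described under "Encoding" below
        };
    };
};
)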

Parser Tests

Alphabet: any symbol recognized by Bison, plus some duplicates to test more complex behavior; the specific alphabet is TBD. This includes many of the 41 tokens explicitly defined at the top of lang11d, plus punctuation such as []{}();, and maybe more.

  • Full component tests of strings up to length 4 (see the sketch after this list)
  • Sub-component tests (specifics TBD)
  • Pseudorandom testing of strings up to length 10 (?)
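
A sketch of how parser inputs could be generated; the token list here is a small placeholder, and note that sclang's + joins two strings with a space:

(
var tokens = ["var", "a", "b", "=", "1", ";", "(", ")", "{", "}"];
// all length-2 token sequences: tokens.size squared inputs
tokens.do { |t1|
    tokens.do { |t2|
        var str = t1 + t2; // e.g. "var" + "a" -> "var a"
        // feed str to the parser (e.g. via .compile) and record the result
    };
};
)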

Compiler tests

The compiler has no specific alphabet. Rather, these tests will reuse input spaces from the lexer and parser tests, wrapping each input in a function literal and calling .def.code on it ({ <input> }.def.code) to get the byte code array.
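
For example (a sketch; the \error convention comes from the error-handling note under the lexer notepad below):

(
var input = "1 + 1";
var bytes = try { ("{ " ++ input ++ " }.def.code").interpret } { \error };
bytes.postln; // an Int8Array of byte codes, or \error
)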

Ideas for Specific Test Implementations

Storage

In an output file, store the result of the regression test for comparison with pre-validated output.

Encoding

The storage format should include some sort of compression, since many strings in the lexer and parser tests are likely to be garbage and fail, producing nil. One idea is to count the number of consecutive strings that produce the same result and encode that count as a base-10 number at the end of the current output line. An output-comparing utility method would have to be written (and tested!) to properly decode this.

The current idea is to give the input string as a sequence of hex character values, followed by the class of the result of interpretation or compilation. Encoding strings as hex ensures they do not interfere with delimiters and makes non-printing characters visible. In the case of the lexer, the recorded result could be both the class of the object returned by .interpret and the result of .asString called on that object.
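
A sketch of one possible output line under this scheme (the exact format is an assumption, not a settled design); the trailing count is the run-length described above:

(
var encodeLine = { |str, result, count = 1|
    var hex = str.ascii.collect { |c| (c % 256).asHexString(2) };
    "% % %".format(hex.join, result, count)
};
encodeLine.("abc", nil, 42).postln; // -> "616263 nil 42"
)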

Lexer testing notepad

For the main test, interpret each string in three ways:

  • first, the string alone;
  • second, the string prefixed by a known and easily identified expression;
  • third, the string suffixed by a known and easily identified expression.

Other tests do not need to be that rigorous.

Purpose: since we cannot directly test the output of the lexer without significant effort, we can use this to gain some insight into the lexer's behavior. Compare the results of these tests when run on a few test strings:

Test String | str alone      | "\\before;" ++ str | str ++ ";\\after"
----------- | -------------- | ------------------ | -----------------
"//"        | nil            | before             | nil
"10"        | 10             | 10                 | after
"abc"       | nil            | nil                | nil
"\\\'"      | (empty symbol) | (empty symbol)     | (empty symbol)

Thoughts: could also suffix with "\after" to catch inoperative strings like " ", or surround with { } (but this gives less info). How many cases should we limit ourselves to?

Component tests: make little grammars to test radix notation, accidental notation, hex notation, exponential notation, and unholy mixtures of those. Testing should focus on these because the interactions among the notations are complicated (see the sketch below).
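
One way to build such a grammar is as a cross product of notation fragments; the fragment lists here are illustrative, not exhaustive:

(
var signs   = ["", "-"];
var radixes = ["", "16r"];
var ints    = ["1", "1F"];
var fracs   = ["", ".5"];
var exps    = ["", "e-5"];
signs.do { |s| radixes.do { |r| ints.do { |i| fracs.do { |f| exps.do { |e|
    (s ++ r ++ i ++ f ++ e).postln; // e.g. "-16r1F.5e-5"
} } } } };
)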

Pseudorandom tests: maybe not needed for the lexer, but could lead to something interesting.

Error handling: some inputs will cause an error when interpreted, not lexed. This should be handled by surrounding the .interpret call with a try-catch. If the catch block is entered, set the result to some unique identifier like \error and use that as the recorded output.
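
In sclang the guard is a one-liner (a minimal sketch):

(
var result = try { "1.foo".interpret } { \error }; // runtime error -> \error
result.postln; // -> error
)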

Parser testing notepad

Tokens from Bison:


%token	NAME INTEGER SC_FLOAT ACCIDENTAL SYMBOL STRING ASCII PRIMITIVENAME CLASSNAME CURRYARG
%token  VAR ARG CLASSVAR SC_CONST
%token	NILOBJ TRUEOBJ FALSEOBJ
%token	PSEUDOVAR
%token  ELLIPSIS DOTDOT PIE BEGINCLOSEDFUNC
%token  BADTOKEN INTERPRET
%token  BEGINGENERATOR LEFTARROW WHILE
%left	':'
%right  '='
%left	BINOP KEYBINOP '-' '<' '>' '*' '+' '|' READWRITEVAR
%left	'.'
%right  '`'
%right  UMINUS

The alphabet should include at least 2 variable names and 2 real class names to test multiple declarations, and should also include punctuation characters not shown here.

We can probably test strings up to length 4; we might want to run the same tests both raw and wrapped in {}.

Methods to use

Some important methods here are .interpret, .compile, and .def.code. What to use when is still somewhat up in the air.
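
For reference, the rough division of labor among the three:

"1 + 1".compile;    // -> a Function (compiles without evaluating)
"1 + 1".interpret;  // -> 2 (compiles and evaluates)
{ 1 + 1 }.def.code; // -> the compiled byte codes as an Int8Array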

@mossheim

@telephon sorry just saw your comments! thanks for the ideas.

I like to avoid .join because it's a super slow method, but in this case it's not in any main performance-impacting loop so it would definitely improve readability.

I'm returning an IdentityDictionary now instead of Event, since all I really need is key-value pairs. Yes, it's quite slick!

Re: factoring out bits of the parser method, totally agree. I'll try to find a way to do it that doesn't involve a lot of argument passing. I wrote *parseHeader to be totally symmetric to *formatHeader, but since validation takes so much more effort I should move those subsections out.

Makes me glad to know someone is looking at this :)

@mossheim

tried refactoring *parseHeader but it's like reinventing the iostream wheel (especially with line reading). I'm going to refactor that function as a file parser tomorrow. reduces code and opportunities for cracks, although now the methods are asymmetrical (one formats a string, the other reads from a file). might also be practical to change *formatHeader to a file-writing method *writeHeader.

@mossheim

new code uploaded. still working on validation/diff method

@mossheim

just began a separate file for testing the tests... i really want this to be solid. also allowed me to move my preexisting validation code into a more durable format
