
sclang Tests

TL;DR: Regression tests for the SuperCollider lexer, parser, and interpreter.

Brian Heim, 28 Jan 2017

Overview

This document proposes the form and high-level implementation details of a series of rigorous regression tests for the SuperCollider lexer, parser, and interpreter. Although these are treated as three separate components for the purposes of testing, testing one component through the compiled language naturally involves some testing of the other two.

Definitions

Lexer: essentially, the method yylex() in PyrLexer.cpp. The component that handles character-to-character understanding of an input text, identifying tokens for the parser.

Parser: the method yyparse(), generated by the Bison file lang11d. The component that handles token-to-token understanding of an input text, identifying parse nodes for the interpreter.

Interpreter (or compiler): the component that transforms the AST generated by the parser into an interpreted sequence of byte codes. The code for this component lies in PyrParseNode.cpp.

Approach

There are three general approaches to testing:

  1. Full component stress-testing
  2. Targeted sub-component stress-testing
  3. Pseudorandom testing

Full component stress-testing involves testing the component by passing it input consisting of all possible strings of a given length. In addition to guaranteeing the performance of the component over a clearly bounded input space, this stress-tests garbage handling.

Targeted sub-component stress-testing means testing a specialized behavior of the component, again with all possible strings of a given length, but limiting the alphabet and/or grammar so that a longer string length can be tested.

Pseudorandom testing means traversing a very large input space by passing a random (usually uniformly distributed) subset of the possible input strings. Imagine each string as a number written in the base of the alphabet size, with a maximum value of (alphabet size)^(string length). For example, if the alphabet size is 16 and the desired string length is 6, a good pseudorandom test might start with the string represented by 0x000000, then test every "number" that is a multiple of a stride coprime with the alphabet size, such as 11 or 13. Since a stride coprime with the alphabet size is also coprime with the size of the whole space, stepping by it visits every string exactly once before wrapping around.
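
As a concrete sketch (not part of the original proposal), the traversal could look like this in sclang; the alphabet, stride, and test count here are illustrative assumptions:

(
var alphabet = "0123456789abcdef";
var base = alphabet.size, len = 6;
var total = (base ** len).asInteger;   // 16^6 = 16777216 possible strings
var stride = 13;                       // coprime with 16, hence with 16^6
var index = 0, numTests = 1000;
numTests.do {
    // decode index into its base-16 digits, most significant first
    var str = String.fill(len, { |i|
        alphabet[(index div: (base ** (len - 1 - i)).asInteger) % base]
    });
    str.postln; // the real test would lex/parse/interpret str here
    index = (index + stride) % total;
};
)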

Limits

A reasonable number of inputs to test in any stress test is (in the author's opinion) 10-20 million. This results in the following (soft) limits on alphabet size, assuming there are no limitations on grammar (i.e., all combinations are to be tested):

String Length | 10 million tests | 20 million tests
------------- | ---------------- | ----------------
1             | 10000000         | 20000000
2             | 3162             | 4472
3             | 215              | 271
4             | 56               | 67
5             | 25               | 29
6             | 15               | 16
7             | 10               | 11
8             | 7                | 8
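
For reference, these limits are just the test budget raised to the power 1/length, rounded to the nearest integer. A quick sclang check (an illustrative sketch, not part of the proposal):

(
var budget = 10e6; // or 20e6 for the second column
(1..8).do { |len|
    "length %: max alphabet size %"
        .format(len, budget.pow(len.reciprocal).round.asInteger).postln;
};
)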

Proposed Test Types

Lexer Tests

Alphabet: character literals in the range -128 to 127, excluding 0.

  • Full component test of strings up to 3 characters in length (see the sketch after this list)
  • Sub-component test of literal recognition (symbols, floats, radix floats, etc.) on strings of 4+ characters. (By my reckoning, the longest relevant string worth testing is 7 characters: -1.1e-5)
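
A sketch of the full component test over this alphabet: 255^3 is about 16.6 million inputs, which fills the 10-20 million budget. The recording step is a placeholder, and mapping signed char values onto sclang Chars is an assumption of this sketch:

(
var alphabet = (-128..127).reject { |c| c == 0 }
    .collect { |c| (c % 256).asAscii }; // map signed values to Chars
alphabet.do { |a|
    alphabet.do { |b|
        alphabet.do { |c|
            var str = [a, b, c].as(String);
            var result = try { str.interpret.class } { \error };
            // record result here, e.g. with the run-length encoding
            // described under "Encoding" below
        };
    };
};
)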

Parser Tests

Alphabet: any symbol recognized by Bison, plus some duplicates to test more complex behavior; the specific alphabet is TBD. This includes many of the 41 tokens explicitly defined at the top of lang11d, plus punctuation such as []{}();, and maybe more.

  • Full component tests of strings up to length 4 (see the sketch after this list)
  • Sub-component tests (specifics TBD)
  • Pseudorandom testing of strings up to length 10 (?)
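
A sketch of how parser inputs could be generated; the token list here is a small placeholder, and note that sclang's + joins two strings with a space:

(
var tokens = ["var", "a", "b", "=", "1", ";", "(", ")", "{", "}"];
// all length-2 token sequences: tokens.size squared inputs
tokens.do { |t1|
    tokens.do { |t2|
        var str = t1 + t2; // e.g. "var" + "a" -> "var a"
        // feed str to the parser (e.g. via .compile) and record the result
    };
};
)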

Compiler tests

The compiler has no specific alphabet. Rather, these tests will reuse input spaces from the lexer and parser tests, wrapping each input in a function literal and calling .def.code on it ({ <input> }.def.code) to get the byte code array.
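
For example (a sketch; the \error convention comes from the error-handling note under the lexer notepad below):

(
var input = "1 + 1";
var bytes = try { ("{ " ++ input ++ " }.def.code").interpret } { \error };
bytes.postln; // an Int8Array of byte codes, or \error
)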

Ideas for Specific Test Implementations

Storage

In an output file, store the result of the regression test for comparison with pre-validated output.

Encoding

The storage format should include some sort of compression, since many strings in the lexer and parser tests are likely to be garbage and fail, producing nil. One idea is to count the number of consecutive strings that produce the same result and encode that count as a base-10 number at the end of the current output line. An output-comparing utility method would have to be written (and tested!) to properly decode this.

The current idea is to give the input string as a sequence of hex character values, followed by the class of the result of interpretation or compilation. Encoding strings as hex ensures they do not interfere with delimiters and makes non-printing characters visible. In the case of the lexer, the recorded result could be both the class of the object returned by .interpret and the result of .asString called on that object.
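
A sketch of one possible output line under this scheme (the exact format is an assumption, not a settled design); the trailing count is the run-length described above:

(
var encodeLine = { |str, result, count = 1|
    var hex = str.ascii.collect { |c| (c % 256).asHexString(2) };
    "% % %".format(hex.join, result, count)
};
encodeLine.("abc", nil, 42).postln; // -> "616263 nil 42"
)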

Lexer testing notepad

For the main test, interpret each string in three ways:

  • first, the string alone;
  • second, the string prefixed by a known and easily identified expression;
  • third, the string suffixed by a known and easily identified expression.

Other tests do not need to be that rigorous.

Purpose: since we cannot directly test the output of the lexer without significant effort, we can use this to gain some insight into the lexer's behavior. Compare the results of these tests when run on a few test strings:

Test String | str alone      | "\\before;" ++ str | str ++ ";\\after"
----------- | -------------- | ------------------ | -----------------
"//"        | nil            | before             | nil
"10"        | 10             | 10                 | after
"abc"       | nil            | nil                | nil
"\\\'"      | (empty symbol) | (empty symbol)     | (empty symbol)

Thoughts: could also suffix with "\after" to catch inoperative strings like " ", or surround with { } (but this gives less info). How many cases should we limit ourselves to?

Component tests: make little grammars to test radix notation, accidental notation, hex notation, exponential notation, and unholy mixtures of those. Testing should focus on these because the interactions among the notations are complicated (see the sketch below).
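
One way to build such a grammar is as a cross product of notation fragments; the fragment lists here are illustrative, not exhaustive:

(
var signs   = ["", "-"];
var radixes = ["", "16r"];
var ints    = ["1", "1F"];
var fracs   = ["", ".5"];
var exps    = ["", "e-5"];
signs.do { |s| radixes.do { |r| ints.do { |i| fracs.do { |f| exps.do { |e|
    (s ++ r ++ i ++ f ++ e).postln; // e.g. "-16r1F.5e-5"
} } } } };
)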

Pseudorandom tests: maybe not needed for the lexer, but could lead to something interesting.

Error handling: some inputs will cause an error when interpreted, not lexed. This should be handled by surrounding the .interpret call with a try-catch. If the catch block is entered, set the result to some unique identifier like \error and use that as the recorded output.
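
In sclang the guard is a one-liner (a minimal sketch):

(
var result = try { "1.foo".interpret } { \error }; // runtime error -> \error
result.postln; // -> error
)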

Parser testing notepad

Tokens from Bison:


%token	NAME INTEGER SC_FLOAT ACCIDENTAL SYMBOL STRING ASCII PRIMITIVENAME CLASSNAME CURRYARG
%token  VAR ARG CLASSVAR SC_CONST
%token	NILOBJ TRUEOBJ FALSEOBJ
%token	PSEUDOVAR
%token  ELLIPSIS DOTDOT PIE BEGINCLOSEDFUNC
%token  BADTOKEN INTERPRET
%token  BEGINGENERATOR LEFTARROW WHILE
%left	':'
%right  '='
%left	BINOP KEYBINOP '-' '<' '>' '*' '+' '|' READWRITEVAR
%left	'.'
%right  '`'
%right  UMINUS

The alphabet should include at least 2 variable names and 2 real class names to test multiple declarations, and should also include punctuation characters not shown here.

We can probably test strings up to length 4; we might want to run the same tests both raw and wrapped in {}.

Methods to use

Some important methods here are .interpret, .compile, and .def.code. What to use when is still somewhat up in the air.
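
For reference, the rough division of labor among the three:

"1 + 1".compile;    // -> a Function (compiles without evaluating)
"1 + 1".interpret;  // -> 2 (compiles and evaluates)
{ 1 + 1 }.def.code; // -> the compiled byte codes as an Int8Array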

@mossheim

@telephon sorry just saw your comments! thanks for the ideas.

I like to avoid .join because it's a super slow method, but in this case it's not in any main performance-impacting loop so it would definitely improve readability.

I'm returning an IdentityDictionary now instead of Event, since all I really need is key-value pairs. Yes, it's quite slick!

Re: factoring out bits of the parser method, totally agree. I'll try to find a way to do it that doesn't involve a lot of argument passing. I wrote *parseHeader to be totally symmetric to *formatHeader, but since validation takes so much more effort I should move those subsections out.

Makes me glad to know someone is looking at this :)

@mossheim

tried refactoring *parseHeader but it's like reinventing the iostream wheel (especially with line reading). I'm going to refactor that function as a file parser tomorrow. reduces code and opportunities for cracks, although now the methods are asymmetrical (one formats a string, the other reads from a file). might also be practical to change *formatHeader to a file-writing method *writeHeader.

@mossheim

new code uploaded. still working on validation/diff method

@mossheim

just began a separate file for testing the tests... i really want this to be solid. also allowed me to move my preexisting validation code into a more durable format
