
@mossheim
Last active March 1, 2017 05:18
Regression tests for the SuperCollider lexer, parser, and interpreter.

sclang Tests

TL;DR: Regression tests for the SuperCollider lexer, parser, and interpreter.

Brian Heim, 28 Jan 2017

Overview

This document proposes the form and high-level implementation details of a series of rigorous regression tests for the SuperCollider lexer, parser, and interpreter. Although these are treated as three separate components for the purposes of testing, testing one component through the compiled language naturally involves some testing of the other two.

Definitions

Lexer: essentially, the method yylex() in PyrLexer.cpp. The component that handles character-to-character understanding of an input text, identifying tokens for the parser.

Parser: the method yyparse(), generated by the Bison file lang11d. The component that handles token-to-token understanding of an input text, identifying parse nodes for the interpreter.

Interpreter (or compiler): the component that transforms the AST generated by the parser into an interpreted sequence of byte codes. The code for this component lies in PyrParseNode.cpp.

Approach

There are three general approaches to testing:

  1. Full component stress-testing
  2. Targeted sub-component stress-testing
  3. Pseudorandom testing

Full component stress-testing involves testing the component by passing it input consisting of all possible strings of a given length. In addition to guaranteeing the performance of the component over a clearly bounded input space, this stress-tests garbage handling.

Targeted sub-component stress-testing means testing a specialized behavior of the component, again with all possible strings of a given length, but limiting the alphabet and/or grammar so as to test a larger string length.

Pseudorandom testing means traversing a very large input space by passing a random (usually uniformly distributed) subset of the possible input strings. Imagine the string as a number written in the base of the alphabet size, with a maximum value of (alphabet size)^(string length). For example, if the alphabet size is 16 and the desired string length is 6, a good pseudorandom test might involve starting with the string represented by 0x000000, then testing every "number" that is a multiple of a number coprime with the alphabet size, such as 11 or 13.
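
As a rough sketch in sclang (hypothetical names, not part of any existing test code), such a traversal might look like this:

(
var alphabet = "0123456789abcdef";                          // alphabet size 16
var stringLength = 6;
var numStrings = (alphabet.size ** stringLength).asInteger; // 16^6 = 16777216
var stride = 13;                                            // coprime with 16
var indexToString = { |index|
	Array.fill(stringLength, {
		var digit = index % alphabet.size;
		index = index div: alphabet.size;
		alphabet[digit]
	}).reverse.as(String)   // most significant digit first
};
0.forBy(numStrings - 1, stride) { |index|
	var str = indexToString.(index);
	// ... feed str to the lexer/parser/interpreter under test here ...
};
)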

Limits

A reasonable number of inputs to test for any stress-test is (in the author's opinion) 10-20 million. This results in the following (soft) limits on alphabet size, assuming there are no limitations on grammar (i.e., all combinations are to be tested):

String Length    10 million tests    20 million tests
1                10000000            20000000
2                3162                4472
3                215                 271
4                56                  67
5                25                  29
6                15                  16
7                10                  11
8                7                   8
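
These limits are simply the Nth root of the test budget, rounded to the nearest integer; a quick sketch that reproduces the table:

(
[10e6, 20e6].do { |budget|
	(1..8).do { |len|
		"% tests, string length %: max alphabet size about %"
			.format(budget.asInteger, len, (budget ** (1 / len)).round.asInteger).postln;
	};
};
)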

Proposed Test Types

Lexer Tests

Alphabet: character literals in the range -128 to 127, excluding 0.

  • Full component test of strings up to 3 characters in length
  • Sub-component test of literal recognition (symbols, floats, radix floats, etc.) of strings 4+ characters in length. (By my reckoning, the longest relevant string worth testing is 7 characters: -1.1e-5)

Parser Tests

Alphabet: any symbol recognized by Bison, plus some duplicates to test more complex behavior. The specific alphabet is TBD. This includes many of the 41 tokens explicitly defined at the top of lang11d, plus punctuation such as []{}();, and maybe more.

  • Full component tests of strings up to length 4
  • Sub-component tests??
  • Pseudorandom testing of strings up to length 10 (?)

Compiler tests

The compiler has no specific alphabet. Rather, these tests will reuse input spaces from the lexer and parser tests, wrapping each input in { } and calling .def.code on the resulting function to get the byte code array.
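
For example (a minimal illustration rather than the actual harness), for the input 1 + 2:

{ 1 + 2 }.def.code;                       // -> an Int8Array of byte codes
"{ 1 + 2 }.def.code".interpret.postln;    // the same array, reached via .interpret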

Ideas for Specific Test Implementations

Storage

In an output file, store the result of the regression test for comparison with pre-validated output.

Encoding

The storage format should include some sort of compression, since many strings in the lexer and parser tests are likely to be garbage and fail, producing nil. One idea is simply to count the number of consecutive strings that produce the same result and encode that count as a base-10 number at the end of the current output line. An output-comparing utility method would have to be written (and tested!) to properly decode this.
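
A minimal sketch of that compression idea (hypothetical helper; assumes the results have already been formatted as one string per input):

(
var runLengthEncode = { |results|   // results: one formatted string per input
	var out = List.new, prev = nil, count = 0;
	results.do { |item|
		if (item == prev) {
			count = count + 1;
		} {
			if (prev.notNil) { out.add(prev ++ " " ++ count) };
			prev = item;
			count = 1;
		};
	};
	if (prev.notNil) { out.add(prev ++ " " ++ count) };
	out
};
runLengthEncode.(["nil", "nil", "nil", "Integer", "nil"]).postln;
// -> List[ nil 3, Integer 1, nil 1 ]
)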

The current idea is to give the input string as a hex sequence (to make sure that non-printing characters are correctly shown) followed by the class of the result of interpretation or compilation. In the case of the lexer, this could be both the class of the object returned by .interpret and the result of .asString called on that object.

Strings should be encoded as strings of hex character values to ensure they do not interfere with delimiters, and so that non-printing characters are made visible.
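
A sketch of both directions (hypothetical helpers; not necessarily how the implementation gist's stringToHexString is written):

(
var stringToHex = { |str|
	str.as(Array).collect({ |char| (char.ascii bitAnd: 255).asHexString(2) }).join
};
var hexToString = { |hex|
	hex.clump(2).collect({ |pair| (pair[0].digit * 16 + pair[1].digit).asAscii }).as(String)
};
stringToHex.("\\before;").postln;               // -> 5C6265666F72653B
hexToString.(stringToHex.("\\before;")).postln; // round-trips to \before;
)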

Lexer testing notepad

For the main test, .interpret the string in three ways:

  • first, the string alone;
  • second, the string prefixed by a known and easily identified expression;
  • third, the string suffixed by a known and easily identified expression.

Other tests do not need to be that rigorous.

Purpose: since we cannot directly test the output of the lexer without significant effort, we can use this to gain some insight into the lexer's behavior. Compare the results of these tests when run on a few test strings:

Test String    str               "\\before;" ++ str    str ++ ";\\after"
"//"           nil               before                nil
"10"           10                10                    after
"abc"          nil               nil                   nil
"\\\'"         (empty symbol)    (empty symbol)        (empty symbol)

Thoughts: could also suffix with "\after" to catch inoperative strings like " ", or surround with { } (but this gives less info). How many cases should we limit ourselves to?

Component tests: make little grammars to test radix notation, accidental notation, hex notation, exponential notation, and unholy mixtures of those. Testing should focus on these because the interactions among the notations are complicated.
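
For example, a targeted alphabet for those interactions might be built from literal fragments (a sketch only; the exact set is TBD):

(
var fragments = ["1", "6", "0", "r", ".", "e", "-", "s", "b", "x"];
// every 2-fragment combination; longer combinations follow the same pattern
var pairs = fragments.collect { |a| fragments.collect { |b| a ++ b } }.flatten;
pairs.postln;  // candidate strings such as "16", "1e", "0x", "r.", "6s", ...
)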

Pseudorandom tests: maybe not needed for the lexer, but could lead to something interesting.

Error handling: some inputs will cause an error when interpreted, not lexed. This should be handled by surrounding the .interpret call with a try-catch. If the catch block is entered, set the result to some unique identifier like \error and use that as the recorded output.
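
A sketch of that guard (hypothetical wrapper around .interpret):

(
var safeInterpret = { |str|
	try { str.interpret } { |error| \error }   // record any thrown error as \error
};
safeInterpret.("1 +").postln;       // fails to compile -> nil
safeInterpret.("nil.foo").postln;   // compiles but throws -> \error
)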

Parser testing notepad

Tokens from Bison:


%token	NAME INTEGER SC_FLOAT ACCIDENTAL SYMBOL STRING ASCII PRIMITIVENAME CLASSNAME CURRYARG
%token  VAR ARG CLASSVAR SC_CONST
%token	NILOBJ TRUEOBJ FALSEOBJ
%token	PSEUDOVAR
%token  ELLIPSIS DOTDOT PIE BEGINCLOSEDFUNC
%token  BADTOKEN INTERPRET
%token  BEGINGENERATOR LEFTARROW WHILE
%left	':'
%right  '='
%left	BINOP KEYBINOP '-' '<' '>' '*' '+' '|' READWRITEVAR
%left	'.'
%right  '(backtick)'
%right  UMINUS

The alphabet should have at least 2 variable names and 2 real class names to test multiple declarations. It should also include punctuation characters not shown here.

Can probably test strings up to length 4; might want to run the same tests both raw and within { }?

Methods to use

Some important methods here are .interpret, .compile, and .def.code. Which to use when is still somewhat up in the air.
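
Roughly, for the same input the three give:

"1 + 2".interpret;          // -> 3 (compiles and evaluates)
"1 + 2".compile;            // -> a Function, or nil if the string fails to compile
"1 + 2".compile.def.code;   // -> the Int8Array of byte codes for that Function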

@mossheim
Author

goal for tomorrow: assemble testAllStrings[WithX] and compareToExpected

@mossheim
Author

mossheim commented Feb 19, 2017

the output file should also begin with the number of strings being tested

also put the prefix, suffix, and alphabet in the file. add an argument to testAllStrings for adding a unique ID to the filename so that different suffix/prefix test combos may be distinguished

(prefix, suffix, and alphabet should be encoded as hex strings for compatibility)

@mossheim
Author

add an explanatory line to each file to describe the encoding style. use HEAD and DATA to mark the starts of these two sections.

@mossheim
Author

mossheim commented Feb 19, 2017

full format:

HEAD
alphabet size: <number of items in alphabet>
alphabet: <csv list of hex strings>
prefix: <hex string>
suffix: <hex string>
method: <one of interpret, compile, def.code, ...>
encoding: <description of encoding, does not need to be in any special format; the most human-readable thing here>
DATA
numStrings
encoded string & result 1
encoded string & result 2
...
encoded string & result N

note that this is loosely inspired by the format of a .wav soundfile's fmt and data chunks
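
A hypothetical example of a small file in this format (all values made up for illustration; the per-line layout is whatever the encoding field describes):

HEAD
alphabet size: 2
alphabet: 61,62
prefix:
suffix: 3B5C6166746572
method: interpret
encoding: one line per input: <hex of input string> <class of .interpret result>
DATA
4
6161 Nil
6162 Nil
6261 Nil
6262 Nil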

@mossheim
Author

mossheim commented Feb 19, 2017

First attempts:

  • Header generation/parsing
  • A few other convenience methods
  • Stub for testAllPossibleStrings

https://gist.github.com/brianlheim/d8eb73e7a124b4c55ec774fb566c934f

Update 1 (12:46 PM): fleshed out the main loop after much benchmarking, now getting into more tricky testing/interface challenges

@mossheim
Author

LOTS done on this today. Practically functional output although prefixing/suffixing has not yet been benchmarked or tested.
Will also want to go over this with a fine-tooth comb to confirm it works, and to test unrolled output against the "compressed" version.

Also, the main function is bulky and could definitely use another pass.

Next up is writing the validation/verification function to point out and explain differences between output files.

A good test to verify this test might be picking random strings out of the entire combination space, replacing them with obviously different ones, and seeing if the difference is correctly identified.

Some way to accommodate one file being longer than the other should also be found. E.g., in the near future we may be able to add ^ back to the alphabet for command-line parsing, in which case the diff function should recognize that there are additions, rather than treating everything after the nth line as totally different. Lexicographical/numerical ordering....

@telephon

telephon commented Feb 20, 2017

Thank you for all that work - it'll be tremendously helpful. I'm just reading the code, and have only really minor comments (nothing important).

https://gist.github.com/brianlheim/d8eb73e7a124b4c55ec774fb566c934f#file-sclangtestsimplementation-sc-L394

can be written as alphabet.collect { |element| this.stringToHexString(element) }.join(",");

https://gist.github.com/brianlheim/d8eb73e7a124b4c55ec774fb566c934f#file-sclangtestsimplementation-sc-L451

would be symmetric as:

headerString.keep(len);

It's a good decision to return an event; this strategy has proven to be the best for multiple return values.

In general, maybe instead of commenting sections, they could just be methods with that name.

E.g. instead of // READING ALPHABET it could be

*readAlphabet { … }

That might make it easier to test (sic). :)

@mossheim
Author

also TBD - should maybe clear the a...z variables before/after each execution to avoid side effects. (Restore s = Server.local when done)
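
For instance (assuming Interpreter's clearAll, which nils the single-letter variables):

thisProcess.interpreter.clearAll;   // reset a through z to nil
s = Server.local;                   // then restore the usual server binding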

@mossheim
Author

@telephon sorry just saw your comments! thanks for the ideas.

I like to avoid .join because it's a super slow method, but in this case it's not in any main performance-impacting loop so it would definitely improve readability.

I'm returning an IdentityDictionary now instead of Event, since all I really need is key-value pairs. Yes, it's quite slick!

Re: factoring out bits of the parser method, totally agree. I'll try to find a way to do it that doesn't involve a lot of argument passing. I wrote *parseHeader to be totally symmetric to *formatHeader, but since validation takes so much more effort I should move those subsections out.

Makes me glad to know someone is looking at this :)

@mossheim
Author

tried refactoring *parseHeader but it's like reinventing the iostream wheel (especially with line reading). I'm going to refactor that function as a file parser tomorrow. reduces code and opportunities for cracks, although now the methods are asymmetrical (one formats a string, the other reads from a file). might also be practical to change *formatHeader to a file-writing method *writeHeader.

@mossheim
Author

new code uploaded. still working on validation/diff method

@mossheim
Author

just began a separate file for testing the tests... i really want this to be solid. also allowed me to move my preexisting validation code into a more durable format
