@DmitrySoshnikov
Last active April 26, 2023 11:46
Lexer start conditions

Start conditions of lex rules, and tokenizer states

Start conditions are declared in the definitions (first) section of the input using unindented lines beginning with either %s or %x followed by a list of names. The former declares inclusive start conditions, the latter exclusive start conditions. A start condition is activated using the BEGIN action. Until the next BEGIN action is executed, rules with the given start condition will be active and rules with other start conditions will be inactive. If the start condition is inclusive, then rules with no start conditions at all will also be active. If it is exclusive, then only rules qualified with the start condition will be active. A set of rules contingent on the same exclusive start condition describes a scanner which is independent of any of the other rules in the flex input. Because of this, exclusive start conditions make it easy to specify "mini-scanners" which scan portions of the input that are syntactically different from the rest (e.g., comments).
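The activation rule just described fits in a few lines. The following Python function is a hypothetical model, not part of flex. A rule is represented by the tuple of start-condition names in its `<...>` prefix, with an empty tuple for an unqualified rule. `exclusive` is the set of names declared with `%x`, and every other name is treated as declared with `%s`.

```python
INITIAL = "INITIAL"  # flex's default start condition


def is_active(rule_conditions, current, exclusive):
    """Hypothetical model of flex's rule activation (not flex's own code).

    rule_conditions: names in the rule's <...> prefix; () if unqualified
    current:         the start condition selected by the last BEGIN
    exclusive:       names declared with %x; every other name is %s
    """
    if not rule_conditions:
        # Unqualified rules run in INITIAL and in every inclusive condition.
        return current == INITIAL or current not in exclusive
    # Qualified rules run only in the conditions they name.
    return current in rule_conditions
```

For example, with `%x comment` an unqualified rule is inactive inside `comment`, so `is_active((), "comment", {"comment"})` is `False`. With `%s comment` (an empty `exclusive` set), the same rule stays active.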

If the distinction between inclusive and exclusive start conditions is still a little vague, here's a simple example illustrating the connection between the two. The set of rules:

```
%s example
%%

<example>foo   do_something();

bar            something_else();
```

is equivalent to

```
%x example
%%

<example>foo   do_something();

<INITIAL,example>bar    something_else();
```

Without the <INITIAL,example> qualifier, the bar pattern in the second example wouldn't be active (i.e., couldn't match) when in start condition example. If we just used <example> to qualify bar, though, then it would only be active in example and not in INITIAL. In the first example it's active in both, because there example is an inclusive (%s) start condition.

Also note that the special start-condition specifier <*> matches every start condition. Thus, the above example could also have been written:

```
%x example
%%

<example>foo   do_something();

<*>bar    something_else();
```
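The three rule sets above can be checked against each other by listing which patterns are active in each state. The following is a simplified Python model, not flex internals. Rules are `(conditions, pattern)` pairs, `()` means unqualified, and `"*"` stands for `<*>`. The names `active_patterns`, `FOO` and `variants` are hypothetical. The last variant is the `<example>bar` pitfall mentioned in the text.

```python
def active_patterns(rules, current, exclusive):
    """Patterns of the rules active in start condition `current` (a model, not flex)."""
    active = []
    for conditions, pattern in rules:
        if not conditions:
            # Unqualified: active in INITIAL and in inclusive (%s) conditions.
            ok = current == "INITIAL" or current not in exclusive
        else:
            # Qualified: active only in the named conditions; "*" names them all.
            ok = "*" in conditions or current in conditions
        if ok:
            active.append(pattern)
    return active


FOO = (("example",), "foo")  # <example>foo, shared by every variant

variants = {
    "%s, bar unqualified": ([FOO, ((), "bar")], set()),
    "%x, <INITIAL,example>bar": ([FOO, (("INITIAL", "example"), "bar")], {"example"}),
    "%x, <*>bar": ([FOO, (("*",), "bar")], {"example"}),
    "%x, <example>bar": ([FOO, (("example",), "bar")], {"example"}),
}

for label, (rules, exclusive) in variants.items():
    print(label, [active_patterns(rules, s, exclusive) for s in ("INITIAL", "example")])
```

The first three variants activate `bar` in `INITIAL` and `foo` and `bar` in `example`. Only the `<example>bar` variant loses `bar` in `INITIAL`.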

The default rule (to ECHO any unmatched character) remains active in start conditions. It is equivalent to:

```
<*>.|\n     ECHO;
```
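The effect of the default rule can be seen in a toy scanner. This Python sketch assumes the matching policy flex documents: the longest match wins, the earlier rule wins ties, and anything no active rule matches is echoed one character at a time. `RULES`, `run` and the action-name strings are hypothetical stand-ins for the C actions `do_something()` and `something_else()`. Every rule here is qualified, so whether `example` is inclusive or exclusive makes no difference.

```python
import re

# Each rule: (start conditions, compiled pattern, name of the action it would run).
# "*" stands for <*>, i.e. the rule is active in every start condition.
RULES = [
    (("example",), re.compile(r"foo"), "do_something"),
    (("*",), re.compile(r"bar"), "something_else"),
]


def run(text, state):
    """Scan `text` in a fixed start condition; return (actions fired, echoed text)."""
    actions, echoed, pos = [], [], 0
    while pos < len(text):
        best = None  # (end position, action name) of the longest match so far
        for conditions, regex, name in RULES:
            if "*" in conditions or state in conditions:
                m = regex.match(text, pos)
                # Strictly longer matches win, so earlier rules keep ties.
                if m and m.end() > pos and (best is None or m.end() > best[0]):
                    best = (m.end(), name)
        if best is None:
            # Default rule, like <*>.|\n ECHO;: copy one character through.
            echoed.append(text[pos])
            pos += 1
        else:
            pos, name = best
            actions.append(name)
    return actions, "".join(echoed)
```

`run("foobar!", "example")` returns `(["do_something", "something_else"], "!")`. `run("foobar!", "INITIAL")` returns `(["something_else"], "foo!")`, because `foo` is inactive outside `example` and its characters fall through to the default rule.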

BEGIN(0) returns to the original state, where only rules with no start condition (or rules qualified with <INITIAL> or <*>) are active. This state can also be referred to as the start condition "INITIAL", so BEGIN(INITIAL) is equivalent to BEGIN(0). (The parentheses around the start condition name are not required but are considered good style.)

BEGIN actions can also be given as indented code at the beginning of the rules section. For example, the following will cause the scanner to enter the "SPECIAL" start condition whenever yylex() is called and the global variable enter_special is true:

```
        int enter_special;

%x SPECIAL
%%
        if ( enter_special )
            BEGIN(SPECIAL);

<SPECIAL>some_rule
```
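Here is a rough Python analogue of this arrangement. Everything in it is hypothetical: the `Scanner` class, its `yylex` method, and the whitespace-separated "tokens" only stand in for a real flex scanner. The point is that the prologue check runs at the top of every call, before any pattern is tried.

```python
import re

WORD = re.compile(r"\s*(\S+)")  # stand-in for real token rules (hypothetical)


class Scanner:
    """Hypothetical model of a yylex() whose rules section starts with indented code."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.state = "INITIAL"
        self.enter_special = False  # plays the role of the global `enter_special`

    def yylex(self):
        # Prologue: runs on every call, before any pattern is tried,
        # like `if ( enter_special ) BEGIN(SPECIAL);` in the flex input.
        if self.enter_special:
            self.state = "SPECIAL"
        m = WORD.match(self.text, self.pos)
        if m is None:
            return None  # end of input
        self.pos = m.end()
        return self.state, m.group(1)
```

With `s = Scanner("a b c")`, the first `s.yylex()` returns `("INITIAL", "a")`. After `s.enter_special = True`, the next call returns `("SPECIAL", "b")`. Clearing `enter_special` afterwards does not switch back to `INITIAL`. As with `BEGIN`, the selected condition persists until another one is chosen.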

Here is a scanner which recognizes (and skips) C comments while maintaining a count of the current input line.

```
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
```
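To experiment without flex, the comment scanner can be re-created by hand. The Python sketch below is an assumption, not flex output. It applies the same five rules with the matching policy the flex manual describes: longest match, the earlier rule on ties, and ECHO for unmatched input. The function name `strip_comments` is hypothetical. `comment` is exclusive and the `"/*"` rule is unqualified, so that rule is modeled as active only in `INITIAL`.

```python
import re


def strip_comments(text):
    """Hand-written model of the %x comment scanner; returns (echoed text, line_num)."""
    state = "INITIAL"
    line_num = 1
    out = []

    def begin(new_state):
        nonlocal state
        state = new_state

    def count_line():
        nonlocal line_num
        line_num += 1

    def skip():
        pass  # the comment-eating rules have no action

    rules = [
        ("INITIAL", re.compile(r"/\*"), lambda: begin("comment")),
        ("comment", re.compile(r"[^*\n]*"), skip),      # anything that's not a '*'
        ("comment", re.compile(r"\*+[^*/\n]*"), skip),  # '*'s not followed by '/'
        ("comment", re.compile(r"\n"), count_line),
        ("comment", re.compile(r"\*+/"), lambda: begin("INITIAL")),
    ]

    pos = 0
    while pos < len(text):
        best_len, best_action = 0, None
        for condition, regex, action in rules:
            if condition != state:
                continue
            m = regex.match(text, pos)
            # Empty matches are ignored; strictly longer matches win, so ties
            # go to the rule listed first, as in flex.
            if m and m.end() - pos > best_len:
                best_len, best_action = m.end() - pos, action
        if best_action is None:
            out.append(text[pos])  # default rule: ECHO
            pos += 1
        else:
            best_action()
            pos += best_len
    return "".join(out), line_num
```

`strip_comments("a/* x\n y */b\nc")` returns `("ab\nc", 2)`. In the flex version, only newlines inside comments increment `line_num`. A newline in `INITIAL` is echoed by the default rule without being counted, and the sketch reproduces that behavior.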

Source: http://dinosaur.compilertools.net/flex/flex_11.html
