NEWS
* Noteworthy changes in release 3.0 (2013-07-25) [stable]
** WARNING: Future backward-incompatibilities!
Like other GNU packages, Bison will start using some of the C99 features
for its own code, especially the definition of variables after statements.
The generated C parsers still aim at C90.
** Backward incompatible changes
*** Obsolete features
Support for YYFAIL is removed (deprecated in Bison 2.4.2): use YYERROR.
Support for yystype and yyltype is removed (deprecated in Bison 1.875):
use YYSTYPE and YYLTYPE.
Support for YYLEX_PARAM and YYPARSE_PARAM is removed (deprecated in Bison
1.875): use %lex-param, %parse-param, or %param.
Missing semicolons at the end of actions are no longer added (as announced
in the release 2.5).
*** Use of YACC='bison -y'
TL;DR: With Autoconf <= 2.69, pass -Wno-yacc to (AM_)YFLAGS if you use
Bison extensions.
Traditional Yacc generates 'y.tab.c' whatever the name of the input file.
Therefore Makefiles written for Yacc expect 'y.tab.c' (and possibly
'y.tab.h' and 'y.output') to be generated from 'foo.y'.
To this end, for ages, AC_PROG_YACC, Autoconf's macro to look for an
implementation of Yacc, was using Bison as 'bison -y'. While it does
ensure compatible output file names, it also enables warnings for
incompatibilities with POSIX Yacc. In other words, 'bison -y' triggers
warnings for Bison extensions.
Autoconf 2.70+ fixes this incompatibility by using YACC='bison -o y.tab.c'
(which also generates 'y.tab.h' and 'y.output' when needed).
Alternatively, disable Yacc warnings by passing '-Wno-yacc' to your Yacc
flags (YFLAGS, or AM_YFLAGS with Automake).
** Bug fixes
*** The epilogue is no longer affected by internal #defines (glr.c)
The glr.c skeleton uses defines such as #define yylval (yystackp->yyval) in
generated code. These weren't properly undefined before the inclusion of
the user epilogue, so functions such as the following were butchered by the
preprocessor expansion:
int yylex (YYSTYPE *yylval);
This is fixed: yylval, yynerrs, yychar, and yylloc are now valid
identifiers for user-provided variables.
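The failure mode can be reproduced outside Bison. The C sketch below is only an illustration: struct yy_stack, peek_value and user_yylex are hypothetical names, not skeleton code, and only the #define line is taken from the text above. It shows why the macro has to be undefined before user code that reuses the name.

```c
#include <assert.h>
#include <stddef.h>

typedef int YYSTYPE;                  /* stand-in semantic value type */
struct yy_stack { YYSTYPE yyval; };   /* hypothetical, not the real glr.c stack */

/* Internal shorthand in the style of the skeleton.  */
#define yylval (yystackp->yyval)

static YYSTYPE
peek_value (struct yy_stack *yystackp)
{
  assert (yystackp != NULL);
  return yylval;                      /* expands to (yystackp->yyval) */
}

/* Without this #undef, the parameter below would expand to
   'YYSTYPE *(yystackp->yyval)' and fail to compile.  */
#undef yylval

/* Epilogue-style user code: yylval is a plain identifier again.  */
static int
user_yylex (YYSTYPE *yylval)
{
  *yylval = 42;
  return 1;
}
```

Removing the #undef line is a quick way to see the kind of error that the old skeleton produced.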
*** stdio.h is no longer needed when locations are enabled (yacc.c)
Changes in Bison 2.7 introduced a dependency on FILE and fprintf when
locations are enabled. This is fixed.
*** Warnings about useless %pure-parser/%define api.pure are restored
** Diagnostics reported by Bison
Most of these features were contributed by Théophile Ranquet and Victor
Santet.
*** Carets
Version 2.7 introduced caret errors, for a prettier output. These are now
activated by default. The old format can still be used by invoking Bison
with -fno-caret (or -fnone).
Some error messages that reproduced excerpts of the grammar are now using
the caret information only. For instance on:
%%
exp: 'a' | 'a';
Bison 2.7 reports:
in.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
in.y:2.12-14: warning: rule useless in parser due to conflicts: exp: 'a' [-Wother]
Now bison reports:
in.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
in.y:2.12-14: warning: rule useless in parser due to conflicts [-Wother]
 exp: 'a' | 'a';
            ^^^
and "bison -fno-caret" reports:
in.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
in.y:2.12-14: warning: rule useless in parser due to conflicts [-Wother]
*** Enhancements of the -Werror option
The -Werror=CATEGORY option is now recognized, and will treat specified
warnings as errors. The warnings need not have been explicitly activated
using the -W option; this is similar to what GCC 4.7 does.
For example, given the following command line, Bison will treat both
warnings related to POSIX Yacc incompatibilities and S/R conflicts as
errors (and only those):
$ bison -Werror=yacc,error=conflicts-sr input.y
If no categories are specified, -Werror will make all active warnings into
errors. For example, the following line does the same as the previous example:
$ bison -Werror -Wnone -Wyacc -Wconflicts-sr input.y
(By default -Wconflicts-sr,conflicts-rr,deprecated,other is enabled.)
Note that the categories in this -Werror option may not be prefixed with
"no-". However, -Wno-error[=CATEGORY] is valid.
Note that -y enables -Werror=yacc. Therefore it is now possible to require
Yacc-like behavior (e.g., always generate y.tab.c), but to report
incompatibilities as warnings: "-y -Wno-error=yacc".
*** The display of warnings is now richer
The option that controls a given warning is now displayed:
foo.y:4.6: warning: type clash on default action: <foo> != <bar> [-Wother]
In the case of warnings treated as errors, the prefix is changed from
"warning: " to "error: ", and the suffix is displayed, in a manner similar
to GCC, as [-Werror=CATEGORY].
For instance, where the previous version of Bison would report (and exit
with failure):
bison: warnings being treated as errors
input.y:1.1: warning: stray ',' treated as white space
it now reports:
input.y:1.1: error: stray ',' treated as white space [-Werror=other]
*** Deprecated constructs
The new 'deprecated' warning category flags obsolete constructs whose
support will be discontinued. It is enabled by default. These warnings
used to be reported as 'other' warnings.
*** Useless semantic types
Bison now warns about useless (uninhabited) semantic types. Since
semantic types are not declared to Bison (they are defined in the opaque
%union structure), it is %printer/%destructor directives about useless
types that trigger the warning:
%token <type1> term
%type <type2> nterm
%printer {} <type1> <type3>
%destructor {} <type2> <type4>
%%
nterm: term { $$ = $1; };
3.28-34: warning: type <type3> is used, but is not associated to any symbol
4.28-34: warning: type <type4> is used, but is not associated to any symbol
*** Undefined but unused symbols
Bison used to raise an error for undefined symbols that are not used in
the grammar. This is now only a warning.
%printer {} symbol1
%destructor {} symbol2
%type <type> symbol3
%%
exp: "a";
*** Useless destructors or printers
Bison now warns about useless destructors or printers. In the following
example, the printer for <type1>, and the destructor for <type2> are
useless: all symbols of <type1> (token1) already have a printer, and all
symbols of type <type2> (token2) already have a destructor.
%token <type1> token1
       <type2> token2
       <type3> token3
       <type4> token4
%printer {} token1 <type1> <type3>
%destructor {} token2 <type2> <type4>
*** Conflicts
The warnings and error messages about shift/reduce and reduce/reduce
conflicts have been normalized. For instance on the following foo.y file:
%glr-parser
%%
exp: exp '+' exp | '0' | '0';
compare the previous version of bison:
$ bison foo.y
foo.y: conflicts: 1 shift/reduce, 2 reduce/reduce
$ bison -Werror foo.y
bison: warnings being treated as errors
foo.y: conflicts: 1 shift/reduce, 2 reduce/reduce
with the new behavior:
$ bison foo.y
foo.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
foo.y: warning: 2 reduce/reduce conflicts [-Wconflicts-rr]
$ bison -Werror foo.y
foo.y: error: 1 shift/reduce conflict [-Werror=conflicts-sr]
foo.y: error: 2 reduce/reduce conflicts [-Werror=conflicts-rr]
When %expect or %expect-rr is used, such as with bar.y:
%expect 0
%glr-parser
%%
exp: exp '+' exp | '0' | '0';
Former behavior:
$ bison bar.y
bar.y: conflicts: 1 shift/reduce, 2 reduce/reduce
bar.y: expected 0 shift/reduce conflicts
bar.y: expected 0 reduce/reduce conflicts
New one:
$ bison bar.y
bar.y: error: shift/reduce conflicts: 1 found, 0 expected
bar.y: error: reduce/reduce conflicts: 2 found, 0 expected
** Incompatibilities with POSIX Yacc
The 'yacc' category is no longer part of '-Wall', enable it explicitly
with '-Wyacc'.
** Additional yylex/yyparse arguments
The new directive %param declares additional arguments to both yylex and
yyparse. The %lex-param, %parse-param, and %param directives support one
or more arguments. Instead of
%lex-param {arg1_type *arg1}
%lex-param {arg2_type *arg2}
%parse-param {arg1_type *arg1}
%parse-param {arg2_type *arg2}
one may now declare
%param {arg1_type *arg1} {arg2_type *arg2}
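On the C side, the effect of %param is that the same extra arguments reach both functions. The sketch below is a hand-written stand-in, not generated code: struct scan_ctx, demo_yylex and demo_yyparse are hypothetical names, and a single argument is used for brevity, as if the grammar declared '%param {struct scan_ctx *ctx}'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context type passed via '%param {struct scan_ctx *ctx}'.  */
struct scan_ctx
{
  const char *input;   /* text being scanned */
  size_t pos;          /* next unread character */
};

/* Stand-in for yylex: receives the extra argument, returns the next
   character as a token, or 0 at end of input.  */
static int
demo_yylex (struct scan_ctx *ctx)
{
  assert (ctx != NULL && ctx->input != NULL);
  if (ctx->input[ctx->pos] == '\0')
    return 0;
  return (unsigned char) ctx->input[ctx->pos++];
}

/* Stand-in for the generated yyparse: it receives the same argument
   and forwards it on every scanner call.  Returns the token count.  */
static int
demo_yyparse (struct scan_ctx *ctx)
{
  int count = 0;
  while (demo_yylex (ctx) != 0)
    ++count;
  return count;
}
```

Passing the state explicitly this way avoids the global variables that YYLEX_PARAM and YYPARSE_PARAM users often worked around.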
** Types of values for %define variables
Bison used to make no distinction between '%define foo bar' and '%define
foo "bar"'. The former is now called a 'keyword value', and the latter a
'string value'. A third kind was added: 'code values', such as '%define
foo {bar}'.
Keyword variables are used for fixed value sets, e.g.,
%define lr.type lalr
Code variables are used for values in the target language, e.g.,
%define api.value.type {struct semantic_type}
String variables are used for the remaining cases, e.g., file names.
** Variable api.token.prefix
The variable api.token.prefix changes the way tokens are identified in
the generated files. This is especially useful to avoid collisions
with identifiers in the target language. For instance
%token FILE for ERROR
%define api.token.prefix {TOK_}
%%
start: FILE for ERROR;
will generate the definition of the symbols TOK_FILE, TOK_for, and
TOK_ERROR in the generated sources. In particular, the scanner must
use these prefixed token names, although the grammar itself still
uses the short names (as in the sample rule given above).
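The collision being avoided is concrete: an unprefixed token named FILE would clash with the FILE type from <stdio.h>. The C sketch below shows how a scanner might use the prefixed names. The enum only approximates what the generated header declares (the numeric values are illustrative assumptions), and classify_word is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Roughly what the generated header would declare for the grammar
   above.  The numeric values are illustrative assumptions.  */
enum yytokentype
{
  TOK_FILE = 258,
  TOK_for = 259,
  TOK_ERROR = 260
};

/* Hypothetical scanner helper: map a word to its prefixed token, or
   return -1 (this sketch's own convention) if it is not a keyword.  */
static int
classify_word (const char *word)
{
  assert (word != NULL);
  if (strcmp (word, "FILE") == 0)
    return TOK_FILE;
  if (strcmp (word, "for") == 0)
    return TOK_for;
  if (strcmp (word, "ERROR") == 0)
    return TOK_ERROR;
  return -1;
}
```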
** Variable api.value.type
This new %define variable supersedes the #define macro YYSTYPE. The use
of YYSTYPE is discouraged. In particular, #defining YYSTYPE *and* either
using %union or %defining api.value.type results in undefined behavior.
Either define api.value.type, or use "%union":
%union
{
  int ival;
  char *sval;
}
%token <ival> INT "integer"
%token <sval> STRING "string"
%printer { fprintf (yyo, "%d", $$); } <ival>
%destructor { free ($$); } <sval>
/* In yylex(). */
yylval.ival = 42; return INT;
yylval.sval = strdup ("42"); return STRING; /* freed by %destructor */
The %define variable api.value.type supports both keyword and code values.
The keyword value 'union' means that the user provides genuine types, not
union member names such as "ival" and "sval" above (WARNING: will fail if
-y/--yacc/%yacc is enabled).
%define api.value.type union
%token <int> INT "integer"
%token <char *> STRING "string"
%printer { fprintf (yyo, "%d", $$); } <int>
%destructor { free ($$); } <char *>
/* In yylex(). */
yylval.INT = 42; return INT;
yylval.STRING = strdup ("42"); return STRING; /* freed by %destructor */
The keyword value variant is somewhat equivalent, but for C++ special
provision is made to allow classes to be used (more about this below).
%define api.value.type variant
%token <int> INT "integer"
%token <std::string> STRING "string"
Code values (in braces) denote user defined types. This is where YYSTYPE
used to be used.
%code requires
{
  struct my_value
  {
    enum
    {
      is_int, is_string
    } kind;
    union
    {
      int ival;
      char *sval;
    } u;
  };
}
%define api.value.type {struct my_value}
%token <u.ival> INT "integer"
%token <u.sval> STRING "string"
%printer { fprintf (yyo, "%d", $$); } <u.ival>
%destructor { free ($$); } <u.sval>
/* In yylex(). */
yylval.u.ival = 42; return INT;
yylval.u.sval = strdup ("42"); return STRING; /* freed by %destructor */
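Since the %destructor above calls free() on the string member, the scanner should hand over heap-allocated strings. One possible way to keep the tag and the union member consistent is a pair of small constructors. This is a sketch only: make_int_value and make_string_value are hypothetical helpers, not something Bison generates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct my_value
{
  enum { is_int, is_string } kind;
  union { int ival; char *sval; } u;
};

/* Hypothetical helper: build an integer value.  */
static struct my_value
make_int_value (int i)
{
  struct my_value v;
  v.kind = is_int;
  v.u.ival = i;
  return v;
}

/* Hypothetical helper: build a string value that owns a heap copy of
   TEXT, so that a destructor calling free() is valid.  The sval
   member is NULL if the allocation fails.  */
static struct my_value
make_string_value (const char *text)
{
  struct my_value v;
  assert (text != NULL);
  v.kind = is_string;
  v.u.sval = malloc (strlen (text) + 1);
  if (v.u.sval)
    strcpy (v.u.sval, text);
  return v;
}
```

In yylex(), this would read, for example, "yylval = make_string_value (yytext); return STRING;", assuming a Flex-style yytext.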
** Variable parse.error
This variable controls the verbosity of error messages. The use of the
%error-verbose directive is deprecated in favor of "%define parse.error
verbose".
** Renamed %define variables
The following variables have been renamed for consistency. Backward
compatibility is ensured, but upgrading is recommended.
lr.default-reductions -> lr.default-reduction
lr.keep-unreachable-states -> lr.keep-unreachable-state
namespace -> api.namespace
stype -> api.value.type
** Semantic predicates
Contributed by Paul Hilfinger.
The new, experimental, semantic-predicate feature allows actions of the
form "%?{ BOOLEAN-EXPRESSION }", which cause syntax errors (as for
YYERROR) if the expression evaluates to 0, and are evaluated immediately
in GLR parsers, rather than being deferred. The result is that they allow
the programmer to prune possible parses based on the values of run-time
expressions.
** The directive %expect-rr is now an error in non GLR mode
It used to be an error only if used in non GLR mode, _and_ if there were
reduce/reduce conflicts.
** Tokens are numbered in their order of appearance
Contributed by Valentin Tolmer.
With '%token A B', A had a number less than the one of B. However,
precedence declarations used to generate a reversed order. This is now
fixed, and introducing tokens with any of %token, %left, %right,
%precedence, or %nonassoc yields the same result.
When mixing declarations of tokens with a literal character (e.g., 'a')
or with an identifier (e.g., B) in a precedence declaration, Bison
numbered the literal characters first. For example
%right A B 'c' 'd'
would lead to the tokens declared in this order: 'c' 'd' A B. Again, the
input order is now preserved.
These changes were made so that one can remove useless precedence and
associativity declarations (i.e., map %nonassoc, %left or %right to
%precedence, or to %token) and get exactly the same output.
** Useless precedence and associativity
Contributed by Valentin Tolmer.
When developing and maintaining a grammar, useless associativity and
precedence directives are common. They can be a nuisance: new ambiguities
arising are sometimes masked because their conflicts are resolved due to
the extra precedence or associativity information. Furthermore, it can
hinder the comprehension of a new grammar: one will wonder about the role
of a precedence, where in fact it is useless. The following changes aim
at detecting and reporting these extra directives.
*** Precedence warning category
A new category of warning, -Wprecedence, was introduced. It flags the
useless precedence and associativity directives.
*** Useless associativity
Bison now warns about symbols with a declared associativity that is never
used to resolve conflicts. In that case, using %precedence is sufficient;
the parsing tables will remain unchanged. Solving these warnings may raise
useless precedence warnings, as the symbols no longer have associativity.
For example:
%left '+'
%left '*'
%%
exp:
  "number"
| exp '+' "number"
| exp '*' exp
;
will produce a
warning: useless associativity for '+', use %precedence [-Wprecedence]
 %left '+'
       ^^^
*** Useless precedence
Bison now warns about symbols with a declared precedence and no declared
associativity (i.e., declared with %precedence), and whose precedence is
never used. In that case, the symbol can be safely declared with %token
instead, without modifying the parsing tables. For example:
%precedence '='
%%
exp: "var" '=' "number";
will produce a
warning: useless precedence for '=' [-Wprecedence]
 %precedence '='
             ^^^
*** Useless precedence and associativity
In case of both useless precedence and associativity, the issue is flagged
as follows:
%nonassoc '='
%%
exp: "var" '=' "number";
The warning is:
warning: useless precedence and associativity for '=' [-Wprecedence]
 %nonassoc '='
           ^^^
** Empty rules
With help from Joel E. Denny and Gabriel Rassoul.
Empty rules (i.e., with an empty right-hand side) can now be explicitly
marked by the new %empty directive. Using %empty on a non-empty rule is
an error. The new -Wempty-rule warning reports empty rules without
%empty. On the following grammar:
%%
s: a b c;
a: ;
b: %empty;
c: 'a' %empty;
bison reports:
3.4-5: warning: empty rule without %empty [-Wempty-rule]
 a: {}
    ^^
5.8-13: error: %empty on non-empty rule
 c: 'a' %empty {};
        ^^^^^^
** Java skeleton improvements
The constants for token names were moved to the Lexer interface. Also, it
is possible to add code to the parser's constructors using "%code init"
and "%define init_throws".
Contributed by Paolo Bonzini.
The Java skeleton now supports push parsing.
Contributed by Dennis Heimbigner.
** C++ skeletons improvements
*** The parser header is no longer mandatory (lalr1.cc, glr.cc)
Using %defines is now optional. Without it, the needed support classes
are defined in the generated parser, instead of additional files (such as
location.hh, position.hh and stack.hh).
*** Locations are no longer mandatory (lalr1.cc, glr.cc)
Both lalr1.cc and glr.cc no longer require %locations.
*** syntax_error exception (lalr1.cc)
The C++ parser features a syntax_error exception, which can be
thrown from the scanner or from user rules to raise syntax errors.
This facilitates reporting errors caught in sub-functions (e.g.,
rejecting too large integral literals from a conversion function
used by the scanner, or rejecting invalid combinations from a
factory invoked by the user actions).
*** %define api.value.type variant
This is based on a submission from Michiel De Wilde. With help
from Théophile Ranquet.
In this mode, complex C++ objects can be used as semantic values. For
instance:
%token <::std::string> TEXT;
%token <int> NUMBER;
%token SEMICOLON ";"
%type <::std::string> item;
%type <::std::list<std::string>> list;
%%
result:
  list  { std::cout << $1 << std::endl; }
;
list:
  %empty        { /* Generates an empty string list. */ }
| list item ";" { std::swap ($$, $1); $$.push_back ($2); }
;
item:
  TEXT    { std::swap ($$, $1); }
| NUMBER  { $$ = string_cast ($1); }
;
*** %define api.token.constructor
When variants are enabled, Bison can generate functions to build the
tokens. This guarantees that the token type (e.g., NUMBER) is consistent
with the semantic value (e.g., int):
parser::symbol_type yylex ()
{
  parser::location_type loc = ...;
  ...
  return parser::make_TEXT ("Hello, world!", loc);
  ...
  return parser::make_NUMBER (42, loc);
  ...
  return parser::make_SEMICOLON (loc);
  ...
}
*** C++ locations
There are operator- and operator-= for 'location'. Negative line/column
increments can no longer underflow the resulting value.
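The underflow concern comes from adding a negative increment to an unsigned counter. The C sketch below shows one way such a clamp can be written. It is not the code of the C++ skeleton, and add_clamped and its parameter names are this sketch's own.

```c
#include <assert.h>

/* Add a signed DELTA to an unsigned line or column counter, never
   returning less than MIN.  */
static unsigned
add_clamped (unsigned value, int delta, unsigned min)
{
  unsigned back;
  assert (min <= value);
  if (0 <= delta)
    return value + (unsigned) delta;
  back = 0u - (unsigned) delta;       /* magnitude of DELTA, no overflow */
  return back < value && min <= value - back ? value - back : min;
}
```

With lines and columns counted from 1, moving back 10 columns from column 5 would yield column 1 rather than a huge wrapped-around value.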