The current code for yysyntax_error for %define parse.error verbose is
fishy: YYEMPTY is -2, which is not a valid index into yytname[], yet
yytname[yytoken] is read before yytoken is compared to YYEMPTY:
    static int
    yysyntax_error ([...])
    {
      YYPTRDIFF_T yysize0 = yytnamerr (YY_NULLPTR, yytname[yytoken]);
      [...]
      if (yytoken != YYEMPTY)
A nearby comment reports:
    The only way there can be no lookahead present (in yychar) is if
    this state is a consistent state with a default action. Thus,
    detecting the absence of a lookahead is sufficient to determine
    that there is no unexpected or expected token to report. In that
    case, just report a simple "syntax error".
So it _is_ possible to call yysyntax_error with yytoken == YYEMPTY,
although it is quite difficult to do on purpose, and therefore
virtually impossible by accident (after all, there was never a bug
report about this).
I failed to produce a test case, but Joel E. Denny provided me with
one (added to the test suite below). The yacc.c skeleton fails on
this, and once fixed dies on a second problem. The glr.c skeleton was
also dying, but immediately of this second problem.
Indeed we were not allocating space for the error message's final \0.
This was hidden by the fact that we only had error messages with at
least an unexpected token displayed, so with at least one "%s" in the
format string, whose size (2) was included (incorrectly) in the final
size of the message (where the %s have been replaced by the actual
content).
* data/skeletons/glr.c, data/skeletons/yacc.c (yysyntax_error):
Do not invoke yytnamerr on YYEMPTY.
Clarify the computation of the length of the _final_ error message,
with the NUL terminator but without the '%s's.
* tests/conflicts.at (Syntax error in consistent error state):
New, contributed by Joel E. Denny.
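
The size computation described above can be illustrated with a small,
simplified sketch.  It is not the skeleton's code: TOKEN_EMPTY,
token_names, message_size and syntax_error_size are hypothetical
stand-ins, and the name unquoting done by yytnamerr is left out.  It
only shows the two points of the fix: the name table is not consulted
when there is no lookahead, and the final size counts the NUL
terminator but not the two bytes of each "%s".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the skeleton's definitions.  */
enum { TOKEN_EMPTY = -2 };      /* Plays the role of YYEMPTY.  */
static const char *const token_names[] = { "end of file", "NUM", "'+'" };

/* Size in bytes of the final message, including the trailing NUL.
   FORMAT contains exactly ARGC "%s" directives; each one contributes
   the length of its argument, not the two bytes of "%s" itself.  */
static size_t
message_size (const char *format, const char *const *argv, int argc)
{
  assert (format != NULL);
  size_t size = strlen (format) - 2 * (size_t) argc + 1;
  for (int i = 0; i < argc; ++i)
    size += strlen (argv[i]);
  return size;
}

/* Size needed to report TOKEN as unexpected.  The name table is
   indexed only when a lookahead is actually present.  */
static size_t
syntax_error_size (int token)
{
  if (token == TOKEN_EMPTY)
    return message_size ("syntax error", NULL, 0);
  const char *args[] = { token_names[token] };
  return message_size ("syntax error, unexpected %s", args, 1);
}
```

With no "%s" in the format, nothing compensates for a forgotten
terminator, which is why the plain "syntax error" case is the one that
exposes the off-by-one.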
Having a file named "exception" is risky: the compiler might use that
file in #include.
Reported by 马俊 <majun123@whu.edu.cn>.
* tests/local.at (AT_SKIP_IF_EXCEPTION_SUPPORT_IS_POOR): Generate
'exceptions', not 'exception'.
String literals as tokens serve two distinct purposes: freeing the
user from having to implement keyword matching in the scanner, and
improving error messages. Most of the time both can be achieved at
the same time, but on occasion it does not work so well.
We promote their use for error messages. We will also still support
the former case, but it is _not_ the recommended approach.
* doc/bison.texi (Tokens from Literals): Clearly state that we don't
recommend looking up the token types in the list of token names.
String literals, which allow for better error messages, are (too)
liberally accepted by Bison, which might result in silent errors. For
instance
    %type <exVal> cond "condition"
does not define “condition” as a string alias to 'cond' (nonterminal
symbols do not have string aliases). It is rather equivalent to
    %nterm <exVal> cond
    %token <exVal> "condition"
i.e., it gives the type 'exVal' to the "condition" token, which was
clearly not the intention.
Introduce -Wdangling-alias to catch this.
* src/complain.h, src/complain.c: Add support for -Wdangling-alias.
(argmatch_warning_args): Sort.
* src/symtab.c (symbol_check_defined): Complain about dangling
aliases.
* doc/bison.texi: Document it.
* tests/input.at (Dangling aliases): New test.
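
As a rough illustration of the idea behind the warning, here is a
minimal sketch of such a check.  It is not the code in src/symtab.c:
the sym record, its has_alias field and both functions are
hypothetical, and the model assumes a string literal is recognizable
by its leading double quote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified symbol record; not Bison's actual struct.  */
typedef struct
{
  const char *tag;   /* Spelling, e.g. "cond" or "\"condition\"".  */
  bool has_alias;    /* Paired with an identifier, as in
                        %token COND "condition".  */
} sym;

/* Whether S is spelled as a string literal.  */
static bool
is_string_literal (const sym *s)
{
  assert (s != NULL && s->tag != NULL);
  return s->tag[0] == '"';
}

/* A string literal is "dangling" when no identifier was attached to
   it: it silently became a token of its own.  */
static bool
is_dangling_alias (const sym *s)
{
  return is_string_literal (s) && !s->has_alias;
}
```

In this model, the "condition" literal from the %type example above is
dangling, whereas a literal introduced together with an identifier in
%token is not.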
On
    %token TOKEN1
    %type <ival> TOKEN1 TOKEN2 't'
    %token TOKEN2
    %%
    expr:
bison -Wyacc gives
    input.y:2.15-20: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type <ival> TOKEN1 TOKEN2 't'
          |              ^~~~~~
    input.y:2.29-31: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type <ival> TOKEN1 TOKEN2 't'
          |                            ^~~
    input.y:2.22-27: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type <ival> TOKEN1 TOKEN2 't'
          |                     ^~~~~~
The messages appear to be out of order, but they are emitted when the
error is found.
* src/symtab.h (symbol_class): Add pct_type_sym, used to denote
symbols appearing in %type.
* src/symtab.c (complain_pct_type_on_token): New.
(symbol_class_set): Check that %type is not applied to tokens.
(symbol_check_defined): pct_type_sym also means undefined.
* src/parse-gram.y (symbol_decl.1): Set the class to pct_type_sym.
* src/reader.c (grammar_current_rule_begin): pct_type_sym also means
undefined.
* tests/input.at (Yacc's %type): New.
As an extension to POSIX Yacc, Bison's %type accepts tokens.
Unfortunately, with string literals as implicit tokens, this is
misleading, and led some users to write
    %type <exVal> cond "condition"
believing that "condition" would be associated with the 'cond'
nonterminal (see https://github.com/apache/httpd/pull/72).
* doc/bison.texi: Promote %nterm rather than %type to declare the type
of nonterminals.
* README: A few fixes.
Explain how to install color support.
* README-hacking: Rename as...
* README-hacking.md: this, and convert to Markdown.
Improve typography.
Improve explanations about update-test.
* src/gram.c (grammar_dump): Print terminals the same way as
nonterminals.
* tests/sets.at (Reduced Grammar): Update the test case to catch up
with the change, and add a test case where prec and assoc are used.
* tests/diagnostics.at (Locations from M4, Tabulations and multibyte
characters from M4): These tests are actually checking a message
coming from C, not from M4. Replace with...
(Complaints from M4): This.
We still have a few old C casts in lalr1.cc, let's get rid of them.
Reported by Frank Heckenbach.
Actually, let's monitor all our casts using easy-to-grep macros, and
let's use these macros to select the C++ standard casts when we are
in C++.
* data/skeletons/c.m4 (b4_cast_define): New.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/stack.hh,
* data/skeletons/yacc.c:
Use it and/or its casts.
* tests/actions.at, tests/cxx-type.at,
* tests/glr-regression.at, tests/headers.at, tests/torture.at,
* tests/types.at:
Use YY_CAST instead of C casts.
* configure.ac (warn_cxx): Add -Wold-style-cast.
* doc/bison.texi: Disable it.
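
A sketch of what such grep-friendly cast macros might look like is
given below.  YY_CAST is the name used above; YY_REINTERPRET_CAST and
the low_byte helper are assumptions made for the sake of the
illustration, and the exact definitions emitted by b4_cast_define may
differ.

```c
#include <assert.h>

/* In C++, expand to the standard casts so that -Wold-style-cast stays
   quiet; in C, fall back to plain casts.  */
#ifndef YY_CAST
# ifdef __cplusplus
#  define YY_CAST(Type, Val) static_cast<Type> (Val)
#  define YY_REINTERPRET_CAST(Type, Val) reinterpret_cast<Type> (Val)
# else
#  define YY_CAST(Type, Val) ((Type) (Val))
#  define YY_REINTERPRET_CAST(Type, Val) ((Type) (Val))
# endif
#endif

/* Hypothetical helper: the least significant byte of VALUE.  Every
   conversion goes through a macro, so `grep YY_CAST' finds it.  */
static unsigned char
low_byte (int value)
{
  assert (0 <= value);
  return YY_CAST (unsigned char, value & 0xff);
}
```

The same source then compiles as C and as C++, and every conversion
site can be located with a single grep.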
* data/skeletons/location.cc: Remove the u (for unsigned) suffix from
the initial line and column.
* NEWS: AFAICT, only C++ backends have their location types changed.
We have too many global variables; adding structure would help. As a
start, let's move some of the variables closer to where they are used.
* src/getargs.c, src/files.h (current_file): Move to...
* src/scan-gram.c: here.
* src/scan-gram.h (gram_in, gram__flex_debug): Remove, make them
private to the scanner.
* src/reader.h, src/reader.c (reader): Take a grammar file as argument.
Move the handling of scanner variables to...
* src/scan-gram.l (gram_scanner_open, gram_scanner_close): here.
(gram_scanner_initialize): Remove, replaced by gram_scanner_open.
* src/main.c: Adjust.
* src/parse-gram (%initial-action): here.
(handle_skeleton): Don't depend on the current file name to look for
"local" skeletons (subject to changes coming from "#lines"): depend
only on the initial file name, the one given on the command line.
Currently there are two globals denoting the input file: grammar_file,
which is the one from the command line, and current_file, which might
change because of #line. Use only the former.
* src/complain.c (error_message): here.
* tests/diagnostics.at: Adjust.
This skeleton uses a single stack of state structures, so it is less
likely to benefit from a stack size reduction than yacc.c (which uses
several stacks: state number, value and location). But it will reduce
the size of the LAC stack.
This skeleton was already using int for state numbers, so, contrary to
yacc.c, this brings nothing for large automata.
Overall, it is still nicer to make the skeletons alike.
* data/skeletons/lalr1.cc (state_type): Here.
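
The reasoning about stack size can be illustrated with a small sketch.
Everything in it is hypothetical (value_t, location_t, the entry
layouts, and the choice of int16_t as the narrow state type); it only
contrasts a stack of bare state numbers with a stack of structures
that bundle state, value and location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical semantic value and location types.  */
typedef double value_t;
typedef struct { int line; int column; } location_t;

/* A stack entry bundling state, value and location, with a wide and
   with a narrow state number.  */
typedef struct { int state; value_t value; location_t location; } wide_entry;
typedef struct { int16_t state; value_t value; location_t location; } narrow_entry;

/* Bytes saved per entry when each entry is a whole structure.
   Alignment padding may absorb the entire difference.  */
static size_t
entry_saving (void)
{
  assert (sizeof (narrow_entry) <= sizeof (wide_entry));
  return sizeof (wide_entry) - sizeof (narrow_entry);
}

/* Bytes saved per entry on a stack that holds only state numbers.  */
static size_t
separate_stack_saving (void)
{
  return sizeof (int) - sizeof (int16_t);
}
```

On common ABIs the structure keeps the same size after narrowing,
because of padding, while a stack made only of state numbers shrinks
with the element type.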
Reported by Thomas Petazzoni.
https://lists.gnu.org/archive/html/bug-bison/2019-08/msg00000.html
* examples/c/reccalc/local.mk: Complete dependencies, including for
earlier versions of Automake (for the sake of our CI, which runs on
Ubuntu Xenial/Bionic, which feature only Automake 1.15).
(%D%/scan.c %D%/scan.h): Upgrade to the full version provided in
Automake's documentation.