According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:
The token error shall be reserved for error handling. The name
error can be used in grammar rules. It indicates places where the
parser can recover from a syntax error. The default value of error
shall be 256. Its value can be changed using a %token
declaration. The lexical analyzer should not return the value of
error.
I think this feature is useless, the user should not have to deal with
that. The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number. In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers. So
this feature is useless.
Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition. At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.
Rather, the location of the first user definition of "error" should
become its defining location.
Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html
* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
Currenly we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines. This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.
* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
- 4 | %token foo <error>123
+ 4 | %token foo <error>123</error>
| <error>^~~~~~</error>
* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.
* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
Just as the yacc.c skeleton, the lalr1.cc skeleton should reject
invalid values for parse.lac.
* data/skeletons/lalr1.cc: check validity of parse.lac
* tests/input.at: new test cases
Be robust to newer versions of Autoconf where the package URL defaults
to https instead of http.
* configure.ac (AC_INIT): Use https.
* tests/report.at: Adjust expected output s/http/https/
to match updated URL.
* data/skeletons/glr.c (YYASSERT): Rename as...
(YY_ASSERT): this, for consistency with yacc.c, and also to emphasize
the fact that this is not for the end user (YY_ prefix).
* tests/glr-regression.at: Define parse.assert.
The current code for yysyntax_error for %define parse.error verbose is
fishy (given that YYEMPTY is -2, invalid argument for yytname[]):
static int
yysyntax_error ([...])
{
YYPTRDIFF_T yysize0 = yytnamerr (YY_NULLPTR, yytname[yytoken]);
[...]
if (yytoken != YYEMPTY)
A nearby comment reports
The only way there can be no lookahead present (in yychar) is if
this state is a consistent state with a default action. Thus,
detecting the absence of a lookahead is sufficient to determine
that there is no unexpected or expected token to report. In that
case, just report a simple "syntax error".
So it _is_ possible to call yysyntax_error with yytoken == YYEMPTY,
albeit quite difficult when meaning to, so virtually impossible by
accident (after all, there was never a bug report about this).
I failed to produce a test case, but Joel E. Denny provided me with
one (added to the test suite below). The yacc.c skeleton fails on
this, and once fixed dies on a second problem. The glr.c skeleton was
also dying, but immediately of this second problem.
Indeed we were not allocating space for the error message's final \0.
This was hidden by the fact that we only had error messages with at
least an unexpected token displayed, so with at least one "%s" in the
format string, whose size (2) was included (incorrectly) in the final
size of the message (where the %s have been replaced by the actual
content).
* data/skeletons/glr.c, data/skeletons/yacc.c (yysyntax_error):
Do not invoke yytnamerr on YYEMPTY.
Clarify the computation of the length of the _final_ error message,
with the NUL terminator but without the '%s's.
* tests/conflicts.at (Syntax error in consistent error state):
New, contributed by Joel E. Denny.
Having a file named "exception" is risky: the compiler might use that
file in #include.
Reported by 马俊 <majun123@whu.edu.cn>.
* tests/local.at (AT_SKIP_IF_EXCEPTION_SUPPORT_IS_POOR): Generate
'exceptions', not 'exception'.
String literals, which allow for better error messages, are (too)
liberally accepted by Bison, which might result in silent errors. For
instance
%type <exVal> cond "condition"
does not define “condition” as a string alias to 'cond' (nonterminal
symbols do not have string aliases). It is rather equivalent to
%nterm <exVal> cond
%token <exVal> "condition"
i.e., it gives the type 'exVal' to the "condition" token, which was
clearly not the intention.
Introduce -Wdangling-alias to catch this.
* src/complain.h, src/complain.c: Add support for -Wdangling-alias.
(argmatch_warning_args): Sort.
* src/symtab.c (symbol_check_defined): Complain about dangling
aliases.
* doc/bison.texi: Document it.
* tests/input.at (Dangling aliases): New test.
On
%token TOKEN1
%type <ival> TOKEN1 TOKEN2 't'
%token TOKEN2
%%
expr:
bison -Wyacc gives
input.y:2.15-20: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~~~~
input.y:2.29-31: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~
input.y:2.22-27: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~~~~
The messages appear to be out of order, but they are emitted when the
error is found.
* src/symtab.h (symbol_class): Add pct_type_sym, used to denote
symbols appearing in %type.
* src/symtab.c (complain_pct_type_on_token): New.
(symbol_class_set): Check that %type is not applied to tokens.
(symbol_check_defined): pct_type_sym also means undefined.
* src/parse-gram.y (symbol_decl.1): Set the class to pct_type_sym.
* src/reader.c (grammar_current_rule_begin): pct_type_sym also means
undefined.
* tests/input.at (Yacc's %type): New.
* src/gram.c (grammar_dump): Print terminals likewise non terminals.
* tests/sets.at (Reduced Grammar): Update test case to catch up the
change and add a test case where prec and assoc are used.
* tests/diagnostics.at (Locations from M4, Tabulations and multibyte
characters from M4): These tests are actually checking a message
coming from C, not from M4. Replace with...
(Complaints from M4): This.
We still have a few old C casts in lalr1.cc, let's get rid of them.
Reported by Frank Heckenbach.
Actually, let's monitor all our casts using easy to grep macros.
Let's use these macros to use the C++ standard casts when we are in
C++.
* data/skeletons/c.m4 (b4_cast_define): New.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/stack.hh,
* data/skeletons/yacc.c:
Use it and/or its casts.
* tests/actions.at, tests/cxx-type.at,
* tests/glr-regression.at, tests/headers.at, tests/torture.at,
* tests/types.at:
Use YY_CAST instead of C casts.
* configure.ac (warn_cxx): Add -Wold-style-cast.
* doc/bison.texi: Disable it.
Currently there are two globals denoting the input file: grammar_file
is the one from the command line, and current_file which might change
because of #line. Use only the former.
* src/complain.c (error_message): here.
* tests/diagnostics.at: Adjust.
This is really weird: GCC points to the LHS of the assignment...
260. headers.at:184: testing Sane headers: api.pure api.push-pull=both ...
tests/headers.at:184: COLUMNS=1000; export COLUMNS; bison --color=no -fno-caret -d -o input.c input.y
tests/headers.at:184: $CC $CFLAGS $CPPFLAGS -c -o input.o input.c
stderr:
input.c: In function 'yyparse':
input.c:1276:16: error: 'yylval' may be used uninitialized in this function [-Werror=maybe-uninitialized]
1276 | yylval = *yypushed_val;
| ~~~~~~~^~~~~~~~~~~~~~~
input.c: In function 'yypull_parse':
input.c:1276:16: error: 'yylval' may be used uninitialized in this function [-Werror=maybe-uninitialized]
1276 | yylval = *yypushed_val;
| ~~~~~~~^~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
stdout:
tests/headers.at:184: exit code was 1, expected 0
See also d87c8ac79a
and 9645a2b20e.
* tests/headers.at (Several parsers, Several parsers): Disable these
warnings when in push parser.
As documented in the Autoconf manual, Solaris 10 sed rejects
script labels contianing more than 7 characters. POSIX requires
support for at least 8 characters, but we might as well be portable
to Solaris 10 which is still supported.
* tests/local.at (AT_SETS_CHECK): Use only the first 7 characters
in sed labels.
* src/scan-gram.l: Include errno.h, for errno.
(scan_integer, handle_syncline): Check for integer overflow.
* tests/input.at (too-large.y): Adjust to match new diagnostics.
Sun C 5.12 defines __SUNPRO_C to 0x5120 but diagnoses
‘__attribute__ ((__unused__))’. Change the ifdefs to use
the same method as Gnulib in this area.
* data/skeletons/c.m4 (YY_ATTRIBUTE): Remove, since
not all attributes were added in the same compiler version.
(YY_ATTRIBUTE_PURE, YY_ATTRIBUTE_UNUSED):
Use specific GCC version for each attribute.
Pay no attention to __SUNPRO_C.
* tests/headers.at (Several parsers): Tighten tests accordingly.
Let's make a difference between places where Perl is required for the
test (AT_PERL_REQUIRE), and the places where it's used to run the
test, but it's not not to run the test (AT_PERL_CHECK).
* tests/local.at (AT_REQUIRE): New.
(AT_PERL_CHECK, AT_PERL_REQUIRE): New.
Use them where appropriate.
* tests/local.mk ($(TESTSUITE)): Beware not to start the line with
'-pi' if Perl is empty, as Make understands this as "it's ok to fail".
Which it is not.
My previous tests (with ./configure PERL=false) have been fooled by
configure, that managed to find perl anyway. This time, I ran this on
a Fedora in Docker, without Perl.
* tests/calc.at, tests/diagnostics.at, tests/headers.at,
* tests/input.at, tests/local.at, tests/named-refs.at,
* tests/output.at, tests/regression.at, tests/skeletons.at,
* tests/synclines.at, tests/torture.at: Don't require Perl.
Currently we face test suite failures in different environments,
because of a conflict between the definitions of isnan by gnulib, and
by the C++ library:
262. headers.at:186: testing Sane headers: %locations %debug c++ ...
./headers.at:186: COLUMNS=1000; export COLUMNS; bison --color=no -fno-caret -d -o input.cc input.y
./headers.at:186: $CXX $CXXFLAGS $CPPFLAGS -c -o input.o input.cc
stderr:
In file included from /usr/include/c++/4.8.2/cmath:44:0,
from /usr/include/c++/4.8.2/random:38,
from /usr/include/c++/4.8.2/bits/stl_algo.h:65,
from /usr/include/c++/4.8.2/algorithm:62,
from location.hh:41,
from input.hh:90,
from input.cc:50:
/u/cs/fac/eggert/src/gnu/bison/lib/math.h: In function 'bool isnan(double)':
/u/cs/fac/eggert/src/gnu/bison/lib/math.h:2849:1: error: new declaration 'bool isnan(double)'
_GL_MATH_CXX_REAL_FLOATING_DECL_2 (isnan, isnan, bool)
^
In file included from /usr/include/features.h:375:0,
from /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/os_defines.h:39,
from /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/c++config.h:2097,
from /usr/include/c++/4.8.2/cstdlib:41,
from input.hh:48,
from input.cc:50:
/usr/include/bits/mathcalls.h:235:1: error: ambiguates old declaration 'int isnan(double)'
__MATHDECL_1 (int,isnan,, (_Mdouble_ __value)) __attribute__ ((__const__));
^
There might be something to do in gnulib about this, but I believe
that gnulib should not be used in the test suite in the first place.
The test suite should work with other compilers than the one used to
compile the package. For a start, Bison sources are more
demanding (C99) than the generated parsers. Last time I tried, tcc
for example, was not able to compile Bison, yet our generated parsers
should compile cleanly with it.
Besides the problem at hand is with the C++ compiler, with is not the
one used to set up gnulib at configuration-time (config.h is mainly
built from probing the C compiler).
We should really not depend on gnulib in tests.
This was introduced in 2001 to check whether including
stdlib.h/string.h is safe thanks to STDC_HEADERS
(2ce1014469). Today, we assume at least
a C90 compiler, it should be safe enough.
* tests/local.at, tests/testsuite.h: Do not include config.h.
* tests/atlocal.in (conftest.cc): Likewise.
(CPPFLAGS): Do not expose lib/, as because of this we might picked up
gnulib replacement headers for system headers.
* tests/input.at: Use int instead of ptrdiff_t, for easier portability
(some machine on the CI did not find ptrdiff_t).
* tests/c++.at: Add missing include for getchar.
Because the checking of the grammar is made by phases after the whole
grammar was read, we sometimes have diagnostics that look weird. In
some case, within one type of checking, the entities are not checked
in the order in which they appear in the file. For instance, checking
symbols is done on the list of symbols sorted by tag:
foo.y:1.20-22: warning: symbol BAR is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.16-18: warning: symbol QUX is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
Let's sort them by location instead:
foo.y:1.16-18: warning: symbol 'QUX' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.20-22: warning: symbol 'BAR' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
* src/location.h (location_cmp): Be robust to empty file names.
* src/symtab.c (symbol_cmp): Sort by location.
* tests/input.at: Adjust expectations.
From
input.y:1.17-19: warning: symbol baz is used, but is not defined as a token and has no rules [-Wother]
1 | %printer {} foo baz
| ^~~
to
input.y:1.17-19: warning: symbol 'baz' is used, but is not defined as a token and has no rules; did you mean 'bar'? [-Wother]
1 | %printer {} foo baz
| ^~~
| bar
* bootstrap.conf: We need fstrcmp.
* src/symtab.c (symbol_from_uniqstr_fuzzy): New.
(complain_symbol_undeclared): Use it.
* tests/diagnostics.at (Suggestions): New.
* data/bison-default.css (insertion): Rename as...
(fixit-insert): this, as this is what GCC uses.
* src/symtab.c (complain_symbol_undeclared): New.
Use it.
Use quote on the guilty symbol (like GCC does, and we also do
elsewhere).
* tests/input.at: Adjust.
This commit adds the suggestion in green, on the line below the
caret-and-tildes.
foo.y:1.1-14: warning: deprecated directive: '%error-verbose', use '%define parse.error verbose' [-Wdeprecated]
1 | %error-verbose
| ^~~~~~~~~~~~~~
| %define parse.error verbose
The current approach, with location_caret_suggestion, is fragile:
there's a protocol of calls to the complain functions which is strict.
We should rather have a richer structure describing the diagnostics,
including with submessages such as the suggestions, passed in the end
to the routines in charge of formatting and printing them.
* src/location.h, src/location.c (location_caret_suggestion): New.
* src/complain.c (deprecated_directive): Use it.
* tests/diagnostics.at, tests/input.at: Adjust expectations.
GCC 4.8 reports:
input.y:57:33: error: conversion to 'int' from 'long unsigned int'
may alter its value [-Werror=conversion]
int input_elts = sizeof input / sizeof input[0];
^
* tests/local.at (AT_YYLEX_DEFINE(c)): Add a cast (sorry, Paul!).