According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:
The token error shall be reserved for error handling. The name
error can be used in grammar rules. It indicates places where the
parser can recover from a syntax error. The default value of error
shall be 256. Its value can be changed using a %token
declaration. The lexical analyzer should not return the value of
error.
I think this feature is useless, the user should not have to deal with
that. The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number. In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers. So
this feature is useless.
Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition. At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.
Rather, the location of the first user definition of "error" should
become its defining location.
Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html
* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
Currenly we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines. This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.
* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
- 4 | %token foo <error>123
+ 4 | %token foo <error>123</error>
| <error>^~~~~~</error>
* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.
* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
String literals, which allow for better error messages, are (too)
liberally accepted by Bison, which might result in silent errors. For
instance
%type <exVal> cond "condition"
does not define “condition” as a string alias to 'cond' (nonterminal
symbols do not have string aliases). It is rather equivalent to
%nterm <exVal> cond
%token <exVal> "condition"
i.e., it gives the type 'exVal' to the "condition" token, which was
clearly not the intention.
Introduce -Wdangling-alias to catch this.
* src/complain.h, src/complain.c: Add support for -Wdangling-alias.
(argmatch_warning_args): Sort.
* src/symtab.c (symbol_check_defined): Complain about dangling
aliases.
* doc/bison.texi: Document it.
* tests/input.at (Dangling aliases): New test.
On
%token TOKEN1
%type <ival> TOKEN1 TOKEN2 't'
%token TOKEN2
%%
expr:
bison -Wyacc gives
input.y:2.15-20: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~~~~
input.y:2.29-31: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~
input.y:2.22-27: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~~~~
The messages appear to be out of order, but they are emitted when the
error is found.
* src/symtab.h (symbol_class): Add pct_type_sym, used to denote
symbols appearing in %type.
* src/symtab.c (complain_pct_type_on_token): New.
(symbol_class_set): Check that %type is not applied to tokens.
(symbol_check_defined): pct_type_sym also means undefined.
* src/parse-gram.y (symbol_decl.1): Set the class to pct_type_sym.
* src/reader.c (grammar_current_rule_begin): pct_type_sym also means
undefined.
* tests/input.at (Yacc's %type): New.
* src/gram.c (grammar_dump): Print terminals likewise non terminals.
* tests/sets.at (Reduced Grammar): Update test case to catch up the
change and add a test case where prec and assoc are used.
We have too many global variables, adding structure would help. For a
start, let's hide some of the variables closer to their usage.
* src/getargs.c, src/files.h (current_file): Move to...
* src/scan-gram.c: here.
* src/scan-gram.h (gram_in, gram__flex_debug): Remove, make them
private to the scanner.
* src/reader.h, src/reader.c (reader): Take a grammar file as argument.
Move the handling of scanner variables to...
* src/scan-gram.l (gram_scanner_open, gram_scanner_close): here.
(gram_scanner_initialize): Remove, replaced by gram_scanner_open.
* src/main.c: Adjust.
* src/parse-gram (%initial-action): here.
(handle_skeleton): Don't depend on the current file name to look for
"local" skeletons (subject to changes coming from "#lines"): depend
only on the initial file name, the one given on the command line.
Currently there are two globals denoting the input file: grammar_file
is the one from the command line, and current_file which might change
because of #line. Use only the former.
* src/complain.c (error_message): here.
* tests/diagnostics.at: Adjust.
See https://lists.gnu.org/archive/html/bug-bison/2019-10/msg00061.html
to https://lists.gnu.org/archive/html/bug-bison/2019-10/msg00073.html.
Paul Eggert's changes in gnulib do fix the issue for modern GCCs (7,
8, 9) on macOS. Unfortunately these warnings are back on the
CI (GNU/Linux) with GCC 4.6, 4.7, (not 4.8) and 4.9.
Disable the warning locally.
* configure.ac (warn_common, warn_tests): Remove -Wtype-limits.
* src/system.h (IGNORE_TYPE_LIMITS_BEGIN, IGNORE_TYPE_LIMITS_END): New.
* src/InadequacyList.c, src/parse-gram.c, src/parse-gram.y,
* src/symtab.c: Use it.
* src/symtab.c: Include intprops.h
(symbol_user_token_number_set): Don’t allow user_token_number ==
INT_MAX because too much other code adds 1 to the user token number.
(symbols_token_translations_init): Complain on integer overflow
instead of indulging in undefined behavior.
* src/scan-gram.l: Include errno.h, for errno.
(scan_integer, handle_syncline): Check for integer overflow.
* tests/input.at (too-large.y): Adjust to match new diagnostics.
* src/parse-gram.y: Include intprops.h.
(handle_require): Don’t indulge in undefined behavior if the major
or minor number is out of range. Instead, check that the
resulting value is nonnegative, fits in int, and that the minor
number is less than 100. Also, check that a number was parsed.
This changes the Yacc skeleton to use “least” integer types to
keep tables smaller on some platforms, which should lessen cache
pressure. Since Bison uses the Yacc skeleton, it follows suit.
* data/skeletons/yacc.c: Include limits.h and stdint.h if this
seems to be needed.
(yytype_uint8, yytype_int8, yytype_uint16, yytype_int16):
If available, use GCC predefined macros __INT_MAX__ etc. to select
a “least” type, as this avoids namespace hassles. Otherwise, if
available fall back on selecting a “least” type via the C99 macros
INT_MAX, INT_LEAST8_MAX, etc. Otherwise, fall further back on one of
the builtin C99 types signed char, short, and int. Make sure that
any selected type promotes to int. Ignore any macros YYTYPE_INT16,
YYTYPE_INT8, YYTYPE_UINT16, YYTYPE_UINT8 defined by the user.
(ptrdiff_t, PTRDIFF_MAX): Simplify in the light of the above.
(yytype_uint8, yytype_uint16): Do not assume that unsigned char
and unsigned short promote to int, as this isn’t true on some
platforms (e.g., TI TMS320C55x).
* src/parse-gram.y (YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16)
(YYTYPE_UINT8): Remove, as these are no longer effective.
Because the checking of the grammar is made by phases after the whole
grammar was read, we sometimes have diagnostics that look weird. In
some case, within one type of checking, the entities are not checked
in the order in which they appear in the file. For instance, checking
symbols is done on the list of symbols sorted by tag:
foo.y:1.20-22: warning: symbol BAR is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.16-18: warning: symbol QUX is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
Let's sort them by location instead:
foo.y:1.16-18: warning: symbol 'QUX' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.20-22: warning: symbol 'BAR' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
* src/location.h (location_cmp): Be robust to empty file names.
* src/symtab.c (symbol_cmp): Sort by location.
* tests/input.at: Adjust expectations.
From
input.y:1.17-19: warning: symbol baz is used, but is not defined as a token and has no rules [-Wother]
1 | %printer {} foo baz
| ^~~
to
input.y:1.17-19: warning: symbol 'baz' is used, but is not defined as a token and has no rules; did you mean 'bar'? [-Wother]
1 | %printer {} foo baz
| ^~~
| bar
* bootstrap.conf: We need fstrcmp.
* src/symtab.c (symbol_from_uniqstr_fuzzy): New.
(complain_symbol_undeclared): Use it.
* tests/diagnostics.at (Suggestions): New.
* data/bison-default.css (insertion): Rename as...
(fixit-insert): this, as this is what GCC uses.
* src/symtab.c (complain_symbol_undeclared): New.
Use it.
Use quote on the guilty symbol (like GCC does, and we also do
elsewhere).
* tests/input.at: Adjust.