With
%token EOF 0 EOF 0
we get
input.y:3.14-16: warning: symbol EOF redeclared [-Wother]
3 | %token EOF 0 EOF 0
| ^~~
input.y:3.8-10: previous declaration
3 | %token EOF 0 EOF 0
| ^~~
Assertion failed: (nsyms == ntokens + nvars), function check_and_convert_grammar,
file /Users/akim/src/gnu/bison/src/reader.c, line 839.
Reported by Marc Schönefeld.
* src/symtab.c (symbol_user_token_number_set): Register only the
first definition of the end of input token.
* tests/input.at (Symbol redeclared): Check that case.
hash_initialize returns NULL when out of memory. Check for it, and
die cleanly instead of crashing.
Reported by 江 祖铭 (Zu-Ming Jiang).
https://lists.gnu.org/archive/html/bug-bison/2019-08/msg00015.html
* src/muscle-tab.c, src/state.c, src/symtab.c, src/uniqstr.c:
Check the value returned by hash_initialize.
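The check described above can be sketched as follows. This is a minimal illustration, not the code in the listed files: table, table_create and table_create_or_die are hypothetical stand-ins for a constructor that, like hash_initialize, returns NULL when memory is exhausted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hash table; not the gnulib type.  */
typedef struct table
{
  size_t capacity;
  void **slots;
} table;

/* Like hash_initialize, return NULL when out of memory.  */
table *
table_create (size_t capacity)
{
  table *res = malloc (sizeof *res);
  if (!res)
    return NULL;
  res->slots = calloc (capacity, sizeof *res->slots);
  if (!res->slots)
    {
      free (res);
      return NULL;
    }
  res->capacity = capacity;
  return res;
}

/* Checked wrapper: report and exit right away instead of letting a
   later dereference of NULL crash the program.  */
table *
table_create_or_die (size_t capacity)
{
  assert (0 < capacity);
  table *res = table_create (capacity);
  if (!res)
    {
      fputs ("memory exhausted\n", stderr);
      exit (EXIT_FAILURE);
    }
  return res;
}
```

Callers then use table_create_or_die everywhere and never see NULL.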
https://lists.gnu.org/archive/html/bison-patches/2019-08/msg00007.html
When Bison is started with a flag that suppresses warning messages, the
error_message() function can produce a few gigabytes of indentation
because of a dangling pointer.
* src/complain.c (error_message): Don't reset indent_ptr here, but...
(complain_indent): here.
* tests/diagnostics.at (Indentation with message suppression): Check
this case.
* data/diagnostics.css: Rename as...
* data/bison-default.css: this.
Add the GPL header.
This is the convention followed by Bruno Haible in gettext.
Adjust dependencies.
* src/complain.c (complain_init_color): Use BISON_STYLE instead of
BISON_DIAGNOSTICS_STYLE.
It is more consistent with --color=html, --color=test, etc.
* src/getargs.h, src/getargs.c (style_debug): Rename as...
(color_debug): this.
(getargs_colors): Rename --style=debug as --color=debug.
Adjust dependencies.
An experimental commit introduced a fix-it hint that changes comments
such as "/* empty */" into %empty. But in some cases, because
diagnostics are not necessarily emitted in order, the fixits also come
in disorder, which must never happen, as the fixes are installed in
one pass.
* src/fixits.c (fixits_register): Insert them in order.
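Ordered registration can be sketched as a sorted insertion into a linked list. The fixit type and the function names below are hypothetical simplifications, assuming a fix-it is identified by its line and byte column; the real fixits_register may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified fix-it: where to apply it, and what to
   insert there.  */
typedef struct fixit
{
  int line;
  int byte;             /* Byte-based column.  */
  const char *text;
  struct fixit *next;
} fixit;

/* Negative, zero, or positive, like strcmp.  */
int
fixit_cmp (const fixit *a, const fixit *b)
{
  if (a->line != b->line)
    return a->line < b->line ? -1 : 1;
  return (a->byte > b->byte) - (a->byte < b->byte);
}

/* Insert F into the sorted list *HEAD, keeping it sorted.  Fix-its at
   the same position keep their registration order.  */
void
fixit_insert_sorted (fixit **head, fixit *f)
{
  assert (head && f);
  fixit **p = head;
  while (*p && fixit_cmp (*p, f) <= 0)
    p = &(*p)->next;
  f->next = *p;
  *p = f;
}
```

With the list kept sorted at registration time, a single pass over the input can apply every fix-it, whatever the order in which the diagnostics were emitted.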
Currently we remove the rhs to install %empty instead.
* src/reader.c (grammar_rule_check_and_complete): Insert the missing
%empty in front of the rhs, not in replacement thereof.
* tests/actions.at (Add missing %empty): Check that.
Some members are called foo_location, others are foo_loc. Stick to
the latter.
* src/gram.h, src/location.h, src/location.c, src/output.c,
* src/parse-gram.y, src/reader.h, src/reader.c, src/reduce.c,
* src/scan-gram.l, src/symlist.h, src/symlist.c, src/symtab.h,
* src/symtab.c:
Use _loc consistently, not _location.
symbol_list features a 'location' and a 'sym_loc' member. The former
is expected to be set only for symbol_lists that denote a symbol (not
a type name), and the latter should only denote the location of the
symbol/type name. Yet both are set, and the name "location" is too
imprecise.
* src/symlist.h, src/symlist.c (symbol_list::location): Rename as
rhs_loc for clarity. Move it to the "section" of data valid only
for rules.
* src/reader.c, src/scan-code.l: Adjust.
* cfg.mk: Disable checks where needed (e.g., we do want to check the
behavior with tabs).
(sc_at_parser_check): Remove. Unfortunately since a11c144609 we no
longer use the './' prefix to run programs in the current directory.
That was so that we could run Java programs like the others, although
they are not run with the `./` prefix (see 967a59d2c0).
As a consequence this sc-check no longer makes sense.
Besides, since AT_PARSER_CHECK now passes the `./` prefix itself, this
sc-check was superfluous anyway.
* examples/c/reccalc/scan.l: Use memcpy, not strncpy.
* src/ielr.c, src/reader.c: Obfuscate "lr(0)" so that the sc-check for
"space before paren" does not fire.
* tests/diagnostics.at: Avoid space-tab, use tab-tab.
This makes reading the trace slightly easier. It would be very nice
to highlight the "big steps", especially reductions. But this is a
private experiment: do not use it.
* data/diagnostics.css (value): New.
* src/parse-gram.y: Use no delimiters and no c quotation for strings
to facilitate debugging.
(tron, troff, TRACE): New.
Not very elegant, but until there is support for printf-formats in
libtextstyle, it shall be enough.
Currently we pass only the columns based on the screen-width, which is
important for the carets. But we don't pass the bytes-based columns,
which is important for the colors. Pass both.
* src/muscle-tab.c (muscle_boundary_grow): Also pass the byte-based column.
* src/location.c (location_caret): Clarify.
(boundary_set_from_string): Adjust to the new format.
* tests/diagnostics.at (Tabulations and multibyte characters from M4): New.
Locations issued from M4 need the byte-based column for the
diagnostics to work properly. Currently they were unassigned, which
typically resulted in partially non-colored diagnostics.
* src/location.c (boundary_set_from_string): Fix the parsed location.
* src/muscle-tab.c (muscle_percent_define_default): Set the byte values.
* tests/diagnostics.at (Locations from M4): New.
This is meant for developers, not end users, which is why I attached it
to --trace.
* src/getargs.h, src/getargs.c (trace_locations): New.
* src/location.c (location_print): Use it.
The "identifier and colon" of a rule is implemented as a single token,
but whose location is only that of the identifier (so that messages
about the lhs of a rule are accurate). When reducing empty rules, the
default location is the single point location on the end of the
previous symbol. As a consequence, when Bison parses a grammar, the
location of the right-hand side of an empty rule is based on the
lhs, *independently of the position of the colon*. And the colon can
be way farther, separated by comments, white spaces, including empty
lines.
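The default location rule mentioned above can be sketched as a plain function. It follows the behavior documented for YYLLOC_DEFAULT; the loc type and the function name are hypothetical, chosen for this illustration only.

```c
#include <assert.h>

/* Hypothetical location type: 1-based lines and columns.  */
typedef struct
{
  int first_line, first_column;
  int last_line, last_column;
} loc;

/* RHS[1] to RHS[N] are the locations of the right-hand side symbols,
   and RHS[0] is the location of the symbol that precedes them.  */
loc
default_location (const loc *rhs, int n)
{
  assert (rhs && 0 <= n);
  loc res;
  if (n)
    {
      /* From the start of the first symbol to the end of the last.  */
      res.first_line = rhs[1].first_line;
      res.first_column = rhs[1].first_column;
      res.last_line = rhs[n].last_line;
      res.last_column = rhs[n].last_column;
    }
  else
    {
      /* Empty rule: a single point at the end of the previous symbol.  */
      res.first_line = res.last_line = rhs[0].last_line;
      res.first_column = res.last_column = rhs[0].last_column;
    }
  return res;
}
```

When the previous symbol is the "id colon" token whose location covers only the identifier, an empty right-hand side is therefore located right after the identifier, wherever the colon is.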
As a result, some messages look really bad. For instance:
$ cat foo.y
%%
foo : /* empty */
bar
: /* empty */
gives
$ bison -Wall foo.y
foo.y:2.4: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:3.4: warning: empty rule without %empty [-Wempty-rule]
3 | bar
| ^
The carets are not at the right column, not even the right line.
This commit passes the colon "again" after the "id colon" token, which
gives more accurate locations for these messages:
$ bison -Wall foo.y
foo.y:2.10: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:4.2: warning: empty rule without %empty [-Wempty-rule]
4 | : /* empty */
| ^
* src/scan-gram.l (SC_AFTER_IDENTIFIER): Rollback the colon, so that
we scan it again afterwards.
(INITIAL): Scan colons.
* src/parse-gram.y (COLON): New.
(rules): Parse the colon after the rule's id_colon (and possible
named reference).
* tests/actions.at, tests/conflicts.at, tests/diagnostics.at,
* tests/existing.at: Adjust.
Because the fix-its were reading the character-based columns, but were
applied on byte-based columns, the result with multibyte characters or
tabs could be "interesting". For instance
%fixed-output_files
%fixed_output-files
%fixed-output-files
%define api.prefix {foo}
%no-default-prec
would give
%fixed-%fixed-output-files %fixed_output-files
%fixed-orefix= "foo"
o_default-prec
* src/fixits.c (fixit_print, fixits_run): Work on byte-based columns.
* tests/input.at: Check it.
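To illustrate why the columns must be byte-based, here is a minimal sketch of applying one fix-it to one line. The function apply_fixit is hypothetical and much simpler than fixits_run; it assumes 1-based byte columns and a half-open range.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace bytes [BEGIN, END) of LINE (1-based byte columns) with
   TEXT, and store the result in OUT, a buffer of OUT_SIZE bytes.
   Return 0 on success, -1 if the range is invalid or OUT is too
   small.  */
int
apply_fixit (char *out, size_t out_size, const char *line,
             size_t begin, size_t end, const char *text)
{
  assert (out && line && text);
  size_t len = strlen (line);
  if (begin < 1 || end < begin || len + 1 < end)
    return -1;
  int n = snprintf (out, out_size, "%.*s%s%s",
                    (int) (begin - 1), line, text, line + end - 1);
  return 0 <= n && (size_t) n < out_size ? 0 : -1;
}
```

In a line such as "/* é */ x_y" encoded in UTF-8, "x_y" starts at byte 10 but at character 9. Passing the character-based columns 9 and 12 to this byte-based routine replaces the wrong bytes, which is the kind of damage shown above.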
Currently, when we quote the source file, we indent it with one space,
and preserve tabulations, so there is a discrepancy and the visual
rendering is bad. One way out is to indent with a tab instead of a
space, but then this space can be used for more information. This is
what GCC9 does. Let's play copycats.
See
https://lists.gnu.org/archive/html/bison-patches/2019-04/msg00025.html
https://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9/
https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html#Guidelines-for-Diagnostics
* src/location.c (location_caret): Prefix quoted lines with the line
number and a pipe, fitting 8 columns.
* tests/actions.at, tests/c++.at, tests/conflicts.at,
* tests/diagnostics.at, tests/input.at, tests/java.at,
* tests/named-refs.at, tests/reduce.at, tests/regression.at,
* tests/sets.at: Adjust expectations.
Partly by "./build-aux/update-test tests/testsuite.dir/*/testsuite.log"
repeatedly, and partly by hand.
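The 8-column prefix can be sketched as follows. The format "%5d | " is an assumption inferred from the quoted diagnostics, and quote_line and caret_line are hypothetical helpers, not the code of location_caret; the caret helper also assumes a line without tabs or multibyte characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Store in OUT the quoted source line TEXT, prefixed with its line
   number and a pipe.  The prefix is 8 columns wide, so tabs in TEXT
   reach the same tab stops as in the source file.  Return 0 on
   success, -1 if OUT is too small.  */
int
quote_line (char *out, size_t out_size, int line, const char *text)
{
  assert (out && text && 0 < line);
  int n = snprintf (out, out_size, "%5d | %s", line, text);
  return 0 <= n && (size_t) n < out_size ? 0 : -1;
}

/* Store in OUT the line that puts a caret under the 1-based screen
   COLUMN of the quoted line.  */
int
caret_line (char *out, size_t out_size, int column)
{
  assert (out && 0 < column);
  size_t pad = (size_t) column - 1;
  if (out_size < 8 + pad + 2)
    return -1;
  memcpy (out, "      | ", 8);
  memset (out + 8, ' ', pad);
  out[8 + pad] = '^';
  out[8 + pad + 1] = '\0';
  return 0;
}
```

Both prefixes are 8 columns wide, with the pipe in the same column, so the caret lines up with the quoted text.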
This is a pity: efforts were invested in computing correctly the
number of screen columns consumed by multibyte characters, but the
routines that do that were fed by single-byte inputs...
As a consequence Bison never displayed locations correctly when there
were multibyte characters.
* src/scan-gram.l (mbchar): New.
Use it instead of . in the catch-all clause.
* tests/diagnostics.at (Tabulations): Enhance into...
(Tabulations and multibyte characters): this.
Single point locations (equal boundaries) are troublesome, and we were
incorrectly ending the style in their case, which results in an abort
in libtextstyle.
There is also a confusion between columns as displayed on the
screen (which take into account multibyte characters and tabulations),
and the number of bytes. Counting the screen-column
incrementally (character by character) is not easy (because of multibyte
characters), and I don't want to maintain a buffer of the current line
when displaying the diagnostic. So I believe the simplest solution is
to track the byte number in addition to the screen column.
* src/location.h, src/location.c (boundary): Add the byte-column.
Adjust dependencies.
* src/getargs.c, src/scan-gram.l: Adjust.
* tests/diagnostics.at: Check zero-width locations.
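Tracking both counters can be sketched as follows. The boundary type below is a hypothetical simplification of the one in src/location.h. The sketch assumes valid UTF-8 input, one screen column per code point (real code would need the display width of each character), and tab stops every 8 columns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified boundary.  All counters are 1-based.  */
typedef struct
{
  int line;
  int column;   /* Screen column.  */
  int byte;     /* Byte column.  */
} boundary;

/* Return CUR advanced over the LEN bytes of TEXT.  */
boundary
boundary_advance (boundary cur, const char *text, size_t len)
{
  assert (text || len == 0);
  for (size_t i = 0; i < len; ++i)
    {
      unsigned char c = (unsigned char) text[i];
      if (c == '\n')
        {
          ++cur.line;
          cur.column = cur.byte = 1;
          continue;
        }
      ++cur.byte;
      if (c == '\t')
        /* Jump to the next tab stop.  */
        cur.column += 8 - (cur.column - 1) % 8;
      else if ((c & 0xC0) != 0x80)
        /* Count code points, not UTF-8 continuation bytes.  */
        ++cur.column;
    }
  return cur;
}
```

The screen column places the carets, and the byte column tells where to start and stop the styling in the quoted line, without keeping a buffer of the current line.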
Enable checking of styles even when libtextstyle is not installed.
* src/getargs.h, src/getargs.c (style_debug): New.
(getargs_colors): Set it when --style=debug.
* src/complain.c (begin_use_class, end_use_class): Use it.
* tests/diagnostics.at: New.
* src/lalr.c: Move logs to a better place to understand the chronology
of events.
* src/symlist.c (symbol_list_syms_print): Don't dump core on type
elements.
Currently, with --no-lines, instead of "#line file line\n", we emit
"\n". Let's emit nothing.
* data/skeletons/bison.m4 (b4_syncline): Emit at end-of-line when enabled.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, src/output.c: Use dnl after b4_syncline to
avoid spurious empty lines.
* tests/synclines.at (Sync Lines): Make sure that --no-lines is like
grep -v #line.
* tests/calc.at: Make sure that a rich grammar file behaves properly
with %no-lines.
Currently we use the syncline to report errors about a symbol's
destructor/printer. This is not accurate (only file and line), and
this is incorrect: the file name is in double quotes (a recent change,
needed to make sure we properly escape double quotes in it). And
worst of all: with --no-lines, b4_syncline expands to nothing.
Rather, push the locations into the backend, and use them.
* src/muscle-tab.h, src/muscle-tab.c (muscle_location_grow): Make it
public.
* src/output.c (prepare_symbol_definitions): Use it to publish the
location of the printer and destructor.
* data/skeletons/lalr1.java: Use complain_at instead of complain.
* tests/java.at (Java invalid directives): Adjust expectations.
* data/skeletons/bison.m4 (b4_symbol_action_location): Remove.
We should not use b4_syncline this way.
I never understood why we book ngotos+1 slots for relations between
gotos: there are at most ngotos images, not ngotos+1 (and "includes"
does have cases where a goto is in relation with itself, so it's not
ngotos-1).
Maybe bbf37f2534 explains the +1: a bug
left us register a goto several times on occasion, and the +1 might
have been a means to avoid this problem in most cases. Now that this
bug is addressed, we should no longer overbook memory, if only for the
clarity of the code ("why ngotos+1 instead of ngotos?").
* src/lalr.c: A goto has at most ngotos images, not ngotos+1.
While at it, avoid useless repeated calls to map_goto introduced in
bbf37f2534.