An assertion failed when the last character is a '\' and we're in a
character or a string.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00009.html
* src/scan-gram.l: Catch unterminated escapes.
* tests/input.at (Unexpected end of file): New.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00008.html
On an empty such as
%token FOO
BAR
FOO 0
%%
input: %empty
we crash because when we find FOO 0, we decrement ntokens (since FOO
was discovered to be EOF, which is already known to be a token, so we
increment ntokens for it, and need to cancel this). This "works well"
when EOF is properly defined in one go, but here it is first defined
and later only assign token code 0. In the meanwhile BAR was given
the token number that we just decremented.
To fix this, assign symbol numbers after parsing, not during parsing,
so that we also saw all the explicit token codes. To maintain the
current numbers (I'd like to keep no difference in the output, not
just equivalence), we need to make sure the symbols are numbered in
the same order: that of appearance in the source file. So we need the
locations to be correct, which was almost the case, except for nterms
that appeared several times as LHS (i.e., several times as "foo:
..."). Fixing the use of location_of_lhs sufficed (it appears it was
intended for this use, but its implementation was unfinished: it was
always set to "false" only).
* src/symtab.c (symbol_location_as_lhs_set): Update location_of_lhs.
(symbol_code_set): Remove broken hack that decremented ntokens.
(symbol_class_set, dummy_symbol_get): Don't set number, ntokens and
nnterms.
(symbol_check_defined): Do it.
(symbols): Don't count nsyms here.
Actually, don't count nsyms at all: let it be done in...
* src/reader.c (check_and_convert_grammar): here. Define nsyms from
ntokens and nnterms after parsing.
* tests/input.at (EOF redeclared): New.
* examples/c/bistromathic/bistromathic.test: Adjust the traces: in
"%nterm <double> exp %% input: ...", exp used to be numbered before
input.
* cfg.mk (_space_before_paren_exempt): Be less laxist.
* src/output.c, src/reader.c: Fix space before paren issues.
Pacify the warnings where applicable.
Older versions of GCC (4.1.2 here) don't like repeated typedefs.
CC src/bison-parse-simulation.o
src/parse-simulation.c:61: error: redefinition of typedef 'parse_state'
src/parse-simulation.h:74: error: previous declaration of 'parse_state' was here
make: *** [Makefile:7876: src/bison-parse-simulation.o] Error 1
Reported by Nelson H. F. Beebe.
* src/parse-simulation.c (parse_state): Don't typedef,
parse-simulation.h did it already.
This reverts commit d0bec3175f (which
should have read "We have a clash...", not "With have a clash...").
Now that `max()` was renamed `max_int()`, we can use `max` again, as
elsewhere in the code.
* src/counterexample.c (visited_hasher): Alpha reconversion.
Reported by Maarten De Braekeleer.
https://lists.gnu.org/r/bison-patches/2020-07/msg00080.html
We don't want to use gnulib's min and max macros, since we use
function calls in min/max arguments.
* src/location.c (max_int, min_int): Move to...
* src/system.h: here.
* src/counterexample.c, src/derivation.c: Use max_int instead of max.
From
foo.y:1.7-11: error: %type redeclaration for bar
1 | %type <foo> bar bar
| ^~~~~
foo.y:1.7-11: note: previous declaration
1 | %type <foo> bar bar
| ^~~~~
to
foo.y:1.17-19: error: %type redeclaration for bar
1 | %type <foo> bar bar
| ^~~
foo.y:1.13-15: note: previous declaration
1 | %type <foo> bar bar
| ^~~
* src/symlist.h, src/symlist.c (symbol_list_type_set): There's no need
for the tag's location, use that of the symbol.
* src/parse-gram.y: Adjust.
* tests/input.at: Adjust.
We crash if the input contains a string containing a NUL byte.
Reported by Suhwan Song.
https://lists.gnu.org/r/bug-bison/2020-07/msg00051.html
* src/flex-scanner.h (STRING_FREE): Avoid accidental use of
last_string.
* src/scan-gram.l: Don't call STRING_FREE without calling
STRING_FINISH first.
* tests/input.at (Invalid inputs): Check that case.
With have a clash with the "max" function.
src/counterexample.c: In function 'visited_hasher':
src/counterexample.c:720:48: error: declaration of 'max' shadows a global declaration [-Werror=shadow]
src/counterexample.c:116:12: error: shadowed declaration is here [-Werror=shadow]
* src/counterexample.c (visited_hasher): Alpha conversion.
Currently the suggestion to rerun is a -Wother warning:
warning: 2 shift/reduce conflicts [-Wconflicts-sr]
warning: rerun with option '-Wcounterexamples' to generate conflict counterexamples [-Wother]
Instead, let's attach it as a subnote of the diagnostic (in the
current case, -Wconflicts-sr):
warning: 2 shift/reduce conflicts [-Wconflicts-sr]
note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
* src/conflicts.c (conflicts_print): Do that.
Adjust the test suite.
From
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First derivation
a
`-> A b .
Second derivation
a
`-> A b
`-> b .
to
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First reduce derivation
a
`-> A b .
Second reduce derivation
a
`-> A b
`-> b .
* src/counterexample.c (print_counterexample): here.
Compute the width of the labels to properly align the values.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
Now that the derivation is no longer printed on one line, aligning the
example and the derivation is no longer useful. It can actually be
harmful, as it makes the overall structure less clear.
* src/derivation.h, src/derivation.c (derivation_print_leaves): Remove
the `prefix` argument.
* src/counterexample.c (print_counterexample): Put the example next to
its label.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
Now that we use complain, the "sections" are clearer.
* src/counterexample.c (print_counterexample): Use the empty line only
in reports.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at: Adjust.
This is more consistent, and brings benefits: users know that these
diagnostics are attached to -Wcounterexamples, and they can also click
on the hyperlink if permitted by their terminal.
We go from
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Reduce/reduce conflict on token $end:
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
to
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
with an hyperlink on -Wcounterexamples.
* src/counterexample.c (counterexample_report_reduce_reduce):
Use complain.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at:
Adjust.
* src/complain.c (begin_hyperlink, end_hyperlink): New.
(warnings_print_categories): Use them.
* tests/local.at (AT_SET_ENV): Disable hyperlinks in the tests, they
contain random id's, and brackets (which is not so nice for M4).
The code was written on top of buffers of `char[26]`, and then was
changed to use `char *`, yet was still using `sizeof buf`, which
became `sizeof (char *)` instead of `sizeof (char[26])`.
Reported by Dagobert Michelsen.
https://lists.gnu.org/r/bug-bison/2020-07/msg00023.html
* src/glyphs.h, src/glyphs.c: Get rid of uses of `char *`, use only
glyph_buffer_t.
Sometimes, understanding the derivations is difficult, because they
are serialized to fit in one line. For instance, the example taken
from the NEWS file:
%token ID
%%
s: a ID
a: expr
expr: expr ID ',' | "expr"
gave
First example expr • ID ',' ID $end
Shift derivation $accept → [ s → [ a → [ expr → [ expr • ID ',' ] ] ID ] $end ]
Second example expr • ID $end
Reduce derivation $accept → [ s → [ a → [ expr • ] ID ] $end ]
Printing as trees, it gives:
First example expr • ID ',' ID $end
Shift derivation
$accept
↳ s $end
↳ a ID
↳ expr
↳ expr • ID ','
Second example expr • ID $end
Reduce derivation
$accept
↳ s $end
↳ a ID
↳ expr •
* src/glyphs.h, src/glyphs.c (down_arrow, empty, derivation_separator):
New.
* src/derivation.c (derivation_print, derivation_print_impl): Rename
as...
(derivation_print_flat, derivation_print_flat_impl): These.
(fputs_if, derivation_depth, derivation_width, derivation_print_tree)
(derivation_print_tree_impl, derivation_print): New.
* src/counterexample.c (print_counterexample): Adjust.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
When reporting counterexamples for s/r conflicts, put the shift first.
This is more natural, and displays the default resolution first, which
is also what happens for r/r conflicts where the smallest rule number
is displayed first, and "wins".
* src/counterexample.c (counterexample): Add a shift_reduce member.
(new_counterexample): Adjust.
Swap the derivations when this is a s/r conflict.
(print_counterexample): For s/r conflicts, prefer "Shift derivation"
and "Reduce derivation" rather than "First/Second derivation".
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* NEWS, doc/bison.texi: Ditto.
Currently we use both names. Let's stick to the short one.
* src/AnnotationList.c, src/conflicts.c, src/counterexample.c,
* src/getargs.c, src/getargs.h, src/graphviz.c, src/ielr.c,
* src/lalr.c, src/print-graph.c, src/print-xml.c, src/print.c,
* src/state-item.c, src/state.c, src/state.h, src/tables.c:
s/lookahead_token/lookahead/gi.
* src/counterexample.c, src/parse-simulation.c: It is more usual in
Bison to use sizeof on expressions than on types, especially for
allocation.
Let the compiler do it's job instead of calling memcpy ourselves.
There are too many gl_list_t in there, it's hard to understand what is
going on. Introduce and use more precise types. I sure can be wrong
in some places, it's hard to tell without proper tool support.
* src/counterexample.c, src/lssi.c, src/lssi.h, src/parse-simulation.c,
* src/parse-simulation.h, src/state-item.c, src/state-item.h
(si_bfs_node_list, search_state_list, ssb_list, lssi_list)
(state_item_list): New.
It does not make a lot of sense to use ::= in our counterexamples,
that's not something that belongs to the Bison "vocabulary". Using
the colon makes sense, but it's too discreet. Let's use the arrow,
which we already use in some reports (HTML and Dot).
* src/gram.h (print_dot_fallback): Generalize into...
(print_fallback): this.
(print_arrow): New.
* src/derivation.c: Use it.
* NEWS, tests/conflicts.at, tests/counterexample.at,
* tests/diagnostics.at, tests/report.at: Adjust.
* doc/bison.texi: Ditto.
Unfortunately the literal `→` is output as `↦`. So we need to use
@arrow.
Prefer `&foos[i]` to `foos + i` when `foos` is an array. IMHO, it
makes the semantics clearer.
* src/counterexample.c, src/lssi.c, src/parse-simulation.c,
* src/state-item.c: With arrays, prefer the array notation rather than
the pointer one.