* src/main.c (main): When traces are enabled, display the Bison
version.
* tests/conflicts.at, tests/report.at, tests/sets.at:
Use AT_PACKAGE_VERSION (for package.m4) instead of post-processing the
output.
When printing items, it is clearer to put the dot after %emtpy rather
than before:
0 $accept: . unit "end of file"
1 unit: . assignments exp
- 2 assignments: . %empty
+ 2 assignments: %empty .
3 | . assignments assignment
Also, use the Unicode characters if they are supported.
* src/gram.c (item_print): Put the dot after %emtpy.
* tests/conflicts.at, tests/reduce.at, tests/report.at: Adjust.
This is consistent with --defines being deprecated in favor of
--header. The directive %defines is also too similar to %define.
And %header matches nicely with api.header.name.
* src/scan-gram.l (%defines): Deprecate to %header.
(%header): Scan it.
* src/parse-gram.y (PERCENT_DEFINES): Replace with...
(PERCENT_HEADER): this.
* data/skeletons/lalr1.java
* doc/bison.texi
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/java.at, tests/local.at, tests/output.at,
* tests/synclines.at, tests/types.at:
Convert most tests to check %header instead of %defines.
On a case such as
%%
exp
: empty "a"
| "a" empty
empty
: %empty
we used to display
warning: shift/reduce conflict on token "a" [-Wcounterexamples]
Example: • "a"
Shift derivation
exp
↳ 2: • "a" empty
↳ 2: ε
Example: • "a"
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: •
where the shift derivation shows an item "2: empty → ε", with an
explicit "ε", but the reduce derivation shows "3: empty → •", without
"ε".
For consistency, let's always show ε/%empty in rules with an empty
rhs:
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: ε •
* src/derivation.c (derivation_width, derivation_print_tree_impl):
Always show ε/%empty in counterexamples.
* tests/diagnostics.at: Check that case.
* tests/conflicts.at, tests/counterexample.at: Adjust.
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.
* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
Fix 6b78e50cef, "cex: make "rerun with
'-Wcex'" a note instead of a warning"
* tests/conflicts.at (-W versus %expect and %expect-rr): Fix
expectations.
Currently the suggestion to rerun is a -Wother warning:
warning: 2 shift/reduce conflicts [-Wconflicts-sr]
warning: rerun with option '-Wcounterexamples' to generate conflict counterexamples [-Wother]
Instead, let's attach it as a subnote of the diagnostic (in the
current case, -Wconflicts-sr):
warning: 2 shift/reduce conflicts [-Wconflicts-sr]
note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
* src/conflicts.c (conflicts_print): Do that.
Adjust the test suite.
From
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First derivation
a
`-> A b .
Second derivation
a
`-> A b
`-> b .
to
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First reduce derivation
a
`-> A b .
Second reduce derivation
a
`-> A b
`-> b .
* src/counterexample.c (print_counterexample): here.
Compute the width of the labels to properly align the values.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
Now that the derivation is no longer printed on one line, aligning the
example and the derivation is no longer useful. It can actually be
harmful, as it makes the overall structure less clear.
* src/derivation.h, src/derivation.c (derivation_print_leaves): Remove
the `prefix` argument.
* src/counterexample.c (print_counterexample): Put the example next to
its label.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
This is more consistent, and brings benefits: users know that these
diagnostics are attached to -Wcounterexamples, and they can also click
on the hyperlink if permitted by their terminal.
We go from
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Reduce/reduce conflict on token $end:
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
to
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
with an hyperlink on -Wcounterexamples.
* src/counterexample.c (counterexample_report_reduce_reduce):
Use complain.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at:
Adjust.
Sometimes, understanding the derivations is difficult, because they
are serialized to fit in one line. For instance, the example taken
from the NEWS file:
%token ID
%%
s: a ID
a: expr
expr: expr ID ',' | "expr"
gave
First example expr • ID ',' ID $end
Shift derivation $accept → [ s → [ a → [ expr → [ expr • ID ',' ] ] ID ] $end ]
Second example expr • ID $end
Reduce derivation $accept → [ s → [ a → [ expr • ] ID ] $end ]
Printing as trees, it gives:
First example expr • ID ',' ID $end
Shift derivation
$accept
↳ s $end
↳ a ID
↳ expr
↳ expr • ID ','
Second example expr • ID $end
Reduce derivation
$accept
↳ s $end
↳ a ID
↳ expr •
* src/glyphs.h, src/glyphs.c (down_arrow, empty, derivation_separator):
New.
* src/derivation.c (derivation_print, derivation_print_impl): Rename
as...
(derivation_print_flat, derivation_print_flat_impl): These.
(fputs_if, derivation_depth, derivation_width, derivation_print_tree)
(derivation_print_tree_impl, derivation_print): New.
* src/counterexample.c (print_counterexample): Adjust.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
When reporting counterexamples for s/r conflicts, put the shift first.
This is more natural, and displays the default resolution first, which
is also what happens for r/r conflicts where the smallest rule number
is displayed first, and "wins".
* src/counterexample.c (counterexample): Add a shift_reduce member.
(new_counterexample): Adjust.
Swap the derivations when this is a s/r conflict.
(print_counterexample): For s/r conflicts, prefer "Shift derivation"
and "Reduce derivation" rather than "First/Second derivation".
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* NEWS, doc/bison.texi: Ditto.
It does not make a lot of sense to use ::= in our counterexamples,
that's not something that belongs to the Bison "vocabulary". Using
the colon makes sense, but it's too discreet. Let's use the arrow,
which we already use in some reports (HTML and Dot).
* src/gram.h (print_dot_fallback): Generalize into...
(print_fallback): this.
(print_arrow): New.
* src/derivation.c: Use it.
* NEWS, tests/conflicts.at, tests/counterexample.at,
* tests/diagnostics.at, tests/report.at: Adjust.
* doc/bison.texi: Ditto.
Unfortunately the literal `→` is output as `↦`. So we need to use
@arrow.
It makes no sense, and is actually confusing, to display twice the
same example with no visible difference.
* src/complain.h, src/complain.c (is_styled): New.
* src/counterexample.c (print_counterexample): Display the unified
example a second time only if it makes a difference.
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* tests/diagnostics.at: Make sure we do display the unifying examples
twice when colors are enabled. And check those colors.
Use of print_unicode_char suggested by Bruno Haible.
https://lists.gnu.org/r/bug-gettext/2020-06/msg00012.html
* src/gram.h (print_dot_fallback, print_dot): New.
* src/gram.c, src/derivation.c: Use it.
* tests/counterexample.at, tests/report.at: Adjust the test suite.
* .travis.yml, README-hacking.md: Adjust.
And let --report=all include the counterexamples.
* src/getargs.h, src/getargs.c (report_cex): New.
* src/main.c: Compute counterexamples when -rcex is specified.
* src/print.c: Include the counterexamples when -rcex is specified.
* tests/conflicts.at, tests/existing.at, tests/local.at: Adjust.
Suggesting -Wcounterexamples when there are conflicts is probably not
what the user wants. If she knows her conflicts and has set
%expect/%expect-rr appropriately, we shouldn't warn.
The commit also swaps the counterexamples and the report of conflicts,
into, IMHO, a more natural order: from
Shift/reduce conflict on token B:
1: 3 a: A .
1: 8 y: A . B
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]
to
input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
Shift/reduce conflict on token B:
1: 3 a: A .
1: 8 y: A . B
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]
* src/conflicts.c (rule_conflicts_print): Rename as...
(report_rule_expectation_mismatches): this.
Move the handling of report_counterexamples to...
(conflicts_print): Here.
Display this warning when applicable.
The name "$end" is nice in the report, in particular it avoids that
pointed-rules (aka items) be too long. It also helps keeping them
"standard".
But it is bad in error messages, we should report "end of file" (or
maybe "end of input", this is debatable). So, unless the user already
defined the alias for the error token herself, make it "end of file".
It should even be translated if the user already translated some
tokens, so that there is now no strong reason to redefine the $end
token.
* src/output.c (prepare_symbol_names): Issue "end of file" instead of
"$end".
* data/skeletons/lalr1.java (yytnamerr_): Remove the renaming hack.
* build-aux/update-test: Accept files with names containing a "+",
such as c++.at.
* tests/actions.at, tests/c++.at, tests/conflicts.at,
* tests/glr-regression.at, tests/regression.at, tests/skeletons.at:
Adjust.
Supporting YYERROR_VERBOSE via cpp is a nuisance: m4 is in charge of
handling alternatives. When adding more options for %define
parse.error, supporting both CPP and M4 is too complex. Anyway,
YYERROR_VERBOSE was deprecated long ago.
* data/skeletons/yacc.c: Use m4 only to handle verbose/simple error
messages.
The current code for yysyntax_error for %define parse.error verbose is
fishy (given that YYEMPTY is -2, invalid argument for yytname[]):
static int
yysyntax_error ([...])
{
YYPTRDIFF_T yysize0 = yytnamerr (YY_NULLPTR, yytname[yytoken]);
[...]
if (yytoken != YYEMPTY)
A nearby comment reports
The only way there can be no lookahead present (in yychar) is if
this state is a consistent state with a default action. Thus,
detecting the absence of a lookahead is sufficient to determine
that there is no unexpected or expected token to report. In that
case, just report a simple "syntax error".
So it _is_ possible to call yysyntax_error with yytoken == YYEMPTY,
albeit quite difficult when meaning to, so virtually impossible by
accident (after all, there was never a bug report about this).
I failed to produce a test case, but Joel E. Denny provided me with
one (added to the test suite below). The yacc.c skeleton fails on
this, and once fixed dies on a second problem. The glr.c skeleton was
also dying, but immediately of this second problem.
Indeed we were not allocating space for the error message's final \0.
This was hidden by the fact that we only had error messages with at
least an unexpected token displayed, so with at least one "%s" in the
format string, whose size (2) was included (incorrectly) in the final
size of the message (where the %s have been replaced by the actual
content).
* data/skeletons/glr.c, data/skeletons/yacc.c (yysyntax_error):
Do not invoke yytnamerr on YYEMPTY.
Clarify the computation of the length of the _final_ error message,
with the NUL terminator but without the '%s's.
* tests/conflicts.at (Syntax error in consistent error state):
New, contributed by Joel E. Denny.
The "identifier and colon" of a rule is implemented as a single token,
but whose location is only that of the identifier (so that messages
about the lhs of a rule are accurate). When reducing empty rules, the
default location is the single point location on the end of the
previous symbol. As a consequence, when Bison parses a grammar, the
location of the right-hand side of an empty rule is based on the
lhs, *independently of the position of the colon*. And the colon can
be way farther, separated by comments, white spaces, including empty
lines.
As a result, some messages look really bad. For instance:
$ cat foo.y
%%
foo : /* empty */
bar
: /* empty */
gives
$ bison -Wall foo.y
foo.y:2.4: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:3.4: warning: empty rule without %empty [-Wempty-rule]
3 | bar
| ^
The carets are not at the right column, not even the right line.
This commit passes the colon "again" after the "id colon" token, which
gives more accurate locations for these messages:
$ bison -Wall foo.y
foo.y:2.10: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:4.2: warning: empty rule without %empty [-Wempty-rule]
4 | : /* empty */
| ^
* src/scan-gram.l (SC_AFTER_IDENTIFIER): Rollback the colon, so that
we scan it again afterwards.
(INITIAL): Scan colons.
* src/parse-gram.y (COLON): New.
(rules): Parse the colon after the rule's id_colon (and possible
named reference).
* tests/actions.at, tests/conflicts.at, tests/diagnostics.at,
* tests/existing.at: Adjust.
Currently, when we quote the source file, we indent it with one space,
and preserve tabulations, so there is a discrepancy and the visual
rendering is bad. One way out is to indent with a tab instead of a
space, but then this space can be used for more information. This is
what GCC9 does. Let's play copy cats.
See
https://lists.gnu.org/archive/html/bison-patches/2019-04/msg00025.htmlhttps://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9/https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html#Guidelines-for-Diagnostics
* src/location.c (location_caret): Prefix quoted lines with the line
number and a pipe, fitting 8 columns.
* tests/actions.at, tests/c++.at, tests/conflicts.at,
* tests/diagnostics.at, tests/input.at, tests/java.at,
* tests/named-refs.at, tests/reduce.at, tests/regression.at,
* tests/sets.at: Adjust expectations.
Partly by "./build-aux/update-test tests/testsuite.dir/*/testsuite.log"
repeatedly, and partly by hand.
* tests/local.at (AT_MAIN_DEFINE(java)): Exit failure on failure.
(AT_PARSER_CHECK): If in Java, run AT_JAVA_PARSER_CHECK.
* tests/conflicts.at (AT_CONSISTENT_ERRORS_CHECK): Simplify.
Currently the caller must specify the ./ prefix to its command. Let's
avoid that: it will be nicer to read, make it easier to have a version
that works for Java and C/C++.
* tests/local.at (AT_PARSER_CHECK): Prefix the command with ./.
Adjust callers.
* tests/conflicts.at (AT_YYLEX_PROTOTYPE): Don't define it, leave that
task to AT_DATA_GRAMMAR.
But be honest: tell AT_BISON_OPTION_PUSHDEFS all the options we use.
The format is inconsistent. For instance most sections are
indented (including "Terminals unused in grammar" for instance), but
the sections "Terminals, with rules where they appear" and
"Nonterminals, with rules where they appear" are not. Let's indent
them. Also, these two sections try to wrap the output to avoid lines
too long. Yet we don't do that in the rest of the file, for instance
when listing the lookaheads of an item.
For instance in the case of Bison's parse-gram.output we go from:
Terminals, with rules where they appear
"end of file" (0) 0
error (256) 28 88
"string" <char*> (258) 9 13 16 17 20 23 24 109 116
[...]
Nonterminals, with rules where they appear
$accept (58)
on left: 0
input (59)
on left: 1, on right: 0
prologue_declarations (60)
on left: 2 3, on right: 1 3
prologue_declaration (61)
on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24
25 26 27 28 29, on right: 3
[...]
to
Terminals, with rules where they appear
"end of file" (0) 0
error (256) 28 88
"string" <char*> (258) 9 13 16 17 20 23 24 109 116
[...]
Nonterminals, with rules where they appear
$accept (58)
on left: 0
input (59)
on left: 1
on right: 0
prologue_declarations (60)
on left: 2 3
on right: 1 3
prologue_declaration (61)
on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25 26 27 28 29
on right: 3
[...]
* src/print.c (END_TEST): Remove.
(print_terminal_symbols): Don't try to wrap the output.
(print_nonterminal_symbols): Likewise.
Make two different lines for occurrences on the left, and occurrence
on the rhs of the rules.
Indent by 4 and 8, not 3.
* src/reduce.c (reduce_output): Indent by 4, not 3.
* tests/conflicts.at, tests/existing.at, tests/reduce.at,
* tests/regression.at, tests/report.at:
Adjust.
Currently on a grammar such as
exp : a '1' | a '2' | a '3' | b '1' | b '2' | b '3'
a:
b:
we count only one rr-conflict on the `b:` rule, i.e., we expect:
b: %expect-rr 1
although there are 3 conflicts in total. That's because in the
conflicted state we count only a single conflict, not three (one for
each of the lookaheads: '1', '2', '3').
State 0
0 $accept: . exp $end
1 exp: . a '1'
2 | . a '2'
3 | . a '3'
4 | . b '1'
5 | . b '2'
6 | . b '3'
7 a: . %empty ['1', '2', '3']
8 b: . %empty ['1', '2', '3']
'1' reduce using rule 7 (a)
'1' [reduce using rule 8 (b)]
'2' reduce using rule 7 (a)
'2' [reduce using rule 8 (b)]
'3' reduce using rule 7 (a)
'3' [reduce using rule 8 (b)]
$default reduce using rule 7 (a)
exp go to state 1
a go to state 2
b go to state 3
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html.
* src/conflicts.c (rule_has_state_rr_conflicts): Rename as...
(count_rule_state_sr_conflicts): this.
DWIM.
(count_rule_rr_conflicts): Adjust.
* tests/conflicts.at (%expect-rr in grammar rules)
(%expect-rr too much in grammar rules)
(%expect-rr not enough in grammar rules): New.
Currently on a grammar such as
exp: "number" | exp "+" exp | exp "*" exp
we count only one sr-conflict for both binary rules, i.e., we expect:
exp: "number" | exp "+" exp %expect 1 | exp "*" exp %expect 1
although there are 4 conflicts in total. That's because in the states
in conflict, for instance that for the "+" rule:
State 6
2 exp: exp . "+" exp
2 | exp "+" exp . [$end, "+", "*"]
3 | exp . "*" exp
"+" shift, and go to state 4
"*" shift, and go to state 5
"+" [reduce using rule 2 (exp)]
"*" [reduce using rule 2 (exp)]
$default reduce using rule 2 (exp)
we count only a single conflict, although there are two (one on "+"
and another with "*").
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html.
* src/conflicts.c (rule_has_state_sr_conflicts): Rename as...
(count_rule_state_sr_conflicts): this.
DWIM.
(count_rule_sr_conflicts): Adjust.
* tests/conflicts.at (%expect in grammar rules): New.
This change allows one to document (and check) which rules participate
in shift/reduce and reduce/reduce conflicts. This is particularly
important GLR parsers, where conflicts are a normal occurrence. For
example,
%glr-parser
%expect 1
%%
...
argument_list:
arguments %expect 1
| arguments ','
| %empty
;
arguments:
expression
| argument_list ',' expression
;
...
Looking at the output from -v, one can see that the shift-reduce
conflict here is due to the fact that the parser does not know whether
to reduce arguments to argument_list until it sees the token AFTER the
following ','. By marking the rule with %expect 1 (because there is a
conflict in one state), we document the source of the 1 overall shift-
reduce conflict.
In GLR parsers, we can use %expect-rr in a rule for reduce/reduce
conflicts. In this case, we mark each of the conflicting rules. For
example,
%glr-parser
%expect-rr 1
%%
stmt:
target_list '=' expr ';'
| expr_list ';'
;
target_list:
target
| target ',' target_list
;
target:
ID %expect-rr 1
;
expr_list:
expr
| expr ',' expr_list
;
expr:
ID %expect-rr 1
| ...
;
In a statement such as
x, y = 3, 4;
the parser must reduce x to a target or an expr, but does not know
which until it sees the '='. So we notate the two possible reductions
to indicate that each conflicts in one rule.
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00105.html.
* doc/bison.texi (Suppressing Conflict Warnings): Document %expect,
%expect-rr in grammar rules.
* src/conflicts.c (count_state_rr_conflicts): Adjust comment.
(rule_has_state_sr_conflicts): New static function.
(count_rule_sr_conflicts): New static function.
(rule_nast_state_rr_conflicts): New static function.
(count_rule_rr_conflicts): New static function.
(rule_conflicts_print): New static function.
(conflicts_print): Also use rule_conflicts_print to report on individual
rules.
* src/gram.h (struct rule): Add new fields expected_sr_conflicts,
expected_rr_conflicts.
* src/reader.c (grammar_midrule_action): Transfer expected_sr_conflicts,
expected_rr_conflicts to new rule, and turn off in current_rule.
(grammar_current_rule_expect_sr): New function.
(grammar_current_rule_expect_rr): New function.
(packgram): Transfer expected_sr_conflicts, expected_rr_conflicts
to new rule.
* src/reader.h (grammar_current_rule_expect_sr): New function.
(grammar_current_rule_expect_rr): New function.
* src/symlist.c (symbol_list_sym_new): Initialize expected_sr_conflicts,
expected_rr_conflicts.
* src/symlist.h (struct symbol_list): Add new fields expected_sr_conflicts,
expected_rr_conflicts.
* tests/conflicts.at: Add tests "%expect in grammar rule not enough",
"%expect in grammar rule right.", "%expect in grammar rule too much."
The previous name is too obscure, and the other macros for C++ use
CXX, not CC.
* tests/local.at (AT_SKEL_CC_IF, AT_SKEL_JAVA_IF): Rename as...
(AT_CXX_IF, AT_JAVA_IF): these.
Adjust callers.
On these tests, at -O2 and above, GCC 8 complains that yylval may be
uninitialized. But it seems wrong: it is initialized. Rather than
turning off the warning in the skeleton (hence possibility hiding
relevant warnings of user parsers), let's turn it off in the tests
only.
163: parse.error=verbose and consistent errors: FAILED (conflicts.at:625)
165: parse.error=verbose and consistent errors: lr.default-reduction=consistent FAILED (conflicts.at:635)
166: parse.error=verbose and consistent errors: lr.default-reduction=accepting FAILED (conflicts.at:641)
167: parse.error=verbose and consistent errors: lr.type=canonical-lr FAILED (conflicts.at:645)
168: parse.error=verbose and consistent errors: parse.lac=full FAILED (conflicts.at:650)
169: parse.error=verbose and consistent errors: parse.lac=full lr.default-reduction=accepting FAILED (conflicts.at:655)
We get:
input.c: In function 'yyparse':
input.c:980:9: error: 'yylval' may be used uninitialized in this function [-Werror=maybe-uninitialized]
YYSTYPE yylval YY_INITIAL_VALUE (= yyval_default);
^~~~~~
cc1: all warnings being treated as errors
See https://lists.gnu.org/archive/html/bison-patches/2018-08/msg00063.html.
* tests/conflicts.at (AT_CONSISTENT_ERRORS_CHECK): Disable
-Wmaybe-uninitialized.