The "identifier and colon" of a rule is implemented as a single token,
but whose location is only that of the identifier (so that messages
about the lhs of a rule are accurate). When reducing empty rules, the
default location is the single point location on the end of the
previous symbol. As a consequence, when Bison parses a grammar, the
location of the right-hand side of an empty rule is based on the
lhs, *independently of the position of the colon*. And the colon can
be way farther, separated by comments, white spaces, including empty
lines.
As a result, some messages look really bad. For instance:
$ cat foo.y
%%
foo : /* empty */
bar
: /* empty */
gives
$ bison -Wall foo.y
foo.y:2.4: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:3.4: warning: empty rule without %empty [-Wempty-rule]
3 | bar
| ^
The carets are not at the right column, not even the right line.
This commit passes the colon "again" after the "id colon" token, which
gives more accurate locations for these messages:
$ bison -Wall foo.y
foo.y:2.10: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:4.2: warning: empty rule without %empty [-Wempty-rule]
4 | : /* empty */
| ^
* src/scan-gram.l (SC_AFTER_IDENTIFIER): Rollback the colon, so that
we scan it again afterwards.
(INITIAL): Scan colons.
* src/parse-gram.y (COLON): New.
(rules): Parse the colon after the rule's id_colon (and possible
named reference).
* tests/actions.at, tests/conflicts.at, tests/diagnostics.at,
* tests/existing.at: Adjust.
Currently, when we quote the source file, we indent it with one space,
and preserve tabulations, so there is a discrepancy and the visual
rendering is bad. One way out is to indent with a tab instead of a
space, but then this space can be used for more information. This is
what GCC9 does. Let's play copy cats.
See
https://lists.gnu.org/archive/html/bison-patches/2019-04/msg00025.htmlhttps://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9/https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html#Guidelines-for-Diagnostics
* src/location.c (location_caret): Prefix quoted lines with the line
number and a pipe, fitting 8 columns.
* tests/actions.at, tests/c++.at, tests/conflicts.at,
* tests/diagnostics.at, tests/input.at, tests/java.at,
* tests/named-refs.at, tests/reduce.at, tests/regression.at,
* tests/sets.at: Adjust expectations.
Partly by "./build-aux/update-test tests/testsuite.dir/*/testsuite.log"
repeatedly, and partly by hand.
* tests/local.at (AT_MAIN_DEFINE(java)): Exit failure on failure.
(AT_PARSER_CHECK): If in Java, run AT_JAVA_PARSER_CHECK.
* tests/conflicts.at (AT_CONSISTENT_ERRORS_CHECK): Simplify.
Currently the caller must specify the ./ prefix to its command. Let's
avoid that: it will be nicer to read, make it easier to have a version
that works for Java and C/C++.
* tests/local.at (AT_PARSER_CHECK): Prefix the command with ./.
Adjust callers.
* tests/conflicts.at (AT_YYLEX_PROTOTYPE): Don't define it, leave that
task to AT_DATA_GRAMMAR.
But be honest: tell AT_BISON_OPTION_PUSHDEFS all the options we use.
The format is inconsistent. For instance most sections are
indented (including "Terminals unused in grammar" for instance), but
the sections "Terminals, with rules where they appear" and
"Nonterminals, with rules where they appear" are not. Let's indent
them. Also, these two sections try to wrap the output to avoid lines
too long. Yet we don't do that in the rest of the file, for instance
when listing the lookaheads of an item.
For instance in the case of Bison's parse-gram.output we go from:
Terminals, with rules where they appear
"end of file" (0) 0
error (256) 28 88
"string" <char*> (258) 9 13 16 17 20 23 24 109 116
[...]
Nonterminals, with rules where they appear
$accept (58)
on left: 0
input (59)
on left: 1, on right: 0
prologue_declarations (60)
on left: 2 3, on right: 1 3
prologue_declaration (61)
on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24
25 26 27 28 29, on right: 3
[...]
to
Terminals, with rules where they appear
"end of file" (0) 0
error (256) 28 88
"string" <char*> (258) 9 13 16 17 20 23 24 109 116
[...]
Nonterminals, with rules where they appear
$accept (58)
on left: 0
input (59)
on left: 1
on right: 0
prologue_declarations (60)
on left: 2 3
on right: 1 3
prologue_declaration (61)
on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25 26 27 28 29
on right: 3
[...]
* src/print.c (END_TEST): Remove.
(print_terminal_symbols): Don't try to wrap the output.
(print_nonterminal_symbols): Likewise.
Make two different lines for occurrences on the left, and occurrence
on the rhs of the rules.
Indent by 4 and 8, not 3.
* src/reduce.c (reduce_output): Indent by 4, not 3.
* tests/conflicts.at, tests/existing.at, tests/reduce.at,
* tests/regression.at, tests/report.at:
Adjust.
Currently on a grammar such as
exp : a '1' | a '2' | a '3' | b '1' | b '2' | b '3'
a:
b:
we count only one rr-conflict on the `b:` rule, i.e., we expect:
b: %expect-rr 1
although there are 3 conflicts in total. That's because in the
conflicted state we count only a single conflict, not three (one for
each of the lookaheads: '1', '2', '3').
State 0
0 $accept: . exp $end
1 exp: . a '1'
2 | . a '2'
3 | . a '3'
4 | . b '1'
5 | . b '2'
6 | . b '3'
7 a: . %empty ['1', '2', '3']
8 b: . %empty ['1', '2', '3']
'1' reduce using rule 7 (a)
'1' [reduce using rule 8 (b)]
'2' reduce using rule 7 (a)
'2' [reduce using rule 8 (b)]
'3' reduce using rule 7 (a)
'3' [reduce using rule 8 (b)]
$default reduce using rule 7 (a)
exp go to state 1
a go to state 2
b go to state 3
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html.
* src/conflicts.c (rule_has_state_rr_conflicts): Rename as...
(count_rule_state_sr_conflicts): this.
DWIM.
(count_rule_rr_conflicts): Adjust.
* tests/conflicts.at (%expect-rr in grammar rules)
(%expect-rr too much in grammar rules)
(%expect-rr not enough in grammar rules): New.
Currently on a grammar such as
exp: "number" | exp "+" exp | exp "*" exp
we count only one sr-conflict for both binary rules, i.e., we expect:
exp: "number" | exp "+" exp %expect 1 | exp "*" exp %expect 1
although there are 4 conflicts in total. That's because in the states
in conflict, for instance that for the "+" rule:
State 6
2 exp: exp . "+" exp
2 | exp "+" exp . [$end, "+", "*"]
3 | exp . "*" exp
"+" shift, and go to state 4
"*" shift, and go to state 5
"+" [reduce using rule 2 (exp)]
"*" [reduce using rule 2 (exp)]
$default reduce using rule 2 (exp)
we count only a single conflict, although there are two (one on "+"
and another with "*").
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html.
* src/conflicts.c (rule_has_state_sr_conflicts): Rename as...
(count_rule_state_sr_conflicts): this.
DWIM.
(count_rule_sr_conflicts): Adjust.
* tests/conflicts.at (%expect in grammar rules): New.
This change allows one to document (and check) which rules participate
in shift/reduce and reduce/reduce conflicts. This is particularly
important GLR parsers, where conflicts are a normal occurrence. For
example,
%glr-parser
%expect 1
%%
...
argument_list:
arguments %expect 1
| arguments ','
| %empty
;
arguments:
expression
| argument_list ',' expression
;
...
Looking at the output from -v, one can see that the shift-reduce
conflict here is due to the fact that the parser does not know whether
to reduce arguments to argument_list until it sees the token AFTER the
following ','. By marking the rule with %expect 1 (because there is a
conflict in one state), we document the source of the 1 overall shift-
reduce conflict.
In GLR parsers, we can use %expect-rr in a rule for reduce/reduce
conflicts. In this case, we mark each of the conflicting rules. For
example,
%glr-parser
%expect-rr 1
%%
stmt:
target_list '=' expr ';'
| expr_list ';'
;
target_list:
target
| target ',' target_list
;
target:
ID %expect-rr 1
;
expr_list:
expr
| expr ',' expr_list
;
expr:
ID %expect-rr 1
| ...
;
In a statement such as
x, y = 3, 4;
the parser must reduce x to a target or an expr, but does not know
which until it sees the '='. So we notate the two possible reductions
to indicate that each conflicts in one rule.
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00105.html.
* doc/bison.texi (Suppressing Conflict Warnings): Document %expect,
%expect-rr in grammar rules.
* src/conflicts.c (count_state_rr_conflicts): Adjust comment.
(rule_has_state_sr_conflicts): New static function.
(count_rule_sr_conflicts): New static function.
(rule_nast_state_rr_conflicts): New static function.
(count_rule_rr_conflicts): New static function.
(rule_conflicts_print): New static function.
(conflicts_print): Also use rule_conflicts_print to report on individual
rules.
* src/gram.h (struct rule): Add new fields expected_sr_conflicts,
expected_rr_conflicts.
* src/reader.c (grammar_midrule_action): Transfer expected_sr_conflicts,
expected_rr_conflicts to new rule, and turn off in current_rule.
(grammar_current_rule_expect_sr): New function.
(grammar_current_rule_expect_rr): New function.
(packgram): Transfer expected_sr_conflicts, expected_rr_conflicts
to new rule.
* src/reader.h (grammar_current_rule_expect_sr): New function.
(grammar_current_rule_expect_rr): New function.
* src/symlist.c (symbol_list_sym_new): Initialize expected_sr_conflicts,
expected_rr_conflicts.
* src/symlist.h (struct symbol_list): Add new fields expected_sr_conflicts,
expected_rr_conflicts.
* tests/conflicts.at: Add tests "%expect in grammar rule not enough",
"%expect in grammar rule right.", "%expect in grammar rule too much."
The previous name is too obscure, and the other macros for C++ use
CXX, not CC.
* tests/local.at (AT_SKEL_CC_IF, AT_SKEL_JAVA_IF): Rename as...
(AT_CXX_IF, AT_JAVA_IF): these.
Adjust callers.
On these tests, at -O2 and above, GCC 8 complains that yylval may be
uninitialized. But it seems wrong: it is initialized. Rather than
turning off the warning in the skeleton (hence possibility hiding
relevant warnings of user parsers), let's turn it off in the tests
only.
163: parse.error=verbose and consistent errors: FAILED (conflicts.at:625)
165: parse.error=verbose and consistent errors: lr.default-reduction=consistent FAILED (conflicts.at:635)
166: parse.error=verbose and consistent errors: lr.default-reduction=accepting FAILED (conflicts.at:641)
167: parse.error=verbose and consistent errors: lr.type=canonical-lr FAILED (conflicts.at:645)
168: parse.error=verbose and consistent errors: parse.lac=full FAILED (conflicts.at:650)
169: parse.error=verbose and consistent errors: parse.lac=full lr.default-reduction=accepting FAILED (conflicts.at:655)
We get:
input.c: In function 'yyparse':
input.c:980:9: error: 'yylval' may be used uninitialized in this function [-Werror=maybe-uninitialized]
YYSTYPE yylval YY_INITIAL_VALUE (= yyval_default);
^~~~~~
cc1: all warnings being treated as errors
See https://lists.gnu.org/archive/html/bison-patches/2018-08/msg00063.html.
* tests/conflicts.at (AT_CONSISTENT_ERRORS_CHECK): Disable
-Wmaybe-uninitialized.
* tests/conflicts.at (AT_CONSISTENT_ERRORS_CHECK): Move AT_SETUP/AT_CLEANUP
into it, so that we don't skip non Java tests following a test case in Java.
* src/symtab.c (symbol_precedence_set): Use prec_location, not
location (which is the first occurrence of the symbol, possibly just
%token).
Also, as redefinitions are not allowed, keep the first values, not
the subsequent ones.
* tests/conflicts.at, tests/existing.at, tests/regression.at: Adjust.
In a precedence declaration, when tokens are declared with a litteral
character (e.g., 'a') or with a identifier (e.g., B), Bison behaved
differently: the litteral tokens would be numbered first, and then the
other ones, leading to the following grammar:
%right A B 'c' 'd'
being numbered as such: 'c' 'd' A B.
* src/parse-gram.y (symbol.prec): Set the symbol number when reading the
symbols.
* tests/conflicts.at (Token declaration order: literals vs. identifiers):
New.
Signed-off-by: Akim Demaille <akim@lrde.epita.fr>
* src/gram.c (grammar_rules_useless_report): Let -fcaret handle the
pretty-printing of the guilty rules.
(rule_print): Inline in its only use.
* tests/conflicts.at, tests/existing.at, tests/reduce.at,
* tests/regression.at: Adjust.
* NEWS: Document.
The new warning category "precedence" flags useless precedence and
associativity. -Wprecedence can now be used, it is disabled by default.
The warnings about precedence and associativity are grouped into one, and
the testsuite was corrected accordingly.
* src/complain.h (warnings): Introduce "precedence".
* src/complain.c (warnings_print_categories): Adjust.
* src/getargs.c (warnings_args, warning_types): Likewise.
* src/symtab.h, src/symtab.c (print_associativity_warnings): Remove.
* src/symtab.h (register_assoc): Correct arguments.
* src/symtab.c (print_precedence_warnings): Print both warnings together.
* doc/bison.texi (Bison options): Document the warnings and provide an
example.
* tests/conflicts.at, tests/existing.at, tests/local.at,
* tests/regression.at: Adapt the testsuite for the new category
(-Wprecedence instead of -Wother where appropriate).
Record which symbol associativity is used, and display useless ones.
* src/symtab.h, src/symtab.c (register_assoc, print_assoc_warnings): New
* src/symtab.c (init_assoc, is_assoc_used): New
* src/main.c: Use print_assoc_warnings
* src/conflicts.c: Use register_assoc
* tests/conflicts.at (Useless associativity warning): New.
Due to the new warning, many tests had to be updated.
* tests/conflicts.at tests/existing.at tests/regression.at:
Add the associativity warning in the expected results.
* tests/java.at: Fix the java calculator's grammar to remove a useless
associativity.
* doc/bison.texi (mfcalc example): Fix associativity to remove
warning.
Symbols with precedence but no associativity, and whose precedence is
never used, can be declared with %token instead. The used precedence
relationships are recorded and a warning about useless ones is issued.
* src/conflicts.c (resolve_sr_conflict): Record precedence relation.
* src/symtab.c, src/symtab.h (prec_nodes, init_prec_nodes)
(symgraphlink_new, register_precedence_second_symbol)
(print_precedence_warnings): New.
Record relationships in a graph and warn about useless ones.
* src/main.c (main): Print precedence warnings.
* tests/conflicts.at: New.
* cfg.mk: Ignore strcmp in local.at.
* tests/conflicts.at: Use AT_PARSER_CHECK.
* tests/regression.at: Preserve the exit status of the generated parsers.
* tests/local.mk ($(TESTSUITE)): Map @tb@ to a tabulation.
* tests/c++.at, tests/input.at, tests/regression.at: Use @tb@.
* cfg.mk: (space-tab): There are no longer exceptions.
* origin/maint:
misc: pacify the Tiny C Compiler
cpp: make the check of Flex version portable
misc: require getline
c++: support wide strings for file names
doc: document carets
tests: enhance existing tests with carets
errors: show carets
getargs: add support for --flags/-f
Conflicts:
doc/bison.texi
m4/.gitignore
src/complain.c
src/flex-scanner.h
src/getargs.c
src/getargs.h
src/gram.c
src/main.c
tests/headers.at
* tests/actions.at: Unset value.
* tests/conflicts.at: Rule useless due to conflicts.
* tests/input.at: Missing terminator, unexpected end of file, command line
redefinition of variable.
* tests/named-refs.at: Many errors.
* tests/reduce.at: Useless nonterminals and rules.
* tests/regression.at: Large token.
* origin/maint:
tests: close files in glr-regression
xml: match DOT output and xml2dot.xsl processing
xml: factor xslt space template
graph: fix a memory leak
xml: documentation
output: capitalize State
This improves readability. This is also what gcc does.
* NEWS: Document this change.
* src/complain.c (complain_at): Prefix all errors with "error: ".
(complain_at_indent, warn_at_indent): Do not prefix the context
information of errors, which are basically just indented errors.
* tests/conflicts.at, tests/glr-regression.at, tests/input.at,
tests/named-refs.at, tests/output.at, tests/push.at,
tests/regression.at, tests/skeletons.at: Apply this change.
Signed-off-by: Akim Demaille <akim@lrde.epita.fr>
The current routines used to display s/r and r/r conflicts are both
inconvenient from the programmer point of view (they do not use the
warning infrastructure) and for the user (the messages are rather
terse, not necessarily pleasant to read, and because they don't use
the same routines, they look different).
It was due to the belief (dating back to the initial checked-in
version of Bison) that, at some point, POSIX Yacc mandated the format
for these messages. Today, the Open Group's manual page for Yacc,
<http://pubs.opengroup.org/onlinepubs/009695399/utilities/yacc.html>,
explicitly states that the format of these messages is unspecified.
See commit be7280480c and
<http://lists.gnu.org/archive/html/bison-patches/2002-12/msg00027.html>.
For a discussion on the chosen warning format, see
http://lists.gnu.org/archive/html/bison-patches/2012-09/msg00039.html
In an effort to factor the handling of errors and warnings, use the
Bison warning routines to report these messages.
* src/conflicts.c (conflicts_print): Rewrite with clearer sections
about S/R and then R/R conflicts.
(conflict_report): Remove, inlined in its sole
caller...
(conflicts_output): here.
* tests/conflicts.at, tests/existing.at, tests/glr-regression.at,
* tests/reduce.at, tests/regression.at: Adjust the expected results.
* NEWS: Update.
* src/conflicts.c (conflicts_print): Complain about %expect-rr if not
in GLR mode, regardless of the number of reduce/reduce conflicts.
* tests/conflicts.at (%expect-rr non GLR): New test.
* NEWS: Update.