Suggesting -Wcounterexamples when there are conflicts is probably not
what the user wants. If she knows her conflicts and has set
%expect/%expect-rr appropriately, we shouldn't warn.
The commit also swaps the counterexamples and the report of conflicts,
into, IMHO, a more natural order: from
Shift/reduce conflict on token B:
1: 3 a: A .
1: 8 y: A . B
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]
to
input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
Shift/reduce conflict on token B:
1: 3 a: A .
1: 8 y: A . B
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]
* src/conflicts.c (rule_conflicts_print): Rename as...
(report_rule_expectation_mismatches): this.
Move the handling of report_counterexamples to...
(conflicts_print): Here.
Display this warning when applicable.
Plural vs. singular is always a problem...
But we already have conflicts-sr and conflicts-rr, so counterexamples
makes more sense than counterexample. Besides, -Wcounterexample will
still be accepted as an unambiguous prefix of -Wcounterexamples.
Add -Wcex as a convenient alias.
While at it, use only "counterexample", never "counter example".
* src/complain.h, src/complain.c
(Wcounterexample, warning_counterexample): Rename as...
(Wcounterexamples, warning_counterexamples): these.
(argmatch_warning_docs): Rename -Wcounterexample as -Wcounterexamples.
(argmatch_warning_args): Likewise.
Add support for -Wcex.
Adjust dependencies.
While defining api.header.include worked as expected, its default
value was incorrectly defined. As a result, by default, the generated
parsers still duplicated the content of the generated header instead
of including it.
* data/skeletons/yacc.c (api.header.include): Fix its default value.
* tests/output.at: Check it.
* doc/bison.texi (%define Summary): Document api.header.include.
While at it, move the definition of api.namespace at the proper
place.
Use colors to show the counterexamples and the derivations in color,
to highlight their structure. Align the outputs, and add i18n
support. Reduce width by using a one-space separator instead of
two-space.
From
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
to
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
with colors.
* data/bison-default.css (cex-dot, cex-0, cex-1, cex-2, cex-3, cex-4)
(cex-5, cex-6, cex-7, cex-step, cex-leaf): New.
* src/derivation.c (derivation_print_styled_impl): New.
(derivation_print, derivation_print_leaves): Use it.
* src/counterexample.c: Reformat the output.
* tests/counterexample.at: Adjust.
It's unfortunate that the traditions between formal language theory
and Yacc differs, but here, tokens should be upper case, and
nonterminals should be lower case.
* tests/counterexample.at: Comply with this.
In Bison we refer to "shift/reduce" conflicts, not "shift-reduce" (in
Bison 3.6.3 186 occurrences vs 15). Enforce consistency on this.
Instead of "spending" a second line for each conflict to report the
lookaheads, put that on the same line as the type of conflict. Also,
prefer "token" to "symbol". Maybe we should even prefer "lookahead".
While at it, enable internationalization, with plurals where
appropriate.
As a consequence, instead of
Shift-Reduce Conflict:
6: 3 b: . %empty
6: 6 d: c . A
On Symbol: A
display
Shift/reduce conflict on token A:
6: 3 b: . %empty
6: 6 d: c . A
* NEWS, doc/bison.texi, src/conflicts.c: Spell it "shift/reduce", not
"shift-reduce".
* src/counterexample.c (counterexample_report_shift_reduce)
(counterexample_report_reduce_reduce): Reformat and internationalize
output.
* tests/counterexample.at: Adjust expectations.
Teaches bison about a new command line option, --file-prefix-map OLD=NEW
(based on the -ffile-prefix-map option from GCC) which causes it to
replace and file path of OLD in the text of the output file with NEW,
mainly for header guards and comments. The primary use of this is to
make builds reproducible with different input paths, and in particular
the debugging information produced when the source code is compiled. For
example, a distro may know that the bison source code will be located at
"/usr/src/bison" and thus can generate bison files that are reproducible
with the following command:
bison --output=/build/bison/parse.c -d --file-prefix-map=/build/bison/=/usr/src/bison/ parse.y
Importantly, this will change the header guards and #line directives
from:
#ifndef YY_BUILD_BISON_PARSE_H
#line 100 "/build/bison/parse.h"
to
#ifndef YY_USR_SRC_BISON_PARSE_H
#line 100 "/usr/src/bison/parse.h"
which is reproducible.
See https://lists.gnu.org/r/bison-patches/2020-05/msg00016.html
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
* src/files.h, src/files.c (spec_mapped_header_file)
(mapped_dir_prefix, map_file_name, add_prefix_map): New.
* src/getargs.c (-M, --file-prefix-map): New option.
* src/output.c (prepare): Define b4_mapped_dir_prefix and
b4_spec_header_file.
* src/scan-skel.l (@ofile@): Output the mapped file name.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/location.cc,
* data/skeletons/yacc.c:
Adjust.
* doc/bison.texi: Document.
* tests/input.at, tests/output.at: Check.
Instead of `On Symbols: {b,c,}`, display `On Symbols: b, c`.
* src/counterexample.c (counterexample_report_reduce_reduce): We don't
need braces.
Use commas as a separator, not a terminator.
* tests/counterexample.at: Adjust.
This should have been done in 3.6, but I wanted to avoid introducing
conflicts into Vincent's work on counterexamples. It turns out it's
completely orthogonal.
* data/README.md, data/skeletons/bison.m4, data/skeletons/c++.m4,
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/java.m4,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/variant.hh, data/skeletons/yacc.c, src/conflicts.c,
* src/derives.c, src/gram.c, src/gram.h, src/output.c,
* src/parse-gram.c, src/parse-gram.y, src/print-xml.c, src/print.c,
* src/reader.c, src/symtab.c, src/symtab.h, tests/input.at,
* tests/types.at:
s/user_token_number/code/g.
Plus minor changes.
The CI has "failures" such as (253, "Null nonterminals"):
@@ -21,7 +21,7 @@
3: 3 b: . %empty
3: 4 c: . %empty
On Symbols: {A,}
-time limit exceeded: 6.000000
+time limit exceeded: 11.000000
First Example c • c A A $end
First derivation $accept ::=[ a ::=[ c d ::=[ a ::=[ b ::=[ • ] d ::=[ c A A ] ] ] ] $end ]
Second Example c • A $end
* tests/counterexample.at (AT_BISON_CHECK_CEX): New.
Use it to neutralize differences in timeout values.
* src/parse-simulation.c: Replace reference counting with
parse_state_retain everywhere.
(free_parse_state): Make this function iterative instead of
recursive. Long parse_state chains were causing stack exhaustion.
* tests/counterexample.at: Fix expectations.
Fixes the SEGV in test 247 (counterexample.at:195): "S/R after first
token".
* src/counterexample.c: here.
* tests/counterexample.at: Fix expectations.
* src/counterexample.c, src/derivation.c:
Do not output diagnostics on stdout, that's the job of stderr, and the
testsuite heavily depend on this.
Do not leave trailing spaces in the output.
* tests/counterexample.at: Use AT_KEYWORDS.
Specify the expected outputs.
* tests/local.mk: Add counterexample.at.
In Bison 3.6.2, the comments with brackets lose their brackets, for
improper m4 quotation.
* data/skeletons/bison.m4 (b4_gsub): New.
* data/skeletons/c-like.m4 (_b4_comment): Use it.
* tests/m4.at: Check b4_gsub.
With input such as
%token<fl> yVL_CLOCK "/*verilator sc_clock*/"
we generate
yVL_CLOCK = 610, /* "/*verilator sc_clock*/" */
which is invalid since the comment will actually be closed on the
first "*/". Let's turn "*/" into "*\/" to avoid this. But GCC will
also warn about "/*" inside a comment, so let's "escape" it too.
Reported by Huang Rui.
https://github.com/akimd/bison/issues/38
* data/skeletons/c-like.m4 (_b4_comment): Escape comment delimiters in
comments.
* tests/input.at (Torturing the Scanner): Check thes cases.
* tests/m4.at: New.
Reported by Martin Blais <blais@furius.ca>.
https://lists.gnu.org/r/help-bison/2020-05/msg00005.html
* data/skeletons/lalr1.cc (symbol_name): Make it public.
Add a private hidden hook to enable testing of private parts.
* tests/local.at (AT_DATA_GRAMMAR_PROLOGUE): Help Emacs find the right
language mode.
* tests/c++.at (C++ Variant-based Symbols Unit Tests): Check that we
can read symbol_name.
AIX 7.1 supports diff -u, but its output does not match the expected
one.
Reported by Bruno Haible.
https://lists.gnu.org/r/bug-bison/2020-05/msg00049.html
* tests/atlocal.in (DIFF_U_WORKS): New.
* tests/local.at (AT_DIFF_U_CHECK): New.
* tests/existing.at (_AT_TEST_EXISTING_GRAMMAR): Use AT_DIFF_U_CHECK.
I don't plan to fix everything in one go. But this was in the way of
the next commit.
* data/skeletons/lalr1.java: Avoid space before parens.
* tests/java.at: Adjust.
From
public interface Lexer {
/* Token kinds. */
/** Token number, to be returned by the scanner. */
static final int YYEOF = 0;
/** Token number, to be returned by the scanner. */
static final int YYERRCODE = 256;
/** Token number, to be returned by the scanner. */
static final int YYUNDEF = 257;
/** Token number, to be returned by the scanner. */
static final int BANG = 258;
...
/** Deprecated, use b4_symbol(0, id) instead. */
public static final int EOF = YYEOF;
to
public interface Lexer {
/* Token kinds. */
/** Token "end of file", to be returned by the scanner. */
static final int YYEOF = 0;
/** Token error, to be returned by the scanner. */
static final int YYerror = 256;
/** Token "invalid token", to be returned by the scanner. */
static final int YYUNDEF = 257;
/** Token "!", to be returned by the scanner. */
static final int BANG = 258;
...
/** Deprecated, use YYEOF instead. */
public static final int EOF = YYEOF;
* data/skeletons/java.m4 (b4_token_enum): Display the symbol's tag in
comment.
* data/skeletons/lalr1.java: Address overquotation issue.
* examples/java/calc/Calc.y, examples/java/simple/Calc.y: Use YYEOF,
not EOF.
On an invalid character literal such as "'\777'" we used to produce
two errors:
input.y:2.9-12: error: invalid number after \-escape: 777
input.y:2.8-13: error: empty character literal
Get rid of the second one.
* src/scan-gram.l (STRING_GROW_ESCAPE): New.
* tests/input.at: Adjust.
I'm quite pleased to see that the tricky case of glr.c was already
prepared by the changes to support syntax_error exceptions. Better
yet, it is actually syntax_error that becomes a special case of the
general pattern: make yytoken be YYERRCODE.
* data/skeletons/glr.c (YYFAULTYTOK): Remove the now useless (Basil)
Faulty token.
Instead, use the error token.
* data/skeletons/lalr1.d, data/skeletons/lalr1.java: When computing
the action, first check the case of the error token.
* tests/calc.at: Check cases for the error token symbols before and
after it.
* data/skeletons/yacc.c (yyparse): When the scanner returns YYERRCODE,
go directly to error recovery (yyerrlab1).
However, don't keep the error token as lookahead, that token is too
special.
* data/skeletons/lalr1.cc: Likewise.
* examples/c/bistromathic/parse.y (yylex): Use that feature to report
nicely invalid characters.
* examples/c/bistromathic/bistromathic.test: Check that.
* examples/test: Neutralize gratuitous differences such as rule
position.
* tests/calc.at: Check that case in C only.
The other case seem to be working, but that's an illusion that the
next commit will address (in fact, they can enter endless loops, and
report the error several times anyway).
We will not keep YYERRCODE anyway, it causes backward compatibility
issues. So as a first step, let all the skeletons use that name,
until we have a better one.
* data/skeletons/bison.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c, doc/bison.texi, tests/headers.at,
* tests/input.at:
here.
On macOS, wc -l always prepends the result with a tab, even when fed
by stdin. But anyway, we should have used `grep -c -v`, which appears
to be portable according to Autoconf's "Limitations of Usual Tools"
section.
Reported by Denis Excoffier.
https://lists.gnu.org/r/bug-bison/2020-04/msg00009.html
* tests/calc.at (_AT_CHECK_CALC): Use grep's -c instead.
Why didn't I think about this before??? symbolName should be a method
of SymbolKind.
* data/skeletons/lalr1.java (YYParser::yysymbolName): Move as...
* data/skeletons/java.m4 (SymbolKind::getName): this.
Make the table a static final table, not a local variable.
Adjust dependencies.
* doc/bison.texi (Java Parser Interface): Document i18n.
(Java Parser Context Interface): Document SymbolKind.
* examples/java/calc/Calc.y, tests/local.at: Adjust.
* doc/bison.texi (C++ Parser Context): New.
* data/skeletons/lalr1.cc (parser::yysymbol_name): Rename as...
(parser::symbol_name): this.
(A Complete C++ Example): Promote LAC, now that we have it.
Promote parse.error detailed over verbose.
* examples/c++/calc++/calc++.test, tests/local.at: Adjust.
The user should think of yypcontext fields as accessible only via
yypcontext_* functions. So let's rename yyexpected_tokens to reflect
that.
Let's _not_ rename yyreport_syntax_error, as the user may define this
function, and is not allowed to access directly the fields of
yypcontext_t: she *must* use the "accessors". This is comparable to
the case of C++/Java where the user defines
parser::report_syntax_error, not parser::context::report_syntax_error.
* data/skeletons/glr.c, data/skeletons/yacc.c (yyexpected_tokens):
Rename as...
(yypcontext_expected_tokens): this.
Adjust dependencies.
The name "$end" is nice in the report, in particular it avoids that
pointed-rules (aka items) be too long. It also helps keeping them
"standard".
But it is bad in error messages, we should report "end of file" (or
maybe "end of input", this is debatable). So, unless the user already
defined the alias for the error token herself, make it "end of file".
It should even be translated if the user already translated some
tokens, so that there is now no strong reason to redefine the $end
token.
* src/output.c (prepare_symbol_names): Issue "end of file" instead of
"$end".
* data/skeletons/lalr1.java (yytnamerr_): Remove the renaming hack.
* build-aux/update-test: Accept files with names containing a "+",
such as c++.at.
* tests/actions.at, tests/c++.at, tests/conflicts.at,
* tests/glr-regression.at, tests/regression.at, tests/skeletons.at:
Adjust.
Yet, don't change the structure identifier to avoid introducing
conflicts in Vincent Imbimbo's PR (which, amusingly enough, is about
conflicts).
* src/symtab.c: here.
* tests/diagnostics.at, tests/input.at: Adjust.
yy::parser features a parse() function, not a yyparse() one.
* data/skeletons/lalr1.cc (yyreport_syntax_error)
(context::yyexpected_tokens): Rename as...
(report_syntax_error, context::expected_tokens): these.
Currently EOF is handled in an adhoc way, with a #define YYEOF 0 in
the implementation file. As a result, the user has to define her own
EOF token if she wants to use it, which is a pity.
Give the $end token a visible kind name, YYEOF. Except that in C,
where enums are not scoped, we would have collisions between all the
definitions of YYEOFs in the header files, so in C, make it
<api.PREFIX>EOF.
* data/skeletons/c.m4 (YYEOF): Override its name to avoid collisions.
Unless the user already gave it a different name.
* data/skeletons/glr.c (YYEOF): Remove.
Use ]b4_symbol(0, [id])[ instead.
Add support for "pre_epilogue", for glr.cc.
* data/skeletons/glr.cc: Remove dead code (never emitted #undefs).
* data/skeletons/yacc.c
* src/parse-gram.c
* src/reader.c
* src/symtab.c
* tests/actions.at
* tests/input.at