Now that we use complain, the "sections" are clearer.
* src/counterexample.c (print_counterexample): Use the empty line only
in reports.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at: Adjust.
This is more consistent, and brings benefits: users know that these
diagnostics are attached to -Wcounterexamples, and they can also click
on the hyperlink if permitted by their terminal.
We go from
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Reduce/reduce conflict on token $end:
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
to
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
with a hyperlink on -Wcounterexamples.
* src/counterexample.c (counterexample_report_reduce_reduce):
Use complain.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at:
Adjust.
* src/complain.c (begin_hyperlink, end_hyperlink): New.
(warnings_print_categories): Use them.
* tests/local.at (AT_SET_ENV): Disable hyperlinks in the tests: they
contain random ids, and brackets (which is not so nice for M4).
Sometimes, understanding the derivations is difficult, because they
are serialized to fit in one line. For instance, the example taken
from the NEWS file:
%token ID
%%
s: a ID
a: expr
expr: expr ID ',' | "expr"
gave
First example expr • ID ',' ID $end
Shift derivation $accept → [ s → [ a → [ expr → [ expr • ID ',' ] ] ID ] $end ]
Second example expr • ID $end
Reduce derivation $accept → [ s → [ a → [ expr • ] ID ] $end ]
Printing as trees, it gives:
First example expr • ID ',' ID $end
Shift derivation
$accept
↳ s $end
↳ a ID
↳ expr
↳ expr • ID ','
Second example expr • ID $end
Reduce derivation
$accept
↳ s $end
↳ a ID
↳ expr •
* src/glyphs.h, src/glyphs.c (down_arrow, empty, derivation_separator):
New.
* src/derivation.c (derivation_print, derivation_print_impl): Rename
as...
(derivation_print_flat, derivation_print_flat_impl): These.
(fputs_if, derivation_depth, derivation_width, derivation_print_tree)
(derivation_print_tree_impl, derivation_print): New.
* src/counterexample.c (print_counterexample): Adjust.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
When reporting counterexamples for s/r conflicts, put the shift first.
This is more natural, and displays the default resolution first, which
is also what happens for r/r conflicts where the smallest rule number
is displayed first, and "wins".
* src/counterexample.c (counterexample): Add a shift_reduce member.
(new_counterexample): Adjust.
Swap the derivations when this is a s/r conflict.
(print_counterexample): For s/r conflicts, prefer "Shift derivation"
and "Reduce derivation" rather than "First/Second derivation".
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* NEWS, doc/bison.texi: Ditto.
Currently we use both names. Let's stick to the short one.
* src/AnnotationList.c, src/conflicts.c, src/counterexample.c,
* src/getargs.c, src/getargs.h, src/graphviz.c, src/ielr.c,
* src/lalr.c, src/print-graph.c, src/print-xml.c, src/print.c,
* src/state-item.c, src/state.c, src/state.h, src/tables.c:
s/lookahead_token/lookahead/gi.
It does not make a lot of sense to use ::= in our counterexamples,
that's not something that belongs to the Bison "vocabulary". Using
the colon makes sense, but it's too discreet. Let's use the arrow,
which we already use in some reports (HTML and Dot).
* src/gram.h (print_dot_fallback): Generalize into...
(print_fallback): this.
(print_arrow): New.
* src/derivation.c: Use it.
* NEWS, tests/conflicts.at, tests/counterexample.at,
* tests/diagnostics.at, tests/report.at: Adjust.
* doc/bison.texi: Ditto.
Unfortunately the literal `→` is output as `↦`. So we need to use
@arrow.
Currently when we output useless rules, they appear before the
grammar, but using the same invocation. As a result, the anchor is
defined twice, and the wrong one, being first, is honored.
* data/xslt/xml2xhtml.xsl (rule): Take a new 'anchor' parameter to
decide whether the rule is an anchor or a target.
Let it be true when outputting the grammar.
* tests/report.at: Adjust.
The text and Dot reports are expected to be identical when generated
directly (--report, --graph) or indirectly (via XML). The xml
testsuite had not been run for ages; let it catch up a bit.
* src/print-xml.c: Pass the type of the symbols.
* data/xslt/xml2text.xsl:
Catch up with the new layout.
Display the symbol types.
Use '•', not '.'.
* tests/local.at: Smash '•' to '.' when matching against the direct
text report.
* tests/report.at: Adjust XML expectations.
Reported by Martin Blais and Yuriy Solodkyy.
https://lists.gnu.org/r/help-bison/2020-05/msg00011.html
https://lists.gnu.org/r/bug-bison/2020-06/msg00038.html
While at it, modernize filename_type as api.filename.type and document
it properly.
* data/skeletons/c++.m4 (filename_type): Rename as...
(api.filename.type): this.
Default to const std::string.
* data/skeletons/location.cc (position, location): Expose the
filename_type type.
Use api.filename.type.
* doc/bison.texi (%define Summary): Document api.filename.type.
(C++ Location Values): Document position::filename_type.
* src/muscle-tab.c (muscle_percent_variable_update): Ensure backward
compatibility.
* tests/c++.at: Check that using const file names is ok.
* tests/input.at: Check backward compat.
AFAICT, "dotted rule" is a more frequent synonym of "item" than
"pointed rule". So let's migrate to using "dot" only.
* doc/bison.texi: Use dot/'•' rather than point/'.'.
* src/print-xml.c (print_core): Use dot rather than point. This is
not backward compatible, but AFAICT, we don't have actual users of the
XML output (besides ourselves). So...
* data/xslt/xml2dot.xsl, data/xslt/xml2text.xsl,
* data/xslt/xml2xhtml.xsl, tests/report.at: ... adjust.
It makes no sense, and is actually confusing, to display the same
example twice with no visible difference.
* src/complain.h, src/complain.c (is_styled): New.
* src/counterexample.c (print_counterexample): Display the unified
example a second time only if it makes a difference.
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* tests/diagnostics.at: Make sure we do display the unifying examples
twice when colors are enabled. And check those colors.
I implemented this to print A ::= [ ], but A ::= [ %empty ] might be
clearer.
* src/parse-simulation.c (nullable_closure): Don't generate null
nonterminal derivations as leaves.
* src/derivation.c (derivation_print_impl): Don't print separator
spaces for null nonterminals.
* tests/counterexample.at: Update test results.
This was a hack to make it easier for people to migrate from yacc.c to
lalr1.cc and from glr.c to glr.cc: when set, YYSTYPE and YYLTYPE were
`#defined`. It was never documented (just mentioned in NEWS for Bison
2.2, 2006-05-19), but was used to simplify the test suite. Stop that:
adjust the test suite to the skeletons, not the converse.
In C++ use yy::parser::semantic_type, yy::parser::location_type, and
yy::parser::token::MY_TOKEN, instead of YYSTYPE, YYLTYPE and MY_TOKEN.
* data/skeletons/glr.cc, data/skeletons/lalr1.cc: Remove its support.
* tests/actions.at, tests/c++.at, tests/calc.at: Adjust.
Use of print_unicode_char suggested by Bruno Haible.
https://lists.gnu.org/r/bug-gettext/2020-06/msg00012.html
* src/gram.h (print_dot_fallback, print_dot): New.
* src/gram.c, src/derivation.c: Use it.
* tests/counterexample.at, tests/report.at: Adjust the test suite.
* .travis.yml, README-hacking.md: Adjust.
And let --report=all include the counterexamples.
* src/getargs.h, src/getargs.c (report_cex): New.
* src/main.c: Compute counterexamples when -rcex is specified.
* src/print.c: Include the counterexamples when -rcex is specified.
* tests/conflicts.at, tests/existing.at, tests/local.at: Adjust.
Instead of
Shift/reduce conflict on token D:
Example A a • D
First derivation s ::=[ A a a ::=[ b ::=[ c ::=[ • ] ] ] d ::=[ D ] ]
Example A a • D
Second derivation s ::=[ A a d ::=[ • D ] ]
display
Shift/reduce conflict on token D:
  Example A a • D
  First derivation s ::=[ A a a ::=[ b ::=[ c ::=[ • ] ] ] d ::=[ D ] ]
  Example A a • D
  Second derivation s ::=[ A a d ::=[ • D ] ]
* src/counterexample.c (print_counterexample): Indent.
* tests/counterexample.at: Adjust.
Showing the items (with the state numbers) is really something we
should restrict to the report.
* src/counterexample.c (counterexample_report_shift_reduce)
(counterexample_report_reduce_reduce): Don't show the pointed rules,
we will do that in the report.
* tests/counterexample.at: Adjust.
From
"number" shift, and go to state 1
"Ñùṃéℝô" shift, and go to state 2
to
"number" shift, and go to state 1
"Ñùṃéℝô" shift, and go to state 2
* src/print.c: Use mbswidth, not strlen, to compute visual columns.
* tests/report.at: Adjust.
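The misalignment comes from counting bytes instead of columns.  Here
is a minimal sketch of the difference; it is not the code of
src/print.c.  It assumes UTF-8 input and counts code points as a rough
stand-in for the columns that gnulib's mbswidth computes (the two
agree for the alias above, but not for wide or combining characters).
The names utf8_code_points, padding_by_bytes and padding_by_columns
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points of the UTF-8 string S
   by skipping continuation bytes (those of the form 10xxxxxx).  This
   only approximates the display width.  */
static size_t
utf8_code_points (const char *s)
{
  assert (s);
  size_t n = 0;
  for (; *s; ++s)
    if (((unsigned char) *s & 0xC0) != 0x80)
      ++n;
  return n;
}

/* What a strlen-based computation does: it counts bytes, so a
   non-ASCII string is believed to be wider than it is displayed,
   and gets too little padding.  */
static size_t
padding_by_bytes (const char *s, size_t width)
{
  assert (s);
  size_t w = strlen (s);
  return w < width ? width - w : 0;
}

/* The same computation based on (approximate) columns.  */
static size_t
padding_by_columns (const char *s, size_t width)
{
  size_t w = utf8_code_points (s);
  return w < width ? width - w : 0;
}
```

With the quoted alias "Ñùṃéℝô", which is 16 bytes but 8 columns, the
two functions disagree by 8 spaces, whereas they agree on "number".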
Currently we use "quotearg" to escape the strings output in Dot. As a
result, if the user's locale is C for instance, all the non-ASCII are
escaped. Unfortunately graphviz does not interpret this style of
escaping.
For instance:
5 -> 2 [style=solid label="\"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\""]
was displayed as a sequence of numbers. We now output:
5 -> 2 [style=solid label="\"Ñùṃéℝô\""]
independently of the user's locale.
* src/system.h (obstack_backslash): New.
* src/graphviz.h, src/graphviz.c (escape): Remove, use
obstack_backslash instead.
* src/print-graph.c: Likewise.
* tests/report.at: Adjust.
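The kind of escaping wanted for Dot can be sketched as follows.  This
is only an illustration under assumptions, not the actual
obstack_backslash: it works on a plain buffer rather than an obstack,
and it assumes that only double quotes and backslashes need a
backslash, every other byte (including non-ASCII ones) being copied
verbatim.  The name backslash_escape is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy SRC into DST (of size DST_SIZE), putting
   a backslash before each double quote and each backslash, and
   leaving all the other bytes untouched.  Return the length of the
   result, or (size_t) -1 if DST is too small.  */
static size_t
backslash_escape (char *dst, size_t dst_size, const char *src)
{
  assert (dst && src && dst_size > 0);
  size_t n = 0;
  for (; *src; ++src)
    {
      int special = *src == '"' || *src == '\\';
      /* Room is needed for the optional backslash, the byte itself,
         and the final NUL.  */
      if (n + special + 1 >= dst_size)
        return (size_t) -1;
      if (special)
        dst[n++] = '\\';
      dst[n++] = *src;
    }
  dst[n] = '\0';
  return n;
}
```

Because the non-ASCII bytes are passed through, the result does not
depend on the user's locale.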
Currently our scanner decodes all the escapes in the strings, and we
later reescape the strings when we emit them.
This is troublesome, as we do not respect the user input. For
instance, when the user writes in UTF-8, we destroy her string when we
write it back. And this shows everywhere: in the reports we show the
escaped string instead of the actual alias:
0 $accept: . exp $end
1 exp: . exp "\342\212\225" exp
2 | . exp "+" exp
3 | . exp "+" exp
4 | . "number"
5 | . "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"
"number" shift, and go to state 1
"\303\221\303\271\341\271\203\303\251\342\204\235\303\264" shift, and go to state 2
This commit preserves the user's exact spelling of the string aliases,
instead of interpreting the escapes and then reescaping. The report
now shows:
0 $accept: . exp $end
1 exp: . exp "⊕" exp
2 | . exp "+" exp
3 | . exp "+" exp
4 | . "number"
5 | . "Ñùṃéℝô"
"number" shift, and go to state 1
"Ñùṃéℝô" shift, and go to state 2
Likewise, the XML (and therefore HTML) outputs are fixed.
* src/scan-gram.l (STRING, TSTRING): Do not interpret the escapes in
the resulting string.
* src/parse-gram.y (unquote, parser_init, parser_free, unquote_free)
(handle_defines, handle_language, obstack_for_unquote): New.
Use them to unquote where needed.
* tests/regression.at, tests/report.at: Update.
This is to record the current state of the report, which escapes the
UTF-8 characters (as parse.error="verbose" does), but shouldn't (as
parse.error="detailed" does).
* tests/report.at: here.
Suggesting -Wcounterexamples when there are conflicts is probably not
what the user wants. If she knows her conflicts and has set
%expect/%expect-rr appropriately, we shouldn't warn.
The commit also swaps the counterexamples and the report of conflicts,
into, IMHO, a more natural order: from
Shift/reduce conflict on token B:
1: 3 a: A .
1: 8 y: A . B
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]
to
input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
Shift/reduce conflict on token B:
1: 3 a: A .
1: 8 y: A . B
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]
* src/conflicts.c (rule_conflicts_print): Rename as...
(report_rule_expectation_mismatches): this.
Move the handling of report_counterexamples to...
(conflicts_print): Here.
Display this warning when applicable.
Plural vs. singular is always a problem...
But we already have conflicts-sr and conflicts-rr, so counterexamples
makes more sense than counterexample. Besides, -Wcounterexample will
still be accepted as an unambiguous prefix of -Wcounterexamples.
Add -Wcex as a convenient alias.
While at it, use only "counterexample", never "counter example".
* src/complain.h, src/complain.c
(Wcounterexample, warning_counterexample): Rename as...
(Wcounterexamples, warning_counterexamples): these.
(argmatch_warning_docs): Rename -Wcounterexample as -Wcounterexamples.
(argmatch_warning_args): Likewise.
Add support for -Wcex.
Adjust dependencies.
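The "unambiguous prefix" rule mentioned above can be sketched as
follows.  This is a simplified illustration, not the argmatch code
used by Bison: match_prefix is a hypothetical helper, and it treats
two different names sharing a prefix as ambiguous even if they were
to denote the same category.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index in NAMES (of size N) of the
   only entry that ARG is a prefix of, -1 if there is none, and -2 if
   ARG is a prefix of several entries.  An exact match always wins.  */
static int
match_prefix (const char *arg, const char *const names[], size_t n)
{
  assert (arg && names);
  size_t len = strlen (arg);
  int res = -1;
  for (size_t i = 0; i < n; ++i)
    if (strncmp (names[i], arg, len) == 0)
      {
        if (names[i][len] == '\0')
          return (int) i;       /* Exact match.  */
        res = res == -1 ? (int) i : -2;
      }
  return res;
}
```

With the names "conflicts-sr", "conflicts-rr", "counterexamples" and
"cex", the argument "counterexample" selects "counterexamples", while
"conflicts" is rejected as ambiguous.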
While defining api.header.include worked as expected, its default
value was incorrectly defined. As a result, by default, the generated
parsers still duplicated the content of the generated header instead
of including it.
* data/skeletons/yacc.c (api.header.include): Fix its default value.
* tests/output.at: Check it.
* doc/bison.texi (%define Summary): Document api.header.include.
While at it, move the definition of api.namespace at the proper
place.
Show the counterexamples and the derivations in color,
to highlight their structure. Align the outputs, and add i18n
support. Reduce width by using a one-space separator instead of
two-space.
From
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
to
Example A • B C
First derivation s ::=[ a ::=[ A • ] x ::=[ B C ] ]
Example A • B C
Second derivation s ::=[ y ::=[ A • B ] c ::=[ C ] ]
with colors.
* data/bison-default.css (cex-dot, cex-0, cex-1, cex-2, cex-3, cex-4)
(cex-5, cex-6, cex-7, cex-step, cex-leaf): New.
* src/derivation.c (derivation_print_styled_impl): New.
(derivation_print, derivation_print_leaves): Use it.
* src/counterexample.c: Reformat the output.
* tests/counterexample.at: Adjust.
It's unfortunate that the traditions between formal language theory
and Yacc differ, but here, tokens should be upper case, and
nonterminals should be lower case.
* tests/counterexample.at: Comply with this.
In Bison we refer to "shift/reduce" conflicts, not "shift-reduce" (in
Bison 3.6.3 186 occurrences vs 15). Enforce consistency on this.
Instead of "spending" a second line for each conflict to report the
lookaheads, put that on the same line as the type of conflict. Also,
prefer "token" to "symbol". Maybe we should even prefer "lookahead".
While at it, enable internationalization, with plurals where
appropriate.
As a consequence, instead of
Shift-Reduce Conflict:
6: 3 b: . %empty
6: 6 d: c . A
On Symbol: A
display
Shift/reduce conflict on token A:
6: 3 b: . %empty
6: 6 d: c . A
* NEWS, doc/bison.texi, src/conflicts.c: Spell it "shift/reduce", not
"shift-reduce".
* src/counterexample.c (counterexample_report_shift_reduce)
(counterexample_report_reduce_reduce): Reformat and internationalize
output.
* tests/counterexample.at: Adjust expectations.
Teaches bison about a new command line option, --file-prefix-map OLD=NEW
(based on the -ffile-prefix-map option from GCC) which causes it to
replace any file path prefix OLD in the text of the output file with NEW,
mainly for header guards and comments. The primary use of this is to
make builds reproducible with different input paths, and in particular
the debugging information produced when the source code is compiled. For
example, a distro may know that the bison source code will be located at
"/usr/src/bison" and thus can generate bison files that are reproducible
with the following command:
bison --output=/build/bison/parse.c -d --file-prefix-map=/build/bison/=/usr/src/bison/ parse.y
Importantly, this will change the header guards and #line directives
from:
#ifndef YY_BUILD_BISON_PARSE_H
#line 100 "/build/bison/parse.h"
to
#ifndef YY_USR_SRC_BISON_PARSE_H
#line 100 "/usr/src/bison/parse.h"
which is reproducible.
See https://lists.gnu.org/r/bison-patches/2020-05/msg00016.html
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
* src/files.h, src/files.c (spec_mapped_header_file)
(mapped_dir_prefix, map_file_name, add_prefix_map): New.
* src/getargs.c (-M, --file-prefix-map): New option.
* src/output.c (prepare): Define b4_mapped_dir_prefix and
b4_spec_header_file.
* src/scan-skel.l (@ofile@): Output the mapped file name.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/location.cc,
* data/skeletons/yacc.c:
Adjust.
* doc/bison.texi: Document.
* tests/input.at, tests/output.at: Check.
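The effect of an OLD=NEW mapping on a file name can be sketched as
follows.  This is a minimal illustration, not the code of
src/files.c: map_prefix is a hypothetical helper that handles a single
mapping, and it assumes a plain string prefix comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if NAME starts with OLD_PREFIX, return a newly
   allocated copy of NAME where that prefix is replaced with
   NEW_PREFIX; otherwise return a plain copy of NAME.  Return NULL on
   allocation failure.  The caller frees the result.  */
static char *
map_prefix (const char *name, const char *old_prefix,
            const char *new_prefix)
{
  assert (name && old_prefix && new_prefix);
  size_t old_len = strlen (old_prefix);
  if (strncmp (name, old_prefix, old_len) != 0)
    {
      /* No match: keep NAME as is.  */
      old_len = 0;
      new_prefix = "";
    }
  size_t new_len = strlen (new_prefix);
  size_t rest_len = strlen (name + old_len);
  char *res = malloc (new_len + rest_len + 1);
  if (!res)
    return NULL;
  memcpy (res, new_prefix, new_len);
  /* Copy the rest of NAME, including its final NUL.  */
  memcpy (res + new_len, name + old_len, rest_len + 1);
  return res;
}
```

With the mapping of the example above, "/build/bison/parse.h" becomes
"/usr/src/bison/parse.h", and a name outside of /build/bison/ is left
unchanged.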
Instead of `On Symbols: {b,c,}`, display `On Symbols: b, c`.
* src/counterexample.c (counterexample_report_reduce_reduce): We don't
need braces.
Use commas as a separator, not a terminator.
* tests/counterexample.at: Adjust.
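Using the comma as a separator rather than as a terminator can be
sketched as follows.  This is an illustration only, not the code of
src/counterexample.c: join_list is a hypothetical helper that writes
into a buffer instead of printing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: store in DST (of size SIZE) the N strings of
   ITEMS separated by ", ".  The separator is emitted before every
   item but the first, so there is no trailing comma.  Return 0 on
   success, -1 if DST is too small.  */
static int
join_list (char *dst, size_t size, const char *const items[], size_t n)
{
  assert (dst && size > 0 && items);
  size_t len = 0;
  const char *sep = "";
  dst[0] = '\0';
  for (size_t i = 0; i < n; ++i)
    {
      int k = snprintf (dst + len, size - len, "%s%s", sep, items[i]);
      if (k < 0 || (size_t) k >= size - len)
        return -1;
      len += (size_t) k;
      sep = ", ";
    }
  return 0;
}
```

Applied to the symbols b and c, this gives "b, c".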
This should have been done in 3.6, but I wanted to avoid introducing
conflicts into Vincent's work on counterexamples. It turns out it's
completely orthogonal.
* data/README.md, data/skeletons/bison.m4, data/skeletons/c++.m4,
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/java.m4,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/variant.hh, data/skeletons/yacc.c, src/conflicts.c,
* src/derives.c, src/gram.c, src/gram.h, src/output.c,
* src/parse-gram.c, src/parse-gram.y, src/print-xml.c, src/print.c,
* src/reader.c, src/symtab.c, src/symtab.h, tests/input.at,
* tests/types.at:
s/user_token_number/code/g.
Plus minor changes.
The CI has "failures" such as (253, "Null nonterminals"):
@@ -21,7 +21,7 @@
3: 3 b: . %empty
3: 4 c: . %empty
On Symbols: {A,}
-time limit exceeded: 6.000000
+time limit exceeded: 11.000000
First Example c • c A A $end
First derivation $accept ::=[ a ::=[ c d ::=[ a ::=[ b ::=[ • ] d ::=[ c A A ] ] ] ] $end ]
Second Example c • A $end
* tests/counterexample.at (AT_BISON_CHECK_CEX): New.
Use it to neutralize differences in timeout values.