Currently the core of the initial state is limited to the single rule
on $accept.
* src/lr0.c (generate_states): There may now be several rules on
$accept.
* src/graphviz.c (conclude_red): Recognize "final" transitions by the
fact that we reduce to "$accept".
* src/print.c (print_reduction): Likewise.
* src/print-xml.c (print_reduction): Likewise.
From
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First derivation
a
`-> A b .
Second derivation
a
`-> A b
`-> b .
to
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First reduce derivation
a
`-> A b .
Second reduce derivation
a
`-> A b
`-> b .
* src/counterexample.c (print_counterexample): here.
Compute the width of the labels to properly align the values.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
Now that the derivation is no longer printed on one line, aligning the
example and the derivation is no longer useful. It can actually be
harmful, as it makes the overall structure less clear.
* src/derivation.h, src/derivation.c (derivation_print_leaves): Remove
the `prefix` argument.
* src/counterexample.c (print_counterexample): Put the example next to
its label.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
Now that we use complain, the "sections" are clearer.
* src/counterexample.c (print_counterexample): Use the empty line only
in reports.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at: Adjust.
This is more consistent, and brings benefits: users know that these
diagnostics are attached to -Wcounterexamples, and they can also click
on the hyperlink if permitted by their terminal.
We go from
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Reduce/reduce conflict on token $end:
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
to
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
with a hyperlink on -Wcounterexamples.
* src/counterexample.c (counterexample_report_reduce_reduce):
Use complain.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at:
Adjust.
Sometimes, understanding the derivations is difficult, because they
are serialized to fit in one line. For instance, the example taken
from the NEWS file:
%token ID
%%
s: a ID
a: expr
expr: expr ID ',' | "expr"
gave
First example expr • ID ',' ID $end
Shift derivation $accept → [ s → [ a → [ expr → [ expr • ID ',' ] ] ID ] $end ]
Second example expr • ID $end
Reduce derivation $accept → [ s → [ a → [ expr • ] ID ] $end ]
Printed as trees, this gives:
First example expr • ID ',' ID $end
Shift derivation
$accept
↳ s $end
↳ a ID
↳ expr
↳ expr • ID ','
Second example expr • ID $end
Reduce derivation
$accept
↳ s $end
↳ a ID
↳ expr •
* src/glyphs.h, src/glyphs.c (down_arrow, empty, derivation_separator):
New.
* src/derivation.c (derivation_print, derivation_print_impl): Rename
as...
(derivation_print_flat, derivation_print_flat_impl): These.
(fputs_if, derivation_depth, derivation_width, derivation_print_tree)
(derivation_print_tree_impl, derivation_print): New.
* src/counterexample.c (print_counterexample): Adjust.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
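For illustration only, the one-line ("flat") form shown above can be
sketched as a recursive walk. This is not the code from
src/derivation.c: the `node` type and the names `append` and
`print_flat` are hypothetical, the dot is treated as an ordinary leaf,
and "->" stands in for the arrow glyph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node type: a symbol and its children.  */
typedef struct node
{
  const char *sym;
  const struct node *const *kids;   /* NULL-terminated, or NULL for a leaf.  */
} node;

/* Append S to the NUL-terminated string in BUF (capacity CAP),
   truncating silently if there is not enough room.  */
static void
append (char *buf, size_t cap, const char *s)
{
  size_t n = strlen (buf);
  if (n < cap)
    snprintf (buf + n, cap - n, "%s", s);
}

/* Append the one-line form of D to BUF: a leaf is just its symbol,
   an inner node is "sym -> [ child child ... ]".  */
static void
print_flat (char *buf, size_t cap, const node *d)
{
  assert (buf != NULL && d != NULL);
  append (buf, cap, d->sym);
  if (!d->kids)
    return;
  append (buf, cap, " -> [");
  for (const node *const *k = d->kids; *k; ++k)
    {
      append (buf, cap, " ");
      print_flat (buf, cap, *k);
    }
  append (buf, cap, " ]");
}
```

Nesting shows up only as brackets in this form, which is why deep
derivations are hard to read on one line.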
When reporting counterexamples for s/r conflicts, put the shift first.
This is more natural, and displays the default resolution first, which
is also what happens for r/r conflicts where the smallest rule number
is displayed first, and "wins".
* src/counterexample.c (counterexample): Add a shift_reduce member.
(new_counterexample): Adjust.
Swap the derivations when this is a s/r conflict.
(print_counterexample): For s/r conflicts, prefer "Shift derivation"
and "Reduce derivation" rather than "First/Second derivation".
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* NEWS, doc/bison.texi: Ditto.
It does not make a lot of sense to use ::= in our counterexamples,
that's not something that belongs to the Bison "vocabulary". Using
the colon makes sense, but it's too discreet. Let's use the arrow,
which we already use in some reports (HTML and Dot).
* src/gram.h (print_dot_fallback): Generalize into...
(print_fallback): this.
(print_arrow): New.
* src/derivation.c: Use it.
* NEWS, tests/conflicts.at, tests/counterexample.at,
* tests/diagnostics.at, tests/report.at: Adjust.
* doc/bison.texi: Ditto.
Unfortunately the literal `→` is output as `↦`. So we need to use
@arrow.
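A minimal sketch of the fallback idea, assuming the choice depends only
on whether the output codeset is UTF-8. The helper name
`glyph_or_fallback` is hypothetical, and the "->" fallback is an
assumption, not taken from src/gram.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return GLYPH when CODESET names UTF-8,
   otherwise the ASCII FALLBACK.  */
static const char *
glyph_or_fallback (const char *codeset, const char *glyph,
                   const char *fallback)
{
  assert (codeset != NULL && glyph != NULL && fallback != NULL);
  return strcmp (codeset, "UTF-8") == 0 ? glyph : fallback;
}
```

On POSIX systems the codeset could be obtained from
nl_langinfo (CODESET) after calling setlocale (LC_ALL, "").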
Currently when we output useless rules, they appear before the
grammar, but using the same invocation. As a result, the anchor is
defined twice, and the wrong one, being first, is honored.
* data/xslt/xml2xhtml.xsl (rule): Take a new 'anchor' parameter to
decide whether the rule is an anchor or a target.
Let it be true when outputting the grammar.
* tests/report.at: Adjust.
The text and Dot reports are expected to be identical when generated
directly (--report, --graph) or indirectly (via XML). The xml
testsuite had not been run for ages, let it catch up a bit.
* src/print-xml.c: Pass the type of the symbols.
* data/xslt/xml2text.xsl:
Catch up with the new layout.
Display the symbol types.
Use '•', not '.'
* tests/local.at: Smash '•' to '.' when matching against the direct
text report.
* tests/report.at: Adjust XML expectations.
AFAICT, "dotted rule" is a more frequent synonym of "item" than
"pointed rule". So let's migrate to using "dot" only.
* doc/bison.texi: Use dot/'•' rather than point/'.'.
* src/print-xml.c (print_core): Use dot rather than point. This is
not backward compatible, but AFAICT, we don't have actual users of the
XML output (other than ourselves). So...
* data/xslt/xml2dot.xsl, data/xslt/xml2text.xsl,
* data/xslt/xml2xhtml.xsl, tests/report.at: ... adjust.
It makes no sense, and is actually confusing, to display the same
example twice with no visible difference.
* src/complain.h, src/complain.c (is_styled): New.
* src/counterexample.c (print_counterexample): Display the unified
example a second time only if it makes a difference.
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* tests/diagnostics.at: Make sure we do display the unifying examples
twice when colors are enabled. And check those colors.
Use of print_unicode_char suggested by Bruno Haible.
https://lists.gnu.org/r/bug-gettext/2020-06/msg00012.html
* src/gram.h (print_dot_fallback, print_dot): New.
* src/gram.c, src/derivation.c: Use it.
* tests/counterexample.at, tests/report.at: Adjust the test suite.
* .travis.yml, README-hacking.md: Adjust.
From
"number"          shift, and go to state 1
"Ñùṃéℝô"  shift, and go to state 2
to
"number"  shift, and go to state 1
"Ñùṃéℝô"  shift, and go to state 2
* src/print.c: Use mbswidth, not strlen, to compute visual columns.
* tests/report.at: Adjust.
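To see why strlen misaligns such columns, here is a simplified sketch.
It is not gnulib's mbswidth: `utf8_columns` is a hypothetical helper
that assumes valid UTF-8 input and one column per code point, which
does not hold for wide or combining characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not gnulib's mbswidth: count the code points
   of a valid UTF-8 string, assuming each one occupies one column.  */
static size_t
utf8_columns (const char *s)
{
  assert (s != NULL);
  size_t cols = 0;
  for (; *s; ++s)
    /* Continuation bytes look like 10xxxxxx: they start no new
       character, so skip them.  */
    if (((unsigned char) *s & 0xC0) != 0x80)
      ++cols;
  return cols;
}
```

For the alias "Ñùṃéℝô" with its quotes, strlen reports 16 bytes while
the sketch reports 8 columns, the same as for "number".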
Currently we use "quotearg" to escape the strings output in Dot. As a
result, if the user's locale is C for instance, all the non-ASCII
characters are escaped. Unfortunately graphviz does not interpret this
style of escaping.
For instance:
5 -> 2 [style=solid label="\"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\""]
was displayed as a sequence of numbers. We now output:
5 -> 2 [style=solid label="\"Ñùṃéℝô\""]
independently of the user's locale.
* src/system.h (obstack_backslash): New.
* src/graphviz.h, src/graphviz.c (escape): Remove, use
obstack_backslash instead.
* src/print-graph.c: Likewise.
* tests/report.at: Adjust.
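A locale-independent escaping of this kind can be sketched as follows.
This is an assumption about the idea, not the obstack_backslash macro
itself: `backslash_escape` is a hypothetical function working on a
plain buffer rather than an obstack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the idea: copy SRC into DST (capacity
   CAP), putting a backslash before '"' and '\\' and leaving every
   other byte, including UTF-8 sequences, untouched.  Return the
   length written, or (size_t) -1 if DST is too small.  */
static size_t
backslash_escape (char *dst, size_t cap, const char *src)
{
  assert (dst != NULL && src != NULL);
  if (cap == 0)
    return (size_t) -1;
  size_t n = 0;
  for (; *src; ++src)
    {
      int special = *src == '"' || *src == '\\';
      /* Keep one byte for the terminating NUL.  */
      if (n + (special ? 2 : 1) >= cap)
        return (size_t) -1;
      if (special)
        dst[n++] = '\\';
      dst[n++] = *src;
    }
  dst[n] = '\0';
  return n;
}
```

Since no byte is ever turned into an octal escape, the label reaches
graphviz with the user's spelling whatever the locale.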
Currently our scanner decodes all the escapes in the strings, and we
later reescape the strings when we emit them.
This is troublesome, as we do not respect the user input. For
instance, when the user writes in UTF-8, we destroy her string when we
write it back. And this shows everywhere: in the reports we show the
escaped string instead of the actual alias:
0 $accept: . exp $end
1 exp: . exp "\342\212\225" exp
2 | . exp "+" exp
3 | . exp "+" exp
4 | . "number"
5 | . "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"
"number" shift, and go to state 1
"\303\221\303\271\341\271\203\303\251\342\204\235\303\264" shift, and go to state 2
This commit preserves the user's exact spelling of the string aliases,
instead of interpreting the escapes and then reescaping. The report
now shows:
0 $accept: . exp $end
1 exp: . exp "⊕" exp
2 | . exp "+" exp
3 | . exp "+" exp
4 | . "number"
5 | . "Ñùṃéℝô"
"number" shift, and go to state 1
"Ñùṃéℝô" shift, and go to state 2
Likewise, the XML (and therefore HTML) outputs are fixed.
* src/scan-gram.l (STRING, TSTRING): Do not interpret the escapes in
the resulting string.
* src/parse-gram.y (unquote, parser_init, parser_free, unquote_free)
(handle_defines, handle_language, obstack_for_unquote): New.
Use them to unquote where needed.
* tests/regression.at, tests/report.at: Update.
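Decoding the escapes only where the value is needed might look like the
sketch below. It is not the unquote from src/parse-gram.y:
`unquote_sketch` is a hypothetical name, and it covers only a small
subset of the escapes (\\, \", \n, \t and octal).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: decode the double-quoted spelling SRC into DST
   (capacity CAP).  Return the decoded length, or (size_t) -1 on
   malformed input, unsupported escape, or lack of room.  */
static size_t
unquote_sketch (char *dst, size_t cap, const char *src)
{
  assert (dst != NULL && src != NULL);
  size_t len = strlen (src);
  if (cap == 0 || len < 2 || src[0] != '"' || src[len - 1] != '"')
    return (size_t) -1;
  size_t n = 0;
  for (size_t i = 1; i < len - 1; ++i)
    {
      char c = src[i];
      if (c == '\\')
        {
          if (++i >= len - 1)
            return (size_t) -1;   /* The closing quote is escaped.  */
          c = src[i];
          if ('0' <= c && c <= '7')
            {
              /* Up to three octal digits.  */
              unsigned v = 0;
              size_t digits = 0;
              while (digits < 3 && i < len - 1
                     && '0' <= src[i] && src[i] <= '7')
                {
                  v = v * 8 + (unsigned) (src[i] - '0');
                  ++i;
                  ++digits;
                }
              --i;   /* The for loop steps past the last digit.  */
              if (v == 0 || 255 < v)
                return (size_t) -1;   /* NUL and overflow are rejected.  */
              c = (char) v;
            }
          else if (c == 'n')
            c = '\n';
          else if (c == 't')
            c = '\t';
          else if (c != '\\' && c != '"')
            return (size_t) -1;   /* Escape not covered by this sketch.  */
        }
      if (n + 1 >= cap)
        return (size_t) -1;
      dst[n++] = c;
    }
  dst[n] = '\0';
  return n;
}
```

The spelling kept for reports stays untouched. Only callers that need
the decoded bytes, such as the handling of a directive's value, would
go through such a function.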
This is to record the current state of the report, which escapes the
UTF-8 characters (as parse.error="verbose" does), but shouldn't (as
parse.error="detailed" does).
* tests/report.at: here.
Be robust to newer versions of Autoconf where the package URL defaults
to https instead of http.
* configure.ac (AC_INIT): Use https.
* tests/report.at: Adjust expected output s/http/https/
to match updated URL.
The format is inconsistent. For instance most sections are
indented (including "Terminals unused in grammar" for instance), but
the sections "Terminals, with rules where they appear" and
"Nonterminals, with rules where they appear" are not. Let's indent
them. Also, these two sections try to wrap the output to avoid overly
long lines. Yet we don't do that in the rest of the file, for instance
when listing the lookaheads of an item.
For instance in the case of Bison's parse-gram.output we go from:
Terminals, with rules where they appear
"end of file" (0) 0
error (256) 28 88
"string" <char*> (258) 9 13 16 17 20 23 24 109 116
[...]
Nonterminals, with rules where they appear
$accept (58)
on left: 0
input (59)
on left: 1, on right: 0
prologue_declarations (60)
on left: 2 3, on right: 1 3
prologue_declaration (61)
on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24
25 26 27 28 29, on right: 3
[...]
to
Terminals, with rules where they appear
"end of file" (0) 0
error (256) 28 88
"string" <char*> (258) 9 13 16 17 20 23 24 109 116
[...]
Nonterminals, with rules where they appear
$accept (58)
on left: 0
input (59)
on left: 1
on right: 0
prologue_declarations (60)
on left: 2 3
on right: 1 3
prologue_declaration (61)
on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25 26 27 28 29
on right: 3
[...]
* src/print.c (END_TEST): Remove.
(print_terminal_symbols): Don't try to wrap the output.
(print_nonterminal_symbols): Likewise.
Make two different lines for occurrences on the left-hand side and
occurrences on the right-hand side of the rules.
Indent by 4 and 8, not 3.
* src/reduce.c (reduce_output): Indent by 4, not 3.
* tests/conflicts.at, tests/existing.at, tests/reduce.at,
* tests/regression.at, tests/report.at:
Adjust.
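The new nonterminal layout (name indented by 4, unwrapped "on left" and
"on right" lines indented by 8) can be sketched as below. The function
names and the buffer-based interface are hypothetical and do not come
from src/print.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to the NUL-terminated string in BUF
   (capacity CAP), truncating silently if room runs out.  */
static void
appendf (char *buf, size_t cap, const char *fmt, ...)
{
  size_t n = strlen (buf);
  if (cap <= n)
    return;
  va_list ap;
  va_start (ap, fmt);
  vsnprintf (buf + n, cap - n, fmt, ap);
  va_end (ap);
}

/* Append one unwrapped line "        LABEL: r1 r2 ...", or nothing
   when there are no rules.  */
static void
append_rules (char *buf, size_t cap, const char *label,
              const int *rules, size_t n)
{
  if (n == 0)
    return;
  appendf (buf, cap, "        %s:", label);
  for (size_t i = 0; i < n; ++i)
    appendf (buf, cap, " %d", rules[i]);
  appendf (buf, cap, "\n");
}

/* Append the entry of one nonterminal: its name and number, then the
   rules where it is on the left and those where it is on the right.  */
static void
format_nonterminal (char *buf, size_t cap, const char *name, int num,
                    const int *left, size_t nleft,
                    const int *right, size_t nright)
{
  assert (buf != NULL && name != NULL);
  appendf (buf, cap, "    %s (%d)\n", name, num);
  append_rules (buf, cap, "on left", left, nleft);
  append_rules (buf, cap, "on right", right, nright);
}
```

With one line per kind of occurrence, a long rule list is never
wrapped, so each line can be matched on its own.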
* tests/input.at (_AT_UNUSED_VALUES_DECLARATIONS): Check
typed mid-rule actions.
* tests/report.at (Reports): Check that types of typed mid-rule
actions are reported.
* tests/actions.at (Typed mid-rule actions): Check that
the values of typed mid-rule actions are correct.