cex: give more details about -Wcex and -rcex

* data/bison-default.css: Cobalt does not seem to be supported.
* doc/bison.texi (Counterexamples): A new section.
(Understanding): Show the counterexamples as it shows in the report:
with its items.
(Bison Options): Document -Wcex and -rcex.
This commit is contained in:
Akim Demaille
2020-07-02 08:19:58 +02:00
parent 526ef45f30
commit 156e548341
3 changed files with 235 additions and 78 deletions

4
TODO
View File

@@ -32,10 +32,6 @@ could try to use the same color for the same rule.
Show the counterexamples. This is going to be really hard and/or painful. Show the counterexamples. This is going to be really hard and/or painful.
Unless we play it dumb (little structure). Unless we play it dumb (little structure).
*** Doc
-Wcounterexamples, --report=counterexamples. Don't forget to include
examples about conflicts in the reports.
** Bistromathic ** Bistromathic
- How about not evaluating incomplete lines when the text is not finished - How about not evaluating incomplete lines when the text is not finished
(as shells do). (as shells do).

View File

@@ -44,7 +44,7 @@
.cex-0 { color: yellow; } .cex-0 { color: yellow; }
.cex-1 { color: green; } .cex-1 { color: green; }
.cex-2 { color: blue; } .cex-2 { color: blue; }
.cex-3 { color: cobalt; } .cex-3 { color: purple; }
.cex-4 { color: violet; } .cex-4 { color: violet; }
.cex-5 { color: orange; } .cex-5 { color: orange; }
.cex-6 { color: brown; } .cex-6 { color: brown; }

View File

@@ -53,11 +53,11 @@
\gdef\colorBlue{% \gdef\colorBlue{%
\setcolor{\rgbBlue}% \setcolor{\rgbBlue}%
} }
\gdef\rgbPurple{0.50 0 0.50}
\gdef\rgbWarning{0.50 0 0.50} \gdef\colorPurple{%
\gdef\colorWarning{% \setcolor{\rgbPurple}%
\setcolor{\rgbWarning}%
} }
\gdef\rgbError{0.80 0 0} \gdef\rgbError{0.80 0 0}
\gdef\colorError{% \gdef\colorError{%
\setcolor{\rgbError}% \setcolor{\rgbError}%
@@ -84,10 +84,10 @@
@macro colorBlue @macro colorBlue
@inlineraw{html, <b style="color:blue">} @inlineraw{html, <b style="color:blue">}
@end macro @end macro
@macro colorPurple
@macro colorWarning
@inlineraw{html, <b style="color:darkviolet">} @inlineraw{html, <b style="color:darkviolet">}
@end macro @end macro
@macro colorError @macro colorError
@inlineraw{html, <b style="color:red">} @inlineraw{html, <b style="color:red">}
@end macro @end macro
@@ -115,8 +115,12 @@
@colorBlue{}\text\@colorOff{} @colorBlue{}\text\@colorOff{}
@end macro @end macro
@macro purple{text}
@colorPurple{}\text\@colorOff{}
@end macro
@macro dwarning{text} @macro dwarning{text}
@colorWarning{}\text\@colorOff{} @purple{\text\}
@end macro @end macro
@macro derror{text} @macro derror{text}
@@ -445,6 +449,7 @@ Handling Context Dependencies
Debugging Your Parser Debugging Your Parser
* Counterexamples:: Understanding conflicts.
* Understanding:: Understanding the structure of your parser. * Understanding:: Understanding the structure of your parser.
* Graphviz:: Getting a visual representation of the parser. * Graphviz:: Getting a visual representation of the parser.
* Xml:: Getting a markup representation of the parser. * Xml:: Getting a markup representation of the parser.
@@ -9785,12 +9790,18 @@ clear the flag.
@chapter Debugging Your Parser @chapter Debugging Your Parser
Developing a parser can be a challenge, especially if you don't understand Developing a parser can be a challenge, especially if you don't understand
the algorithm (@pxref{Algorithm}). This the algorithm (@pxref{Algorithm}). This chapter explains how to understand
chapter explains how to understand and debug a parser. and debug a parser.
The first sections focus on the static part of the parser: its structure. The most frequent issue users face is solving their conflicts. To fix them,
They explain how to generate and read the detailed description of the the first step is understanding how they arise in a given grammar. This is
automaton. There are several formats available: made much easier by automated generation of counterexamples, cover in the
first section (@pxref{Counterexamples}).
In most cases though, looking at the structure of the automaton is still
needed. The following sections explain how to generate and read the
detailed structural description of the automaton. There are several formats
available:
@itemize @minus @itemize @minus
@item @item
as text, see @ref{Understanding}; as text, see @ref{Understanding};
@@ -9807,19 +9818,175 @@ The last section focuses on the dynamic part of the parser: how to enable
and understand the parser run-time traces (@pxref{Tracing}). and understand the parser run-time traces (@pxref{Tracing}).
@menu @menu
* Counterexamples:: Understanding conflicts.
* Understanding:: Understanding the structure of your parser. * Understanding:: Understanding the structure of your parser.
* Graphviz:: Getting a visual representation of the parser. * Graphviz:: Getting a visual representation of the parser.
* Xml:: Getting a markup representation of the parser. * Xml:: Getting a markup representation of the parser.
* Tracing:: Tracing the execution of your parser. * Tracing:: Tracing the execution of your parser.
@end menu @end menu
@node Counterexamples
@section Generation of Counterexamples
Solving conflicts is probably the most delicate part of the design of an LR
parser, as demonstrated by the number of sections devoted to them in this
very documentation. To solve a conflict, one must understand it: when does
it occur? Is it because of a flaw in the grammar? Is it rather because
LR(1) cannot cope with this grammar?
On difficulty is that conflicts occur in the @emph{automaton}, and it can be
tricky to related them to issues in the @emph{grammar} itself. With
experience and patience, analysis the detailed description of the automaton
(@pxref{Understanding}) allows to find example strings that reach these conflicts.
That task is made much easier thanks to the generation of counterexamples,
initially developed by Chinawat Isradisaikul and Andrew Myers
@pcite{Isradisaikul 2015}.
As a first example, see the example grammar of @ref{Shift/Reduce}, which
features on shift/reduce conflict:
@example
$ @kbd{bison if-then-else.y}
if-then-else.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
if-then-else.y: warning: rerun with option '-Wcounterexamples' to generate conflict counterexamples [-Wother]
@end example
@noindent
Let's rerun @command{bison} with the option
@option{-Wcex}/@option{-Wcounterexamples}@inlinefmt{info, (the following
output is actually in color)}:
@ifhtml
@example
Shift/reduce conflict on token "else":
@group
Example @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @yellow{"else" stmt}
First derivation @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{]} @green{]} @yellow{"else" stmt ]}
Example @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @blue{"else" stmt}
Second derivation @yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{"else" stmt ]} @green{]} @yellow{]}
@end group
@end example
@end ifhtml
@ifnothtml
@smallexample
Shift/reduce conflict on token "else":
@group
Example
@yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @yellow{"else" stmt}
First derivation
@yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{]} @green{]} @yellow{"else" stmt ]}
Example
@yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @blue{"else" stmt}
Second derivation
@yellow{if_stmt ::=[ "if" expr "then"} @green{stmt ::=[} @blue{if_stmt ::=[ "if" expr "then" stmt} @red{•} @blue{"else" stmt ]} @green{]} @yellow{]}
@end group
@end smallexample
@end ifnothtml
This shows two different derivations for one single expression. That
demonstrates that the grammar is ambiguous.
@sp 1
As a more delicate example, consider the example grammar of
@ref{Reduce/Reduce}, which features a reduce/reduce conflict:
@example
%%
sequence:
%empty
| maybeword
| sequence "word"
;
maybeword:
%empty
| "word"
;
@end example
Bison generates the following counterexamples:
@example
$ @kbd{bison -Wcex sequence.y}
sequence.y: @dwarning{warning}: 1 shift/reduce conflict [@dwarning{-Wconflicts-sr}]
sequence.y: @dwarning{warning}: 2 reduce/reduce conflicts [@dwarning{-Wconflicts-rr}]
Shift/reduce conflict on token "word":
Example @red{•} @yellow{"word"}
First derivation @yellow{sequence ::=[} @green{sequence ::=[} @red{•} @green{]} @yellow{"word" ]}
Example @red{•} @green{"word"}
Second derivation @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} @green{"word" ]} @yellow{]}
Reduce/reduce conflict on tokens $end, "word":
Example @red{•}
First derivation @yellow{sequence ::=[} @red{•} @yellow{]}
Example @red{•}
Second derivation @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} @green{]} @yellow{]}
Shift/reduce conflict on token "word":
Example @red{•} @yellow{"word"}
First derivation @yellow{sequence ::=[} @green{sequence ::=[} @blue{maybeword ::=[} @red{•} @blue{]} @green{]} @yellow{"word" ]}
Example @red{•} @green{"word"}
Second derivation @yellow{sequence ::=[} @green{maybeword ::=[} @red{•} @green{"word" ]} @yellow{]}
sequence.y:8.3-45: @dwarning{warning}: rule useless in parser due to conflicts [@dwarning{-Wother}]
8 | @dwarning{%empty @{ printf ("empty maybeword\n"); @}}
| @dwarning{^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
@end example
Each of these three conflicts, again, prove that the grammar is ambiguous.
For instance, the second conflict (the reduce/reduce one) shows that the
grammar accept the empty input in two different ways.
@sp 1
Sometimes, the search will not find an example that can be derived in two
ways. In these cases, counterexample generation will provide two examples
that are the same up until the dot. Most notably, this will happen when
your grammar requires a stronger parser (more lookahead, LR instead of
LALR). The following example isn't LR(1):
@example
%token ID
%%
s: a ID
a: expr
expr: %empty | expr ID ','
@end example
@command{bison} reports:
@smallexample
Shift/reduce conflict on token ID:
First example @blue{expr} @red{•} @green{ID} @yellow{$end}
First derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr} @red{•} @blue{]} @green{ID ]} @yellow{$end ]}
Second example @purple{expr} @red{•} @purple{ID ','} @green{ID} @yellow{$end}
Second derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[} @purple{expr ::=[ expr} @red{•} @purple{ID ',' ]} @blue{]} @green{ID ]} @yellow{$end ]}
@end smallexample
This conflict is caused by the parser not having enough information to know
the difference between these two examples. The parser would need an
additional lookahead token to know whether or not a comma follows the
@code{ID} after @code{expr}. These types of conflicts tend to be more
difficult to fix, and usually need a rework of the grammar. In this case,
it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
ID}.
Alternatively, you might also want to consider using a GLR parser
(@pxref{GLR Parsers}).
@sp 1
On occasions, it is useful to look at counterexamples @emph{in situ}: with
the automaton report (@xref{Understanding}, in particular @ref{state-8,,
State 8}).
@node Understanding @node Understanding
@section Understanding Your Parser @section Understanding Your Parser
As documented elsewhere (@pxref{Algorithm}) Bison parsers are Bison parsers are @dfn{shift/reduce automata} (@pxref{Algorithm}). In some
@dfn{shift/reduce automata}. In some cases (much more frequent than one cases (much more frequent than one would hope), looking at this automaton is
would hope), looking at this automaton is required to tune or simply fix a required to tune or simply fix a parser.
parser.
The textual file is generated when the options @option{--report} or The textual file is generated when the options @option{--report} or
@option{--verbose} are specified, see @ref{Invocation}. Its name is made by @option{--verbose} are specified, see @ref{Invocation}. Its name is made by
@@ -9877,62 +10044,6 @@ calc.y: @dwarning{warning}: 7 shift/reduce conflicts [@dwarning{-Wconflicts-sr}]
calc.y: @dwarning{warning}: rerun with option '-Wcounterexamples' to generate conflict counterexamples [@dwarning{-Wother}] calc.y: @dwarning{warning}: rerun with option '-Wcounterexamples' to generate conflict counterexamples [@dwarning{-Wother}]
@end smallexample @end smallexample
When given @option{-Wcounterexamples}, @command{bison} will run a search for
strings in your grammar that better demonstrate you
conflicts. Counterexample generation was initially developed by Chinawat
Isradisaikul and Andrew Myers @pcite{Isradisaikul 2015}. For
@file{calc.y}, the first printed example is:
@example
Shift/reduce conflict on token '/':
Example @green{exp '+' exp} @red{•} @yellow{'/' exp}
First derivation @yellow{exp ::=[} @green{exp ::=[ exp '+' exp} @red{•} @green{]} @yellow{'/' exp ]}
Example @yellow{exp '+'} @green{exp} @red{•} @green{'/' exp}
Second derivation @yellow{exp ::=[ exp '+'} @green{exp ::=[ exp} @red{•} @green{'/' exp ]} @yellow{]}
@end example
This shows two separate derivations in the grammar for the same @code{exp}:
@samp{e1 + e2 / e3}. The derivations show how your rules would parse the
given example. Here, the first derivation completes a reduction when seeing
@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
an @code{exp}. Therefore, it is easy to see that adding @code{%precedence}
directives would fix this conflict.
Sometimes, the search will not find an example that can be derived in two
ways. In these cases, counterexample generation will provide two examples
that are the same up until the dot. Most notably, this will happen when
your grammar requires a stronger parser (more lookahead, LR instead of
LALR). The following example isn't LR(1):
@example
%token ID
%%
s: a ID
a: expr
expr: %empty | expr ID ','
@end example
@command{bison} reports:
@smallexample
Shift/reduce conflict on token ID:
First example @blue{expr} @red{•} @green{ID} @yellow{$end}
First derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr} @red{•} @blue{]} @green{ID ]} @yellow{$end ]}
Second example @blue{expr} @red{•} @blue{ID ','} @green{ID} @yellow{$end}
Second derivation @yellow{$accept ::=[} @green{s ::=[} @blue{a ::=[ expr ::=[ expr} @red{•} @blue{ID ',' ] ]} @green{ID ]} @yellow{$end ]}
@end smallexample
This conflict is caused by the parser not having enough information to know
the difference between these two examples. The parser would need an
additional lookahead token to know whether or not a comma follows the
@code{ID} after @code{expr}. These types of conflicts tend to be more
difficult to fix, and usually need a rework of the grammar. In this case,
it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
ID}.
Counterexamples can also be written to a file with @option{--report=cex}.
Going back to the calc example, when given @option{--report=state}, Going back to the calc example, when given @option{--report=state},
in addition to @file{calc.tab.c}, it creates a file @file{calc.output} in addition to @file{calc.tab.c}, it creates a file @file{calc.output}
with contents detailed below. The order of the output and the exact with contents detailed below. The order of the output and the exact
@@ -10164,6 +10275,7 @@ State 7
exp go to state 11 exp go to state 11
@end example @end example
@anchor{state-8}
As was announced in beginning of the report, @samp{State 8 conflicts: As was announced in beginning of the report, @samp{State 8 conflicts:
1 shift/reduce}: 1 shift/reduce}:
@@ -10237,6 +10349,27 @@ Conflict between rule 1 and token '-' resolved as reduce (%left '-').
Conflict between rule 1 and token '*' resolved as shift ('+' < '*'). Conflict between rule 1 and token '*' resolved as shift ('+' < '*').
@end example @end example
When given @option{--report=counterexamples}, @command{bison} will generate
counterexamples within the report, augmented with the corresponding items
(@pxref{Counterexamples}).
@example
Shift/reduce conflict on token '/':
1 exp: exp '+' exp •
4 exp: exp • '/' exp
Example @green{exp '+' exp} @red{•} @yellow{'/' exp}
First derivation @yellow{exp ::=[} @green{exp ::=[ exp '+' exp} @red{•} @green{]} @yellow{'/' exp ]}
Example @yellow{exp '+'} @green{exp} @red{•} @green{'/' exp}
Second derivation @yellow{exp ::=[ exp '+'} @green{exp ::=[ exp} @red{•} @green{'/' exp ]} @yellow{]}
@end example
This shows two separate derivations in the grammar for the same @code{exp}:
@samp{e1 + e2 / e3}. The derivations show how your rules would parse the
given example. Here, the first derivation completes a reduction when seeing
@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
an @code{exp}. Therefore, it is easy to see that adding @code{%precedence}
directives would fix this conflict.
The remaining states are similar: The remaining states are similar:
@@ -11084,6 +11217,13 @@ unexpected number of conflicts is an error, and an expected number of
conflicts is not reported, so @option{-W} and @option{--warning} then have conflicts is not reported, so @option{-W} and @option{--warning} then have
no effect on the conflict report. no effect on the conflict report.
@item counterexamples
@itemx cex
Provide counterexamples for conflicts. @xref{Counterexamples}.
Counterexamples take time to compute. The option @option{-Wcex} should be
used by the developer when working on the grammar; it hardly makes sense to
use it in a CI.
@item dangling-alias @item dangling-alias
Report string literals that are not bound to a token symbol. Report string literals that are not bound to a token symbol.
@@ -11458,6 +11598,13 @@ each rule's lookahead set.
Implies @code{state}. Explain how conflicts were solved thanks to Implies @code{state}. Explain how conflicts were solved thanks to
precedence and associativity directives. precedence and associativity directives.
@item counterexamples
@itemx cex
Look for counterexamples for the conflicts. @xref{Counterexamples}.
Counterexamples take time to compute. The option @option{-rcex} should be
used by the developer when working on the grammar; it hardly makes sense to
use it in a CI.
@item all @item all
Enable all the items. Enable all the items.
@@ -15121,6 +15268,18 @@ Thus, if there is a rule which says that an integer can be used as an
expression, integers are allowed @emph{anywhere} an expression is expression, integers are allowed @emph{anywhere} an expression is
permitted. @xref{Language and Grammar}. permitted. @xref{Language and Grammar}.
@item Counterexample
A sequence of tokens and/or nonterminals, with one dot, that demonstrates a
conflict. The dot marks the place where the conflict occurs.
A @emph{unifying} counterexample is a single string that has two different
parses; its existence proves that the grammar is ambiguous. When a unifying
counterexample cannot be found in reasonable time, a @emph{nonunifying}
counterexample is built: @emph{two} different string sharing the prefix up
to the dot.
@xref{Counterexamples}
@item Default reduction @item Default reduction
The reduction that a parser should perform if the current parser state The reduction that a parser should perform if the current parser state
contains no other action for the lookahead token. In permitted parser contains no other action for the lookahead token. In permitted parser
@@ -15507,7 +15666,9 @@ London, Department of Computer Science, TR-00-12 (December 2000).
@c LocalWords: ResourceBundle myResources getString getName getToken ylwrap @c LocalWords: ResourceBundle myResources getString getName getToken ylwrap
@c LocalWords: getLocation getExpectedTokens reportSyntaxError bistromathic @c LocalWords: getLocation getExpectedTokens reportSyntaxError bistromathic
@c LocalWords: TokenKind Automake's rtti Wcounterexamples Chinawat PLDI @c LocalWords: TokenKind Automake's rtti Wcounterexamples Chinawat PLDI
@c LocalWords: Isradisaikul @c LocalWords: Isradisaikul tcite pcite rgbGreen colorGreen rgbYellow Wcex
@c LocalWords: colorYellow rgbRed colorRed rgbBlue colorBlue rgbPurple
@c LocalWords: colorPurple ifhtml ifnothtml situ rcex
@c Local Variables: @c Local Variables:
@c ispell-dictionary: "american" @c ispell-dictionary: "american"