mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-09 04:13:03 +00:00
doc: create a new Tuning LR section in the manual.
And clean up all other documentation of the features described
there.
* NEWS (2.5): Tweak wording of lr.type and parse.lac entries a
bit, update the cross-references to the manual, and point out that
LAC has caveats. Don't be so adamant that IELR+LAC=canonical LR.
That is, as the referenced section in the manual documents, LAC
does not fix infinite parsing loops on syntax errors.
* doc/bison.texinfo: Consistently drop the "(1)" suffix from LALR,
IELR, and LR in @cindex.
(%define Summary): Condense the entries for lr.default-reductions,
lr.keep-unreachable-states, lr.type, and parse.lac into brief
summaries, and cross-reference the appropriate subsections of
Tuning LR. For parse.lac, mention that it's only implemented for
deterministic parsers in C. In parse.error entry, mention LAC,
and add cross-reference to the LAC section.
(Error Reporting): When mentioning parse.error, mention LAC, and
add cross-reference to the LAC section.
(Tuning LR): New section with an extended version of the
documentation removed from %define Summary. Change all
cross-references in the manual to point here instead of there.
(Calc++ Parser): When mentioning parse.error, mention LAC, and add
cross-reference to the LAC section.
(Table of Symbols): In %error-verbose entry, add cross-reference
to Error Reporting.
(Glossary): Capitalize entry titles consistently. Add definitions
for "defaulted state" and "unreachable state". Expand IELR
acronym in IELR's entry.
(cherry picked from commit 6f04ee6c78)
Conflicts:
doc/bison.texinfo
This commit is contained in:
31
ChangeLog
31
ChangeLog
@@ -1,3 +1,34 @@
|
||||
2011-03-06 Joel E. Denny <joeldenny@joeldenny.org>
|
||||
|
||||
doc: create a new Tuning LR section in the manual.
|
||||
And clean up all other documentation of the features described
|
||||
there.
|
||||
* NEWS (2.5): Tweak wording of lr.type and parse.lac entries a
|
||||
bit, update the cross-references to the manual, and point out that
|
||||
LAC has caveats. Don't be so adamant that IELR+LAC=canonical LR.
|
||||
That is, as the referenced section in the manual documents, LAC
|
||||
does not fix infinite parsing loops on syntax errors.
|
||||
* doc/bison.texinfo: Consistently drop the "(1)" suffix from LALR,
|
||||
IELR, and LR in @cindex.
|
||||
(%define Summary): Condense the entries for lr.default-reductions,
|
||||
lr.keep-unreachable-states, lr.type, and parse.lac into brief
|
||||
summaries, and cross-reference the appropriate subsections of
|
||||
Tuning LR. For parse.lac, mention that it's only implemented for
|
||||
deterministic parsers in C. In parse.error entry, mention LAC,
|
||||
and add cross-reference to the LAC section.
|
||||
(Error Reporting): When mentioning parse.error, mention LAC, and
|
||||
add cross-reference to the LAC section.
|
||||
(Tuning LR): New section with an extended version of the
|
||||
documentation removed from %define Summary. Change all
|
||||
cross-references in the manual to point here instead of there.
|
||||
(Calc++ Parser): When mentioning parse.error, mention LAC, and add
|
||||
cross-reference to the LAC section.
|
||||
(Table of Symbols): In %error-verbose entry, add cross-reference
|
||||
to Error Reporting.
|
||||
(Glossary): Capitalize entry titles consistently. Add definitions
|
||||
for "defaulted state" and "unreachable state". Expand IELR
|
||||
acronym in IELR's entry.
|
||||
|
||||
2011-02-20 Joel E. Denny <joeldenny@joeldenny.org>
|
||||
|
||||
doc: add bibliography to manual.
|
||||
|
||||
44
NEWS
44
NEWS
@@ -116,27 +116,27 @@ Bison News
|
||||
%define lr.type ielr
|
||||
%define lr.type canonical-lr
|
||||
|
||||
The default reduction optimization in the parser tables can also be
|
||||
adjusted using `%define lr.default-reductions'. See the documentation
|
||||
for `%define lr.type' and `%define lr.default-reductions' in the
|
||||
section `Bison Declaration Summary' in the Bison manual for the
|
||||
details.
|
||||
The default-reduction optimization in the parser tables can also be
|
||||
adjusted using `%define lr.default-reductions'. For details on both
|
||||
of these features, see the new section `Tuning LR' in the Bison
|
||||
manual.
|
||||
|
||||
These features are experimental. More user feedback will help to
|
||||
stabilize them.
|
||||
|
||||
** LAC (lookahead correction) for syntax error handling:
|
||||
** LAC (Lookahead Correction) for syntax error handling:
|
||||
|
||||
Canonical LR, IELR, and LALR can suffer from a couple of problems
|
||||
upon encountering a syntax error. First, the parser might perform
|
||||
additional parser stack reductions before discovering the syntax
|
||||
error. Such reductions perform user semantic actions that are
|
||||
error. Such reductions can perform user semantic actions that are
|
||||
unexpected because they are based on an invalid token, and they
|
||||
cause error recovery to begin in a different syntactic context than
|
||||
the one in which the invalid token was encountered. Second, when
|
||||
verbose error messages are enabled (with %error-verbose or `#define
|
||||
YYERROR_VERBOSE'), the expected token list in the syntax error
|
||||
message can both contain invalid tokens and omit valid tokens.
|
||||
verbose error messages are enabled (with %error-verbose or the
|
||||
obsolete `#define YYERROR_VERBOSE'), the expected token list in the
|
||||
syntax error message can both contain invalid tokens and omit valid
|
||||
tokens.
|
||||
|
||||
The culprits for the above problems are %nonassoc, default
|
||||
reductions in inconsistent states, and parser state merging. Thus,
|
||||
@@ -144,11 +144,11 @@ Bison News
|
||||
%nonassoc is used or if default reductions are enabled for
|
||||
inconsistent states.
|
||||
|
||||
LAC is a new mechanism within the parsing algorithm that completely
|
||||
solves these problems for canonical LR, IELR, and LALR without
|
||||
sacrificing %nonassoc, default reductions, or state mering. When
|
||||
LAC is in use, canonical LR and IELR behave exactly the same for
|
||||
both syntactically acceptable and syntactically unacceptable input.
|
||||
LAC is a new mechanism within the parsing algorithm that solves
|
||||
these problems for canonical LR, IELR, and LALR without sacrificing
|
||||
%nonassoc, default reductions, or state merging. When LAC is in
|
||||
use, canonical LR and IELR behave almost exactly the same for both
|
||||
syntactically acceptable and syntactically unacceptable input.
|
||||
While LALR still does not support the full language-recognition
|
||||
power of canonical LR and IELR, LAC at least enables LALR's syntax
|
||||
error handling to correctly reflect LALR's language-recognition
|
||||
@@ -159,8 +159,8 @@ Bison News
|
||||
|
||||
%define parse.lac full
|
||||
|
||||
See the documentation for `%define parse.lac' in the section `Bison
|
||||
Declaration Summary' in the Bison manual for additional details.
|
||||
See the new section `LAC' in the Bison manual for additional
|
||||
details including a few caveats.
|
||||
|
||||
LAC is an experimental feature. More user feedback will help to
|
||||
stabilize it.
|
||||
@@ -314,11 +314,11 @@ Bison News
|
||||
|
||||
** Verbose syntax error message fixes:
|
||||
|
||||
When %error-verbose or `#define YYERROR_VERBOSE' is specified,
|
||||
syntax error messages produced by the generated parser include the
|
||||
unexpected token as well as a list of expected tokens. The effect
|
||||
of %nonassoc on these verbose messages has been corrected in two
|
||||
ways, but a complete fix requires LAC, described above:
|
||||
When %error-verbose or the obsolete `#define YYERROR_VERBOSE' is
|
||||
specified, syntax error messages produced by the generated parser
|
||||
include the unexpected token as well as a list of expected tokens.
|
||||
The effect of %nonassoc on these verbose messages has been corrected
|
||||
in two ways, but a more complete fix requires LAC, described above:
|
||||
|
||||
*** When %nonassoc is used, there can exist parser states that accept no
|
||||
tokens, and so the parser does not always require a lookahead token
|
||||
|
||||
@@ -266,6 +266,7 @@ The Bison Parser Algorithm
|
||||
* Parser States:: The parser is a finite-state-machine with stack.
|
||||
* Reduce/Reduce:: When two rules are applicable in the same situation.
|
||||
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
|
||||
* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
|
||||
* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
|
||||
* Memory Management:: What happens when memory is exhausted. How to avoid it.
|
||||
|
||||
@@ -277,6 +278,13 @@ Operator Precedence
|
||||
* Precedence Examples:: How these features are used in the previous example.
|
||||
* How Precedence:: How they work.
|
||||
|
||||
Tuning LR
|
||||
|
||||
* LR Table Construction:: Choose a different construction algorithm.
|
||||
* Default Reductions:: Disable default reductions.
|
||||
* LAC:: Correct lookahead sets in the parser states.
|
||||
* Unreachable States:: Keep unreachable parser states for debugging.
|
||||
|
||||
Handling Context Dependencies
|
||||
|
||||
* Semantic Tokens:: Token parsing can depend on the semantic context.
|
||||
@@ -473,21 +481,19 @@ order to specify the language Algol 60. Any grammar expressed in
|
||||
BNF is a context-free grammar. The input to Bison is
|
||||
essentially machine-readable BNF.
|
||||
|
||||
@cindex LALR(1) grammars
|
||||
@cindex IELR(1) grammars
|
||||
@cindex LR(1) grammars
|
||||
There are various important subclasses of context-free grammars.
|
||||
Although it can handle almost all context-free grammars, Bison is
|
||||
optimized for what are called LR(1) grammars.
|
||||
In brief, in these grammars, it must be possible to tell how to parse
|
||||
any portion of an input string with just a single token of lookahead.
|
||||
For historical reasons, Bison by default is limited by the additional
|
||||
restrictions of LALR(1), which is hard to explain simply.
|
||||
@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for
|
||||
more information on this.
|
||||
As an experimental feature, you can escape these additional restrictions by
|
||||
requesting IELR(1) or canonical LR(1) parser tables.
|
||||
@xref{%define Summary,,lr.type}, to learn how.
|
||||
@cindex LALR grammars
|
||||
@cindex IELR grammars
|
||||
@cindex LR grammars
|
||||
There are various important subclasses of context-free grammars. Although
|
||||
it can handle almost all context-free grammars, Bison is optimized for what
|
||||
are called LR(1) grammars. In brief, in these grammars, it must be possible
|
||||
to tell how to parse any portion of an input string with just a single token
|
||||
of lookahead. For historical reasons, Bison by default is limited by the
|
||||
additional restrictions of LALR(1), which is hard to explain simply.
|
||||
@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for more
|
||||
information on this. As an experimental feature, you can escape these
|
||||
additional restrictions by requesting IELR(1) or canonical LR(1) parser
|
||||
tables. @xref{LR Table Construction}, to learn how.
|
||||
|
||||
@cindex GLR parsing
|
||||
@cindex generalized LR (GLR) parsing
|
||||
@@ -5352,65 +5358,17 @@ Boolean.
|
||||
@c ================================================== lr.default-reductions
|
||||
|
||||
@item lr.default-reductions
|
||||
@cindex default reductions
|
||||
@findex %define lr.default-reductions
|
||||
@cindex delayed syntax errors
|
||||
@cindex syntax errors delayed
|
||||
@cindex LAC
|
||||
@findex %nonassoc
|
||||
|
||||
@itemize @bullet
|
||||
@item Language(s): all
|
||||
|
||||
@item Purpose: Specify the kind of states that are permitted to
|
||||
contain default reductions.
|
||||
That is, in such a state, Bison selects the reduction with the largest
|
||||
lookahead set to be the default parser action and then removes that
|
||||
lookahead set.
|
||||
(The ability to specify where default reductions should be used is
|
||||
experimental.
|
||||
More user feedback will help to stabilize it.)
|
||||
|
||||
@item Accepted Values:
|
||||
@itemize
|
||||
@item @code{all}.
|
||||
This is the traditional Bison behavior. The main advantage is a
|
||||
significant decrease in the size of the parser tables. The
|
||||
disadvantage is that, when the generated parser encounters a
|
||||
syntactically unacceptable token, the parser might then perform
|
||||
unnecessary default reductions before it can detect the syntax error.
|
||||
Such delayed syntax error detection is usually inherent in LALR and
|
||||
IELR parser tables anyway due to LR state merging (@pxref{%define
|
||||
Summary,,lr.type}). Furthermore, the use of @code{%nonassoc} can
|
||||
contribute to delayed syntax error detection even in the case of
|
||||
canonical LR. As an experimental feature, delayed syntax error
|
||||
detection can be overcome in all cases by enabling LAC (@pxref{%define
|
||||
Summary,,parse.lac}, for details, including a discussion of the
|
||||
effects of delayed syntax error detection).
|
||||
|
||||
@item @code{consistent}.
|
||||
@cindex consistent states
|
||||
A consistent state is a state that has only one possible action.
|
||||
If that action is a reduction, then the parser does not need to request
|
||||
a lookahead token from the scanner before performing that action.
|
||||
However, the parser recognizes the ability to ignore the lookahead token
|
||||
in this way only when such a reduction is encoded as a default
|
||||
reduction.
|
||||
Thus, if default reductions are permitted only in consistent states,
|
||||
then a canonical LR parser that does not employ
|
||||
@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the
|
||||
syntactically unacceptable token from the scanner.
|
||||
|
||||
@item @code{accepting}.
|
||||
@cindex accepting state
|
||||
In the accepting state, the default reduction is actually the accept
|
||||
action.
|
||||
In this case, a canonical LR parser that does not employ
|
||||
@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the
|
||||
syntactically unacceptable token in the input.
|
||||
That is, it does not perform any extra reductions.
|
||||
@end itemize
|
||||
contain default reductions. @xref{Default Reductions}. (The ability to
|
||||
specify where default reductions should be used is experimental. More user
|
||||
feedback will help to stabilize it.)
|
||||
|
||||
@item Accepted Values: @code{all}, @code{consistent}, @code{accepting}
|
||||
@item Default Value:
|
||||
@itemize
|
||||
@item @code{accepting} if @code{lr.type} is @code{canonical-lr}.
|
||||
@@ -5425,42 +5383,10 @@ That is, it does not perform any extra reductions.
|
||||
|
||||
@itemize @bullet
|
||||
@item Language(s): all
|
||||
|
||||
@item Purpose: Request that Bison allow unreachable parser states to
|
||||
remain in the parser tables.
|
||||
Bison considers a state to be unreachable if there exists no sequence of
|
||||
transitions from the start state to that state.
|
||||
A state can become unreachable during conflict resolution if Bison disables a
|
||||
shift action leading to it from a predecessor state.
|
||||
Keeping unreachable states is sometimes useful for analysis purposes, but they
|
||||
are useless in the generated parser.
|
||||
|
||||
remain in the parser tables. @xref{Unreachable States}.
|
||||
@item Accepted Values: Boolean
|
||||
|
||||
@item Default Value: @code{false}
|
||||
|
||||
@item Caveats:
|
||||
|
||||
@itemize @bullet
|
||||
|
||||
@item Unreachable states may contain conflicts and may use rules not used in
|
||||
any other state.
|
||||
Thus, keeping unreachable states may induce warnings that are irrelevant to
|
||||
your parser's behavior, and it may eliminate warnings that are relevant.
|
||||
Of course, the change in warnings may actually be relevant to a parser table
|
||||
analysis that wants to keep unreachable states, so this behavior will likely
|
||||
remain in future Bison releases.
|
||||
|
||||
@item While Bison is able to remove unreachable states, it is not guaranteed to
|
||||
remove other kinds of useless states.
|
||||
Specifically, when Bison disables reduce actions during conflict resolution,
|
||||
some goto actions may become useless, and thus some additional states may
|
||||
become useless.
|
||||
If Bison were to compute which goto actions were useless and then disable those
|
||||
actions, it could identify such states as unreachable and then remove those
|
||||
states.
|
||||
However, Bison does not compute which goto actions are useless.
|
||||
@end itemize
|
||||
@end itemize
|
||||
@c lr.keep-unreachable-states
|
||||
|
||||
@@ -5468,87 +5394,15 @@ However, Bison does not compute which goto actions are useless.
|
||||
|
||||
@item lr.type
|
||||
@findex %define lr.type
|
||||
@cindex LALR
|
||||
@cindex IELR
|
||||
@cindex LR
|
||||
|
||||
@itemize @bullet
|
||||
@item Language(s): all
|
||||
|
||||
@item Purpose: Specify the type of parser tables within the
|
||||
LR(1) family.
|
||||
(This feature is experimental.
|
||||
LR(1) family. @xref{LR Table Construction}. (This feature is experimental.
|
||||
More user feedback will help to stabilize it.)
|
||||
|
||||
@item Accepted Values:
|
||||
@itemize
|
||||
@item @code{lalr}.
|
||||
While Bison generates LALR parser tables by default for
|
||||
historical reasons, IELR or canonical LR is almost
|
||||
always preferable for deterministic parsers.
|
||||
The trouble is that LALR parser tables can suffer from
|
||||
mysterious conflicts and thus may not accept the full set of sentences
|
||||
that IELR and canonical LR accept.
|
||||
@xref{Mystery Conflicts}, for details.
|
||||
However, there are at least two scenarios where LALR may be
|
||||
worthwhile:
|
||||
@itemize
|
||||
@cindex GLR with LALR
|
||||
@item When employing GLR parsers (@pxref{GLR Parsers}), if you
|
||||
do not resolve any conflicts statically (for example, with @code{%left}
|
||||
or @code{%prec}), then the parser explores all potential parses of any
|
||||
given input.
|
||||
In this case, the use of LALR parser tables is guaranteed not
|
||||
to alter the language accepted by the parser.
|
||||
LALR parser tables are the smallest parser tables Bison can
|
||||
currently generate, so they may be preferable.
|
||||
Nevertheless, once you begin to resolve conflicts statically,
|
||||
GLR begins to behave more like a deterministic parser, and so
|
||||
IELR and canonical LR can be helpful to avoid
|
||||
LALR's mysterious behavior.
|
||||
|
||||
@item Occasionally during development, an especially malformed grammar
|
||||
with a major recurring flaw may severely impede the IELR or
|
||||
canonical LR parser table generation algorithm.
|
||||
LALR can be a quick way to generate parser tables in order to
|
||||
investigate such problems while ignoring the more subtle differences
|
||||
from IELR and canonical LR.
|
||||
@end itemize
|
||||
|
||||
@item @code{ielr}.
|
||||
IELR is a minimal LR algorithm.
|
||||
That is, given any grammar (LR or non-LR),
|
||||
IELR and canonical LR always accept exactly the same
|
||||
set of sentences.
|
||||
However, as for LALR, the number of parser states is often an
|
||||
order of magnitude less for IELR than for canonical
|
||||
LR.
|
||||
More importantly, because canonical LR's extra parser states
|
||||
may contain duplicate conflicts in the case of non-LR
|
||||
grammars, the number of conflicts for IELR is often an order
|
||||
of magnitude less as well.
|
||||
This can significantly reduce the complexity of developing of a grammar.
|
||||
|
||||
@item @code{canonical-lr}.
|
||||
@cindex delayed syntax errors
|
||||
@cindex syntax errors delayed
|
||||
@cindex LAC
|
||||
@findex %nonassoc
|
||||
While inefficient, canonical LR parser tables can be an interesting
|
||||
means to explore a grammar because they have a property that IELR and
|
||||
LALR tables do not. That is, if @code{%nonassoc} is not used and
|
||||
default reductions are left disabled (@pxref{%define
|
||||
Summary,,lr.default-reductions}), then, for every left context of
|
||||
every canonical LR state, the set of tokens accepted by that state is
|
||||
guaranteed to be the exact set of tokens that is syntactically
|
||||
acceptable in that left context. It might then seem that an advantage
|
||||
of canonical LR parsers in production is that, under the above
|
||||
constraints, they are guaranteed to detect a syntax error as soon as
|
||||
possible without performing any unnecessary reductions. However, IELR
|
||||
parsers using LAC (@pxref{%define Summary,,parse.lac}) are also able
|
||||
to achieve this behavior without sacrificing @code{%nonassoc} or
|
||||
default reductions.
|
||||
@end itemize
|
||||
@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr}
|
||||
|
||||
@item Default Value: @code{lalr}
|
||||
@end itemize
|
||||
@@ -5596,8 +5450,9 @@ function. @xref{Error Reporting, ,The Error Reporting Function
|
||||
Error messages passed to @code{yyerror} are simply @w{@code{"syntax
|
||||
error"}}.
|
||||
@item @code{verbose}
|
||||
Error messages report the unexpected token, and possibly the expected
|
||||
ones.
|
||||
Error messages report the unexpected token, and possibly the expected ones.
|
||||
However, this report can often be incorrect when LAC is not enabled
|
||||
(@pxref{LAC}).
|
||||
@end itemize
|
||||
|
||||
@item Default Value:
|
||||
@@ -5609,84 +5464,13 @@ ones.
|
||||
@c ================================================== parse.lac
|
||||
@item parse.lac
|
||||
@findex %define parse.lac
|
||||
@cindex LAC
|
||||
@cindex lookahead correction
|
||||
|
||||
@itemize
|
||||
@item Languages(s): C
|
||||
@item Languages(s): C (deterministic parsers only)
|
||||
|
||||
@item Purpose: Enable LAC (lookahead correction) to improve
|
||||
syntax error handling.
|
||||
|
||||
Canonical LR, IELR, and LALR can suffer
|
||||
from a couple of problems upon encountering a syntax error. First, the
|
||||
parser might perform additional parser stack reductions before
|
||||
discovering the syntax error. Such reductions perform user semantic
|
||||
actions that are unexpected because they are based on an invalid token,
|
||||
and they cause error recovery to begin in a different syntactic context
|
||||
than the one in which the invalid token was encountered. Second, when
|
||||
verbose error messages are enabled (with @code{%error-verbose} or
|
||||
@code{#define YYERROR_VERBOSE}), the expected token list in the syntax
|
||||
error message can both contain invalid tokens and omit valid tokens.
|
||||
|
||||
The culprits for the above problems are @code{%nonassoc}, default
|
||||
reductions in inconsistent states, and parser state merging. Thus,
|
||||
IELR and LALR suffer the most. Canonical
|
||||
LR can suffer only if @code{%nonassoc} is used or if default
|
||||
reductions are enabled for inconsistent states.
|
||||
|
||||
LAC is a new mechanism within the parsing algorithm that
|
||||
completely solves these problems for canonical LR,
|
||||
IELR, and LALR without sacrificing @code{%nonassoc},
|
||||
default reductions, or state mering. Conceptually, the mechanism is
|
||||
straight-forward. Whenever the parser fetches a new token from the
|
||||
scanner so that it can determine the next parser action, it immediately
|
||||
suspends normal parsing and performs an exploratory parse using a
|
||||
temporary copy of the normal parser state stack. During this
|
||||
exploratory parse, the parser does not perform user semantic actions.
|
||||
If the exploratory parse reaches a shift action, normal parsing then
|
||||
resumes on the normal parser stacks. If the exploratory parse reaches
|
||||
an error instead, the parser reports a syntax error. If verbose syntax
|
||||
error messages are enabled, the parser must then discover the list of
|
||||
expected tokens, so it performs a separate exploratory parse for each
|
||||
token in the grammar.
|
||||
|
||||
There is one subtlety about the use of LAC. That is, when in a
|
||||
consistent parser state with a default reduction, the parser will not
|
||||
attempt to fetch a token from the scanner because no lookahead is
|
||||
needed to determine the next parser action. Thus, whether default
|
||||
reductions are enabled in consistent states (@pxref{%define
|
||||
Summary,,lr.default-reductions}) affects how soon the parser detects a
|
||||
syntax error: when it @emph{reaches} an erroneous token or when it
|
||||
eventually @emph{needs} that token as a lookahead. The latter
|
||||
behavior is probably more intuitive, so Bison currently provides no
|
||||
way to achieve the former behavior while default reductions are fully
|
||||
enabled.
|
||||
|
||||
Thus, when LAC is in use, for some fixed decision of whether
|
||||
to enable default reductions in consistent states, canonical
|
||||
LR and IELR behave exactly the same for both
|
||||
syntactically acceptable and syntactically unacceptable input. While
|
||||
LALR still does not support the full language-recognition
|
||||
power of canonical LR and IELR, LAC at
|
||||
least enables LALR's syntax error handling to correctly
|
||||
reflect LALR's language-recognition power.
|
||||
|
||||
Because LAC requires many parse actions to be performed twice,
|
||||
it can have a performance penalty. However, not all parse actions must
|
||||
be performed twice. Specifically, during a series of default reductions
|
||||
in consistent states and shift actions, the parser never has to initiate
|
||||
an exploratory parse. Moreover, the most time-consuming tasks in a
|
||||
parse are often the file I/O, the lexical analysis performed by the
|
||||
scanner, and the user's semantic actions, but none of these are
|
||||
performed during the exploratory parse. Finally, the base of the
|
||||
temporary stack used during an exploratory parse is a pointer into the
|
||||
normal parser state stack so that the stack is never physically copied.
|
||||
In our experience, the performance penalty of LAC has proven
|
||||
insignificant for practical grammars.
|
||||
|
||||
syntax error handling. @xref{LAC}.
|
||||
@item Accepted Values: @code{none}, @code{full}
|
||||
|
||||
@item Default Value: @code{none}
|
||||
@end itemize
|
||||
@c parse.lac
|
||||
@@ -6328,10 +6112,11 @@ receives one argument. For a syntax error, the string is normally
|
||||
@w{@code{"syntax error"}}.
|
||||
|
||||
@findex %define parse.error
|
||||
If you invoke @samp{%define parse.error verbose} in the Bison
|
||||
declarations section (@pxref{Bison Declarations, ,The Bison Declarations
|
||||
Section}), then Bison provides a more verbose and specific error message
|
||||
string instead of just plain @w{@code{"syntax error"}}.
|
||||
If you invoke @samp{%define parse.error verbose} in the Bison declarations
|
||||
section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then
|
||||
Bison provides a more verbose and specific error message string instead of
|
||||
just plain @w{@code{"syntax error"}}. However, that message sometimes
|
||||
contains incorrect information if LAC is not enabled (@pxref{LAC}).
|
||||
|
||||
The parser can detect one other kind of error: memory exhaustion. This
|
||||
can happen when the input contains constructions that are very deeply
|
||||
@@ -6732,6 +6517,7 @@ This kind of parser is known in the literature as a bottom-up parser.
|
||||
* Parser States:: The parser is a finite-state-machine with stack.
|
||||
* Reduce/Reduce:: When two rules are applicable in the same situation.
|
||||
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
|
||||
* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
|
||||
* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
|
||||
* Memory Management:: What happens when memory is exhausted. How to avoid it.
|
||||
@end menu
|
||||
@@ -7301,6 +7087,7 @@ redirects:redirect
|
||||
|
||||
@node Mystery Conflicts
|
||||
@section Mysterious Reduce/Reduce Conflicts
|
||||
@cindex Mysterious Conflicts
|
||||
|
||||
Sometimes reduce/reduce conflicts can occur that don't look warranted.
|
||||
Here is an example:
|
||||
@@ -7342,8 +7129,8 @@ of lookahead: when a @code{param_spec} is being read, an @code{ID} is
|
||||
a @code{name} if a comma or colon follows, or a @code{type} if another
|
||||
@code{ID} follows. In other words, this grammar is LR(1).
|
||||
|
||||
@cindex LR(1)
|
||||
@cindex LALR(1)
|
||||
@cindex LR
|
||||
@cindex LALR
|
||||
However, for historical reasons, Bison cannot by default handle all
|
||||
LR(1) grammars.
|
||||
In this grammar, two contexts, that after an @code{ID} at the beginning
|
||||
@@ -7358,15 +7145,16 @@ contexts, so it makes a single parser state for them both. Combining
|
||||
the two contexts causes a conflict later. In parser terminology, this
|
||||
occurrence means that the grammar is not LALR(1).
|
||||
|
||||
For many practical grammars (specifically those that fall into the
|
||||
non-LR(1) class), the limitations of LALR(1) result in difficulties
|
||||
beyond just mysterious reduce/reduce conflicts. The best way to fix
|
||||
all these problems is to select a different parser table generation
|
||||
algorithm. Either IELR(1) or canonical LR(1) would suffice, but the
|
||||
former is more efficient and easier to debug during development.
|
||||
@xref{%define Summary,,lr.type}, for details. (Bison's IELR(1) and
|
||||
canonical LR(1) implementations are experimental. More user feedback
|
||||
will help to stabilize them.)
|
||||
@cindex IELR
|
||||
@cindex canonical LR
|
||||
For many practical grammars (specifically those that fall into the non-LR(1)
|
||||
class), the limitations of LALR(1) result in difficulties beyond just
|
||||
mysterious reduce/reduce conflicts. The best way to fix all these problems
|
||||
is to select a different parser table construction algorithm. Either
|
||||
IELR(1) or canonical LR(1) would suffice, but the former is more efficient
|
||||
and easier to debug during development. @xref{LR Table Construction}, for
|
||||
details. (Bison's IELR(1) and canonical LR(1) implementations are
|
||||
experimental. More user feedback will help to stabilize them.)
|
||||
|
||||
If you instead wish to work around LALR(1)'s limitations, you
|
||||
can often fix a mysterious conflict by identifying the two parser states
|
||||
@@ -7417,6 +7205,409 @@ return_spec:
|
||||
For a more detailed exposition of LALR(1) parsers and parser
|
||||
generators, @pxref{Bibliography,,DeRemer 1982}.
|
||||
|
||||
@node Tuning LR
|
||||
@section Tuning LR
|
||||
|
||||
The default behavior of Bison's LR-based parsers is chosen mostly for
|
||||
historical reasons, but that behavior is often not robust. For example, in
|
||||
the previous section, we discussed the mysterious conflicts that can be
|
||||
produced by LALR(1), Bison's default parser table construction algorithm.
|
||||
Another example is Bison's @code{%define parse.error verbose} directive,
|
||||
which instructs the generated parser to produce verbose syntax error
|
||||
messages, which can sometimes contain incorrect information.
|
||||
|
||||
In this section, we explore several modern features of Bison that allow you
|
||||
to tune fundamental aspects of the generated LR-based parsers. Some of
|
||||
these features easily eliminate shortcomings like those mentioned above.
|
||||
Others can be helpful purely for understanding your parser.
|
||||
|
||||
Most of the features discussed in this section are still experimental. More
|
||||
user feedback will help to stabilize them.
|
||||
|
||||
@menu
|
||||
* LR Table Construction:: Choose a different construction algorithm.
|
||||
* Default Reductions:: Disable default reductions.
|
||||
* LAC:: Correct lookahead sets in the parser states.
|
||||
* Unreachable States:: Keep unreachable parser states for debugging.
|
||||
@end menu
|
||||
|
||||
@node LR Table Construction
|
||||
@subsection LR Table Construction
|
||||
@cindex Mysterious Conflict
|
||||
@cindex LALR
|
||||
@cindex IELR
|
||||
@cindex canonical LR
|
||||
@findex %define lr.type
|
||||
|
||||
For historical reasons, Bison constructs LALR(1) parser tables by default.
|
||||
However, LALR does not possess the full language-recognition power of LR.
|
||||
As a result, the behavior of parsers employing LALR parser tables is often
|
||||
mysterious. We presented a simple example of this effect in @ref{Mystery
|
||||
Conflicts}.
|
||||
|
||||
As we also demonstrated in that example, the traditional approach to
|
||||
eliminating such mysterious behavior is to restructure the grammar.
|
||||
Unfortunately, doing so correctly is often difficult. Moreover, merely
|
||||
discovering that LALR causes mysterious behavior in your parser can be
|
||||
difficult as well.
|
||||
|
||||
Fortunately, Bison provides an easy way to eliminate the possibility of such
|
||||
mysterious behavior altogether. You simply need to activate a more powerful
|
||||
parser table construction algorithm by using the @code{%define lr.type}
|
||||
directive.
|
||||
|
||||
@deffn {Directive} {%define lr.type @var{TYPE}}
|
||||
Specify the type of parser tables within the LR(1) family. The accepted
|
||||
values for @var{TYPE} are:
|
||||
|
||||
@itemize
|
||||
@item @code{lalr} (default)
|
||||
@item @code{ielr}
|
||||
@item @code{canonical-lr}
|
||||
@end itemize
|
||||
|
||||
(This feature is experimental. More user feedback will help to stabilize
|
||||
it.)
|
||||
@end deffn
|
||||
|
||||
For example, to activate IELR, you might add the following directive to you
|
||||
grammar file:
|
||||
|
||||
@example
|
||||
%define lr.type ielr
|
||||
@end example
|
||||
|
||||
@noindent For the example in @ref{Mystery Conflicts}, the mysterious
|
||||
conflict is then eliminated, so there is no need to invest time in
|
||||
comprehending the conflict or restructuring the grammar to fix it. If,
|
||||
during future development, the grammar evolves such that all mysterious
|
||||
behavior would have disappeared using just LALR, you need not fear that
|
||||
continuing to use IELR will result in unnecessarily large parser tables.
|
||||
That is, IELR generates LALR tables when LALR (using a deterministic parsing
|
||||
algorithm) is sufficient to support the full language-recognition power of
|
||||
LR. Thus, by enabling IELR at the start of grammar development, you can
|
||||
safely and completely eliminate the need to consider LALR's shortcomings.
|
||||
|
||||
While IELR is almost always preferable, there are circumstances where LALR
|
||||
or the canonical LR parser tables described by Knuth
|
||||
(@pxref{Bibliography,,Knuth 1965}) can be useful. Here we summarize the
|
||||
relative advantages of each parser table construction algorithm within
|
||||
Bison:
|
||||
|
||||
@itemize
|
||||
@item LALR
|
||||
|
||||
There are at least two scenarios where LALR can be worthwhile:
|
||||
|
||||
@itemize
|
||||
@item GLR without static conflict resolution.
|
||||
|
||||
@cindex GLR with LALR
|
||||
When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
|
||||
conflicts statically (for example, with @code{%left} or @code{%prec}), then
|
||||
the parser explores all potential parses of any given input. In this case,
|
||||
the choice of parser table construction algorithm is guaranteed not to alter
|
||||
the language accepted by the parser. LALR parser tables are the smallest
|
||||
parser tables Bison can currently construct, so they may then be preferable.
|
||||
Nevertheless, once you begin to resolve conflicts statically, GLR behaves
|
||||
more like a deterministic parser in the syntactic contexts where those
|
||||
conflicts appear, and so either IELR or canonical LR can then be helpful to
|
||||
avoid LALR's mysterious behavior.
|
||||
|
||||
@item Malformed grammars.
|
||||
|
||||
Occasionally during development, an especially malformed grammar with a
|
||||
major recurring flaw may severely impede the IELR or canonical LR parser
|
||||
table construction algorithm. LALR can be a quick way to construct parser
|
||||
tables in order to investigate such problems while ignoring the more subtle
|
||||
differences from IELR and canonical LR.
|
||||
@end itemize
|
||||
|
||||
@item IELR
|
||||
|
||||
IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given
|
||||
any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables
|
||||
always accept exactly the same set of sentences. However, like LALR, IELR
|
||||
merges parser states during parser table construction so that the number of
|
||||
parser states is often an order of magnitude less than for canonical LR.
|
||||
More importantly, because canonical LR's extra parser states may contain
|
||||
duplicate conflicts in the case of non-LR grammars, the number of conflicts
|
||||
for IELR is often an order of magnitude less as well. This effect can
|
||||
significantly reduce the complexity of developing a grammar.
|
||||
|
||||
@item Canonical LR
|
||||
|
||||
@cindex delayed syntax error detection
|
||||
@cindex LAC
|
||||
@findex %nonassoc
|
||||
While inefficient, canonical LR parser tables can be an interesting means to
|
||||
explore a grammar because they possess a property that IELR and LALR tables
|
||||
do not. That is, if @code{%nonassoc} is not used and default reductions are
|
||||
left disabled (@pxref{Default Reductions}), then, for every left context of
|
||||
every canonical LR state, the set of tokens accepted by that state is
|
||||
guaranteed to be the exact set of tokens that is syntactically acceptable in
|
||||
that left context. It might then seem that an advantage of canonical LR
|
||||
parsers in production is that, under the above constraints, they are
|
||||
guaranteed to detect a syntax error as soon as possible without performing
|
||||
any unnecessary reductions. However, IELR parsers that use LAC are also
|
||||
able to achieve this behavior without sacrificing @code{%nonassoc} or
|
||||
default reductions. For details and a few caveats of LAC, @pxref{LAC}.
|
||||
@end itemize
|
||||
|
||||
For a more detailed exposition of the mysterious behavior in LALR parsers
|
||||
and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and
|
||||
@ref{Bibliography,,Denny 2010 November}.
|
||||
|
||||
@node Default Reductions
|
||||
@subsection Default Reductions
|
||||
@cindex default reductions
|
||||
@findex %define lr.default-reductions
|
||||
@findex %nonassoc
|
||||
|
||||
After parser table construction, Bison identifies the reduction with the
|
||||
largest lookahead set in each parser state. To reduce the size of the
|
||||
parser state, traditional Bison behavior is to remove that lookahead set and
|
||||
to assign that reduction to be the default parser action. Such a reduction
|
||||
is known as a @dfn{default reduction}.
|
||||
|
||||
Default reductions affect more than the size of the parser tables. They
|
||||
also affect the behavior of the parser:
|
||||
|
||||
@itemize
|
||||
@item Delayed @code{yylex} invocations.
|
||||
|
||||
@cindex delayed yylex invocations
|
||||
@cindex consistent states
|
||||
@cindex defaulted states
|
||||
A @dfn{consistent state} is a state that has only one possible parser
|
||||
action. If that action is a reduction and is encoded as a default
|
||||
reduction, then that consistent state is called a @dfn{defaulted state}.
|
||||
Upon reaching a defaulted state, a Bison-generated parser does not bother to
|
||||
invoke @code{yylex} to fetch the next token before performing the reduction.
|
||||
In other words, whether default reductions are enabled in consistent states
|
||||
determines how soon a Bison-generated parser invokes @code{yylex} for a
|
||||
token: immediately when it @emph{reaches} that token in the input or when it
|
||||
eventually @emph{needs} that token as a lookahead to determine the next
|
||||
parser action. Traditionally, default reductions are enabled, and so the
|
||||
parser exhibits the latter behavior.
|
||||
|
||||
The presence of defaulted states is an important consideration when
|
||||
designing @code{yylex} and the grammar file. That is, if the behavior of
|
||||
@code{yylex} can influence or be influenced by the semantic actions
|
||||
associated with the reductions in defaulted states, then the delay of the
|
||||
next @code{yylex} invocation until after those reductions is significant.
|
||||
For example, the semantic actions might pop a scope stack that @code{yylex}
|
||||
uses to determine what token to return. Thus, the delay might be necessary
|
||||
to ensure that @code{yylex} does not look up the next token in a scope that
|
||||
should already be considered closed.
|
||||
|
||||
@item Delayed syntax error detection.
|
||||
|
||||
@cindex delayed syntax error detection
|
||||
When the parser fetches a new token by invoking @code{yylex}, it checks
|
||||
whether there is an action for that token in the current parser state. The
|
||||
parser detects a syntax error if and only if either (1) there is no action
|
||||
for that token or (2) the action for that token is the error action (due to
|
||||
the use of @code{%nonassoc}). However, if there is a default reduction in
|
||||
that state (which might or might not be a defaulted state), then it is
|
||||
impossible for condition 1 to exist. That is, all tokens have an action.
|
||||
Thus, the parser sometimes fails to detect the syntax error until it reaches
|
||||
a later state.
|
||||
|
||||
@cindex LAC
|
||||
@c If there's an infinite loop, default reductions can prevent an incorrect
|
||||
@c sentence from being rejected.
|
||||
While default reductions never cause the parser to accept syntactically
|
||||
incorrect sentences, the delay of syntax error detection can have unexpected
|
||||
effects on the behavior of the parser. However, the delay can be caused
|
||||
anyway by parser state merging and the use of @code{%nonassoc}, and it can
|
||||
be fixed by another Bison feature, LAC. We discuss the effects of delayed
|
||||
syntax error detection and LAC more in the next section (@pxref{LAC}).
|
||||
@end itemize
|
||||
|
||||
For canonical LR, the only default reduction that Bison enables by default
|
||||
is the accept action, which appears only in the accepting state, which has
|
||||
no other action and is thus a defaulted state. However, the default accept
|
||||
action does not delay any @code{yylex} invocation or syntax error detection
|
||||
because the accept action ends the parse.
|
||||
|
||||
For LALR and IELR, Bison enables default reductions in nearly all states by
|
||||
default. There are only two exceptions. First, states that have a shift
|
||||
action on the @code{error} token do not have default reductions because
|
||||
delayed syntax error detection could then prevent the @code{error} token
|
||||
from ever being shifted in that state. However, parser state merging can
|
||||
cause the same effect anyway, and LAC fixes it in both cases, so future
|
||||
versions of Bison might drop this exception when LAC is activated. Second,
|
||||
GLR parsers do not record the default reduction as the action on a lookahead
|
||||
token for which there is a conflict. The correct action in this case is to
|
||||
split the parse instead.
|
||||
|
||||
To adjust which states have default reductions enabled, use the
|
||||
@code{%define lr.default-reductions} directive.
|
||||
|
||||
@deffn {Directive} {%define lr.default-reductions @var{WHERE}}
|
||||
Specify the kind of states that are permitted to contain default reductions.
|
||||
The accepted values of @var{WHERE} are:
|
||||
@itemize
|
||||
@item @code{all} (default for LALR and IELR)
|
||||
@item @code{consistent}
|
||||
@item @code{accepting} (default for canonical LR)
|
||||
@end itemize
|
||||
|
||||
(The ability to specify where default reductions are permitted is
|
||||
experimental. More user feedback will help to stabilize it.)
|
||||
@end deffn
|
||||
|
||||
FIXME: Because of the exceptions described above, @code{all} is a misnomer.
|
||||
Rename to @code{full}.
|
||||
|
||||
@node LAC
|
||||
@subsection LAC
|
||||
@findex %define parse.lac
|
||||
@cindex LAC
|
||||
@cindex lookahead correction
|
||||
|
||||
Canonical LR, IELR, and LALR can suffer from a couple of problems upon
|
||||
encountering a syntax error. First, the parser might perform additional
|
||||
parser stack reductions before discovering the syntax error. Such
|
||||
reductions can perform user semantic actions that are unexpected because
|
||||
they are based on an invalid token, and they cause error recovery to begin
|
||||
in a different syntactic context than the one in which the invalid token was
|
||||
encountered. Second, when verbose error messages are enabled (@pxref{Error
|
||||
Reporting}), the expected token list in the syntax error message can both
|
||||
contain invalid tokens and omit valid tokens.
|
||||
|
||||
The culprits for the above problems are @code{%nonassoc}, default reductions
|
||||
in inconsistent states (@pxref{Default Reductions}), and parser state
|
||||
merging. Because IELR and LALR merge parser states, they suffer the most.
|
||||
Canonical LR can suffer only if @code{%nonassoc} is used or if default
|
||||
reductions are enabled for inconsistent states.
|
||||
|
||||
LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
|
||||
that solves these problems for canonical LR, IELR, and LALR without
|
||||
sacrificing @code{%nonassoc}, default reductions, or state merging. You can
|
||||
enable LAC with the @code{%define parse.lac} directive.
|
||||
|
||||
@deffn {Directive} {%define parse.lac @var{VALUE}}
|
||||
Enable LAC to improve syntax error handling.
|
||||
@itemize
|
||||
@item @code{none} (default)
|
||||
@item @code{full}
|
||||
@end itemize
|
||||
(This feature is experimental. More user feedback will help to stabilize
|
||||
it. Moreover, it is currently only available for deterministic parsers in
|
||||
C.)
|
||||
@end deffn
|
||||
|
||||
Conceptually, the LAC mechanism is straight-forward. Whenever the parser
|
||||
fetches a new token from the scanner so that it can determine the next
|
||||
parser action, it immediately suspends normal parsing and performs an
|
||||
exploratory parse using a temporary copy of the normal parser state stack.
|
||||
During this exploratory parse, the parser does not perform user semantic
|
||||
actions. If the exploratory parse reaches a shift action, normal parsing
|
||||
then resumes on the normal parser stacks. If the exploratory parse reaches
|
||||
an error instead, the parser reports a syntax error. If verbose syntax
|
||||
error messages are enabled, the parser must then discover the list of
|
||||
expected tokens, so it performs a separate exploratory parse for each token
|
||||
in the grammar.
|
||||
|
||||
There is one subtlety about the use of LAC. That is, when in a consistent
|
||||
parser state with a default reduction, the parser will not attempt to fetch
|
||||
a token from the scanner because no lookahead is needed to determine the
|
||||
next parser action. Thus, whether default reductions are enabled in
|
||||
consistent states (@pxref{Default Reductions}) affects how soon the parser
|
||||
detects a syntax error: immediately when it @emph{reaches} an erroneous
|
||||
token or when it eventually @emph{needs} that token as a lookahead to
|
||||
determine the next parser action. The latter behavior is probably more
|
||||
intuitive, so Bison currently provides no way to achieve the former behavior
|
||||
while default reductions are enabled in consistent states.
|
||||
|
||||
Thus, when LAC is in use, for some fixed decision of whether to enable
|
||||
default reductions in consistent states, canonical LR and IELR behave almost
|
||||
exactly the same for both syntactically acceptable and syntactically
|
||||
unacceptable input. While LALR still does not support the full
|
||||
language-recognition power of canonical LR and IELR, LAC at least enables
|
||||
LALR's syntax error handling to correctly reflect LALR's
|
||||
language-recognition power.
|
||||
|
||||
There are a few caveats to consider when using LAC:
|
||||
|
||||
@itemize
|
||||
@item Infinite parsing loops.
|
||||
|
||||
IELR plus LAC does have one shortcoming relative to canonical LR. Some
|
||||
parsers generated by Bison can loop infinitely. LAC does not fix infinite
|
||||
parsing loops that occur between encountering a syntax error and detecting
|
||||
it, but enabling canonical LR or disabling default reductions sometimes
|
||||
does.
|
||||
|
||||
@item Verbose error message limitations.
|
||||
|
||||
Because of internationalization considerations, Bison-generated parsers
|
||||
limit the size of the expected token list they are willing to report in a
|
||||
verbose syntax error message. If the number of expected tokens exceeds that
|
||||
limit, the list is simply dropped from the message. Enabling LAC can
|
||||
increase the size of the list and thus cause the parser to drop it. Of
|
||||
course, dropping the list is better than reporting an incorrect list.
|
||||
|
||||
@item Performance.
|
||||
|
||||
Because LAC requires many parse actions to be performed twice, it can have a
|
||||
performance penalty. However, not all parse actions must be performed
|
||||
twice. Specifically, during a series of default reductions in consistent
|
||||
states and shift actions, the parser never has to initiate an exploratory
|
||||
parse. Moreover, the most time-consuming tasks in a parse are often the
|
||||
file I/O, the lexical analysis performed by the scanner, and the user's
|
||||
semantic actions, but none of these are performed during the exploratory
|
||||
parse. Finally, the base of the temporary stack used during an exploratory
|
||||
parse is a pointer into the normal parser state stack so that the stack is
|
||||
never physically copied. In our experience, the performance penalty of LAC
|
||||
has proven insignificant for practical grammars.
|
||||
@end itemize
|
||||
|
||||
@node Unreachable States
|
||||
@subsection Unreachable States
|
||||
@findex %define lr.keep-unreachable-states
|
||||
@cindex unreachable states
|
||||
|
||||
If there exists no sequence of transitions from the parser's start state to
|
||||
some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable
|
||||
state}. A state can become unreachable during conflict resolution if Bison
|
||||
disables a shift action leading to it from a predecessor state.
|
||||
|
||||
By default, Bison removes unreachable states from the parser after conflict
|
||||
resolution because they are useless in the generated parser. However,
|
||||
keeping unreachable states is sometimes useful when trying to understand the
|
||||
relationship between the parser and the grammar.
|
||||
|
||||
@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}}
|
||||
Request that Bison allow unreachable states to remain in the parser tables.
|
||||
@var{VALUE} must be a Boolean. The default is @code{false}.
|
||||
@end deffn
|
||||
|
||||
There are a few caveats to consider:
|
||||
|
||||
@itemize @bullet
|
||||
@item Missing or extraneous warnings.
|
||||
|
||||
Unreachable states may contain conflicts and may use rules not used in any
|
||||
other state. Thus, keeping unreachable states may induce warnings that are
|
||||
irrelevant to your parser's behavior, and it may eliminate warnings that are
|
||||
relevant. Of course, the change in warnings may actually be relevant to a
|
||||
parser table analysis that wants to keep unreachable states, so this
|
||||
behavior will likely remain in future Bison releases.
|
||||
|
||||
@item Other useless states.
|
||||
|
||||
While Bison is able to remove unreachable states, it is not guaranteed to
|
||||
remove other kinds of useless states. Specifically, when Bison disables
|
||||
reduce actions during conflict resolution, some goto actions may become
|
||||
useless, and thus some additional states may become useless. If Bison were
|
||||
to compute which goto actions were useless and then disable those actions,
|
||||
it could identify such states as unreachable and then remove those states.
|
||||
However, Bison does not compute which goto actions are useless.
|
||||
@end itemize
|
||||
|
||||
@node Generalized LR Parsing
|
||||
@section Generalized LR (GLR) Parsing
|
||||
@cindex GLR parsing
|
||||
@@ -9500,8 +9691,9 @@ propagated.
|
||||
@end example
|
||||
|
||||
@noindent
|
||||
Use the following two directives to enable parser tracing and verbose
|
||||
error messages.
|
||||
Use the following two directives to enable parser tracing and verbose error
|
||||
messages. However, verbose error messages can contain incorrect information
|
||||
(@pxref{LAC}).
|
||||
|
||||
@comment file: calc++-parser.yy
|
||||
@example
|
||||
@@ -10908,9 +11100,9 @@ Precedence}.
|
||||
@end deffn
|
||||
@end ifset
|
||||
|
||||
@deffn {Directive} %define @var{define-variable}
|
||||
@deffnx {Directive} %define @var{define-variable} @var{value}
|
||||
@deffnx {Directive} %define @var{define-variable} "@var{value}"
|
||||
@deffn {Directive} %define @var{variable}
|
||||
@deffnx {Directive} %define @var{variable} @var{value}
|
||||
@deffnx {Directive} %define @var{variable} "@var{value}"
|
||||
Define a variable to adjust Bison's behavior. @xref{%define Summary}.
|
||||
@end deffn
|
||||
|
||||
@@ -10952,7 +11144,8 @@ token is reset to the token that originally caused the violation.
|
||||
@end deffn
|
||||
|
||||
@deffn {Directive} %error-verbose
|
||||
An obsolete directive standing for @samp{%define parse.error verbose}.
|
||||
An obsolete directive standing for @samp{%define parse.error verbose}
|
||||
(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}).
|
||||
@end deffn
|
||||
|
||||
@deffn {Directive} %file-prefix "@var{prefix}"
|
||||
@@ -11305,7 +11498,7 @@ Data type of semantic values; @code{int} by default.
|
||||
@cindex glossary
|
||||
|
||||
@table @asis
|
||||
@item Accepting State
|
||||
@item Accepting state
|
||||
A state whose only action is the accept action.
|
||||
The accepting state is thus a consistent state.
|
||||
@xref{Understanding,,}.
|
||||
@@ -11316,9 +11509,8 @@ by John Backus, and slightly improved by Peter Naur in his 1960-01-02
|
||||
committee document contributing to what became the Algol 60 report.
|
||||
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
||||
|
||||
@item Consistent State
|
||||
A state containing only one possible action. @xref{%define
|
||||
Summary,,lr.default-reductions}.
|
||||
@item Consistent state
|
||||
A state containing only one possible action. @xref{Default Reductions}.
|
||||
|
||||
@item Context-free grammars
|
||||
Grammars specified as rules that can be applied regardless of context.
|
||||
@@ -11327,12 +11519,15 @@ expression, integers are allowed @emph{anywhere} an expression is
|
||||
permitted. @xref{Language and Grammar, ,Languages and Context-Free
|
||||
Grammars}.
|
||||
|
||||
@item Default Reduction
|
||||
@item Default reduction
|
||||
The reduction that a parser should perform if the current parser state
|
||||
contains no other action for the lookahead token. In permitted parser
|
||||
states, Bison declares the reduction with the largest lookahead set to
|
||||
be the default reduction and removes that lookahead set.
|
||||
@xref{%define Summary,,lr.default-reductions}.
|
||||
states, Bison declares the reduction with the largest lookahead set to be
|
||||
the default reduction and removes that lookahead set. @xref{Default
|
||||
Reductions}.
|
||||
|
||||
@item Defaulted state
|
||||
A consistent state with a default reduction. @xref{Default Reductions}.
|
||||
|
||||
@item Dynamic allocation
|
||||
Allocation of memory that occurs during execution, rather than at
|
||||
@@ -11364,17 +11559,16 @@ A language construct that is (in general) grammatically divisible;
|
||||
for example, `expression' or `declaration' in C@.
|
||||
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
||||
|
||||
@item IELR(1)
|
||||
A minimal LR(1) parser table generation algorithm. That is, given any
|
||||
@item IELR(1) (Inadequacy Elimination LR(1))
|
||||
A minimal LR(1) parser table construction algorithm. That is, given any
|
||||
context-free grammar, IELR(1) generates parser tables with the full
|
||||
language recognition power of canonical LR(1) but with nearly the same
|
||||
number of parser states as LALR(1). This reduction in parser states
|
||||
is often an order of magnitude. More importantly, because canonical
|
||||
LR(1)'s extra parser states may contain duplicate conflicts in the
|
||||
case of non-LR(1) grammars, the number of conflicts for IELR(1) is
|
||||
often an order of magnitude less as well. This can significantly
|
||||
reduce the complexity of developing of a grammar. @xref{%define
|
||||
Summary,,lr.type}.
|
||||
language-recognition power of canonical LR(1) but with nearly the same
|
||||
number of parser states as LALR(1). This reduction in parser states is
|
||||
often an order of magnitude. More importantly, because canonical LR(1)'s
|
||||
extra parser states may contain duplicate conflicts in the case of non-LR(1)
|
||||
grammars, the number of conflicts for IELR(1) is often an order of magnitude
|
||||
less as well. This can significantly reduce the complexity of developing a
|
||||
grammar. @xref{LR Table Construction}.
|
||||
|
||||
@item Infix operator
|
||||
An arithmetic operator that is placed between the operands on which it
|
||||
@@ -11385,12 +11579,11 @@ A continuous flow of data between devices or programs.
|
||||
|
||||
@item LAC (Lookahead Correction)
|
||||
A parsing mechanism that fixes the problem of delayed syntax error
|
||||
detection, which is caused by LR state merging, default reductions,
|
||||
and the use of @code{%nonassoc}. Delayed syntax error detection
|
||||
results in unexpected semantic actions, initiation of error recovery
|
||||
in the wrong syntactic context, and an incorrect list of expected
|
||||
tokens in a verbose syntax error message. @xref{%define
|
||||
Summary,,parse.lac}.
|
||||
detection, which is caused by LR state merging, default reductions, and the
|
||||
use of @code{%nonassoc}. Delayed syntax error detection results in
|
||||
unexpected semantic actions, initiation of error recovery in the wrong
|
||||
syntactic context, and an incorrect list of expected tokens in a verbose
|
||||
syntax error message. @xref{LAC}.
|
||||
|
||||
@item Language construct
|
||||
One of the typical usage schemas of the language. For example, one of
|
||||
@@ -11506,6 +11699,11 @@ the lexical analyzer. @xref{Symbols}.
|
||||
A grammar symbol that has no rules in the grammar and therefore is
|
||||
grammatically indivisible. The piece of text it represents is a token.
|
||||
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
||||
|
||||
@item Unreachable state
|
||||
A parser state to which there does not exist a sequence of transitions from
|
||||
the parser's start state. A state can become unreachable during conflict
|
||||
resolution. @xref{Unreachable States}.
|
||||
@end table
|
||||
|
||||
@node Copying This Manual
|
||||
|
||||
Reference in New Issue
Block a user