mirror of
https://git.savannah.gnu.org/git/bison.git
doc: create a new Tuning LR section in the manual.
And clean up all other documentation of the features described there.

* NEWS (2.5): Tweak wording of lr.type and parse.lac entries a bit, update the cross-references to the manual, and point out that LAC has caveats. Don't be so adamant that IELR+LAC=canonical LR. That is, as the referenced section in the manual documents, LAC does not fix infinite parsing loops on syntax errors.
* doc/bison.texinfo: Consistently drop the "(1)" suffix from LALR, IELR, and LR in @cindex.
(%define Summary): Condense the entries for lr.default-reductions, lr.keep-unreachable-states, lr.type, and parse.lac into brief summaries, and cross-reference the appropriate subsections of Tuning LR. For parse.lac, mention that it's only implemented for deterministic parsers in C.
(Error Reporting): When mentioning %error-verbose, mention LAC, and add cross-reference to the LAC section.
(Tuning LR): New section with an extended version of the documentation removed from %define Summary. Change all cross-references in the manual to point here instead of there.
(Calc++ Parser): When mentioning %error-verbose, mention LAC, and add cross-reference to the LAC section.
(Table of Symbols): In %error-verbose and YYERROR_VERBOSE entries, add cross-references to Error Reporting.
(Glossary): Capitalize entry titles consistently. Add definitions for "defaulted state" and "unreachable state". Expand IELR acronym in IELR's entry.
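For orientation, here is a minimal sketch of how the four features documented by this commit might be requested together in a grammar file. It is not part of the commit. It assumes Bison 2.5 or later and a deterministic parser in C (the only configuration for which parse.lac is implemented, per the message above). The token name `ID` and the one-rule grammar are hypothetical placeholders, and the chosen values only illustrate the accepted settings; they are not a recommendation.

```
%define lr.type ielr                      /* IELR tables instead of the default LALR */
%define lr.default-reductions consistent  /* accepted values: all, consistent, accepting */
%define parse.lac full                    /* accepted values: none, full */
%error-verbose                            /* verbose syntax error messages */

%token ID  /* hypothetical token, for illustration only */

%%

input: /* empty */
     | input ID
     ;
```

This is only a fragment: a buildable parser would still need `yylex` and `yyerror`, which are omitted here.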
30
ChangeLog
@@ -1,3 +1,33 @@
+2011-03-06 Joel E. Denny <joeldenny@joeldenny.org>
+
+doc: create a new Tuning LR section in the manual.
+And clean up all other documentation of the features described
+there.
+* NEWS (2.5): Tweak wording of lr.type and parse.lac entries a
+bit, update the cross-references to the manual, and point out that
+LAC has caveats. Don't be so adamant that IELR+LAC=canonical LR.
+That is, as the referenced section in the manual documents, LAC
+does not fix infinite parsing loops on syntax errors.
+* doc/bison.texinfo: Consistently drop the "(1)" suffix from LALR,
+IELR, and LR in @cindex.
+(%define Summary): Condense the entries for lr.default-reductions,
+lr.keep-unreachable-states, lr.type, and parse.lac into brief
+summaries, and cross-reference the appropriate subsections of
+Tuning LR. For parse.lac, mention that it's only implemented for
+deterministic parsers in C.
+(Error Reporting): When mentioning %error-verbose, mention LAC,
+and add cross-reference to the LAC section.
+(Tuning LR): New section with an extended version of the
+documentation removed from %define Summary. Change all
+cross-references in the manual to point here instead of there.
+(Calc++ Parser): When mentioning %error-verbose, mention LAC, and
+add cross-reference to the LAC section.
+(Table of Symbols): In %error-verbose and YYERROR_VERBOSE entries,
+add cross-references to Error Reporting.
+(Glossary): Capitalize entry titles consistently. Add definitions
+for "defaulted state" and "unreachable state". Expand IELR
+acronym in IELR's entry.
+
 2011-02-20 Joel E. Denny <joeldenny@joeldenny.org>

 doc: add bibliography to manual.
44
NEWS
@@ -57,27 +57,27 @@ Bison News
 %define lr.type ielr
 %define lr.type canonical-lr

-The default reduction optimization in the parser tables can also be
-adjusted using `%define lr.default-reductions'. See the documentation
-for `%define lr.type' and `%define lr.default-reductions' in the
-section `Bison Declaration Summary' in the Bison manual for the
-details.
+The default-reduction optimization in the parser tables can also be
+adjusted using `%define lr.default-reductions'. For details on both
+of these features, see the new section `Tuning LR' in the Bison
+manual.

 These features are experimental. More user feedback will help to
 stabilize them.

-** LAC (lookahead correction) for syntax error handling:
+** LAC (Lookahead Correction) for syntax error handling:

 Canonical LR, IELR, and LALR can suffer from a couple of problems
 upon encountering a syntax error. First, the parser might perform
 additional parser stack reductions before discovering the syntax
-error. Such reductions perform user semantic actions that are
+error. Such reductions can perform user semantic actions that are
 unexpected because they are based on an invalid token, and they
 cause error recovery to begin in a different syntactic context than
 the one in which the invalid token was encountered. Second, when
-verbose error messages are enabled (with %error-verbose or `#define
-YYERROR_VERBOSE'), the expected token list in the syntax error
-message can both contain invalid tokens and omit valid tokens.
+verbose error messages are enabled (with %error-verbose or the
+obsolete `#define YYERROR_VERBOSE'), the expected token list in the
+syntax error message can both contain invalid tokens and omit valid
+tokens.

 The culprits for the above problems are %nonassoc, default
 reductions in inconsistent states, and parser state merging. Thus,
@@ -85,11 +85,11 @@ Bison News
 %nonassoc is used or if default reductions are enabled for
 inconsistent states.

-LAC is a new mechanism within the parsing algorithm that completely
-solves these problems for canonical LR, IELR, and LALR without
-sacrificing %nonassoc, default reductions, or state mering. When
-LAC is in use, canonical LR and IELR behave exactly the same for
-both syntactically acceptable and syntactically unacceptable input.
+LAC is a new mechanism within the parsing algorithm that solves
+these problems for canonical LR, IELR, and LALR without sacrificing
+%nonassoc, default reductions, or state merging. When LAC is in
+use, canonical LR and IELR behave almost exactly the same for both
+syntactically acceptable and syntactically unacceptable input.
 While LALR still does not support the full language-recognition
 power of canonical LR and IELR, LAC at least enables LALR's syntax
 error handling to correctly reflect LALR's language-recognition
@@ -100,8 +100,8 @@ Bison News

 %define parse.lac full

-See the documentation for `%define parse.lac' in the section `Bison
-Declaration Summary' in the Bison manual for additional details.
+See the new section `LAC' in the Bison manual for additional
+details including a few caveats.

 LAC is an experimental feature. More user feedback will help to
 stabilize it.
@@ -255,11 +255,11 @@ Bison News

 ** Verbose syntax error message fixes:

-When %error-verbose or `#define YYERROR_VERBOSE' is specified,
-syntax error messages produced by the generated parser include the
-unexpected token as well as a list of expected tokens. The effect
-of %nonassoc on these verbose messages has been corrected in two
-ways, but a complete fix requires LAC, described above:
+When %error-verbose or the obsolete `#define YYERROR_VERBOSE' is
+specified, syntax error messages produced by the generated parser
+include the unexpected token as well as a list of expected tokens.
+The effect of %nonassoc on these verbose messages has been corrected
+in two ways, but a more complete fix requires LAC, described above:

 *** When %nonassoc is used, there can exist parser states that accept no
 tokens, and so the parser does not always require a lookahead token
@@ -265,6 +265,7 @@ The Bison Parser Algorithm
 * Parser States:: The parser is a finite-state-machine with stack.
 * Reduce/Reduce:: When two rules are applicable in the same situation.
 * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
 * Generalized LR Parsing:: Parsing arbitrary context-free grammars.
 * Memory Management:: What happens when memory is exhausted. How to avoid it.

@@ -275,6 +276,13 @@ Operator Precedence
 * Precedence Examples:: How these features are used in the previous example.
 * How Precedence:: How they work.

+Tuning LR
+
+* LR Table Construction:: Choose a different construction algorithm.
+* Default Reductions:: Disable default reductions.
+* LAC:: Correct lookahead sets in the parser states.
+* Unreachable States:: Keep unreachable parser states for debugging.
+
 Handling Context Dependencies

 * Semantic Tokens:: Token parsing can depend on the semantic context.
@@ -471,21 +479,19 @@ order to specify the language Algol 60. Any grammar expressed in
 BNF is a context-free grammar. The input to Bison is
 essentially machine-readable BNF.

-@cindex LALR(1) grammars
-@cindex IELR(1) grammars
-@cindex LR(1) grammars
-There are various important subclasses of context-free grammars.
-Although it can handle almost all context-free grammars, Bison is
-optimized for what are called LR(1) grammars.
-In brief, in these grammars, it must be possible to tell how to parse
-any portion of an input string with just a single token of lookahead.
-For historical reasons, Bison by default is limited by the additional
-restrictions of LALR(1), which is hard to explain simply.
-@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for
-more information on this.
-As an experimental feature, you can escape these additional restrictions by
-requesting IELR(1) or canonical LR(1) parser tables.
-@xref{%define Summary,,lr.type}, to learn how.
+@cindex LALR grammars
+@cindex IELR grammars
+@cindex LR grammars
+There are various important subclasses of context-free grammars. Although
+it can handle almost all context-free grammars, Bison is optimized for what
+are called LR(1) grammars. In brief, in these grammars, it must be possible
+to tell how to parse any portion of an input string with just a single token
+of lookahead. For historical reasons, Bison by default is limited by the
+additional restrictions of LALR(1), which is hard to explain simply.
+@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for more
+information on this. As an experimental feature, you can escape these
+additional restrictions by requesting IELR(1) or canonical LR(1) parser
+tables. @xref{LR Table Construction}, to learn how.

 @cindex GLR parsing
 @cindex generalized LR (GLR) parsing
@@ -5150,65 +5156,17 @@ More user feedback will help to stabilize it.)
 @c ================================================== lr.default-reductions

 @item lr.default-reductions
-@cindex default reductions
 @findex %define lr.default-reductions
-@cindex delayed syntax errors
-@cindex syntax errors delayed
-@cindex LAC
-@findex %nonassoc

 @itemize @bullet
 @item Language(s): all

 @item Purpose: Specify the kind of states that are permitted to
-contain default reductions.
-That is, in such a state, Bison selects the reduction with the largest
-lookahead set to be the default parser action and then removes that
-lookahead set.
-(The ability to specify where default reductions should be used is
-experimental.
-More user feedback will help to stabilize it.)
-
-@item Accepted Values:
-@itemize
-@item @code{all}.
-This is the traditional Bison behavior. The main advantage is a
-significant decrease in the size of the parser tables. The
-disadvantage is that, when the generated parser encounters a
-syntactically unacceptable token, the parser might then perform
-unnecessary default reductions before it can detect the syntax error.
-Such delayed syntax error detection is usually inherent in LALR and
-IELR parser tables anyway due to LR state merging (@pxref{%define
-Summary,,lr.type}). Furthermore, the use of @code{%nonassoc} can
-contribute to delayed syntax error detection even in the case of
-canonical LR. As an experimental feature, delayed syntax error
-detection can be overcome in all cases by enabling LAC (@pxref{%define
-Summary,,parse.lac}, for details, including a discussion of the
-effects of delayed syntax error detection).
-
-@item @code{consistent}.
-@cindex consistent states
-A consistent state is a state that has only one possible action.
-If that action is a reduction, then the parser does not need to request
-a lookahead token from the scanner before performing that action.
-However, the parser recognizes the ability to ignore the lookahead token
-in this way only when such a reduction is encoded as a default
-reduction.
-Thus, if default reductions are permitted only in consistent states,
-then a canonical LR parser that does not employ
-@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the
-syntactically unacceptable token from the scanner.
-
-@item @code{accepting}.
-@cindex accepting state
-In the accepting state, the default reduction is actually the accept
-action.
-In this case, a canonical LR parser that does not employ
-@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the
-syntactically unacceptable token in the input.
-That is, it does not perform any extra reductions.
-@end itemize
+contain default reductions. @xref{Default Reductions}. (The ability to
+specify where default reductions should be used is experimental. More user
+feedback will help to stabilize it.)

+@item Accepted Values: @code{all}, @code{consistent}, @code{accepting}
 @item Default Value:
 @itemize
 @item @code{accepting} if @code{lr.type} is @code{canonical-lr}.
@@ -5223,129 +5181,25 @@ That is, it does not perform any extra reductions.

 @itemize @bullet
 @item Language(s): all
-
 @item Purpose: Request that Bison allow unreachable parser states to
-remain in the parser tables.
-Bison considers a state to be unreachable if there exists no sequence of
-transitions from the start state to that state.
-A state can become unreachable during conflict resolution if Bison disables a
-shift action leading to it from a predecessor state.
-Keeping unreachable states is sometimes useful for analysis purposes, but they
-are useless in the generated parser.
-
+remain in the parser tables. @xref{Unreachable States}.
 @item Accepted Values: Boolean
-
 @item Default Value: @code{false}
-
-@item Caveats:
-
-@itemize @bullet
-
-@item Unreachable states may contain conflicts and may use rules not used in
-any other state.
-Thus, keeping unreachable states may induce warnings that are irrelevant to
-your parser's behavior, and it may eliminate warnings that are relevant.
-Of course, the change in warnings may actually be relevant to a parser table
-analysis that wants to keep unreachable states, so this behavior will likely
-remain in future Bison releases.
-
-@item While Bison is able to remove unreachable states, it is not guaranteed to
-remove other kinds of useless states.
-Specifically, when Bison disables reduce actions during conflict resolution,
-some goto actions may become useless, and thus some additional states may
-become useless.
-If Bison were to compute which goto actions were useless and then disable those
-actions, it could identify such states as unreachable and then remove those
-states.
-However, Bison does not compute which goto actions are useless.
-@end itemize
 @end itemize

 @c ================================================== lr.type

 @item lr.type
 @findex %define lr.type
-@cindex LALR
-@cindex IELR
-@cindex LR

 @itemize @bullet
 @item Language(s): all

 @item Purpose: Specify the type of parser tables within the
-LR(1) family.
-(This feature is experimental.
+LR(1) family. @xref{LR Table Construction}. (This feature is experimental.
 More user feedback will help to stabilize it.)

-@item Accepted Values:
-@itemize
-@item @code{lalr}.
-While Bison generates LALR parser tables by default for
-historical reasons, IELR or canonical LR is almost
-always preferable for deterministic parsers.
-The trouble is that LALR parser tables can suffer from
-mysterious conflicts and thus may not accept the full set of sentences
-that IELR and canonical LR accept.
-@xref{Mystery Conflicts}, for details.
-However, there are at least two scenarios where LALR may be
-worthwhile:
-@itemize
-@cindex GLR with LALR
-@item When employing GLR parsers (@pxref{GLR Parsers}), if you
-do not resolve any conflicts statically (for example, with @code{%left}
-or @code{%prec}), then the parser explores all potential parses of any
-given input.
-In this case, the use of LALR parser tables is guaranteed not
-to alter the language accepted by the parser.
-LALR parser tables are the smallest parser tables Bison can
-currently generate, so they may be preferable.
-Nevertheless, once you begin to resolve conflicts statically,
-GLR begins to behave more like a deterministic parser, and so
-IELR and canonical LR can be helpful to avoid
-LALR's mysterious behavior.
-
-@item Occasionally during development, an especially malformed grammar
-with a major recurring flaw may severely impede the IELR or
-canonical LR parser table generation algorithm.
-LALR can be a quick way to generate parser tables in order to
-investigate such problems while ignoring the more subtle differences
-from IELR and canonical LR.
-@end itemize
-
-@item @code{ielr}.
-IELR is a minimal LR algorithm.
-That is, given any grammar (LR or non-LR),
-IELR and canonical LR always accept exactly the same
-set of sentences.
-However, as for LALR, the number of parser states is often an
-order of magnitude less for IELR than for canonical
-LR.
-More importantly, because canonical LR's extra parser states
-may contain duplicate conflicts in the case of non-LR
-grammars, the number of conflicts for IELR is often an order
-of magnitude less as well.
-This can significantly reduce the complexity of developing of a grammar.
-
-@item @code{canonical-lr}.
-@cindex delayed syntax errors
-@cindex syntax errors delayed
-@cindex LAC
-@findex %nonassoc
-While inefficient, canonical LR parser tables can be an interesting
-means to explore a grammar because they have a property that IELR and
-LALR tables do not. That is, if @code{%nonassoc} is not used and
-default reductions are left disabled (@pxref{%define
-Summary,,lr.default-reductions}), then, for every left context of
-every canonical LR state, the set of tokens accepted by that state is
-guaranteed to be the exact set of tokens that is syntactically
-acceptable in that left context. It might then seem that an advantage
-of canonical LR parsers in production is that, under the above
-constraints, they are guaranteed to detect a syntax error as soon as
-possible without performing any unnecessary reductions. However, IELR
-parsers using LAC (@pxref{%define Summary,,parse.lac}) are also able
-to achieve this behavior without sacrificing @code{%nonassoc} or
-default reductions.
-@end itemize
+@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr}

 @item Default Value: @code{lalr}
 @end itemize
@@ -5405,84 +5259,13 @@ The parser namespace is @code{foo} and @code{yylex} is referenced as
 @c ================================================== parse.lac
 @item parse.lac
 @findex %define parse.lac
-@cindex LAC
-@cindex lookahead correction

 @itemize
-@item Languages(s): C
+@item Languages(s): C (deterministic parsers only)

 @item Purpose: Enable LAC (lookahead correction) to improve
-syntax error handling.
-
-Canonical LR, IELR, and LALR can suffer
-from a couple of problems upon encountering a syntax error. First, the
-parser might perform additional parser stack reductions before
-discovering the syntax error. Such reductions perform user semantic
-actions that are unexpected because they are based on an invalid token,
-and they cause error recovery to begin in a different syntactic context
-than the one in which the invalid token was encountered. Second, when
-verbose error messages are enabled (with @code{%error-verbose} or
-@code{#define YYERROR_VERBOSE}), the expected token list in the syntax
-error message can both contain invalid tokens and omit valid tokens.
-
-The culprits for the above problems are @code{%nonassoc}, default
-reductions in inconsistent states, and parser state merging. Thus,
-IELR and LALR suffer the most. Canonical
-LR can suffer only if @code{%nonassoc} is used or if default
-reductions are enabled for inconsistent states.
-
-LAC is a new mechanism within the parsing algorithm that
-completely solves these problems for canonical LR,
-IELR, and LALR without sacrificing @code{%nonassoc},
-default reductions, or state mering. Conceptually, the mechanism is
-straight-forward. Whenever the parser fetches a new token from the
-scanner so that it can determine the next parser action, it immediately
-suspends normal parsing and performs an exploratory parse using a
-temporary copy of the normal parser state stack. During this
-exploratory parse, the parser does not perform user semantic actions.
-If the exploratory parse reaches a shift action, normal parsing then
-resumes on the normal parser stacks. If the exploratory parse reaches
-an error instead, the parser reports a syntax error. If verbose syntax
-error messages are enabled, the parser must then discover the list of
-expected tokens, so it performs a separate exploratory parse for each
-token in the grammar.
-
-There is one subtlety about the use of LAC. That is, when in a
-consistent parser state with a default reduction, the parser will not
-attempt to fetch a token from the scanner because no lookahead is
-needed to determine the next parser action. Thus, whether default
-reductions are enabled in consistent states (@pxref{%define
-Summary,,lr.default-reductions}) affects how soon the parser detects a
-syntax error: when it @emph{reaches} an erroneous token or when it
-eventually @emph{needs} that token as a lookahead. The latter
-behavior is probably more intuitive, so Bison currently provides no
-way to achieve the former behavior while default reductions are fully
-enabled.
-
-Thus, when LAC is in use, for some fixed decision of whether
-to enable default reductions in consistent states, canonical
-LR and IELR behave exactly the same for both
-syntactically acceptable and syntactically unacceptable input. While
-LALR still does not support the full language-recognition
-power of canonical LR and IELR, LAC at
-least enables LALR's syntax error handling to correctly
-reflect LALR's language-recognition power.
-
-Because LAC requires many parse actions to be performed twice,
-it can have a performance penalty. However, not all parse actions must
-be performed twice. Specifically, during a series of default reductions
-in consistent states and shift actions, the parser never has to initiate
-an exploratory parse. Moreover, the most time-consuming tasks in a
-parse are often the file I/O, the lexical analysis performed by the
-scanner, and the user's semantic actions, but none of these are
-performed during the exploratory parse. Finally, the base of the
-temporary stack used during an exploratory parse is a pointer into the
-normal parser state stack so that the stack is never physically copied.
-In our experience, the performance penalty of LAC has proven
-insignificant for practical grammars.
-
+syntax error handling. @xref{LAC}.
 @item Accepted Values: @code{none}, @code{full}
-
 @item Default Value: @code{none}
 @end itemize
 @end itemize
@@ -6075,10 +5858,11 @@ receives one argument. For a syntax error, the string is normally
 @w{@code{"syntax error"}}.

 @findex %error-verbose
-If you invoke the directive @code{%error-verbose} in the Bison
-declarations section (@pxref{Bison Declarations, ,The Bison Declarations
-Section}), then Bison provides a more verbose and specific error message
-string instead of just plain @w{@code{"syntax error"}}.
+If you invoke the directive @code{%error-verbose} in the Bison declarations
+section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then
+Bison provides a more verbose and specific error message string instead of
+just plain @w{@code{"syntax error"}}. However, that message sometimes
+contains incorrect information if LAC is not enabled (@pxref{LAC}).

 The parser can detect one other kind of error: memory exhaustion. This
 can happen when the input contains constructions that are very deeply
@@ -6479,6 +6263,7 @@ This kind of parser is known in the literature as a bottom-up parser.
 * Parser States:: The parser is a finite-state-machine with stack.
 * Reduce/Reduce:: When two rules are applicable in the same situation.
 * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
 * Generalized LR Parsing:: Parsing arbitrary context-free grammars.
 * Memory Management:: What happens when memory is exhausted. How to avoid it.
 @end menu
@@ -6996,6 +6781,7 @@ redirects:redirect
|
|||||||
|
|
||||||
@node Mystery Conflicts
|
@node Mystery Conflicts
|
||||||
@section Mysterious Reduce/Reduce Conflicts
|
@section Mysterious Reduce/Reduce Conflicts
|
||||||
|
@cindex Mysterious Conflicts
|
||||||
|
|
||||||
Sometimes reduce/reduce conflicts can occur that don't look warranted.
|
Sometimes reduce/reduce conflicts can occur that don't look warranted.
|
||||||
Here is an example:
|
Here is an example:
|
||||||
@@ -7037,8 +6823,8 @@ of lookahead: when a @code{param_spec} is being read, an @code{ID} is
|
|||||||
a @code{name} if a comma or colon follows, or a @code{type} if another
|
a @code{name} if a comma or colon follows, or a @code{type} if another
|
||||||
@code{ID} follows. In other words, this grammar is LR(1).
|
@code{ID} follows. In other words, this grammar is LR(1).
|
||||||
|
|
||||||
@cindex LR(1)
|
@cindex LR
|
||||||
@cindex LALR(1)
|
@cindex LALR
|
||||||
However, for historical reasons, Bison cannot by default handle all
|
However, for historical reasons, Bison cannot by default handle all
|
||||||
LR(1) grammars.
|
LR(1) grammars.
|
||||||
In this grammar, two contexts, that after an @code{ID} at the beginning
|
In this grammar, two contexts, that after an @code{ID} at the beginning
|
||||||
@@ -7053,15 +6839,16 @@ contexts, so it makes a single parser state for them both. Combining
|
|||||||
the two contexts causes a conflict later. In parser terminology, this
|
the two contexts causes a conflict later. In parser terminology, this
|
||||||
occurrence means that the grammar is not LALR(1).
|
occurrence means that the grammar is not LALR(1).
|
||||||
|
|
||||||
For many practical grammars (specifically those that fall into the
|
@cindex IELR
|
||||||
non-LR(1) class), the limitations of LALR(1) result in difficulties
|
@cindex canonical LR
|
||||||
beyond just mysterious reduce/reduce conflicts. The best way to fix
|
For many practical grammars (specifically those that fall into the non-LR(1)
|
||||||
all these problems is to select a different parser table generation
|
class), the limitations of LALR(1) result in difficulties beyond just
|
||||||
algorithm. Either IELR(1) or canonical LR(1) would suffice, but the
|
mysterious reduce/reduce conflicts. The best way to fix all these problems
|
||||||
former is more efficient and easier to debug during development.
|
is to select a different parser table construction algorithm. Either
|
||||||
@xref{%define Summary,,lr.type}, for details. (Bison's IELR(1) and
|
IELR(1) or canonical LR(1) would suffice, but the former is more efficient
|
||||||
canonical LR(1) implementations are experimental. More user feedback
|
and easier to debug during development. @xref{LR Table Construction}, for
|
||||||
will help to stabilize them.)
|
details. (Bison's IELR(1) and canonical LR(1) implementations are
|
||||||
|
experimental. More user feedback will help to stabilize them.)
|
||||||
|
|
||||||
If you instead wish to work around LALR(1)'s limitations, you
|
If you instead wish to work around LALR(1)'s limitations, you
|
||||||
can often fix a mysterious conflict by identifying the two parser states
|
can often fix a mysterious conflict by identifying the two parser states
|
||||||
@@ -7112,6 +6899,409 @@ return_spec:
|
|||||||
For a more detailed exposition of LALR(1) parsers and parser
|
For a more detailed exposition of LALR(1) parsers and parser
|
||||||
generators, @pxref{Bibliography,,DeRemer 1982}.
|
generators, @pxref{Bibliography,,DeRemer 1982}.
|
||||||
|
|
||||||
|
@node Tuning LR
|
||||||
|
@section Tuning LR
|
||||||
|
|
||||||
|
The default behavior of Bison's LR-based parsers is chosen mostly for
|
||||||
|
historical reasons, but that behavior is often not robust. For example, in
|
||||||
|
the previous section, we discussed the mysterious conflicts that can be
|
||||||
|
produced by LALR(1), Bison's default parser table construction algorithm.
|
||||||
|
Another example is Bison's @code{%error-verbose} directive, which instructs
|
||||||
|
the generated parser to produce verbose syntax error messages. Those
|
||||||
|
messages can sometimes contain incorrect information.
|
||||||
|
|
||||||
|
In this section, we explore several modern features of Bison that allow you
|
||||||
|
to tune fundamental aspects of the generated LR-based parsers. Some of
|
||||||
|
these features easily eliminate shortcomings like those mentioned above.
|
||||||
|
Others can be helpful purely for understanding your parser.
|
||||||
|
|
||||||
|
Most of the features discussed in this section are still experimental. More
|
||||||
|
user feedback will help to stabilize them.
|
||||||
|
|
||||||
|
@menu
|
||||||
|
* LR Table Construction:: Choose a different construction algorithm.
|
||||||
|
* Default Reductions:: Disable default reductions.
|
||||||
|
* LAC:: Correct lookahead sets in the parser states.
|
||||||
|
* Unreachable States:: Keep unreachable parser states for debugging.
|
||||||
|
@end menu
|
||||||
|
|
||||||
|
@node LR Table Construction
|
||||||
|
@subsection LR Table Construction
|
||||||
|
@cindex Mysterious Conflict
|
||||||
|
@cindex LALR
|
||||||
|
@cindex IELR
|
||||||
|
@cindex canonical LR
|
||||||
|
@findex %define lr.type
|
||||||
|
|
||||||
|
For historical reasons, Bison constructs LALR(1) parser tables by default.
|
||||||
|
However, LALR does not possess the full language-recognition power of LR.
|
||||||
|
As a result, the behavior of parsers employing LALR parser tables is often
|
||||||
|
mysterious. We presented a simple example of this effect in @ref{Mystery
|
||||||
|
Conflicts}.
|
||||||
|
|
||||||
|
As we also demonstrated in that example, the traditional approach to
|
||||||
|
eliminating such mysterious behavior is to restructure the grammar.
|
||||||
|
Unfortunately, doing so correctly is often difficult. Moreover, merely
|
||||||
|
discovering that LALR causes mysterious behavior in your parser can be
|
||||||
|
difficult as well.
|
||||||
|
|
||||||
|
Fortunately, Bison provides an easy way to eliminate the possibility of such
|
||||||
|
mysterious behavior altogether. You simply need to activate a more powerful
|
||||||
|
parser table construction algorithm by using the @code{%define lr.type}
|
||||||
|
directive.
|
||||||
|
|
||||||
|
@deffn {Directive} {%define lr.type @var{TYPE}}
|
||||||
|
Specify the type of parser tables within the LR(1) family. The accepted
|
||||||
|
values for @var{TYPE} are:
|
||||||
|
|
||||||
|
@itemize
|
||||||
|
@item @code{lalr} (default)
|
||||||
|
@item @code{ielr}
|
||||||
|
@item @code{canonical-lr}
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
(This feature is experimental. More user feedback will help to stabilize
|
||||||
|
it.)
|
||||||
|
@end deffn
|
||||||
|
|
||||||
|
For example, to activate IELR, you might add the following directive to your
|
||||||
|
grammar file:
|
||||||
|
|
||||||
|
@example
|
||||||
|
%define lr.type ielr
|
||||||
|
@end example
|
||||||
|
|
||||||
|
@noindent For the example in @ref{Mystery Conflicts}, the mysterious
|
||||||
|
conflict is then eliminated, so there is no need to invest time in
|
||||||
|
comprehending the conflict or restructuring the grammar to fix it. If,
|
||||||
|
during future development, the grammar evolves such that all mysterious
|
||||||
|
behavior would have disappeared using just LALR, you need not fear that
|
||||||
|
continuing to use IELR will result in unnecessarily large parser tables.
|
||||||
|
That is, IELR generates LALR tables when LALR (using a deterministic parsing
|
||||||
|
algorithm) is sufficient to support the full language-recognition power of
|
||||||
|
LR. Thus, by enabling IELR at the start of grammar development, you can
|
||||||
|
safely and completely eliminate the need to consider LALR's shortcomings.
|
||||||
|
|
||||||
|
While IELR is almost always preferable, there are circumstances where LALR
|
||||||
|
or the canonical LR parser tables described by Knuth
|
||||||
|
(@pxref{Bibliography,,Knuth 1965}) can be useful. Here we summarize the
|
||||||
|
relative advantages of each parser table construction algorithm within
|
||||||
|
Bison:
|
||||||
|
|
||||||
|
@itemize
|
||||||
|
@item LALR
|
||||||
|
|
||||||
|
There are at least two scenarios where LALR can be worthwhile:
|
||||||
|
|
||||||
|
@itemize
|
||||||
|
@item GLR without static conflict resolution.
|
||||||
|
|
||||||
|
@cindex GLR with LALR
|
||||||
|
When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
|
||||||
|
conflicts statically (for example, with @code{%left} or @code{%prec}), then
|
||||||
|
the parser explores all potential parses of any given input. In this case,
|
||||||
|
the choice of parser table construction algorithm is guaranteed not to alter
|
||||||
|
the language accepted by the parser. LALR parser tables are the smallest
|
||||||
|
parser tables Bison can currently construct, so they may then be preferable.
|
||||||
|
Nevertheless, once you begin to resolve conflicts statically, GLR behaves
|
||||||
|
more like a deterministic parser in the syntactic contexts where those
|
||||||
|
conflicts appear, and so either IELR or canonical LR can then be helpful to
|
||||||
|
avoid LALR's mysterious behavior.
|
||||||
|
|
||||||
|
@item Malformed grammars.
|
||||||
|
|
||||||
|
Occasionally during development, an especially malformed grammar with a
|
||||||
|
major recurring flaw may severely impede the IELR or canonical LR parser
|
||||||
|
table construction algorithm. LALR can be a quick way to construct parser
|
||||||
|
tables in order to investigate such problems while ignoring the more subtle
|
||||||
|
differences from IELR and canonical LR.
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
@item IELR
|
||||||
|
|
||||||
|
IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given
|
||||||
|
any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables
|
||||||
|
always accept exactly the same set of sentences. However, like LALR, IELR
|
||||||
|
merges parser states during parser table construction so that the number of
|
||||||
|
parser states is often an order of magnitude less than for canonical LR.
|
||||||
|
More importantly, because canonical LR's extra parser states may contain
|
||||||
|
duplicate conflicts in the case of non-LR grammars, the number of conflicts
|
||||||
|
for IELR is often an order of magnitude less as well. This effect can
|
||||||
|
significantly reduce the complexity of developing a grammar.
|
||||||
|
|
||||||
|
@item Canonical LR
|
||||||
|
|
||||||
|
@cindex delayed syntax error detection
|
||||||
|
@cindex LAC
|
||||||
|
@findex %nonassoc
|
||||||
|
While inefficient, canonical LR parser tables can be an interesting means to
|
||||||
|
explore a grammar because they possess a property that IELR and LALR tables
|
||||||
|
do not. That is, if @code{%nonassoc} is not used and default reductions are
|
||||||
|
left disabled (@pxref{Default Reductions}), then, for every left context of
|
||||||
|
every canonical LR state, the set of tokens accepted by that state is
|
||||||
|
guaranteed to be the exact set of tokens that is syntactically acceptable in
|
||||||
|
that left context. It might then seem that an advantage of canonical LR
|
||||||
|
parsers in production is that, under the above constraints, they are
|
||||||
|
guaranteed to detect a syntax error as soon as possible without performing
|
||||||
|
any unnecessary reductions. However, IELR parsers that use LAC are also
|
||||||
|
able to achieve this behavior without sacrificing @code{%nonassoc} or
|
||||||
|
default reductions. For details and a few caveats of LAC, @pxref{LAC}.
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
For a more detailed exposition of the mysterious behavior in LALR parsers
|
||||||
|
and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and
|
||||||
|
@ref{Bibliography,,Denny 2010 November}.
|
||||||
|
|
||||||
|
@node Default Reductions
|
||||||
|
@subsection Default Reductions
|
||||||
|
@cindex default reductions
|
||||||
|
@findex %define lr.default-reductions
|
||||||
|
@findex %nonassoc
|
||||||
|
|
||||||
|
After parser table construction, Bison identifies the reduction with the
|
||||||
|
largest lookahead set in each parser state. To reduce the size of the
|
||||||
|
parser state, traditional Bison behavior is to remove that lookahead set and
|
||||||
|
to assign that reduction to be the default parser action. Such a reduction
|
||||||
|
is known as a @dfn{default reduction}.
|
||||||
|
|
||||||
|
Default reductions affect more than the size of the parser tables. They
|
||||||
|
also affect the behavior of the parser:
|
||||||
|
|
||||||
|
@itemize
|
||||||
|
@item Delayed @code{yylex} invocations.
|
||||||
|
|
||||||
|
@cindex delayed yylex invocations
|
||||||
|
@cindex consistent states
|
||||||
|
@cindex defaulted states
|
||||||
|
A @dfn{consistent state} is a state that has only one possible parser
|
||||||
|
action. If that action is a reduction and is encoded as a default
|
||||||
|
reduction, then that consistent state is called a @dfn{defaulted state}.
|
||||||
|
Upon reaching a defaulted state, a Bison-generated parser does not bother to
|
||||||
|
invoke @code{yylex} to fetch the next token before performing the reduction.
|
||||||
|
In other words, whether default reductions are enabled in consistent states
|
||||||
|
determines how soon a Bison-generated parser invokes @code{yylex} for a
|
||||||
|
token: immediately when it @emph{reaches} that token in the input or when it
|
||||||
|
eventually @emph{needs} that token as a lookahead to determine the next
|
||||||
|
parser action. Traditionally, default reductions are enabled, and so the
|
||||||
|
parser exhibits the latter behavior.
|
||||||
|
|
||||||
|
The presence of defaulted states is an important consideration when
|
||||||
|
designing @code{yylex} and the grammar file. That is, if the behavior of
|
||||||
|
@code{yylex} can influence or be influenced by the semantic actions
|
||||||
|
associated with the reductions in defaulted states, then the delay of the
|
||||||
|
next @code{yylex} invocation until after those reductions is significant.
|
||||||
|
For example, the semantic actions might pop a scope stack that @code{yylex}
|
||||||
|
uses to determine what token to return. Thus, the delay might be necessary
|
||||||
|
to ensure that @code{yylex} does not look up the next token in a scope that
|
||||||
|
should already be considered closed.
|
||||||
|
|
||||||
|
@item Delayed syntax error detection.
|
||||||
|
|
||||||
|
@cindex delayed syntax error detection
|
||||||
|
When the parser fetches a new token by invoking @code{yylex}, it checks
|
||||||
|
whether there is an action for that token in the current parser state. The
|
||||||
|
parser detects a syntax error if and only if either (1) there is no action
|
||||||
|
for that token or (2) the action for that token is the error action (due to
|
||||||
|
the use of @code{%nonassoc}). However, if there is a default reduction in
|
||||||
|
that state (which might or might not be a defaulted state), then it is
|
||||||
|
impossible for condition 1 to exist. That is, all tokens have an action.
|
||||||
|
Thus, the parser sometimes fails to detect the syntax error until it reaches
|
||||||
|
a later state.
|
||||||
|
|
||||||
|
@cindex LAC
|
||||||
|
@c If there's an infinite loop, default reductions can prevent an incorrect
|
||||||
|
@c sentence from being rejected.
|
||||||
|
While default reductions never cause the parser to accept syntactically
|
||||||
|
incorrect sentences, the delay of syntax error detection can have unexpected
|
||||||
|
effects on the behavior of the parser. However, the delay can be caused
|
||||||
|
anyway by parser state merging and the use of @code{%nonassoc}, and it can
|
||||||
|
be fixed by another Bison feature, LAC. We discuss the effects of delayed
|
||||||
|
syntax error detection and LAC more in the next section (@pxref{LAC}).
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
For canonical LR, the only default reduction that Bison enables by default
|
||||||
|
is the accept action, which appears only in the accepting state, which has
|
||||||
|
no other action and is thus a defaulted state. However, the default accept
|
||||||
|
action does not delay any @code{yylex} invocation or syntax error detection
|
||||||
|
because the accept action ends the parse.
|
||||||
|
|
||||||
|
For LALR and IELR, Bison enables default reductions in nearly all states by
|
||||||
|
default. There are only two exceptions. First, states that have a shift
|
||||||
|
action on the @code{error} token do not have default reductions because
|
||||||
|
delayed syntax error detection could then prevent the @code{error} token
|
||||||
|
from ever being shifted in that state. However, parser state merging can
|
||||||
|
cause the same effect anyway, and LAC fixes it in both cases, so future
|
||||||
|
versions of Bison might drop this exception when LAC is activated. Second,
|
||||||
|
GLR parsers do not record the default reduction as the action on a lookahead
|
||||||
|
token for which there is a conflict. The correct action in this case is to
|
||||||
|
split the parse instead.
|
||||||
|
|
||||||
|
To adjust which states have default reductions enabled, use the
|
||||||
|
@code{%define lr.default-reductions} directive.
|
||||||
|
|
||||||
|
@deffn {Directive} {%define lr.default-reductions @var{WHERE}}
|
||||||
|
Specify the kind of states that are permitted to contain default reductions.
|
||||||
|
The accepted values of @var{WHERE} are:
|
||||||
|
@itemize
|
||||||
|
@item @code{all} (default for LALR and IELR)
|
||||||
|
@item @code{consistent}
|
||||||
|
@item @code{accepting} (default for canonical LR)
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
(The ability to specify where default reductions are permitted is
|
||||||
|
experimental. More user feedback will help to stabilize it.)
|
||||||
|
@end deffn
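
As a minimal sketch of the directive above, assume a grammar whose scanner
interacts with semantic actions and therefore still relies on the delayed
@code{yylex} invocations of defaulted states. You might then restrict
default reductions to consistent states:

@example
%define lr.default-reductions consistent
@end example

@noindent With this setting, default reductions in inconsistent states
should no longer be a source of delayed syntax error detection. Parser
state merging and @code{%nonassoc} can still cause that delay
(@pxref{LAC}).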
|
||||||
|
|
||||||
|
FIXME: Because of the exceptions described above, @code{all} is a misnomer.
|
||||||
|
Rename to @code{full}.
|
||||||
|
|
||||||
|
@node LAC
|
||||||
|
@subsection LAC
|
||||||
|
@findex %define parse.lac
|
||||||
|
@cindex LAC
|
||||||
|
@cindex lookahead correction
|
||||||
|
|
||||||
|
Canonical LR, IELR, and LALR can suffer from a couple of problems upon
|
||||||
|
encountering a syntax error. First, the parser might perform additional
|
||||||
|
parser stack reductions before discovering the syntax error. Such
|
||||||
|
reductions can perform user semantic actions that are unexpected because
|
||||||
|
they are based on an invalid token, and they cause error recovery to begin
|
||||||
|
in a different syntactic context than the one in which the invalid token was
|
||||||
|
encountered. Second, when verbose error messages are enabled (@pxref{Error
|
||||||
|
Reporting}), the expected token list in the syntax error message can both
|
||||||
|
contain invalid tokens and omit valid tokens.
|
||||||
|
|
||||||
|
The culprits for the above problems are @code{%nonassoc}, default reductions
|
||||||
|
in inconsistent states (@pxref{Default Reductions}), and parser state
|
||||||
|
merging. Because IELR and LALR merge parser states, they suffer the most.
|
||||||
|
Canonical LR can suffer only if @code{%nonassoc} is used or if default
|
||||||
|
reductions are enabled for inconsistent states.
|
||||||
|
|
||||||
|
LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
|
||||||
|
that solves these problems for canonical LR, IELR, and LALR without
|
||||||
|
sacrificing @code{%nonassoc}, default reductions, or state merging. You can
|
||||||
|
enable LAC with the @code{%define parse.lac} directive.
|
||||||
|
|
||||||
|
@deffn {Directive} {%define parse.lac @var{VALUE}}
|
||||||
|
Enable LAC to improve syntax error handling.
|
||||||
|
@itemize
|
||||||
|
@item @code{none} (default)
|
||||||
|
@item @code{full}
|
||||||
|
@end itemize
|
||||||
|
(This feature is experimental. More user feedback will help to stabilize
|
||||||
|
it. Moreover, it is currently only available for deterministic parsers in
|
||||||
|
C.)
|
||||||
|
@end deffn
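
For example, assuming a deterministic parser in C that already requests
verbose error messages, you might enable LAC by adding the following
directives to your grammar file. This is only a sketch, not a
recommendation for every grammar:

@example
%define parse.lac full
%error-verbose
@end example

@noindent Pairing the two directives matters because, without LAC, the
expected token list in a verbose syntax error message can both contain
invalid tokens and omit valid tokens.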
|
||||||
|
|
||||||
|
Conceptually, the LAC mechanism is straightforward. Whenever the parser
|
||||||
|
fetches a new token from the scanner so that it can determine the next
|
||||||
|
parser action, it immediately suspends normal parsing and performs an
|
||||||
|
exploratory parse using a temporary copy of the normal parser state stack.
|
||||||
|
During this exploratory parse, the parser does not perform user semantic
|
||||||
|
actions. If the exploratory parse reaches a shift action, normal parsing
|
||||||
|
then resumes on the normal parser stacks. If the exploratory parse reaches
|
||||||
|
an error instead, the parser reports a syntax error. If verbose syntax
|
||||||
|
error messages are enabled, the parser must then discover the list of
|
||||||
|
expected tokens, so it performs a separate exploratory parse for each token
|
||||||
|
in the grammar.
|
||||||
|
|
||||||
|
There is one subtlety about the use of LAC. That is, when in a consistent
|
||||||
|
parser state with a default reduction, the parser will not attempt to fetch
|
||||||
|
a token from the scanner because no lookahead is needed to determine the
|
||||||
|
next parser action. Thus, whether default reductions are enabled in
|
||||||
|
consistent states (@pxref{Default Reductions}) affects how soon the parser
|
||||||
|
detects a syntax error: immediately when it @emph{reaches} an erroneous
|
||||||
|
token or when it eventually @emph{needs} that token as a lookahead to
|
||||||
|
determine the next parser action. The latter behavior is probably more
|
||||||
|
intuitive, so Bison currently provides no way to achieve the former behavior
|
||||||
|
while default reductions are enabled in consistent states.
|
||||||
|
|
||||||
|
Thus, when LAC is in use, for some fixed decision of whether to enable
|
||||||
|
default reductions in consistent states, canonical LR and IELR behave almost
|
||||||
|
exactly the same for both syntactically acceptable and syntactically
|
||||||
|
unacceptable input. While LALR still does not support the full
|
||||||
|
language-recognition power of canonical LR and IELR, LAC at least enables
|
||||||
|
LALR's syntax error handling to correctly reflect LALR's
|
||||||
|
language-recognition power.
|
||||||
|
|
||||||
|
There are a few caveats to consider when using LAC:
|
||||||
|
|
||||||
|
@itemize
|
||||||
|
@item Infinite parsing loops.
|
||||||
|
|
||||||
|
IELR plus LAC does have one shortcoming relative to canonical LR. Some
|
||||||
|
parsers generated by Bison can loop infinitely. LAC does not fix infinite
|
||||||
|
parsing loops that occur between encountering a syntax error and detecting
|
||||||
|
it, but enabling canonical LR or disabling default reductions sometimes
|
||||||
|
does.
|
||||||
|
|
||||||
|
@item Verbose error message limitations.
|
||||||
|
|
||||||
|
Because of internationalization considerations, Bison-generated parsers
|
||||||
|
limit the size of the expected token list they are willing to report in a
|
||||||
|
verbose syntax error message. If the number of expected tokens exceeds that
|
||||||
|
limit, the list is simply dropped from the message. Enabling LAC can
|
||||||
|
increase the size of the list and thus cause the parser to drop it. Of
|
||||||
|
course, dropping the list is better than reporting an incorrect list.
|
||||||
|
|
||||||
|
@item Performance.
|
||||||
|
|
||||||
|
Because LAC requires many parse actions to be performed twice, it can have a
|
||||||
|
performance penalty. However, not all parse actions must be performed
|
||||||
|
twice. Specifically, during a series of default reductions in consistent
|
||||||
|
states and shift actions, the parser never has to initiate an exploratory
|
||||||
|
parse. Moreover, the most time-consuming tasks in a parse are often the
|
||||||
|
file I/O, the lexical analysis performed by the scanner, and the user's
|
||||||
|
semantic actions, but none of these are performed during the exploratory
|
||||||
|
parse. Finally, the base of the temporary stack used during an exploratory
|
||||||
|
parse is a pointer into the normal parser state stack so that the stack is
|
||||||
|
never physically copied. In our experience, the performance penalty of LAC
|
||||||
|
has proven insignificant for practical grammars.
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
@node Unreachable States
|
||||||
|
@subsection Unreachable States
|
||||||
|
@findex %define lr.keep-unreachable-states
|
||||||
|
@cindex unreachable states
|
||||||
|
|
||||||
|
If there exists no sequence of transitions from the parser's start state to
|
||||||
|
some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable
|
||||||
|
state}. A state can become unreachable during conflict resolution if Bison
|
||||||
|
disables a shift action leading to it from a predecessor state.
|
||||||
|
|
||||||
|
By default, Bison removes unreachable states from the parser after conflict
|
||||||
|
resolution because they are useless in the generated parser. However,
|
||||||
|
keeping unreachable states is sometimes useful when trying to understand the
|
||||||
|
relationship between the parser and the grammar.
|
||||||
|
|
||||||
|
@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}}
|
||||||
|
Request that Bison allow unreachable states to remain in the parser tables.
|
||||||
|
@var{VALUE} must be a Boolean. The default is @code{false}.
|
||||||
|
@end deffn
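
For example, to keep unreachable states while studying the parser tables,
you might add the following directive to your grammar file. This sketch
assumes a Bison version that accepts an unquoted Boolean value in
@code{%define}:

@example
%define lr.keep-unreachable-states true
@end example

@noindent Because unreachable states are useless in the generated parser,
you would probably remove the directive again once the analysis is done.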
|
||||||
|
|
||||||
|
There are a few caveats to consider:
|
||||||
|
|
||||||
|
@itemize @bullet
|
||||||
|
@item Missing or extraneous warnings.
|
||||||
|
|
||||||
|
Unreachable states may contain conflicts and may use rules not used in any
|
||||||
|
other state. Thus, keeping unreachable states may induce warnings that are
|
||||||
|
irrelevant to your parser's behavior, and it may eliminate warnings that are
|
||||||
|
relevant. Of course, the change in warnings may actually be relevant to a
|
||||||
|
parser table analysis that wants to keep unreachable states, so this
|
||||||
|
behavior will likely remain in future Bison releases.
|
||||||
|
|
||||||
|
@item Other useless states.
|
||||||
|
|
||||||
|
While Bison is able to remove unreachable states, it is not guaranteed to
|
||||||
|
remove other kinds of useless states. Specifically, when Bison disables
|
||||||
|
reduce actions during conflict resolution, some goto actions may become
|
||||||
|
useless, and thus some additional states may become useless. If Bison were
|
||||||
|
to compute which goto actions were useless and then disable those actions,
|
||||||
|
it could identify such states as unreachable and then remove those states.
|
||||||
|
However, Bison does not compute which goto actions are useless.
|
||||||
|
@end itemize
|
||||||
|
|
||||||
@node Generalized LR Parsing
|
@node Generalized LR Parsing
|
||||||
@section Generalized LR (GLR) Parsing
|
@section Generalized LR (GLR) Parsing
|
||||||
@cindex GLR parsing
|
@cindex GLR parsing
|
||||||
@@ -8934,8 +9124,9 @@ automatically propagated.
|
|||||||
@end example
|
@end example
|
||||||
|
|
||||||
@noindent
|
@noindent
|
||||||
Use the two following directives to enable parser tracing and verbose
|
Use the two following directives to enable parser tracing and verbose error
|
||||||
error messages.
|
messages. However, verbose error messages can contain incorrect information
|
||||||
|
(@pxref{LAC}).
|
||||||
|
|
||||||
@comment file: calc++-parser.yy
|
@comment file: calc++-parser.yy
|
||||||
@example
|
@example
|
||||||
@@ -10267,9 +10458,9 @@ Precedence}.
|
|||||||
@end deffn
|
@end deffn
|
||||||
@end ifset
|
@end ifset
|
||||||
|
|
||||||
@deffn {Directive} %define @var{define-variable}
|
@deffn {Directive} %define @var{variable}
|
||||||
@deffnx {Directive} %define @var{define-variable} @var{value}
|
@deffnx {Directive} %define @var{variable} @var{value}
|
||||||
@deffnx {Directive} %define @var{define-variable} "@var{value}"
|
@deffnx {Directive} %define @var{variable} "@var{value}"
|
||||||
Define a variable to adjust Bison's behavior. @xref{%define Summary}.
|
Define a variable to adjust Bison's behavior. @xref{%define Summary}.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@@ -10312,7 +10503,7 @@ token is reset to the token that originally caused the violation.
|
|||||||
|
|
||||||
@deffn {Directive} %error-verbose
|
@deffn {Directive} %error-verbose
|
||||||
Bison declaration to request verbose, specific error message strings
|
Bison declaration to request verbose, specific error message strings
|
||||||
when @code{yyerror} is called.
|
when @code{yyerror} is called. @xref{Error Reporting}.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn {Directive} %file-prefix "@var{prefix}"
|
@deffn {Directive} %file-prefix "@var{prefix}"
|
||||||
@@ -10515,7 +10706,7 @@ An obsolete macro that you define with @code{#define} in the prologue
|
|||||||
to request verbose, specific error message strings
|
to request verbose, specific error message strings
|
||||||
when @code{yyerror} is called. It doesn't matter what definition you
|
when @code{yyerror} is called. It doesn't matter what definition you
|
||||||
use for @code{YYERROR_VERBOSE}, just whether you define it. Using
|
use for @code{YYERROR_VERBOSE}, just whether you define it. Using
|
||||||
@code{%error-verbose} is preferred.
|
@code{%error-verbose} is preferred. @xref{Error Reporting}.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn {Macro} YYINITDEPTH
|
@deffn {Macro} YYINITDEPTH
|
||||||
@@ -10655,7 +10846,7 @@ Data type of semantic values; @code{int} by default.
|
|||||||
@cindex glossary
|
@cindex glossary
|
||||||
|
|
||||||
@table @asis
|
@table @asis
|
||||||
@item Accepting State
|
@item Accepting state
|
||||||
A state whose only action is the accept action.
|
A state whose only action is the accept action.
|
||||||
The accepting state is thus a consistent state.
|
The accepting state is thus a consistent state.
|
||||||
@xref{Understanding,,}.
|
@xref{Understanding,,}.
|
||||||
@@ -10666,9 +10857,8 @@ by John Backus, and slightly improved by Peter Naur in his 1960-01-02
|
|||||||
committee document contributing to what became the Algol 60 report.
|
committee document contributing to what became the Algol 60 report.
|
||||||
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
||||||
|
|
||||||
@item Consistent State
|
@item Consistent state
|
||||||
A state containing only one possible action. @xref{%define
|
A state containing only one possible action. @xref{Default Reductions}.
|
||||||
Summary,,lr.default-reductions}.
|
|
||||||
|
|
||||||
@item Context-free grammars
|
@item Context-free grammars
|
||||||
Grammars specified as rules that can be applied regardless of context.
|
Grammars specified as rules that can be applied regardless of context.
|
||||||
@@ -10677,12 +10867,15 @@ expression, integers are allowed @emph{anywhere} an expression is
|
|||||||
permitted. @xref{Language and Grammar, ,Languages and Context-Free
|
permitted. @xref{Language and Grammar, ,Languages and Context-Free
|
||||||
Grammars}.
|
Grammars}.
|
||||||
|
|
||||||
@item Default Reduction
|
@item Default reduction
|
||||||
The reduction that a parser should perform if the current parser state
|
The reduction that a parser should perform if the current parser state
|
||||||
contains no other action for the lookahead token. In permitted parser
|
contains no other action for the lookahead token. In permitted parser
|
||||||
states, Bison declares the reduction with the largest lookahead set to
|
states, Bison declares the reduction with the largest lookahead set to be
|
||||||
be the default reduction and removes that lookahead set.
|
the default reduction and removes that lookahead set. @xref{Default
|
||||||
@xref{%define Summary,,lr.default-reductions}.
|
Reductions}.
|
||||||
|
|
||||||
|
@item Defaulted state
|
||||||
|
A consistent state with a default reduction. @xref{Default Reductions}.
|
||||||
|
|
||||||
@item Dynamic allocation
|
@item Dynamic allocation
|
||||||
Allocation of memory that occurs during execution, rather than at
|
Allocation of memory that occurs during execution, rather than at
|
||||||
@@ -10714,17 +10907,16 @@ A language construct that is (in general) grammatically divisible;
|
|||||||
for example, `expression' or `declaration' in C@.
|
for example, `expression' or `declaration' in C@.
|
||||||
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
||||||
|
|
||||||
@item IELR(1)
|
@item IELR(1) (Inadequacy Elimination LR(1))
|
||||||
A minimal LR(1) parser table generation algorithm. That is, given any
|
A minimal LR(1) parser table construction algorithm. That is, given any
|
||||||
context-free grammar, IELR(1) generates parser tables with the full
|
context-free grammar, IELR(1) generates parser tables with the full
|
||||||
language recognition power of canonical LR(1) but with nearly the same
|
language-recognition power of canonical LR(1) but with nearly the same
|
||||||
number of parser states as LALR(1). This reduction in parser states
|
number of parser states as LALR(1). This reduction in parser states is
|
||||||
is often an order of magnitude. More importantly, because canonical
|
often an order of magnitude. More importantly, because canonical LR(1)'s
|
||||||
LR(1)'s extra parser states may contain duplicate conflicts in the
|
extra parser states may contain duplicate conflicts in the case of non-LR(1)
|
||||||
case of non-LR(1) grammars, the number of conflicts for IELR(1) is
|
grammars, the number of conflicts for IELR(1) is often an order of magnitude
|
||||||
often an order of magnitude less as well. This can significantly
|
less as well. This can significantly reduce the complexity of developing a
|
||||||
reduce the complexity of developing of a grammar. @xref{%define
|
grammar. @xref{LR Table Construction}.
|
||||||
Summary,,lr.type}.
|
|
||||||
|
|
||||||
@item Infix operator
|
@item Infix operator
|
||||||
An arithmetic operator that is placed between the operands on which it
|
An arithmetic operator that is placed between the operands on which it
|
||||||
@@ -10735,12 +10927,11 @@ A continuous flow of data between devices or programs.
|
|||||||
|
|
||||||
@item LAC (Lookahead Correction)
|
@item LAC (Lookahead Correction)
|
||||||
A parsing mechanism that fixes the problem of delayed syntax error
|
A parsing mechanism that fixes the problem of delayed syntax error
|
||||||
detection, which is caused by LR state merging, default reductions,
|
detection, which is caused by LR state merging, default reductions, and the
|
||||||
and the use of @code{%nonassoc}. Delayed syntax error detection
|
use of @code{%nonassoc}. Delayed syntax error detection results in
|
||||||
results in unexpected semantic actions, initiation of error recovery
|
unexpected semantic actions, initiation of error recovery in the wrong
|
||||||
in the wrong syntactic context, and an incorrect list of expected
|
syntactic context, and an incorrect list of expected tokens in a verbose
|
||||||
tokens in a verbose syntax error message. @xref{%define
|
syntax error message. @xref{LAC}.
|
||||||
Summary,,parse.lac}.
|
|
||||||
|
|
||||||
@item Language construct
|
@item Language construct
|
||||||
One of the typical usage schemas of the language. For example, one of
|
One of the typical usage schemas of the language. For example, one of
|
||||||
@@ -10856,6 +11047,11 @@ the lexical analyzer. @xref{Symbols}.
|
|||||||
A grammar symbol that has no rules in the grammar and therefore is
|
A grammar symbol that has no rules in the grammar and therefore is
|
||||||
grammatically indivisible. The piece of text it represents is a token.
|
grammatically indivisible. The piece of text it represents is a token.
|
||||||
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
|
||||||
|
|
||||||
|
@item Unreachable state
|
||||||
|
A parser state to which there does not exist a sequence of transitions from
|
||||||
|
the parser's start state. A state can become unreachable during conflict
|
||||||
|
resolution. @xref{Unreachable States}.
|
||||||
@end table
|
@end table
|
||||||
|
|
||||||
@node Copying This Manual
|
@node Copying This Manual
|
||||||
|
|||||||