doc: create a new Tuning LR section in the manual.

And clean up all other documentation of the features described there. * NEWS (2.5): Tweak wording of lr.type and parse.lac entries a bit, update the cross-references to the manual, and point out that LAC has caveats. Don't be so adamant that IELR+LAC=canonical LR. That is, as the referenced section in the manual documents, LAC does not fix infinite parsing loops on syntax errors. * doc/bison.texinfo: Consistently drop the "(1)" suffix from LALR, IELR, and LR in @cindex. (%define Summary): Condense the entries for lr.default-reductions, lr.keep-unreachable-states, lr.type, and parse.lac into brief summaries, and cross-reference the appropriate subsections of Tuning LR. For parse.lac, mention that it's only implemented for deterministic parsers in C. In parse.error entry, mention LAC, and add cross-reference to the LAC section. (Error Reporting): When mentioning parse.error, mention LAC, and add cross-reference to the LAC section. (Tuning LR): New section with an extended version of the documentation removed from %define Summary. Change all cross-references in the manual to point here instead of there. (Calc++ Parser): When mentioning parse.error, mention LAC, and add cross-reference to the LAC section. (Table of Symbols): In %error-verbose entry, add cross-reference to Error Reporting. (Glossary): Capitalize entry titles consistently. Add definitions for "defaulted state" and "unreachable state". Expand IELR acronym in IELR's entry. (cherry picked from commit 6f04ee6c78) Conflicts: doc/bison.texinfo
2026-07-26 09:10:32 +00:00 · 2011-03-06 16:29:52 -05:00
parent 5e528941f4
commit 7fceb615a5
3 changed files with 545 additions and 316 deletions
@@ -1,3 +1,34 @@
+2011-03-06  Joel E. Denny  <[email protected]>
+
+	doc: create a new Tuning LR section in the manual.
+	And clean up all other documentation of the features described
+	there.
+	* NEWS (2.5): Tweak wording of lr.type and parse.lac entries a
+	bit, update the cross-references to the manual, and point out that
+	LAC has caveats.  Don't be so adamant that IELR+LAC=canonical LR.
+	That is, as the referenced section in the manual documents, LAC
+	does not fix infinite parsing loops on syntax errors.
+	* doc/bison.texinfo: Consistently drop the "(1)" suffix from LALR,
+	IELR, and LR in @cindex.
+	(%define Summary): Condense the entries for lr.default-reductions,
+	lr.keep-unreachable-states, lr.type, and parse.lac into brief
+	summaries, and cross-reference the appropriate subsections of
+	Tuning LR.  For parse.lac, mention that it's only implemented for
+	deterministic parsers in C.  In parse.error entry, mention LAC,
+	and add cross-reference to the LAC section.
+	(Error Reporting): When mentioning parse.error, mention LAC, and
+	add cross-reference to the LAC section.
+	(Tuning LR): New section with an extended version of the
+	documentation removed from %define Summary.  Change all
+	cross-references in the manual to point here instead of there.
+	(Calc++ Parser): When mentioning parse.error, mention LAC, and add
+	cross-reference to the LAC section.
+	(Table of Symbols): In %error-verbose entry, add cross-reference
+	to Error Reporting.
+	(Glossary): Capitalize entry titles consistently.  Add definitions
+	for "defaulted state" and "unreachable state".  Expand IELR
+	acronym in IELR's entry.
+
 2011-02-20  Joel E. Denny  <[email protected]>

 	doc: add bibliography to manual.
@@ -116,27 +116,27 @@ Bison News
    %define lr.type ielr
    %define lr.type canonical-lr

-  The default reduction optimization in the parser tables can also be
-  adjusted using `%define lr.default-reductions'.  See the documentation
-  for `%define lr.type' and `%define lr.default-reductions' in the
-  section `Bison Declaration Summary' in the Bison manual for the
-  details.
+  The default-reduction optimization in the parser tables can also be
+  adjusted using `%define lr.default-reductions'.  For details on both
+  of these features, see the new section `Tuning LR' in the Bison
+  manual.

  These features are experimental.  More user feedback will help to
  stabilize them.

-** LAC (lookahead correction) for syntax error handling:
+** LAC (Lookahead Correction) for syntax error handling:

  Canonical LR, IELR, and LALR can suffer from a couple of problems
  upon encountering a syntax error.  First, the parser might perform
  additional parser stack reductions before discovering the syntax
-  error.  Such reductions perform user semantic actions that are
+  error.  Such reductions can perform user semantic actions that are
  unexpected because they are based on an invalid token, and they
  cause error recovery to begin in a different syntactic context than
  the one in which the invalid token was encountered.  Second, when
-  verbose error messages are enabled (with %error-verbose or `#define
-  YYERROR_VERBOSE'), the expected token list in the syntax error
-  message can both contain invalid tokens and omit valid tokens.
+  verbose error messages are enabled (with %error-verbose or the
+  obsolete `#define YYERROR_VERBOSE'), the expected token list in the
+  syntax error message can both contain invalid tokens and omit valid
+  tokens.

  The culprits for the above problems are %nonassoc, default
  reductions in inconsistent states, and parser state merging.  Thus,
@@ -144,11 +144,11 @@ Bison News
  %nonassoc is used or if default reductions are enabled for
  inconsistent states.

-  LAC is a new mechanism within the parsing algorithm that completely
-  solves these problems for canonical LR, IELR, and LALR without
-  sacrificing %nonassoc, default reductions, or state mering.  When
-  LAC is in use, canonical LR and IELR behave exactly the same for
-  both syntactically acceptable and syntactically unacceptable input.
+  LAC is a new mechanism within the parsing algorithm that solves
+  these problems for canonical LR, IELR, and LALR without sacrificing
+  %nonassoc, default reductions, or state merging.  When LAC is in
+  use, canonical LR and IELR behave almost exactly the same for both
+  syntactically acceptable and syntactically unacceptable input.
  While LALR still does not support the full language-recognition
  power of canonical LR and IELR, LAC at least enables LALR's syntax
  error handling to correctly reflect LALR's language-recognition
@@ -159,8 +159,8 @@ Bison News

    %define parse.lac full

-  See the documentation for `%define parse.lac' in the section `Bison
-  Declaration Summary' in the Bison manual for additional details.
+  See the new section `LAC' in the Bison manual for additional
+  details including a few caveats.

  LAC is an experimental feature.  More user feedback will help to
  stabilize it.
@@ -314,11 +314,11 @@ Bison News

 ** Verbose syntax error message fixes:

-  When %error-verbose or `#define YYERROR_VERBOSE' is specified,
-  syntax error messages produced by the generated parser include the
-  unexpected token as well as a list of expected tokens.  The effect
-  of %nonassoc on these verbose messages has been corrected in two
-  ways, but a complete fix requires LAC, described above:
+  When %error-verbose or the obsolete `#define YYERROR_VERBOSE' is
+  specified, syntax error messages produced by the generated parser
+  include the unexpected token as well as a list of expected tokens.
+  The effect of %nonassoc on these verbose messages has been corrected
+  in two ways, but a more complete fix requires LAC, described above:

 *** When %nonassoc is used, there can exist parser states that accept no
    tokens, and so the parser does not always require a lookahead token
@@ -266,6 +266,7 @@ The Bison Parser Algorithm
 * Parser States::     The parser is a finite-state-machine with stack.
 * Reduce/Reduce::     When two rules are applicable in the same situation.
 * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Tuning LR::         How to tune fundamental aspects of LR-based parsing.
 * Generalized LR Parsing::  Parsing arbitrary context-free grammars.
 * Memory Management:: What happens when memory is exhausted.  How to avoid it.

@@ -277,6 +278,13 @@ Operator Precedence
 * Precedence Examples::  How these features are used in the previous example.
 * How Precedence::    How they work.

+Tuning LR
+
+* LR Table Construction:: Choose a different construction algorithm.
+* Default Reductions::    Disable default reductions.
+* LAC::                   Correct lookahead sets in the parser states.
+* Unreachable States::    Keep unreachable parser states for debugging.
+
 Handling Context Dependencies

 * Semantic Tokens::   Token parsing can depend on the semantic context.
@@ -473,21 +481,19 @@ order to specify the language Algol 60.  Any grammar expressed in
 BNF is a context-free grammar.  The input to Bison is
 essentially machine-readable BNF.

-@cindex LALR(1) grammars
-@cindex IELR(1) grammars
-@cindex LR(1) grammars
-There are various important subclasses of context-free grammars.
-Although it can handle almost all context-free grammars, Bison is
-optimized for what are called LR(1) grammars.
-In brief, in these grammars, it must be possible to tell how to parse
-any portion of an input string with just a single token of lookahead.
-For historical reasons, Bison by default is limited by the additional
-restrictions of LALR(1), which is hard to explain simply.
-@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for
-more information on this.
-As an experimental feature, you can escape these additional restrictions by
-requesting IELR(1) or canonical LR(1) parser tables.
-@xref{%define Summary,,lr.type}, to learn how.
+@cindex LALR grammars
+@cindex IELR grammars
+@cindex LR grammars
+There are various important subclasses of context-free grammars.  Although
+it can handle almost all context-free grammars, Bison is optimized for what
+are called LR(1) grammars.  In brief, in these grammars, it must be possible
+to tell how to parse any portion of an input string with just a single token
+of lookahead.  For historical reasons, Bison by default is limited by the
+additional restrictions of LALR(1), which is hard to explain simply.
+@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for more
+information on this.  As an experimental feature, you can escape these
+additional restrictions by requesting IELR(1) or canonical LR(1) parser
+tables.  @xref{LR Table Construction}, to learn how.

@cindex GLR parsing
@cindex generalized LR (GLR) parsing
@@ -5352,65 +5358,17 @@ Boolean.
@c ================================================== lr.default-reductions

@item lr.default-reductions
-@cindex default reductions
@findex %define lr.default-reductions
-@cindex delayed syntax errors
-@cindex syntax errors delayed
-@cindex LAC
-@findex %nonassoc

@itemize @bullet
@item Language(s): all

@item Purpose: Specify the kind of states that are permitted to
-contain default reductions.
-That is, in such a state, Bison selects the reduction with the largest
-lookahead set to be the default parser action and then removes that
-lookahead set.
-(The ability to specify where default reductions should be used is
-experimental.
-More user feedback will help to stabilize it.)
-
-@item Accepted Values:
-@itemize
-@item @code{all}.
-This is the traditional Bison behavior.  The main advantage is a
-significant decrease in the size of the parser tables.  The
-disadvantage is that, when the generated parser encounters a
-syntactically unacceptable token, the parser might then perform
-unnecessary default reductions before it can detect the syntax error.
-Such delayed syntax error detection is usually inherent in LALR and
-IELR parser tables anyway due to LR state merging (@pxref{%define
-Summary,,lr.type}).  Furthermore, the use of @code{%nonassoc} can
-contribute to delayed syntax error detection even in the case of
-canonical LR.  As an experimental feature, delayed syntax error
-detection can be overcome in all cases by enabling LAC (@pxref{%define
-Summary,,parse.lac}, for details, including a discussion of the
-effects of delayed syntax error detection).
-
-@item @code{consistent}.
-@cindex consistent states
-A consistent state is a state that has only one possible action.
-If that action is a reduction, then the parser does not need to request
-a lookahead token from the scanner before performing that action.
-However, the parser recognizes the ability to ignore the lookahead token
-in this way only when such a reduction is encoded as a default
-reduction.
-Thus, if default reductions are permitted only in consistent states,
-then a canonical LR parser that does not employ
-@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the
-syntactically unacceptable token from the scanner.
-
-@item @code{accepting}.
-@cindex accepting state
-In the accepting state, the default reduction is actually the accept
-action.
-In this case, a canonical LR parser that does not employ
-@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the
-syntactically unacceptable token in the input.
-That is, it does not perform any extra reductions.
-@end itemize
+contain default reductions.  @xref{Default Reductions}.  (The ability to
+specify where default reductions should be used is experimental.  More user
+feedback will help to stabilize it.)

+@item Accepted Values: @code{all}, @code{consistent}, @code{accepting}
@item Default Value:
@itemize
@item @code{accepting} if @code{lr.type} is @code{canonical-lr}.
@@ -5425,42 +5383,10 @@ That is, it does not perform any extra reductions.

@itemize @bullet
@item Language(s): all
-
@item Purpose: Request that Bison allow unreachable parser states to
-remain in the parser tables.
-Bison considers a state to be unreachable if there exists no sequence of
-transitions from the start state to that state.
-A state can become unreachable during conflict resolution if Bison disables a
-shift action leading to it from a predecessor state.
-Keeping unreachable states is sometimes useful for analysis purposes, but they
-are useless in the generated parser.
-
+remain in the parser tables.  @xref{Unreachable States}.
@item Accepted Values: Boolean
-
@item Default Value: @code{false}
-
-@item Caveats:
-
-@itemize @bullet
-
-@item Unreachable states may contain conflicts and may use rules not used in
-any other state.
-Thus, keeping unreachable states may induce warnings that are irrelevant to
-your parser's behavior, and it may eliminate warnings that are relevant.
-Of course, the change in warnings may actually be relevant to a parser table
-analysis that wants to keep unreachable states, so this behavior will likely
-remain in future Bison releases.
-
-@item While Bison is able to remove unreachable states, it is not guaranteed to
-remove other kinds of useless states.
-Specifically, when Bison disables reduce actions during conflict resolution,
-some goto actions may become useless, and thus some additional states may
-become useless.
-If Bison were to compute which goto actions were useless and then disable those
-actions, it could identify such states as unreachable and then remove those
-states.
-However, Bison does not compute which goto actions are useless.
-@end itemize
@end itemize
@c lr.keep-unreachable-states

@@ -5468,87 +5394,15 @@ However, Bison does not compute which goto actions are useless.

@item lr.type
@findex %define lr.type
-@cindex LALR
-@cindex IELR
-@cindex LR

@itemize @bullet
@item Language(s): all

@item Purpose: Specify the type of parser tables within the
-LR(1) family.
-(This feature is experimental.
+LR(1) family.  @xref{LR Table Construction}.  (This feature is experimental.
 More user feedback will help to stabilize it.)

-@item Accepted Values:
-@itemize
-@item @code{lalr}.
-While Bison generates LALR parser tables by default for
-historical reasons, IELR or canonical LR is almost
-always preferable for deterministic parsers.
-The trouble is that LALR parser tables can suffer from
-mysterious conflicts and thus may not accept the full set of sentences
-that IELR and canonical LR accept.
-@xref{Mystery Conflicts}, for details.
-However, there are at least two scenarios where LALR may be
-worthwhile:
-@itemize
-@cindex GLR with LALR
-@item When employing GLR parsers (@pxref{GLR Parsers}), if you
-do not resolve any conflicts statically (for example, with @code{%left}
-or @code{%prec}), then the parser explores all potential parses of any
-given input.
-In this case, the use of LALR parser tables is guaranteed not
-to alter the language accepted by the parser.
-LALR parser tables are the smallest parser tables Bison can
-currently generate, so they may be preferable.
-Nevertheless, once you begin to resolve conflicts statically,
-GLR begins to behave more like a deterministic parser, and so
-IELR and canonical LR can be helpful to avoid
-LALR's mysterious behavior.
-
-@item Occasionally during development, an especially malformed grammar
-with a major recurring flaw may severely impede the IELR or
-canonical LR parser table generation algorithm.
-LALR can be a quick way to generate parser tables in order to
-investigate such problems while ignoring the more subtle differences
-from IELR and canonical LR.
-@end itemize
-
-@item @code{ielr}.
-IELR is a minimal LR algorithm.
-That is, given any grammar (LR or non-LR),
-IELR and canonical LR always accept exactly the same
-set of sentences.
-However, as for LALR, the number of parser states is often an
-order of magnitude less for IELR than for canonical
-LR.
-More importantly, because canonical LR's extra parser states
-may contain duplicate conflicts in the case of non-LR
-grammars, the number of conflicts for IELR is often an order
-of magnitude less as well.
-This can significantly reduce the complexity of developing of a grammar.
-
-@item @code{canonical-lr}.
-@cindex delayed syntax errors
-@cindex syntax errors delayed
-@cindex LAC
-@findex %nonassoc
-While inefficient, canonical LR parser tables can be an interesting
-means to explore a grammar because they have a property that IELR and
-LALR tables do not.  That is, if @code{%nonassoc} is not used and
-default reductions are left disabled (@pxref{%define
-Summary,,lr.default-reductions}), then, for every left context of
-every canonical LR state, the set of tokens accepted by that state is
-guaranteed to be the exact set of tokens that is syntactically
-acceptable in that left context.  It might then seem that an advantage
-of canonical LR parsers in production is that, under the above
-constraints, they are guaranteed to detect a syntax error as soon as
-possible without performing any unnecessary reductions.  However, IELR
-parsers using LAC (@pxref{%define Summary,,parse.lac}) are also able
-to achieve this behavior without sacrificing @code{%nonassoc} or
-default reductions.
-@end itemize
+@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr}

@item Default Value: @code{lalr}
@end itemize
@@ -5596,8 +5450,9 @@ function.  @xref{Error Reporting, ,The Error Reporting Function
 Error messages passed to @code{yyerror} are simply @w{@code{"syntax
 error"}}.
@item @code{verbose}
-Error messages report the unexpected token, and possibly the expected
-ones.
+Error messages report the unexpected token, and possibly the expected ones.
+However, this report can often be incorrect when LAC is not enabled
+(@pxref{LAC}).
@end itemize

@item Default Value:
@@ -5609,84 +5464,13 @@ ones.
@c ================================================== parse.lac
@item parse.lac
@findex %define parse.lac
-@cindex LAC
-@cindex lookahead correction

@itemize
-@item Languages(s): C
+@item Languages(s): C (deterministic parsers only)

@item Purpose: Enable LAC (lookahead correction) to improve
-syntax error handling.
-
-Canonical LR, IELR, and LALR can suffer
-from a couple of problems upon encountering a syntax error.  First, the
-parser might perform additional parser stack reductions before
-discovering the syntax error.  Such reductions perform user semantic
-actions that are unexpected because they are based on an invalid token,
-and they cause error recovery to begin in a different syntactic context
-than the one in which the invalid token was encountered.  Second, when
-verbose error messages are enabled (with @code{%error-verbose} or
-@code{#define YYERROR_VERBOSE}), the expected token list in the syntax
-error message can both contain invalid tokens and omit valid tokens.
-
-The culprits for the above problems are @code{%nonassoc}, default
-reductions in inconsistent states, and parser state merging.  Thus,
-IELR and LALR suffer the most.  Canonical
-LR can suffer only if @code{%nonassoc} is used or if default
-reductions are enabled for inconsistent states.
-
-LAC is a new mechanism within the parsing algorithm that
-completely solves these problems for canonical LR,
-IELR, and LALR without sacrificing @code{%nonassoc},
-default reductions, or state mering.  Conceptually, the mechanism is
-straight-forward.  Whenever the parser fetches a new token from the
-scanner so that it can determine the next parser action, it immediately
-suspends normal parsing and performs an exploratory parse using a
-temporary copy of the normal parser state stack.  During this
-exploratory parse, the parser does not perform user semantic actions.
-If the exploratory parse reaches a shift action, normal parsing then
-resumes on the normal parser stacks.  If the exploratory parse reaches
-an error instead, the parser reports a syntax error.  If verbose syntax
-error messages are enabled, the parser must then discover the list of
-expected tokens, so it performs a separate exploratory parse for each
-token in the grammar.
-
-There is one subtlety about the use of LAC.  That is, when in a
-consistent parser state with a default reduction, the parser will not
-attempt to fetch a token from the scanner because no lookahead is
-needed to determine the next parser action.  Thus, whether default
-reductions are enabled in consistent states (@pxref{%define
-Summary,,lr.default-reductions}) affects how soon the parser detects a
-syntax error: when it @emph{reaches} an erroneous token or when it
-eventually @emph{needs} that token as a lookahead.  The latter
-behavior is probably more intuitive, so Bison currently provides no
-way to achieve the former behavior while default reductions are fully
-enabled.
-
-Thus, when LAC is in use, for some fixed decision of whether
-to enable default reductions in consistent states, canonical
-LR and IELR behave exactly the same for both
-syntactically acceptable and syntactically unacceptable input.  While
-LALR still does not support the full language-recognition
-power of canonical LR and IELR, LAC at
-least enables LALR's syntax error handling to correctly
-reflect LALR's language-recognition power.
-
-Because LAC requires many parse actions to be performed twice,
-it can have a performance penalty.  However, not all parse actions must
-be performed twice.  Specifically, during a series of default reductions
-in consistent states and shift actions, the parser never has to initiate
-an exploratory parse.  Moreover, the most time-consuming tasks in a
-parse are often the file I/O, the lexical analysis performed by the
-scanner, and the user's semantic actions, but none of these are
-performed during the exploratory parse.  Finally, the base of the
-temporary stack used during an exploratory parse is a pointer into the
-normal parser state stack so that the stack is never physically copied.
-In our experience, the performance penalty of LAC has proven
-insignificant for practical grammars.
-
+syntax error handling.  @xref{LAC}.
@item Accepted Values: @code{none}, @code{full}
-
@item Default Value: @code{none}
@end itemize
@c parse.lac
@@ -6328,10 +6112,11 @@ receives one argument.  For a syntax error, the string is normally
@w{@code{"syntax error"}}.

@findex %define parse.error
-If you invoke @samp{%define parse.error verbose} in the Bison
-declarations section (@pxref{Bison Declarations, ,The Bison Declarations
-Section}), then Bison provides a more verbose and specific error message
-string instead of just plain @w{@code{"syntax error"}}.
+If you invoke @samp{%define parse.error verbose} in the Bison declarations
+section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then
+Bison provides a more verbose and specific error message string instead of
+just plain @w{@code{"syntax error"}}.  However, that message sometimes
+contains incorrect information if LAC is not enabled (@pxref{LAC}).

 The parser can detect one other kind of error: memory exhaustion.  This
 can happen when the input contains constructions that are very deeply
@@ -6732,6 +6517,7 @@ This kind of parser is known in the literature as a bottom-up parser.
 * Parser States::     The parser is a finite-state-machine with stack.
 * Reduce/Reduce::     When two rules are applicable in the same situation.
 * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Tuning LR::         How to tune fundamental aspects of LR-based parsing.
 * Generalized LR Parsing::  Parsing arbitrary context-free grammars.
 * Memory Management:: What happens when memory is exhausted.  How to avoid it.
@end menu
@@ -7301,6 +7087,7 @@ redirects:redirect

@node Mystery Conflicts
@section Mysterious Reduce/Reduce Conflicts
+@cindex Mysterious Conflicts

 Sometimes reduce/reduce conflicts can occur that don't look warranted.
 Here is an example:
@@ -7342,8 +7129,8 @@ of lookahead: when a @code{param_spec} is being read, an @code{ID} is
 a @code{name} if a comma or colon follows, or a @code{type} if another
@code{ID} follows.  In other words, this grammar is LR(1).

-@cindex LR(1)
-@cindex LALR(1)
+@cindex LR
+@cindex LALR
 However, for historical reasons, Bison cannot by default handle all
 LR(1) grammars.
 In this grammar, two contexts, that after an @code{ID} at the beginning
@@ -7358,15 +7145,16 @@ contexts, so it makes a single parser state for them both.  Combining
 the two contexts causes a conflict later.  In parser terminology, this
 occurrence means that the grammar is not LALR(1).

-For many practical grammars (specifically those that fall into the
-non-LR(1) class), the limitations of LALR(1) result in difficulties
-beyond just mysterious reduce/reduce conflicts.  The best way to fix
-all these problems is to select a different parser table generation
-algorithm.  Either IELR(1) or canonical LR(1) would suffice, but the
-former is more efficient and easier to debug during development.
-@xref{%define Summary,,lr.type}, for details.  (Bison's IELR(1) and
-canonical LR(1) implementations are experimental.  More user feedback
-will help to stabilize them.)
+@cindex IELR
+@cindex canonical LR
+For many practical grammars (specifically those that fall into the non-LR(1)
+class), the limitations of LALR(1) result in difficulties beyond just
+mysterious reduce/reduce conflicts.  The best way to fix all these problems
+is to select a different parser table construction algorithm.  Either
+IELR(1) or canonical LR(1) would suffice, but the former is more efficient
+and easier to debug during development.  @xref{LR Table Construction}, for
+details.  (Bison's IELR(1) and canonical LR(1) implementations are
+experimental.  More user feedback will help to stabilize them.)

 If you instead wish to work around LALR(1)'s limitations, you
 can often fix a mysterious conflict by identifying the two parser states
@@ -7417,6 +7205,409 @@ return_spec:
 For a more detailed exposition of LALR(1) parsers and parser
 generators, @pxref{Bibliography,,DeRemer 1982}.

+@node Tuning LR
+@section Tuning LR
+
+The default behavior of Bison's LR-based parsers is chosen mostly for
+historical reasons, but that behavior is often not robust.  For example, in
+the previous section, we discussed the mysterious conflicts that can be
+produced by LALR(1), Bison's default parser table construction algorithm.
+Another example is Bison's @code{%define parse.error verbose} directive,
+which instructs the generated parser to produce verbose syntax error
+messages, which can sometimes contain incorrect information.
+
+In this section, we explore several modern features of Bison that allow you
+to tune fundamental aspects of the generated LR-based parsers.  Some of
+these features easily eliminate shortcomings like those mentioned above.
+Others can be helpful purely for understanding your parser.
+
+Most of the features discussed in this section are still experimental.  More
+user feedback will help to stabilize them.
+
+@menu
+* LR Table Construction:: Choose a different construction algorithm.
+* Default Reductions::    Disable default reductions.
+* LAC::                   Correct lookahead sets in the parser states.
+* Unreachable States::    Keep unreachable parser states for debugging.
+@end menu
+
+@node LR Table Construction
+@subsection LR Table Construction
+@cindex Mysterious Conflict
+@cindex LALR
+@cindex IELR
+@cindex canonical LR
+@findex %define lr.type
+
+For historical reasons, Bison constructs LALR(1) parser tables by default.
+However, LALR does not possess the full language-recognition power of LR.
+As a result, the behavior of parsers employing LALR parser tables is often
+mysterious.  We presented a simple example of this effect in @ref{Mystery
+Conflicts}.
+
+As we also demonstrated in that example, the traditional approach to
+eliminating such mysterious behavior is to restructure the grammar.
+Unfortunately, doing so correctly is often difficult.  Moreover, merely
+discovering that LALR causes mysterious behavior in your parser can be
+difficult as well.
+
+Fortunately, Bison provides an easy way to eliminate the possibility of such
+mysterious behavior altogether.  You simply need to activate a more powerful
+parser table construction algorithm by using the @code{%define lr.type}
+directive.
+
+@deffn {Directive} {%define lr.type @var{TYPE}}
+Specify the type of parser tables within the LR(1) family.  The accepted
+values for @var{TYPE} are:
+
+@itemize
+@item @code{lalr} (default)
+@item @code{ielr}
+@item @code{canonical-lr}
+@end itemize
+
+(This feature is experimental. More user feedback will help to stabilize
+it.)
+@end deffn
+
+For example, to activate IELR, you might add the following directive to you
+grammar file:
+
+@example
+%define lr.type ielr
+@end example
+
+@noindent For the example in @ref{Mystery Conflicts}, the mysterious
+conflict is then eliminated, so there is no need to invest time in
+comprehending the conflict or restructuring the grammar to fix it.  If,
+during future development, the grammar evolves such that all mysterious
+behavior would have disappeared using just LALR, you need not fear that
+continuing to use IELR will result in unnecessarily large parser tables.
+That is, IELR generates LALR tables when LALR (using a deterministic parsing
+algorithm) is sufficient to support the full language-recognition power of
+LR.  Thus, by enabling IELR at the start of grammar development, you can
+safely and completely eliminate the need to consider LALR's shortcomings.
+
+While IELR is almost always preferable, there are circumstances where LALR
+or the canonical LR parser tables described by Knuth
+(@pxref{Bibliography,,Knuth 1965}) can be useful.  Here we summarize the
+relative advantages of each parser table construction algorithm within
+Bison:
+
+@itemize
+@item LALR
+
+There are at least two scenarios where LALR can be worthwhile:
+
+@itemize
+@item GLR without static conflict resolution.
+
+@cindex GLR with LALR
+When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
+conflicts statically (for example, with @code{%left} or @code{%prec}), then
+the parser explores all potential parses of any given input.  In this case,
+the choice of parser table construction algorithm is guaranteed not to alter
+the language accepted by the parser.  LALR parser tables are the smallest
+parser tables Bison can currently construct, so they may then be preferable.
+Nevertheless, once you begin to resolve conflicts statically, GLR behaves
+more like a deterministic parser in the syntactic contexts where those
+conflicts appear, and so either IELR or canonical LR can then be helpful to
+avoid LALR's mysterious behavior.
+
+@item Malformed grammars.
+
+Occasionally during development, an especially malformed grammar with a
+major recurring flaw may severely impede the IELR or canonical LR parser
+table construction algorithm.  LALR can be a quick way to construct parser
+tables in order to investigate such problems while ignoring the more subtle
+differences from IELR and canonical LR.
+@end itemize
+
+@item IELR
+
+IELR (Inadequacy Elimination LR) is a minimal LR algorithm.  That is, given
+any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables
+always accept exactly the same set of sentences.  However, like LALR, IELR
+merges parser states during parser table construction so that the number of
+parser states is often an order of magnitude less than for canonical LR.
+More importantly, because canonical LR's extra parser states may contain
+duplicate conflicts in the case of non-LR grammars, the number of conflicts
+for IELR is often an order of magnitude less as well.  This effect can
+significantly reduce the complexity of developing a grammar.
+
+@item Canonical LR
+
+@cindex delayed syntax error detection
+@cindex LAC
+@findex %nonassoc
+While inefficient, canonical LR parser tables can be an interesting means to
+explore a grammar because they possess a property that IELR and LALR tables
+do not.  That is, if @code{%nonassoc} is not used and default reductions are
+left disabled (@pxref{Default Reductions}), then, for every left context of
+every canonical LR state, the set of tokens accepted by that state is
+guaranteed to be the exact set of tokens that is syntactically acceptable in
+that left context.  It might then seem that an advantage of canonical LR
+parsers in production is that, under the above constraints, they are
+guaranteed to detect a syntax error as soon as possible without performing
+any unnecessary reductions.  However, IELR parsers that use LAC are also
+able to achieve this behavior without sacrificing @code{%nonassoc} or
+default reductions.  For details and a few caveats of LAC, @pxref{LAC}.
+@end itemize
+
+For a more detailed exposition of the mysterious behavior in LALR parsers
+and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and
+@ref{Bibliography,,Denny 2010 November}.
+
+@node Default Reductions
+@subsection Default Reductions
+@cindex default reductions
+@findex %define lr.default-reductions
+@findex %nonassoc
+
+After parser table construction, Bison identifies the reduction with the
+largest lookahead set in each parser state.  To reduce the size of the
+parser state, traditional Bison behavior is to remove that lookahead set and
+to assign that reduction to be the default parser action.  Such a reduction
+is known as a @dfn{default reduction}.
+
+Default reductions affect more than the size of the parser tables.  They
+also affect the behavior of the parser:
+
+@itemize
+@item Delayed @code{yylex} invocations.
+
+@cindex delayed yylex invocations
+@cindex consistent states
+@cindex defaulted states
+A @dfn{consistent state} is a state that has only one possible parser
+action.  If that action is a reduction and is encoded as a default
+reduction, then that consistent state is called a @dfn{defaulted state}.
+Upon reaching a defaulted state, a Bison-generated parser does not bother to
+invoke @code{yylex} to fetch the next token before performing the reduction.
+In other words, whether default reductions are enabled in consistent states
+determines how soon a Bison-generated parser invokes @code{yylex} for a
+token: immediately when it @emph{reaches} that token in the input or when it
+eventually @emph{needs} that token as a lookahead to determine the next
+parser action.  Traditionally, default reductions are enabled, and so the
+parser exhibits the latter behavior.
+
+The presence of defaulted states is an important consideration when
+designing @code{yylex} and the grammar file.  That is, if the behavior of
+@code{yylex} can influence or be influenced by the semantic actions
+associated with the reductions in defaulted states, then the delay of the
+next @code{yylex} invocation until after those reductions is significant.
+For example, the semantic actions might pop a scope stack that @code{yylex}
+uses to determine what token to return.  Thus, the delay might be necessary
+to ensure that @code{yylex} does not look up the next token in a scope that
+should already be considered closed.
+
+@item Delayed syntax error detection.
+
+@cindex delayed syntax error detection
+When the parser fetches a new token by invoking @code{yylex}, it checks
+whether there is an action for that token in the current parser state.  The
+parser detects a syntax error if and only if either (1) there is no action
+for that token or (2) the action for that token is the error action (due to
+the use of @code{%nonassoc}).  However, if there is a default reduction in
+that state (which might or might not be a defaulted state), then it is
+impossible for condition 1 to exist.  That is, all tokens have an action.
+Thus, the parser sometimes fails to detect the syntax error until it reaches
+a later state.
+
+@cindex LAC
+@c If there's an infinite loop, default reductions can prevent an incorrect
+@c sentence from being rejected.
+While default reductions never cause the parser to accept syntactically
+incorrect sentences, the delay of syntax error detection can have unexpected
+effects on the behavior of the parser.  However, the delay can be caused
+anyway by parser state merging and the use of @code{%nonassoc}, and it can
+be fixed by another Bison feature, LAC.  We discuss the effects of delayed
+syntax error detection and LAC more in the next section (@pxref{LAC}).
+@end itemize
+
+For canonical LR, the only default reduction that Bison enables by default
+is the accept action, which appears only in the accepting state, which has
+no other action and is thus a defaulted state.  However, the default accept
+action does not delay any @code{yylex} invocation or syntax error detection
+because the accept action ends the parse.
+
+For LALR and IELR, Bison enables default reductions in nearly all states by
+default.  There are only two exceptions.  First, states that have a shift
+action on the @code{error} token do not have default reductions because
+delayed syntax error detection could then prevent the @code{error} token
+from ever being shifted in that state.  However, parser state merging can
+cause the same effect anyway, and LAC fixes it in both cases, so future
+versions of Bison might drop this exception when LAC is activated.  Second,
+GLR parsers do not record the default reduction as the action on a lookahead
+token for which there is a conflict.  The correct action in this case is to
+split the parse instead.
+
+To adjust which states have default reductions enabled, use the
+@code{%define lr.default-reductions} directive.
+
+@deffn {Directive} {%define lr.default-reductions @var{WHERE}}
+Specify the kind of states that are permitted to contain default reductions.
+The accepted values of @var{WHERE} are:
+@itemize
+@item @code{all} (default for LALR and IELR)
+@item @code{consistent}
+@item @code{accepting} (default for canonical LR)
+@end itemize
+
+(The ability to specify where default reductions are permitted is
+experimental.  More user feedback will help to stabilize it.)
+@end deffn
+
+FIXME: Because of the exceptions described above, @code{all} is a misnomer.
+Rename to @code{full}.
+
+@node LAC
+@subsection LAC
+@findex %define parse.lac
+@cindex LAC
+@cindex lookahead correction
+
+Canonical LR, IELR, and LALR can suffer from a couple of problems upon
+encountering a syntax error.  First, the parser might perform additional
+parser stack reductions before discovering the syntax error.  Such
+reductions can perform user semantic actions that are unexpected because
+they are based on an invalid token, and they cause error recovery to begin
+in a different syntactic context than the one in which the invalid token was
+encountered.  Second, when verbose error messages are enabled (@pxref{Error
+Reporting}), the expected token list in the syntax error message can both
+contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are @code{%nonassoc}, default reductions
+in inconsistent states (@pxref{Default Reductions}), and parser state
+merging.  Because IELR and LALR merge parser states, they suffer the most.
+Canonical LR can suffer only if @code{%nonassoc} is used or if default
+reductions are enabled for inconsistent states.
+
+LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
+that solves these problems for canonical LR, IELR, and LALR without
+sacrificing @code{%nonassoc}, default reductions, or state merging.  You can
+enable LAC with the @code{%define parse.lac} directive.
+
+@deffn {Directive} {%define parse.lac @var{VALUE}}
+Enable LAC to improve syntax error handling.
+@itemize
+@item @code{none} (default)
+@item @code{full}
+@end itemize
+(This feature is experimental.  More user feedback will help to stabilize
+it.  Moreover, it is currently only available for deterministic parsers in
+C.)
+@end deffn
+
+Conceptually, the LAC mechanism is straight-forward.  Whenever the parser
+fetches a new token from the scanner so that it can determine the next
+parser action, it immediately suspends normal parsing and performs an
+exploratory parse using a temporary copy of the normal parser state stack.
+During this exploratory parse, the parser does not perform user semantic
+actions.  If the exploratory parse reaches a shift action, normal parsing
+then resumes on the normal parser stacks.  If the exploratory parse reaches
+an error instead, the parser reports a syntax error.  If verbose syntax
+error messages are enabled, the parser must then discover the list of
+expected tokens, so it performs a separate exploratory parse for each token
+in the grammar.
+
+There is one subtlety about the use of LAC.  That is, when in a consistent
+parser state with a default reduction, the parser will not attempt to fetch
+a token from the scanner because no lookahead is needed to determine the
+next parser action.  Thus, whether default reductions are enabled in
+consistent states (@pxref{Default Reductions}) affects how soon the parser
+detects a syntax error: immediately when it @emph{reaches} an erroneous
+token or when it eventually @emph{needs} that token as a lookahead to
+determine the next parser action.  The latter behavior is probably more
+intuitive, so Bison currently provides no way to achieve the former behavior
+while default reductions are enabled in consistent states.
+
+Thus, when LAC is in use, for some fixed decision of whether to enable
+default reductions in consistent states, canonical LR and IELR behave almost
+exactly the same for both syntactically acceptable and syntactically
+unacceptable input.  While LALR still does not support the full
+language-recognition power of canonical LR and IELR, LAC at least enables
+LALR's syntax error handling to correctly reflect LALR's
+language-recognition power.
+
+There are a few caveats to consider when using LAC:
+
+@itemize
+@item Infinite parsing loops.
+
+IELR plus LAC does have one shortcoming relative to canonical LR.  Some
+parsers generated by Bison can loop infinitely.  LAC does not fix infinite
+parsing loops that occur between encountering a syntax error and detecting
+it, but enabling canonical LR or disabling default reductions sometimes
+does.
+
+@item Verbose error message limitations.
+
+Because of internationalization considerations, Bison-generated parsers
+limit the size of the expected token list they are willing to report in a
+verbose syntax error message.  If the number of expected tokens exceeds that
+limit, the list is simply dropped from the message.  Enabling LAC can
+increase the size of the list and thus cause the parser to drop it.  Of
+course, dropping the list is better than reporting an incorrect list.
+
+@item Performance.
+
+Because LAC requires many parse actions to be performed twice, it can have a
+performance penalty.  However, not all parse actions must be performed
+twice.  Specifically, during a series of default reductions in consistent
+states and shift actions, the parser never has to initiate an exploratory
+parse.  Moreover, the most time-consuming tasks in a parse are often the
+file I/O, the lexical analysis performed by the scanner, and the user's
+semantic actions, but none of these are performed during the exploratory
+parse.  Finally, the base of the temporary stack used during an exploratory
+parse is a pointer into the normal parser state stack so that the stack is
+never physically copied.  In our experience, the performance penalty of LAC
+has proven insignificant for practical grammars.
+@end itemize
+
+@node Unreachable States
+@subsection Unreachable States
+@findex %define lr.keep-unreachable-states
+@cindex unreachable states
+
+If there exists no sequence of transitions from the parser's start state to
+some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable
+state}.  A state can become unreachable during conflict resolution if Bison
+disables a shift action leading to it from a predecessor state.
+
+By default, Bison removes unreachable states from the parser after conflict
+resolution because they are useless in the generated parser.  However,
+keeping unreachable states is sometimes useful when trying to understand the
+relationship between the parser and the grammar.
+
+@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}}
+Request that Bison allow unreachable states to remain in the parser tables.
+@var{VALUE} must be a Boolean.  The default is @code{false}.
+@end deffn
+
+There are a few caveats to consider:
+
+@itemize @bullet
+@item Missing or extraneous warnings.
+
+Unreachable states may contain conflicts and may use rules not used in any
+other state.  Thus, keeping unreachable states may induce warnings that are
+irrelevant to your parser's behavior, and it may eliminate warnings that are
+relevant.  Of course, the change in warnings may actually be relevant to a
+parser table analysis that wants to keep unreachable states, so this
+behavior will likely remain in future Bison releases.
+
+@item Other useless states.
+
+While Bison is able to remove unreachable states, it is not guaranteed to
+remove other kinds of useless states.  Specifically, when Bison disables
+reduce actions during conflict resolution, some goto actions may become
+useless, and thus some additional states may become useless.  If Bison were
+to compute which goto actions were useless and then disable those actions,
+it could identify such states as unreachable and then remove those states.
+However, Bison does not compute which goto actions are useless.
+@end itemize
+
@node Generalized LR Parsing
@section Generalized LR (GLR) Parsing
@cindex GLR parsing
@@ -9500,8 +9691,9 @@ propagated.
@end example

@noindent
-Use the following two directives to enable parser tracing and verbose
-error messages.
+Use the following two directives to enable parser tracing and verbose error
+messages.  However, verbose error messages can contain incorrect information
+(@pxref{LAC}).

@comment file: calc++-parser.yy
@example
@@ -10908,9 +11100,9 @@ Precedence}.
@end deffn
@end ifset

-@deffn {Directive} %define @var{define-variable}
-@deffnx {Directive} %define @var{define-variable} @var{value}
-@deffnx {Directive} %define @var{define-variable} "@var{value}"
+@deffn {Directive} %define @var{variable}
+@deffnx {Directive} %define @var{variable} @var{value}
+@deffnx {Directive} %define @var{variable} "@var{value}"
 Define a variable to adjust Bison's behavior.  @xref{%define Summary}.
@end deffn

@@ -10952,7 +11144,8 @@ token is reset to the token that originally caused the violation.
@end deffn

@deffn {Directive} %error-verbose
-An obsolete directive standing for @samp{%define parse.error verbose}.
+An obsolete directive standing for @samp{%define parse.error verbose}
+(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}).
@end deffn

@deffn {Directive} %file-prefix "@var{prefix}"
@@ -11305,7 +11498,7 @@ Data type of semantic values; @code{int} by default.
@cindex glossary

@table @asis
-@item Accepting State
+@item Accepting state
 A state whose only action is the accept action.
 The accepting state is thus a consistent state.
@xref{Understanding,,}.
@@ -11316,9 +11509,8 @@ by John Backus, and slightly improved by Peter Naur in his 1960-01-02
 committee document contributing to what became the Algol 60 report.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.

-@item Consistent State
-A state containing only one possible action.  @xref{%define
-Summary,,lr.default-reductions}.
+@item Consistent state
+A state containing only one possible action.  @xref{Default Reductions}.

@item Context-free grammars
 Grammars specified as rules that can be applied regardless of context.
@@ -11327,12 +11519,15 @@ expression, integers are allowed @emph{anywhere} an expression is
 permitted.  @xref{Language and Grammar, ,Languages and Context-Free
 Grammars}.

-@item Default Reduction
+@item Default reduction
 The reduction that a parser should perform if the current parser state
 contains no other action for the lookahead token.  In permitted parser
-states, Bison declares the reduction with the largest lookahead set to
-be the default reduction and removes that lookahead set.
-@xref{%define Summary,,lr.default-reductions}.
+states, Bison declares the reduction with the largest lookahead set to be
+the default reduction and removes that lookahead set.  @xref{Default
+Reductions}.
+
+@item Defaulted state
+A consistent state with a default reduction.  @xref{Default Reductions}.

@item Dynamic allocation
 Allocation of memory that occurs during execution, rather than at
@@ -11364,17 +11559,16 @@ A language construct that is (in general) grammatically divisible;
 for example, `expression' or `declaration' in C@.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.

-@item IELR(1)
-A minimal LR(1) parser table generation algorithm.  That is, given any
+@item IELR(1) (Inadequacy Elimination LR(1))
+A minimal LR(1) parser table construction algorithm.  That is, given any
 context-free grammar, IELR(1) generates parser tables with the full
-language recognition power of canonical LR(1) but with nearly the same
-number of parser states as LALR(1).  This reduction in parser states
-is often an order of magnitude.  More importantly, because canonical
-LR(1)'s extra parser states may contain duplicate conflicts in the
-case of non-LR(1) grammars, the number of conflicts for IELR(1) is
-often an order of magnitude less as well.  This can significantly
-reduce the complexity of developing of a grammar.  @xref{%define
-Summary,,lr.type}.
+language-recognition power of canonical LR(1) but with nearly the same
+number of parser states as LALR(1).  This reduction in parser states is
+often an order of magnitude.  More importantly, because canonical LR(1)'s
+extra parser states may contain duplicate conflicts in the case of non-LR(1)
+grammars, the number of conflicts for IELR(1) is often an order of magnitude
+less as well.  This can significantly reduce the complexity of developing a
+grammar.  @xref{LR Table Construction}.

@item Infix operator
 An arithmetic operator that is placed between the operands on which it
@@ -11385,12 +11579,11 @@ A continuous flow of data between devices or programs.

@item LAC (Lookahead Correction)
 A parsing mechanism that fixes the problem of delayed syntax error
-detection, which is caused by LR state merging, default reductions,
-and the use of @code{%nonassoc}.  Delayed syntax error detection
-results in unexpected semantic actions, initiation of error recovery
-in the wrong syntactic context, and an incorrect list of expected
-tokens in a verbose syntax error message.  @xref{%define
-Summary,,parse.lac}.
+detection, which is caused by LR state merging, default reductions, and the
+use of @code{%nonassoc}.  Delayed syntax error detection results in
+unexpected semantic actions, initiation of error recovery in the wrong
+syntactic context, and an incorrect list of expected tokens in a verbose
+syntax error message.  @xref{LAC}.

@item Language construct
 One of the typical usage schemas of the language.  For example, one of
@@ -11506,6 +11699,11 @@ the lexical analyzer.  @xref{Symbols}.
 A grammar symbol that has no rules in the grammar and therefore is
 grammatically indivisible.  The piece of text it represents is a token.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
+
+@item Unreachable state
+A parser state to which there does not exist a sequence of transitions from
+the parser's start state.  A state can become unreachable during conflict
+resolution.  @xref{Unreachable States}.
@end table

@node Copying This Manual