mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-09 12:23:04 +00:00
parse.lac: document.
* NEWS (2.5): Add entry for LAC, and mention LAC in entry for other
corrections to verbose syntax error messages.
* doc/bison.texinfo (Decl Summary): Rewrite entries for
lr.default-reductions and lr.type to be clearer, to mention %nonassoc's
effect on canonical LR, and to mention LAC. Add entry for parse.lac.
(Glossary): Add entry for LAC.
11
ChangeLog
@@ -1,3 +1,14 @@
+2010-12-19 Joel E. Denny <jdenny@clemson.edu>
+
+parse.lac: document.
+* NEWS (2.5): Add entry for LAC, and mention LAC in entry for
+other corrections to verbose syntax error messages.
+* doc/bison.texinfo (Decl Summary): Rewrite entries for
+lr.default-reductions and lr.type to be clearer, to mention
+%nonassoc's effect on canonical LR, and to mention LAC. Add entry
+for parse.lac.
+(Glossary): Add entry for LAC.
+
 2010-12-11 Joel E. Denny <jdenny@clemson.edu>
 
 parse.lac: implement exploratory stack reallocations.
72
NEWS
@@ -117,6 +117,46 @@ Bison News
 These features are experimental. More user feedback will help to
 stabilize them.
 
+** LAC (lookahead correction) for syntax error handling:
+
+Canonical LR, IELR, and LALR can suffer from a couple of problems
+upon encountering a syntax error. First, the parser might perform
+additional parser stack reductions before discovering the syntax
+error. Such reductions perform user semantic actions that are
+unexpected because they are based on an invalid token, and they
+cause error recovery to begin in a different syntactic context than
+the one in which the invalid token was encountered. Second, when
+verbose error messages are enabled (with %error-verbose or `#define
+YYERROR_VERBOSE'), the expected token list in the syntax error
+message can both contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are %nonassoc, default
+reductions in inconsistent states, and parser state merging. Thus,
+IELR and LALR suffer the most. Canonical LR can suffer only if
+%nonassoc is used or if default reductions are enabled for
+inconsistent states.
+
+LAC is a new mechanism within the parsing algorithm that completely
+solves these problems for canonical LR, IELR, and LALR without
+sacrificing %nonassoc, default reductions, or state merging. When
+LAC is in use, canonical LR and IELR behave exactly the same for
+both syntactically acceptable and syntactically unacceptable input.
+While LALR still does not support the full language-recognition
+power of canonical LR and IELR, LAC at least enables LALR's syntax
+error handling to correctly reflect LALR's language-recognition
+power.
+
+Currently, LAC is only supported for deterministic parsers in C.
+You can enable LAC with the following directive:
+
+%define parse.lac full
+
+See the documentation for `%define parse.lac' in the section `Bison
+Declaration Summary' in the Bison manual for additional details.
+
+LAC is an experimental feature. More user feedback will help to
+stabilize it.
+
 ** Unrecognized %code qualifiers are now an error not a warning.
 
 ** %define improvements.
@@ -225,11 +265,11 @@ Bison News
 
 ** Verbose syntax error message fixes:
 
-When %error-verbose or `#define YYERROR_VERBOSE' is specified, syntax
-error messages produced by the generated parser include the unexpected
-token as well as a list of expected tokens. The effect of %nonassoc
-on these verbose messages has been corrected in two ways, but
-additional fixes are still being implemented:
+When %error-verbose or `#define YYERROR_VERBOSE' is specified,
+syntax error messages produced by the generated parser include the
+unexpected token as well as a list of expected tokens. The effect
+of %nonassoc on these verbose messages has been corrected in two
+ways, but a complete fix requires LAC, described above:
 
 *** When %nonassoc is used, there can exist parser states that accept no
 tokens, and so the parser does not always require a lookahead token
@@ -248,16 +288,18 @@ Bison News
 tokens are now properly omitted from the list.
 
 *** Expected token lists are still often wrong due to state merging
-(from LALR or IELR) and default reductions, which can both add and
-subtract valid tokens. Canonical LR almost completely fixes this
-problem by eliminating state merging and default reductions.
-However, there is one minor problem left even when using canonical
-LR and even after the fixes above. That is, if the resolution of a
-conflict with %nonassoc appears in a later parser state than the one
-at which some syntax error is discovered, the conflicted token is
-still erroneously included in the expected token list. We are
-currently working on a fix to eliminate this problem and to
-eliminate the need for canonical LR.
+(from LALR or IELR) and default reductions, which can both add
+invalid tokens and subtract valid tokens. Canonical LR almost
+completely fixes this problem by eliminating state merging and
+default reductions. However, there is one minor problem left even
+when using canonical LR and even after the fixes above. That is,
+if the resolution of a conflict with %nonassoc appears in a later
+parser state than the one at which some syntax error is
+discovered, the conflicted token is still erroneously included in
+the expected token list. Bison's new LAC implementation,
+described above, eliminates this problem and the need for
+canonical LR. However, LAC is still experimental and is disabled
+by default.
 
 ** Destructor calls fixed for lookaheads altered in semantic actions.
 
doc/bison.texinfo
@@ -5230,57 +5230,61 @@ Boolean.
 @findex %define lr.default-reductions
 @cindex delayed syntax errors
 @cindex syntax errors delayed
+@cindex @acronym{LAC}
+@findex %nonassoc
 
 @itemize @bullet
 @item Language(s): all
 
-@item Purpose: Specifies the kind of states that are permitted to
+@item Purpose: Specify the kind of states that are permitted to
 contain default reductions.
-That is, in such a state, Bison declares the reduction with the largest
-lookahead set to be the default reduction and then removes that
+That is, in such a state, Bison selects the reduction with the largest
+lookahead set to be the default parser action and then removes that
 lookahead set.
-The advantages of default reductions are discussed below.
-The disadvantage is that, when the generated parser encounters a
-syntactically unacceptable token, the parser might then perform
-unnecessary default reductions before it can detect the syntax error.
-
-(This feature is experimental.
+(The ability to specify where default reductions should be used is
+experimental.
 More user feedback will help to stabilize it.)
 
 @item Accepted Values:
 @itemize
 @item @code{all}.
-For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl
-Summary,,lr.type}) by default, all states are permitted to contain
-default reductions.
-The advantage is that parser table sizes can be significantly reduced.
-The reason Bison does not by default attempt to address the disadvantage
-of delayed syntax error detection is that this disadvantage is already
-inherent in @acronym{LALR} and @acronym{IELR} parser tables.
-That is, unlike in a canonical @acronym{LR} state, the lookahead sets of
-reductions in an @acronym{LALR} or @acronym{IELR} state can contain
-tokens that are syntactically incorrect for some left contexts.
+This is the traditional Bison behavior.
+The main advantage is a significant decrease in the size of the parser
+tables.
+The disadvantage is that, when the generated parser encounters a
+syntactically unacceptable token, the parser might then perform
+unnecessary default reductions before it can detect the syntax error.
+Such delayed syntax error detection is usually inherent in
+@acronym{LALR} and @acronym{IELR} parser tables anyway due to
+@acronym{LR} state merging (@pxref{Decl Summary,,lr.type}).
+Furthermore, the use of @code{%nonassoc} can contribute to delayed
+syntax error detection even in the case of canonical @acronym{LR}.
+As an experimental feature, delayed syntax error detection can be
+overcome in all cases by enabling @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}, for details, including a discussion of the effects
+of delayed syntax error detection).
 
 @item @code{consistent}.
 @cindex consistent states
 A consistent state is a state that has only one possible action.
 If that action is a reduction, then the parser does not need to request
 a lookahead token from the scanner before performing that action.
-However, the parser only recognizes the ability to ignore the lookahead
-token when such a reduction is encoded as a default reduction.
-Thus, if default reductions are permitted in and only in consistent
-states, then a canonical @acronym{LR} parser reports a syntax error as
-soon as it @emph{needs} the syntactically unacceptable token from the
-scanner.
+However, the parser recognizes the ability to ignore the lookahead token
+in this way only when such a reduction is encoded as a default
+reduction.
+Thus, if default reductions are permitted only in consistent states,
+then a canonical @acronym{LR} parser that does not employ
+@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the
+syntactically unacceptable token from the scanner.
 
 @item @code{accepting}.
 @cindex accepting state
-By default, the only default reduction permitted in a canonical
-@acronym{LR} parser is the accept action in the accepting state, which
-the parser reaches only after reading all tokens from the input.
-Thus, the default canonical @acronym{LR} parser reports a syntax error
-as soon as it @emph{reaches} the syntactically unacceptable token
-without performing any extra reductions.
+In the accepting state, the default reduction is actually the accept
+action.
+In this case, a canonical @acronym{LR} parser that does not employ
+@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the
+syntactically unacceptable token in the input.
+That is, it does not perform any extra reductions.
 @end itemize
 
 @item Default Value:
@@ -5400,17 +5404,23 @@ This can significantly reduce the complexity of developing of a grammar.
 @item @code{canonical-lr}.
 @cindex delayed syntax errors
 @cindex syntax errors delayed
-The only advantage of canonical @acronym{LR} over @acronym{IELR} is
-that, for every left context of every canonical @acronym{LR} state, the
-set of tokens accepted by that state is the exact set of tokens that is
-syntactically acceptable in that left context.
-Thus, the only difference in parsing behavior is that the canonical
-@acronym{LR} parser can report a syntax error as soon as possible
-without performing any unnecessary reductions.
-@xref{Decl Summary,,lr.default-reductions}, for further details.
-Even when canonical @acronym{LR} behavior is ultimately desired,
-@acronym{IELR}'s elimination of duplicate conflicts should still
-facilitate the development of a grammar.
+@cindex @acronym{LAC}
+@findex %nonassoc
+While inefficient, canonical @acronym{LR} parser tables can be an
+interesting means to explore a grammar because they have a property that
+@acronym{IELR} and @acronym{LALR} tables do not.
+That is, if @code{%nonassoc} is not used and default reductions are left
+disabled (@pxref{Decl Summary,,lr.default-reductions}), then, for every
+left context of every canonical @acronym{LR} state, the set of tokens
+accepted by that state is guaranteed to be the exact set of tokens that
+is syntactically acceptable in that left context.
+It might then seem that an advantage of canonical @acronym{LR} parsers
+in production is that, under the above constraints, they are guaranteed
+to detect a syntax error as soon as possible without performing any
+unnecessary reductions.
+However, @acronym{IELR} parsers using @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}) are also able to achieve this behavior without
+sacrificing @code{%nonassoc} or default reductions.
 @end itemize
 
 @item Default Value: @code{lalr}
@@ -5448,7 +5458,7 @@ destroyed properly. This option checks these constraints.
 @findex %define parse.error
 @itemize
 @item Language(s):
-all.
+all
 @item Purpose:
 Control the kind of error messages passed to the error reporting
 function. @xref{Error Reporting, ,The Error Reporting Function
@@ -5469,6 +5479,90 @@ ones.
 @c parse.error
 
 
+@c ================================================== parse.lac
+@item parse.lac
+@findex %define parse.lac
+@cindex @acronym{LAC}
+@cindex lookahead correction
+
+@itemize
+@item Language(s): C
+
+@item Purpose: Enable @acronym{LAC} (lookahead correction) to improve
+syntax error handling.
+
+Canonical @acronym{LR}, @acronym{IELR}, and @acronym{LALR} can suffer
+from a couple of problems upon encountering a syntax error. First, the
+parser might perform additional parser stack reductions before
+discovering the syntax error. Such reductions perform user semantic
+actions that are unexpected because they are based on an invalid token,
+and they cause error recovery to begin in a different syntactic context
+than the one in which the invalid token was encountered. Second, when
+verbose error messages are enabled (with @code{%error-verbose} or
+@code{#define YYERROR_VERBOSE}), the expected token list in the syntax
+error message can both contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are @code{%nonassoc}, default
+reductions in inconsistent states, and parser state merging. Thus,
+@acronym{IELR} and @acronym{LALR} suffer the most. Canonical
+@acronym{LR} can suffer only if @code{%nonassoc} is used or if default
+reductions are enabled for inconsistent states.
+
+@acronym{LAC} is a new mechanism within the parsing algorithm that
+completely solves these problems for canonical @acronym{LR},
+@acronym{IELR}, and @acronym{LALR} without sacrificing @code{%nonassoc},
+default reductions, or state merging. Conceptually, the mechanism is
+straightforward. Whenever the parser fetches a new token from the
+scanner so that it can determine the next parser action, it immediately
+suspends normal parsing and performs an exploratory parse using a
+temporary copy of the normal parser state stack. During this
+exploratory parse, the parser does not perform user semantic actions.
+If the exploratory parse reaches a shift action, normal parsing then
+resumes on the normal parser stacks. If the exploratory parse reaches
+an error instead, the parser reports a syntax error. If verbose syntax
+error messages are enabled, the parser must then discover the list of
+expected tokens, so it performs a separate exploratory parse for each
+token in the grammar.
+
+There is one subtlety about the use of @acronym{LAC}. That is, when in
+a consistent parser state with a default reduction, the parser will not
+attempt to fetch a token from the scanner because no lookahead is needed
+to determine the next parser action. Thus, whether default reductions
+are enabled in consistent states (@pxref{Decl
+Summary,,lr.default-reductions}) affects how soon the parser detects a
+syntax error: when it @emph{reaches} an erroneous token or when it
+eventually @emph{needs} that token as a lookahead. The latter behavior
+is probably more intuitive, so Bison currently provides no way to
+achieve the former behavior while default reductions are fully enabled.
+
+Thus, when @acronym{LAC} is in use, for some fixed decision of whether
+to enable default reductions in consistent states, canonical
+@acronym{LR} and @acronym{IELR} behave exactly the same for both
+syntactically acceptable and syntactically unacceptable input. While
+@acronym{LALR} still does not support the full language-recognition
+power of canonical @acronym{LR} and @acronym{IELR}, @acronym{LAC} at
+least enables @acronym{LALR}'s syntax error handling to correctly
+reflect @acronym{LALR}'s language-recognition power.
+
+Because @acronym{LAC} requires many parse actions to be performed twice,
+it can have a performance penalty. However, not all parse actions must
+be performed twice. Specifically, during a series of default reductions
+in consistent states and shift actions, the parser never has to initiate
+an exploratory parse. Moreover, the most time-consuming tasks in a
+parse are often the file I/O, the lexical analysis performed by the
+scanner, and the user's semantic actions, but none of these are
+performed during the exploratory parse. Finally, the base of the
+temporary stack used during an exploratory parse is a pointer into the
+normal parser state stack so that the stack is never physically copied.
+In our experience, the performance penalty of @acronym{LAC} has proven
+insignificant for practical grammars.
+
+@item Accepted Values: @code{none}, @code{full}
+
+@item Default Value: @code{none}
+@end itemize
+@c parse.lac
+
 @c ================================================== parse.trace
 @item parse.trace
 @findex %define parse.trace
@@ -11241,6 +11335,14 @@ performs some operation.
 @item Input stream
 A continuous flow of data between devices or programs.
 
+@item @acronym{LAC} (Lookahead Correction)
+A parsing mechanism that fixes the problem of delayed syntax error
+detection, which is caused by LR state merging, default reductions, and
+the use of @code{%nonassoc}. Delayed syntax error detection results in
+unexpected semantic actions, initiation of error recovery in the wrong
+syntactic context, and an incorrect list of expected tokens in a verbose
+syntax error message. @xref{Decl Summary,,parse.lac}.
+
 @item Language construct
 One of the typical usage schemas of the language. For example, one of
 the constructs of the C language is the @code{if} statement.
@@ -11397,7 +11499,7 @@ grammatically indivisible. The piece of text it represents is a token.
 @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
 @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
 @c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
-@c LocalWords: subexpressions declarator nondeferred config libintl postfix
+@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC
 @c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs
 @c LocalWords: yytokentype filename destructor multicharacter nonnull EBCDIC
 @c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK
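The exploratory parse that the new parse.lac entry describes can be illustrated with a small stand-alone C model. This is only a sketch under stated assumptions, not Bison's implementation: the three-rule grammar, the hand-written tables, and the names `action`, `reduce`, `lac_ok`, and `parse` are all hypothetical. The model also copies the state stack for the exploratory parse, whereas the documentation says Bison avoids a physical copy, and it always has the lookahead in hand, even in consistent states.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy grammar, for illustration only:
     r1: S -> A 'b'     r2: A -> 'a'     r3: A -> 'a' 'c'
   State 1 (after 'a') is inconsistent: it shifts on 'c' and otherwise
   default-reduces by r2, even for tokens that can never follow A. */
enum token { T_A, T_B, T_C, T_D, T_END };
enum { ERR = 0, ACC = 99 };   /* n > 0: shift to state n; n < 0: reduce by rule -n */
enum { MAXDEPTH = 16 };

static const int rule_len[]  = { 0, 2, 1, 2 };  /* right-hand side length per rule */
static const int rule_goto[] = { 0, 5, 2, 2 };  /* goto from state 0, the only state
                                                   a reduction exposes in this grammar */

static int action(int state, enum token tok)
{
  switch (state) {
  case 0: return tok == T_A ? 1 : ERR;
  case 1: return tok == T_C ? 3 : -2;  /* default reduction, inconsistent state */
  case 2: return tok == T_B ? 4 : ERR;
  case 3: return -3;                   /* consistent state */
  case 4: return -1;                   /* consistent state */
  case 5: return tok == T_END ? ACC : ERR;
  default: return ERR;
  }
}

/* Pop the rule's right-hand side and push the goto state. */
static void reduce(int *stack, int *top, int rule)
{
  *top -= rule_len[rule];
  assert(*top >= 0 && *top + 1 < MAXDEPTH);
  stack[++*top] = rule_goto[rule];
}

/* Exploratory parse: replay reductions on a scratch copy of the stack without
   running semantic actions.  Returns 1 if tok is eventually shifted or
   accepted, 0 if it leads to an error. */
static int lac_ok(const int *stack, int top, enum token tok)
{
  int scratch[MAXDEPTH];
  memcpy(scratch, stack, (size_t)(top + 1) * sizeof *stack);
  for (;;) {
    int act = action(scratch[top], tok);
    if (act == ERR) return 0;
    if (act > 0) return 1;             /* shift or accept */
    reduce(scratch, &top, -act);
  }
}

/* Parse a T_END-terminated token array.  Returns 1 on accept, 0 on syntax
   error.  *actions counts the reductions (stand-ins for user semantic
   actions) performed on the real stack. */
static int parse(const enum token *input, int use_lac, int *actions)
{
  int stack[MAXDEPTH] = { 0 };
  int top = 0, pos = 0, checked = 0;
  *actions = 0;
  for (;;) {
    enum token tok = input[pos];
    if (use_lac && !checked) {         /* a new token was just fetched */
      if (!lac_ok(stack, top, tok)) return 0;
      checked = 1;
    }
    int act = action(stack[top], tok);
    if (act == ERR) return 0;
    if (act == ACC) return 1;
    if (act > 0) {                     /* shift: consume the token */
      assert(top + 1 < MAXDEPTH);
      stack[++top] = act;
      pos++;
      checked = 0;
    } else {                           /* reduce: a semantic action would run here */
      reduce(stack, &top, -act);
      ++*actions;
    }
  }
}
```

In this model, for the invalid input `a d`, `parse` without LAC performs one reduction (A -> 'a') before it reports the error, while with LAC it reports the error before any reduction runs. For the valid input `a b`, both modes accept after the same two reductions.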