Mirror of https://git.savannah.gnu.org/git/bison.git (synced 2026-03-09 04:13:03 +00:00)
parse.lac: document.

* NEWS (2.5): Add entry for LAC, and mention LAC in entry for
other corrections to verbose syntax error messages.
* doc/bison.texinfo (Decl Summary): Rewrite entries for
lr.default-reductions and lr.type to be clearer, to mention
%nonassoc's effect on canonical LR, and to mention LAC. Add entry
for parse.lac.
(Glossary): Add entry for LAC.
(cherry picked from commit fcf834f9ec)

Conflicts:
doc/bison.texinfo
ChangeLog (11 lines changed)
@@ -1,3 +1,14 @@
+2010-12-19 Joel E. Denny <jdenny@clemson.edu>
+
+parse.lac: document.
+* NEWS (2.5): Add entry for LAC, and mention LAC in entry for
+other corrections to verbose syntax error messages.
+* doc/bison.texinfo (Decl Summary): Rewrite entries for
+lr.default-reductions and lr.type to be clearer, to mention
+%nonassoc's effect on canonical LR, and to mention LAC. Add entry
+for parse.lac.
+(Glossary): Add entry for LAC.
+
 2010-12-11 Joel E. Denny <jdenny@clemson.edu>

 parse.lac: implement exploratory stack reallocations.

NEWS (72 lines changed)
@@ -58,6 +58,46 @@ Bison News
 These features are experimental. More user feedback will help to
 stabilize them.

+** LAC (lookahead correction) for syntax error handling:
+
+Canonical LR, IELR, and LALR can suffer from a couple of problems
+upon encountering a syntax error. First, the parser might perform
+additional parser stack reductions before discovering the syntax
+error. Such reductions perform user semantic actions that are
+unexpected because they are based on an invalid token, and they
+cause error recovery to begin in a different syntactic context than
+the one in which the invalid token was encountered. Second, when
+verbose error messages are enabled (with %error-verbose or `#define
+YYERROR_VERBOSE'), the expected token list in the syntax error
+message can both contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are %nonassoc, default
+reductions in inconsistent states, and parser state merging. Thus,
+IELR and LALR suffer the most. Canonical LR can suffer only if
+%nonassoc is used or if default reductions are enabled for
+inconsistent states.
+
+LAC is a new mechanism within the parsing algorithm that completely
+solves these problems for canonical LR, IELR, and LALR without
+sacrificing %nonassoc, default reductions, or state merging. When
+LAC is in use, canonical LR and IELR behave exactly the same for
+both syntactically acceptable and syntactically unacceptable input.
+While LALR still does not support the full language-recognition
+power of canonical LR and IELR, LAC at least enables LALR's syntax
+error handling to correctly reflect LALR's language-recognition
+power.
+
+Currently, LAC is only supported for deterministic parsers in C.
+You can enable LAC with the following directive:
+
+%define parse.lac full
+
+See the documentation for `%define parse.lac' in the section `Bison
+Declaration Summary' in the Bison manual for additional details.
+
+LAC is an experimental feature. More user feedback will help to
+stabilize it.
+
 ** Unrecognized %code qualifiers are now an error not a warning.

 ** %define improvements.
@@ -166,11 +206,11 @@ Bison News

 ** Verbose syntax error message fixes:

-When %error-verbose or `#define YYERROR_VERBOSE' is specified, syntax
-error messages produced by the generated parser include the unexpected
-token as well as a list of expected tokens. The effect of %nonassoc
-on these verbose messages has been corrected in two ways, but
-additional fixes are still being implemented:
+When %error-verbose or `#define YYERROR_VERBOSE' is specified,
+syntax error messages produced by the generated parser include the
+unexpected token as well as a list of expected tokens. The effect
+of %nonassoc on these verbose messages has been corrected in two
+ways, but a complete fix requires LAC, described above:

 *** When %nonassoc is used, there can exist parser states that accept no
 tokens, and so the parser does not always require a lookahead token
@@ -189,16 +229,18 @@ Bison News
 tokens are now properly omitted from the list.

 *** Expected token lists are still often wrong due to state merging
-(from LALR or IELR) and default reductions, which can both add and
-subtract valid tokens. Canonical LR almost completely fixes this
-problem by eliminating state merging and default reductions.
-However, there is one minor problem left even when using canonical
-LR and even after the fixes above. That is, if the resolution of a
-conflict with %nonassoc appears in a later parser state than the one
-at which some syntax error is discovered, the conflicted token is
-still erroneously included in the expected token list. We are
-currently working on a fix to eliminate this problem and to
-eliminate the need for canonical LR.
+(from LALR or IELR) and default reductions, which can both add
+invalid tokens and subtract valid tokens. Canonical LR almost
+completely fixes this problem by eliminating state merging and
+default reductions. However, there is one minor problem left even
+when using canonical LR and even after the fixes above. That is,
+if the resolution of a conflict with %nonassoc appears in a later
+parser state than the one at which some syntax error is
+discovered, the conflicted token is still erroneously included in
+the expected token list. Bison's new LAC implementation,
+described above, eliminates this problem and the need for
+canonical LR. However, LAC is still experimental and is disabled
+by default.

 ** Destructor calls fixed for lookaheads altered in semantic actions.

doc/bison.texinfo
@@ -5028,57 +5028,61 @@ More user feedback will help to stabilize it.)
 @findex %define lr.default-reductions
 @cindex delayed syntax errors
 @cindex syntax errors delayed
+@cindex @acronym{LAC}
+@findex %nonassoc

 @itemize @bullet
 @item Language(s): all

-@item Purpose: Specifies the kind of states that are permitted to
+@item Purpose: Specify the kind of states that are permitted to
 contain default reductions.
-That is, in such a state, Bison declares the reduction with the largest
-lookahead set to be the default reduction and then removes that
+That is, in such a state, Bison selects the reduction with the largest
+lookahead set to be the default parser action and then removes that
 lookahead set.
-The advantages of default reductions are discussed below.
-The disadvantage is that, when the generated parser encounters a
-syntactically unacceptable token, the parser might then perform
-unnecessary default reductions before it can detect the syntax error.
-
-(This feature is experimental.
+(The ability to specify where default reductions should be used is
+experimental.
 More user feedback will help to stabilize it.)

 @item Accepted Values:
 @itemize
 @item @code{all}.
-For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl
-Summary,,lr.type}) by default, all states are permitted to contain
-default reductions.
-The advantage is that parser table sizes can be significantly reduced.
-The reason Bison does not by default attempt to address the disadvantage
-of delayed syntax error detection is that this disadvantage is already
-inherent in @acronym{LALR} and @acronym{IELR} parser tables.
-That is, unlike in a canonical @acronym{LR} state, the lookahead sets of
-reductions in an @acronym{LALR} or @acronym{IELR} state can contain
-tokens that are syntactically incorrect for some left contexts.
+This is the traditional Bison behavior.
+The main advantage is a significant decrease in the size of the parser
+tables.
+The disadvantage is that, when the generated parser encounters a
+syntactically unacceptable token, the parser might then perform
+unnecessary default reductions before it can detect the syntax error.
+Such delayed syntax error detection is usually inherent in
+@acronym{LALR} and @acronym{IELR} parser tables anyway due to
+@acronym{LR} state merging (@pxref{Decl Summary,,lr.type}).
+Furthermore, the use of @code{%nonassoc} can contribute to delayed
+syntax error detection even in the case of canonical @acronym{LR}.
+As an experimental feature, delayed syntax error detection can be
+overcome in all cases by enabling @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}, for details, including a discussion of the effects
+of delayed syntax error detection).

 @item @code{consistent}.
 @cindex consistent states
 A consistent state is a state that has only one possible action.
 If that action is a reduction, then the parser does not need to request
 a lookahead token from the scanner before performing that action.
-However, the parser only recognizes the ability to ignore the lookahead
-token when such a reduction is encoded as a default reduction.
-Thus, if default reductions are permitted in and only in consistent
-states, then a canonical @acronym{LR} parser reports a syntax error as
-soon as it @emph{needs} the syntactically unacceptable token from the
-scanner.
+However, the parser recognizes the ability to ignore the lookahead token
+in this way only when such a reduction is encoded as a default
+reduction.
+Thus, if default reductions are permitted only in consistent states,
+then a canonical @acronym{LR} parser that does not employ
+@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the
+syntactically unacceptable token from the scanner.

 @item @code{accepting}.
 @cindex accepting state
-By default, the only default reduction permitted in a canonical
-@acronym{LR} parser is the accept action in the accepting state, which
-the parser reaches only after reading all tokens from the input.
-Thus, the default canonical @acronym{LR} parser reports a syntax error
-as soon as it @emph{reaches} the syntactically unacceptable token
-without performing any extra reductions.
+In the accepting state, the default reduction is actually the accept
+action.
+In this case, a canonical @acronym{LR} parser that does not employ
+@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the
+syntactically unacceptable token in the input.
+That is, it does not perform any extra reductions.
 @end itemize

 @item Default Value:
@@ -5197,17 +5201,23 @@ This can significantly reduce the complexity of developing of a grammar.
 @item @code{canonical-lr}.
 @cindex delayed syntax errors
 @cindex syntax errors delayed
-The only advantage of canonical @acronym{LR} over @acronym{IELR} is
-that, for every left context of every canonical @acronym{LR} state, the
-set of tokens accepted by that state is the exact set of tokens that is
-syntactically acceptable in that left context.
-Thus, the only difference in parsing behavior is that the canonical
-@acronym{LR} parser can report a syntax error as soon as possible
-without performing any unnecessary reductions.
-@xref{Decl Summary,,lr.default-reductions}, for further details.
-Even when canonical @acronym{LR} behavior is ultimately desired,
-@acronym{IELR}'s elimination of duplicate conflicts should still
-facilitate the development of a grammar.
+@cindex @acronym{LAC}
+@findex %nonassoc
+While inefficient, canonical @acronym{LR} parser tables can be an
+interesting means to explore a grammar because they have a property that
+@acronym{IELR} and @acronym{LALR} tables do not.
+That is, if @code{%nonassoc} is not used and default reductions are left
+disabled (@pxref{Decl Summary,,lr.default-reductions}), then, for every
+left context of every canonical @acronym{LR} state, the set of tokens
+accepted by that state is guaranteed to be the exact set of tokens that
+is syntactically acceptable in that left context.
+It might then seem that an advantage of canonical @acronym{LR} parsers
+in production is that, under the above constraints, they are guaranteed
+to detect a syntax error as soon as possible without performing any
+unnecessary reductions.
+However, @acronym{IELR} parsers using @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}) are also able to achieve this behavior without
+sacrificing @code{%nonassoc} or default reductions.
 @end itemize

 @item Default Value: @code{lalr}
@@ -5264,6 +5274,89 @@ For example, if you specify:
 The parser namespace is @code{foo} and @code{yylex} is referenced as
 @code{bar::lex}.
 @end itemize
+
+@c ================================================== parse.lac
+@item parse.lac
+@findex %define parse.lac
+@cindex @acronym{LAC}
+@cindex lookahead correction
+
+@itemize
+@item Language(s): C
+
+@item Purpose: Enable @acronym{LAC} (lookahead correction) to improve
+syntax error handling.
+
+Canonical @acronym{LR}, @acronym{IELR}, and @acronym{LALR} can suffer
+from a couple of problems upon encountering a syntax error. First, the
+parser might perform additional parser stack reductions before
+discovering the syntax error. Such reductions perform user semantic
+actions that are unexpected because they are based on an invalid token,
+and they cause error recovery to begin in a different syntactic context
+than the one in which the invalid token was encountered. Second, when
+verbose error messages are enabled (with @code{%error-verbose} or
+@code{#define YYERROR_VERBOSE}), the expected token list in the syntax
+error message can both contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are @code{%nonassoc}, default
+reductions in inconsistent states, and parser state merging. Thus,
+@acronym{IELR} and @acronym{LALR} suffer the most. Canonical
+@acronym{LR} can suffer only if @code{%nonassoc} is used or if default
+reductions are enabled for inconsistent states.
+
+@acronym{LAC} is a new mechanism within the parsing algorithm that
+completely solves these problems for canonical @acronym{LR},
+@acronym{IELR}, and @acronym{LALR} without sacrificing @code{%nonassoc},
+default reductions, or state merging. Conceptually, the mechanism is
+straightforward. Whenever the parser fetches a new token from the
+scanner so that it can determine the next parser action, it immediately
+suspends normal parsing and performs an exploratory parse using a
+temporary copy of the normal parser state stack. During this
+exploratory parse, the parser does not perform user semantic actions.
+If the exploratory parse reaches a shift action, normal parsing then
+resumes on the normal parser stacks. If the exploratory parse reaches
+an error instead, the parser reports a syntax error. If verbose syntax
+error messages are enabled, the parser must then discover the list of
+expected tokens, so it performs a separate exploratory parse for each
+token in the grammar.
+
+There is one subtlety about the use of @acronym{LAC}. That is, when in
+a consistent parser state with a default reduction, the parser will not
+attempt to fetch a token from the scanner because no lookahead is needed
+to determine the next parser action. Thus, whether default reductions
+are enabled in consistent states (@pxref{Decl
+Summary,,lr.default-reductions}) affects how soon the parser detects a
+syntax error: when it @emph{reaches} an erroneous token or when it
+eventually @emph{needs} that token as a lookahead. The latter behavior
+is probably more intuitive, so Bison currently provides no way to
+achieve the former behavior while default reductions are fully enabled.
+
+Thus, when @acronym{LAC} is in use, for some fixed decision of whether
+to enable default reductions in consistent states, canonical
+@acronym{LR} and @acronym{IELR} behave exactly the same for both
+syntactically acceptable and syntactically unacceptable input. While
+@acronym{LALR} still does not support the full language-recognition
+power of canonical @acronym{LR} and @acronym{IELR}, @acronym{LAC} at
+least enables @acronym{LALR}'s syntax error handling to correctly
+reflect @acronym{LALR}'s language-recognition power.
+
+Because @acronym{LAC} requires many parse actions to be performed twice,
+it can have a performance penalty. However, not all parse actions must
+be performed twice. Specifically, during a series of default reductions
+in consistent states and shift actions, the parser never has to initiate
+an exploratory parse. Moreover, the most time-consuming tasks in a
+parse are often the file I/O, the lexical analysis performed by the
+scanner, and the user's semantic actions, but none of these are
+performed during the exploratory parse. Finally, the base of the
+temporary stack used during an exploratory parse is a pointer into the
+normal parser state stack so that the stack is never physically copied.
+In our experience, the performance penalty of @acronym{LAC} has proven
+insignificant for practical grammars.
+
+@item Accepted Values: @code{none}, @code{full}
+
+@item Default Value: @code{none}
+@end itemize
 @end itemize

 @end deffn
@@ -10588,6 +10681,14 @@ performs some operation.
 @item Input stream
 A continuous flow of data between devices or programs.

+@item @acronym{LAC} (Lookahead Correction)
+A parsing mechanism that fixes the problem of delayed syntax error
+detection, which is caused by LR state merging, default reductions, and
+the use of @code{%nonassoc}. Delayed syntax error detection results in
+unexpected semantic actions, initiation of error recovery in the wrong
+syntactic context, and an incorrect list of expected tokens in a verbose
+syntax error message. @xref{Decl Summary,,parse.lac}.
+
 @item Language construct
 One of the typical usage schemas of the language. For example, one of
 the constructs of the C language is the @code{if} statement.
@@ -10748,7 +10849,7 @@ grammatically indivisible. The piece of text it represents is a token.
 @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
 @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
 @c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
-@c LocalWords: subexpressions declarator nondeferred config libintl postfix
+@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC
 @c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs
 @c LocalWords: yytokentype filename destructor multicharacter nonnull EBCDIC
 @c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK
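
Note on the mechanism documented in the parse.lac entry: the exploratory parse can be illustrated with a small, self-contained C sketch. This is not Bison's generated parser code. The grammar, the hand-built table in `action_for`, and the names `lac_accepts` and `parse` are all hypothetical and chosen only for illustration. For simplicity the sketch copies the state stack (the documentation says Bison avoids a physical copy) and always has the lookahead available, so it does not model the subtlety about consistent states.

```c
#include <stdbool.h>

/* Hypothetical toy grammar (not taken from Bison):
     rule 1: E -> 'n'
     rule 2: E -> 'n' '+' E
   State 2 (after 'n') is inconsistent: it shifts on '+' and otherwise
   performs a default reduction by rule 1, even on an invalid lookahead.  */
enum { T_N, T_PLUS, T_END };
enum kind { ERR, SHIFT, REDUCE, ACCEPT };
struct action { enum kind kind; int arg; };  /* arg: next state or rule */

static const int rule_len[] = { 0, 1, 3 };   /* right-hand side lengths */

static struct action action_for(int state, int tok)
{
  switch (state) {
  case 0: case 3:                 /* expecting the start of an E */
    if (tok == T_N) return (struct action){ SHIFT, 2 };
    break;
  case 1:                         /* a complete E has been parsed */
    if (tok == T_END) return (struct action){ ACCEPT, 0 };
    break;
  case 2:                         /* after 'n' */
    if (tok == T_PLUS) return (struct action){ SHIFT, 3 };
    return (struct action){ REDUCE, 1 };     /* default reduction */
  case 4:                         /* after 'n' '+' E */
    return (struct action){ REDUCE, 2 };     /* default reduction */
  }
  return (struct action){ ERR, 0 };
}

/* GOTO on E: state 0 -> 1, state 3 -> 4.  */
static int goto_E(int state) { return state == 0 ? 1 : 4; }

/* Exploratory parse: replay reductions on a temporary copy of the state
   stack, without semantic actions, until TOK is shifted (or accepted)
   or an error is found.  */
static bool lac_accepts(const int *stack, int depth, int tok)
{
  int tmp[64];
  for (int i = 0; i < depth; i++)
    tmp[i] = stack[i];
  for (;;) {
    struct action a = action_for(tmp[depth - 1], tok);
    if (a.kind == ERR) return false;
    if (a.kind != REDUCE) return true;       /* shift or accept */
    depth -= rule_len[a.arg];
    tmp[depth] = goto_E(tmp[depth - 1]);
    depth++;
  }
}

/* Normal parse.  Returns true on acceptance; *reductions counts how many
   reductions (that is, user semantic actions) were actually performed.  */
static bool parse(const int *input, bool use_lac, int *reductions)
{
  int stack[64] = { 0 };
  int depth = 1;
  bool fresh = true;              /* lookahead not yet checked by LAC */
  *reductions = 0;
  for (;;) {
    int tok = *input;
    if (use_lac && fresh) {
      if (!lac_accepts(stack, depth, tok))
        return false;             /* error reported before any reduction */
      fresh = false;
    }
    struct action a = action_for(stack[depth - 1], tok);
    switch (a.kind) {
    case ERR:    return false;
    case ACCEPT: return true;
    case SHIFT:
      stack[depth++] = a.arg;
      input++;
      fresh = true;
      break;
    case REDUCE:
      ++*reductions;              /* a semantic action would run here */
      depth -= rule_len[a.arg];
      stack[depth] = goto_E(stack[depth - 1]);
      depth++;
      break;
    }
  }
}
```

On the invalid token sequence `n n`, calling `parse` with `use_lac` set to false rejects the input only after one reduction has run, while with `use_lac` set to true the input is rejected with zero reductions. On the valid sequence `n + n` followed by end of input, both settings accept after two reductions. This is the behavior the entry describes: the error is detected in the syntactic context where the invalid token was encountered.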