mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-21 10:13:03 +00:00
doc: formatting changes
* doc/bison.texi: No visible changes.
189
doc/bison.texi
@@ -819,35 +819,32 @@ input. These are known respectively as @dfn{reduce/reduce} conflicts
 (@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts
 (@pxref{Shift/Reduce}).
 
-To use a grammar that is not easily modified to be LR(1), a
-more general parsing algorithm is sometimes necessary. If you include
-@code{%glr-parser} among the Bison declarations in your file
-(@pxref{Grammar Outline}), the result is a Generalized LR
-(GLR) parser. These parsers handle Bison grammars that
-contain no unresolved conflicts (i.e., after applying precedence
-declarations) identically to deterministic parsers. However, when
-faced with unresolved shift/reduce and reduce/reduce conflicts,
-GLR parsers use the simple expedient of doing both,
-effectively cloning the parser to follow both possibilities. Each of
-the resulting parsers can again split, so that at any given time, there
-can be any number of possible parses being explored. The parsers
-proceed in lockstep; that is, all of them consume (shift) a given input
-symbol before any of them proceed to the next. Each of the cloned
-parsers eventually meets one of two possible fates: either it runs into
-a parsing error, in which case it simply vanishes, or it merges with
-another parser, because the two of them have reduced the input to an
-identical set of symbols.
+To use a grammar that is not easily modified to be LR(1), a more general
+parsing algorithm is sometimes necessary. If you include @code{%glr-parser}
+among the Bison declarations in your file (@pxref{Grammar Outline}), the
+result is a Generalized LR (GLR) parser. These parsers handle Bison
+grammars that contain no unresolved conflicts (i.e., after applying
+precedence declarations) identically to deterministic parsers. However,
+when faced with unresolved shift/reduce and reduce/reduce conflicts, GLR
+parsers use the simple expedient of doing both, effectively cloning the
+parser to follow both possibilities. Each of the resulting parsers can
+again split, so that at any given time, there can be any number of possible
+parses being explored. The parsers proceed in lockstep; that is, all of
+them consume (shift) a given input symbol before any of them proceed to the
+next. Each of the cloned parsers eventually meets one of two possible
+fates: either it runs into a parsing error, in which case it simply
+vanishes, or it merges with another parser, because the two of them have
+reduced the input to an identical set of symbols.
 
 During the time that there are multiple parsers, semantic actions are
 recorded, but not performed. When a parser disappears, its recorded
 semantic actions disappear as well, and are never performed. When a
-reduction makes two parsers identical, causing them to merge, Bison
-records both sets of semantic actions. Whenever the last two parsers
-merge, reverting to the single-parser case, Bison resolves all the
-outstanding actions either by precedences given to the grammar rules
-involved, or by performing both actions, and then calling a designated
-user-defined function on the resulting values to produce an arbitrary
-merged result.
+reduction makes two parsers identical, causing them to merge, Bison records
+both sets of semantic actions. Whenever the last two parsers merge,
+reverting to the single-parser case, Bison resolves all the outstanding
+actions either by precedences given to the grammar rules involved, or by
+performing both actions, and then calling a designated user-defined function
+on the resulting values to produce an arbitrary merged result.
 
 @menu
 * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars.
@@ -881,13 +878,11 @@ type enum = (a, b, c);
 @end example
 
 @noindent
-The original language standard allows only numeric
-literals and constant identifiers for the subrange bounds (@samp{lo}
-and @samp{hi}), but Extended Pascal (ISO/IEC
-10206) and many other
-Pascal implementations allow arbitrary expressions there. This gives
-rise to the following situation, containing a superfluous pair of
-parentheses:
+The original language standard allows only numeric literals and constant
+identifiers for the subrange bounds (@samp{lo} and @samp{hi}), but Extended
+Pascal (ISO/IEC 10206) and many other Pascal implementations allow arbitrary
+expressions there. This gives rise to the following situation, containing a
+superfluous pair of parentheses:
 
 @example
 type subrange = (a) .. b;
@@ -902,62 +897,55 @@ type enum = (a);
 @end example
 
 @noindent
-(These declarations are contrived, but they are syntactically
-valid, and more-complicated cases can come up in practical programs.)
+(These declarations are contrived, but they are syntactically valid, and
+more-complicated cases can come up in practical programs.)
 
-These two declarations look identical until the @samp{..} token.
-With normal LR(1) one-token lookahead it is not
-possible to decide between the two forms when the identifier
-@samp{a} is parsed. It is, however, desirable
-for a parser to decide this, since in the latter case
-@samp{a} must become a new identifier to represent the enumeration
-value, while in the former case @samp{a} must be evaluated with its
-current meaning, which may be a constant or even a function call.
+These two declarations look identical until the @samp{..} token. With
+normal LR(1) one-token lookahead it is not possible to decide between the
+two forms when the identifier @samp{a} is parsed. It is, however, desirable
+for a parser to decide this, since in the latter case @samp{a} must become a
+new identifier to represent the enumeration value, while in the former case
+@samp{a} must be evaluated with its current meaning, which may be a constant
+or even a function call.
 
 You could parse @samp{(a)} as an ``unspecified identifier in parentheses'',
-to be resolved later, but this typically requires substantial
-contortions in both semantic actions and large parts of the
-grammar, where the parentheses are nested in the recursive rules for
-expressions.
+to be resolved later, but this typically requires substantial contortions in
+both semantic actions and large parts of the grammar, where the parentheses
+are nested in the recursive rules for expressions.
 
-You might think of using the lexer to distinguish between the two
-forms by returning different tokens for currently defined and
-undefined identifiers. But if these declarations occur in a local
-scope, and @samp{a} is defined in an outer scope, then both forms
-are possible---either locally redefining @samp{a}, or using the
-value of @samp{a} from the outer scope. So this approach cannot
-work.
+You might think of using the lexer to distinguish between the two forms by
+returning different tokens for currently defined and undefined identifiers.
+But if these declarations occur in a local scope, and @samp{a} is defined in
+an outer scope, then both forms are possible---either locally redefining
+@samp{a}, or using the value of @samp{a} from the outer scope. So this
+approach cannot work.
 
-A simple solution to this problem is to declare the parser to
-use the GLR algorithm.
-When the GLR parser reaches the critical state, it
-merely splits into two branches and pursues both syntax rules
-simultaneously. Sooner or later, one of them runs into a parsing
-error. If there is a @samp{..} token before the next
-@samp{;}, the rule for enumerated types fails since it cannot
-accept @samp{..} anywhere; otherwise, the subrange type rule
-fails since it requires a @samp{..} token. So one of the branches
-fails silently, and the other one continues normally, performing
-all the intermediate actions that were postponed during the split.
+A simple solution to this problem is to declare the parser to use the GLR
+algorithm. When the GLR parser reaches the critical state, it merely splits
+into two branches and pursues both syntax rules simultaneously. Sooner or
+later, one of them runs into a parsing error. If there is a @samp{..} token
+before the next @samp{;}, the rule for enumerated types fails since it
+cannot accept @samp{..} anywhere; otherwise, the subrange type rule fails
+since it requires a @samp{..} token. So one of the branches fails silently,
+and the other one continues normally, performing all the intermediate
+actions that were postponed during the split.
 
 If the input is syntactically incorrect, both branches fail and the parser
 reports a syntax error as usual.
 
-The effect of all this is that the parser seems to ``guess'' the
-correct branch to take, or in other words, it seems to use more
-lookahead than the underlying LR(1) algorithm actually allows
-for. In this example, LR(2) would suffice, but also some cases
-that are not LR(@math{k}) for any @math{k} can be handled this way.
+The effect of all this is that the parser seems to ``guess'' the correct
+branch to take, or in other words, it seems to use more lookahead than the
+underlying LR(1) algorithm actually allows for. In this example, LR(2)
+would suffice, but also some cases that are not LR(@math{k}) for any
+@math{k} can be handled this way.
 
-In general, a GLR parser can take quadratic or cubic worst-case time,
-and the current Bison parser even takes exponential time and space
-for some grammars. In practice, this rarely happens, and for many
-grammars it is possible to prove that it cannot happen.
-The present example contains only one conflict between two
-rules, and the type-declaration context containing the conflict
-cannot be nested. So the number of
-branches that can exist at any time is limited by the constant 2,
-and the parsing time is still linear.
+In general, a GLR parser can take quadratic or cubic worst-case time, and
+the current Bison parser even takes exponential time and space for some
+grammars. In practice, this rarely happens, and for many grammars it is
+possible to prove that it cannot happen. The present example contains only
+one conflict between two rules, and the type-declaration context containing
+the conflict cannot be nested. So the number of branches that can exist at
+any time is limited by the constant 2, and the parsing time is still linear.
 
 Here is a Bison grammar corresponding to the example above. It
 parses a vastly simplified form of Pascal type declarations.
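Note on the hunk above: the branch behavior it describes (split at the critical state, consume tokens in lockstep, let the failing branch vanish) can be illustrated with a small Python model. This is only a sketch under loud assumptions: it is not Bison's implementation and not the grammar the manual goes on to show. The two transition tables, the state names, and the function `parse_type_body` are hypothetical, and the subrange bounds are restricted to an identifier with at most one pair of parentheses. Deferred semantic actions and parser merging are not modeled.

```python
# Illustrative model only: two hand-written recognizers stand in for the two
# conflicting rules (enumerated type vs. subrange type). A real GLR parser
# works on LR stacks and defers semantic actions; none of that is modeled.

# Enumerated type body:  '(' ID (',' ID)* ')' ';'
ENUM_NEXT = {
    "start": {"(": "want_id"},
    "want_id": {"ID": "after_id"},
    "after_id": {",": "want_id", ")": "closed"},
    "closed": {";": "done"},
}

# Subrange type body:  bound '..' bound ';'   with bound = ID | '(' ID ')'
SUBRANGE_NEXT = {
    "start": {"(": "lo_paren", "ID": "after_lo"},
    "lo_paren": {"ID": "lo_close"},
    "lo_close": {")": "after_lo"},
    "after_lo": {"..": "want_hi"},
    "want_hi": {"(": "hi_paren", "ID": "after_hi"},
    "hi_paren": {"ID": "hi_close"},
    "hi_close": {")": "after_hi"},
    "after_hi": {";": "done"},
}


def parse_type_body(tokens):
    """Run both branches in lockstep over the tokens.

    Returns (accepted, history): the sorted names of branches that reached
    their final state, and the sorted live branch names after each token.
    """
    live = {"enum": (ENUM_NEXT, "start"), "subrange": (SUBRANGE_NEXT, "start")}
    history = []
    for tok in tokens:
        survivors = {}
        # Every live branch consumes the same token before any moves on.
        for name, (table, state) in live.items():
            nxt = table.get(state, {}).get(tok)
            if nxt is not None:
                survivors[name] = (table, nxt)
            # A branch with no transition simply vanishes.
        live = survivors
        history.append(sorted(live))
        if not live:
            break  # both branches failed: an ordinary syntax error
    accepted = sorted(n for n, (_, state) in live.items() if state == "done")
    return accepted, history
```

With the tokens for `(a) .. b;` the enum branch vanishes at `..`, and with the tokens for `(a);` the subrange branch vanishes at `;`. At most two branches are ever live in this model, which mirrors the constant bound of 2 argued in the text.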
@@ -1020,32 +1008,29 @@ these two declarations to the Bison grammar file (before the first
 @end example
 
 @noindent
-No change in the grammar itself is required. Now the
-parser recognizes all valid declarations, according to the
-limited syntax above, transparently. In fact, the user does not even
-notice when the parser splits.
+No change in the grammar itself is required. Now the parser recognizes all
+valid declarations, according to the limited syntax above, transparently.
+In fact, the user does not even notice when the parser splits.
 
-So here we have a case where we can use the benefits of GLR,
-almost without disadvantages. Even in simple cases like this, however,
-there are at least two potential problems to beware. First, always
-analyze the conflicts reported by Bison to make sure that GLR
-splitting is only done where it is intended. A GLR parser
-splitting inadvertently may cause problems less obvious than an
-LR parser statically choosing the wrong alternative in a
+So here we have a case where we can use the benefits of GLR, almost without
+disadvantages. Even in simple cases like this, however, there are at least
+two potential problems to beware. First, always analyze the conflicts
+reported by Bison to make sure that GLR splitting is only done where it is
+intended. A GLR parser splitting inadvertently may cause problems less
+obvious than an LR parser statically choosing the wrong alternative in a
 conflict. Second, consider interactions with the lexer (@pxref{Semantic
 Tokens}) with great care. Since a split parser consumes tokens without
-performing any actions during the split, the lexer cannot obtain
-information via parser actions. Some cases of lexer interactions can be
-eliminated by using GLR to shift the complications from the
-lexer to the parser. You must check the remaining cases for
-correctness.
+performing any actions during the split, the lexer cannot obtain information
+via parser actions. Some cases of lexer interactions can be eliminated by
+using GLR to shift the complications from the lexer to the parser. You must
+check the remaining cases for correctness.
 
 In our example, it would be safe for the lexer to return tokens based on
 their current meanings in some symbol table, because no new symbols are
-defined in the middle of a type declaration. Though it is possible for
-a parser to define the enumeration constants as they are parsed, before
-the type declaration is completed, it actually makes no difference since
-they cannot be used within the same enumerated type declaration.
+defined in the middle of a type declaration. Though it is possible for a
+parser to define the enumeration constants as they are parsed, before the
+type declaration is completed, it actually makes no difference since they
+cannot be used within the same enumerated type declaration.
 
 @node Merging GLR Parses
 @subsection Using GLR to Resolve Ambiguities
@@ -7084,10 +7069,10 @@ If the grammar uses literal string tokens, there are two ways that
 
 @itemize @bullet
 @item
-If the grammar defines symbolic token names as aliases for the
-literal string tokens, @code{yylex} can use these symbolic names like
-all others. In this case, the use of the literal string tokens in
-the grammar file has no effect on @code{yylex}.
+If the grammar defines symbolic token names as aliases for the literal
+string tokens, @code{yylex} can use these symbolic names like all others.
+In this case, the use of the literal string tokens in the grammar file has
+no effect on @code{yylex}.
 
 @item
 @code{yylex} can find the multicharacter token in the @code{yytname} table.