mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-04-24 10:39:38 +00:00
1281 lines
48 KiB
Plaintext
1281 lines
48 KiB
Plaintext
Ceci est le fichier Info bison.info, produit par Makeinfo version 4.0b
|
|
à partir bison.texinfo.
|
|
|
|
START-INFO-DIR-ENTRY
|
|
* bison: (bison). GNU Project parser generator (yacc replacement).
|
|
END-INFO-DIR-ENTRY
|
|
|
|
This file documents the Bison parser generator.
|
|
|
|
Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
|
|
2000, 2001 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to make and distribute verbatim copies of this
|
|
manual provided the copyright notice and this permission notice are
|
|
preserved on all copies.
|
|
|
|
Permission is granted to copy and distribute modified versions of
|
|
this manual under the conditions for verbatim copying, provided also
|
|
that the sections entitled "GNU General Public License" and "Conditions
|
|
for Using Bison" are included exactly as in the original, and provided
|
|
that the entire resulting derived work is distributed under the terms
|
|
of a permission notice identical to this one.
|
|
|
|
Permission is granted to copy and distribute translations of this
|
|
manual into another language, under the above conditions for modified
|
|
versions, except that the sections entitled "GNU General Public
|
|
License", "Conditions for Using Bison" and this permission notice may be
|
|
included in translations approved by the Free Software Foundation
|
|
instead of in the original English.
|
|
|
|
|
|
File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics
|
|
|
|
Actions in Mid-Rule
|
|
-------------------
|
|
|
|
Occasionally it is useful to put an action in the middle of a rule.
|
|
These actions are written just like usual end-of-rule actions, but they
|
|
are executed before the parser even recognizes the following components.
|
|
|
|
A mid-rule action may refer to the components preceding it using
|
|
`$N', but it may not refer to subsequent components because it is run
|
|
before they are parsed.
|
|
|
|
The mid-rule action itself counts as one of the components of the
|
|
rule. This makes a difference when there is another action later in
|
|
the same rule (and usually there is another at the end): you have to
|
|
count the actions along with the symbols when working out which number
|
|
N to use in `$N'.
|
|
|
|
The mid-rule action can also have a semantic value. The action can
|
|
set its value with an assignment to `$$', and actions later in the rule
|
|
can refer to the value using `$N'. Since there is no symbol to name
|
|
the action, there is no way to declare a data type for the value in
|
|
advance, so you must use the `$<...>N' construct to specify a data type
|
|
each time you refer to this value.
|
|
|
|
There is no way to set the value of the entire rule with a mid-rule
|
|
action, because assignments to `$$' do not have that effect. The only
|
|
way to set the value for the entire rule is with an ordinary action at
|
|
the end of the rule.
|
|
|
|
Here is an example from a hypothetical compiler, handling a `let'
|
|
statement that looks like `let (VARIABLE) STATEMENT' and serves to
|
|
create a variable named VARIABLE temporarily for the duration of
|
|
STATEMENT. To parse this construct, we must put VARIABLE into the
|
|
symbol table while STATEMENT is parsed, then remove it afterward. Here
|
|
is how it is done:
|
|
|
|
stmt: LET '(' var ')'
|
|
{ $<context>$ = push_context ();
|
|
declare_variable ($3); }
|
|
stmt { $$ = $6;
|
|
pop_context ($<context>5); }
|
|
|
|
As soon as `let (VARIABLE)' has been recognized, the first action is
|
|
run. It saves a copy of the current semantic context (the list of
|
|
accessible variables) as its semantic value, using alternative
|
|
`context' in the data-type union. Then it calls `declare_variable' to
|
|
add the new variable to that list. Once the first action is finished,
|
|
the embedded statement `stmt' can be parsed. Note that the mid-rule
|
|
action is component number 5, so the `stmt' is component number 6.
|
|
|
|
After the embedded statement is parsed, its semantic value becomes
|
|
the value of the entire `let'-statement. Then the semantic value from
|
|
the earlier action is used to restore the prior list of variables. This
|
|
removes the temporary `let'-variable from the list so that it won't
|
|
appear to exist while the rest of the program is parsed.
|
|
|
|
Taking action before a rule is completely recognized often leads to
|
|
conflicts since the parser must commit to a parse in order to execute
|
|
the action. For example, the following two rules, without mid-rule
|
|
actions, can coexist in a working parser because the parser can shift
|
|
the open-brace token and look at what follows before deciding whether
|
|
there is a declaration or not:
|
|
|
|
compound: '{' declarations statements '}'
|
|
| '{' statements '}'
|
|
;
|
|
|
|
But when we add a mid-rule action as follows, the rules become
|
|
nonfunctional:
|
|
|
|
compound: { prepare_for_local_variables (); }
|
|
'{' declarations statements '}'
|
|
| '{' statements '}'
|
|
;
|
|
|
|
Now the parser is forced to decide whether to run the mid-rule action
|
|
when it has read no farther than the open-brace. In other words, it
|
|
must commit to using one rule or the other, without sufficient
|
|
information to do it correctly. (The open-brace token is what is called
|
|
the "look-ahead" token at this time, since the parser is still deciding
|
|
what to do about it. *Note Look-Ahead Tokens: Look-Ahead.)
|
|
|
|
You might think that you could correct the problem by putting
|
|
identical actions into the two rules, like this:
|
|
|
|
compound: { prepare_for_local_variables (); }
|
|
'{' declarations statements '}'
|
|
| { prepare_for_local_variables (); }
|
|
'{' statements '}'
|
|
;
|
|
|
|
But this does not help, because Bison does not realize that the two
|
|
actions are identical. (Bison never tries to understand the C code in
|
|
an action.)
|
|
|
|
If the grammar is such that a declaration can be distinguished from a
|
|
statement by the first token (which is true in C), then one solution
|
|
which does work is to put the action after the open-brace, like this:
|
|
|
|
compound: '{' { prepare_for_local_variables (); }
|
|
declarations statements '}'
|
|
| '{' statements '}'
|
|
;
|
|
|
|
Now the first token of the following declaration or statement, which
|
|
would in any case tell Bison which rule to use, can still do so.
|
|
|
|
Another solution is to bury the action inside a nonterminal symbol
|
|
which serves as a subroutine:
|
|
|
|
subroutine: /* empty */
|
|
{ prepare_for_local_variables (); }
|
|
;
|
|
|
|
compound: subroutine
|
|
'{' declarations statements '}'
|
|
| subroutine
|
|
'{' statements '}'
|
|
;
|
|
|
|
Now Bison can execute the action in the rule for `subroutine' without
|
|
deciding which rule for `compound' it will eventually use. Note that
|
|
the action is now at the end of its rule. Any mid-rule action can be
|
|
converted to an end-of-rule action in this way, and this is what Bison
|
|
actually does to implement mid-rule actions.
|
|
|
|
|
|
File: bison.info, Node: Locations, Next: Declarations, Prev: Semantics, Up: Grammar File
|
|
|
|
Tracking Locations
|
|
==================
|
|
|
|
Though grammar rules and semantic actions are enough to write a fully
|
|
functional parser, it can be useful to process some additionnal
|
|
informations, especially symbol locations.
|
|
|
|
The way locations are handled is defined by providing a data type,
|
|
and actions to take when rules are matched.
|
|
|
|
* Menu:
|
|
|
|
* Location Type:: Specifying a data type for locations.
|
|
* Actions and Locations:: Using locations in actions.
|
|
* Location Default Action:: Defining a general way to compute locations.
|
|
|
|
|
|
File: bison.info, Node: Location Type, Next: Actions and Locations, Up: Locations
|
|
|
|
Data Type of Locations
|
|
----------------------
|
|
|
|
Defining a data type for locations is much simpler than for semantic
|
|
values, since all tokens and groupings always use the same type.
|
|
|
|
The type of locations is specified by defining a macro called
|
|
`YYLTYPE'. When `YYLTYPE' is not defined, Bison uses a default
|
|
structure type with four members:
|
|
|
|
struct
|
|
{
|
|
int first_line;
|
|
int first_column;
|
|
int last_line;
|
|
int last_column;
|
|
}
|
|
|
|
|
|
File: bison.info, Node: Actions and Locations, Next: Location Default Action, Prev: Location Type, Up: Locations
|
|
|
|
Actions and Locations
|
|
---------------------
|
|
|
|
Actions are not only useful for defining language semantics, but
|
|
also for describing the behavior of the output parser with locations.
|
|
|
|
The most obvious way for building locations of syntactic groupings
|
|
is very similar to the way semantic values are computed. In a given
|
|
rule, several constructs can be used to access the locations of the
|
|
elements being matched. The location of the Nth component of the right
|
|
hand side is `@N', while the location of the left hand side grouping is
|
|
`@$'.
|
|
|
|
Here is a basic example using the default data type for locations:
|
|
|
|
exp: ...
|
|
| exp '/' exp
|
|
{
|
|
@$.first_column = @1.first_column;
|
|
@$.first_line = @1.first_line;
|
|
@$.last_column = @3.last_column;
|
|
@$.last_line = @3.last_line;
|
|
if ($3)
|
|
$$ = $1 / $3;
|
|
else
|
|
{
|
|
$$ = 1;
|
|
printf("Division by zero, l%d,c%d-l%d,c%d",
|
|
@3.first_line, @3.first_column,
|
|
@3.last_line, @3.last_column);
|
|
}
|
|
}
|
|
|
|
As for semantic values, there is a default action for locations that
|
|
is run each time a rule is matched. It sets the beginning of `@$' to the
|
|
beginning of the first symbol, and the end of `@$' to the end of the
|
|
last symbol.
|
|
|
|
With this default action, the location tracking can be fully
|
|
automatic. The example above simply rewrites this way:
|
|
|
|
exp: ...
|
|
| exp '/' exp
|
|
{
|
|
if ($3)
|
|
$$ = $1 / $3;
|
|
else
|
|
{
|
|
$$ = 1;
|
|
printf("Division by zero, l%d,c%d-l%d,c%d",
|
|
@3.first_line, @3.first_column,
|
|
@3.last_line, @3.last_column);
|
|
}
|
|
}
|
|
|
|
|
|
File: bison.info, Node: Location Default Action, Prev: Actions and Locations, Up: Locations
|
|
|
|
Default Action for Locations
|
|
----------------------------
|
|
|
|
Actually, actions are not the best place to compute locations. Since
|
|
locations are much more general than semantic values, there is room in
|
|
the output parser to redefine the default action to take for each rule.
|
|
The `YYLLOC_DEFAULT' macro is called each time a rule is matched,
|
|
before the associated action is run.
|
|
|
|
Most of the time, this macro is general enough to suppress location
|
|
dedicated code from semantic actions.
|
|
|
|
The `YYLLOC_DEFAULT' macro takes three parameters. The first one is
|
|
the location of the grouping (the result of the computation). The
|
|
second one is an array holding locations of all right hand side
|
|
elements of the rule being matched. The last one is the size of the
|
|
right hand side rule.
|
|
|
|
By default, it is defined this way:
|
|
|
|
#define YYLLOC_DEFAULT(Current, Rhs, N) \
|
|
Current.last_line = Rhs[N].last_line; \
|
|
Current.last_column = Rhs[N].last_column;
|
|
|
|
When defining `YYLLOC_DEFAULT', you should consider that:
|
|
|
|
* All arguments are free of side-effects. However, only the first
|
|
one (the result) should be modified by `YYLLOC_DEFAULT'.
|
|
|
|
* Before `YYLLOC_DEFAULT' is executed, the output parser sets `@$'
|
|
to `@1'.
|
|
|
|
* For consistency with semantic actions, valid indexes for the
|
|
location array range from 1 to N.
|
|
|
|
|
|
File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Locations, Up: Grammar File
|
|
|
|
Bison Declarations
|
|
==================
|
|
|
|
The "Bison declarations" section of a Bison grammar defines the
|
|
symbols used in formulating the grammar and the data types of semantic
|
|
values. *Note Symbols::.
|
|
|
|
All token type names (but not single-character literal tokens such as
|
|
`'+'' and `'*'') must be declared. Nonterminal symbols must be
|
|
declared if you need to specify which data type to use for the semantic
|
|
value (*note More Than One Value Type: Multiple Types.).
|
|
|
|
The first rule in the file also specifies the start symbol, by
|
|
default. If you want some other symbol to be the start symbol, you
|
|
must declare it explicitly (*note Languages and Context-Free Grammars:
|
|
Language and Grammar.).
|
|
|
|
* Menu:
|
|
|
|
* Token Decl:: Declaring terminal symbols.
|
|
* Precedence Decl:: Declaring terminals with precedence and associativity.
|
|
* Union Decl:: Declaring the set of all semantic value types.
|
|
* Type Decl:: Declaring the choice of type for a nonterminal symbol.
|
|
* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
|
|
* Start Decl:: Specifying the start symbol.
|
|
* Pure Decl:: Requesting a reentrant parser.
|
|
* Decl Summary:: Table of all Bison declarations.
|
|
|
|
|
|
File: bison.info, Node: Token Decl, Next: Precedence Decl, Up: Declarations
|
|
|
|
Token Type Names
|
|
----------------
|
|
|
|
The basic way to declare a token type name (terminal symbol) is as
|
|
follows:
|
|
|
|
%token NAME
|
|
|
|
Bison will convert this into a `#define' directive in the parser, so
|
|
that the function `yylex' (if it is in this file) can use the name NAME
|
|
to stand for this token type's code.
|
|
|
|
Alternatively, you can use `%left', `%right', or `%nonassoc' instead
|
|
of `%token', if you wish to specify associativity and precedence.
|
|
*Note Operator Precedence: Precedence Decl.
|
|
|
|
You can explicitly specify the numeric code for a token type by
|
|
appending an integer value in the field immediately following the token
|
|
name:
|
|
|
|
%token NUM 300
|
|
|
|
It is generally best, however, to let Bison choose the numeric codes for
|
|
all token types. Bison will automatically select codes that don't
|
|
conflict with each other or with ASCII characters.
|
|
|
|
In the event that the stack type is a union, you must augment the
|
|
`%token' or other token declaration to include the data type
|
|
alternative delimited by angle-brackets (*note More Than One Value
|
|
Type: Multiple Types.).
|
|
|
|
For example:
|
|
|
|
%union { /* define stack type */
|
|
double val;
|
|
symrec *tptr;
|
|
}
|
|
%token <val> NUM /* define token NUM and its type */
|
|
|
|
You can associate a literal string token with a token type name by
|
|
writing the literal string at the end of a `%token' declaration which
|
|
declares the name. For example:
|
|
|
|
%token arrow "=>"
|
|
|
|
For example, a grammar for the C language might specify these names with
|
|
equivalent literal string tokens:
|
|
|
|
%token <operator> OR "||"
|
|
%token <operator> LE 134 "<="
|
|
%left OR "<="
|
|
|
|
Once you equate the literal string and the token name, you can use them
|
|
interchangeably in further declarations or the grammar rules. The
|
|
`yylex' function can use the token name or the literal string to obtain
|
|
the token type code number (*note Calling Convention::).
|
|
|
|
|
|
File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations
|
|
|
|
Operator Precedence
|
|
-------------------
|
|
|
|
Use the `%left', `%right' or `%nonassoc' declaration to declare a
|
|
token and specify its precedence and associativity, all at once. These
|
|
are called "precedence declarations". *Note Operator Precedence:
|
|
Precedence, for general information on operator precedence.
|
|
|
|
The syntax of a precedence declaration is the same as that of
|
|
`%token': either
|
|
|
|
%left SYMBOLS...
|
|
|
|
or
|
|
|
|
%left <TYPE> SYMBOLS...
|
|
|
|
And indeed any of these declarations serves the purposes of `%token'.
|
|
But in addition, they specify the associativity and relative precedence
|
|
for all the SYMBOLS:
|
|
|
|
* The associativity of an operator OP determines how repeated uses
|
|
of the operator nest: whether `X OP Y OP Z' is parsed by grouping
|
|
X with Y first or by grouping Y with Z first. `%left' specifies
|
|
left-associativity (grouping X with Y first) and `%right'
|
|
specifies right-associativity (grouping Y with Z first).
|
|
`%nonassoc' specifies no associativity, which means that `X OP Y
|
|
OP Z' is considered a syntax error.
|
|
|
|
* The precedence of an operator determines how it nests with other
|
|
operators. All the tokens declared in a single precedence
|
|
declaration have equal precedence and nest together according to
|
|
their associativity. When two tokens declared in different
|
|
precedence declarations associate, the one declared later has the
|
|
higher precedence and is grouped first.
|
|
|
|
|
|
File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations
|
|
|
|
The Collection of Value Types
|
|
-----------------------------
|
|
|
|
The `%union' declaration specifies the entire collection of possible
|
|
data types for semantic values. The keyword `%union' is followed by a
|
|
pair of braces containing the same thing that goes inside a `union' in
|
|
C.
|
|
|
|
For example:
|
|
|
|
%union {
|
|
double val;
|
|
symrec *tptr;
|
|
}
|
|
|
|
This says that the two alternative types are `double' and `symrec *'.
|
|
They are given names `val' and `tptr'; these names are used in the
|
|
`%token' and `%type' declarations to pick one of the types for a
|
|
terminal or nonterminal symbol (*note Nonterminal Symbols: Type Decl.).
|
|
|
|
Note that, unlike making a `union' declaration in C, you do not write
|
|
a semicolon after the closing brace.
|
|
|
|
|
|
File: bison.info, Node: Type Decl, Next: Expect Decl, Prev: Union Decl, Up: Declarations
|
|
|
|
Nonterminal Symbols
|
|
-------------------
|
|
|
|
When you use `%union' to specify multiple value types, you must declare
|
|
the value type of each nonterminal symbol for which values are used.
|
|
This is done with a `%type' declaration, like this:
|
|
|
|
%type <TYPE> NONTERMINAL...
|
|
|
|
Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the
|
|
name given in the `%union' to the alternative that you want (*note The
|
|
Collection of Value Types: Union Decl.). You can give any number of
|
|
nonterminal symbols in the same `%type' declaration, if they have the
|
|
same value type. Use spaces to separate the symbol names.
|
|
|
|
You can also declare the value type of a terminal symbol. To do
|
|
this, use the same `<TYPE>' construction in a declaration for the
|
|
terminal symbol. All kinds of token declarations allow `<TYPE>'.
|
|
|
|
|
|
File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Type Decl, Up: Declarations
|
|
|
|
Suppressing Conflict Warnings
|
|
-----------------------------
|
|
|
|
Bison normally warns if there are any conflicts in the grammar
|
|
(*note Shift/Reduce Conflicts: Shift/Reduce.), but most real grammars
|
|
have harmless shift/reduce conflicts which are resolved in a
|
|
predictable way and would be difficult to eliminate. It is desirable
|
|
to suppress the warning about these conflicts unless the number of
|
|
conflicts changes. You can do this with the `%expect' declaration.
|
|
|
|
The declaration looks like this:
|
|
|
|
%expect N
|
|
|
|
Here N is a decimal integer. The declaration says there should be no
|
|
warning if there are N shift/reduce conflicts and no reduce/reduce
|
|
conflicts. The usual warning is given if there are either more or fewer
|
|
conflicts, or if there are any reduce/reduce conflicts.
|
|
|
|
In general, using `%expect' involves these steps:
|
|
|
|
* Compile your grammar without `%expect'. Use the `-v' option to
|
|
get a verbose list of where the conflicts occur. Bison will also
|
|
print the number of conflicts.
|
|
|
|
* Check each of the conflicts to make sure that Bison's default
|
|
resolution is what you really want. If not, rewrite the grammar
|
|
and go back to the beginning.
|
|
|
|
* Add an `%expect' declaration, copying the number N from the number
|
|
which Bison printed.
|
|
|
|
Now Bison will stop annoying you about the conflicts you have
|
|
checked, but it will warn you again if changes in the grammar result in
|
|
additional conflicts.
|
|
|
|
|
|
File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations
|
|
|
|
The Start-Symbol
|
|
----------------
|
|
|
|
Bison assumes by default that the start symbol for the grammar is
|
|
the first nonterminal specified in the grammar specification section.
|
|
The programmer may override this restriction with the `%start'
|
|
declaration as follows:
|
|
|
|
%start SYMBOL
|
|
|
|
|
|
File: bison.info, Node: Pure Decl, Next: Decl Summary, Prev: Start Decl, Up: Declarations
|
|
|
|
A Pure (Reentrant) Parser
|
|
-------------------------
|
|
|
|
A "reentrant" program is one which does not alter in the course of
|
|
execution; in other words, it consists entirely of "pure" (read-only)
|
|
code. Reentrancy is important whenever asynchronous execution is
|
|
possible; for example, a non-reentrant program may not be safe to call
|
|
from a signal handler. In systems with multiple threads of control, a
|
|
non-reentrant program must be called only within interlocks.
|
|
|
|
Normally, Bison generates a parser which is not reentrant. This is
|
|
suitable for most uses, and it permits compatibility with YACC. (The
|
|
standard YACC interfaces are inherently nonreentrant, because they use
|
|
statically allocated variables for communication with `yylex',
|
|
including `yylval' and `yylloc'.)
|
|
|
|
Alternatively, you can generate a pure, reentrant parser. The Bison
|
|
declaration `%pure_parser' says that you want the parser to be
|
|
reentrant. It looks like this:
|
|
|
|
%pure_parser
|
|
|
|
The result is that the communication variables `yylval' and `yylloc'
|
|
become local variables in `yyparse', and a different calling convention
|
|
is used for the lexical analyzer function `yylex'. *Note Calling
|
|
Conventions for Pure Parsers: Pure Calling, for the details of this.
|
|
The variable `yynerrs' also becomes local in `yyparse' (*note The Error
|
|
Reporting Function `yyerror': Error Reporting.). The convention for
|
|
calling `yyparse' itself is unchanged.
|
|
|
|
Whether the parser is pure has nothing to do with the grammar rules.
|
|
You can generate either a pure parser or a nonreentrant parser from any
|
|
valid grammar.
|
|
|
|
|
|
File: bison.info, Node: Decl Summary, Prev: Pure Decl, Up: Declarations
|
|
|
|
Bison Declaration Summary
|
|
-------------------------
|
|
|
|
Here is a summary of all Bison declarations:
|
|
|
|
`%union'
|
|
Declare the collection of data types that semantic values may have
|
|
(*note The Collection of Value Types: Union Decl.).
|
|
|
|
`%token'
|
|
Declare a terminal symbol (token type name) with no precedence or
|
|
associativity specified (*note Token Type Names: Token Decl.).
|
|
|
|
`%right'
|
|
Declare a terminal symbol (token type name) that is
|
|
right-associative (*note Operator Precedence: Precedence Decl.).
|
|
|
|
`%left'
|
|
Declare a terminal symbol (token type name) that is
|
|
left-associative (*note Operator Precedence: Precedence Decl.).
|
|
|
|
`%nonassoc'
|
|
Declare a terminal symbol (token type name) that is nonassociative
|
|
(using it in a way that would be associative is a syntax error)
|
|
(*note Operator Precedence: Precedence Decl.).
|
|
|
|
`%type'
|
|
Declare the type of semantic values for a nonterminal symbol
|
|
(*note Nonterminal Symbols: Type Decl.).
|
|
|
|
`%start'
|
|
Specify the grammar's start symbol (*note The Start-Symbol: Start
|
|
Decl.).
|
|
|
|
`%expect'
|
|
Declare the expected number of shift-reduce conflicts (*note
|
|
Suppressing Conflict Warnings: Expect Decl.).
|
|
|
|
`%yacc'
|
|
`%fixed_output_files'
|
|
Pretend the option `--yacc' was given, i.e., imitate Yacc,
|
|
including its naming conventions. *Note Bison Options::, for more.
|
|
|
|
`%locations'
|
|
Generate the code processing the locations (*note Special Features
|
|
for Use in Actions: Action Features.). This mode is enabled as
|
|
soon as the grammar uses the special `@N' tokens, but if your
|
|
grammar does not use it, using `%locations' allows for more
|
|
accurate parse error messages.
|
|
|
|
`%pure_parser'
|
|
Request a pure (reentrant) parser program (*note A Pure
|
|
(Reentrant) Parser: Pure Decl.).
|
|
|
|
`%no_parser'
|
|
Do not include any C code in the parser file; generate tables
|
|
only. The parser file contains just `#define' directives and
|
|
static variable declarations.
|
|
|
|
This option also tells Bison to write the C code for the grammar
|
|
actions into a file named `FILENAME.act', in the form of a
|
|
brace-surrounded body fit for a `switch' statement.
|
|
|
|
`%no_lines'
|
|
Don't generate any `#line' preprocessor commands in the parser
|
|
file. Ordinarily Bison writes these commands in the parser file
|
|
so that the C compiler and debuggers will associate errors and
|
|
object code with your source file (the grammar file). This
|
|
directive causes them to associate errors with the parser file,
|
|
treating it an independent source file in its own right.
|
|
|
|
`%debug'
|
|
Output a definition of the macro `YYDEBUG' into the parser file, so
|
|
that the debugging facilities are compiled. *Note Debugging Your
|
|
Parser: Debugging.
|
|
|
|
`%defines'
|
|
Write an extra output file containing macro definitions for the
|
|
token type names defined in the grammar and the semantic value type
|
|
`YYSTYPE', as well as a few `extern' variable declarations.
|
|
|
|
If the parser output file is named `NAME.c' then this file is
|
|
named `NAME.h'.
|
|
|
|
This output file is essential if you wish to put the definition of
|
|
`yylex' in a separate source file, because `yylex' needs to be
|
|
able to refer to token type codes and the variable `yylval'.
|
|
*Note Semantic Values of Tokens: Token Values.
|
|
|
|
`%verbose'
|
|
Write an extra output file containing verbose descriptions of the
|
|
parser states and what is done for each type of look-ahead token in
|
|
that state.
|
|
|
|
This file also describes all the conflicts, both those resolved by
|
|
operator precedence and the unresolved ones.
|
|
|
|
The file's name is made by removing `.tab.c' or `.c' from the
|
|
parser output file name, and adding `.output' instead.
|
|
|
|
Therefore, if the input file is `foo.y', then the parser file is
|
|
called `foo.tab.c' by default. As a consequence, the verbose
|
|
output file is called `foo.output'.
|
|
|
|
`%token_table'
|
|
Generate an array of token names in the parser file. The name of
|
|
the array is `yytname'; `yytname[I]' is the name of the token
|
|
whose internal Bison token code number is I. The first three
|
|
elements of `yytname' are always `"$"', `"error"', and
|
|
`"$illegal"'; after these come the symbols defined in the grammar
|
|
file.
|
|
|
|
For single-character literal tokens and literal string tokens, the
|
|
name in the table includes the single-quote or double-quote
|
|
characters: for example, `"'+'"' is a single-character literal and
|
|
`"\"<=\""' is a literal string token. All the characters of the
|
|
literal string token appear verbatim in the string found in the
|
|
table; even double-quote characters are not escaped. For example,
|
|
if the token consists of three characters `*"*', its string in
|
|
`yytname' contains `"*"*"'. (In C, that would be written as
|
|
`"\"*\"*\""').
|
|
|
|
When you specify `%token_table', Bison also generates macro
|
|
definitions for macros `YYNTOKENS', `YYNNTS', and `YYNRULES', and
|
|
`YYNSTATES':
|
|
|
|
`YYNTOKENS'
|
|
The highest token number, plus one.
|
|
|
|
`YYNNTS'
|
|
The number of nonterminal symbols.
|
|
|
|
`YYNRULES'
|
|
The number of grammar rules,
|
|
|
|
`YYNSTATES'
|
|
The number of parser states (*note Parser States::).
|
|
|
|
|
|
File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File
|
|
|
|
Multiple Parsers in the Same Program
|
|
====================================
|
|
|
|
Most programs that use Bison parse only one language and therefore
|
|
contain only one Bison parser. But what if you want to parse more than
|
|
one language with the same program? Then you need to avoid a name
|
|
conflict between different definitions of `yyparse', `yylval', and so
|
|
on.
|
|
|
|
The easy way to do this is to use the option `-p PREFIX' (*note
|
|
Invoking Bison: Invocation.). This renames the interface functions and
|
|
variables of the Bison parser to start with PREFIX instead of `yy'.
|
|
You can use this to give each parser distinct names that do not
|
|
conflict.
|
|
|
|
The precise list of symbols renamed is `yyparse', `yylex',
|
|
`yyerror', `yynerrs', `yylval', `yychar' and `yydebug'. For example,
|
|
if you use `-p c', the names become `cparse', `clex', and so on.
|
|
|
|
*All the other variables and macros associated with Bison are not
|
|
renamed.* These others are not global; there is no conflict if the same
|
|
name is used in different parsers. For example, `YYSTYPE' is not
|
|
renamed, but defining this in different ways in different parsers causes
|
|
no trouble (*note Data Types of Semantic Values: Value Type.).
|
|
|
|
The `-p' option works by adding macro definitions to the beginning
|
|
of the parser source file, defining `yyparse' as `PREFIXparse', and so
|
|
on. This effectively substitutes one name for the other in the entire
|
|
parser file.
|
|
|
|
|
|
File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top
|
|
|
|
Parser C-Language Interface
|
|
***************************
|
|
|
|
The Bison parser is actually a C function named `yyparse'. Here we
|
|
describe the interface conventions of `yyparse' and the other functions
|
|
that it needs to use.
|
|
|
|
Keep in mind that the parser uses many C identifiers starting with
|
|
`yy' and `YY' for internal purposes. If you use such an identifier
|
|
(aside from those in this manual) in an action or in additional C code
|
|
in the grammar file, you are likely to run into trouble.
|
|
|
|
* Menu:
|
|
|
|
* Parser Function:: How to call `yyparse' and what it returns.
|
|
* Lexical:: You must supply a function `yylex'
|
|
which reads tokens.
|
|
* Error Reporting:: You must supply a function `yyerror'.
|
|
* Action Features:: Special features for use in actions.
|
|
|
|
|
|
File: bison.info, Node: Parser Function, Next: Lexical, Up: Interface
|
|
|
|
The Parser Function `yyparse'
|
|
=============================
|
|
|
|
You call the function `yyparse' to cause parsing to occur. This
|
|
function reads tokens, executes actions, and ultimately returns when it
|
|
encounters end-of-input or an unrecoverable syntax error. You can also
|
|
write an action which directs `yyparse' to return immediately without
|
|
reading further.
|
|
|
|
The value returned by `yyparse' is 0 if parsing was successful
|
|
(return is due to end-of-input).
|
|
|
|
The value is 1 if parsing failed (return is due to a syntax error).
|
|
|
|
In an action, you can cause immediate return from `yyparse' by using
|
|
these macros:
|
|
|
|
`YYACCEPT'
|
|
Return immediately with value 0 (to report success).
|
|
|
|
`YYABORT'
|
|
Return immediately with value 1 (to report failure).
|
|
|
|
|
|
File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Function, Up: Interface
|
|
|
|
The Lexical Analyzer Function `yylex'
|
|
=====================================
|
|
|
|
The "lexical analyzer" function, `yylex', recognizes tokens from the
|
|
input stream and returns them to the parser. Bison does not create
|
|
this function automatically; you must write it so that `yyparse' can
|
|
call it. The function is sometimes referred to as a lexical scanner.
|
|
|
|
In simple programs, `yylex' is often defined at the end of the Bison
|
|
grammar file. If `yylex' is defined in a separate source file, you
|
|
need to arrange for the token-type macro definitions to be available
|
|
there. To do this, use the `-d' option when you run Bison, so that it
|
|
will write these macro definitions into a separate header file
|
|
`NAME.tab.h' which you can include in the other source files that need
|
|
it. *Note Invoking Bison: Invocation.
|
|
|
|
* Menu:
|
|
|
|
* Calling Convention:: How `yyparse' calls `yylex'.
|
|
* Token Values:: How `yylex' must return the semantic value
|
|
of the token it has read.
|
|
* Token Positions:: How `yylex' must return the text position
|
|
(line number, etc.) of the token, if the
|
|
actions want that.
|
|
* Pure Calling:: How the calling convention differs
|
|
in a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.).
|
|
|
|
|
|
File: bison.info, Node: Calling Convention, Next: Token Values, Up: Lexical
|
|
|
|
Calling Convention for `yylex'
|
|
------------------------------
|
|
|
|
The value that `yylex' returns must be the numeric code for the type
|
|
of token it has just found, or 0 for end-of-input.
|
|
|
|
When a token is referred to in the grammar rules by a name, that name
|
|
in the parser file becomes a C macro whose definition is the proper
|
|
numeric code for that token type. So `yylex' can use the name to
|
|
indicate that type. *Note Symbols::.
|
|
|
|
When a token is referred to in the grammar rules by a character
|
|
literal, the numeric code for that character is also the code for the
|
|
token type. So `yylex' can simply return that character code. The
|
|
null character must not be used this way, because its code is zero and
|
|
that is what signifies end-of-input.
|
|
|
|
Here is an example showing these things:
|
|
|
|
int
|
|
yylex (void)
|
|
{
|
|
...
|
|
if (c == EOF) /* Detect end of file. */
|
|
return 0;
|
|
...
|
|
if (c == '+' || c == '-')
|
|
return c; /* Assume token type for `+' is '+'. */
|
|
...
|
|
return INT; /* Return the type of the token. */
|
|
...
|
|
}
|
|
|
|
This interface has been designed so that the output from the `lex'
|
|
utility can be used without change as the definition of `yylex'.
|
|
|
|
If the grammar uses literal string tokens, there are two ways that
|
|
`yylex' can determine the token type codes for them:
|
|
|
|
* If the grammar defines symbolic token names as aliases for the
|
|
literal string tokens, `yylex' can use these symbolic names like
|
|
all others. In this case, the use of the literal string tokens in
|
|
the grammar file has no effect on `yylex'.
|
|
|
|
* `yylex' can find the multicharacter token in the `yytname' table.
|
|
The index of the token in the table is the token type's code. The
|
|
name of a multicharacter token is recorded in `yytname' with a
|
|
double-quote, the token's characters, and another double-quote.
|
|
The token's characters are not escaped in any way; they appear
|
|
verbatim in the contents of the string in the table.
|
|
|
|
Here's code for looking up a token in `yytname', assuming that the
|
|
characters of the token are stored in `token_buffer'.
|
|
|
|
for (i = 0; i < YYNTOKENS; i++)
|
|
{
|
|
if (yytname[i] != 0
|
|
&& yytname[i][0] == '"'
|
|
&& strncmp (yytname[i] + 1, token_buffer,
|
|
strlen (token_buffer))
|
|
&& yytname[i][strlen (token_buffer) + 1] == '"'
|
|
&& yytname[i][strlen (token_buffer) + 2] == 0)
|
|
break;
|
|
}
|
|
|
|
The `yytname' table is generated only if you use the
|
|
`%token_table' declaration. *Note Decl Summary::.
|
|
|
|
|
|
File: bison.info, Node: Token Values, Next: Token Positions, Prev: Calling Convention, Up: Lexical
|
|
|
|
Semantic Values of Tokens
|
|
-------------------------
|
|
|
|
In an ordinary (non-reentrant) parser, the semantic value of the
|
|
token must be stored into the global variable `yylval'. When you are
|
|
using just one data type for semantic values, `yylval' has that type.
|
|
Thus, if the type is `int' (the default), you might write this in
|
|
`yylex':
|
|
|
|
...
|
|
yylval = value; /* Put value onto Bison stack. */
|
|
return INT; /* Return the type of the token. */
|
|
...
|
|
|
|
When you are using multiple data types, `yylval''s type is a union
|
|
made from the `%union' declaration (*note The Collection of Value
|
|
Types: Union Decl.). So when you store a token's value, you must use
|
|
the proper member of the union. If the `%union' declaration looks like
|
|
this:
|
|
|
|
%union {
|
|
int intval;
|
|
double val;
|
|
symrec *tptr;
|
|
}
|
|
|
|
then the code in `yylex' might look like this:
|
|
|
|
...
|
|
yylval.intval = value; /* Put value onto Bison stack. */
|
|
return INT; /* Return the type of the token. */
|
|
...
|
|
|
|
|
|
File: bison.info, Node: Token Positions, Next: Pure Calling, Prev: Token Values, Up: Lexical
|
|
|
|
Textual Positions of Tokens
|
|
---------------------------
|
|
|
|
If you are using the `@N'-feature (*note Tracking Locations:
|
|
Locations.) in actions to keep track of the textual locations of tokens
|
|
and groupings, then you must provide this information in `yylex'. The
|
|
function `yyparse' expects to find the textual location of a token just
|
|
parsed in the global variable `yylloc'. So `yylex' must store the
|
|
proper data in that variable.
|
|
|
|
By default, the value of `yylloc' is a structure and you need only
|
|
initialize the members that are going to be used by the actions. The
|
|
four members are called `first_line', `first_column', `last_line' and
|
|
`last_column'. Note that the use of this feature makes the parser
|
|
noticeably slower.
|
|
|
|
The data type of `yylloc' has the name `YYLTYPE'.
|
|
|
|
|
|
File: bison.info, Node: Pure Calling, Prev: Token Positions, Up: Lexical
|
|
|
|
Calling Conventions for Pure Parsers
|
|
------------------------------------
|
|
|
|
When you use the Bison declaration `%pure_parser' to request a pure,
|
|
reentrant parser, the global communication variables `yylval' and
|
|
`yylloc' cannot be used. (*Note A Pure (Reentrant) Parser: Pure Decl.)
|
|
In such parsers the two global variables are replaced by pointers
|
|
passed as arguments to `yylex'. You must declare them as shown here,
|
|
and pass the information back by storing it through those pointers.
|
|
|
|
int
|
|
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
|
|
{
|
|
...
|
|
*lvalp = value; /* Put value onto Bison stack. */
|
|
return INT; /* Return the type of the token. */
|
|
...
|
|
}
|
|
|
|
If the grammar file does not use the `@' constructs to refer to
|
|
textual positions, then the type `YYLTYPE' will not be defined. In
|
|
this case, omit the second argument; `yylex' will be called with only
|
|
one argument.
|
|
|
|
If you use a reentrant parser, you can optionally pass additional
|
|
parameter information to it in a reentrant way. To do so, define the
|
|
macro `YYPARSE_PARAM' as a variable name. This modifies the `yyparse'
|
|
function to accept one argument, of type `void *', with that name.
|
|
|
|
When you call `yyparse', pass the address of an object, casting the
|
|
address to `void *'. The grammar actions can refer to the contents of
|
|
the object by casting the pointer value back to its proper type and
|
|
then dereferencing it. Here's an example. Write this in the parser:
|
|
|
|
%{
|
|
struct parser_control
|
|
{
|
|
int nastiness;
|
|
int randomness;
|
|
};
|
|
|
|
#define YYPARSE_PARAM parm
|
|
%}
|
|
|
|
Then call the parser like this:
|
|
|
|
struct parser_control
|
|
{
|
|
int nastiness;
|
|
int randomness;
|
|
};
|
|
|
|
...
|
|
|
|
{
|
|
struct parser_control foo;
|
|
... /* Store proper data in `foo'. */
|
|
value = yyparse ((void *) &foo);
|
|
...
|
|
}
|
|
|
|
In the grammar actions, use expressions like this to refer to the data:
|
|
|
|
((struct parser_control *) parm)->randomness
|
|
|
|
If you wish to pass the additional parameter data to `yylex', define
|
|
the macro `YYLEX_PARAM' just like `YYPARSE_PARAM', as shown here:
|
|
|
|
%{
|
|
struct parser_control
|
|
{
|
|
int nastiness;
|
|
int randomness;
|
|
};
|
|
|
|
#define YYPARSE_PARAM parm
|
|
#define YYLEX_PARAM parm
|
|
%}
|
|
|
|
You should then define `yylex' to accept one additional
|
|
argument--the value of `parm'. (This makes either two or three
|
|
arguments in total, depending on whether an argument of type `YYLTYPE'
|
|
is passed.) You can declare the argument as a pointer to the proper
|
|
object type, or you can declare it as `void *' and access the contents
|
|
as shown above.
|
|
|
|
You can use `%pure_parser' to request a reentrant parser without
|
|
also using `YYPARSE_PARAM'. Then you should call `yyparse' with no
|
|
arguments, as usual.
|
|
|
|
|
|
File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface
|
|
|
|
The Error Reporting Function `yyerror'
|
|
======================================
|
|
|
|
The Bison parser detects a "parse error" or "syntax error" whenever
|
|
it reads a token which cannot satisfy any syntax rule. An action in
|
|
the grammar can also explicitly proclaim an error, using the macro
|
|
`YYERROR' (*note Special Features for Use in Actions: Action Features.).
|
|
|
|
The Bison parser expects to report the error by calling an error
|
|
reporting function named `yyerror', which you must supply. It is
|
|
called by `yyparse' whenever a syntax error is found, and it receives
|
|
one argument. For a parse error, the string is normally
|
|
`"parse error"'.
|
|
|
|
If you define the macro `YYERROR_VERBOSE' in the Bison declarations
|
|
section (*note The Bison Declarations Section: Bison Declarations.),
|
|
then Bison provides a more verbose and specific error message string
|
|
instead of just plain `"parse error"'. It doesn't matter what
|
|
definition you use for `YYERROR_VERBOSE', just whether you define it.
|
|
|
|
The parser can detect one other kind of error: stack overflow. This
|
|
happens when the input contains constructions that are very deeply
|
|
nested. It isn't likely you will encounter this, since the Bison
|
|
parser extends its stack automatically up to a very large limit. But
|
|
if overflow happens, `yyparse' calls `yyerror' in the usual fashion,
|
|
except that the argument string is `"parser stack overflow"'.
|
|
|
|
The following definition suffices in simple programs:
|
|
|
|
void
|
|
yyerror (char *s)
|
|
{
|
|
fprintf (stderr, "%s\n", s);
|
|
}
|
|
|
|
After `yyerror' returns to `yyparse', the latter will attempt error
|
|
recovery if you have written suitable error recovery grammar rules
|
|
(*note Error Recovery::). If recovery is impossible, `yyparse' will
|
|
immediately return 1.
|
|
|
|
The variable `yynerrs' contains the number of syntax errors
|
|
encountered so far. Normally this variable is global; but if you
|
|
request a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.)
|
|
then it is a local variable which only the actions can access.
|
|
|
|
|
|
File: bison.info, Node: Action Features, Prev: Error Reporting, Up: Interface
|
|
|
|
Special Features for Use in Actions
|
|
===================================
|
|
|
|
Here is a table of Bison constructs, variables and macros that are
|
|
useful in actions.
|
|
|
|
`$$'
|
|
Acts like a variable that contains the semantic value for the
|
|
grouping made by the current rule. *Note Actions::.
|
|
|
|
`$N'
|
|
Acts like a variable that contains the semantic value for the Nth
|
|
component of the current rule. *Note Actions::.
|
|
|
|
`$<TYPEALT>$'
|
|
Like `$$' but specifies alternative TYPEALT in the union specified
|
|
by the `%union' declaration. *Note Data Types of Values in
|
|
Actions: Action Types.
|
|
|
|
`$<TYPEALT>N'
|
|
Like `$N' but specifies alternative TYPEALT in the union specified
|
|
by the `%union' declaration. *Note Data Types of Values in
|
|
Actions: Action Types.
|
|
|
|
`YYABORT;'
|
|
Return immediately from `yyparse', indicating failure. *Note The
|
|
Parser Function `yyparse': Parser Function.
|
|
|
|
`YYACCEPT;'
|
|
Return immediately from `yyparse', indicating success. *Note The
|
|
Parser Function `yyparse': Parser Function.
|
|
|
|
`YYBACKUP (TOKEN, VALUE);'
|
|
Unshift a token. This macro is allowed only for rules that reduce
|
|
a single value, and only when there is no look-ahead token. It
|
|
installs a look-ahead token with token type TOKEN and semantic
|
|
value VALUE; then it discards the value that was going to be
|
|
reduced by this rule.
|
|
|
|
If the macro is used when it is not valid, such as when there is a
|
|
look-ahead token already, then it reports a syntax error with a
|
|
message `cannot back up' and performs ordinary error recovery.
|
|
|
|
In either case, the rest of the action is not executed.
|
|
|
|
`YYEMPTY'
|
|
Value stored in `yychar' when there is no look-ahead token.
|
|
|
|
`YYERROR;'
|
|
Cause an immediate syntax error. This statement initiates error
|
|
recovery just as if the parser itself had detected an error;
|
|
however, it does not call `yyerror', and does not print any
|
|
message. If you want to print an error message, call `yyerror'
|
|
explicitly before the `YYERROR;' statement. *Note Error
|
|
Recovery::.
|
|
|
|
`YYRECOVERING'
|
|
This macro stands for an expression that has the value 1 when the
|
|
parser is recovering from a syntax error, and 0 the rest of the
|
|
time. *Note Error Recovery::.
|
|
|
|
`yychar'
|
|
Variable containing the current look-ahead token. (In a pure
|
|
parser, this is actually a local variable within `yyparse'.) When
|
|
there is no look-ahead token, the value `YYEMPTY' is stored in the
|
|
variable. *Note Look-Ahead Tokens: Look-Ahead.
|
|
|
|
`yyclearin;'
|
|
Discard the current look-ahead token. This is useful primarily in
|
|
error rules. *Note Error Recovery::.
|
|
|
|
`yyerrok;'
|
|
Resume generating error messages immediately for subsequent syntax
|
|
errors. This is useful primarily in error rules. *Note Error
|
|
Recovery::.
|
|
|
|
`@$'
|
|
Acts like a structure variable containing information on the
|
|
textual position of the grouping made by the current rule. *Note
|
|
Tracking Locations: Locations.
|
|
|
|
`@N'
|
|
Acts like a structure variable containing information on the
|
|
textual position of the Nth component of the current rule. *Note
|
|
Tracking Locations: Locations.
|
|
|
|
|
|
File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top
|
|
|
|
The Bison Parser Algorithm
|
|
**************************
|
|
|
|
As Bison reads tokens, it pushes them onto a stack along with their
|
|
semantic values. The stack is called the "parser stack". Pushing a
|
|
token is traditionally called "shifting".
|
|
|
|
For example, suppose the infix calculator has read `1 + 5 *', with a
|
|
`3' to come. The stack will have four elements, one for each token
|
|
that was shifted.
|
|
|
|
But the stack does not always have an element for each token read.
|
|
When the last N tokens and groupings shifted match the components of a
|
|
grammar rule, they can be combined according to that rule. This is
|
|
called "reduction". Those tokens and groupings are replaced on the
|
|
stack by a single grouping whose symbol is the result (left hand side)
|
|
of that rule. Running the rule's action is part of the process of
|
|
reduction, because this is what computes the semantic value of the
|
|
resulting grouping.
|
|
|
|
For example, if the infix calculator's parser stack contains this:
|
|
|
|
1 + 5 * 3
|
|
|
|
and the next input token is a newline character, then the last three
|
|
elements can be reduced to 15 via the rule:
|
|
|
|
expr: expr '*' expr;
|
|
|
|
Then the stack contains just these three elements:
|
|
|
|
1 + 15
|
|
|
|
At this point, another reduction can be made, resulting in the single
|
|
value 16. Then the newline token can be shifted.
|
|
|
|
The parser tries, by shifts and reductions, to reduce the entire
|
|
input down to a single grouping whose symbol is the grammar's
|
|
start-symbol (*note Languages and Context-Free Grammars: Language and
|
|
Grammar.).
|
|
|
|
This kind of parser is known in the literature as a bottom-up parser.
|
|
|
|
* Menu:
|
|
|
|
* Look-Ahead:: Parser looks one token ahead when deciding what to do.
|
|
* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
|
|
* Precedence:: Operator precedence works by resolving conflicts.
|
|
* Contextual Precedence:: When an operator's precedence depends on context.
|
|
* Parser States:: The parser is a finite-state-machine with stack.
|
|
* Reduce/Reduce:: When two rules are applicable in the same situation.
|
|
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
|
|
* Stack Overflow:: What happens when stack gets full. How to avoid it.
|
|
|
|
|
|
File: bison.info, Node: Look-Ahead, Next: Shift/Reduce, Up: Algorithm
|
|
|
|
Look-Ahead Tokens
|
|
=================
|
|
|
|
The Bison parser does _not_ always reduce immediately as soon as the
|
|
last N tokens and groupings match a rule. This is because such a
|
|
simple strategy is inadequate to handle most languages. Instead, when a
|
|
reduction is possible, the parser sometimes "looks ahead" at the next
|
|
token in order to decide what to do.
|
|
|
|
When a token is read, it is not immediately shifted; first it
|
|
becomes the "look-ahead token", which is not on the stack. Now the
|
|
parser can perform one or more reductions of tokens and groupings on
|
|
the stack, while the look-ahead token remains off to the side. When no
|
|
more reductions should take place, the look-ahead token is shifted onto
|
|
the stack. This does not mean that all possible reductions have been
|
|
done; depending on the token type of the look-ahead token, some rules
|
|
may choose to delay their application.
|
|
|
|
Here is a simple case where look-ahead is needed. These three rules
|
|
define expressions which contain binary addition operators and postfix
|
|
unary factorial operators (`!'), and allow parentheses for grouping.
|
|
|
|
expr: term '+' expr
|
|
| term
|
|
;
|
|
|
|
term: '(' expr ')'
|
|
| term '!'
|
|
| NUMBER
|
|
;
|
|
|
|
Suppose that the tokens `1 + 2' have been read and shifted; what
|
|
should be done? If the following token is `)', then the first three
|
|
tokens must be reduced to form an `expr'. This is the only valid
|
|
course, because shifting the `)' would produce a sequence of symbols
|
|
`term ')'', and no rule allows this.
|
|
|
|
If the following token is `!', then it must be shifted immediately so
|
|
that `2 !' can be reduced to make a `term'. If instead the parser were
|
|
to reduce before shifting, `1 + 2' would become an `expr'. It would
|
|
then be impossible to shift the `!' because doing so would produce on
|
|
the stack the sequence of symbols `expr '!''. No rule allows that
|
|
sequence.
|
|
|
|
The current look-ahead token is stored in the variable `yychar'.
|
|
*Note Special Features for Use in Actions: Action Features.
|
|
|