mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-04-24 02:29:43 +00:00
1387 lines
50 KiB
Plaintext
1387 lines
50 KiB
Plaintext
Ceci est le fichier Info bison.info, produit par Makeinfo version 4.0b
|
|
à partir bison.texinfo.
|
|
|
|
START-INFO-DIR-ENTRY
|
|
* bison: (bison). GNU Project parser generator (yacc replacement).
|
|
END-INFO-DIR-ENTRY
|
|
|
|
This file documents the Bison parser generator.
|
|
|
|
Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
|
|
2000, 2001 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to make and distribute verbatim copies of this
|
|
manual provided the copyright notice and this permission notice are
|
|
preserved on all copies.
|
|
|
|
Permission is granted to copy and distribute modified versions of
|
|
this manual under the conditions for verbatim copying, provided also
|
|
that the sections entitled "GNU General Public License" and "Conditions
|
|
for Using Bison" are included exactly as in the original, and provided
|
|
that the entire resulting derived work is distributed under the terms
|
|
of a permission notice identical to this one.
|
|
|
|
Permission is granted to copy and distribute translations of this
|
|
manual into another language, under the above conditions for modified
|
|
versions, except that the sections entitled "GNU General Public
|
|
License", "Conditions for Using Bison" and this permission notice may be
|
|
included in translations approved by the Free Software Foundation
|
|
instead of in the original English.
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Decls, Next: Rpcalc Rules, Up: RPN Calc
|
|
|
|
Declarations for `rpcalc'
|
|
-------------------------
|
|
|
|
Here are the C and Bison declarations for the reverse polish notation
|
|
calculator. As in C, comments are placed between `/*...*/'.
|
|
|
|
/* Reverse polish notation calculator. */
|
|
|
|
%{
|
|
#define YYSTYPE double
|
|
#include <math.h>
|
|
%}
|
|
|
|
%token NUM
|
|
|
|
%% /* Grammar rules and actions follow */
|
|
|
|
The C declarations section (*note The C Declarations Section: C
|
|
Declarations.) contains two preprocessor directives.
|
|
|
|
The `#define' directive defines the macro `YYSTYPE', thus specifying
|
|
the C data type for semantic values of both tokens and groupings (*note
|
|
Data Types of Semantic Values: Value Type.). The Bison parser will use
|
|
whatever type `YYSTYPE' is defined as; if you don't define it, `int' is
|
|
the default. Because we specify `double', each token and each
|
|
expression has an associated value, which is a floating point number.
|
|
|
|
The `#include' directive is used to declare the exponentiation
|
|
function `pow'.
|
|
|
|
The second section, Bison declarations, provides information to
|
|
Bison about the token types (*note The Bison Declarations Section:
|
|
Bison Declarations.). Each terminal symbol that is not a
|
|
single-character literal must be declared here. (Single-character
|
|
literals normally don't need to be declared.) In this example, all the
|
|
arithmetic operators are designated by single-character literals, so the
|
|
only terminal symbol that needs to be declared is `NUM', the token type
|
|
for numeric constants.
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Decls, Up: RPN Calc
|
|
|
|
Grammar Rules for `rpcalc'
|
|
--------------------------
|
|
|
|
Here are the grammar rules for the reverse polish notation
|
|
calculator.
|
|
|
|
input: /* empty */
|
|
| input line
|
|
;
|
|
|
|
line: '\n'
|
|
| exp '\n' { printf ("\t%.10g\n", $1); }
|
|
;
|
|
|
|
exp: NUM { $$ = $1; }
|
|
| exp exp '+' { $$ = $1 + $2; }
|
|
| exp exp '-' { $$ = $1 - $2; }
|
|
| exp exp '*' { $$ = $1 * $2; }
|
|
| exp exp '/' { $$ = $1 / $2; }
|
|
/* Exponentiation */
|
|
| exp exp '^' { $$ = pow ($1, $2); }
|
|
/* Unary minus */
|
|
| exp 'n' { $$ = -$1; }
|
|
;
|
|
%%
|
|
|
|
The groupings of the rpcalc "language" defined here are the
|
|
expression (given the name `exp'), the line of input (`line'), and the
|
|
complete input transcript (`input'). Each of these nonterminal symbols
|
|
has several alternate rules, joined by the `|' punctuator which is read
|
|
as "or". The following sections explain what these rules mean.
|
|
|
|
The semantics of the language is determined by the actions taken
|
|
when a grouping is recognized. The actions are the C code that appears
|
|
inside braces. *Note Actions::.
|
|
|
|
You must specify these actions in C, but Bison provides the means for
|
|
passing semantic values between the rules. In each action, the
|
|
pseudo-variable `$$' stands for the semantic value for the grouping
|
|
that the rule is going to construct. Assigning a value to `$$' is the
|
|
main job of most actions. The semantic values of the components of the
|
|
rule are referred to as `$1', `$2', and so on.
|
|
|
|
* Menu:
|
|
|
|
* Rpcalc Input::
|
|
* Rpcalc Line::
|
|
* Rpcalc Expr::
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules
|
|
|
|
Explanation of `input'
|
|
......................
|
|
|
|
Consider the definition of `input':
|
|
|
|
input: /* empty */
|
|
| input line
|
|
;
|
|
|
|
This definition reads as follows: "A complete input is either an
|
|
empty string, or a complete input followed by an input line". Notice
|
|
that "complete input" is defined in terms of itself. This definition
|
|
is said to be "left recursive" since `input' appears always as the
|
|
leftmost symbol in the sequence. *Note Recursive Rules: Recursion.
|
|
|
|
The first alternative is empty because there are no symbols between
|
|
the colon and the first `|'; this means that `input' can match an empty
|
|
string of input (no tokens). We write the rules this way because it is
|
|
legitimate to type `Ctrl-d' right after you start the calculator. It's
|
|
conventional to put an empty alternative first and write the comment
|
|
`/* empty */' in it.
|
|
|
|
The second alternate rule (`input line') handles all nontrivial
|
|
input. It means, "After reading any number of lines, read one more
|
|
line if possible." The left recursion makes this rule into a loop.
|
|
Since the first alternative matches empty input, the loop can be
|
|
executed zero or more times.
|
|
|
|
The parser function `yyparse' continues to process input until a
|
|
grammatical error is seen or the lexical analyzer says there are no more
|
|
input tokens; we will arrange for the latter to happen at end of file.
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules
|
|
|
|
Explanation of `line'
|
|
.....................
|
|
|
|
Now consider the definition of `line':
|
|
|
|
line: '\n'
|
|
| exp '\n' { printf ("\t%.10g\n", $1); }
|
|
;
|
|
|
|
The first alternative is a token which is a newline character; this
|
|
means that rpcalc accepts a blank line (and ignores it, since there is
|
|
no action). The second alternative is an expression followed by a
|
|
newline. This is the alternative that makes rpcalc useful. The
|
|
semantic value of the `exp' grouping is the value of `$1' because the
|
|
`exp' in question is the first symbol in the alternative. The action
|
|
prints this value, which is the result of the computation the user
|
|
asked for.
|
|
|
|
This action is unusual because it does not assign a value to `$$'.
|
|
As a consequence, the semantic value associated with the `line' is
|
|
uninitialized (its value will be unpredictable). This would be a bug if
|
|
that value were ever used, but we don't use it: once rpcalc has printed
|
|
the value of the user's input line, that value is no longer needed.
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules
|
|
|
|
Explanation of `expr'
|
|
.....................
|
|
|
|
The `exp' grouping has several rules, one for each kind of
|
|
expression. The first rule handles the simplest expressions: those
|
|
that are just numbers. The second handles an addition-expression,
|
|
which looks like two expressions followed by a plus-sign. The third
|
|
handles subtraction, and so on.
|
|
|
|
exp: NUM
|
|
| exp exp '+' { $$ = $1 + $2; }
|
|
| exp exp '-' { $$ = $1 - $2; }
|
|
...
|
|
;
|
|
|
|
We have used `|' to join all the rules for `exp', but we could
|
|
equally well have written them separately:
|
|
|
|
exp: NUM ;
|
|
exp: exp exp '+' { $$ = $1 + $2; } ;
|
|
exp: exp exp '-' { $$ = $1 - $2; } ;
|
|
...
|
|
|
|
Most of the rules have actions that compute the value of the
|
|
expression in terms of the value of its parts. For example, in the
|
|
rule for addition, `$1' refers to the first component `exp' and `$2'
|
|
refers to the second one. The third component, `'+'', has no meaningful
|
|
associated semantic value, but if it had one you could refer to it as
|
|
`$3'. When `yyparse' recognizes a sum expression using this rule, the
|
|
sum of the two subexpressions' values is produced as the value of the
|
|
entire expression. *Note Actions::.
|
|
|
|
You don't have to give an action for every rule. When a rule has no
|
|
action, Bison by default copies the value of `$1' into `$$'. This is
|
|
what happens in the first rule (the one that uses `NUM').
|
|
|
|
The formatting shown here is the recommended convention, but Bison
|
|
does not require it. You can add or change whitespace as much as you
|
|
wish. For example, this:
|
|
|
|
exp : NUM | exp exp '+' {$$ = $1 + $2; } | ...
|
|
|
|
means the same thing as this:
|
|
|
|
exp: NUM
|
|
| exp exp '+' { $$ = $1 + $2; }
|
|
| ...
|
|
|
|
The latter, however, is much more readable.
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc
|
|
|
|
The `rpcalc' Lexical Analyzer
|
|
-----------------------------
|
|
|
|
The lexical analyzer's job is low-level parsing: converting
|
|
characters or sequences of characters into tokens. The Bison parser
|
|
gets its tokens by calling the lexical analyzer. *Note The Lexical
|
|
Analyzer Function `yylex': Lexical.
|
|
|
|
Only a simple lexical analyzer is needed for the RPN calculator.
|
|
This lexical analyzer skips blanks and tabs, then reads in numbers as
|
|
`double' and returns them as `NUM' tokens. Any other character that
|
|
isn't part of a number is a separate token. Note that the token-code
|
|
for such a single-character token is the character itself.
|
|
|
|
The return value of the lexical analyzer function is a numeric code
|
|
which represents a token type. The same text used in Bison rules to
|
|
stand for this token type is also a C expression for the numeric code
|
|
for the type. This works in two ways. If the token type is a
|
|
character literal, then its numeric code is the ASCII code for that
|
|
character; you can use the same character literal in the lexical
|
|
analyzer to express the number. If the token type is an identifier,
|
|
that identifier is defined by Bison as a C macro whose definition is
|
|
the appropriate number. In this example, therefore, `NUM' becomes a
|
|
macro for `yylex' to use.
|
|
|
|
The semantic value of the token (if it has one) is stored into the
|
|
global variable `yylval', which is where the Bison parser will look for
|
|
it. (The C data type of `yylval' is `YYSTYPE', which was defined at
|
|
the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc
|
|
Decls..)
|
|
|
|
A token type code of zero is returned if the end-of-file is
|
|
encountered. (Bison recognizes any nonpositive value as indicating the
|
|
end of the input.)
|
|
|
|
Here is the code for the lexical analyzer:
|
|
|
|
/* Lexical analyzer returns a double floating point
|
|
number on the stack and the token NUM, or the ASCII
|
|
character read if not a number. Skips all blanks
|
|
and tabs, returns 0 for EOF. */
|
|
|
|
#include <ctype.h>
|
|
|
|
int
|
|
yylex (void)
|
|
{
|
|
int c;
|
|
|
|
/* skip white space */
|
|
while ((c = getchar ()) == ' ' || c == '\t')
|
|
;
|
|
/* process numbers */
|
|
if (c == '.' || isdigit (c))
|
|
{
|
|
ungetc (c, stdin);
|
|
scanf ("%lf", &yylval);
|
|
return NUM;
|
|
}
|
|
/* return end-of-file */
|
|
if (c == EOF)
|
|
return 0;
|
|
/* return single chars */
|
|
return c;
|
|
}
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc
|
|
|
|
The Controlling Function
|
|
------------------------
|
|
|
|
In keeping with the spirit of this example, the controlling function
|
|
is kept to the bare minimum. The only requirement is that it call
|
|
`yyparse' to start the process of parsing.
|
|
|
|
int
|
|
main (void)
|
|
{
|
|
return yyparse ();
|
|
}
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Error, Next: Rpcalc Gen, Prev: Rpcalc Main, Up: RPN Calc
|
|
|
|
The Error Reporting Routine
|
|
---------------------------
|
|
|
|
When `yyparse' detects a syntax error, it calls the error reporting
|
|
function `yyerror' to print an error message (usually but not always
|
|
`"parse error"'). It is up to the programmer to supply `yyerror'
|
|
(*note Parser C-Language Interface: Interface.), so here is the
|
|
definition we will use:
|
|
|
|
#include <stdio.h>
|
|
|
|
void
|
|
yyerror (const char *s) /* Called by yyparse on error */
|
|
{
|
|
printf ("%s\n", s);
|
|
}
|
|
|
|
After `yyerror' returns, the Bison parser may recover from the error
|
|
and continue parsing if the grammar contains a suitable error rule
|
|
(*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We
|
|
have not written any error rules in this example, so any invalid input
|
|
will cause the calculator program to exit. This is not clean behavior
|
|
for a real calculator, but it is adequate for the first example.
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Gen, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc
|
|
|
|
Running Bison to Make the Parser
|
|
--------------------------------
|
|
|
|
Before running Bison to produce a parser, we need to decide how to
|
|
arrange all the source code in one or more source files. For such a
|
|
simple example, the easiest thing is to put everything in one file. The
|
|
definitions of `yylex', `yyerror' and `main' go at the end, in the
|
|
"additional C code" section of the file (*note The Overall Layout of a
|
|
Bison Grammar: Grammar Layout.).
|
|
|
|
For a large project, you would probably have several source files,
|
|
and use `make' to arrange to recompile them.
|
|
|
|
With all the source in a single file, you use the following command
|
|
to convert it into a parser file:
|
|
|
|
bison FILE_NAME.y
|
|
|
|
In this example the file was called `rpcalc.y' (for "Reverse Polish
|
|
CALCulator"). Bison produces a file named `FILE_NAME.tab.c', removing
|
|
the `.y' from the original file name. The file output by Bison contains
|
|
the source code for `yyparse'. The additional functions in the input
|
|
file (`yylex', `yyerror' and `main') are copied verbatim to the output.
|
|
|
|
|
|
File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Gen, Up: RPN Calc
|
|
|
|
Compiling the Parser File
|
|
-------------------------
|
|
|
|
Here is how to compile and run the parser file:
|
|
|
|
# List files in current directory.
|
|
% ls
|
|
rpcalc.tab.c rpcalc.y
|
|
|
|
# Compile the Bison parser.
|
|
# `-lm' tells compiler to search math library for `pow'.
|
|
% cc rpcalc.tab.c -lm -o rpcalc
|
|
|
|
# List files again.
|
|
% ls
|
|
rpcalc rpcalc.tab.c rpcalc.y
|
|
|
|
The file `rpcalc' now contains the executable code. Here is an
|
|
example session using `rpcalc'.
|
|
|
|
% rpcalc
|
|
4 9 +
|
|
13
|
|
3 7 + 3 4 5 *+-
|
|
-13
|
|
3 7 + 3 4 5 * + - n Note the unary minus, `n'
|
|
13
|
|
5 6 / 4 n +
|
|
-3.166666667
|
|
3 4 ^ Exponentiation
|
|
81
|
|
^D End-of-file indicator
|
|
%
|
|
|
|
|
|
File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples
|
|
|
|
Infix Notation Calculator: `calc'
|
|
=================================
|
|
|
|
We now modify rpcalc to handle infix operators instead of postfix.
|
|
Infix notation involves the concept of operator precedence and the need
|
|
for parentheses nested to arbitrary depth. Here is the Bison code for
|
|
`calc.y', an infix desk-top calculator.
|
|
|
|
/* Infix notation calculator--calc */
|
|
|
|
%{
|
|
#define YYSTYPE double
|
|
#include <math.h>
|
|
%}
|
|
|
|
/* BISON Declarations */
|
|
%token NUM
|
|
%left '-' '+'
|
|
%left '*' '/'
|
|
%left NEG /* negation--unary minus */
|
|
%right '^' /* exponentiation */
|
|
|
|
/* Grammar follows */
|
|
%%
|
|
input: /* empty string */
|
|
| input line
|
|
;
|
|
|
|
line: '\n'
|
|
| exp '\n' { printf ("\t%.10g\n", $1); }
|
|
;
|
|
|
|
exp: NUM { $$ = $1; }
|
|
| exp '+' exp { $$ = $1 + $3; }
|
|
| exp '-' exp { $$ = $1 - $3; }
|
|
| exp '*' exp { $$ = $1 * $3; }
|
|
| exp '/' exp { $$ = $1 / $3; }
|
|
| '-' exp %prec NEG { $$ = -$2; }
|
|
| exp '^' exp { $$ = pow ($1, $3); }
|
|
| '(' exp ')' { $$ = $2; }
|
|
;
|
|
%%
|
|
|
|
The functions `yylex', `yyerror' and `main' can be the same as before.
|
|
|
|
There are two important new features shown in this code.
|
|
|
|
In the second section (Bison declarations), `%left' declares token
|
|
types and says they are left-associative operators. The declarations
|
|
`%left' and `%right' (right associativity) take the place of `%token'
|
|
which is used to declare a token type name without associativity.
|
|
(These tokens are single-character literals, which ordinarily don't
|
|
need to be declared. We declare them here to specify the
|
|
associativity.)
|
|
|
|
Operator precedence is determined by the line ordering of the
|
|
declarations; the higher the line number of the declaration (lower on
|
|
the page or screen), the higher the precedence. Hence, exponentiation
|
|
has the highest precedence, unary minus (`NEG') is next, followed by
|
|
`*' and `/', and so on. *Note Operator Precedence: Precedence.
|
|
|
|
The other important new feature is the `%prec' in the grammar section
|
|
for the unary minus operator. The `%prec' simply instructs Bison that
|
|
the rule `| '-' exp' has the same precedence as `NEG'--in this case the
|
|
next-to-highest. *Note Context-Dependent Precedence: Contextual
|
|
Precedence.
|
|
|
|
Here is a sample run of `calc.y':
|
|
|
|
% calc
|
|
4 + 4.5 - (34/(8*3+-3))
|
|
6.880952381
|
|
-56 + 2
|
|
-54
|
|
3 ^ 2
|
|
9
|
|
|
|
|
|
File: bison.info, Node: Simple Error Recovery, Next: Multi-function Calc, Prev: Infix Calc, Up: Examples
|
|
|
|
Simple Error Recovery
|
|
=====================
|
|
|
|
Up to this point, this manual has not addressed the issue of "error
|
|
recovery"--how to continue parsing after the parser detects a syntax
|
|
error. All we have handled is error reporting with `yyerror'. Recall
|
|
that by default `yyparse' returns after calling `yyerror'. This means
|
|
that an erroneous input line causes the calculator program to exit.
|
|
Now we show how to rectify this deficiency.
|
|
|
|
The Bison language itself includes the reserved word `error', which
|
|
may be included in the grammar rules. In the example below it has been
|
|
added to one of the alternatives for `line':
|
|
|
|
line: '\n'
|
|
| exp '\n' { printf ("\t%.10g\n", $1); }
|
|
| error '\n' { yyerrok; }
|
|
;
|
|
|
|
This addition to the grammar allows for simple error recovery in the
|
|
event of a parse error. If an expression that cannot be evaluated is
|
|
read, the error will be recognized by the third rule for `line', and
|
|
parsing will continue. (The `yyerror' function is still called upon to
|
|
print its message as well.) The action executes the statement
|
|
`yyerrok', a macro defined automatically by Bison; its meaning is that
|
|
error recovery is complete (*note Error Recovery::). Note the
|
|
difference between `yyerrok' and `yyerror'; neither one is a misprint.
|
|
|
|
This form of error recovery deals with syntax errors. There are
|
|
other kinds of errors; for example, division by zero, which raises an
|
|
exception signal that is normally fatal. A real calculator program
|
|
must handle this signal and use `longjmp' to return to `main' and
|
|
resume parsing input lines; it would also have to discard the rest of
|
|
the current line of input. We won't discuss this issue further because
|
|
it is not specific to Bison programs.
|
|
|
|
|
|
File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Simple Error Recovery, Up: Examples
|
|
|
|
Multi-Function Calculator: `mfcalc'
|
|
===================================
|
|
|
|
Now that the basics of Bison have been discussed, it is time to move
|
|
on to a more advanced problem. The above calculators provided only five
|
|
functions, `+', `-', `*', `/' and `^'. It would be nice to have a
|
|
calculator that provides other mathematical functions such as `sin',
|
|
`cos', etc.
|
|
|
|
It is easy to add new operators to the infix calculator as long as
|
|
they are only single-character literals. The lexical analyzer `yylex'
|
|
passes back all nonnumber characters as tokens, so new grammar rules
|
|
suffice for adding a new operator. But we want something more
|
|
flexible: built-in functions whose syntax has this form:
|
|
|
|
FUNCTION_NAME (ARGUMENT)
|
|
|
|
At the same time, we will add memory to the calculator, by allowing you
|
|
to create named variables, store values in them, and use them later.
|
|
Here is a sample session with the multi-function calculator:
|
|
|
|
% mfcalc
|
|
pi = 3.141592653589
|
|
3.1415926536
|
|
sin(pi)
|
|
0.0000000000
|
|
alpha = beta1 = 2.3
|
|
2.3000000000
|
|
alpha
|
|
2.3000000000
|
|
ln(alpha)
|
|
0.8329091229
|
|
exp(ln(beta1))
|
|
2.3000000000
|
|
%
|
|
|
|
Note that multiple assignment and nested function calls are
|
|
permitted.
|
|
|
|
* Menu:
|
|
|
|
* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
|
|
* Rules: Mfcalc Rules. Grammar rules for the calculator.
|
|
* Symtab: Mfcalc Symtab. Symbol table management subroutines.
|
|
|
|
|
|
File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Up: Multi-function Calc
|
|
|
|
Declarations for `mfcalc'
|
|
-------------------------
|
|
|
|
Here are the C and Bison declarations for the multi-function
|
|
calculator.
|
|
|
|
%{
|
|
#include <math.h> /* For math functions, cos(), sin(), etc. */
|
|
#include "calc.h" /* Contains definition of `symrec' */
|
|
%}
|
|
%union {
|
|
double val; /* For returning numbers. */
|
|
symrec *tptr; /* For returning symbol-table pointers */
|
|
}
|
|
|
|
%token <val> NUM /* Simple double precision number */
|
|
%token <tptr> VAR FNCT /* Variable and Function */
|
|
%type <val> exp
|
|
|
|
%right '='
|
|
%left '-' '+'
|
|
%left '*' '/'
|
|
%left NEG /* Negation--unary minus */
|
|
%right '^' /* Exponentiation */
|
|
|
|
/* Grammar follows */
|
|
|
|
%%
|
|
|
|
The above grammar introduces only two new features of the Bison
|
|
language. These features allow semantic values to have various data
|
|
types (*note More Than One Value Type: Multiple Types.).
|
|
|
|
The `%union' declaration specifies the entire list of possible types;
|
|
this is instead of defining `YYSTYPE'. The allowable types are now
|
|
double-floats (for `exp' and `NUM') and pointers to entries in the
|
|
symbol table. *Note The Collection of Value Types: Union Decl.
|
|
|
|
Since values can now have various types, it is necessary to
|
|
associate a type with each grammar symbol whose semantic value is used.
|
|
These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations
|
|
are augmented with information about their data type (placed between
|
|
angle brackets).
|
|
|
|
The Bison construct `%type' is used for declaring nonterminal
|
|
symbols, just as `%token' is used for declaring token types. We have
|
|
not used `%type' before because nonterminal symbols are normally
|
|
declared implicitly by the rules that define them. But `exp' must be
|
|
declared explicitly so we can specify its value type. *Note
|
|
Nonterminal Symbols: Type Decl.
|
|
|
|
|
|
File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc
|
|
|
|
Grammar Rules for `mfcalc'
|
|
--------------------------
|
|
|
|
Here are the grammar rules for the multi-function calculator. Most
|
|
of them are copied directly from `calc'; three rules, those which
|
|
mention `VAR' or `FNCT', are new.
|
|
|
|
input: /* empty */
|
|
| input line
|
|
;
|
|
|
|
line:
|
|
'\n'
|
|
| exp '\n' { printf ("\t%.10g\n", $1); }
|
|
| error '\n' { yyerrok; }
|
|
;
|
|
|
|
exp: NUM { $$ = $1; }
|
|
| VAR { $$ = $1->value.var; }
|
|
| VAR '=' exp { $$ = $3; $1->value.var = $3; }
|
|
| FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); }
|
|
| exp '+' exp { $$ = $1 + $3; }
|
|
| exp '-' exp { $$ = $1 - $3; }
|
|
| exp '*' exp { $$ = $1 * $3; }
|
|
| exp '/' exp { $$ = $1 / $3; }
|
|
| '-' exp %prec NEG { $$ = -$2; }
|
|
| exp '^' exp { $$ = pow ($1, $3); }
|
|
| '(' exp ')' { $$ = $2; }
|
|
;
|
|
/* End of grammar */
|
|
%%
|
|
|
|
|
|
File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc
|
|
|
|
The `mfcalc' Symbol Table
|
|
-------------------------
|
|
|
|
The multi-function calculator requires a symbol table to keep track
|
|
of the names and meanings of variables and functions. This doesn't
|
|
affect the grammar rules (except for the actions) or the Bison
|
|
declarations, but it requires some additional C functions for support.
|
|
|
|
The symbol table itself consists of a linked list of records. Its
|
|
definition, which is kept in the header `calc.h', is as follows. It
|
|
provides for either functions or variables to be placed in the table.
|
|
|
|
/* Fonctions type. */
|
|
typedef double (*func_t) (double);
|
|
|
|
/* Data type for links in the chain of symbols. */
|
|
struct symrec
|
|
{
|
|
char *name; /* name of symbol */
|
|
int type; /* type of symbol: either VAR or FNCT */
|
|
union
|
|
{
|
|
double var; /* value of a VAR */
|
|
func_t fnctptr; /* value of a FNCT */
|
|
} value;
|
|
struct symrec *next; /* link field */
|
|
};
|
|
|
|
typedef struct symrec symrec;
|
|
|
|
/* The symbol table: a chain of `struct symrec'. */
|
|
extern symrec *sym_table;
|
|
|
|
symrec *putsym (const char *, func_t);
|
|
symrec *getsym (const char *);
|
|
|
|
The new version of `main' includes a call to `init_table', a
|
|
function that initializes the symbol table. Here it is, and
|
|
`init_table' as well:
|
|
|
|
#include <stdio.h>
|
|
|
|
int
|
|
main (void)
|
|
{
|
|
init_table ();
|
|
return yyparse ();
|
|
}
|
|
|
|
void
|
|
yyerror (const char *s) /* Called by yyparse on error */
|
|
{
|
|
printf ("%s\n", s);
|
|
}
|
|
|
|
struct init
|
|
{
|
|
char *fname;
|
|
double (*fnct)(double);
|
|
};
|
|
|
|
struct init arith_fncts[] =
|
|
{
|
|
"sin", sin,
|
|
"cos", cos,
|
|
"atan", atan,
|
|
"ln", log,
|
|
"exp", exp,
|
|
"sqrt", sqrt,
|
|
0, 0
|
|
};
|
|
|
|
/* The symbol table: a chain of `struct symrec'. */
|
|
symrec *sym_table = (symrec *) 0;
|
|
|
|
/* Put arithmetic functions in table. */
|
|
void
|
|
init_table (void)
|
|
{
|
|
int i;
|
|
symrec *ptr;
|
|
for (i = 0; arith_fncts[i].fname != 0; i++)
|
|
{
|
|
ptr = putsym (arith_fncts[i].fname, FNCT);
|
|
ptr->value.fnctptr = arith_fncts[i].fnct;
|
|
}
|
|
}
|
|
|
|
By simply editing the initialization list and adding the necessary
|
|
include files, you can add additional functions to the calculator.
|
|
|
|
Two important functions allow look-up and installation of symbols in
|
|
the symbol table. The function `putsym' is passed a name and the type
|
|
(`VAR' or `FNCT') of the object to be installed. The object is linked
|
|
to the front of the list, and a pointer to the object is returned. The
|
|
function `getsym' is passed the name of the symbol to look up. If
|
|
found, a pointer to that symbol is returned; otherwise zero is returned.
|
|
|
|
symrec *
|
|
putsym (char *sym_name, int sym_type)
|
|
{
|
|
symrec *ptr;
|
|
ptr = (symrec *) malloc (sizeof (symrec));
|
|
ptr->name = (char *) malloc (strlen (sym_name) + 1);
|
|
strcpy (ptr->name,sym_name);
|
|
ptr->type = sym_type;
|
|
ptr->value.var = 0; /* set value to 0 even if fctn. */
|
|
ptr->next = (struct symrec *)sym_table;
|
|
sym_table = ptr;
|
|
return ptr;
|
|
}
|
|
|
|
symrec *
|
|
getsym (const char *sym_name)
|
|
{
|
|
symrec *ptr;
|
|
for (ptr = sym_table; ptr != (symrec *) 0;
|
|
ptr = (symrec *)ptr->next)
|
|
if (strcmp (ptr->name,sym_name) == 0)
|
|
return ptr;
|
|
return 0;
|
|
}
|
|
|
|
The function `yylex' must now recognize variables, numeric values,
|
|
and the single-character arithmetic operators. Strings of alphanumeric
|
|
characters with a leading non-digit are recognized as either variables
|
|
or functions depending on what the symbol table says about them.
|
|
|
|
The string is passed to `getsym' for look up in the symbol table. If
|
|
the name appears in the table, a pointer to its location and its type
|
|
(`VAR' or `FNCT') is returned to `yyparse'. If it is not already in
|
|
the table, then it is installed as a `VAR' using `putsym'. Again, a
|
|
pointer and its type (which must be `VAR') is returned to `yyparse'.
|
|
|
|
No change is needed in the handling of numeric values and arithmetic
|
|
operators in `yylex'.
|
|
|
|
#include <ctype.h>
|
|
|
|
int
|
|
yylex (void)
|
|
{
|
|
int c;
|
|
|
|
/* Ignore whitespace, get first nonwhite character. */
|
|
while ((c = getchar ()) == ' ' || c == '\t');
|
|
|
|
if (c == EOF)
|
|
return 0;
|
|
|
|
/* Char starts a number => parse the number. */
|
|
if (c == '.' || isdigit (c))
|
|
{
|
|
ungetc (c, stdin);
|
|
scanf ("%lf", &yylval.val);
|
|
return NUM;
|
|
}
|
|
|
|
/* Char starts an identifier => read the name. */
|
|
if (isalpha (c))
|
|
{
|
|
symrec *s;
|
|
static char *symbuf = 0;
|
|
static int length = 0;
|
|
int i;
|
|
|
|
/* Initially make the buffer long enough
|
|
for a 40-character symbol name. */
|
|
if (length == 0)
|
|
length = 40, symbuf = (char *)malloc (length + 1);
|
|
|
|
i = 0;
|
|
do
|
|
{
|
|
/* If buffer is full, make it bigger. */
|
|
if (i == length)
|
|
{
|
|
length *= 2;
|
|
symbuf = (char *)realloc (symbuf, length + 1);
|
|
}
|
|
/* Add this character to the buffer. */
|
|
symbuf[i++] = c;
|
|
/* Get another character. */
|
|
c = getchar ();
|
|
}
|
|
while (c != EOF && isalnum (c));
|
|
|
|
ungetc (c, stdin);
|
|
symbuf[i] = '\0';
|
|
|
|
s = getsym (symbuf);
|
|
if (s == 0)
|
|
s = putsym (symbuf, VAR);
|
|
yylval.tptr = s;
|
|
return s->type;
|
|
}
|
|
|
|
/* Any other character is a token by itself. */
|
|
return c;
|
|
}
|
|
|
|
This program is both powerful and flexible. You may easily add new
|
|
functions, and it is a simple job to modify this code to install
|
|
predefined variables such as `pi' or `e' as well.
|
|
|
|
|
|
File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples
|
|
|
|
Exercises
|
|
=========
|
|
|
|
1. Add some new functions from `math.h' to the initialization list.
|
|
|
|
2. Add another array that contains constants and their values. Then
|
|
modify `init_table' to add these constants to the symbol table.
|
|
It will be easiest to give the constants type `VAR'.
|
|
|
|
3. Make the program report an error if the user refers to an
|
|
uninitialized variable in any way except to store a value in it.
|
|
|
|
|
|
File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top
|
|
|
|
Bison Grammar Files
|
|
*******************
|
|
|
|
Bison takes as input a context-free grammar specification and
|
|
produces a C-language function that recognizes correct instances of the
|
|
grammar.
|
|
|
|
The Bison grammar input file conventionally has a name ending in
|
|
`.y'. *Note Invoking Bison: Invocation.
|
|
|
|
* Menu:
|
|
|
|
* Grammar Outline:: Overall layout of the grammar file.
|
|
* Symbols:: Terminal and nonterminal symbols.
|
|
* Rules:: How to write grammar rules.
|
|
* Recursion:: Writing recursive rules.
|
|
* Semantics:: Semantic values and actions.
|
|
* Locations:: Locations and actions.
|
|
* Declarations:: All kinds of Bison declarations are described here.
|
|
* Multiple Parsers:: Putting more than one Bison parser in one program.
|
|
|
|
|
|
File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File
|
|
|
|
Outline of a Bison Grammar
|
|
==========================
|
|
|
|
A Bison grammar file has four main sections, shown here with the
|
|
appropriate delimiters:
|
|
|
|
%{
|
|
C DECLARATIONS
|
|
%}
|
|
|
|
BISON DECLARATIONS
|
|
|
|
%%
|
|
GRAMMAR RULES
|
|
%%
|
|
|
|
ADDITIONAL C CODE
|
|
|
|
Comments enclosed in `/* ... */' may appear in any of the sections.
|
|
|
|
* Menu:
|
|
|
|
* C Declarations:: Syntax and usage of the C declarations section.
|
|
* Bison Declarations:: Syntax and usage of the Bison declarations section.
|
|
* Grammar Rules:: Syntax and usage of the grammar rules section.
|
|
* C Code:: Syntax and usage of the additional C code section.
|
|
|
|
|
|
File: bison.info, Node: C Declarations, Next: Bison Declarations, Up: Grammar Outline
|
|
|
|
The C Declarations Section
|
|
--------------------------
|
|
|
|
The C DECLARATIONS section contains macro definitions and
|
|
declarations of functions and variables that are used in the actions in
|
|
the grammar rules. These are copied to the beginning of the parser
|
|
file so that they precede the definition of `yyparse'. You can use
|
|
`#include' to get the declarations from a header file. If you don't
|
|
need any C declarations, you may omit the `%{' and `%}' delimiters that
|
|
bracket this section.
|
|
|
|
|
|
File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: C Declarations, Up: Grammar Outline
|
|
|
|
The Bison Declarations Section
|
|
------------------------------
|
|
|
|
The BISON DECLARATIONS section contains declarations that define
|
|
terminal and nonterminal symbols, specify precedence, and so on. In
|
|
some simple grammars you may not need any declarations. *Note Bison
|
|
Declarations: Declarations.
|
|
|
|
|
|
File: bison.info, Node: Grammar Rules, Next: C Code, Prev: Bison Declarations, Up: Grammar Outline
|
|
|
|
The Grammar Rules Section
|
|
-------------------------
|
|
|
|
The "grammar rules" section contains one or more Bison grammar
|
|
rules, and nothing else. *Note Syntax of Grammar Rules: Rules.
|
|
|
|
There must always be at least one grammar rule, and the first `%%'
|
|
(which precedes the grammar rules) may never be omitted even if it is
|
|
the first thing in the file.
|
|
|
|
|
|
File: bison.info, Node: C Code, Prev: Grammar Rules, Up: Grammar Outline
|
|
|
|
The Additional C Code Section
|
|
-----------------------------
|
|
|
|
The ADDITIONAL C CODE section is copied verbatim to the end of the
|
|
parser file, just as the C DECLARATIONS section is copied to the
|
|
beginning. This is the most convenient place to put anything that you
|
|
want to have in the parser file but which need not come before the
|
|
definition of `yyparse'. For example, the definitions of `yylex' and
|
|
`yyerror' often go here. *Note Parser C-Language Interface: Interface.
|
|
|
|
If the last section is empty, you may omit the `%%' that separates it
|
|
from the grammar rules.
|
|
|
|
The Bison parser itself contains many static variables whose names
|
|
start with `yy' and many macros whose names start with `YY'. It is a
|
|
good idea to avoid using any such names (except those documented in this
|
|
manual) in the additional C code section of the grammar file.
|
|
|
|
|
|
File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File
|
|
|
|
Symbols, Terminal and Nonterminal
|
|
=================================
|
|
|
|
"Symbols" in Bison grammars represent the grammatical classifications
|
|
of the language.
|
|
|
|
A "terminal symbol" (also known as a "token type") represents a
|
|
class of syntactically equivalent tokens. You use the symbol in grammar
|
|
rules to mean that a token in that class is allowed. The symbol is
|
|
represented in the Bison parser by a numeric code, and the `yylex'
|
|
function returns a token type code to indicate what kind of token has
|
|
been read. You don't need to know what the code value is; you can use
|
|
the symbol to stand for it.
|
|
|
|
A "nonterminal symbol" stands for a class of syntactically equivalent
|
|
groupings. The symbol name is used in writing grammar rules. By
|
|
convention, it should be all lower case.
|
|
|
|
Symbol names can contain letters, digits (not at the beginning),
|
|
underscores and periods. Periods make sense only in nonterminals.
|
|
|
|
There are three ways of writing terminal symbols in the grammar:
|
|
|
|
* A "named token type" is written with an identifier, like an
|
|
identifier in C. By convention, it should be all upper case. Each
|
|
such name must be defined with a Bison declaration such as
|
|
`%token'. *Note Token Type Names: Token Decl.
|
|
|
|
* A "character token type" (or "literal character token") is written
|
|
in the grammar using the same syntax used in C for character
|
|
constants; for example, `'+'' is a character token type. A
|
|
character token type doesn't need to be declared unless you need to
|
|
specify its semantic value data type (*note Data Types of Semantic
|
|
Values: Value Type.), associativity, or precedence (*note Operator
|
|
Precedence: Precedence.).
|
|
|
|
By convention, a character token type is used only to represent a
|
|
token that consists of that particular character. Thus, the token
|
|
type `'+'' is used to represent the character `+' as a token.
|
|
Nothing enforces this convention, but if you depart from it, your
|
|
program will confuse other readers.
|
|
|
|
All the usual escape sequences used in character literals in C can
|
|
be used in Bison as well, but you must not use the null character
|
|
as a character literal because its ASCII code, zero, is the code
|
|
`yylex' returns for end-of-input (*note Calling Convention for
|
|
`yylex': Calling Convention.).
|
|
|
|
* A "literal string token" is written like a C string constant; for
|
|
example, `"<="' is a literal string token. A literal string token
|
|
doesn't need to be declared unless you need to specify its semantic
|
|
value data type (*note Value Type::), associativity, or precedence
|
|
(*note Precedence::).
|
|
|
|
You can associate the literal string token with a symbolic name as
|
|
an alias, using the `%token' declaration (*note Token
|
|
Declarations: Token Decl.). If you don't do that, the lexical
|
|
analyzer has to retrieve the token number for the literal string
|
|
token from the `yytname' table (*note Calling Convention::).
|
|
|
|
*WARNING*: literal string tokens do not work in Yacc.
|
|
|
|
By convention, a literal string token is used only to represent a
|
|
token that consists of that particular string. Thus, you should
|
|
use the token type `"<="' to represent the string `<=' as a token.
|
|
Bison does not enforce this convention, but if you depart from
|
|
it, people who read your program will be confused.
|
|
|
|
All the escape sequences used in string literals in C can be used
|
|
in Bison as well. A literal string token must contain two or more
|
|
characters; for a token containing just one character, use a
|
|
character token (see above).
|
|
|
|
How you choose to write a terminal symbol has no effect on its
|
|
grammatical meaning. That depends only on where it appears in rules and
|
|
on when the parser function returns that symbol.
|
|
|
|
The value returned by `yylex' is always one of the terminal symbols
|
|
(or 0 for end-of-input). Whichever way you write the token type in the
|
|
grammar rules, you write it the same way in the definition of `yylex'.
|
|
The numeric code for a character token type is simply the ASCII code for
|
|
the character, so `yylex' can use the identical character constant to
|
|
generate the requisite code. Each named token type becomes a C macro in
|
|
the parser file, so `yylex' can use the name to stand for the code.
|
|
(This is why periods don't make sense in terminal symbols.) *Note
|
|
Calling Convention for `yylex': Calling Convention.
|
|
|
|
If `yylex' is defined in a separate file, you need to arrange for the
|
|
token-type macro definitions to be available there. Use the `-d'
|
|
option when you run Bison, so that it will write these macro definitions
|
|
into a separate header file `NAME.tab.h' which you can include in the
|
|
other source files that need it. *Note Invoking Bison: Invocation.
|
|
|
|
The symbol `error' is a terminal symbol reserved for error recovery
|
|
(*note Error Recovery::); you shouldn't use it for any other purpose.
|
|
In particular, `yylex' should never return this value.
|
|
|
|
|
|
File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File
|
|
|
|
Syntax of Grammar Rules
|
|
=======================
|
|
|
|
A Bison grammar rule has the following general form:
|
|
|
|
RESULT: COMPONENTS...
|
|
;
|
|
|
|
where RESULT is the nonterminal symbol that this rule describes, and
|
|
COMPONENTS are various terminal and nonterminal symbols that are put
|
|
together by this rule (*note Symbols::).
|
|
|
|
For example,
|
|
|
|
exp: exp '+' exp
|
|
;
|
|
|
|
says that two groupings of type `exp', with a `+' token in between, can
|
|
be combined into a larger grouping of type `exp'.
|
|
|
|
Whitespace in rules is significant only to separate symbols. You
|
|
can add extra whitespace as you wish.
|
|
|
|
Scattered among the components can be ACTIONS that determine the
|
|
semantics of the rule. An action looks like this:
|
|
|
|
{C STATEMENTS}
|
|
|
|
Usually there is only one action and it follows the components. *Note
|
|
Actions::.
|
|
|
|
Multiple rules for the same RESULT can be written separately or can
|
|
be joined with the vertical-bar character `|' as follows:
|
|
|
|
RESULT: RULE1-COMPONENTS...
|
|
| RULE2-COMPONENTS...
|
|
...
|
|
;
|
|
|
|
They are still considered distinct rules even when joined in this way.
|
|
|
|
If COMPONENTS in a rule is empty, it means that RESULT can match the
|
|
empty string. For example, here is how to define a comma-separated
|
|
sequence of zero or more `exp' groupings:
|
|
|
|
expseq: /* empty */
|
|
| expseq1
|
|
;
|
|
|
|
expseq1: exp
|
|
| expseq1 ',' exp
|
|
;
|
|
|
|
It is customary to write a comment `/* empty */' in each rule with no
|
|
components.
|
|
|
|
|
|
File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File
|
|
|
|
Recursive Rules
|
|
===============
|
|
|
|
A rule is called "recursive" when its RESULT nonterminal appears
|
|
also on its right hand side. Nearly all Bison grammars need to use
|
|
recursion, because that is the only way to define a sequence of any
|
|
number of a particular thing. Consider this recursive definition of a
|
|
comma-separated sequence of one or more expressions:
|
|
|
|
expseq1: exp
|
|
| expseq1 ',' exp
|
|
;
|
|
|
|
Since the recursive use of `expseq1' is the leftmost symbol in the
|
|
right hand side, we call this "left recursion". By contrast, here the
|
|
same construct is defined using "right recursion":
|
|
|
|
expseq1: exp
|
|
| exp ',' expseq1
|
|
;
|
|
|
|
Any kind of sequence can be defined using either left recursion or
|
|
right recursion, but you should always use left recursion, because it
|
|
can parse a sequence of any number of elements with bounded stack
|
|
space. Right recursion uses up space on the Bison stack in proportion
|
|
to the number of elements in the sequence, because all the elements
|
|
must be shifted onto the stack before the rule can be applied even
|
|
once. *Note The Bison Parser Algorithm: Algorithm, for further
|
|
explanation of this.
|
|
|
|
"Indirect" or "mutual" recursion occurs when the result of the rule
|
|
does not appear directly on its right hand side, but does appear in
|
|
rules for other nonterminals which do appear on its right hand side.
|
|
|
|
For example:
|
|
|
|
expr: primary
|
|
| primary '+' primary
|
|
;
|
|
|
|
primary: constant
|
|
| '(' expr ')'
|
|
;
|
|
|
|
defines two mutually-recursive nonterminals, since each refers to the
|
|
other.
|
|
|
|
|
|
File: bison.info, Node: Semantics, Next: Locations, Prev: Recursion, Up: Grammar File
|
|
|
|
Defining Language Semantics
|
|
===========================
|
|
|
|
The grammar rules for a language determine only the syntax. The
|
|
semantics are determined by the semantic values associated with various
|
|
tokens and groupings, and by the actions taken when various groupings
|
|
are recognized.
|
|
|
|
For example, the calculator calculates properly because the value
|
|
associated with each expression is the proper number; it adds properly
|
|
because the action for the grouping `X + Y' is to add the numbers
|
|
associated with X and Y.
|
|
|
|
* Menu:
|
|
|
|
* Value Type:: Specifying one data type for all semantic values.
|
|
* Multiple Types:: Specifying several alternative data types.
|
|
* Actions:: An action is the semantic definition of a grammar rule.
|
|
* Action Types:: Specifying data types for actions to operate on.
|
|
* Mid-Rule Actions:: Most actions go at the end of a rule.
|
|
This says when, why and how to use the exceptional
|
|
action in the middle of a rule.
|
|
|
|
|
|
File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics
|
|
|
|
Data Types of Semantic Values
|
|
-----------------------------
|
|
|
|
In a simple program it may be sufficient to use the same data type
|
|
for the semantic values of all language constructs. This was true in
|
|
the RPN and infix calculator examples (*note Reverse Polish Notation
|
|
Calculator: RPN Calc.).
|
|
|
|
Bison's default is to use type `int' for all semantic values. To
|
|
specify some other type, define `YYSTYPE' as a macro, like this:
|
|
|
|
#define YYSTYPE double
|
|
|
|
This macro definition must go in the C declarations section of the
|
|
grammar file (*note Outline of a Bison Grammar: Grammar Outline.).
|
|
|
|
|
|
File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics
|
|
|
|
More Than One Value Type
|
|
------------------------
|
|
|
|
In most programs, you will need different data types for different
|
|
kinds of tokens and groupings. For example, a numeric constant may
|
|
need type `int' or `long', while a string constant needs type `char *',
|
|
and an identifier might need a pointer to an entry in the symbol table.
|
|
|
|
To use more than one data type for semantic values in one parser,
|
|
Bison requires you to do two things:
|
|
|
|
* Specify the entire collection of possible data types, with the
|
|
`%union' Bison declaration (*note The Collection of Value Types:
|
|
Union Decl.).
|
|
|
|
* Choose one of those types for each symbol (terminal or
|
|
nonterminal) for which semantic values are used. This is done for
|
|
tokens with the `%token' Bison declaration (*note Token Type
|
|
Names: Token Decl.) and for groupings with the `%type' Bison
|
|
declaration (*note Nonterminal Symbols: Type Decl.).
|
|
|
|
|
|
File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics
|
|
|
|
Actions
|
|
-------
|
|
|
|
An action accompanies a syntactic rule and contains C code to be
|
|
executed each time an instance of that rule is recognized. The task of
|
|
most actions is to compute a semantic value for the grouping built by
|
|
the rule from the semantic values associated with tokens or smaller
|
|
groupings.
|
|
|
|
An action consists of C statements surrounded by braces, much like a
|
|
compound statement in C. It can be placed at any position in the rule;
|
|
it is executed at that position. Most rules have just one action at
|
|
the end of the rule, following all the components. Actions in the
|
|
middle of a rule are tricky and used only for special purposes (*note
|
|
Actions in Mid-Rule: Mid-Rule Actions.).
|
|
|
|
The C code in an action can refer to the semantic values of the
|
|
components matched by the rule with the construct `$N', which stands for
|
|
the value of the Nth component. The semantic value for the grouping
|
|
being constructed is `$$'. (Bison translates both of these constructs
|
|
into array element references when it copies the actions into the parser
|
|
file.)
|
|
|
|
Here is a typical example:
|
|
|
|
exp: ...
|
|
| exp '+' exp
|
|
{ $$ = $1 + $3; }
|
|
|
|
This rule constructs an `exp' from two smaller `exp' groupings
|
|
connected by a plus-sign token. In the action, `$1' and `$3' refer to
|
|
the semantic values of the two component `exp' groupings, which are the
|
|
first and third symbols on the right hand side of the rule. The sum is
|
|
stored into `$$' so that it becomes the semantic value of the
|
|
addition-expression just recognized by the rule. If there were a
|
|
useful semantic value associated with the `+' token, it could be
|
|
referred to as `$2'.
|
|
|
|
If you don't specify an action for a rule, Bison supplies a default:
|
|
`$$ = $1'. Thus, the value of the first symbol in the rule becomes the
|
|
value of the whole rule. Of course, the default rule is valid only if
|
|
the two data types match. There is no meaningful default action for an
|
|
empty rule; every empty rule must have an explicit action unless the
|
|
rule's value does not matter.
|
|
|
|
`$N' with N zero or negative is allowed for reference to tokens and
|
|
groupings on the stack _before_ those that match the current rule.
|
|
This is a very risky practice, and to use it reliably you must be
|
|
certain of the context in which the rule is applied. Here is a case in
|
|
which you can use this reliably:
|
|
|
|
foo: expr bar '+' expr { ... }
|
|
| expr bar '-' expr { ... }
|
|
;
|
|
|
|
bar: /* empty */
|
|
{ previous_expr = $0; }
|
|
;
|
|
|
|
As long as `bar' is used only in the fashion shown here, `$0' always
|
|
refers to the `expr' which precedes `bar' in the definition of `foo'.
|
|
|
|
|
|
File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics
|
|
|
|
Data Types of Values in Actions
|
|
-------------------------------
|
|
|
|
If you have chosen a single data type for semantic values, the `$$'
|
|
and `$N' constructs always have that data type.
|
|
|
|
If you have used `%union' to specify a variety of data types, then
|
|
you must declare a choice among these types for each terminal or
|
|
nonterminal symbol that can have a semantic value. Then each time you
|
|
use `$$' or `$N', its data type is determined by which symbol it refers
|
|
to in the rule. In this example,
|
|
|
|
exp: ...
|
|
| exp '+' exp
|
|
{ $$ = $1 + $3; }
|
|
|
|
`$1' and `$3' refer to instances of `exp', so they all have the data
|
|
type declared for the nonterminal symbol `exp'. If `$2' were used, it
|
|
would have the data type declared for the terminal symbol `'+'',
|
|
whatever that might be.
|
|
|
|
Alternatively, you can specify the data type when you refer to the
|
|
value, by inserting `<TYPE>' after the `$' at the beginning of the
|
|
reference. For example, if you have defined types as shown here:
|
|
|
|
%union {
|
|
int itype;
|
|
double dtype;
|
|
}
|
|
|
|
then you can write `$<itype>1' to refer to the first subunit of the
|
|
rule as an integer, or `$<dtype>1' to refer to it as a double.
|
|
|