mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-09 04:13:03 +00:00
* doc/bison.1 (mandoc): Update. * src/system.h (EXT_GUARD_C, EXT_STYPE_H): Remove .c and .h. * src/files.c: Support output files extensions computing. (src_extension): New static variable. (header_extension): New static variable. (tr): New function. (get_extension_index): New function, gets the index of an extension filename in a string. (compute_exts_from_gf): New function, computes extensions from the grammar file extension. (compute_exts_from_src): New functions, computes extensions from the C source file extension, file given by ``-o'' option. (compute_base_names): Update. (output_files): Update.
1387 lines
50 KiB
Plaintext
1387 lines
50 KiB
Plaintext
Ceci est le fichier Info bison.info, produit par Makeinfo version 4.0b
|
||
à partir bison.texinfo.
|
||
|
||
START-INFO-DIR-ENTRY
|
||
* bison: (bison). GNU Project parser generator (yacc replacement).
|
||
END-INFO-DIR-ENTRY
|
||
|
||
This file documents the Bison parser generator.
|
||
|
||
Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
|
||
2000 Free Software Foundation, Inc.
|
||
|
||
Permission is granted to make and distribute verbatim copies of this
|
||
manual provided the copyright notice and this permission notice are
|
||
preserved on all copies.
|
||
|
||
Permission is granted to copy and distribute modified versions of
|
||
this manual under the conditions for verbatim copying, provided also
|
||
that the sections entitled "GNU General Public License" and "Conditions
|
||
for Using Bison" are included exactly as in the original, and provided
|
||
that the entire resulting derived work is distributed under the terms
|
||
of a permission notice identical to this one.
|
||
|
||
Permission is granted to copy and distribute translations of this
|
||
manual into another language, under the above conditions for modified
|
||
versions, except that the sections entitled "GNU General Public
|
||
License", "Conditions for Using Bison" and this permission notice may be
|
||
included in translations approved by the Free Software Foundation
|
||
instead of in the original English.
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Decls, Next: Rpcalc Rules, Up: RPN Calc
|
||
|
||
Declarations for `rpcalc'
|
||
-------------------------
|
||
|
||
Here are the C and Bison declarations for the reverse polish notation
|
||
calculator. As in C, comments are placed between `/*...*/'.
|
||
|
||
/* Reverse polish notation calculator. */
|
||
|
||
%{
|
||
#define YYSTYPE double
|
||
#include <math.h>
|
||
%}
|
||
|
||
%token NUM
|
||
|
||
%% /* Grammar rules and actions follow */
|
||
|
||
The C declarations section (*note The C Declarations Section: C
|
||
Declarations.) contains two preprocessor directives.
|
||
|
||
The `#define' directive defines the macro `YYSTYPE', thus specifying
|
||
the C data type for semantic values of both tokens and groupings (*note
|
||
Data Types of Semantic Values: Value Type.). The Bison parser will use
|
||
whatever type `YYSTYPE' is defined as; if you don't define it, `int' is
|
||
the default. Because we specify `double', each token and each
|
||
expression has an associated value, which is a floating point number.
|
||
|
||
The `#include' directive is used to declare the exponentiation
|
||
function `pow'.
|
||
|
||
The second section, Bison declarations, provides information to
|
||
Bison about the token types (*note The Bison Declarations Section:
|
||
Bison Declarations.). Each terminal symbol that is not a
|
||
single-character literal must be declared here. (Single-character
|
||
literals normally don't need to be declared.) In this example, all the
|
||
arithmetic operators are designated by single-character literals, so the
|
||
only terminal symbol that needs to be declared is `NUM', the token type
|
||
for numeric constants.
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Decls, Up: RPN Calc
|
||
|
||
Grammar Rules for `rpcalc'
|
||
--------------------------
|
||
|
||
Here are the grammar rules for the reverse polish notation
|
||
calculator.
|
||
|
||
input: /* empty */
|
||
| input line
|
||
;
|
||
|
||
line: '\n'
|
||
| exp '\n' { printf ("\t%.10g\n", $1); }
|
||
;
|
||
|
||
exp: NUM { $$ = $1; }
|
||
| exp exp '+' { $$ = $1 + $2; }
|
||
| exp exp '-' { $$ = $1 - $2; }
|
||
| exp exp '*' { $$ = $1 * $2; }
|
||
| exp exp '/' { $$ = $1 / $2; }
|
||
/* Exponentiation */
|
||
| exp exp '^' { $$ = pow ($1, $2); }
|
||
/* Unary minus */
|
||
| exp 'n' { $$ = -$1; }
|
||
;
|
||
%%
|
||
|
||
The groupings of the rpcalc "language" defined here are the
|
||
expression (given the name `exp'), the line of input (`line'), and the
|
||
complete input transcript (`input'). Each of these nonterminal symbols
|
||
has several alternate rules, joined by the `|' punctuator which is read
|
||
as "or". The following sections explain what these rules mean.
|
||
|
||
The semantics of the language is determined by the actions taken
|
||
when a grouping is recognized. The actions are the C code that appears
|
||
inside braces. *Note Actions::.
|
||
|
||
You must specify these actions in C, but Bison provides the means for
|
||
passing semantic values between the rules. In each action, the
|
||
pseudo-variable `$$' stands for the semantic value for the grouping
|
||
that the rule is going to construct. Assigning a value to `$$' is the
|
||
main job of most actions. The semantic values of the components of the
|
||
rule are referred to as `$1', `$2', and so on.
|
||
|
||
* Menu:
|
||
|
||
* Rpcalc Input::
|
||
* Rpcalc Line::
|
||
* Rpcalc Expr::
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules
|
||
|
||
Explanation of `input'
|
||
......................
|
||
|
||
Consider the definition of `input':
|
||
|
||
input: /* empty */
|
||
| input line
|
||
;
|
||
|
||
This definition reads as follows: "A complete input is either an
|
||
empty string, or a complete input followed by an input line". Notice
|
||
that "complete input" is defined in terms of itself. This definition
|
||
is said to be "left recursive" since `input' appears always as the
|
||
leftmost symbol in the sequence. *Note Recursive Rules: Recursion.
|
||
|
||
The first alternative is empty because there are no symbols between
|
||
the colon and the first `|'; this means that `input' can match an empty
|
||
string of input (no tokens). We write the rules this way because it is
|
||
legitimate to type `Ctrl-d' right after you start the calculator. It's
|
||
conventional to put an empty alternative first and write the comment
|
||
`/* empty */' in it.
|
||
|
||
The second alternate rule (`input line') handles all nontrivial
|
||
input. It means, "After reading any number of lines, read one more
|
||
line if possible." The left recursion makes this rule into a loop.
|
||
Since the first alternative matches empty input, the loop can be
|
||
executed zero or more times.
|
||
|
||
The parser function `yyparse' continues to process input until a
|
||
grammatical error is seen or the lexical analyzer says there are no more
|
||
input tokens; we will arrange for the latter to happen at end of file.
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules
|
||
|
||
Explanation of `line'
|
||
.....................
|
||
|
||
Now consider the definition of `line':
|
||
|
||
line: '\n'
|
||
| exp '\n' { printf ("\t%.10g\n", $1); }
|
||
;
|
||
|
||
The first alternative is a token which is a newline character; this
|
||
means that rpcalc accepts a blank line (and ignores it, since there is
|
||
no action). The second alternative is an expression followed by a
|
||
newline. This is the alternative that makes rpcalc useful. The
|
||
semantic value of the `exp' grouping is the value of `$1' because the
|
||
`exp' in question is the first symbol in the alternative. The action
|
||
prints this value, which is the result of the computation the user
|
||
asked for.
|
||
|
||
This action is unusual because it does not assign a value to `$$'.
|
||
As a consequence, the semantic value associated with the `line' is
|
||
uninitialized (its value will be unpredictable). This would be a bug if
|
||
that value were ever used, but we don't use it: once rpcalc has printed
|
||
the value of the user's input line, that value is no longer needed.
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules
|
||
|
||
Explanation of `expr'
|
||
.....................
|
||
|
||
The `exp' grouping has several rules, one for each kind of
|
||
expression. The first rule handles the simplest expressions: those
|
||
that are just numbers. The second handles an addition-expression,
|
||
which looks like two expressions followed by a plus-sign. The third
|
||
handles subtraction, and so on.
|
||
|
||
exp: NUM
|
||
| exp exp '+' { $$ = $1 + $2; }
|
||
| exp exp '-' { $$ = $1 - $2; }
|
||
...
|
||
;
|
||
|
||
We have used `|' to join all the rules for `exp', but we could
|
||
equally well have written them separately:
|
||
|
||
exp: NUM ;
|
||
exp: exp exp '+' { $$ = $1 + $2; } ;
|
||
exp: exp exp '-' { $$ = $1 - $2; } ;
|
||
...
|
||
|
||
Most of the rules have actions that compute the value of the
|
||
expression in terms of the value of its parts. For example, in the
|
||
rule for addition, `$1' refers to the first component `exp' and `$2'
|
||
refers to the second one. The third component, `'+'', has no meaningful
|
||
associated semantic value, but if it had one you could refer to it as
|
||
`$3'. When `yyparse' recognizes a sum expression using this rule, the
|
||
sum of the two subexpressions' values is produced as the value of the
|
||
entire expression. *Note Actions::.
|
||
|
||
You don't have to give an action for every rule. When a rule has no
|
||
action, Bison by default copies the value of `$1' into `$$'. This is
|
||
what happens in the first rule (the one that uses `NUM').
|
||
|
||
The formatting shown here is the recommended convention, but Bison
|
||
does not require it. You can add or change whitespace as much as you
|
||
wish. For example, this:
|
||
|
||
exp : NUM | exp exp '+' {$$ = $1 + $2; } | ...
|
||
|
||
means the same thing as this:
|
||
|
||
exp: NUM
|
||
| exp exp '+' { $$ = $1 + $2; }
|
||
| ...
|
||
|
||
The latter, however, is much more readable.
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc
|
||
|
||
The `rpcalc' Lexical Analyzer
|
||
-----------------------------
|
||
|
||
The lexical analyzer's job is low-level parsing: converting
|
||
characters or sequences of characters into tokens. The Bison parser
|
||
gets its tokens by calling the lexical analyzer. *Note The Lexical
|
||
Analyzer Function `yylex': Lexical.
|
||
|
||
Only a simple lexical analyzer is needed for the RPN calculator.
|
||
This lexical analyzer skips blanks and tabs, then reads in numbers as
|
||
`double' and returns them as `NUM' tokens. Any other character that
|
||
isn't part of a number is a separate token. Note that the token-code
|
||
for such a single-character token is the character itself.
|
||
|
||
The return value of the lexical analyzer function is a numeric code
|
||
which represents a token type. The same text used in Bison rules to
|
||
stand for this token type is also a C expression for the numeric code
|
||
for the type. This works in two ways. If the token type is a
|
||
character literal, then its numeric code is the ASCII code for that
|
||
character; you can use the same character literal in the lexical
|
||
analyzer to express the number. If the token type is an identifier,
|
||
that identifier is defined by Bison as a C macro whose definition is
|
||
the appropriate number. In this example, therefore, `NUM' becomes a
|
||
macro for `yylex' to use.
|
||
|
||
The semantic value of the token (if it has one) is stored into the
|
||
global variable `yylval', which is where the Bison parser will look for
|
||
it. (The C data type of `yylval' is `YYSTYPE', which was defined at
|
||
the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc
|
||
Decls..)
|
||
|
||
A token type code of zero is returned if the end-of-file is
|
||
encountered. (Bison recognizes any nonpositive value as indicating the
|
||
end of the input.)
|
||
|
||
Here is the code for the lexical analyzer:
|
||
|
||
/* Lexical analyzer returns a double floating point
|
||
number on the stack and the token NUM, or the ASCII
|
||
character read if not a number. Skips all blanks
|
||
and tabs, returns 0 for EOF. */
|
||
|
||
#include <ctype.h>
|
||
|
||
int
|
||
yylex (void)
|
||
{
|
||
int c;
|
||
|
||
/* skip white space */
|
||
while ((c = getchar ()) == ' ' || c == '\t')
|
||
;
|
||
/* process numbers */
|
||
if (c == '.' || isdigit (c))
|
||
{
|
||
ungetc (c, stdin);
|
||
scanf ("%lf", &yylval);
|
||
return NUM;
|
||
}
|
||
/* return end-of-file */
|
||
if (c == EOF)
|
||
return 0;
|
||
/* return single chars */
|
||
return c;
|
||
}
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc
|
||
|
||
The Controlling Function
|
||
------------------------
|
||
|
||
In keeping with the spirit of this example, the controlling function
|
||
is kept to the bare minimum. The only requirement is that it call
|
||
`yyparse' to start the process of parsing.
|
||
|
||
int
|
||
main (void)
|
||
{
|
||
return yyparse ();
|
||
}
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Error, Next: Rpcalc Gen, Prev: Rpcalc Main, Up: RPN Calc
|
||
|
||
The Error Reporting Routine
|
||
---------------------------
|
||
|
||
When `yyparse' detects a syntax error, it calls the error reporting
|
||
function `yyerror' to print an error message (usually but not always
|
||
`"parse error"'). It is up to the programmer to supply `yyerror'
|
||
(*note Parser C-Language Interface: Interface.), so here is the
|
||
definition we will use:
|
||
|
||
#include <stdio.h>
|
||
|
||
void
|
||
yyerror (const char *s) /* Called by yyparse on error */
|
||
{
|
||
printf ("%s\n", s);
|
||
}
|
||
|
||
After `yyerror' returns, the Bison parser may recover from the error
|
||
and continue parsing if the grammar contains a suitable error rule
|
||
(*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We
|
||
have not written any error rules in this example, so any invalid input
|
||
will cause the calculator program to exit. This is not clean behavior
|
||
for a real calculator, but it is adequate for the first example.
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Gen, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc
|
||
|
||
Running Bison to Make the Parser
|
||
--------------------------------
|
||
|
||
Before running Bison to produce a parser, we need to decide how to
|
||
arrange all the source code in one or more source files. For such a
|
||
simple example, the easiest thing is to put everything in one file. The
|
||
definitions of `yylex', `yyerror' and `main' go at the end, in the
|
||
"additional C code" section of the file (*note The Overall Layout of a
|
||
Bison Grammar: Grammar Layout.).
|
||
|
||
For a large project, you would probably have several source files,
|
||
and use `make' to arrange to recompile them.
|
||
|
||
With all the source in a single file, you use the following command
|
||
to convert it into a parser file:
|
||
|
||
bison FILE_NAME.y
|
||
|
||
In this example the file was called `rpcalc.y' (for "Reverse Polish
|
||
CALCulator"). Bison produces a file named `FILE_NAME.tab.c', removing
|
||
the `.y' from the original file name. The file output by Bison contains
|
||
the source code for `yyparse'. The additional functions in the input
|
||
file (`yylex', `yyerror' and `main') are copied verbatim to the output.
|
||
|
||
|
||
File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Gen, Up: RPN Calc
|
||
|
||
Compiling the Parser File
|
||
-------------------------
|
||
|
||
Here is how to compile and run the parser file:
|
||
|
||
# List files in current directory.
|
||
% ls
|
||
rpcalc.tab.c rpcalc.y
|
||
|
||
# Compile the Bison parser.
|
||
# `-lm' tells compiler to search math library for `pow'.
|
||
% cc rpcalc.tab.c -lm -o rpcalc
|
||
|
||
# List files again.
|
||
% ls
|
||
rpcalc rpcalc.tab.c rpcalc.y
|
||
|
||
The file `rpcalc' now contains the executable code. Here is an
|
||
example session using `rpcalc'.
|
||
|
||
% rpcalc
|
||
4 9 +
|
||
13
|
||
3 7 + 3 4 5 *+-
|
||
-13
|
||
3 7 + 3 4 5 * + - n Note the unary minus, `n'
|
||
13
|
||
5 6 / 4 n +
|
||
-3.166666667
|
||
3 4 ^ Exponentiation
|
||
81
|
||
^D End-of-file indicator
|
||
%
|
||
|
||
|
||
File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples
|
||
|
||
Infix Notation Calculator: `calc'
|
||
=================================
|
||
|
||
We now modify rpcalc to handle infix operators instead of postfix.
|
||
Infix notation involves the concept of operator precedence and the need
|
||
for parentheses nested to arbitrary depth. Here is the Bison code for
|
||
`calc.y', an infix desk-top calculator.
|
||
|
||
/* Infix notation calculator--calc */
|
||
|
||
%{
|
||
#define YYSTYPE double
|
||
#include <math.h>
|
||
%}
|
||
|
||
/* BISON Declarations */
|
||
%token NUM
|
||
%left '-' '+'
|
||
%left '*' '/'
|
||
%left NEG /* negation--unary minus */
|
||
%right '^' /* exponentiation */
|
||
|
||
/* Grammar follows */
|
||
%%
|
||
input: /* empty string */
|
||
| input line
|
||
;
|
||
|
||
line: '\n'
|
||
| exp '\n' { printf ("\t%.10g\n", $1); }
|
||
;
|
||
|
||
exp: NUM { $$ = $1; }
|
||
| exp '+' exp { $$ = $1 + $3; }
|
||
| exp '-' exp { $$ = $1 - $3; }
|
||
| exp '*' exp { $$ = $1 * $3; }
|
||
| exp '/' exp { $$ = $1 / $3; }
|
||
| '-' exp %prec NEG { $$ = -$2; }
|
||
| exp '^' exp { $$ = pow ($1, $3); }
|
||
| '(' exp ')' { $$ = $2; }
|
||
;
|
||
%%
|
||
|
||
The functions `yylex', `yyerror' and `main' can be the same as before.
|
||
|
||
There are two important new features shown in this code.
|
||
|
||
In the second section (Bison declarations), `%left' declares token
|
||
types and says they are left-associative operators. The declarations
|
||
`%left' and `%right' (right associativity) take the place of `%token'
|
||
which is used to declare a token type name without associativity.
|
||
(These tokens are single-character literals, which ordinarily don't
|
||
need to be declared. We declare them here to specify the
|
||
associativity.)
|
||
|
||
Operator precedence is determined by the line ordering of the
|
||
declarations; the higher the line number of the declaration (lower on
|
||
the page or screen), the higher the precedence. Hence, exponentiation
|
||
has the highest precedence, unary minus (`NEG') is next, followed by
|
||
`*' and `/', and so on. *Note Operator Precedence: Precedence.
|
||
|
||
The other important new feature is the `%prec' in the grammar section
|
||
for the unary minus operator. The `%prec' simply instructs Bison that
|
||
the rule `| '-' exp' has the same precedence as `NEG'--in this case the
|
||
next-to-highest. *Note Context-Dependent Precedence: Contextual
|
||
Precedence.
|
||
|
||
Here is a sample run of `calc.y':
|
||
|
||
% calc
|
||
4 + 4.5 - (34/(8*3+-3))
|
||
6.880952381
|
||
-56 + 2
|
||
-54
|
||
3 ^ 2
|
||
9
|
||
|
||
|
||
File: bison.info, Node: Simple Error Recovery, Next: Multi-function Calc, Prev: Infix Calc, Up: Examples
|
||
|
||
Simple Error Recovery
|
||
=====================
|
||
|
||
Up to this point, this manual has not addressed the issue of "error
|
||
recovery"--how to continue parsing after the parser detects a syntax
|
||
error. All we have handled is error reporting with `yyerror'. Recall
|
||
that by default `yyparse' returns after calling `yyerror'. This means
|
||
that an erroneous input line causes the calculator program to exit.
|
||
Now we show how to rectify this deficiency.
|
||
|
||
The Bison language itself includes the reserved word `error', which
|
||
may be included in the grammar rules. In the example below it has been
|
||
added to one of the alternatives for `line':
|
||
|
||
line: '\n'
|
||
| exp '\n' { printf ("\t%.10g\n", $1); }
|
||
| error '\n' { yyerrok; }
|
||
;
|
||
|
||
This addition to the grammar allows for simple error recovery in the
|
||
event of a parse error. If an expression that cannot be evaluated is
|
||
read, the error will be recognized by the third rule for `line', and
|
||
parsing will continue. (The `yyerror' function is still called upon to
|
||
print its message as well.) The action executes the statement
|
||
`yyerrok', a macro defined automatically by Bison; its meaning is that
|
||
error recovery is complete (*note Error Recovery::). Note the
|
||
difference between `yyerrok' and `yyerror'; neither one is a misprint.
|
||
|
||
This form of error recovery deals with syntax errors. There are
|
||
other kinds of errors; for example, division by zero, which raises an
|
||
exception signal that is normally fatal. A real calculator program
|
||
must handle this signal and use `longjmp' to return to `main' and
|
||
resume parsing input lines; it would also have to discard the rest of
|
||
the current line of input. We won't discuss this issue further because
|
||
it is not specific to Bison programs.
|
||
|
||
|
||
File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Simple Error Recovery, Up: Examples
|
||
|
||
Multi-Function Calculator: `mfcalc'
|
||
===================================
|
||
|
||
Now that the basics of Bison have been discussed, it is time to move
|
||
on to a more advanced problem. The above calculators provided only five
|
||
functions, `+', `-', `*', `/' and `^'. It would be nice to have a
|
||
calculator that provides other mathematical functions such as `sin',
|
||
`cos', etc.
|
||
|
||
It is easy to add new operators to the infix calculator as long as
|
||
they are only single-character literals. The lexical analyzer `yylex'
|
||
passes back all nonnumber characters as tokens, so new grammar rules
|
||
suffice for adding a new operator. But we want something more
|
||
flexible: built-in functions whose syntax has this form:
|
||
|
||
FUNCTION_NAME (ARGUMENT)
|
||
|
||
At the same time, we will add memory to the calculator, by allowing you
|
||
to create named variables, store values in them, and use them later.
|
||
Here is a sample session with the multi-function calculator:
|
||
|
||
% mfcalc
|
||
pi = 3.141592653589
|
||
3.1415926536
|
||
sin(pi)
|
||
0.0000000000
|
||
alpha = beta1 = 2.3
|
||
2.3000000000
|
||
alpha
|
||
2.3000000000
|
||
ln(alpha)
|
||
0.8329091229
|
||
exp(ln(beta1))
|
||
2.3000000000
|
||
%
|
||
|
||
Note that multiple assignment and nested function calls are
|
||
permitted.
|
||
|
||
* Menu:
|
||
|
||
* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
|
||
* Rules: Mfcalc Rules. Grammar rules for the calculator.
|
||
* Symtab: Mfcalc Symtab. Symbol table management subroutines.
|
||
|
||
|
||
File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Up: Multi-function Calc
|
||
|
||
Declarations for `mfcalc'
|
||
-------------------------
|
||
|
||
Here are the C and Bison declarations for the multi-function
|
||
calculator.
|
||
|
||
%{
|
||
#include <math.h> /* For math functions, cos(), sin(), etc. */
|
||
#include "calc.h" /* Contains definition of `symrec' */
|
||
%}
|
||
%union {
|
||
double val; /* For returning numbers. */
|
||
symrec *tptr; /* For returning symbol-table pointers */
|
||
}
|
||
|
||
%token <val> NUM /* Simple double precision number */
|
||
%token <tptr> VAR FNCT /* Variable and Function */
|
||
%type <val> exp
|
||
|
||
%right '='
|
||
%left '-' '+'
|
||
%left '*' '/'
|
||
%left NEG /* Negation--unary minus */
|
||
%right '^' /* Exponentiation */
|
||
|
||
/* Grammar follows */
|
||
|
||
%%
|
||
|
||
The above grammar introduces only two new features of the Bison
|
||
language. These features allow semantic values to have various data
|
||
types (*note More Than One Value Type: Multiple Types.).
|
||
|
||
The `%union' declaration specifies the entire list of possible types;
|
||
this is instead of defining `YYSTYPE'. The allowable types are now
|
||
double-floats (for `exp' and `NUM') and pointers to entries in the
|
||
symbol table. *Note The Collection of Value Types: Union Decl.
|
||
|
||
Since values can now have various types, it is necessary to
|
||
associate a type with each grammar symbol whose semantic value is used.
|
||
These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations
|
||
are augmented with information about their data type (placed between
|
||
angle brackets).
|
||
|
||
The Bison construct `%type' is used for declaring nonterminal
|
||
symbols, just as `%token' is used for declaring token types. We have
|
||
not used `%type' before because nonterminal symbols are normally
|
||
declared implicitly by the rules that define them. But `exp' must be
|
||
declared explicitly so we can specify its value type. *Note
|
||
Nonterminal Symbols: Type Decl.
|
||
|
||
|
||
File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc
|
||
|
||
Grammar Rules for `mfcalc'
|
||
--------------------------
|
||
|
||
Here are the grammar rules for the multi-function calculator. Most
|
||
of them are copied directly from `calc'; three rules, those which
|
||
mention `VAR' or `FNCT', are new.
|
||
|
||
input: /* empty */
|
||
| input line
|
||
;
|
||
|
||
line:
|
||
'\n'
|
||
| exp '\n' { printf ("\t%.10g\n", $1); }
|
||
| error '\n' { yyerrok; }
|
||
;
|
||
|
||
exp: NUM { $$ = $1; }
|
||
| VAR { $$ = $1->value.var; }
|
||
| VAR '=' exp { $$ = $3; $1->value.var = $3; }
|
||
| FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); }
|
||
| exp '+' exp { $$ = $1 + $3; }
|
||
| exp '-' exp { $$ = $1 - $3; }
|
||
| exp '*' exp { $$ = $1 * $3; }
|
||
| exp '/' exp { $$ = $1 / $3; }
|
||
| '-' exp %prec NEG { $$ = -$2; }
|
||
| exp '^' exp { $$ = pow ($1, $3); }
|
||
| '(' exp ')' { $$ = $2; }
|
||
;
|
||
/* End of grammar */
|
||
%%
|
||
|
||
|
||
File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc
|
||
|
||
The `mfcalc' Symbol Table
|
||
-------------------------
|
||
|
||
The multi-function calculator requires a symbol table to keep track
|
||
of the names and meanings of variables and functions. This doesn't
|
||
affect the grammar rules (except for the actions) or the Bison
|
||
declarations, but it requires some additional C functions for support.
|
||
|
||
The symbol table itself consists of a linked list of records. Its
|
||
definition, which is kept in the header `calc.h', is as follows. It
|
||
provides for either functions or variables to be placed in the table.
|
||
|
||
/* Fonctions type. */
|
||
typedef double (*func_t) (double);
|
||
|
||
/* Data type for links in the chain of symbols. */
|
||
struct symrec
|
||
{
|
||
char *name; /* name of symbol */
|
||
int type; /* type of symbol: either VAR or FNCT */
|
||
union
|
||
{
|
||
double var; /* value of a VAR */
|
||
func_t fnctptr; /* value of a FNCT */
|
||
} value;
|
||
struct symrec *next; /* link field */
|
||
};
|
||
|
||
typedef struct symrec symrec;
|
||
|
||
/* The symbol table: a chain of `struct symrec'. */
|
||
extern symrec *sym_table;
|
||
|
||
symrec *putsym (const char *, func_t);
|
||
symrec *getsym (const char *);
|
||
|
||
The new version of `main' includes a call to `init_table', a
|
||
function that initializes the symbol table. Here it is, and
|
||
`init_table' as well:
|
||
|
||
#include <stdio.h>
|
||
|
||
int
|
||
main (void)
|
||
{
|
||
init_table ();
|
||
return yyparse ();
|
||
}
|
||
|
||
void
|
||
yyerror (const char *s) /* Called by yyparse on error */
|
||
{
|
||
printf ("%s\n", s);
|
||
}
|
||
|
||
struct init
|
||
{
|
||
char *fname;
|
||
double (*fnct)(double);
|
||
};
|
||
|
||
struct init arith_fncts[] =
|
||
{
|
||
"sin", sin,
|
||
"cos", cos,
|
||
"atan", atan,
|
||
"ln", log,
|
||
"exp", exp,
|
||
"sqrt", sqrt,
|
||
0, 0
|
||
};
|
||
|
||
/* The symbol table: a chain of `struct symrec'. */
|
||
symrec *sym_table = (symrec *) 0;
|
||
|
||
/* Put arithmetic functions in table. */
|
||
void
|
||
init_table (void)
|
||
{
|
||
int i;
|
||
symrec *ptr;
|
||
for (i = 0; arith_fncts[i].fname != 0; i++)
|
||
{
|
||
ptr = putsym (arith_fncts[i].fname, FNCT);
|
||
ptr->value.fnctptr = arith_fncts[i].fnct;
|
||
}
|
||
}
|
||
|
||
By simply editing the initialization list and adding the necessary
|
||
include files, you can add additional functions to the calculator.
|
||
|
||
Two important functions allow look-up and installation of symbols in
|
||
the symbol table. The function `putsym' is passed a name and the type
|
||
(`VAR' or `FNCT') of the object to be installed. The object is linked
|
||
to the front of the list, and a pointer to the object is returned. The
|
||
function `getsym' is passed the name of the symbol to look up. If
|
||
found, a pointer to that symbol is returned; otherwise zero is returned.
|
||
|
||
symrec *
|
||
putsym (char *sym_name, int sym_type)
|
||
{
|
||
symrec *ptr;
|
||
ptr = (symrec *) malloc (sizeof (symrec));
|
||
ptr->name = (char *) malloc (strlen (sym_name) + 1);
|
||
strcpy (ptr->name,sym_name);
|
||
ptr->type = sym_type;
|
||
ptr->value.var = 0; /* set value to 0 even if fctn. */
|
||
ptr->next = (struct symrec *)sym_table;
|
||
sym_table = ptr;
|
||
return ptr;
|
||
}
|
||
|
||
symrec *
|
||
getsym (const char *sym_name)
|
||
{
|
||
symrec *ptr;
|
||
for (ptr = sym_table; ptr != (symrec *) 0;
|
||
ptr = (symrec *)ptr->next)
|
||
if (strcmp (ptr->name,sym_name) == 0)
|
||
return ptr;
|
||
return 0;
|
||
}
|
||
|
||
The function `yylex' must now recognize variables, numeric values,
|
||
and the single-character arithmetic operators. Strings of alphanumeric
|
||
characters with a leading non-digit are recognized as either variables
|
||
or functions depending on what the symbol table says about them.
|
||
|
||
The string is passed to `getsym' for look up in the symbol table. If
|
||
the name appears in the table, a pointer to its location and its type
|
||
(`VAR' or `FNCT') is returned to `yyparse'. If it is not already in
|
||
the table, then it is installed as a `VAR' using `putsym'. Again, a
|
||
pointer and its type (which must be `VAR') is returned to `yyparse'.
|
||
|
||
No change is needed in the handling of numeric values and arithmetic
|
||
operators in `yylex'.
|
||
|
||
#include <ctype.h>
|
||
|
||
int
|
||
yylex (void)
|
||
{
|
||
int c;
|
||
|
||
/* Ignore whitespace, get first nonwhite character. */
|
||
while ((c = getchar ()) == ' ' || c == '\t');
|
||
|
||
if (c == EOF)
|
||
return 0;
|
||
|
||
/* Char starts a number => parse the number. */
|
||
if (c == '.' || isdigit (c))
|
||
{
|
||
ungetc (c, stdin);
|
||
scanf ("%lf", &yylval.val);
|
||
return NUM;
|
||
}
|
||
|
||
/* Char starts an identifier => read the name. */
|
||
if (isalpha (c))
|
||
{
|
||
symrec *s;
|
||
static char *symbuf = 0;
|
||
static int length = 0;
|
||
int i;
|
||
|
||
/* Initially make the buffer long enough
|
||
for a 40-character symbol name. */
|
||
if (length == 0)
|
||
length = 40, symbuf = (char *)malloc (length + 1);
|
||
|
||
i = 0;
|
||
do
|
||
{
|
||
/* If buffer is full, make it bigger. */
|
||
if (i == length)
|
||
{
|
||
length *= 2;
|
||
symbuf = (char *)realloc (symbuf, length + 1);
|
||
}
|
||
/* Add this character to the buffer. */
|
||
symbuf[i++] = c;
|
||
/* Get another character. */
|
||
c = getchar ();
|
||
}
|
||
while (c != EOF && isalnum (c));
|
||
|
||
ungetc (c, stdin);
|
||
symbuf[i] = '\0';
|
||
|
||
s = getsym (symbuf);
|
||
if (s == 0)
|
||
s = putsym (symbuf, VAR);
|
||
yylval.tptr = s;
|
||
return s->type;
|
||
}
|
||
|
||
/* Any other character is a token by itself. */
|
||
return c;
|
||
}
|
||
|
||
This program is both powerful and flexible. You may easily add new
|
||
functions, and it is a simple job to modify this code to install
|
||
predefined variables such as `pi' or `e' as well.
|
||
|
||
|
||
File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples
|
||
|
||
Exercises
|
||
=========
|
||
|
||
1. Add some new functions from `math.h' to the initialization list.
|
||
|
||
2. Add another array that contains constants and their values. Then
|
||
modify `init_table' to add these constants to the symbol table.
|
||
It will be easiest to give the constants type `VAR'.
|
||
|
||
3. Make the program report an error if the user refers to an
|
||
uninitialized variable in any way except to store a value in it.
|
||
|
||
|
||
File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top
|
||
|
||
Bison Grammar Files
|
||
*******************
|
||
|
||
Bison takes as input a context-free grammar specification and
|
||
produces a C-language function that recognizes correct instances of the
|
||
grammar.
|
||
|
||
The Bison grammar input file conventionally has a name ending in
|
||
`.y'. *Note Invoking Bison: Invocation.
|
||
|
||
* Menu:
|
||
|
||
* Grammar Outline:: Overall layout of the grammar file.
|
||
* Symbols:: Terminal and nonterminal symbols.
|
||
* Rules:: How to write grammar rules.
|
||
* Recursion:: Writing recursive rules.
|
||
* Semantics:: Semantic values and actions.
|
||
* Locations:: Locations and actions.
|
||
* Declarations:: All kinds of Bison declarations are described here.
|
||
* Multiple Parsers:: Putting more than one Bison parser in one program.
|
||
|
||
|
||
File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File
|
||
|
||
Outline of a Bison Grammar
|
||
==========================
|
||
|
||
A Bison grammar file has four main sections, shown here with the
|
||
appropriate delimiters:
|
||
|
||
%{
|
||
C DECLARATIONS
|
||
%}
|
||
|
||
BISON DECLARATIONS
|
||
|
||
%%
|
||
GRAMMAR RULES
|
||
%%
|
||
|
||
ADDITIONAL C CODE
|
||
|
||
Comments enclosed in `/* ... */' may appear in any of the sections.
|
||
|
||
* Menu:
|
||
|
||
* C Declarations:: Syntax and usage of the C declarations section.
|
||
* Bison Declarations:: Syntax and usage of the Bison declarations section.
|
||
* Grammar Rules:: Syntax and usage of the grammar rules section.
|
||
* C Code:: Syntax and usage of the additional C code section.
|
||
|
||
|
||
File: bison.info, Node: C Declarations, Next: Bison Declarations, Up: Grammar Outline
|
||
|
||
The C Declarations Section
|
||
--------------------------
|
||
|
||
The C DECLARATIONS section contains macro definitions and
|
||
declarations of functions and variables that are used in the actions in
|
||
the grammar rules. These are copied to the beginning of the parser
|
||
file so that they precede the definition of `yyparse'. You can use
|
||
`#include' to get the declarations from a header file. If you don't
|
||
need any C declarations, you may omit the `%{' and `%}' delimiters that
|
||
bracket this section.
|
||
|
||
|
||
File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: C Declarations, Up: Grammar Outline
|
||
|
||
The Bison Declarations Section
|
||
------------------------------
|
||
|
||
The BISON DECLARATIONS section contains declarations that define
|
||
terminal and nonterminal symbols, specify precedence, and so on. In
|
||
some simple grammars you may not need any declarations. *Note Bison
|
||
Declarations: Declarations.
|
||
|
||
|
||
File: bison.info, Node: Grammar Rules, Next: C Code, Prev: Bison Declarations, Up: Grammar Outline
|
||
|
||
The Grammar Rules Section
|
||
-------------------------
|
||
|
||
The "grammar rules" section contains one or more Bison grammar
|
||
rules, and nothing else. *Note Syntax of Grammar Rules: Rules.
|
||
|
||
There must always be at least one grammar rule, and the first `%%'
|
||
(which precedes the grammar rules) may never be omitted even if it is
|
||
the first thing in the file.
|
||
|
||
|
||
File: bison.info, Node: C Code, Prev: Grammar Rules, Up: Grammar Outline
|
||
|
||
The Additional C Code Section
|
||
-----------------------------
|
||
|
||
The ADDITIONAL C CODE section is copied verbatim to the end of the
|
||
parser file, just as the C DECLARATIONS section is copied to the
|
||
beginning. This is the most convenient place to put anything that you
|
||
want to have in the parser file but which need not come before the
|
||
definition of `yyparse'. For example, the definitions of `yylex' and
|
||
`yyerror' often go here. *Note Parser C-Language Interface: Interface.
|
||
|
||
If the last section is empty, you may omit the `%%' that separates it
|
||
from the grammar rules.
|
||
|
||
The Bison parser itself contains many static variables whose names
|
||
start with `yy' and many macros whose names start with `YY'. It is a
|
||
good idea to avoid using any such names (except those documented in this
|
||
manual) in the additional C code section of the grammar file.
|
||
|
||
|
||
File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File
|
||
|
||
Symbols, Terminal and Nonterminal
|
||
=================================
|
||
|
||
"Symbols" in Bison grammars represent the grammatical classifications
|
||
of the language.
|
||
|
||
A "terminal symbol" (also known as a "token type") represents a
|
||
class of syntactically equivalent tokens. You use the symbol in grammar
|
||
rules to mean that a token in that class is allowed. The symbol is
|
||
represented in the Bison parser by a numeric code, and the `yylex'
|
||
function returns a token type code to indicate what kind of token has
|
||
been read. You don't need to know what the code value is; you can use
|
||
the symbol to stand for it.
|
||
|
||
A "nonterminal symbol" stands for a class of syntactically equivalent
|
||
groupings. The symbol name is used in writing grammar rules. By
|
||
convention, it should be all lower case.
|
||
|
||
Symbol names can contain letters, digits (not at the beginning),
|
||
underscores and periods. Periods make sense only in nonterminals.
|
||
|
||
There are three ways of writing terminal symbols in the grammar:
|
||
|
||
* A "named token type" is written with an identifier, like an
|
||
identifier in C. By convention, it should be all upper case. Each
|
||
such name must be defined with a Bison declaration such as
|
||
`%token'. *Note Token Type Names: Token Decl.
|
||
|
||
* A "character token type" (or "literal character token") is written
|
||
in the grammar using the same syntax used in C for character
|
||
constants; for example, `'+'' is a character token type. A
|
||
character token type doesn't need to be declared unless you need to
|
||
specify its semantic value data type (*note Data Types of Semantic
|
||
Values: Value Type.), associativity, or precedence (*note Operator
|
||
Precedence: Precedence.).
|
||
|
||
By convention, a character token type is used only to represent a
|
||
token that consists of that particular character. Thus, the token
|
||
type `'+'' is used to represent the character `+' as a token.
|
||
Nothing enforces this convention, but if you depart from it, your
|
||
program will confuse other readers.
|
||
|
||
All the usual escape sequences used in character literals in C can
|
||
be used in Bison as well, but you must not use the null character
|
||
as a character literal because its ASCII code, zero, is the code
|
||
`yylex' returns for end-of-input (*note Calling Convention for
|
||
`yylex': Calling Convention.).
|
||
|
||
* A "literal string token" is written like a C string constant; for
|
||
example, `"<="' is a literal string token. A literal string token
|
||
doesn't need to be declared unless you need to specify its semantic
|
||
value data type (*note Value Type::), associativity, or precedence
|
||
(*note Precedence::).
|
||
|
||
You can associate the literal string token with a symbolic name as
|
||
an alias, using the `%token' declaration (*note Token
|
||
Declarations: Token Decl.). If you don't do that, the lexical
|
||
analyzer has to retrieve the token number for the literal string
|
||
token from the `yytname' table (*note Calling Convention::).
|
||
|
||
*WARNING*: literal string tokens do not work in Yacc.
|
||
|
||
By convention, a literal string token is used only to represent a
|
||
token that consists of that particular string. Thus, you should
|
||
use the token type `"<="' to represent the string `<=' as a token.
|
||
Bison does not enforce this convention, but if you depart from
|
||
it, people who read your program will be confused.
|
||
|
||
All the escape sequences used in string literals in C can be used
|
||
in Bison as well. A literal string token must contain two or more
|
||
characters; for a token containing just one character, use a
|
||
character token (see above).
|
||
|
||
How you choose to write a terminal symbol has no effect on its
|
||
grammatical meaning. That depends only on where it appears in rules and
|
||
on when the parser function returns that symbol.
|
||
|
||
The value returned by `yylex' is always one of the terminal symbols
|
||
(or 0 for end-of-input). Whichever way you write the token type in the
|
||
grammar rules, you write it the same way in the definition of `yylex'.
|
||
The numeric code for a character token type is simply the ASCII code for
|
||
the character, so `yylex' can use the identical character constant to
|
||
generate the requisite code. Each named token type becomes a C macro in
|
||
the parser file, so `yylex' can use the name to stand for the code.
|
||
(This is why periods don't make sense in terminal symbols.) *Note
|
||
Calling Convention for `yylex': Calling Convention.
|
||
|
||
If `yylex' is defined in a separate file, you need to arrange for the
|
||
token-type macro definitions to be available there. Use the `-d'
|
||
option when you run Bison, so that it will write these macro definitions
|
||
into a separate header file `NAME.tab.h' which you can include in the
|
||
other source files that need it. *Note Invoking Bison: Invocation.
|
||
|
||
The symbol `error' is a terminal symbol reserved for error recovery
|
||
(*note Error Recovery::); you shouldn't use it for any other purpose.
|
||
In particular, `yylex' should never return this value.
|
||
|
||
|
||
File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File
|
||
|
||
Syntax of Grammar Rules
|
||
=======================
|
||
|
||
A Bison grammar rule has the following general form:
|
||
|
||
RESULT: COMPONENTS...
|
||
;
|
||
|
||
where RESULT is the nonterminal symbol that this rule describes, and
|
||
COMPONENTS are various terminal and nonterminal symbols that are put
|
||
together by this rule (*note Symbols::).
|
||
|
||
For example,
|
||
|
||
exp: exp '+' exp
|
||
;
|
||
|
||
says that two groupings of type `exp', with a `+' token in between, can
|
||
be combined into a larger grouping of type `exp'.
|
||
|
||
Whitespace in rules is significant only to separate symbols. You
|
||
can add extra whitespace as you wish.
|
||
|
||
Scattered among the components can be ACTIONS that determine the
|
||
semantics of the rule. An action looks like this:
|
||
|
||
{C STATEMENTS}
|
||
|
||
Usually there is only one action and it follows the components. *Note
|
||
Actions::.
|
||
|
||
Multiple rules for the same RESULT can be written separately or can
|
||
be joined with the vertical-bar character `|' as follows:
|
||
|
||
RESULT: RULE1-COMPONENTS...
|
||
| RULE2-COMPONENTS...
|
||
...
|
||
;
|
||
|
||
They are still considered distinct rules even when joined in this way.
|
||
|
||
If COMPONENTS in a rule is empty, it means that RESULT can match the
|
||
empty string. For example, here is how to define a comma-separated
|
||
sequence of zero or more `exp' groupings:
|
||
|
||
expseq: /* empty */
|
||
| expseq1
|
||
;
|
||
|
||
expseq1: exp
|
||
| expseq1 ',' exp
|
||
;
|
||
|
||
It is customary to write a comment `/* empty */' in each rule with no
|
||
components.
|
||
|
||
|
||
File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File
|
||
|
||
Recursive Rules
|
||
===============
|
||
|
||
A rule is called "recursive" when its RESULT nonterminal appears
|
||
also on its right hand side. Nearly all Bison grammars need to use
|
||
recursion, because that is the only way to define a sequence of any
|
||
number of a particular thing. Consider this recursive definition of a
|
||
comma-separated sequence of one or more expressions:
|
||
|
||
expseq1: exp
|
||
| expseq1 ',' exp
|
||
;
|
||
|
||
Since the recursive use of `expseq1' is the leftmost symbol in the
|
||
right hand side, we call this "left recursion". By contrast, here the
|
||
same construct is defined using "right recursion":
|
||
|
||
expseq1: exp
|
||
| exp ',' expseq1
|
||
;
|
||
|
||
Any kind of sequence can be defined using either left recursion or
|
||
right recursion, but you should always use left recursion, because it
|
||
can parse a sequence of any number of elements with bounded stack
|
||
space. Right recursion uses up space on the Bison stack in proportion
|
||
to the number of elements in the sequence, because all the elements
|
||
must be shifted onto the stack before the rule can be applied even
|
||
once. *Note The Bison Parser Algorithm: Algorithm, for further
|
||
explanation of this.
|
||
|
||
"Indirect" or "mutual" recursion occurs when the result of the rule
|
||
does not appear directly on its right hand side, but does appear in
|
||
rules for other nonterminals which do appear on its right hand side.
|
||
|
||
For example:
|
||
|
||
expr: primary
|
||
| primary '+' primary
|
||
;
|
||
|
||
primary: constant
|
||
| '(' expr ')'
|
||
;
|
||
|
||
defines two mutually-recursive nonterminals, since each refers to the
|
||
other.
|
||
|
||
|
||
File: bison.info, Node: Semantics, Next: Locations, Prev: Recursion, Up: Grammar File
|
||
|
||
Defining Language Semantics
|
||
===========================
|
||
|
||
The grammar rules for a language determine only the syntax. The
|
||
semantics are determined by the semantic values associated with various
|
||
tokens and groupings, and by the actions taken when various groupings
|
||
are recognized.
|
||
|
||
For example, the calculator calculates properly because the value
|
||
associated with each expression is the proper number; it adds properly
|
||
because the action for the grouping `X + Y' is to add the numbers
|
||
associated with X and Y.
|
||
|
||
* Menu:
|
||
|
||
* Value Type:: Specifying one data type for all semantic values.
|
||
* Multiple Types:: Specifying several alternative data types.
|
||
* Actions:: An action is the semantic definition of a grammar rule.
|
||
* Action Types:: Specifying data types for actions to operate on.
|
||
* Mid-Rule Actions:: Most actions go at the end of a rule.
|
||
This says when, why and how to use the exceptional
|
||
action in the middle of a rule.
|
||
|
||
|
||
File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics
|
||
|
||
Data Types of Semantic Values
|
||
-----------------------------
|
||
|
||
In a simple program it may be sufficient to use the same data type
|
||
for the semantic values of all language constructs. This was true in
|
||
the RPN and infix calculator examples (*note Reverse Polish Notation
|
||
Calculator: RPN Calc.).
|
||
|
||
Bison's default is to use type `int' for all semantic values. To
|
||
specify some other type, define `YYSTYPE' as a macro, like this:
|
||
|
||
#define YYSTYPE double
|
||
|
||
This macro definition must go in the C declarations section of the
|
||
grammar file (*note Outline of a Bison Grammar: Grammar Outline.).
|
||
|
||
|
||
File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics
|
||
|
||
More Than One Value Type
|
||
------------------------
|
||
|
||
In most programs, you will need different data types for different
|
||
kinds of tokens and groupings. For example, a numeric constant may
|
||
need type `int' or `long', while a string constant needs type `char *',
|
||
and an identifier might need a pointer to an entry in the symbol table.
|
||
|
||
To use more than one data type for semantic values in one parser,
|
||
Bison requires you to do two things:
|
||
|
||
* Specify the entire collection of possible data types, with the
|
||
`%union' Bison declaration (*note The Collection of Value Types:
|
||
Union Decl.).
|
||
|
||
* Choose one of those types for each symbol (terminal or
|
||
nonterminal) for which semantic values are used. This is done for
|
||
tokens with the `%token' Bison declaration (*note Token Type
|
||
Names: Token Decl.) and for groupings with the `%type' Bison
|
||
declaration (*note Nonterminal Symbols: Type Decl.).
|
||
|
||
|
||
File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics
|
||
|
||
Actions
|
||
-------
|
||
|
||
An action accompanies a syntactic rule and contains C code to be
|
||
executed each time an instance of that rule is recognized. The task of
|
||
most actions is to compute a semantic value for the grouping built by
|
||
the rule from the semantic values associated with tokens or smaller
|
||
groupings.
|
||
|
||
An action consists of C statements surrounded by braces, much like a
|
||
compound statement in C. It can be placed at any position in the rule;
|
||
it is executed at that position. Most rules have just one action at
|
||
the end of the rule, following all the components. Actions in the
|
||
middle of a rule are tricky and used only for special purposes (*note
|
||
Actions in Mid-Rule: Mid-Rule Actions.).
|
||
|
||
The C code in an action can refer to the semantic values of the
|
||
components matched by the rule with the construct `$N', which stands for
|
||
the value of the Nth component. The semantic value for the grouping
|
||
being constructed is `$$'. (Bison translates both of these constructs
|
||
into array element references when it copies the actions into the parser
|
||
file.)
|
||
|
||
Here is a typical example:
|
||
|
||
exp: ...
|
||
| exp '+' exp
|
||
{ $$ = $1 + $3; }
|
||
|
||
This rule constructs an `exp' from two smaller `exp' groupings
|
||
connected by a plus-sign token. In the action, `$1' and `$3' refer to
|
||
the semantic values of the two component `exp' groupings, which are the
|
||
first and third symbols on the right hand side of the rule. The sum is
|
||
stored into `$$' so that it becomes the semantic value of the
|
||
addition-expression just recognized by the rule. If there were a
|
||
useful semantic value associated with the `+' token, it could be
|
||
referred to as `$2'.
|
||
|
||
If you don't specify an action for a rule, Bison supplies a default:
|
||
`$$ = $1'. Thus, the value of the first symbol in the rule becomes the
|
||
value of the whole rule. Of course, the default rule is valid only if
|
||
the two data types match. There is no meaningful default action for an
|
||
empty rule; every empty rule must have an explicit action unless the
|
||
rule's value does not matter.
|
||
|
||
`$N' with N zero or negative is allowed for reference to tokens and
|
||
groupings on the stack _before_ those that match the current rule.
|
||
This is a very risky practice, and to use it reliably you must be
|
||
certain of the context in which the rule is applied. Here is a case in
|
||
which you can use this reliably:
|
||
|
||
foo: expr bar '+' expr { ... }
|
||
| expr bar '-' expr { ... }
|
||
;
|
||
|
||
bar: /* empty */
|
||
{ previous_expr = $0; }
|
||
;
|
||
|
||
As long as `bar' is used only in the fashion shown here, `$0' always
|
||
refers to the `expr' which precedes `bar' in the definition of `foo'.
|
||
|
||
|
||
File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics
|
||
|
||
Data Types of Values in Actions
|
||
-------------------------------
|
||
|
||
If you have chosen a single data type for semantic values, the `$$'
|
||
and `$N' constructs always have that data type.
|
||
|
||
If you have used `%union' to specify a variety of data types, then
|
||
you must declare a choice among these types for each terminal or
|
||
nonterminal symbol that can have a semantic value. Then each time you
|
||
use `$$' or `$N', its data type is determined by which symbol it refers
|
||
to in the rule. In this example,
|
||
|
||
exp: ...
|
||
| exp '+' exp
|
||
{ $$ = $1 + $3; }
|
||
|
||
`$1' and `$3' refer to instances of `exp', so they all have the data
|
||
type declared for the nonterminal symbol `exp'. If `$2' were used, it
|
||
would have the data type declared for the terminal symbol `'+'',
|
||
whatever that might be.
|
||
|
||
Alternatively, you can specify the data type when you refer to the
|
||
value, by inserting `<TYPE>' after the `$' at the beginning of the
|
||
reference. For example, if you have defined types as shown here:
|
||
|
||
%union {
|
||
int itype;
|
||
double dtype;
|
||
}
|
||
|
||
then you can write `$<itype>1' to refer to the first subunit of the
|
||
rule as an integer, or `$<dtype>1' to refer to it as a double.
|
||
|