mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-04-24 02:29:43 +00:00
1066 lines
49 KiB
Plaintext
1066 lines
49 KiB
Plaintext
Ceci est le fichier Info bison.info, produit par Makeinfo version 4.0b
|
|
à partir bison.texinfo.
|
|
|
|
START-INFO-DIR-ENTRY
|
|
* bison: (bison). GNU Project parser generator (yacc replacement).
|
|
END-INFO-DIR-ENTRY
|
|
|
|
This file documents the Bison parser generator.
|
|
|
|
Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
|
|
2000, 2001 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to make and distribute verbatim copies of this
|
|
manual provided the copyright notice and this permission notice are
|
|
preserved on all copies.
|
|
|
|
Permission is granted to copy and distribute modified versions of
|
|
this manual under the conditions for verbatim copying, provided also
|
|
that the sections entitled "GNU General Public License" and "Conditions
|
|
for Using Bison" are included exactly as in the original, and provided
|
|
that the entire resulting derived work is distributed under the terms
|
|
of a permission notice identical to this one.
|
|
|
|
Permission is granted to copy and distribute translations of this
|
|
manual into another language, under the above conditions for modified
|
|
versions, except that the sections entitled "GNU General Public
|
|
License", "Conditions for Using Bison" and this permission notice may be
|
|
included in translations approved by the Free Software Foundation
|
|
instead of in the original English.
|
|
|
|
|
|
File: bison.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir)
|
|
|
|
This manual documents version 1.28c of Bison.
|
|
|
|
* Menu:
|
|
|
|
* Introduction::
|
|
* Conditions::
|
|
* Copying:: The GNU General Public License says
|
|
how you can copy and share Bison
|
|
|
|
Tutorial sections:
|
|
* Concepts:: Basic concepts for understanding Bison.
|
|
* Examples:: Three simple explained examples of using Bison.
|
|
|
|
Reference sections:
|
|
* Grammar File:: Writing Bison declarations and rules.
|
|
* Interface:: C-language interface to the parser function `yyparse'.
|
|
* Algorithm:: How the Bison parser works at run-time.
|
|
* Error Recovery:: Writing rules for error recovery.
|
|
* Context Dependency:: What to do if your language syntax is too
|
|
messy for Bison to handle straightforwardly.
|
|
* Debugging:: Debugging Bison parsers that parse wrong.
|
|
* Invocation:: How to run Bison (to produce the parser source file).
|
|
* Table of Symbols:: All the keywords of the Bison language are explained.
|
|
* Glossary:: Basic concepts are explained.
|
|
* Copying This Manual:: License for copying this manual.
|
|
* Index:: Cross-references to the text.
|
|
|
|
--- The Detailed Node Listing ---
|
|
|
|
The Concepts of Bison
|
|
|
|
* Language and Grammar:: Languages and context-free grammars,
|
|
as mathematical ideas.
|
|
* Grammar in Bison:: How we represent grammars for Bison's sake.
|
|
* Semantic Values:: Each token or syntactic grouping can have
|
|
a semantic value (the value of an integer,
|
|
the name of an identifier, etc.).
|
|
* Semantic Actions:: Each rule can have an action containing C code.
|
|
* Bison Parser:: What are Bison's input and output,
|
|
how is the output used?
|
|
* Stages:: Stages in writing and running Bison grammars.
|
|
* Grammar Layout:: Overall structure of a Bison grammar file.
|
|
|
|
Examples
|
|
|
|
* RPN Calc:: Reverse polish notation calculator;
|
|
a first example with no operator precedence.
|
|
* Infix Calc:: Infix (algebraic) notation calculator.
|
|
Operator precedence is introduced.
|
|
* Simple Error Recovery:: Continuing after syntax errors.
|
|
* Multi-function Calc:: Calculator with memory and trig functions.
|
|
It uses multiple data-types for semantic values.
|
|
* Exercises:: Ideas for improving the multi-function calculator.
|
|
|
|
Reverse Polish Notation Calculator
|
|
|
|
* Decls: Rpcalc Decls. Bison and C declarations for rpcalc.
|
|
* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
|
|
* Lexer: Rpcalc Lexer. The lexical analyzer.
|
|
* Main: Rpcalc Main. The controlling function.
|
|
* Error: Rpcalc Error. The error reporting function.
|
|
* Gen: Rpcalc Gen. Running Bison on the grammar file.
|
|
* Comp: Rpcalc Compile. Run the C compiler on the output code.
|
|
|
|
Grammar Rules for `rpcalc'
|
|
|
|
* Rpcalc Input::
|
|
* Rpcalc Line::
|
|
* Rpcalc Expr::
|
|
|
|
Multi-Function Calculator: `mfcalc'
|
|
|
|
* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
|
|
* Rules: Mfcalc Rules. Grammar rules for the calculator.
|
|
* Symtab: Mfcalc Symtab. Symbol table management subroutines.
|
|
|
|
Bison Grammar Files
|
|
|
|
* Grammar Outline:: Overall layout of the grammar file.
|
|
* Symbols:: Terminal and nonterminal symbols.
|
|
* Rules:: How to write grammar rules.
|
|
* Recursion:: Writing recursive rules.
|
|
* Semantics:: Semantic values and actions.
|
|
* Declarations:: All kinds of Bison declarations are described here.
|
|
* Multiple Parsers:: Putting more than one Bison parser in one program.
|
|
|
|
Outline of a Bison Grammar
|
|
|
|
* C Declarations:: Syntax and usage of the C declarations section.
|
|
* Bison Declarations:: Syntax and usage of the Bison declarations section.
|
|
* Grammar Rules:: Syntax and usage of the grammar rules section.
|
|
* C Code:: Syntax and usage of the additional C code section.
|
|
|
|
Defining Language Semantics
|
|
|
|
* Value Type:: Specifying one data type for all semantic values.
|
|
* Multiple Types:: Specifying several alternative data types.
|
|
* Actions:: An action is the semantic definition of a grammar rule.
|
|
* Action Types:: Specifying data types for actions to operate on.
|
|
* Mid-Rule Actions:: Most actions go at the end of a rule.
|
|
This says when, why and how to use the exceptional
|
|
action in the middle of a rule.
|
|
|
|
Bison Declarations
|
|
|
|
* Token Decl:: Declaring terminal symbols.
|
|
* Precedence Decl:: Declaring terminals with precedence and associativity.
|
|
* Union Decl:: Declaring the set of all semantic value types.
|
|
* Type Decl:: Declaring the choice of type for a nonterminal symbol.
|
|
* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
|
|
* Start Decl:: Specifying the start symbol.
|
|
* Pure Decl:: Requesting a reentrant parser.
|
|
* Decl Summary:: Table of all Bison declarations.
|
|
|
|
Parser C-Language Interface
|
|
|
|
* Parser Function:: How to call `yyparse' and what it returns.
|
|
* Lexical:: You must supply a function `yylex'
|
|
which reads tokens.
|
|
* Error Reporting:: You must supply a function `yyerror'.
|
|
* Action Features:: Special features for use in actions.
|
|
|
|
The Lexical Analyzer Function `yylex'
|
|
|
|
* Calling Convention:: How `yyparse' calls `yylex'.
|
|
* Token Values:: How `yylex' must return the semantic value
|
|
of the token it has read.
|
|
* Token Positions:: How `yylex' must return the text position
|
|
(line number, etc.) of the token, if the
|
|
actions want that.
|
|
* Pure Calling:: How the calling convention differs
|
|
in a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.).
|
|
|
|
The Bison Parser Algorithm
|
|
|
|
* Look-Ahead:: Parser looks one token ahead when deciding what to do.
|
|
* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
|
|
* Precedence:: Operator precedence works by resolving conflicts.
|
|
* Contextual Precedence:: When an operator's precedence depends on context.
|
|
* Parser States:: The parser is a finite-state-machine with stack.
|
|
* Reduce/Reduce:: When two rules are applicable in the same situation.
|
|
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
|
|
* Stack Overflow:: What happens when stack gets full. How to avoid it.
|
|
|
|
Operator Precedence
|
|
|
|
* Why Precedence:: An example showing why precedence is needed.
|
|
* Using Precedence:: How to specify precedence in Bison grammars.
|
|
* Precedence Examples:: How these features are used in the previous example.
|
|
* How Precedence:: How they work.
|
|
|
|
Handling Context Dependencies
|
|
|
|
* Semantic Tokens:: Token parsing can depend on the semantic context.
|
|
* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
|
|
* Tie-in Recovery:: Lexical tie-ins have implications for how
|
|
error recovery rules must be written.
|
|
|
|
Invoking Bison
|
|
|
|
* Bison Options:: All the options described in detail,
|
|
in alphabetical order by short options.
|
|
* Option Cross Key:: Alphabetical list of long options.
|
|
* VMS Invocation:: Bison command syntax on VMS.
|
|
|
|
Copying This Manual
|
|
|
|
* GNU Free Documentation License:: License for copying this manual.
|
|
|
|
|
|
File: bison.info, Node: Introduction, Next: Conditions, Prev: Top, Up: Top
|
|
|
|
Introduction
|
|
************
|
|
|
|
"Bison" is a general-purpose parser generator that converts a
|
|
grammar description for an LALR(1) context-free grammar into a C
|
|
program to parse that grammar. Once you are proficient with Bison, you
|
|
may use it to develop a wide range of language parsers, from those used
|
|
in simple desk calculators to complex programming languages.
|
|
|
|
Bison is upward compatible with Yacc: all properly-written Yacc
|
|
grammars ought to work with Bison with no change. Anyone familiar with
|
|
Yacc should be able to use Bison with little trouble. You need to be
|
|
fluent in C programming in order to use Bison or to understand this
|
|
manual.
|
|
|
|
We begin with tutorial chapters that explain the basic concepts of
|
|
using Bison and show three explained examples, each building on the
|
|
last. If you don't know Bison or Yacc, start by reading these
|
|
chapters. Reference chapters follow which describe specific aspects of
|
|
Bison in detail.
|
|
|
|
Bison was written primarily by Robert Corbett; Richard Stallman made
|
|
it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
|
|
multi-character string literals and other features.
|
|
|
|
This edition corresponds to version 1.28c of Bison.
|
|
|
|
|
|
File: bison.info, Node: Conditions, Next: Copying, Prev: Introduction, Up: Top
|
|
|
|
Conditions for Using Bison
|
|
**************************
|
|
|
|
As of Bison version 1.24, we have changed the distribution terms for
|
|
`yyparse' to permit using Bison's output in nonfree programs.
|
|
Formerly, Bison parsers could be used only in programs that were free
|
|
software.
|
|
|
|
The other GNU programming tools, such as the GNU C compiler, have
|
|
never had such a requirement. They could always be used for nonfree
|
|
software. The reason Bison was different was not due to a special
|
|
policy decision; it resulted from applying the usual General Public
|
|
License to all of the Bison source code.
|
|
|
|
The output of the Bison utility--the Bison parser file--contains a
|
|
verbatim copy of a sizable piece of Bison, which is the code for the
|
|
`yyparse' function. (The actions from your grammar are inserted into
|
|
this function at one point, but the rest of the function is not
|
|
changed.) When we applied the GPL terms to the code for `yyparse', the
|
|
effect was to restrict the use of Bison output to free software.
|
|
|
|
We didn't change the terms because of sympathy for people who want to
|
|
make software proprietary. *Software should be free.* But we
|
|
concluded that limiting Bison's use to free software was doing little to
|
|
encourage people to make other software free. So we decided to make the
|
|
practical conditions for using Bison match the practical conditions for
|
|
using the other GNU tools.
|
|
|
|
|
|
File: bison.info, Node: Copying, Next: Concepts, Prev: Conditions, Up: Top
|
|
|
|
GNU GENERAL PUBLIC LICENSE
|
|
**************************
|
|
|
|
Version 2, June 1991
|
|
|
|
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
|
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
|
|
|
|
Everyone is permitted to copy and distribute verbatim copies
|
|
of this license document, but changing it is not allowed.
|
|
|
|
Preamble
|
|
========
|
|
|
|
The licenses for most software are designed to take away your
|
|
freedom to share and change it. By contrast, the GNU General Public
|
|
License is intended to guarantee your freedom to share and change free
|
|
software--to make sure the software is free for all its users. This
|
|
General Public License applies to most of the Free Software
|
|
Foundation's software and to any other program whose authors commit to
|
|
using it. (Some other Free Software Foundation software is covered by
|
|
the GNU Library General Public License instead.) You can apply it to
|
|
your programs, too.
|
|
|
|
When we speak of free software, we are referring to freedom, not
|
|
price. Our General Public Licenses are designed to make sure that you
|
|
have the freedom to distribute copies of free software (and charge for
|
|
this service if you wish), that you receive source code or can get it
|
|
if you want it, that you can change the software or use pieces of it in
|
|
new free programs; and that you know you can do these things.
|
|
|
|
To protect your rights, we need to make restrictions that forbid
|
|
anyone to deny you these rights or to ask you to surrender the rights.
|
|
These restrictions translate to certain responsibilities for you if you
|
|
distribute copies of the software, or if you modify it.
|
|
|
|
For example, if you distribute copies of such a program, whether
|
|
gratis or for a fee, you must give the recipients all the rights that
|
|
you have. You must make sure that they, too, receive or can get the
|
|
source code. And you must show them these terms so they know their
|
|
rights.
|
|
|
|
We protect your rights with two steps: (1) copyright the software,
|
|
and (2) offer you this license which gives you legal permission to copy,
|
|
distribute and/or modify the software.
|
|
|
|
Also, for each author's protection and ours, we want to make certain
|
|
that everyone understands that there is no warranty for this free
|
|
software. If the software is modified by someone else and passed on, we
|
|
want its recipients to know that what they have is not the original, so
|
|
that any problems introduced by others will not reflect on the original
|
|
authors' reputations.
|
|
|
|
Finally, any free program is threatened constantly by software
|
|
patents. We wish to avoid the danger that redistributors of a free
|
|
program will individually obtain patent licenses, in effect making the
|
|
program proprietary. To prevent this, we have made it clear that any
|
|
patent must be licensed for everyone's free use or not licensed at all.
|
|
|
|
The precise terms and conditions for copying, distribution and
|
|
modification follow.
|
|
|
|
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
|
0. This License applies to any program or other work which contains a
|
|
notice placed by the copyright holder saying it may be distributed
|
|
under the terms of this General Public License. The "Program",
|
|
below, refers to any such program or work, and a "work based on
|
|
the Program" means either the Program or any derivative work under
|
|
copyright law: that is to say, a work containing the Program or a
|
|
portion of it, either verbatim or with modifications and/or
|
|
translated into another language. (Hereinafter, translation is
|
|
included without limitation in the term "modification".) Each
|
|
licensee is addressed as "you".
|
|
|
|
Activities other than copying, distribution and modification are
|
|
not covered by this License; they are outside its scope. The act
|
|
of running the Program is not restricted, and the output from the
|
|
Program is covered only if its contents constitute a work based on
|
|
the Program (independent of having been made by running the
|
|
Program). Whether that is true depends on what the Program does.
|
|
|
|
1. You may copy and distribute verbatim copies of the Program's
|
|
source code as you receive it, in any medium, provided that you
|
|
conspicuously and appropriately publish on each copy an appropriate
|
|
copyright notice and disclaimer of warranty; keep intact all the
|
|
notices that refer to this License and to the absence of any
|
|
warranty; and give any other recipients of the Program a copy of
|
|
this License along with the Program.
|
|
|
|
You may charge a fee for the physical act of transferring a copy,
|
|
and you may at your option offer warranty protection in exchange
|
|
for a fee.
|
|
|
|
2. You may modify your copy or copies of the Program or any portion
|
|
of it, thus forming a work based on the Program, and copy and
|
|
distribute such modifications or work under the terms of Section 1
|
|
above, provided that you also meet all of these conditions:
|
|
|
|
a. You must cause the modified files to carry prominent notices
|
|
stating that you changed the files and the date of any change.
|
|
|
|
b. You must cause any work that you distribute or publish, that
|
|
in whole or in part contains or is derived from the Program
|
|
or any part thereof, to be licensed as a whole at no charge
|
|
to all third parties under the terms of this License.
|
|
|
|
c. If the modified program normally reads commands interactively
|
|
when run, you must cause it, when started running for such
|
|
interactive use in the most ordinary way, to print or display
|
|
an announcement including an appropriate copyright notice and
|
|
a notice that there is no warranty (or else, saying that you
|
|
provide a warranty) and that users may redistribute the
|
|
program under these conditions, and telling the user how to
|
|
view a copy of this License. (Exception: if the Program
|
|
itself is interactive but does not normally print such an
|
|
announcement, your work based on the Program is not required
|
|
to print an announcement.)
|
|
|
|
These requirements apply to the modified work as a whole. If
|
|
identifiable sections of that work are not derived from the
|
|
Program, and can be reasonably considered independent and separate
|
|
works in themselves, then this License, and its terms, do not
|
|
apply to those sections when you distribute them as separate
|
|
works. But when you distribute the same sections as part of a
|
|
whole which is a work based on the Program, the distribution of
|
|
the whole must be on the terms of this License, whose permissions
|
|
for other licensees extend to the entire whole, and thus to each
|
|
and every part regardless of who wrote it.
|
|
|
|
Thus, it is not the intent of this section to claim rights or
|
|
contest your rights to work written entirely by you; rather, the
|
|
intent is to exercise the right to control the distribution of
|
|
derivative or collective works based on the Program.
|
|
|
|
In addition, mere aggregation of another work not based on the
|
|
Program with the Program (or with a work based on the Program) on
|
|
a volume of a storage or distribution medium does not bring the
|
|
other work under the scope of this License.
|
|
|
|
3. You may copy and distribute the Program (or a work based on it,
|
|
under Section 2) in object code or executable form under the terms
|
|
of Sections 1 and 2 above provided that you also do one of the
|
|
following:
|
|
|
|
a. Accompany it with the complete corresponding machine-readable
|
|
source code, which must be distributed under the terms of
|
|
Sections 1 and 2 above on a medium customarily used for
|
|
software interchange; or,
|
|
|
|
b. Accompany it with a written offer, valid for at least three
|
|
years, to give any third party, for a charge no more than your
|
|
cost of physically performing source distribution, a complete
|
|
machine-readable copy of the corresponding source code, to be
|
|
distributed under the terms of Sections 1 and 2 above on a
|
|
medium customarily used for software interchange; or,
|
|
|
|
c. Accompany it with the information you received as to the offer
|
|
to distribute corresponding source code. (This alternative is
|
|
allowed only for noncommercial distribution and only if you
|
|
received the program in object code or executable form with
|
|
such an offer, in accord with Subsection b above.)
|
|
|
|
The source code for a work means the preferred form of the work for
|
|
making modifications to it. For an executable work, complete
|
|
source code means all the source code for all modules it contains,
|
|
plus any associated interface definition files, plus the scripts
|
|
used to control compilation and installation of the executable.
|
|
However, as a special exception, the source code distributed need
|
|
not include anything that is normally distributed (in either
|
|
source or binary form) with the major components (compiler,
|
|
kernel, and so on) of the operating system on which the executable
|
|
runs, unless that component itself accompanies the executable.
|
|
|
|
If distribution of executable or object code is made by offering
|
|
access to copy from a designated place, then offering equivalent
|
|
access to copy the source code from the same place counts as
|
|
distribution of the source code, even though third parties are not
|
|
compelled to copy the source along with the object code.
|
|
|
|
4. You may not copy, modify, sublicense, or distribute the Program
|
|
except as expressly provided under this License. Any attempt
|
|
otherwise to copy, modify, sublicense or distribute the Program is
|
|
void, and will automatically terminate your rights under this
|
|
License. However, parties who have received copies, or rights,
|
|
from you under this License will not have their licenses
|
|
terminated so long as such parties remain in full compliance.
|
|
|
|
5. You are not required to accept this License, since you have not
|
|
signed it. However, nothing else grants you permission to modify
|
|
or distribute the Program or its derivative works. These actions
|
|
are prohibited by law if you do not accept this License.
|
|
Therefore, by modifying or distributing the Program (or any work
|
|
based on the Program), you indicate your acceptance of this
|
|
License to do so, and all its terms and conditions for copying,
|
|
distributing or modifying the Program or works based on it.
|
|
|
|
6. Each time you redistribute the Program (or any work based on the
|
|
Program), the recipient automatically receives a license from the
|
|
original licensor to copy, distribute or modify the Program
|
|
subject to these terms and conditions. You may not impose any
|
|
further restrictions on the recipients' exercise of the rights
|
|
granted herein. You are not responsible for enforcing compliance
|
|
by third parties to this License.
|
|
|
|
7. If, as a consequence of a court judgment or allegation of patent
|
|
infringement or for any other reason (not limited to patent
|
|
issues), conditions are imposed on you (whether by court order,
|
|
agreement or otherwise) that contradict the conditions of this
|
|
License, they do not excuse you from the conditions of this
|
|
License. If you cannot distribute so as to satisfy simultaneously
|
|
your obligations under this License and any other pertinent
|
|
obligations, then as a consequence you may not distribute the
|
|
Program at all. For example, if a patent license would not permit
|
|
royalty-free redistribution of the Program by all those who
|
|
receive copies directly or indirectly through you, then the only
|
|
way you could satisfy both it and this License would be to refrain
|
|
entirely from distribution of the Program.
|
|
|
|
If any portion of this section is held invalid or unenforceable
|
|
under any particular circumstance, the balance of the section is
|
|
intended to apply and the section as a whole is intended to apply
|
|
in other circumstances.
|
|
|
|
It is not the purpose of this section to induce you to infringe any
|
|
patents or other property right claims or to contest validity of
|
|
any such claims; this section has the sole purpose of protecting
|
|
the integrity of the free software distribution system, which is
|
|
implemented by public license practices. Many people have made
|
|
generous contributions to the wide range of software distributed
|
|
through that system in reliance on consistent application of that
|
|
system; it is up to the author/donor to decide if he or she is
|
|
willing to distribute software through any other system and a
|
|
licensee cannot impose that choice.
|
|
|
|
This section is intended to make thoroughly clear what is believed
|
|
to be a consequence of the rest of this License.
|
|
|
|
8. If the distribution and/or use of the Program is restricted in
|
|
certain countries either by patents or by copyrighted interfaces,
|
|
the original copyright holder who places the Program under this
|
|
License may add an explicit geographical distribution limitation
|
|
excluding those countries, so that distribution is permitted only
|
|
in or among countries not thus excluded. In such case, this
|
|
License incorporates the limitation as if written in the body of
|
|
this License.
|
|
|
|
9. The Free Software Foundation may publish revised and/or new
|
|
versions of the General Public License from time to time. Such
|
|
new versions will be similar in spirit to the present version, but
|
|
may differ in detail to address new problems or concerns.
|
|
|
|
Each version is given a distinguishing version number. If the
|
|
Program specifies a version number of this License which applies
|
|
to it and "any later version", you have the option of following
|
|
the terms and conditions either of that version or of any later
|
|
version published by the Free Software Foundation. If the Program
|
|
does not specify a version number of this License, you may choose
|
|
any version ever published by the Free Software Foundation.
|
|
|
|
10. If you wish to incorporate parts of the Program into other free
|
|
programs whose distribution conditions are different, write to the
|
|
author to ask for permission. For software which is copyrighted
|
|
by the Free Software Foundation, write to the Free Software
|
|
Foundation; we sometimes make exceptions for this. Our decision
|
|
will be guided by the two goals of preserving the free status of
|
|
all derivatives of our free software and of promoting the sharing
|
|
and reuse of software generally.
|
|
|
|
NO WARRANTY
|
|
|
|
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
|
|
WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
|
|
LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
|
|
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
|
|
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
|
|
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
|
|
FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
|
|
QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
|
|
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
|
|
SERVICING, REPAIR OR CORRECTION.
|
|
|
|
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
|
|
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
|
|
MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
|
|
LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
|
|
INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
|
|
INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
|
|
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
|
|
OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
|
|
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
|
|
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
|
|
|
|
END OF TERMS AND CONDITIONS
|
|
|
|
Appendix: How to Apply These Terms to Your New Programs
|
|
=======================================================
|
|
|
|
If you develop a new program, and you want it to be of the greatest
|
|
possible use to the public, the best way to achieve this is to make it
|
|
free software which everyone can redistribute and change under these
|
|
terms.
|
|
|
|
To do so, attach the following notices to the program. It is safest
|
|
to attach them to the start of each source file to most effectively
|
|
convey the exclusion of warranty; and each file should have at least
|
|
the "copyright" line and a pointer to where the full notice is found.
|
|
|
|
ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
|
|
Copyright (C) YYYY NAME OF AUTHOR
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
it under the terms of the GNU General Public License as published by
|
|
the Free Software Foundation; either version 2 of the License, or
|
|
(at your option) any later version.
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
GNU General Public License for more details.
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
along with this program; if not, write to the Free Software
|
|
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
|
|
|
|
Also add information on how to contact you by electronic and paper
|
|
mail.
|
|
|
|
If the program is interactive, make it output a short notice like
|
|
this when it starts in an interactive mode:
|
|
|
|
Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
|
|
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
|
This is free software, and you are welcome to redistribute it
|
|
under certain conditions; type `show c' for details.
|
|
|
|
The hypothetical commands `show w' and `show c' should show the
|
|
appropriate parts of the General Public License. Of course, the
|
|
commands you use may be called something other than `show w' and `show
|
|
c'; they could even be mouse-clicks or menu items--whatever suits your
|
|
program.
|
|
|
|
You should also get your employer (if you work as a programmer) or
|
|
your school, if any, to sign a "copyright disclaimer" for the program,
|
|
if necessary. Here is a sample; alter the names:
|
|
|
|
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
|
|
`Gnomovision' (which makes passes at compilers) written by James Hacker.
|
|
|
|
SIGNATURE OF TY COON, 1 April 1989
|
|
Ty Coon, President of Vice
|
|
|
|
This General Public License does not permit incorporating your
|
|
program into proprietary programs. If your program is a subroutine
|
|
library, you may consider it more useful to permit linking proprietary
|
|
applications with the library. If this is what you want to do, use the
|
|
GNU Library General Public License instead of this License.
|
|
|
|
|
|
File: bison.info, Node: Concepts, Next: Examples, Prev: Copying, Up: Top
|
|
|
|
The Concepts of Bison
|
|
*********************
|
|
|
|
This chapter introduces many of the basic concepts without which the
|
|
details of Bison will not make sense. If you do not already know how to
|
|
use Bison or Yacc, we suggest you start by reading this chapter
|
|
carefully.
|
|
|
|
* Menu:
|
|
|
|
* Language and Grammar:: Languages and context-free grammars,
|
|
as mathematical ideas.
|
|
* Grammar in Bison:: How we represent grammars for Bison's sake.
|
|
* Semantic Values:: Each token or syntactic grouping can have
|
|
a semantic value (the value of an integer,
|
|
the name of an identifier, etc.).
|
|
* Semantic Actions:: Each rule can have an action containing C code.
|
|
* Locations Overview:: Tracking Locations.
|
|
* Bison Parser:: What are Bison's input and output,
|
|
how is the output used?
|
|
* Stages:: Stages in writing and running Bison grammars.
|
|
* Grammar Layout:: Overall structure of a Bison grammar file.
|
|
|
|
|
|
File: bison.info, Node: Language and Grammar, Next: Grammar in Bison, Up: Concepts
|
|
|
|
Languages and Context-Free Grammars
|
|
===================================
|
|
|
|
In order for Bison to parse a language, it must be described by a
|
|
"context-free grammar". This means that you specify one or more
|
|
"syntactic groupings" and give rules for constructing them from their
|
|
parts. For example, in the C language, one kind of grouping is called
|
|
an `expression'. One rule for making an expression might be, "An
|
|
expression can be made of a minus sign and another expression".
|
|
Another would be, "An expression can be an integer". As you can see,
|
|
rules are often recursive, but there must be at least one rule which
|
|
leads out of the recursion.
|
|
|
|
The most common formal system for presenting such rules for humans
|
|
to read is "Backus-Naur Form" or "BNF", which was developed in order to
|
|
specify the language Algol 60. Any grammar expressed in BNF is a
|
|
context-free grammar. The input to Bison is essentially
|
|
machine-readable BNF.
|
|
|
|
Not all context-free languages can be handled by Bison, only those
|
|
that are LALR(1). In brief, this means that it must be possible to
|
|
tell how to parse any portion of an input string with just a single
|
|
token of look-ahead. Strictly speaking, that is a description of an
|
|
LR(1) grammar, and LALR(1) involves additional restrictions that are
|
|
hard to explain simply; but it is rare in actual practice to find an
|
|
LR(1) grammar that fails to be LALR(1). *Note Mysterious Reduce/Reduce
|
|
Conflicts: Mystery Conflicts, for more information on this.
|
|
|
|
In the formal grammatical rules for a language, each kind of
|
|
syntactic unit or grouping is named by a "symbol". Those which are
|
|
built by grouping smaller constructs according to grammatical rules are
|
|
called "nonterminal symbols"; those which can't be subdivided are called
|
|
"terminal symbols" or "token types". We call a piece of input
|
|
corresponding to a single terminal symbol a "token", and a piece
|
|
corresponding to a single nonterminal symbol a "grouping".
|
|
|
|
We can use the C language as an example of what symbols, terminal and
|
|
nonterminal, mean. The tokens of C are identifiers, constants (numeric
|
|
and string), and the various keywords, arithmetic operators and
|
|
punctuation marks. So the terminal symbols of a grammar for C include
|
|
`identifier', `number', `string', plus one symbol for each keyword,
|
|
operator or punctuation mark: `if', `return', `const', `static', `int',
|
|
`char', `plus-sign', `open-brace', `close-brace', `comma' and many
|
|
more. (These tokens can be subdivided into characters, but that is a
|
|
matter of lexicography, not grammar.)
|
|
|
|
Here is a simple C function subdivided into tokens:
|
|
|
|
int /* keyword `int' */
|
|
square (x) /* identifier, open-paren, */
|
|
/* identifier, close-paren */
|
|
int x; /* keyword `int', identifier, semicolon */
|
|
{ /* open-brace */
|
|
return x * x; /* keyword `return', identifier, */
|
|
/* asterisk, identifier, semicolon */
|
|
} /* close-brace */
|
|
|
|
The syntactic groupings of C include the expression, the statement,
|
|
the declaration, and the function definition. These are represented in
|
|
the grammar of C by nonterminal symbols `expression', `statement',
|
|
`declaration' and `function definition'. The full grammar uses dozens
|
|
of additional language constructs, each with its own nonterminal
|
|
symbol, in order to express the meanings of these four. The example
|
|
above is a function definition; it contains one declaration, and one
|
|
statement. In the statement, each `x' is an expression and so is `x *
|
|
x'.
|
|
|
|
Each nonterminal symbol must have grammatical rules showing how it
|
|
is made out of simpler constructs. For example, one kind of C
|
|
statement is the `return' statement; this would be described with a
|
|
grammar rule which reads informally as follows:
|
|
|
|
A `statement' can be made of a `return' keyword, an `expression'
|
|
and a `semicolon'.
|
|
|
|
There would be many other rules for `statement', one for each kind of
|
|
statement in C.
|
|
|
|
One nonterminal symbol must be distinguished as the special one which
|
|
defines a complete utterance in the language. It is called the "start
|
|
symbol". In a compiler, this means a complete input program. In the C
|
|
language, the nonterminal symbol `sequence of definitions and
|
|
declarations' plays this role.
|
|
|
|
For example, `1 + 2' is a valid C expression--a valid part of a C
|
|
program--but it is not valid as an _entire_ C program. In the
|
|
context-free grammar of C, this follows from the fact that `expression'
|
|
is not the start symbol.
|
|
|
|
The Bison parser reads a sequence of tokens as its input, and groups
|
|
the tokens using the grammar rules. If the input is valid, the end
|
|
result is that the entire token sequence reduces to a single grouping
|
|
whose symbol is the grammar's start symbol. If we use a grammar for C,
|
|
the entire input must be a `sequence of definitions and declarations'.
|
|
If not, the parser reports a syntax error.
|
|
|
|
|
|
File: bison.info, Node: Grammar in Bison, Next: Semantic Values, Prev: Language and Grammar, Up: Concepts
|
|
|
|
From Formal Rules to Bison Input
|
|
================================
|
|
|
|
A formal grammar is a mathematical construct. To define the language
|
|
for Bison, you must write a file expressing the grammar in Bison syntax:
|
|
a "Bison grammar" file. *Note Bison Grammar Files: Grammar File.
|
|
|
|
A nonterminal symbol in the formal grammar is represented in Bison
|
|
input as an identifier, like an identifier in C. By convention, it
|
|
should be in lower case, such as `expr', `stmt' or `declaration'.
|
|
|
|
The Bison representation for a terminal symbol is also called a
|
|
"token type". Token types as well can be represented as C-like
|
|
identifiers. By convention, these identifiers should be upper case to
|
|
distinguish them from nonterminals: for example, `INTEGER',
|
|
`IDENTIFIER', `IF' or `RETURN'. A terminal symbol that stands for a
|
|
particular keyword in the language should be named after that keyword
|
|
converted to upper case. The terminal symbol `error' is reserved for
|
|
error recovery. *Note Symbols::.
|
|
|
|
A terminal symbol can also be represented as a character literal,
|
|
just like a C character constant. You should do this whenever a token
|
|
is just a single character (parenthesis, plus-sign, etc.): use that
|
|
same character in a literal as the terminal symbol for that token.
|
|
|
|
A third way to represent a terminal symbol is with a C string
|
|
constant containing several characters. *Note Symbols::, for more
|
|
information.
|
|
|
|
The grammar rules also have an expression in Bison syntax. For
|
|
example, here is the Bison rule for a C `return' statement. The
|
|
semicolon in quotes is a literal character token, representing part of
|
|
the C syntax for the statement; the naked semicolon, and the colon, are
|
|
Bison punctuation used in every rule.
|
|
|
|
stmt: RETURN expr ';'
|
|
;
|
|
|
|
*Note Syntax of Grammar Rules: Rules.
|
|
|
|
|
|
File: bison.info, Node: Semantic Values, Next: Semantic Actions, Prev: Grammar in Bison, Up: Concepts
|
|
|
|
Semantic Values
|
|
===============
|
|
|
|
A formal grammar selects tokens only by their classifications: for
|
|
example, if a rule mentions the terminal symbol `integer constant', it
|
|
means that _any_ integer constant is grammatically valid in that
|
|
position. The precise value of the constant is irrelevant to how to
|
|
parse the input: if `x+4' is grammatical then `x+1' or `x+3989' is
|
|
equally grammatical.
|
|
|
|
But the precise value is very important for what the input means
|
|
once it is parsed. A compiler is useless if it fails to distinguish
|
|
between 4, 1 and 3989 as constants in the program! Therefore, each
|
|
token in a Bison grammar has both a token type and a "semantic value".
|
|
*Note Defining Language Semantics: Semantics, for details.
|
|
|
|
The token type is a terminal symbol defined in the grammar, such as
|
|
`INTEGER', `IDENTIFIER' or `',''. It tells everything you need to know
|
|
to decide where the token may validly appear and how to group it with
|
|
other tokens. The grammar rules know nothing about tokens except their
|
|
types.
|
|
|
|
The semantic value has all the rest of the information about the
|
|
meaning of the token, such as the value of an integer, or the name of an
|
|
identifier. (A token such as `','' which is just punctuation doesn't
|
|
need to have any semantic value.)
|
|
|
|
For example, an input token might be classified as token type
|
|
`INTEGER' and have the semantic value 4. Another input token might
|
|
have the same token type `INTEGER' but value 3989. When a grammar rule
|
|
says that `INTEGER' is allowed, either of these tokens is acceptable
|
|
because each is an `INTEGER'. When the parser accepts the token, it
|
|
keeps track of the token's semantic value.
|
|
|
|
Each grouping can also have a semantic value as well as its
|
|
nonterminal symbol. For example, in a calculator, an expression
|
|
typically has a semantic value that is a number. In a compiler for a
|
|
programming language, an expression typically has a semantic value that
|
|
is a tree structure describing the meaning of the expression.
|
|
|
|
|
|
File: bison.info, Node: Semantic Actions, Next: Locations Overview, Prev: Semantic Values, Up: Concepts
|
|
|
|
Semantic Actions
|
|
================
|
|
|
|
In order to be useful, a program must do more than parse input; it
|
|
must also produce some output based on the input. In a Bison grammar,
|
|
a grammar rule can have an "action" made up of C statements. Each time
|
|
the parser recognizes a match for that rule, the action is executed.
|
|
*Note Actions::.
|
|
|
|
Most of the time, the purpose of an action is to compute the
|
|
semantic value of the whole construct from the semantic values of its
|
|
parts. For example, suppose we have a rule which says an expression
|
|
can be the sum of two expressions. When the parser recognizes such a
|
|
sum, each of the subexpressions has a semantic value which describes
|
|
how it was built up. The action for this rule should create a similar
|
|
sort of value for the newly recognized larger expression.
|
|
|
|
For example, here is a rule that says an expression can be the sum of
|
|
two subexpressions:
|
|
|
|
expr: expr '+' expr { $$ = $1 + $3; }
|
|
;
|
|
|
|
The action says how to produce the semantic value of the sum expression
|
|
from the values of the two subexpressions.
|
|
|
|
|
|
File: bison.info, Node: Locations Overview, Next: Bison Parser, Prev: Semantic Actions, Up: Concepts
|
|
|
|
Locations
|
|
=========
|
|
|
|
Many applications, like interpreters or compilers, have to produce
|
|
verbose and useful error messages. To achieve this, one must be able to
|
|
keep track of the "textual position", or "location", of each syntactic
|
|
construct. Bison provides a mechanism for handling these locations.
|
|
|
|
Each token has a semantic value. In a similar fashion, each token
|
|
has an associated location, but the type of locations is the same for
|
|
all tokens and groupings. Moreover, the output parser is equipped with
|
|
a default data structure for storing locations (*note Locations::, for
|
|
more details).
|
|
|
|
Like semantic values, locations can be reached in actions using a
|
|
dedicated set of constructs. In the example above, the location of the
|
|
whole grouping is `@$', while the locations of the subexpressions are
|
|
`@1' and `@3'.
|
|
|
|
When a rule is matched, a default action is used to compute the
|
|
semantic value of its left hand side (*note Actions::). In the same
|
|
way, another default action is used for locations. However, the action
|
|
for locations is general enough for most cases, meaning there is
|
|
usually no need to describe for each rule how `@$' should be formed.
|
|
When building a new location for a given grouping, the default behavior
|
|
of the output parser is to take the beginning of the first symbol, and
|
|
the end of the last symbol.
|
|
|
|
|
|
File: bison.info, Node: Bison Parser, Next: Stages, Prev: Locations Overview, Up: Concepts
|
|
|
|
Bison Output: the Parser File
|
|
=============================
|
|
|
|
When you run Bison, you give it a Bison grammar file as input. The
|
|
output is a C source file that parses the language described by the
|
|
grammar. This file is called a "Bison parser". Keep in mind that the
|
|
Bison utility and the Bison parser are two distinct programs: the Bison
|
|
utility is a program whose output is the Bison parser that becomes part
|
|
of your program.
|
|
|
|
The job of the Bison parser is to group tokens into groupings
|
|
according to the grammar rules--for example, to build identifiers and
|
|
operators into expressions. As it does this, it runs the actions for
|
|
the grammar rules it uses.
|
|
|
|
The tokens come from a function called the "lexical analyzer" that
|
|
you must supply in some fashion (such as by writing it in C). The
|
|
Bison parser calls the lexical analyzer each time it wants a new token.
|
|
It doesn't know what is "inside" the tokens (though their semantic
|
|
values may reflect this). Typically the lexical analyzer makes the
|
|
tokens by parsing characters of text, but Bison does not depend on
|
|
this. *Note The Lexical Analyzer Function `yylex': Lexical.
|
|
|
|
The Bison parser file is C code which defines a function named
|
|
`yyparse' which implements that grammar. This function does not make a
|
|
complete C program: you must supply some additional functions. One is
|
|
the lexical analyzer. Another is an error-reporting function which the
|
|
parser calls to report an error. In addition, a complete C program must
|
|
start with a function called `main'; you have to provide this, and
|
|
arrange for it to call `yyparse' or the parser will never run. *Note
|
|
Parser C-Language Interface: Interface.
|
|
|
|
Aside from the token type names and the symbols in the actions you
|
|
write, all variable and function names used in the Bison parser file
|
|
begin with `yy' or `YY'. This includes interface functions such as the
|
|
lexical analyzer function `yylex', the error reporting function
|
|
`yyerror' and the parser function `yyparse' itself. This also includes
|
|
numerous identifiers used for internal purposes. Therefore, you should
|
|
avoid using C identifiers starting with `yy' or `YY' in the Bison
|
|
grammar file except for the ones defined in this manual.
|
|
|
|
|
|
File: bison.info, Node: Stages, Next: Grammar Layout, Prev: Bison Parser, Up: Concepts
|
|
|
|
Stages in Using Bison
|
|
=====================
|
|
|
|
The actual language-design process using Bison, from grammar
|
|
specification to a working compiler or interpreter, has these parts:
|
|
|
|
1. Formally specify the grammar in a form recognized by Bison (*note
|
|
Bison Grammar Files: Grammar File.). For each grammatical rule in
|
|
the language, describe the action that is to be taken when an
|
|
instance of that rule is recognized. The action is described by a
|
|
sequence of C statements.
|
|
|
|
2. Write a lexical analyzer to process input and pass tokens to the
|
|
parser. The lexical analyzer may be written by hand in C (*note
|
|
The Lexical Analyzer Function `yylex': Lexical.). It could also
|
|
be produced using Lex, but the use of Lex is not discussed in this
|
|
manual.
|
|
|
|
3. Write a controlling function that calls the Bison-produced parser.
|
|
|
|
4. Write error-reporting routines.
|
|
|
|
To turn this source code as written into a runnable program, you
|
|
must follow these steps:
|
|
|
|
1. Run Bison on the grammar to produce the parser.
|
|
|
|
2. Compile the code output by Bison, as well as any other source
|
|
files.
|
|
|
|
3. Link the object files to produce the finished product.
|
|
|
|
|
|
File: bison.info, Node: Grammar Layout, Prev: Stages, Up: Concepts
|
|
|
|
The Overall Layout of a Bison Grammar
|
|
=====================================
|
|
|
|
The input file for the Bison utility is a "Bison grammar file". The
|
|
general form of a Bison grammar file is as follows:
|
|
|
|
%{
|
|
C DECLARATIONS
|
|
%}
|
|
|
|
BISON DECLARATIONS
|
|
|
|
%%
|
|
GRAMMAR RULES
|
|
%%
|
|
ADDITIONAL C CODE
|
|
|
|
The `%%', `%{' and `%}' are punctuation that appears in every Bison
|
|
grammar file to separate the sections.
|
|
|
|
The C declarations may define types and variables used in the
|
|
actions. You can also use preprocessor commands to define macros used
|
|
there, and use `#include' to include header files that do any of these
|
|
things.
|
|
|
|
The Bison declarations declare the names of the terminal and
|
|
nonterminal symbols, and may also describe operator precedence and the
|
|
data types of semantic values of various symbols.
|
|
|
|
The grammar rules define how to construct each nonterminal symbol
|
|
from its parts.
|
|
|
|
The additional C code can contain any C code you want to use. Often
|
|
the definition of the lexical analyzer `yylex' goes here, plus
|
|
subroutines called by the actions in the grammar rules. In a simple
|
|
program, all the rest of the program can go here.
|
|
|
|
|
|
File: bison.info, Node: Examples, Next: Grammar File, Prev: Concepts, Up: Top
|
|
|
|
Examples
|
|
********
|
|
|
|
Now we show and explain three sample programs written using Bison: a
|
|
reverse polish notation calculator, an algebraic (infix) notation
|
|
calculator, and a multi-function calculator. All three have been tested
|
|
under BSD Unix 4.3; each produces a usable, though limited, interactive
|
|
desk-top calculator.
|
|
|
|
These examples are simple, but Bison grammars for real programming
|
|
languages are written the same way. You can copy these examples out of
|
|
the Info file and into a source file to try them.
|
|
|
|
* Menu:
|
|
|
|
* RPN Calc:: Reverse polish notation calculator;
|
|
a first example with no operator precedence.
|
|
* Infix Calc:: Infix (algebraic) notation calculator.
|
|
Operator precedence is introduced.
|
|
* Simple Error Recovery:: Continuing after syntax errors.
|
|
* Multi-function Calc:: Calculator with memory and trig functions.
|
|
It uses multiple data-types for semantic values.
|
|
* Exercises:: Ideas for improving the multi-function calculator.
|
|
|
|
|
|
File: bison.info, Node: RPN Calc, Next: Infix Calc, Up: Examples
|
|
|
|
Reverse Polish Notation Calculator
|
|
==================================
|
|
|
|
The first example is that of a simple double-precision "reverse
|
|
polish notation" calculator (a calculator using postfix operators).
|
|
This example provides a good starting point, since operator precedence
|
|
is not an issue. The second example will illustrate how operator
|
|
precedence is handled.
|
|
|
|
The source code for this calculator is named `rpcalc.y'. The `.y'
|
|
extension is a convention used for Bison input files.
|
|
|
|
* Menu:
|
|
|
|
* Decls: Rpcalc Decls. Bison and C declarations for rpcalc.
|
|
* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
|
|
* Lexer: Rpcalc Lexer. The lexical analyzer.
|
|
* Main: Rpcalc Main. The controlling function.
|
|
* Error: Rpcalc Error. The error reporting function.
|
|
* Gen: Rpcalc Gen. Running Bison on the grammar file.
|
|
* Comp: Rpcalc Compile. Run the C compiler on the output code.
|
|
|