doc: refer to the token kind rather than the token type

* doc/bison.texi: Replace occurrences of "token type" with "token
kind".
Stop referring to the "macro definitions" of the token kinds, just
name them "definitions".
Akim Demaille
2020-04-05 15:14:59 +02:00
parent 9b70d69f39
commit 04d62346f3
3 changed files with 130 additions and 122 deletions

NEWS

@@ -118,8 +118,17 @@ GNU Bison NEWS
 ** Documentation
+*** User Manual
+In order to avoid ambiguities with "type" as in "typing", we now refer to
+the "token kind" (e.g., `PLUS`, `NUMBER`, etc.) rather than the "token
+type". We now also refer to the "symbol type" (e.g., `PLUS`, `expr`,
+etc.).
+*** Examples
 There are now two examples in examples/java: a very simple calculator, and
-one that tracks locations to provide acurate error messages.
+one that tracks locations to provide accurate error messages.
 The lexcalc example (a simple example in C based on Flex and Bison) now
 also demonstrates location tracking.
@@ -4038,7 +4047,8 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
 LocalWords: YYPRINT Mangold Bonzini's Wdangling exVal baz checkable gcc
 LocalWords: fsanitize Vogelsgesang lis redeclared stdint automata yytname
 LocalWords: yysymbol yytnamerr yyreport ctx ARGMAX yysyntax stderr
-LocalWords: symrec
+LocalWords: symrec yypcontext TOKENMAX yyexpected YYEMPTY yypstate
+LocalWords: autocompletion bistromathic submessages Cayuela lexcalc
 Local Variables:
 ispell-dictionary: "american"

@@ -314,7 +314,7 @@ Parser C-Language Interface
 The Lexical Analyzer Function @code{yylex}
 * Calling Convention:: How @code{yyparse} calls @code{yylex}.
-* Tokens from Literals:: Finding token types from string aliases.
+* Tokens from Literals:: Finding token kinds from string aliases.
 * Token Values:: How @code{yylex} must return the semantic value
 of the token it has read.
 * Token Locations:: How @code{yylex} must return the text location
@@ -623,11 +623,11 @@ possible parses of any given string is finite.
 @cindex token
 @cindex syntactic grouping
 @cindex grouping, syntactic
-In the formal grammatical rules for a language, each kind of syntactic
-unit or grouping is named by a @dfn{symbol}. Those which are built by
-grouping smaller constructs according to grammatical rules are called
+In the formal grammatical rules for a language, each kind of syntactic unit
+or grouping is named by a @dfn{symbol}. Those which are built by grouping
+smaller constructs according to grammatical rules are called
 @dfn{nonterminal symbols}; those which can't be subdivided are called
-@dfn{terminal symbols} or @dfn{token types}. We call a piece of input
+@dfn{terminal symbols} or @dfn{token kinds}. We call a piece of input
 corresponding to a single terminal symbol a @dfn{token}, and a piece
 corresponding to a single nonterminal symbol a @dfn{grouping}.
@@ -710,7 +710,7 @@ as an identifier, like an identifier in C@. By convention, it should be
 in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
 The Bison representation for a terminal symbol is also called a @dfn{token
-type}. Token types as well can be represented as C-like identifiers. By
+kind}. Token kinds as well can be represented as C-like identifiers. By
 convention, these identifiers should be upper case to distinguish them from
 nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
 @code{RETURN}. A terminal symbol that stands for a particular keyword in
@@ -754,26 +754,26 @@ grammatical.
 But the precise value is very important for what the input means once it is
 parsed. A compiler is useless if it fails to distinguish between 4, 1 and
 3989 as constants in the program! Therefore, each token in a Bison grammar
-has both a token type and a @dfn{semantic value}. @xref{Semantics},
-for details.
+has both a token kind and a @dfn{semantic value}. @xref{Semantics}, for
+details.
-The token type is a terminal symbol defined in the grammar, such as
-@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
-you need to know to decide where the token may validly appear and how to
-group it with other tokens. The grammar rules know nothing about tokens
-except their types.
+The token kind is a terminal symbol defined in the grammar, such as
+@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything you
+need to know to decide where the token may validly appear and how to group
+it with other tokens. The grammar rules know nothing about tokens except
+their kinds.
 The semantic value has all the rest of the information about the
 meaning of the token, such as the value of an integer, or the name of an
 identifier. (A token such as @code{','} which is just punctuation doesn't
 need to have any semantic value.)
-For example, an input token might be classified as token type
-@code{INTEGER} and have the semantic value 4. Another input token might
-have the same token type @code{INTEGER} but value 3989. When a grammar
-rule says that @code{INTEGER} is allowed, either of these tokens is
-acceptable because each is an @code{INTEGER}. When the parser accepts the
-token, it keeps track of the token's semantic value.
+For example, an input token might be classified as token kind @code{INTEGER}
+and have the semantic value 4. Another input token might have the same
+token kind @code{INTEGER} but value 3989. When a grammar rule says that
+@code{INTEGER} is allowed, either of these tokens is acceptable because each
+is an @code{INTEGER}. When the parser accepts the token, it keeps track of
+the token's semantic value.
 Each grouping can also have a semantic value as well as its nonterminal
 symbol. For example, in a calculator, an expression typically has a
@@ -1428,7 +1428,7 @@ In addition, a complete C program must start with a function called
 @code{main}; you have to provide this, and arrange for it to call
 @code{yyparse} or the parser will never run. @xref{Interface}.
-Aside from the token type names and the symbols in the actions you
+Aside from the token kind names and the symbols in the actions you
 write, all symbols defined in the Bison parser implementation file
 itself begin with @samp{yy} or @samp{YY}. This includes interface
 functions such as the lexical analyzer function @code{yylex}, the
@@ -1643,7 +1643,7 @@ Each terminal symbol that is not a single-character literal must be
 declared. (Single-character literals normally don't need to be declared.)
 In this example, all the arithmetic operators are designated by
 single-character literals, so the only terminal symbol that needs to be
-declared is @code{NUM}, the token type for numeric constants.
+declared is @code{NUM}, the token kind for numeric constants.
 @node Rpcalc Rules
 @subsection Grammar Rules for @code{rpcalc}
@@ -1850,14 +1850,14 @@ that isn't part of a number is a separate token. Note that the token-code
 for such a single-character token is the character itself.
 The return value of the lexical analyzer function is a numeric code which
-represents a token type. The same text used in Bison rules to stand for
-this token type is also a C expression for the numeric code for the type.
-This works in two ways. If the token type is a character literal, then its
-numeric code is that of the character; you can use the same
-character literal in the lexical analyzer to express the number. If the
-token type is an identifier, that identifier is defined by Bison as a C
-macro whose definition is the appropriate number. In this example,
-therefore, @code{NUM} becomes a macro for @code{yylex} to use.
+represents a token kind. The same text used in Bison rules to stand for
+this token kind is also a C expression for the numeric code for the type.
+This works in two ways. If the token kind is a character literal, then its
+numeric code is that of the character; you can use the same character
+literal in the lexical analyzer to express the number. If the token kind is
+an identifier, that identifier is defined by Bison as a C macro whose
+definition is the appropriate number. In this example, therefore,
+@code{NUM} becomes a macro for @code{yylex} to use.
 The semantic value of the token (if it has one) is stored into the global
 variable @code{yylval}, which is where the Bison parser will look for it.
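
The return convention described in this hunk can be sketched in C as follows. This is an illustration only, not the manual's `rpcalc` lexer: `NUM` and `yylval` are defined by hand as stand-ins for what Bison would generate (the value 258 is an assumption), and `cursor` is a hypothetical string input used instead of `stdin` so that the fragment compiles without a generated parser.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for Bison-generated definitions (assumptions for this sketch). */
#define NUM 258          /* code of the named token kind NUM */
double yylval;           /* semantic value of the token just read */

/* Hypothetical input source: a cursor into a string, not stdin. */
static const char *cursor = "";

int yylex(void)
{
  char *end;

  /* Skip blanks and tabs. */
  while (*cursor == ' ' || *cursor == '\t')
    cursor++;

  /* A token kind code of zero signals end-of-input. */
  if (*cursor == '\0')
    return 0;

  /* Named token kind: store the semantic value, return the macro. */
  if (isdigit((unsigned char) *cursor))
    {
      yylval = strtod(cursor, &end);
      cursor = end;
      return NUM;
    }

  /* Character token kind: the code is the character itself, converted
     to unsigned char to avoid sign-extension where char is signed. */
  return (unsigned char) *cursor++;
}
```

With `cursor` pointing at `"4 3989 +"`, successive calls return `NUM` (with `yylval` set to 4), `NUM` (with `yylval` set to 3989), `'+'`, and then 0.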
@@ -1865,7 +1865,7 @@ variable @code{yylval}, which is where the Bison parser will look for it.
 at the beginning of the grammar via @samp{%define api.value.type
 @{double@}}; @pxref{Rpcalc Declarations}.)
-A token type code of zero is returned if the end-of-input is encountered.
+A token kind code of zero is returned if the end-of-input is encountered.
 (Bison recognizes any nonpositive value as indicating end-of-input.)
 Here is the code for the lexical analyzer:
@@ -2106,11 +2106,11 @@ same as before.
 There are two important new features shown in this code.
 In the second section (Bison declarations), @code{%left} declares token
-types and says they are left-associative operators. The declarations
+kinds and says they are left-associative operators. The declarations
 @code{%left} and @code{%right} (right associativity) take the place of
-@code{%token} which is used to declare a token type name without
-associativity/precedence. (These tokens are single-character literals, which
-ordinarily don't need to be declared. We declare them here to specify
+@code{%token} which is used to declare a token kind name without
+associativity/precedence. (These tokens are single-character literals,
+which ordinarily don't need to be declared. We declare them here to specify
 the associativity/precedence.)
 Operator precedence is determined by the line ordering of the
@@ -2498,7 +2498,7 @@ augmented with their data type (placed between angle brackets). For
 instance, values of @code{NUM} are stored in @code{double}.
 The Bison construct @code{%nterm} is used for declaring nonterminal symbols,
-just as @code{%token} is used for declaring token types. Previously we did
+just as @code{%token} is used for declaring token kinds. Previously we did
 not use @code{%nterm} before because nonterminal symbols are normally
 declared implicitly by the rules that define them. But @code{exp} must be
 declared explicitly so we can specify its value type. @xref{Type Decl}.
@@ -3310,19 +3310,19 @@ of the grammar file.
 @section Symbols, Terminal and Nonterminal
 @cindex nonterminal symbol
 @cindex terminal symbol
-@cindex token type
+@cindex token kind
 @cindex symbol
 @dfn{Symbols} in Bison grammars represent the grammatical classifications
 of the language.
-A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
+A @dfn{terminal symbol} (also known as a @dfn{token kind}) represents a
 class of syntactically equivalent tokens. You use the symbol in grammar
 rules to mean that a token in that class is allowed. The symbol is
 represented in the Bison parser by a numeric code, and the @code{yylex}
-function returns a token type code to indicate what kind of token has
-been read. You don't need to know what the code value is; you can use
-the symbol to stand for it.
+function returns a token kind code to indicate what kind of token has been
+read. You don't need to know what the code value is; you can use the symbol
+to stand for it.
 A @dfn{nonterminal symbol} stands for a class of syntactically
 equivalent groupings. The symbol name is used in writing grammar rules.
@@ -3340,27 +3340,26 @@ There are three ways of writing terminal symbols in the grammar:
 @itemize @bullet
 @item
-A @dfn{named token type} is written with an identifier, like an
-identifier in C@. By convention, it should be all upper case. Each
-such name must be defined with a Bison declaration such as
-@code{%token}. @xref{Token Decl}.
+A @dfn{named token kind} is written with an identifier, like an identifier
+in C@. By convention, it should be all upper case. Each such name must be
+defined with a Bison declaration such as @code{%token}. @xref{Token Decl}.
 @item
 @cindex character token
 @cindex literal token
 @cindex single-character literal
-A @dfn{character token type} (or @dfn{literal character token}) is written
+A @dfn{character token kind} (or @dfn{literal character token}) is written
 in the grammar using the same syntax used in C for character constants; for
-example, @code{'+'} is a character token type. A character token type
+example, @code{'+'} is a character token kind. A character token kind
 doesn't need to be declared unless you need to specify its semantic value
 data type (@pxref{Value Type}), associativity, or precedence
 (@pxref{Precedence}).
-By convention, a character token type is used only to represent a
-token that consists of that particular character. Thus, the token
-type @code{'+'} is used to represent the character @samp{+} as a
-token. Nothing enforces this convention, but if you depart from it,
-your program will confuse other readers.
+By convention, a character token kind is used only to represent a token that
+consists of that particular character. Thus, the token kind @code{'+'} is
+used to represent the character @samp{+} as a token. Nothing enforces this
+convention, but if you depart from it, your program will confuse other
+readers.
 All the usual escape sequences used in character literals in C can be used
 in Bison as well, but you must not use the null character as a character
@@ -3388,7 +3387,7 @@ string token from the @code{yytname} table (@pxref{Calling Convention}).
 By convention, a literal string token is used only to represent a token
 that consists of that particular string. Thus, you should use the token
-type @code{"<="} to represent the string @samp{<=} as a token. Bison
+kind @code{"<="} to represent the string @samp{<=} as a token. Bison
 does not enforce this convention, but if you depart from it, people who
 read your program will be confused.
@@ -3406,22 +3405,22 @@ on when the parser function returns that symbol.
 The value returned by @code{yylex} is always one of the terminal
 symbols, except that a zero or negative value signifies end-of-input.
-Whichever way you write the token type in the grammar rules, you write
+Whichever way you write the token kind in the grammar rules, you write
 it the same way in the definition of @code{yylex}. The numeric code
-for a character token type is simply the positive numeric code of the
+for a character token kind is simply the positive numeric code of the
 character, so @code{yylex} can use the identical value to generate the
 requisite code, though you may need to convert it to @code{unsigned
 char} to avoid sign-extension on hosts where @code{char} is signed.
-Each named token type becomes a C macro in the parser implementation
+Each named token kind becomes a C macro in the parser implementation
 file, so @code{yylex} can use the name to stand for the code. (This
 is why periods don't make sense in terminal symbols.) @xref{Calling
 Convention}.
 If @code{yylex} is defined in a separate file, you need to arrange for the
-token-type macro definitions to be available there. Use the @samp{-d}
-option when you run Bison, so that it will write these macro definitions
-into a separate header file @file{@var{name}.tab.h} which you can include
-in the other source files that need it. @xref{Invocation}.
+token-kind definitions to be available there. Use the @samp{-d} option when
+you run Bison, so that it will write these definitions into a separate
+header file @file{@var{name}.tab.h} which you can include in the other
+source files that need it. @xref{Invocation}.
 If you want to write a grammar that is portable to any Standard C
 host, you must use only nonnull character tokens taken from the basic
@@ -3726,11 +3725,10 @@ this:
 @noindent
 This macro definition must go in the prologue of the grammar file
-(@pxref{Grammar Outline}). If compatibility
-with POSIX Yacc matters to you, use this. Note however that Bison cannot
-know @code{YYSTYPE}'s value, not even whether it is defined, so there are
-services it cannot provide. Besides this works only for languages that have
-a preprocessor.
+(@pxref{Grammar Outline}). If compatibility with POSIX Yacc matters to you,
+use this. Note however that Bison cannot know @code{YYSTYPE}'s value, not
+even whether it is defined, so there are services it cannot provide.
+Besides this works only for languages that have a preprocessor.
 @node Multiple Types
 @subsection More Than One Value Type
@@ -4772,7 +4770,7 @@ The @dfn{Bison declarations} section of a Bison grammar defines the symbols
 used in formulating the grammar and the data types of semantic values.
 @xref{Symbols}.
-All token type names (but not single-character literal tokens such as
+All token kind names (but not single-character literal tokens such as
 @code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
 declared if you need to specify which data type to use for the semantic
 value (@pxref{Multiple Types}).
@@ -4828,21 +4826,21 @@ for the name of the generated DOT file. @xref{Graphviz}.
 @node Token Decl
-@subsection Token Type Names
-@cindex declaring token type names
-@cindex token type names, declaring
+@subsection Token Kind Names
+@cindex declaring token kind names
+@cindex token kind names, declaring
 @cindex declaring literal string tokens
 @findex %token
-The basic way to declare a token type name (terminal symbol) is as follows:
+The basic way to declare a token kind name (terminal symbol) is as follows:
 @example
 %token @var{name}
 @end example
-Bison will convert this into a definition in the parser, so
-that the function @code{yylex} (if it is in this file) can use the name
-@var{name} to stand for this token type's code.
+Bison will convert this into a definition in the parser, so that the
+function @code{yylex} (if it is in this file) can use the name @var{name} to
+stand for this token kind's code.
 Alternatively, you can use @code{%left}, @code{%right}, @code{%precedence},
 or @code{%nonassoc} instead of @code{%token}, if you wish to specify
@@ -4850,7 +4848,7 @@ associativity and precedence. @xref{Precedence Decl}. However, for
 clarity, we recommend to use these directives only to declare associativity
 and precedence, and not to add string aliases, semantic types, etc.
-You can explicitly specify the numeric code for a token type by appending a
+You can explicitly specify the numeric code for a token kind by appending a
 nonnegative decimal or hexadecimal integer value in the field immediately
 following the token name:
@@ -4861,7 +4859,7 @@ following the token name:
 @noindent
 It is generally best, however, to let Bison choose the numeric codes for all
-token types. Bison will automatically select codes that don't conflict with
+token kinds. Bison will automatically select codes that don't conflict with
 each other or with normal characters.
 In the event that the stack type is a union, you must augment the
@@ -4880,7 +4878,7 @@ For example:
 @end group
 @end example
-You can associate a literal string token with a token type name by writing
+You can associate a literal string token with a token kind name by writing
 the literal string at the end of a @code{%token} declaration which declares
 the name. For example:
@@ -4902,7 +4900,7 @@ equivalent literal string tokens:
 Once you equate the literal string and the token name, you can use them
 interchangeably in further declarations or the grammar rules. The
 @code{yylex} function can use the token name or the literal string to obtain
-the token type code number (@pxref{Calling Convention}).
+the token kind code number (@pxref{Calling Convention}).
 String aliases allow for better error messages using the literal strings
 instead of the token names, such as @samp{syntax error, unexpected ||,
@@ -4990,7 +4988,7 @@ declared later has the higher precedence and is grouped first.
 For backward compatibility, there is a confusing difference between the
 argument lists of @code{%token} and precedence declarations. Only a
-@code{%token} can associate a literal string with a token type name. A
+@code{%token} can associate a literal string with a token kind name. A
 precedence declaration always interprets a literal string as a reference to
 a separate token. For example:
@@ -5581,22 +5579,22 @@ Declare the collection of data types that semantic values may have
 @end deffn
 @deffn {Directive} %token
-Declare a terminal symbol (token type name) with no precedence
+Declare a terminal symbol (token kind name) with no precedence
 or associativity specified (@pxref{Token Decl}).
 @end deffn
 @deffn {Directive} %right
-Declare a terminal symbol (token type name) that is right-associative
+Declare a terminal symbol (token kind name) that is right-associative
 (@pxref{Precedence Decl}).
 @end deffn
 @deffn {Directive} %left
-Declare a terminal symbol (token type name) that is left-associative
+Declare a terminal symbol (token kind name) that is left-associative
 (@pxref{Precedence Decl}).
 @end deffn
 @deffn {Directive} %nonassoc
-Declare a terminal symbol (token type name) that is nonassociative
+Declare a terminal symbol (token kind name) that is nonassociative
 (@pxref{Precedence Decl}).
 Using it in a way that would be associative is a syntax error.
 @end deffn
@@ -5661,10 +5659,10 @@ Define a variable to adjust Bison's behavior. @xref{%define Summary}.
 @end deffn
 @deffn {Directive} %defines
-Write a parser header file containing macro definitions for the token
-type names defined in the grammar as well as a few other declarations.
-If the parser implementation file is named @file{@var{name}.c} then
-the parser header file is named @file{@var{name}.h}.
+Write a parser header file containing definitions for the token kind names
+defined in the grammar as well as a few other declarations. If the parser
+implementation file is named @file{@var{name}.c} then the parser header file
+is named @file{@var{name}.h}.
 For C parsers, the parser header file declares @code{YYSTYPE} unless
 @code{YYSTYPE} is already defined as a macro or you have used a
@@ -5686,7 +5684,7 @@ If you have also used locations, the parser header file declares
 This parser header file is normally essential if you wish to put the
 definition of @code{yylex} in a separate source file, because
 @code{yylex} typically needs to be able to refer to the
-above-mentioned declarations and to the token type codes. @xref{Token
+above-mentioned declarations and to the token kind codes. @xref{Token
 Values}.
 @findex %code requires
@@ -5855,7 +5853,7 @@ for (int i = 0; i < YYNTOKENS; i++)
 This method is discouraged: the primary purpose of string aliases is forging
 good error messages, not describing the spelling of keywords. In addition,
-looking for the token type at runtime incurs a (small but noticeable) cost.
+looking for the token kind at runtime incurs a (small but noticeable) cost.
 Finally, @code{%token-table} is incompatible with the @code{custom} and
 @code{detailed} values of the @code{parse.error} @code{%define} variable.
@@ -7051,17 +7049,17 @@ the input stream and returns them to the parser. Bison does not create
 this function automatically; you must write it so that @code{yyparse} can
 call it. The function is sometimes referred to as a lexical scanner.
-In simple programs, @code{yylex} is often defined at the end of the
-Bison grammar file. If @code{yylex} is defined in a separate source
-file, you need to arrange for the token-type macro definitions to be
-available there. To do this, use the @samp{-d} option when you run
-Bison, so that it will write these macro definitions into the separate
-parser header file, @file{@var{name}.tab.h}, which you can include in
-the other source files that need it. @xref{Invocation}.
+In simple programs, @code{yylex} is often defined at the end of the Bison
+grammar file. If @code{yylex} is defined in a separate source file, you
+need to arrange for the token-kind definitions to be available there. To do
+this, use the @samp{-d} option when you run Bison, so that it will write
+these definitions into the separate parser header file,
+@file{@var{name}.tab.h}, which you can include in the other source files
+that need it. @xref{Invocation}.
 @menu
 * Calling Convention:: How @code{yyparse} calls @code{yylex}.
-* Tokens from Literals:: Finding token types from string aliases.
+* Tokens from Literals:: Finding token kinds from string aliases.
 * Token Values:: How @code{yylex} must return the semantic value
 of the token it has read.
 * Token Locations:: How @code{yylex} must return the text location
@@ -7080,11 +7078,11 @@ end-of-input.
When a token is referred to in the grammar rules by a name, that name in the When a token is referred to in the grammar rules by a name, that name in the
parser implementation file becomes a C macro whose definition is the proper parser implementation file becomes a C macro whose definition is the proper
numeric code for that token type. So @code{yylex} can use the name to numeric code for that token kind. So @code{yylex} can use the name to
indicate that type. @xref{Symbols}. indicate that type. @xref{Symbols}.
When a token is referred to in the grammar rules by a character literal, the When a token is referred to in the grammar rules by a character literal, the
numeric code for that character is also the code for the token type. So numeric code for that character is also the code for the token kind. So
@code{yylex} can simply return that character code, possibly converted to @code{yylex} can simply return that character code, possibly converted to
@code{unsigned char} to avoid sign-extension. The null character must not @code{unsigned char} to avoid sign-extension. The null character must not
be used this way, because its code is zero and that signifies end-of-input. be used this way, because its code is zero and that signifies end-of-input.
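As an illustration of the conversion mentioned above, here is a minimal sketch. The helper name @code{char_token_code} is hypothetical and not part of Bison. It assumes the scanner holds the input byte in a plain @code{char}, which may be signed on some platforms.

```c
/* Hypothetical helper, not part of Bison: turn a raw input byte into
   the token code for the corresponding character literal.  Going
   through unsigned char keeps bytes above 127 from becoming negative
   when plain char is signed.  */
int
char_token_code (char c)
{
  return (unsigned char) c;
}
```

Without the cast, a byte such as 0xE9 could reach the parser as a negative number on platforms where plain @code{char} is signed, and it would not match the code of any character-literal token.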
@@ -7100,7 +7098,7 @@ yylex (void)
return 0; return 0;
@dots{} @dots{}
if (c == '+' || c == '-') if (c == '+' || c == '-')
return c; /* Assume token type for '+' is '+'. */ return c; /* Assume token kind for '+' is '+'. */
@dots{} @dots{}
return INT; /* Return the type of the token. */ return INT; /* Return the type of the token. */
@dots{} @dots{}
@@ -7116,7 +7114,7 @@ utility can be used without change as the definition of @code{yylex}.
@subsection Finding Tokens by String Literals @subsection Finding Tokens by String Literals
If the grammar uses literal string tokens, there are two ways that If the grammar uses literal string tokens, there are two ways that
@code{yylex} can determine the token type codes for them: @code{yylex} can determine the token kind codes for them:
@itemize @bullet @itemize @bullet
@item @item
@@ -7131,7 +7129,7 @@ This is the preferred approach.
@code{yylex} can search for the multicharacter token in the @code{yytname} @code{yylex} can search for the multicharacter token in the @code{yytname}
table. This method is discouraged: the primary purpose of string aliases is table. This method is discouraged: the primary purpose of string aliases is
forging good error messages, not describing the spelling of keywords. In forging good error messages, not describing the spelling of keywords. In
addition, looking for the token type at runtime incurs a (small but addition, looking for the token kind at runtime incurs a (small but
noticeable) cost. noticeable) cost.
The @code{yytname} table is generated only if you use the The @code{yytname} table is generated only if you use the
@@ -7493,7 +7491,7 @@ Return immediately from @code{yyparse}, indicating success.
Unshift a token. This macro is allowed only for rules that reduce Unshift a token. This macro is allowed only for rules that reduce
a single value, and only when there is no lookahead token. a single value, and only when there is no lookahead token.
It is also disallowed in GLR parsers. It is also disallowed in GLR parsers.
It installs a lookahead token with token type @var{token} and It installs a lookahead token with token kind @var{token} and
semantic value @var{value}; then it discards the value that was semantic value @var{value}; then it discards the value that was
going to be reduced by this rule. going to be reduced by this rule.
@@ -7814,7 +7812,7 @@ perform one or more reductions of tokens and groupings on the stack, while
the lookahead token remains off to the side. When no more reductions the lookahead token remains off to the side. When no more reductions
should take place, the lookahead token is shifted onto the stack. This should take place, the lookahead token is shifted onto the stack. This
does not mean that all possible reductions have been done; depending on the does not mean that all possible reductions have been done; depending on the
token type of the lookahead token, some rules may choose to delay their token kind of the lookahead token, some rules may choose to delay their
application. application.
Here is a simple case where lookahead is needed. These three rules define Here is a simple case where lookahead is needed. These three rules define
@@ -8266,7 +8264,7 @@ The effect of @code{%no-default-prec;} can be reversed by giving
@cindex state (of parser) @cindex state (of parser)
The function @code{yyparse} is implemented using a finite-state machine. The function @code{yyparse} is implemented using a finite-state machine.
The values pushed on the parser stack are not simply token type codes; they The values pushed on the parser stack are not simply token kind codes; they
represent the entire sequence of terminal and nonterminal symbols at or represent the entire sequence of terminal and nonterminal symbols at or
near the top of the stack. The current state collects all the information near the top of the stack. The current state collects all the information
about previous input which is relevant to deciding what to do next. about previous input which is relevant to deciding what to do next.
@@ -9262,7 +9260,7 @@ languages.
neither clean nor robust.) neither clean nor robust.)
@node Semantic Tokens @node Semantic Tokens
@section Semantic Info in Token Types @section Semantic Info in Token Kinds
The C language has a context dependency: the way an identifier is used The C language has a context dependency: the way an identifier is used
depends on what its current meaning is. For example, consider this: depends on what its current meaning is. For example, consider this:
@@ -9275,19 +9273,19 @@ This looks like a function call statement, but if @code{foo} is a typedef
name, then this is actually a declaration of @code{x}. How can a Bison name, then this is actually a declaration of @code{x}. How can a Bison
parser for C decide how to parse this input? parser for C decide how to parse this input?
The method used in GNU C is to have two different token types, The method used in GNU C is to have two different token kinds,
@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an @code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
identifier, it looks up the current declaration of the identifier in order identifier, it looks up the current declaration of the identifier in order
to decide which token type to return: @code{TYPENAME} if the identifier is to decide which token kind to return: @code{TYPENAME} if the identifier is
declared as a typedef, @code{IDENTIFIER} otherwise. declared as a typedef, @code{IDENTIFIER} otherwise.
The grammar rules can then express the context dependency by the choice of The grammar rules can then express the context dependency by the choice of
token type to recognize. @code{IDENTIFIER} is accepted as an expression, token kind to recognize. @code{IDENTIFIER} is accepted as an expression,
but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier @code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
is @emph{not} significant, such as in declarations that can shadow a is @emph{not} significant, such as in declarations that can shadow a
typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
accepted---there is one rule for each of the two token types. accepted---there is one rule for each of the two token kinds.
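The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the numeric codes for @code{IDENTIFIER} and @code{TYPENAME}, the fixed @code{typedef_names} table, and the helper @code{classify_identifier} are all hypothetical. A real scanner would take the codes from the Bison-generated header and consult the compiler's scoped symbol table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kind codes; in a real parser these come from the
   header that Bison generates.  */
enum { IDENTIFIER = 258, TYPENAME = 259 };

/* Hypothetical fixed list of names currently declared as typedefs.
   A real compiler would look in its scoped symbol table instead.  */
static const char *const typedef_names[] = { "size_t", "foo" };

/* Return the token kind that yylex should report for the identifier
   NAME: TYPENAME if it is currently a typedef name, IDENTIFIER
   otherwise.  */
int
classify_identifier (const char *name)
{
  size_t n = sizeof typedef_names / sizeof typedef_names[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, typedef_names[i]) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```

With this table, the scanner would report @code{foo} as a @code{TYPENAME} and @code{x} as an @code{IDENTIFIER}, so the grammar can tell the declaration apart from the function call.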
This technique is simple to use if the decision of which kinds of This technique is simple to use if the decision of which kinds of
identifiers to allow is made at a place close to where the identifier is identifiers to allow is made at a place close to where the identifier is
@@ -10190,7 +10188,7 @@ variables show where in the grammar it is working.
@node Mfcalc Traces @node Mfcalc Traces
@subsection Enabling Debug Traces for @code{mfcalc} @subsection Enabling Debug Traces for @code{mfcalc}
The debugging information normally gives the token type of each token read, The debugging information normally gives the token kind of each token read,
but not its semantic value. The @code{%printer} directive allows specify but not its semantic value. The @code{%printer} directive allows specify
how semantic values are reported, see @ref{Printer Decl}. how semantic values are reported, see @ref{Printer Decl}.
@@ -10374,7 +10372,7 @@ terminal symbols and only with the @file{yacc.c} skeleton.
Deprecated, will be removed eventually. Deprecated, will be removed eventually.
If you define @code{YYPRINT}, it should take three arguments. The parser If you define @code{YYPRINT}, it should take three arguments. The parser
will pass a standard I/O stream, the numeric code for the token type, and will pass a standard I/O stream, the numeric code for the token kind, and
the token value (from @code{yylval}). the token value (from @code{yylval}).
For @file{yacc.c} only. Obsoleted by @code{%printer}. For @file{yacc.c} only. Obsoleted by @code{%printer}.
@@ -11017,9 +11015,9 @@ Options controlling the output.
@c Please, keep this ordered as in 'bison --help'. @c Please, keep this ordered as in 'bison --help'.
@table @option @table @option
@item --defines[=@var{file}] @item --defines[=@var{file}]
Pretend that @code{%defines} was specified, i.e., write an extra output Pretend that @code{%defines} was specified, i.e., write an extra output file
file containing macro definitions for the token type names defined in containing definitions for the token kind names defined in the grammar, as
the grammar, as well as a few other declarations. @xref{Decl Summary}. well as a few other declarations. @xref{Decl Summary}.
@item -d @item -d
This is the same as @option{--defines} except @option{-d} does not accept a This is the same as @option{--defines} except @option{-d} does not accept a
@@ -11278,7 +11276,7 @@ In the case of @code{TEXT}, the implicit default action applies: @w{@code{$$
@sp 1 @sp 1
Our scanner deserves some attention. The traditional interface of Our scanner deserves some attention. The traditional interface of
@code{yylex} is not type safe: since the token type and the token value are @code{yylex} is not type safe: since the token kind and the token value are
not correlated, you may return a @code{NUMBER} with a string as semantic not correlated, you may return a @code{NUMBER} with a string as semantic
value. To avoid this, we use @emph{token constructors} (@pxref{Complete value. To avoid this, we use @emph{token constructors} (@pxref{Complete
Symbols}). This directive: Symbols}). This directive:
@@ -11960,13 +11958,13 @@ location. Invocations of @samp{%lex-param @{@var{type1} @var{arg1}@}} yield
additional arguments. additional arguments.
@end deftypefun @end deftypefun
For each token type, Bison generates named constructors as follows. For each token kind, Bison generates named constructors as follows.
@deftypeop {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const @var{value_type}&} @var{value}, @code{const location_type&} @var{location}) @deftypeop {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const @var{value_type}&} @var{value}, @code{const location_type&} @var{location})
@deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const location_type&} @var{location}) @deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const location_type&} @var{location})
@deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const @var{value_type}&} @var{value}) @deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const @var{value_type}&} @var{value})
@deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}) @deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token})
Build a complete terminal symbol for the token type @var{token} (including Build a complete terminal symbol for the token kind @var{token} (including
the @code{api.token.prefix}), whose semantic value, if it has one, is the @code{api.token.prefix}), whose semantic value, if it has one, is
@var{value} of adequate @var{value_type}. Pass the @var{location} iff @var{value} of adequate @var{value_type}. Pass the @var{location} iff
location tracking is enabled. location tracking is enabled.
@@ -11993,11 +11991,11 @@ symbol_type (int token, const int&, const location_type&);
symbol_type (int token, const location_type&); symbol_type (int token, const location_type&);
@end example @end example
Correct matching between token types and value types is checked via Correct matching between token kinds and value types is checked via
@code{assert}; for instance, @samp{symbol_type (ID, 42)} would abort. Named @code{assert}; for instance, @samp{symbol_type (ID, 42)} would abort. Named
constructors are preferable (see below), as they offer better type safety constructors are preferable (see below), as they offer better type safety
(for instance @samp{make_ID (42)} would not even compile), but symbol_type (for instance @samp{make_ID (42)} would not even compile), but symbol_type
constructors may help when token types are discovered at run-time, e.g., constructors may help when token kinds are discovered at run-time, e.g.,
@example @example
@group @group
@@ -12023,7 +12021,7 @@ constructors} as follows.
@deftypemethodx {parser} {symbol_type} {make_@var{token}} (@code{const location_type&} @var{location}) @deftypemethodx {parser} {symbol_type} {make_@var{token}} (@code{const location_type&} @var{location})
@deftypemethodx {parser} {symbol_type} {make_@var{token}} (@code{const @var{value_type}&} @var{value}) @deftypemethodx {parser} {symbol_type} {make_@var{token}} (@code{const @var{value_type}&} @var{value})
@deftypemethodx {parser} {symbol_type} {make_@var{token}} () @deftypemethodx {parser} {symbol_type} {make_@var{token}} ()
Build a complete terminal symbol for the token type @var{token} (not Build a complete terminal symbol for the token kind @var{token} (not
including the @code{api.token.prefix}), whose semantic value, if it has one, including the @code{api.token.prefix}), whose semantic value, if it has one,
is @var{value} of adequate @var{value_type}. Pass the @var{location} iff is @var{value} of adequate @var{value_type}. Pass the @var{location} iff
location tracking is enabled. location tracking is enabled.


@@ -63,7 +63,7 @@
// with locations. // with locations.
%locations %locations
// and acurate list of expected tokens. // and accurate list of expected tokens.
%define parse.lac full %define parse.lac full
// Generate the parser description file (calc.output). // Generate the parser description file (calc.output).