mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-09 12:23:04 +00:00
doc: updates for 3.6
* doc/bison.texi: More s/token type/token kind/.
* NEWS: Update.
52
NEWS
@@ -19,7 +19,7 @@ GNU Bison NEWS
*** Improved syntax error messages

Two new values for the %define parse.error variable offer more control to
the user.
the user. Available in all the skeletons (C, C++, Java).

**** %define parse.error detailed
@@ -34,7 +34,12 @@ GNU Bison NEWS
**** %define parse.error custom

With this directive, the user forges and emits the syntax error message
herself by defining a function such as:
herself by defining the yyreport_syntax_error function. A new type,
yypcontext_t, captures the circumstances of the error, and provides the
user with functions to get details, such as yypcontext_expected_tokens to
get the list of expected token kinds.

A possible implementation of yyreport_syntax_error is:

int
yyreport_syntax_error (const yypcontext_t *ctx)
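As a rough illustration of the custom error reporting described above (the hunk stops at the function's signature), the following self-contained C sketch builds a message from the expected token kinds. It is not Bison's code: `yysymbol_kind_t`, the layout of `yypcontext_t`, `symbol_name` and the body of `yypcontext_expected_tokens` are hypothetical stand-ins written so that the example compiles on its own; the real generated declarations may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: in a real parser, Bison generates these. */
typedef enum { SYM_EOF, SYM_NUM, SYM_PLUS, SYM_LPAREN } yysymbol_kind_t;

typedef struct
{
  yysymbol_kind_t lookahead;    /* the token that caused the error */
  int nexpected;                /* number of valid entries in EXPECTED */
  yysymbol_kind_t expected[4];  /* token kinds acceptable at this point */
} yypcontext_t;

/* Hypothetical helper: user-facing name of each symbol kind. */
static const char *
symbol_name (yysymbol_kind_t s)
{
  static const char *const names[] = { "end of file", "number", "\"+\"", "\"(\"" };
  return names[s];
}

/* Stand-in with the shape described in NEWS: store up to ARGN expected
   token kinds in ARG and return how many were stored. */
static int
yypcontext_expected_tokens (const yypcontext_t *ctx, yysymbol_kind_t arg[], int argn)
{
  int n = ctx->nexpected < argn ? ctx->nexpected : argn;
  for (int i = 0; i < n; ++i)
    arg[i] = ctx->expected[i];
  return n;
}

/* Last message produced, kept so that it can be inspected. */
static char last_error[256];

/* Append S to LAST_ERROR without overflowing it. */
static void
append (const char *s)
{
  size_t used = strlen (last_error);
  strncat (last_error, s, sizeof last_error - used - 1);
}

int
yyreport_syntax_error (const yypcontext_t *ctx)
{
  enum { TOKENMAX = 4 };
  yysymbol_kind_t expected[TOKENMAX];
  int n = yypcontext_expected_tokens (ctx, expected, TOKENMAX);
  last_error[0] = '\0';
  append ("syntax error");
  for (int i = 0; i < n; ++i)
    {
      append (i == 0 ? ": expected " : " or ");
      append (symbol_name (expected[i]));
    }
  append (" before ");
  append (symbol_name (ctx->lookahead));
  fprintf (stderr, "%s\n", last_error);
  return 0;                     /* this sketch always reports success */
}
```

Formatting into `last_error` rather than only writing to stderr is merely a convenience for checking the result; a real reporter would usually print directly.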
@@ -86,35 +91,42 @@ GNU Bison NEWS

*** List of expected tokens (yacc.c)

At any point during parsing (including even before submitting the first
token), push parsers may now invoke yypstate_expected_tokens to get the
list of possible tokens. This feature can be used to propose
autocompletion (see below the "bistromathic" example).
Push parsers may invoke yypstate_expected_tokens at any point during
parsing (including even before submitting the first token) to get the list
of possible tokens. This feature can be used to propose autocompletion
(see below the "bistromathic" example).

It makes little sense to use this feature without enabling LAC (lookahead
correction).

*** Deep overhaul of the symbol and token kinds

To avoid the confusion with typing in programming languages, we now refer
to token and symbol "kinds" instead of token and symbol "types".
To avoid the confusion with types in programming languages, we now refer
to token and symbol "kinds" instead of token and symbol "types". The
documentation and error messages have been revised.

All the skeletons have been updated to use dedicated enum types rather
than integral types. Special symbols are now regular citizens, instead of
being declared in ad hoc ways.

**** Token kinds

The "token kind" is what is returned by the scanner, e.g., PLUS, NUMBER,
LPAREN, etc. Users are invited to replace their uses of "enum
yytokentype" by "yytoken_kind_t".
LPAREN, etc. While backward compatibility is of course ensured, users are
nonetheless invited to replace their uses of "enum yytokentype" by
"yytoken_kind_t".

This type now also includes tokens that were previously hidden: YYEOF (end
of input), YYUNDEF (undefined token), and YYERRCODE (error token). They
now have string aliases, internationalized if internationalization is
now have string aliases, internationalized when internationalization is
enabled. Therefore, by default, error messages now refer to "end of file"
(internationalized) rather than the cryptic "$end".
(internationalized) rather than the cryptic "$end", or to "invalid token"
rather than "$undefined".

In most case, it is now useless to define the end-of-line token as
follows:
Therefore in most cases it is now useless to define the end-of-line token
as follows:

%token EOF 0 _("end of file")
%token T_EOF 0 "end of file"

Rather simply use "YYEOF" in your scanner.
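The advice to use YYEOF can be sketched with a small hand-written scanner that reports end of input by returning YYEOF instead of a user-declared token 0. The enum below is a hypothetical stand-in for the generated yytoken_kind_t: only YYEOF = 0 follows the convention shown in the NEWS entry (`%token T_EOF 0`), while the values of YYUNDEF, NUMBER and PLUS, and the `scan_input` global, are illustrative assumptions.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for the generated yytoken_kind_t.  Only YYEOF = 0
   mirrors Bison's convention; the other values are illustrative. */
typedef enum
{
  YYEOF = 0,       /* end of input: "end of file" in messages */
  YYUNDEF = 257,   /* undefined token (value chosen for this sketch) */
  NUMBER = 258,
  PLUS = 259
} yytoken_kind_t;

/* Text being scanned (hypothetical global, set by the caller). */
static const char *scan_input = "";

/* Return the kind of the next token in SCAN_INPUT. */
int
yylex (void)
{
  while (*scan_input == ' ')
    ++scan_input;
  if (*scan_input == '\0')
    return YYEOF;                 /* no user-defined EOF token needed */
  if (isdigit ((unsigned char) *scan_input))
    {
      while (isdigit ((unsigned char) *scan_input))
        ++scan_input;
      return NUMBER;
    }
  if (*scan_input == '+')
    {
      ++scan_input;
      return PLUS;
    }
  ++scan_input;                   /* skip the offending character */
  return YYUNDEF;
}
```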
@@ -126,7 +138,9 @@ GNU Bison NEWS

They are now exposed as an enum, "yysymbol_kind_t".

This allows users to tailor the error messages the way they want.
This allows users to tailor the error messages the way they want, or to
process some symbols in a specific way in autocompletion (see the
bistromathic example below).

*** Modernize display of explanatory statements in diagnostics
@@ -166,12 +180,18 @@ GNU Bison NEWS
The lexcalc example (a simple example in C based on Flex and Bison) now
also demonstrates location tracking.


A new C example, bistromathic, is a fully featured interactive calculator
using many Bison features: pure interface, push parser, autocompletion
based on the current parser state (using yypstate_expected_tokens),
location tracking, internationalized custom error messages, lookahead
correction, rich debug traces, etc.

It shows how to depend on the symbol kinds to tailor autocompletion. For
instance it recognizes the symbol kind "VARIABLE" to propose
autocompletion on the existing variables, rather than of the word
"variable".

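The bistromathic description can be illustrated with a small completion routine: given the symbol kinds acceptable at the current point, it proposes the literal text of ordinary tokens, but expands the VARIABLE kind into the names of existing variables. This is a sketch under stated assumptions, not the example's code: the `symbol_kind` enum, the `literal` and `variables` tables and `completions` are hypothetical, and in the real example the kinds would come from yypstate_expected_tokens, whose exact signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol kinds; the real example uses the generated yysymbol_kind_t. */
typedef enum { SYM_PLUS, SYM_LPAREN, SYM_VARIABLE } symbol_kind;

/* Literal text of each token kind; VARIABLE has no fixed spelling. */
static const char *const literal[] = { "+", "(", NULL };

/* Variables defined so far (illustrative contents). */
static const char *const variables[] = { "pi", "e", "phi" };

/* Store in OUT (at most MAX entries) the completions of PREFIX allowed by
   the EXPECTED symbol kinds; return how many were stored. */
size_t
completions (const symbol_kind *expected, size_t nexpected,
             const char *prefix, const char **out, size_t max)
{
  size_t n = 0;
  size_t plen = strlen (prefix);
  for (size_t i = 0; i < nexpected; ++i)
    if (expected[i] == SYM_VARIABLE)
      {
        /* Offer existing variable names, not the word "variable". */
        for (size_t v = 0; v < sizeof variables / sizeof *variables; ++v)
          if (n < max && strncmp (variables[v], prefix, plen) == 0)
            out[n++] = variables[v];
      }
    else if (n < max && strncmp (literal[expected[i]], prefix, plen) == 0)
      out[n++] = literal[expected[i]];
  return n;
}
```

With `(` and VARIABLE expected and the user having typed `p`, this sketch proposes `pi` and `phi`; with an empty prefix it proposes `(` followed by every variable name.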
* Noteworthy changes in release 3.5.4 (2020-04-05) [stable]

** WARNING: Future backward-incompatibilities!
9
TODO
@@ -19,12 +19,11 @@
- symbol.type_get should be kind_get, and it's not documented.
- YYERRCODE and "end of file" and translation

*** The documentation
You can explicitly specify the numeric code for a token type...
** Java
*** Examples
Have an example with a push parser. Use autocompletion in that case.

The token numbered as 0.

** Java: calc.at
*** calc.at
Stop hard-coding "Calc". Adjust local.at (look for FIXME).

** doc
doc/bison.texi
@@ -1232,7 +1232,7 @@ action in a GLR parser.
@cindex GLR parsers and @code{yylval}
@vindex yylloc
@cindex GLR parsers and @code{yylloc}
In any semantic action, you can examine @code{yychar} to determine the type
In any semantic action, you can examine @code{yychar} to determine the kind
of the lookahead token present at the time of the associated reduction.
After checking that @code{yychar} is not set to @code{YYEMPTY} or
@code{YYEOF}, you can then examine @code{yylval} and @code{yylloc} to
@@ -1853,7 +1853,7 @@ for such a single-character token is the character itself.

The return value of the lexical analyzer function is a numeric code which
represents a token kind. The same text used in Bison rules to stand for
this token kind is also a C expression for the numeric code for the type.
this token kind is also a C expression for the numeric code of the kind.
This works in two ways. If the token kind is a character literal, then its
numeric code is that of the character; you can use the same character
literal in the lexical analyzer to express the number. If the token kind is
@@ -2230,14 +2230,13 @@ the same as the declarations for the infix notation calculator.
@end example

@noindent
Note there are no declarations specific to locations. Defining a data
type for storing locations is not needed: we will use the type provided
by default (@pxref{Location Type}), which is a
four member structure with the following integer fields:
@code{first_line}, @code{first_column}, @code{last_line} and
@code{last_column}. By conventions, and in accordance with the GNU
Coding Standards and common practice, the line and column count both
start at 1.
Note there are no declarations specific to locations. Defining a data type
for storing locations is not needed: we will use the type provided by
default (@pxref{Location Type}), which is a four member structure with the
following integer fields: @code{first_line}, @code{first_column},
@code{last_line} and @code{last_column}. By conventions, and in accordance
with the GNU Coding Standards and common practice, the line and column count
both start at 1.

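The default location type described above has four `int` members. The sketch below copies that layout and adds a hypothetical helper, `track_token`, showing one way a scanner could keep those fields 1-based while it reads tokens. Treating the last column as inclusive is a choice made for this sketch, not necessarily the convention followed by Bison's own examples.

```c
#include <assert.h>

/* Same four members as the default location type described above. */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical scanner state: position of the next unread character. */
typedef struct
{
  int line;
  int column;
} scan_position;

/* Set LOC to the span of TEXT, the token just read starting at POS, and
   advance POS past it.  Lines and columns start at 1; both ends inclusive. */
void
track_token (scan_position *pos, YYLTYPE *loc, const char *text)
{
  loc->first_line = loc->last_line = pos->line;
  loc->first_column = loc->last_column = pos->column;
  for (const char *p = text; *p != '\0'; ++p)
    {
      /* The current character is, so far, the last one of the token. */
      loc->last_line = pos->line;
      loc->last_column = pos->column;
      if (*p == '\n')
        {
          ++pos->line;
          pos->column = 1;
        }
      else
        ++pos->column;
    }
}
```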
@node Ltcalc Rules
@subsection Grammar Rules for @code{ltcalc}
@@ -2646,7 +2645,7 @@ By simply editing the initialization list and adding the necessary include
files, you can add additional functions to the calculator.

Two important functions allow look-up and installation of symbols in the
symbol table. The function @code{putsym} is passed a name and the type
symbol table. The function @code{putsym} is passed a name and the kind
(@code{VAR} or @code{FUN}) of the object to be installed. The object is
linked to the front of the list, and a pointer to the object is returned.
The function @code{getsym} is passed the name of the symbol to look up. If
@@ -3698,10 +3697,9 @@ In a simple program it may be sufficient to use the same data type for
the semantic values of all language constructs. This was true in the
RPN and infix calculator examples (@pxref{RPN Calc}).

Bison normally uses the type @code{int} for semantic values if your
program uses the same data type for all language constructs. To
specify some other type, define the @code{%define} variable
@code{api.value.type} like this:
Bison normally uses the type @code{int} for semantic values if your program
uses the same data type for all language constructs. To specify some other
type, define the @code{%define} variable @code{api.value.type} like this:

@example
%define api.value.type @{double@}
@@ -4492,10 +4490,9 @@ Defining a data type for locations is much simpler than for semantic values,
since all tokens and groupings always use the same type.

You can specify the type of locations by defining a macro called
@code{YYLTYPE}, just as you can specify the semantic value type by
defining a @code{YYSTYPE} macro (@pxref{Value Type}).
When @code{YYLTYPE} is not defined, Bison uses a default structure type with
four members:
@code{YYLTYPE}, just as you can specify the semantic value type by defining
a @code{YYSTYPE} macro (@pxref{Value Type}). When @code{YYLTYPE} is not
defined, Bison uses a default structure type with four members:

@example
typedef struct YYLTYPE
@@ -7161,7 +7158,7 @@ yylex (void)
return c; /* Assume token kind for '+' is '+'. */
@dots{}
else
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@}
@end example
@@ -7211,7 +7208,7 @@ the type is @code{int} (the default), you might write this in @code{yylex}:
@group
@dots{}
yylval = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@end group
@end example
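A complete, runnable version of the scanner fragments above might look like this. The value of `INT`, the `scan_text` global, and reading from a string rather than standard input are assumptions made so that the sketch stands alone; as in the earlier `yylex` example, a single-character token such as '+' is returned as the character itself.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token kind; in a real parser Bison generates its value. */
enum { INT = 258 };

/* Semantic value of the last token: the default type is int. */
int yylval;

/* Text being scanned (hypothetical global, set by the caller). */
static const char *scan_text = "";

int
yylex (void)
{
  while (*scan_text == ' ')
    ++scan_text;
  if (*scan_text == '\0')
    return 0;                         /* end of input: token kind 0 */
  if (isdigit ((unsigned char) *scan_text))
    {
      int value = 0;
      while (isdigit ((unsigned char) *scan_text))
        value = value * 10 + (*scan_text++ - '0');
      yylval = value;                 /* Put value onto Bison stack. */
      return INT;                     /* Return the kind of the token. */
    }
  return (unsigned char) *scan_text++;  /* e.g. '+' is its own token kind */
}
```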
@@ -7238,7 +7235,7 @@ then the code in @code{yylex} might look like this:
@group
@dots{}
yylval.intval = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@end group
@end example
@@ -7279,7 +7276,7 @@ yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
@{
@dots{}
*lvalp = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@}
@end example
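For the pure interface, the value and the location are stored through the pointers that the parser passes in instead of through global variables. This sketch assumes `int` semantic values and the default four-member location type; the `INT` value, the file-scope `input_text` and `column` variables, and single-line input are simplifications for illustration (a genuinely reentrant scanner would also keep its own state out of globals).

```c
#include <assert.h>
#include <ctype.h>

typedef int YYSTYPE;             /* assumption: int semantic values */

typedef struct YYLTYPE           /* default four-member location type */
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

enum { INT = 258 };              /* hypothetical token kind */

/* Input and column of the next character (hypothetical; one line only). */
static const char *input_text = "";
static int column = 1;

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  while (*input_text == ' ')
    {
      ++input_text;
      ++column;
    }
  llocp->first_line = llocp->last_line = 1;
  llocp->first_column = column;
  if (*input_text == '\0')
    {
      llocp->last_column = column;
      return 0;                  /* end of input */
    }
  if (isdigit ((unsigned char) *input_text))
    {
      int value = 0;
      while (isdigit ((unsigned char) *input_text))
        {
          value = value * 10 + (*input_text++ - '0');
          ++column;
        }
      *lvalp = value;            /* Put value onto Bison stack. */
      llocp->last_column = column - 1;
      return INT;                /* Return the kind of the token. */
    }
  llocp->last_column = column++;
  return (unsigned char) *input_text++;
}
```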
@@ -8383,15 +8380,14 @@ represent the entire sequence of terminal and nonterminal symbols at or
near the top of the stack. The current state collects all the information
about previous input which is relevant to deciding what to do next.

Each time a lookahead token is read, the current parser state together
with the type of lookahead token are looked up in a table. This table
entry can say, ``Shift the lookahead token.'' In this case, it also
specifies the new parser state, which is pushed onto the top of the
parser stack. Or it can say, ``Reduce using rule number @var{n}.''
This means that a certain number of tokens or groupings are taken off
the top of the stack, and replaced by one grouping. In other words,
that number of states are popped from the stack, and one new state is
pushed.
Each time a lookahead token is read, the current parser state together with
the kind of lookahead token are looked up in a table. This table entry can
say, ``Shift the lookahead token.'' In this case, it also specifies the new
parser state, which is pushed onto the top of the parser stack. Or it can
say, ``Reduce using rule number @var{n}.'' This means that a certain number
of tokens or groupings are taken off the top of the stack, and replaced by
one grouping. In other words, that number of states are popped from the
stack, and one new state is pushed.

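The table-driven loop described above can be shown on a toy grammar, `E: E '+' NUM | NUM`, with an action table worked out by hand for this sketch. The token codes, state numbers and encoding of actions are assumptions for illustration; the tables Bison actually generates are compressed and organized differently. A zero entry is the case where the lookahead token is erroneous in the current state.

```c
#include <assert.h>

/* Token kinds for the toy grammar (hypothetical). */
enum { T_EOF, T_PLUS, T_NUM };

/* Action codes: A_ERROR = error, A_ACCEPT = accept,
   positive = shift to that state, negative = reduce by rule -code. */
enum { A_ERROR = 0, A_ACCEPT = 100 };

/* Rules: 1: E -> E '+' NUM    2: E -> NUM   (rule 0 unused here) */
static const int rule_length[] = { 0, 3, 1 };

/* action[state][lookahead kind], built by hand. */
static const int action[5][3] = {
  /*            EOF       PLUS     NUM   */
  /* 0 */ {  A_ERROR,  A_ERROR,       1 },
  /* 1 */ {       -2,       -2, A_ERROR },
  /* 2 */ { A_ACCEPT,        3, A_ERROR },
  /* 3 */ {  A_ERROR,  A_ERROR,       4 },
  /* 4 */ {       -1,       -1, A_ERROR },
};

/* State reached after reducing to E; only state 0 can be uncovered,
   so the other entries are never used. */
static const int goto_E[5] = { 2, 0, 0, 0, 0 };

/* Parse TOKENS (terminated by T_EOF); return 1 if accepted, 0 on error. */
int
lr_parse (const int *tokens)
{
  int stack[64] = { 0 };          /* state stack, initial state 0 */
  int top = 0;
  int lookahead = *tokens++;
  for (;;)
    {
      int a = action[stack[top]][lookahead];
      if (a == A_ACCEPT)
        return 1;
      if (a == A_ERROR)
        return 0;                 /* lookahead is erroneous in this state */
      if (a > 0)
        {
          /* Shift: push the new state and read the next token. */
          if (top + 1 == 64)
            return 0;
          stack[++top] = a;
          lookahead = *tokens++;
        }
      else
        {
          /* Reduce: pop one state per right-hand side symbol,
             then push the state reached on E. */
          top -= rule_length[-a];
          int next = goto_E[stack[top]];
          stack[++top] = next;
        }
    }
}
```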
There is one other alternative: the table can say that the lookahead token
is erroneous in the current state. This causes error processing to begin
@@ -11624,8 +11620,8 @@ particular it produces a genuine @code{union}, which have a few specific
features in C++.
@itemize @minus
@item
The type @code{YYSTYPE} is defined but its use is discouraged: rather
you should refer to the parser's encapsulated type
The type @code{YYSTYPE} is defined but its use is discouraged: rather you
should refer to the parser's encapsulated type
@code{yy::parser::semantic_type}.
@item
Non POD (Plain Old Data) types cannot be used. C++98 forbids any instance