doc: updates for 3.6

* doc/bison.texi: More s/token type/token kind/.
* NEWS: Update.
Akim Demaille
2020-04-13 19:06:06 +02:00
parent caadfc552b
commit 5d983253f7
3 changed files with 70 additions and 55 deletions

52
NEWS

@@ -19,7 +19,7 @@ GNU Bison NEWS
*** Improved syntax error messages *** Improved syntax error messages
Two new values for the %define parse.error variable offer more control to Two new values for the %define parse.error variable offer more control to
the user. the user. Available in all the skeletons (C, C++, Java).
**** %define parse.error detailed **** %define parse.error detailed
@@ -34,7 +34,12 @@ GNU Bison NEWS
**** %define parse.error custom **** %define parse.error custom
With this directive, the user forges and emits the syntax error message With this directive, the user forges and emits the syntax error message
herself by defining a function such as: herself by defining the yyreport_syntax_error function. A new type,
yypcontext_t, captures the circumstances of the error, and provides the
user with functions to get details, such as yypcontext_expected_tokens to
get the list of expected token kinds.
A possible implementation of yyreport_syntax_error is:
int int
yyreport_syntax_error (const yypcontext_t *ctx) yyreport_syntax_error (const yypcontext_t *ctx)
@@ -86,35 +91,42 @@ GNU Bison NEWS
*** List of expected tokens (yacc.c) *** List of expected tokens (yacc.c)
At any point during parsing (including even before submitting the first Push parsers may invoke yypstate_expected_tokens at any point during
token), push parsers may now invoke yypstate_expected_tokens to get the parsing (including even before submitting the first token) to get the list
list of possible tokens. This feature can be used to propose of possible tokens. This feature can be used to propose autocompletion
autocompletion (see below the "bistromathic" example). (see below the "bistromathic" example).
It makes little sense to use this feature without enabling LAC (lookahead It makes little sense to use this feature without enabling LAC (lookahead
correction). correction).
*** Deep overhaul of the symbol and token kinds *** Deep overhaul of the symbol and token kinds
To avoid the confusion with typing in programming languages, we now refer To avoid the confusion with types in programming languages, we now refer
to token and symbol "kinds" instead of token and symbol "types". to token and symbol "kinds" instead of token and symbol "types". The
documentation and error messages have been revised.
All the skeletons have been updated to use dedicated enum types rather
than integral types. Special symbols are now regular citizens, instead of
being declared in ad hoc ways.
**** Token kinds **** Token kinds
The "token kind" is what is returned by the scanner, e.g., PLUS, NUMBER, The "token kind" is what is returned by the scanner, e.g., PLUS, NUMBER,
LPAREN, etc. Users are invited to replace their uses of "enum LPAREN, etc. While backward compatibility is of course ensured, users are
yytokentype" by "yytoken_kind_t". nonetheless invited to replace their uses of "enum yytokentype" by
"yytoken_kind_t".
This type now also includes tokens that were previously hidden: YYEOF (end This type now also includes tokens that were previously hidden: YYEOF (end
of input), YYUNDEF (undefined token), and YYERRCODE (error token). They of input), YYUNDEF (undefined token), and YYERRCODE (error token). They
now have string aliases, internationalized if internationalization is now have string aliases, internationalized when internationalization is
enabled. Therefore, by default, error messages now refer to "end of file" enabled. Therefore, by default, error messages now refer to "end of file"
(internationalized) rather than the cryptic "$end". (internationalized) rather than the cryptic "$end", or to "invalid token"
rather than "$undefined".
In most case, it is now useless to define the end-of-file token as Therefore in most cases it is now useless to define the end-of-file token
follows: as follows:
%token EOF 0 _("end of file") %token T_EOF 0 "end of file"
Rather simply use "YYEOF" in your scanner. Rather simply use "YYEOF" in your scanner.
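A minimal sketch of a scanner that returns token kinds and reports the end of input with YYEOF rather than a user-defined EOF token. The enum below is a hand-written stand-in for the one Bison generates, so that the snippet compiles on its own; only YYEOF being 0 comes from the text above. The other values, the NUM and PLUS tokens, and the function name scan are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the generated enum.  In a real project this type comes
   from the header produced by Bison; values other than YYEOF == 0 are
   illustrative only.  */
typedef enum
{
  YYEOF = 0,     /* end of input */
  YYUNDEF = 257, /* invalid token (illustrative value) */
  NUM = 258,     /* hypothetical user token */
  PLUS = 259     /* hypothetical user token */
} yytoken_kind_t;

/* Hypothetical scanner: reads one token from *INPUT, advances *INPUT
   past it, and stores the value of a NUM in *VALUE.  */
yytoken_kind_t
scan (const char **input, int *value)
{
  const char *cp;
  yytoken_kind_t kind;

  assert (input != NULL && *input != NULL && value != NULL);
  cp = *input;

  /* Skip blanks.  */
  while (*cp == ' ' || *cp == '\t')
    ++cp;

  if (*cp == '\0')
    kind = YYEOF;               /* No dedicated "%token EOF 0" needed.  */
  else if (isdigit ((unsigned char) *cp))
    {
      int n = 0;
      while (isdigit ((unsigned char) *cp))
        n = n * 10 + (*cp++ - '0');
      *value = n;
      kind = NUM;
    }
  else if (*cp == '+')
    {
      ++cp;
      kind = PLUS;
    }
  else
    {
      ++cp;
      kind = YYUNDEF;           /* Anything else is an invalid token.  */
    }

  *input = cp;
  return kind;
}
```

Calling scan repeatedly on "12 + 7 ?" yields NUM, PLUS, NUM, YYUNDEF, and then YYEOF on every further call, since the cursor never moves past the terminating null character.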
@@ -126,7 +138,9 @@ GNU Bison NEWS
They are now exposed as an enum, "yysymbol_kind_t". They are now exposed as an enum, "yysymbol_kind_t".
This allows users to tailor the error messages the way they want. This allows users to tailor the error messages the way they want, or to
process some symbols in a specific way in autocompletion (see the
bistromathic example below).
*** Modernize display of explanatory statements in diagnostics *** Modernize display of explanatory statements in diagnostics
@@ -166,12 +180,18 @@ GNU Bison NEWS
The lexcalc example (a simple example in C based on Flex and Bison) now The lexcalc example (a simple example in C based on Flex and Bison) now
also demonstrates location tracking. also demonstrates location tracking.
A new C example, bistromathic, is a fully featured interactive calculator A new C example, bistromathic, is a fully featured interactive calculator
using many Bison features: pure interface, push parser, autocompletion using many Bison features: pure interface, push parser, autocompletion
based on the current parser state (using yypstate_expected_tokens), based on the current parser state (using yypstate_expected_tokens),
location tracking, internationalized custom error messages, lookahead location tracking, internationalized custom error messages, lookahead
correction, rich debug traces, etc. correction, rich debug traces, etc.
It shows how to depend on the symbol kinds to tailor autocompletion. For
instance it recognizes the symbol kind "VARIABLE" to propose
autocompletion on the existing variables, rather than on the word
"variable".
* Noteworthy changes in release 3.5.4 (2020-04-05) [stable] * Noteworthy changes in release 3.5.4 (2020-04-05) [stable]
** WARNING: Future backward-incompatibilities! ** WARNING: Future backward-incompatibilities!

9
TODO

@@ -19,12 +19,11 @@
- symbol.type_get should be kind_get, and it's not documented. - symbol.type_get should be kind_get, and it's not documented.
- YYERRCODE and "end of file" and translation - YYERRCODE and "end of file" and translation
*** The documentation ** Java
You can explicitly specify the numeric code for a token type... *** Examples
Have an example with a push parser. Use autocompletion in that case.
The token numbered as 0. *** calc.at
** Java: calc.at
Stop hard-coding "Calc". Adjust local.at (look for FIXME). Stop hard-coding "Calc". Adjust local.at (look for FIXME).
** doc ** doc

doc/bison.texi

@@ -1232,7 +1232,7 @@ action in a GLR parser.
@cindex GLR parsers and @code{yylval} @cindex GLR parsers and @code{yylval}
@vindex yylloc @vindex yylloc
@cindex GLR parsers and @code{yylloc} @cindex GLR parsers and @code{yylloc}
In any semantic action, you can examine @code{yychar} to determine the type In any semantic action, you can examine @code{yychar} to determine the kind
of the lookahead token present at the time of the associated reduction. of the lookahead token present at the time of the associated reduction.
After checking that @code{yychar} is not set to @code{YYEMPTY} or After checking that @code{yychar} is not set to @code{YYEMPTY} or
@code{YYEOF}, you can then examine @code{yylval} and @code{yylloc} to @code{YYEOF}, you can then examine @code{yylval} and @code{yylloc} to
@@ -1853,7 +1853,7 @@ for such a single-character token is the character itself.
The return value of the lexical analyzer function is a numeric code which The return value of the lexical analyzer function is a numeric code which
represents a token kind. The same text used in Bison rules to stand for represents a token kind. The same text used in Bison rules to stand for
this token kind is also a C expression for the numeric code for the type. this token kind is also a C expression for the numeric code of the kind.
This works in two ways. If the token kind is a character literal, then its This works in two ways. If the token kind is a character literal, then its
numeric code is that of the character; you can use the same character numeric code is that of the character; you can use the same character
literal in the lexical analyzer to express the number. If the token kind is literal in the lexical analyzer to express the number. If the token kind is
@@ -2230,14 +2230,13 @@ the same as the declarations for the infix notation calculator.
@end example @end example
@noindent @noindent
Note there are no declarations specific to locations. Defining a data Note there are no declarations specific to locations. Defining a data type
type for storing locations is not needed: we will use the type provided for storing locations is not needed: we will use the type provided by
by default (@pxref{Location Type}), which is a default (@pxref{Location Type}), which is a four member structure with the
four member structure with the following integer fields: following integer fields: @code{first_line}, @code{first_column},
@code{first_line}, @code{first_column}, @code{last_line} and @code{last_line} and @code{last_column}. By convention, and in accordance
@code{last_column}. By convention, and in accordance with the GNU with the GNU Coding Standards and common practice, the line and column counts
Coding Standards and common practice, the line and column counts both both start at 1.
start at 1.
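As an illustration of the four integer fields and the start-at-1 convention described above, here is a small sketch. The type name location and both helper functions are hypothetical and only mirror the default structure; they are not part of Bison. The sketch also assumes that the end position points just past the last character consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same four fields as the default
   location type.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* An empty location at the very beginning of the input: line and
   column counts both start at 1.  */
location
location_initial (void)
{
  location loc = { 1, 1, 1, 1 };
  return loc;
}

/* Make LOC start where it previously ended, then advance its end over
   TEXT.  A newline moves to the next line and resets the column to 1.
   Assumption: the end position is one past the last character.  */
void
location_step_over (location *loc, const char *text)
{
  assert (loc != NULL && text != NULL);
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
  for (; *text != '\0'; ++text)
    if (*text == '\n')
      {
        ++loc->last_line;
        loc->last_column = 1;
      }
    else
      ++loc->last_column;
}
```

A scanner could call location_step_over once per token with the matched text, so that each token's location begins where the previous one ended.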
@node Ltcalc Rules @node Ltcalc Rules
@subsection Grammar Rules for @code{ltcalc} @subsection Grammar Rules for @code{ltcalc}
@@ -2646,7 +2645,7 @@ By simply editing the initialization list and adding the necessary include
files, you can add additional functions to the calculator. files, you can add additional functions to the calculator.
Two important functions allow look-up and installation of symbols in the Two important functions allow look-up and installation of symbols in the
symbol table. The function @code{putsym} is passed a name and the type symbol table. The function @code{putsym} is passed a name and the kind
(@code{VAR} or @code{FUN}) of the object to be installed. The object is (@code{VAR} or @code{FUN}) of the object to be installed. The object is
linked to the front of the list, and a pointer to the object is returned. linked to the front of the list, and a pointer to the object is returned.
The function @code{getsym} is passed the name of the symbol to look up. If The function @code{getsym} is passed the name of the symbol to look up. If
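The two functions described in this hunk can be sketched as a singly linked list. This is a minimal sketch under stated assumptions, not the manual's actual code: the record layout, the stand-in enum for VAR and FUN, the global sym_table, and the choice to return NULL when a name is not found or when allocation fails are all assumptions made here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the two kinds named in the text; in a real grammar
   these would be token kinds declared in the grammar file.  */
enum { VAR, FUN };

/* Hypothetical symbol record: one node of the linked list.  */
typedef struct symrec
{
  char *name;           /* name of the symbol */
  int kind;             /* VAR or FUN */
  struct symrec *next;  /* next record in the list */
} symrec;

/* Head of the symbol table.  */
symrec *sym_table = NULL;

/* Install NAME with KIND at the front of the list and return the new
   record, or NULL if memory is exhausted.  */
symrec *
putsym (const char *name, int kind)
{
  symrec *res;
  assert (name != NULL);
  res = malloc (sizeof *res);
  if (res == NULL)
    return NULL;
  res->name = malloc (strlen (name) + 1);
  if (res->name == NULL)
    {
      free (res);
      return NULL;
    }
  strcpy (res->name, name);
  res->kind = kind;
  res->next = sym_table;   /* link to the front of the list */
  sym_table = res;
  return res;
}

/* Look NAME up; return its record, or NULL if it is not installed
   (the NULL result is this sketch's choice).  */
symrec *
getsym (const char *name)
{
  symrec *p;
  assert (name != NULL);
  for (p = sym_table; p != NULL; p = p->next)
    if (strcmp (p->name, name) == 0)
      return p;
  return NULL;
}
```

Because new records go to the front, the most recently installed symbol is found first, and a lookup costs time linear in the number of symbols, which is adequate for a small calculator.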
@@ -3698,10 +3697,9 @@ In a simple program it may be sufficient to use the same data type for
the semantic values of all language constructs. This was true in the the semantic values of all language constructs. This was true in the
RPN and infix calculator examples (@pxref{RPN Calc}). RPN and infix calculator examples (@pxref{RPN Calc}).
Bison normally uses the type @code{int} for semantic values if your Bison normally uses the type @code{int} for semantic values if your program
program uses the same data type for all language constructs. To uses the same data type for all language constructs. To specify some other
specify some other type, define the @code{%define} variable type, define the @code{%define} variable @code{api.value.type} like this:
@code{api.value.type} like this:
@example @example
%define api.value.type @{double@} %define api.value.type @{double@}
@@ -4492,10 +4490,9 @@ Defining a data type for locations is much simpler than for semantic values,
since all tokens and groupings always use the same type. since all tokens and groupings always use the same type.
You can specify the type of locations by defining a macro called You can specify the type of locations by defining a macro called
@code{YYLTYPE}, just as you can specify the semantic value type by @code{YYLTYPE}, just as you can specify the semantic value type by defining
defining a @code{YYSTYPE} macro (@pxref{Value Type}). a @code{YYSTYPE} macro (@pxref{Value Type}). When @code{YYLTYPE} is not
When @code{YYLTYPE} is not defined, Bison uses a default structure type with defined, Bison uses a default structure type with four members:
four members:
@example @example
typedef struct YYLTYPE typedef struct YYLTYPE
@@ -7161,7 +7158,7 @@ yylex (void)
return c; /* Assume token kind for '+' is '+'. */ return c; /* Assume token kind for '+' is '+'. */
@dots{} @dots{}
else else
return INT; /* Return the type of the token. */ return INT; /* Return the kind of the token. */
@dots{} @dots{}
@} @}
@end example @end example
@@ -7211,7 +7208,7 @@ the type is @code{int} (the default), you might write this in @code{yylex}:
@group @group
@dots{} @dots{}
yylval = value; /* Put value onto Bison stack. */ yylval = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */ return INT; /* Return the kind of the token. */
@dots{} @dots{}
@end group @end group
@end example @end example
@@ -7238,7 +7235,7 @@ then the code in @code{yylex} might look like this:
@group @group
@dots{} @dots{}
yylval.intval = value; /* Put value onto Bison stack. */ yylval.intval = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */ return INT; /* Return the kind of the token. */
@dots{} @dots{}
@end group @end group
@end example @end example
@@ -7279,7 +7276,7 @@ yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
@{ @{
@dots{} @dots{}
*lvalp = value; /* Put value onto Bison stack. */ *lvalp = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */ return INT; /* Return the kind of the token. */
@dots{} @dots{}
@} @}
@end example @end example
@@ -8383,15 +8380,14 @@ represent the entire sequence of terminal and nonterminal symbols at or
near the top of the stack. The current state collects all the information near the top of the stack. The current state collects all the information
about previous input which is relevant to deciding what to do next. about previous input which is relevant to deciding what to do next.
Each time a lookahead token is read, the current parser state together Each time a lookahead token is read, the current parser state together with
with the type of lookahead token are looked up in a table. This table the kind of lookahead token are looked up in a table. This table entry can
entry can say, ``Shift the lookahead token.'' In this case, it also say, ``Shift the lookahead token.'' In this case, it also specifies the new
specifies the new parser state, which is pushed onto the top of the parser state, which is pushed onto the top of the parser stack. Or it can
parser stack. Or it can say, ``Reduce using rule number @var{n}.'' say, ``Reduce using rule number @var{n}.'' This means that a certain number
This means that a certain number of tokens or groupings are taken off of tokens or groupings are taken off the top of the stack, and replaced by
the top of the stack, and replaced by one grouping. In other words, one grouping. In other words, that number of states are popped from the
that number of states are popped from the stack, and one new state is stack, and one new state is pushed.
pushed.
There is one other alternative: the table can say that the lookahead token There is one other alternative: the table can say that the lookahead token
is erroneous in the current state. This causes error processing to begin is erroneous in the current state. This causes error processing to begin
@@ -11624,8 +11620,8 @@ particular it produces a genuine @code{union}, which have a few specific
features in C++. features in C++.
@itemize @minus @itemize @minus
@item @item
The type @code{YYSTYPE} is defined but its use is discouraged: rather The type @code{YYSTYPE} is defined but its use is discouraged: rather you
you should refer to the parser's encapsulated type should refer to the parser's encapsulated type
@code{yy::parser::semantic_type}. @code{yy::parser::semantic_type}.
@item @item
Non POD (Plain Old Data) types cannot be used. C++98 forbids any instance Non POD (Plain Old Data) types cannot be used. C++98 forbids any instance