doc: updates for 3.6

* doc/bison.texi: More s/token type/token kind/.
* NEWS: Update.
Akim Demaille
2020-04-13 19:06:06 +02:00
parent caadfc552b
commit 5d983253f7
3 changed files with 70 additions and 55 deletions

52
NEWS

@@ -19,7 +19,7 @@ GNU Bison NEWS
*** Improved syntax error messages
Two new values for the %define parse.error variable offer more control to
the user.
the user. Available in all the skeletons (C, C++, Java).
**** %define parse.error detailed
@@ -34,7 +34,12 @@ GNU Bison NEWS
**** %define parse.error custom
With this directive, the user forges and emits the syntax error message
herself by defining a function such as:
herself by defining the yyreport_syntax_error function. A new type,
yypcontext_t, captures the circumstances of the error, and provides the
user with functions to get details, such as yypcontext_expected_tokens to
get the list of expected token kinds.
A possible implementation of yyreport_syntax_error is:
int
yyreport_syntax_error (const yypcontext_t *ctx)
@@ -86,35 +91,42 @@ GNU Bison NEWS
*** List of expected tokens (yacc.c)
At any point during parsing (including even before submitting the first
token), push parsers may now invoke yypstate_expected_tokens to get the
list of possible tokens. This feature can be used to propose
autocompletion (see below the "bistromathic" example).
Push parsers may invoke yypstate_expected_tokens at any point during
parsing (including even before submitting the first token) to get the list
of possible tokens. This feature can be used to propose autocompletion
(see below the "bistromathic" example).
It makes little sense to use this feature without enabling LAC (lookahead
correction).
*** Deep overhaul of the symbol and token kinds
To avoid the confusion with typing in programming languages, we now refer
to token and symbol "kinds" instead of token and symbol "types".
To avoid the confusion with types in programming languages, we now refer
to token and symbol "kinds" instead of token and symbol "types". The
documentation and error messages have been revised.
All the skeletons have been updated to use dedicated enum types rather
than integral types. Special symbols are now regular citizens, instead of
being declared in ad hoc ways.
**** Token kinds
The "token kind" is what is returned by the scanner, e.g., PLUS, NUMBER,
LPAREN, etc. Users are invited to replace their uses of "enum
yytokentype" by "yytoken_kind_t".
LPAREN, etc. While backward compatibility is of course ensured, users are
nonetheless invited to replace their uses of "enum yytokentype" by
"yytoken_kind_t".
This type now also includes tokens that were previously hidden: YYEOF (end
of input), YYUNDEF (undefined token), and YYERRCODE (error token). They
now have string aliases, internationalized if internationalization is
now have string aliases, internationalized when internationalization is
enabled. Therefore, by default, error messages now refer to "end of file"
(internationalized) rather than the cryptic "$end".
(internationalized) rather than the cryptic "$end", or to "invalid token"
rather than "$undefined".
In most cases, it is now useless to define the end-of-file token as
follows:
Therefore in most cases it is now useless to define the end-of-file token
as follows:
%token EOF 0 _("end of file")
%token T_EOF 0 "end of file"
Rather simply use "YYEOF" in your scanner.
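
As an illustration of the advice above, here is a minimal sketch of a hand-written scanner that returns YYEOF directly instead of a user-defined end-of-file token. The enum is a stand-in for the one Bison generates in the parser header: only YYEOF being the token numbered 0 comes from the text, while the other numeric values, the NUM and PLUS tokens, and the scan_string helper are assumptions made for the example.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-in for the enum Bison generates in the parser header.  YYEOF is
   the token numbered 0; the other numeric values are assumptions made
   for this sketch.  */
typedef enum yytoken_kind_t
{
  YYEOF = 0,
  YYUNDEF = 257,
  NUM = 258,
  PLUS = 259
} yytoken_kind_t;

/* Hypothetical scanner state: a cursor into a NUL-terminated string,
   and the semantic value of the last NUM.  */
static const char *cursor = "";
static int yylval;

/* Hypothetical helper: start scanning TEXT.  */
static void
scan_string (const char *text)
{
  cursor = text;
}

static yytoken_kind_t
yylex (void)
{
  /* Skip blanks.  */
  while (*cursor == ' ' || *cursor == '\t')
    ++cursor;
  if (*cursor == '\0')
    return YYEOF;             /* No user-defined EOF token needed.  */
  if (isdigit ((unsigned char) *cursor))
    {
      char *end;
      yylval = (int) strtol (cursor, &end, 10);
      cursor = end;
      return NUM;
    }
  if (*cursor == '+')
    {
      ++cursor;
      return PLUS;
    }
  ++cursor;
  return YYUNDEF;             /* Unknown character.  */
}
```

In a real project the enum would come from the generated header, and the scanner would typically be produced by Flex rather than written by hand.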
@@ -126,7 +138,9 @@ GNU Bison NEWS
They are now exposed as an enum, "yysymbol_kind_t".
This allows users to tailor the error messages the way they want.
This allows users to tailor the error messages the way they want, or to
process some symbols in a specific way in autocompletion (see the
bistromathic example below).
*** Modernize display of explanatory statements in diagnostics
@@ -166,12 +180,18 @@ GNU Bison NEWS
The lexcalc example (a simple example in C based on Flex and Bison) now
also demonstrates location tracking.
A new C example, bistromathic, is a fully featured interactive calculator
using many Bison features: pure interface, push parser, autocompletion
based on the current parser state (using yypstate_expected_tokens),
location tracking, internationalized custom error messages, lookahead
correction, rich debug traces, etc.
It shows how to depend on the symbol kinds to tailor autocompletion. For
instance it recognizes the symbol kind "VARIABLE" to propose
autocompletion on the existing variables, rather than on the word
"variable".
* Noteworthy changes in release 3.5.4 (2020-04-05) [stable]
** WARNING: Future backward-incompatibilities!

9
TODO

@@ -19,12 +19,11 @@
- symbol.type_get should be kind_get, and it's not documented.
- YYERRCODE and "end of file" and translation
*** The documentation
You can explicitly specify the numeric code for a token type...
** Java
*** Examples
Have an example with a push parser. Use autocompletion in that case.
The token numbered as 0.
** Java: calc.at
*** calc.at
Stop hard-coding "Calc". Adjust local.at (look for FIXME).
** doc

doc/bison.texi

@@ -1232,7 +1232,7 @@ action in a GLR parser.
@cindex GLR parsers and @code{yylval}
@vindex yylloc
@cindex GLR parsers and @code{yylloc}
In any semantic action, you can examine @code{yychar} to determine the type
In any semantic action, you can examine @code{yychar} to determine the kind
of the lookahead token present at the time of the associated reduction.
After checking that @code{yychar} is not set to @code{YYEMPTY} or
@code{YYEOF}, you can then examine @code{yylval} and @code{yylloc} to
@@ -1853,7 +1853,7 @@ for such a single-character token is the character itself.
The return value of the lexical analyzer function is a numeric code which
represents a token kind. The same text used in Bison rules to stand for
this token kind is also a C expression for the numeric code for the type.
this token kind is also a C expression for the numeric code of the kind.
This works in two ways. If the token kind is a character literal, then its
numeric code is that of the character; you can use the same character
literal in the lexical analyzer to express the number. If the token kind is
@@ -2230,14 +2230,13 @@ the same as the declarations for the infix notation calculator.
@end example
@noindent
Note there are no declarations specific to locations. Defining a data
type for storing locations is not needed: we will use the type provided
by default (@pxref{Location Type}), which is a
four-member structure with the following integer fields:
@code{first_line}, @code{first_column}, @code{last_line} and
@code{last_column}. By convention, and in accordance with the GNU
Coding Standards and common practice, the line and column count both
start at 1.
Note there are no declarations specific to locations. Defining a data type
for storing locations is not needed: we will use the type provided by
default (@pxref{Location Type}), which is a four-member structure with the
following integer fields: @code{first_line}, @code{first_column},
@code{last_line} and @code{last_column}. By convention, and in accordance
with the GNU Coding Standards and common practice, the line and column count
both start at 1.
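
To make the default location type concrete, the following sketch declares a four-member structure with the fields named above and two helpers that keep line and column counts starting at 1. The helper names location_init and location_advance are hypothetical, and the choice to let last_column point just past the token is an assumption of this sketch.

```c
/* A structure with the four integer fields named in the text.  */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical helper: lines and columns both start at 1.  */
static void
location_init (YYLTYPE *loc)
{
  loc->first_line = loc->last_line = 1;
  loc->first_column = loc->last_column = 1;
}

/* Hypothetical helper: make LOC span TEXT, starting where the
   previous span ended.  */
static void
location_advance (YYLTYPE *loc, const char *text)
{
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
  for (; *text; ++text)
    if (*text == '\n')
      {
        /* A newline moves to column 1 of the next line.  */
        ++loc->last_line;
        loc->last_column = 1;
      }
    else
      ++loc->last_column;
}
```

A scanner would call location_advance once per token (and once per run of skipped white space) with the matched text.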
@node Ltcalc Rules
@subsection Grammar Rules for @code{ltcalc}
@@ -2646,7 +2645,7 @@ By simply editing the initialization list and adding the necessary include
files, you can add additional functions to the calculator.
Two important functions allow look-up and installation of symbols in the
symbol table. The function @code{putsym} is passed a name and the type
symbol table. The function @code{putsym} is passed a name and the kind
(@code{VAR} or @code{FUN}) of the object to be installed. The object is
linked to the front of the list, and a pointer to the object is returned.
The function @code{getsym} is passed the name of the symbol to look up. If
@@ -3698,10 +3697,9 @@ In a simple program it may be sufficient to use the same data type for
the semantic values of all language constructs. This was true in the
RPN and infix calculator examples (@pxref{RPN Calc}).
Bison normally uses the type @code{int} for semantic values if your
program uses the same data type for all language constructs. To
specify some other type, define the @code{%define} variable
@code{api.value.type} like this:
Bison normally uses the type @code{int} for semantic values if your program
uses the same data type for all language constructs. To specify some other
type, define the @code{%define} variable @code{api.value.type} like this:
@example
%define api.value.type @{double@}
@@ -4492,10 +4490,9 @@ Defining a data type for locations is much simpler than for semantic values,
since all tokens and groupings always use the same type.
You can specify the type of locations by defining a macro called
@code{YYLTYPE}, just as you can specify the semantic value type by
defining a @code{YYSTYPE} macro (@pxref{Value Type}).
When @code{YYLTYPE} is not defined, Bison uses a default structure type with
four members:
@code{YYLTYPE}, just as you can specify the semantic value type by defining
a @code{YYSTYPE} macro (@pxref{Value Type}). When @code{YYLTYPE} is not
defined, Bison uses a default structure type with four members:
@example
typedef struct YYLTYPE
@@ -7161,7 +7158,7 @@ yylex (void)
return c; /* Assume token kind for '+' is '+'. */
@dots{}
else
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@}
@end example
@@ -7211,7 +7208,7 @@ the type is @code{int} (the default), you might write this in @code{yylex}:
@group
@dots{}
yylval = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@end group
@end example
@@ -7238,7 +7235,7 @@ then the code in @code{yylex} might look like this:
@group
@dots{}
yylval.intval = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@end group
@end example
@@ -7279,7 +7276,7 @@ yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
@{
@dots{}
*lvalp = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */
return INT; /* Return the kind of the token. */
@dots{}
@}
@end example
@@ -8383,15 +8380,14 @@ represent the entire sequence of terminal and nonterminal symbols at or
near the top of the stack. The current state collects all the information
about previous input which is relevant to deciding what to do next.
Each time a lookahead token is read, the current parser state together
with the type of lookahead token are looked up in a table. This table
entry can say, ``Shift the lookahead token.'' In this case, it also
specifies the new parser state, which is pushed onto the top of the
parser stack. Or it can say, ``Reduce using rule number @var{n}.''
This means that a certain number of tokens or groupings are taken off
the top of the stack, and replaced by one grouping. In other words,
that number of states are popped from the stack, and one new state is
pushed.
Each time a lookahead token is read, the current parser state together with
the kind of lookahead token are looked up in a table. This table entry can
say, ``Shift the lookahead token.'' In this case, it also specifies the new
parser state, which is pushed onto the top of the parser stack. Or it can
say, ``Reduce using rule number @var{n}.'' This means that a certain number
of tokens or groupings are taken off the top of the stack, and replaced by
one grouping. In other words, that number of states are popped from the
stack, and one new state is pushed.
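
The table lookup described above can be sketched with a hand-built table for a toy grammar (rule 1: exp -> NUM, rule 2: exp -> exp PLUS NUM). This is not the layout of the tables Bison generates: the encoding (positive entry = shift to that state, negative entry = reduce by that rule, zero = error) and all the names are assumptions chosen for readability.

```c
/* Token kinds of the toy grammar (hypothetical names).  */
enum { T_EOF, T_NUM, T_PLUS, N_KINDS };

/* Special table entry: the input is accepted.  */
enum { ACCEPT = 100 };

/* action[state][kind]: > 0 shift and go to that state,
   < 0 reduce using rule -entry, 0 syntax error.  */
static const int action[5][N_KINDS] = {
  /* state 0 */ { 0, 1, 0 },
  /* state 1 */ { -1, -1, -1 },      /* exp -> NUM .          */
  /* state 2 */ { ACCEPT, 0, 3 },
  /* state 3 */ { 0, 4, 0 },
  /* state 4 */ { -2, -2, -2 },      /* exp -> exp PLUS NUM . */
};

/* Number of symbols on the right-hand side of each rule.  */
static const int rule_length[] = { 0, 1, 3 };

/* State to enter after reducing to exp, indexed by the state
   uncovered on the stack (-1: cannot happen).  */
static const int goto_exp[5] = { 2, -1, -1, -1, -1 };

/* Parse TOKENS, which must end with T_EOF.
   Return 1 if accepted, 0 on syntax error.  */
static int
parse (const int *tokens)
{
  int stack[8];
  int top = 0;
  int i = 0;
  stack[0] = 0;
  for (;;)
    {
      int act = action[stack[top]][tokens[i]];
      if (act == ACCEPT)
        return 1;
      if (act == 0)
        return 0;
      if (act > 0)
        {
          /* Shift: push the new state, consume the lookahead.  */
          stack[++top] = act;
          ++i;
        }
      else
        {
          /* Reduce: pop one state per right-hand side symbol,
             then push the state for the resulting grouping.  */
          int uncovered;
          top -= rule_length[-act];
          uncovered = stack[top];
          stack[++top] = goto_exp[uncovered];
        }
    }
}
```

With this grammar the stack never holds more than four states, which is why a small fixed-size array is enough for the sketch.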
There is one other alternative: the table can say that the lookahead token
is erroneous in the current state. This causes error processing to begin
@@ -11624,8 +11620,8 @@ particular it produces a genuine @code{union}, which has a few specific
features in C++.
@itemize @minus
@item
The type @code{YYSTYPE} is defined but its use is discouraged: rather
you should refer to the parser's encapsulated type
The type @code{YYSTYPE} is defined but its use is discouraged: rather you
should refer to the parser's encapsulated type
@code{yy::parser::semantic_type}.
@item
Non POD (Plain Old Data) types cannot be used. C++98 forbids any instance