doc: updates for 3.6

* doc/bison.texi: More s/token type/token kind/.
* NEWS: Update.
2026-04-24 02:29:43 +00:00 · 2020-04-13 19:06:06 +02:00
parent caadfc552b
commit 5d983253f7
3 changed files with 70 additions and 55 deletions
@@ -19,7 +19,7 @@ GNU Bison NEWS
 *** Improved syntax error messages

  Two new values for the %define parse.error variable offer more control to
-  the user.
+  the user.  They are available in all the skeletons (C, C++, Java).

 **** %define parse.error detailed

@@ -34,7 +34,12 @@ GNU Bison NEWS
 **** %define parse.error custom

  With this directive, the user forges and emits the syntax error message
-  herself by defining a function such as:
+  herself by defining the yyreport_syntax_error function.  A new type,
+  yypcontext_t, captures the circumstances of the error, and provides the
+  user with functions to get details, such as yypcontext_expected_tokens to
+  get the list of expected token kinds.
+
+  A possible implementation of yyreport_syntax_error is:

    int
    yyreport_syntax_error (const yypcontext_t *ctx)
@@ -86,35 +91,42 @@ GNU Bison NEWS

 *** List of expected tokens (yacc.c)

-  At any point during parsing (including even before submitting the first
-  token), push parsers may now invoke yypstate_expected_tokens to get the
-  list of possible tokens.  This feature can be used to propose
-  autocompletion (see below the "bistromathic" example).
+  Push parsers may invoke yypstate_expected_tokens at any point during
+  parsing (including even before submitting the first token) to get the list
+  of possible tokens.  This feature can be used to propose autocompletion
+  (see the "bistromathic" example below).

  It makes little sense to use this feature without enabling LAC (lookahead
  correction).
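As a rough illustration only, the sketch below computes such a list from one hand-written row of a parser action table: a token is expected when its entry is not an error. The token names and the table encoding are hypothetical, and the sketch ignores default reductions and LALR state merging. With those, a raw table row may not match the tokens that can really be accepted, which is presumably why LAC is recommended above.

```c
/* Hypothetical token kinds for a toy grammar; a generated parser would
   report yysymbol_kind_t values instead. */
enum { TOK_EOF, TOK_NUM, TOK_PLUS, TOK_LPAREN, TOK_RPAREN, NTOKENS };

/* Store into EXPECTED (room for CAP entries) every token kind whose entry
   in ROW is not an error.  ROW is a hypothetical action-table row for the
   current state, where 0 means "error" and any other value stands for a
   shift or reduce action.  Return the number of kinds stored. */
static int
expected_tokens (const int row[NTOKENS], int expected[], int cap)
{
  int count = 0;
  for (int kind = 0; kind < NTOKENS && count < cap; ++kind)
    if (row[kind] != 0)
      expected[count++] = kind;
  return count;
}
```

For a state where only a number or an opening parenthesis may follow, a row such as `{ 0, 3, 0, 4, 0 }` yields `TOK_NUM` and `TOK_LPAREN`.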

 *** Deep overhaul of the symbol and token kinds

-  To avoid the confusion with typing in programming languages, we now refer
-  to token and symbol "kinds" instead of token and symbol "types".
+  To avoid confusion with types in programming languages, we now refer
+  to token and symbol "kinds" instead of token and symbol "types".  The
+  documentation and error messages have been revised.
+
+  All the skeletons have been updated to use dedicated enum types rather
+  than integral types.  Special symbols are now regular citizens, instead of
+  being declared in ad hoc ways.

 **** Token kinds

  The "token kind" is what is returned by the scanner, e.g., PLUS, NUMBER,
-  LPAREN, etc.  Users are invited to replace their uses of "enum
-  yytokentype" by "yytoken_kind_t".
+  LPAREN, etc.  While backward compatibility is of course ensured, users are
+  nonetheless invited to replace their uses of "enum yytokentype" by
+  "yytoken_kind_t".

  This type now also includes tokens that were previously hidden: YYEOF (end
  of input), YYUNDEF (undefined token), and YYERRCODE (error token).  They
-  now have string aliases, internationalized if internationalization is
+  now have string aliases, internationalized when internationalization is
  enabled.  Therefore, by default, error messages now refer to "end of file"
-  (internationalized) rather than the cryptic "$end".
+  (internationalized) rather than the cryptic "$end", or to "invalid token"
+  rather than "$undefined".

-  In most case, it is now useless to define the end-of-line token as
-  follows:
+  Consequently, in most cases it is now useless to define the end-of-file
+  token as follows:

-    %token EOF 0  _("end of file")
+    %token T_EOF 0 "end of file"

  Simply use "YYEOF" in your scanner instead.
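For instance, a scanner can signal the end of input by returning YYEOF directly. The sketch below is hypothetical: the enum merely stands in for the generated yytoken_kind_t (its values are assumptions, except that the end-of-file token is 0), and NUMBER, the input cursor and count_tokens are made up for the illustration.

```c
/* Hypothetical stand-in for the Bison-generated yytoken_kind_t; a real
   parser gets this enum, and its actual values, from the generated code. */
typedef enum
{
  YYEOF = 0,      /* alias "end of file" */
  YYUNDEF = 257,  /* alias "invalid token" */
  NUMBER = 258    /* hypothetical user token */
} yytoken_kind_t;

/* Hypothetical cursor into the text being scanned. */
static const char *cursor = "";

static yytoken_kind_t
yylex (void)
{
  while (*cursor == ' ')
    ++cursor;
  if (*cursor == '\0')
    return YYEOF;               /* no user-defined end-of-file token */
  if ('0' <= *cursor && *cursor <= '9')
    {
      while ('0' <= *cursor && *cursor <= '9')
        ++cursor;
      return NUMBER;
    }
  ++cursor;                     /* skip the unexpected character */
  return YYUNDEF;
}

/* Count the tokens of TEXT, stopping at YYEOF. */
static int
count_tokens (const char *text)
{
  int n = 0;
  cursor = text;
  while (yylex () != YYEOF)
    ++n;
  return n;
}
```

Here `count_tokens ("12 7")` is 2, and `count_tokens ("1 ? 2")` is 3, the `?` being returned as YYUNDEF.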

@@ -126,7 +138,9 @@ GNU Bison NEWS

  They are now exposed as an enum, "yysymbol_kind_t".

-  This allows users to tailor the error messages the way they want.
+  This allows users to tailor the error messages the way they want, or to
+  process some symbols in a specific way in autocompletion (see the
+  bistromathic example below).

 *** Modernize display of explanatory statements in diagnostics

@@ -166,12 +180,18 @@ GNU Bison NEWS
  The lexcalc example (a simple example in C based on Flex and Bison) now
  also demonstrates location tracking.

+
  A new C example, bistromathic, is a fully featured interactive calculator
  using many Bison features: pure interface, push parser, autocompletion
  based on the current parser state (using yypstate_expected_tokens),
  location tracking, internationalized custom error messages, lookahead
  correction, rich debug traces, etc.

+  It shows how to rely on the symbol kinds to tailor autocompletion.  For
+  instance, it recognizes the symbol kind "VARIABLE" to propose
+  autocompletion on the existing variables, rather than on the word
+  "variable".
+
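A minimal sketch of that idea follows. It is not taken from bistromathic itself. Given the expected symbol kinds, as yypstate_expected_tokens might report them, the VARIABLE kind is expanded into the names of the existing variables, while other kinds contribute their display name. The enum, the names and the variable list are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of a calculator's yysymbol_kind_t. */
typedef enum
{
  SYM_PLUS,
  SYM_LPAREN,
  SYM_VARIABLE
} symbol_kind_t;

/* Hypothetical display names for these symbol kinds. */
static const char *const symbol_names[] = { "+", "(", "variable" };

/* Hypothetical names of the variables defined so far. */
static const char *const variables[] = { "pi", "e" };

/* Append WORD to BUF (of SIZE bytes), separated by a space, truncating
   if the buffer is full. */
static void
append (char *buf, size_t size, const char *word)
{
  size_t len = strlen (buf);
  snprintf (buf + len, size - len, "%s%s", len ? " " : "", word);
}

/* Build in BUF the completions for the N expected symbol KINDS: the
   VARIABLE kind expands to the names of the existing variables, while any
   other kind contributes its display name. */
static void
complete (const symbol_kind_t kinds[], int n, char *buf, size_t size)
{
  buf[0] = '\0';
  for (int i = 0; i < n; ++i)
    {
      if (kinds[i] == SYM_VARIABLE)
        {
          for (size_t j = 0; j < sizeof variables / sizeof *variables; ++j)
            append (buf, size, variables[j]);
        }
      else
        append (buf, size, symbol_names[kinds[i]]);
    }
}
```

With `SYM_LPAREN` and `SYM_VARIABLE` expected, the proposed completions are `( pi e` rather than `( variable`.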
 * Noteworthy changes in release 3.5.4 (2020-04-05) [stable]

 ** WARNING: Future backward-incompatibilities!
@@ -19,12 +19,11 @@
 - symbol.type_get should be kind_get, and it's not documented.
 - YYERRCODE and "end of file" and translation

-*** The documentation
-You can explicitly specify the numeric code for a token type...
+** Java
+*** Examples
+Have an example with a push parser.  Use autocompletion in that case.

-The token numbered as 0.
-
-** Java: calc.at
+*** calc.at
 Stop hard-coding "Calc".  Adjust local.at (look for FIXME).

 ** doc
@@ -1232,7 +1232,7 @@ action in a GLR parser.
@cindex GLR parsers and @code{yylval}
@vindex yylloc
@cindex GLR parsers and @code{yylloc}
-In any semantic action, you can examine @code{yychar} to determine the type
+In any semantic action, you can examine @code{yychar} to determine the kind
 of the lookahead token present at the time of the associated reduction.
 After checking that @code{yychar} is not set to @code{YYEMPTY} or
@code{YYEOF}, you can then examine @code{yylval} and @code{yylloc} to
@@ -1853,7 +1853,7 @@ for such a single-character token is the character itself.

 The return value of the lexical analyzer function is a numeric code which
 represents a token kind.  The same text used in Bison rules to stand for
-this token kind is also a C expression for the numeric code for the type.
+this token kind is also a C expression for the numeric code of the kind.
 This works in two ways.  If the token kind is a character literal, then its
 numeric code is that of the character; you can use the same character
 literal in the lexical analyzer to express the number.  If the token kind is
@@ -2230,14 +2230,13 @@ the same as the declarations for the infix notation calculator.
@end example

@noindent
-Note there are no declarations specific to locations.  Defining a data
-type for storing locations is not needed: we will use the type provided
-by default (@pxref{Location Type}), which is a
-four member structure with the following integer fields:
-@code{first_line}, @code{first_column}, @code{last_line} and
-@code{last_column}.  By conventions, and in accordance with the GNU
-Coding Standards and common practice, the line and column count both
-start at 1.
+Note there are no declarations specific to locations.  Defining a data type
+for storing locations is not needed: we will use the type provided by
+default (@pxref{Location Type}), which is a four-member structure with the
+following integer fields: @code{first_line}, @code{first_column},
+@code{last_line} and @code{last_column}.  By convention, and in accordance
+with the GNU Coding Standards and common practice, line and column counts
+both start at 1.
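As an illustration of the 1-based convention, the sketch below maintains such a location while scanning. The position type and the helper functions are hypothetical. Treating last_column as the column of the token's last character is a choice made for this sketch, not something the text above prescribes.

```c
/* Same four integer fields as the default location type described above;
   declared here only so that the sketch compiles on its own. */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical scanner position: line and column of the next unread
   character, both starting at 1. */
typedef struct
{
  int line;
  int column;
} position;

/* Give LOC the span of a token of LEN characters (no newlines) starting at
   *POS, then move *POS past the token.  In this sketch last_column is the
   column of the token's last character. */
static void
locate_token (YYLTYPE *loc, position *pos, int len)
{
  loc->first_line = loc->last_line = pos->line;
  loc->first_column = pos->column;
  loc->last_column = pos->column + len - 1;
  pos->column += len;
}

/* Account for a newline: next line, back to column 1. */
static void
new_line (position *pos)
{
  ++pos->line;
  pos->column = 1;
}
```

Starting from `{ 1, 1 }`, a two-character token spans columns 1 to 2 of line 1, and the next token starts at column 3.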

@node Ltcalc Rules
@subsection Grammar Rules for @code{ltcalc}
@@ -2646,7 +2645,7 @@ By simply editing the initialization list and adding the necessary include
 files, you can add additional functions to the calculator.

 Two important functions allow look-up and installation of symbols in the
-symbol table.  The function @code{putsym} is passed a name and the type
+symbol table.  The function @code{putsym} is passed a name and the kind
 (@code{VAR} or @code{FUN}) of the object to be installed.  The object is
 linked to the front of the list, and a pointer to the object is returned.
 The function @code{getsym} is passed the name of the symbol to look up.  If
@@ -3698,10 +3697,9 @@ In a simple program it may be sufficient to use the same data type for
 the semantic values of all language constructs.  This was true in the
 RPN and infix calculator examples (@pxref{RPN Calc}).

-Bison normally uses the type @code{int} for semantic values if your
-program uses the same data type for all language constructs.  To
-specify some other type, define the @code{%define} variable
-@code{api.value.type} like this:
+Bison normally uses the type @code{int} for semantic values if your program
+uses the same data type for all language constructs.  To specify some other
+type, define the @code{%define} variable @code{api.value.type} like this:

@example
 %define api.value.type @{double@}
@@ -4492,10 +4490,9 @@ Defining a data type for locations is much simpler than for semantic values,
 since all tokens and groupings always use the same type.

 You can specify the type of locations by defining a macro called
-@code{YYLTYPE}, just as you can specify the semantic value type by
-defining a @code{YYSTYPE} macro (@pxref{Value Type}).
-When @code{YYLTYPE} is not defined, Bison uses a default structure type with
-four members:
+@code{YYLTYPE}, just as you can specify the semantic value type by defining
+a @code{YYSTYPE} macro (@pxref{Value Type}).  When @code{YYLTYPE} is not
+defined, Bison uses a default structure type with four members:

@example
 typedef struct YYLTYPE
@@ -7161,7 +7158,7 @@ yylex (void)
    return c;      /* Assume token kind for '+' is '+'. */
  @dots{}
  else
-    return INT;    /* Return the type of the token. */
+    return INT;    /* Return the kind of the token. */
  @dots{}
@}
@end example
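The fragment above elides the scanning itself. A complete, if minimal, sketch in the same spirit follows. A character token is returned as its own character code, and a named token is returned as its kind. The input string, the value 258 for INT and the global yylval are assumptions made so that it compiles on its own; a Bison-generated parser would provide INT and yylval.

```c
#include <ctype.h>

/* Hypothetical token kinds; with Bison these come from the generated
   parser (the value 258 for INT is assumed here). */
enum { YYEOF = 0, INT = 258 };

static const char *input = "";  /* hypothetical input text */
static int yylval;              /* semantic value of the last INT */

static int
yylex (void)
{
  while (*input == ' ')
    ++input;
  if (*input == '\0')
    return YYEOF;
  if (*input == '+' || *input == '-')
    return *input++;            /* token kind for '+' is '+' itself */
  if (isdigit ((unsigned char) *input))
    {
      int value = 0;
      while (isdigit ((unsigned char) *input))
        value = value * 10 + (*input++ - '0');
      yylval = value;           /* put value onto Bison stack */
      return INT;               /* return the kind of the token */
    }
  return (unsigned char) *input++;  /* any other character is its own kind */
}
```

Scanning `12 + 3` yields INT (with yylval 12), then `'+'`, then INT (with yylval 3), then YYEOF.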
@@ -7211,7 +7208,7 @@ the type is @code{int} (the default), you might write this in @code{yylex}:
@group
  @dots{}
  yylval = value;  /* Put value onto Bison stack. */
-  return INT;      /* Return the type of the token. */
+  return INT;      /* Return the kind of the token. */
  @dots{}
@end group
@end example
@@ -7238,7 +7235,7 @@ then the code in @code{yylex} might look like this:
@group
  @dots{}
  yylval.intval = value; /* Put value onto Bison stack. */
-  return INT;            /* Return the type of the token. */
+  return INT;            /* Return the kind of the token. */
  @dots{}
@end group
@end example
@@ -7279,7 +7276,7 @@ yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
@{
  @dots{}
  *lvalp = value;  /* Put value onto Bison stack. */
-  return INT;      /* Return the type of the token. */
+  return INT;      /* Return the kind of the token. */
  @dots{}
@}
@end example
@@ -8383,15 +8380,14 @@ represent the entire sequence of terminal and nonterminal symbols at or
 near the top of the stack.  The current state collects all the information
 about previous input which is relevant to deciding what to do next.

-Each time a lookahead token is read, the current parser state together
-with the type of lookahead token are looked up in a table.  This table
-entry can say, ``Shift the lookahead token.''  In this case, it also
-specifies the new parser state, which is pushed onto the top of the
-parser stack.  Or it can say, ``Reduce using rule number @var{n}.''
-This means that a certain number of tokens or groupings are taken off
-the top of the stack, and replaced by one grouping.  In other words,
-that number of states are popped from the stack, and one new state is
-pushed.
+Each time a lookahead token is read, the current parser state and the kind
+of the lookahead token are looked up in a table.  This table entry can say,
+``Shift the lookahead token.''  In this case, it also specifies the new
+parser state, which is pushed onto the top of the parser stack.  Or it can
+say, ``Reduce using rule number @var{n}.''  This means that a certain number
+of tokens or groupings are taken off the top of the stack and replaced by
+one grouping.  In other words, that many states are popped from the stack,
+and one new state is pushed.
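To make this concrete, here is a hand-written table for a toy grammar (rule 1: `e : e '+' N`; rule 2: `e : N`) and the loop that consults it. The encoding, the names and the table contents are hypothetical and bear no resemblance to the compressed tables Bison actually generates. Only the shift and reduce steps follow the description above. The accept entry is an addition of this sketch, and an error entry stands for the erroneous-lookahead case.

```c
enum { T_EOF, T_N, T_PLUS, NTOKENS };     /* toy token kinds */
enum { ERR, SHIFT, REDUCE, ACCEPT };      /* action types */
enum { STACK_MAX = 64 };

typedef struct { int type; int arg; } action;

/* Hypothetical action table, indexed by state and lookahead kind. */
static const action table[5][NTOKENS] = {
  /* state 0 */ { {ERR, 0},    {SHIFT, 2}, {ERR, 0} },
  /* state 1 */ { {ACCEPT, 0}, {ERR, 0},   {SHIFT, 3} },
  /* state 2 */ { {REDUCE, 2}, {ERR, 0},   {REDUCE, 2} },
  /* state 3 */ { {ERR, 0},    {SHIFT, 4}, {ERR, 0} },
  /* state 4 */ { {REDUCE, 1}, {ERR, 0},   {REDUCE, 1} },
};
static const int rule_length[3] = { 0, 3, 1 };      /* rhs sizes; 0 unused */
static const int goto_e[5] = { 1, -1, -1, -1, -1 }; /* state after an e */

/* Parse TOKENS (terminated by T_EOF): return 1 on accept, 0 on error. */
static int
parse (const int *tokens)
{
  int stack[STACK_MAX];
  int top = 0;
  stack[0] = 0;
  for (;;)
    {
      action act = table[stack[top]][*tokens];
      switch (act.type)
        {
        case SHIFT:             /* push the new state, consume the token */
          if (top + 1 >= STACK_MAX)
            return 0;
          stack[++top] = act.arg;
          ++tokens;
          break;
        case REDUCE:            /* pop one state per rhs symbol, then push
                                   the state reached on the grouping */
          top -= rule_length[act.arg];
          stack[top + 1] = goto_e[stack[top]];
          ++top;
          break;
        case ACCEPT:
          return 1;
        default:                /* ERR: lookahead erroneous in this state */
          return 0;
        }
    }
}
```

For instance, `N + N` is accepted, while `N N` stops on an error entry in state 2.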

 There is one other alternative: the table can say that the lookahead token
 is erroneous in the current state.  This causes error processing to begin
@@ -11624,8 +11620,8 @@ particular it produces a genuine @code{union}, which have a few specific
 features in C++.
@itemize @minus
@item
-The type @code{YYSTYPE} is defined but its use is discouraged: rather
-you should refer to the parser's encapsulated type
+The type @code{YYSTYPE} is defined, but its use is discouraged: you should
+rather refer to the parser's encapsulated type
@code{yy::parser::semantic_type}.
@item
 Non POD (Plain Old Data) types cannot be used.  C++98 forbids any instance