Merge branch maint

* maint:
  maint: post-release administrivia
  version 3.3.2
  style: minor fixes
  NEWS: named constructors are preferable to symbol_type ctors
  gram: fix handling of nterms in actions when some are unused
  style: rename local variable
  CI: update the ICC serial number for travis-ci.org
2026-07-25 21:40:32 +00:00 · 2019-02-03 15:23:54 +01:00
parent 56c00ed1ea 3d25b52a10
commit cf96d1b0af
10 changed files with 198 additions and 52 deletions
@@ -1 +1 @@
-3.3.1
+3.3.2
@@ -9,6 +9,13 @@ GNU Bison NEWS
  When given -fsyntax-only, the diagnostics are reported, but no output is
  generated.

+* Noteworthy changes in release 3.3.2 (2019-02-03) [stable]
+
+** Bug fixes
+
+  Bison 3.3 failed to generate parsers for grammars with unused nonterminal
+  symbols.
+
 * Noteworthy changes in release 3.3.1 (2019-01-27) [stable]

 ** Changes
@@ -225,17 +232,18 @@ GNU Bison NEWS
    symbol_type (int token, const int&);
    symbol_type (int token);

-  which should be used in a Flex-scanner as follows.
-
-    %%
-    [a-z]+   return yy::parser::symbol_type (ID, yytext);
-    [0-9]+   return yy::parser::symbol_type (INT, text_to_int (yytext);
-    ":"      return yy::parser::symbol_type (’:’);
-    <<EOF>>  return yy::parser::symbol_type (0);
-
  Correct matching between token types and value types is checked via
-  'assert'.  For instance, 'symbol_type (ID, 42)' would abort (while
-  'make_ID (42)' would not even compile).
+  'assert'; for instance, 'symbol_type (ID, 42)' would abort.  Named
+  constructors are preferable, as they offer better type safety (for
+  instance 'make_ID (42)' would not even compile), but symbol_type
+  constructors may help when token types are discovered at run-time, e.g.,
+
+     [a-z]+   {
+                if (auto i = lookup_keyword (yytext))
+                  return yy::parser::symbol_type (i);
+                else
+                  return yy::parser::make_ID (yytext);
+              }

 *** C++: Variadic emplace

@@ -3488,7 +3496,7 @@ along with this program.  If not, see <http://www.gnu.org/licenses/>.
 LocalWords:  Heimbigner AST src ast Makefile srcdir MinGW xxlex XXSTYPE
 LocalWords:  XXLTYPE strictfp IDEs ffixit fdiagnostics parseable fixits
 LocalWords:  Wdeprecated yytext Variadic variadic yyrhs yyphrs RCS README
- LocalWords:  noexcept constexpr ispell american
+ LocalWords:  noexcept constexpr ispell american deprecations

 Local Variables:
 ispell-dictionary: "american"
@@ -18,6 +18,7 @@ Antonio Silva Correia     [email protected]
 Arnold Robbins            [email protected]
 Art Haas                  [email protected]
 Askar Safin               [email protected]
+Balázs Scheidler          [email protected]
 Baron Schwartz            [email protected]
 Ben Pfaff                 [email protected]
 Benoit Perrot             [email protected]
@@ -75,48 +75,75 @@ skeletons.

 ## Symbols

+### `b4_symbol(NUM, FIELD)`
 In order to unify the handling of the various aspects of symbols (tag, type
 name, whether terminal, etc.), bison.exe defines one macro per (token,
 field), where field can be `has_id`, `id`, etc.: see
-src/output.c:prepare_symbols_definitions().
+`prepare_symbols_definitions()` in `src/output.c`.

-The various FIELDS are:
+The macro `b4_symbol(NUM, FIELD)` gives access to the following FIELDS:
+
+- `has_id`: 0 or 1.

- has_id: 0 or 1.
  Whether the symbol has an id.
- id: string
-  If has_id, the id.  Guaranteed to be usable as a C identifier.
-  Prefixed by api.token.prefix if defined.
- tag: string.
+
+- `id`: string
+  If has_id, the id (prefixed by api.token.prefix if defined), otherwise
+  defined as empty.  Guaranteed to be usable as a C identifier.
+
+- `tag`: string.
  A representation of the symbol.  Can be 'foo', 'foo.id', '"foo"' etc.
- user_number: integer
+
+- `user_number`: integer
  The external number as used by yylex.  Can be ASCII code when a character,
  some number chosen by bison, or some user number in the case of
  %token FOO <NUM>.  Corresponds to yychar in yacc.c.
- is_token: 0 or 1
+
+- `is_token`: 0 or 1
  Whether this is a terminal symbol.
- number: integer
+
+- `number`: integer
  The internal number (computed from the external number by yytranslate).
  Corresponds to yytoken in yacc.c.  This is the same number that serves as
  key in b4_symbol(NUM, FIELD).
- has_type: 0, 1
+
+  In bison, symbols are first assigned increasing numbers in order of
+  appearance (but tokens first, then nterms).  After grammar reduction,
+  unused nterms are then renumbered to appear last (i.e., first tokens, then
+  used nterms and finally unused nterms).  This final number NUM is the one
+  contained in this field, and it is the one used as key in `b4_symbol(NUM,
+  FIELD)`.
+
+  The code of the rule actions, however, is emitted before we know what
+  symbols are unused, so they use the original numbers.  To avoid confusion,
+  they actually use "orig NUM" instead of just "NUM".  bison also emits
+  definitions for `b4_symbol(orig NUM, number)` that map from original
+  numbers to the new ones.  `b4_symbol` also resolves `orig NUM` for the
+  other fields, e.g., `b4_symbol(orig 42, tag)` returns the tag of the
+  symbol whose original number was 42.
+
+- `has_type`: 0, 1
  Whether has a semantic value.
- type_tag: string
+
+- `type_tag`: string
  When api.value.type=union, the generated name for the union member.
  yytype_INT etc. for symbols that has_id, otherwise yytype_1 etc.
- type
+
+- `type`
  If it has a semantic value, its type tag, or, if variants are used,
  its type.
  In the case of api.value.type=union, type is the real type (e.g. int).
- has_printer: 0, 1
- printer: string
- printer_file: string
- printer_line: integer
+
+- `has_printer`: 0, 1
+- `printer`: string
+- `printer_file`: string
+- `printer_line`: integer
  If the symbol has a printer, everything about it.
- has_destructor, destructor, destructor_file, destructor_line
+
+- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`
  Likewise.

-### b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])
+### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
 Expansion of $$, $1, $<TYPE-TAG>3, etc.

 The semantic value from a given VAL.
@@ -127,14 +154,14 @@ The semantic value from a given VAL.
 The result can be used safely, it is put in parens to avoid nasty precedence
 issues.

-### b4_lhs_value(SYMBOL-NUM, [TYPE])
+### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
 Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.

-### b4_rhs_data(RULE-LENGTH, POS)
+### `b4_rhs_data(RULE-LENGTH, POS)`
 The data corresponding to the symbol `#POS`, where the current rule has
 `RULE-LENGTH` symbols on RHS.

-### b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])
+### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
 Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
 on RHS.

@@ -389,17 +389,28 @@ m4_define([b4_glr_cc_if],
 #
 # The following macros provide access to symbol related values.

-# _b4_symbol(NUM, FIELD)
-# ----------------------
+# __b4_symbol(NUM, FIELD)
+# -----------------------
 # Recover a FIELD about symbol #NUM.  Thanks to m4_indir, fails if
 # undefined.
-m4_define([_b4_symbol],
+m4_define([__b4_symbol],
 [m4_indir([b4_symbol($1, $2)])])


+# _b4_symbol(NUM, FIELD)
+# ----------------------
+# Recover a FIELD about symbol #NUM (or "orig NUM").  Fails if
+# undefined.
+m4_define([_b4_symbol],
+[m4_ifdef([b4_symbol($1, number)],
+          [__b4_symbol(m4_indir([b4_symbol($1, number)]), $2)],
+          [__b4_symbol([$1], [$2])])])
+
+
+
 # b4_symbol(NUM, FIELD)
 # ---------------------
-# Recover a FIELD about symbol #NUM.  Thanks to m4_indir, fails if
+# Recover a FIELD about symbol #NUM (or "orig NUM").  Fails if
 # undefined.  If FIELD = id, prepend the token prefix.
 m4_define([b4_symbol],
 [m4_case([$2],
@@ -38,6 +38,7 @@
 #include "muscle-tab.h"
 #include "output.h"
 #include "reader.h"
+#include "reduce.h"
 #include "scan-code.h"    /* max_left_semantic_context */
 #include "scan-skel.h"
 #include "symtab.h"
@@ -413,6 +414,14 @@ merger_output (FILE *out)
 static void
 prepare_symbol_definitions (void)
 {
+  /* Map "orig NUM" to new numbers.  See data/README.  */
+  for (symbol_number i = ntokens; i < nsyms + nuseless_nonterminals; ++i)
+    {
+      obstack_printf (&format_obstack, "symbol(orig %d, number)", i);
+      const char *key = obstack_finish0 (&format_obstack);
+      MUSCLE_INSERT_INT (key, nterm_map ? nterm_map[i - ntokens] : i);
+    }
+
  for (int i = 0; i < nsyms; ++i)
    {
      symbol *sym = symbols[i];
@@ -258,22 +258,23 @@ reduce_grammar_tables (void)
 | Remove useless nonterminals.  |
 `------------------------------*/

+symbol_number *nterm_map = NULL;
+
 static void
 nonterminals_reduce (void)
 {
+  nterm_map = xnmalloc (nvars, sizeof *nterm_map);
  /* Map the nonterminals to their new index: useful first, useless
     afterwards.  Kept for later report.  */
-
-  symbol_number *nontermmap = xnmalloc (nvars, sizeof *nontermmap);
  {
    symbol_number n = ntokens;
    for (symbol_number i = ntokens; i < nsyms; ++i)
      if (bitset_test (V, i))
-        nontermmap[i - ntokens] = n++;
+        nterm_map[i - ntokens] = n++;
    for (symbol_number i = ntokens; i < nsyms; ++i)
      if (!bitset_test (V, i))
        {
-          nontermmap[i - ntokens] = n++;
+          nterm_map[i - ntokens] = n++;
          if (symbols[i]->content->status != used)
            complain (&symbols[i]->location, Wother,
                      _("nonterminal useless in grammar: %s"),
@@ -281,32 +282,30 @@ nonterminals_reduce (void)
        }
  }

-
  /* Shuffle elements of tables indexed by symbol number.  */
  {
    symbol **symbols_sorted = xnmalloc (nvars, sizeof *symbols_sorted);
    for (symbol_number i = ntokens; i < nsyms; ++i)
-      symbols[i]->content->number = nontermmap[i - ntokens];
+      symbols[i]->content->number = nterm_map[i - ntokens];
    for (symbol_number i = ntokens; i < nsyms; ++i)
-      symbols_sorted[nontermmap[i - ntokens] - ntokens] = symbols[i];
+      symbols_sorted[nterm_map[i - ntokens] - ntokens] = symbols[i];
    for (symbol_number i = ntokens; i < nsyms; ++i)
      symbols[i] = symbols_sorted[i - ntokens];
    free (symbols_sorted);
  }

+  /* Update nonterminal numbers in the RHS of the rules.  LHS are
+     pointers to the symbol structure, they don't need renumbering. */
  {
    for (rule_number r = 0; r < nrules; ++r)
      for (item_number *rhsp = rules[r].rhs; 0 <= *rhsp; ++rhsp)
        if (ISVAR (*rhsp))
-          *rhsp =  symbol_number_as_item_number (nontermmap[*rhsp
-                                                            - ntokens]);
-    accept->content->number = nontermmap[accept->content->number - ntokens];
+          *rhsp = symbol_number_as_item_number (nterm_map[*rhsp - ntokens]);
+    accept->content->number = nterm_map[accept->content->number - ntokens];
  }

  nsyms -= nuseless_nonterminals;
  nvars -= nuseless_nonterminals;
-
-  free (nontermmap);
 }


@@ -432,4 +431,6 @@ reduce_free (void)
  bitset_free (V);
  bitset_free (V1);
  bitset_free (P);
+  free (nterm_map);
+  nterm_map = NULL;
 }
@@ -32,6 +32,11 @@ bool reduce_nonterminal_useless_in_grammar (const sym_content *sym);

 void reduce_free (void);

+/** Map initial nterm numbers to the new ones.  Built by
+ * reduce_grammar.  Size nvars + nuseless_nonterminals.  */
+extern symbol_number *nterm_map;
+
 extern unsigned nuseless_nonterminals;
 extern unsigned nuseless_productions;
+
 #endif /* !REDUCE_H_ */
@@ -648,7 +648,7 @@ handle_action_dollar (symbol_list *rule, char *text, location dollar_loc)
              untyped_var_seen = true;
          }

-        obstack_printf (&obstack_for_string, "]b4_lhs_value(%d, ",
+        obstack_printf (&obstack_for_string, "]b4_lhs_value(orig %d, ",
                        sym->content.sym->content->number);
        obstack_quote (&obstack_for_string, type_name);
        obstack_sgrow (&obstack_for_string, ")[");
@@ -677,7 +677,9 @@ handle_action_dollar (symbol_list *rule, char *text, location dollar_loc)
                        "]b4_rhs_value(%d, %d, ",
                        effective_rule_length, n);
        if (sym)
-          obstack_printf (&obstack_for_string, "%d, ", sym->content.sym->content->number);
+          obstack_printf (&obstack_for_string, "%s%d, ",
+                          sym->content.sym->content->class == nterm_sym ? "orig " : "",
+                          sym->content.sym->content->number);
        else
          obstack_sgrow (&obstack_for_string, "[], ");

@@ -217,6 +217,88 @@ AT_CLEANUP



+## --------------- ##
+## Useless Parts.  ##
+## --------------- ##
+
+AT_SETUP([Useless Parts])
+
+# We used to emit code that used symbol numbers before the useless
+# symbol elimination, hence before the renumbering of the useful
+# symbols.  As a result, the evaluation of the skeleton failed because
+# it used non existing symbol numbers.  Which is the happy scenario:
+# we could use numbers of other existing symbols...
+# http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html
+
+AT_BISON_OPTION_PUSHDEFS
+AT_DATA([[input.y]],
+[[%code {
+  ]AT_YYERROR_DECLARE_EXTERN[
+  ]AT_YYLEX_DECLARE_EXTERN[
+}
+%union { void* ptr; }
+%type <ptr> used1
+%type <ptr> used2
+
+%%
+start
+ : used1
+ ;
+
+used1
+ : used2 { $$ = $1; }
+ ;
+
+unused
+ : used2
+ ;
+
+used2
+ : { $$ = YY_NULLPTR; }
+ ;
+]])
+
+AT_BISON_CHECK([[-fcaret -rall -o input.c input.y]], 0, [],
+[[input.y: warning: 1 nonterminal useless in grammar [-Wother]
+input.y: warning: 1 rule useless in grammar [-Wother]
+input.y:18.1-6: warning: nonterminal useless in grammar: unused [-Wother]
+ unused
+ ^~~~~~
+]])
+
+
+AT_CHECK([[sed -n '/^State 0/q;/^$/!p' input.output]], 0,
+[[Nonterminals useless in grammar
+   unused
+Rules useless in grammar
+    4 unused: used2
+Grammar
+    0 $accept: start $end
+    1 start: used1
+    2 used1: used2
+    3 used2: %empty
+Terminals, with rules where they appear
+$end (0) 0
+error (256)
+Nonterminals, with rules where they appear
+$accept (3)
+    on left: 0
+start (4)
+    on left: 1, on right: 0
+used1 <ptr> (5)
+    on left: 2, on right: 1
+used2 <ptr> (6)
+    on left: 3, on right: 2
+]])
+
+# Make sure the generated parser is correct.
+AT_COMPILE([input.o])
+
+AT_BISON_OPTION_POPDEFS
+AT_CLEANUP
+
+
+
 ## ------------------- ##
 ## Reduced Automaton.  ##
 ## ------------------- ##
@@ -1 +1 @@
-3.3.1
+3.3.2