diff --git a/.prev-version b/.prev-version
index bea438e9..47725433 100644
--- a/.prev-version
+++ b/.prev-version
@@ -1 +1 @@
-3.3.1
+3.3.2
diff --git a/NEWS b/NEWS
index 9889af47..39db9104 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ GNU Bison NEWS
   When given -fsyntax-only, the diagnostics are reported, but no output is
   generated.
 
+* Noteworthy changes in release 3.3.2 (2019-02-03) [stable]
+
+** Bug fixes
+
+  Bison 3.3 failed to generate parsers for grammars with unused nonterminal
+  symbols.
+
 * Noteworthy changes in release 3.3.1 (2019-01-27) [stable]
 
 ** Changes
@@ -225,17 +232,18 @@ GNU Bison NEWS
       symbol_type (int token, const int&);
       symbol_type (int token);
 
-  which should be used in a Flex-scanner as follows.
-
-    %%
-    [a-z]+   return yy::parser::symbol_type (ID, yytext);
-    [0-9]+   return yy::parser::symbol_type (INT, text_to_int (yytext);
-    ":"      return yy::parser::symbol_type (’:’);
-    <<EOF>>  return yy::parser::symbol_type (0);
-
   Correct matching between token types and value types is checked via
-  'assert'. For instance, 'symbol_type (ID, 42)' would abort (while
-  'make_ID (42)' would not even compile).
+  'assert'; for instance, 'symbol_type (ID, 42)' would abort. Named
+  constructors are preferable, as they offer better type safety (for
+  instance 'make_ID (42)' would not even compile), but symbol_type
+  constructors may help when token types are discovered at run-time, e.g.,
+
+     [a-z]+   {
+                if (auto i = lookup_keyword (yytext))
+                  return yy::parser::symbol_type (i);
+                else
+                  return yy::parser::make_ID (yytext);
+              }
 
 *** C++: Variadic emplace
 
@@ -3488,7 +3496,7 @@ along with this program. If not, see .
  LocalWords: Heimbigner AST src ast Makefile srcdir MinGW xxlex XXSTYPE
  LocalWords: XXLTYPE strictfp IDEs ffixit fdiagnostics parseable fixits
  LocalWords: Wdeprecated yytext Variadic variadic yyrhs yyphrs RCS README
- LocalWords: noexcept constexpr ispell american
+ LocalWords: noexcept constexpr ispell american deprecations
 
 Local Variables:
 ispell-dictionary: "american"
diff --git a/THANKS b/THANKS
index f704388a..597ca3f7 100644
--- a/THANKS
+++ b/THANKS
@@ -18,6 +18,7 @@ Antonio Silva Correia amsilvacorreia@hotmail.com
 Arnold Robbins            arnold@skeeve.com
 Art Haas                  ahaas@neosoft.com
 Askar Safin               safinaskar@mail.ru
+Balázs Scheidler          balazs.scheidler@oneidentity.com
 Baron Schwartz            baron@sequent.org
 Ben Pfaff                 blp@cs.stanford.edu
 Benoit Perrot             benoit.perrot@epita.fr
diff --git a/data/README.md b/data/README.md
index 1a2e71cc..ed217365 100644
--- a/data/README.md
+++ b/data/README.md
@@ -75,48 +75,75 @@ skeletons.
 
 ## Symbols
 
+### `b4_symbol(NUM, FIELD)`
 In order to unify the handling of the various aspects of symbols (tag, type
 name, whether terminal, etc.), bison.exe defines one macro per (token,
 field), where field can `has_id`, `id`, etc.: see
-src/output.c:prepare_symbols_definitions().
+`prepare_symbols_definitions()` in `src/output.c`.
 
-The various FIELDS are:
+The macro `b4_symbol(NUM, FIELD)` gives access to the following FIELDS:
+
+- `has_id`: 0 or 1.
 
-- has_id: 0 or 1.
   Whether the symbol has an id.
-- id: string
-  If has_id, the id. Guaranteed to be usable as a C identifier.
-  Prefixed by api.token.prefix if defined.
-- tag: string.
+
+- `id`: string
+  If has_id, the id (prefixed by api.token.prefix if defined), otherwise
+  defined as empty. Guaranteed to be usable as a C identifier.
+
+- `tag`: string.
   A representation of the symbol. Can be 'foo', 'foo.id', '"foo"' etc.
-- user_number: integer
+
+- `user_number`: integer
   The external number as used by yylex. Can be ASCII code when a character,
   some number chosen by bison, or some user number in the case of
   %token FOO . Corresponds to yychar in yacc.c.
-- is_token: 0 or 1
+
+- `is_token`: 0 or 1
   Whether this is a terminal symbol.
-- number: integer
+
+- `number`: integer
   The internal number (computed from the external number by yytranslate).
   Corresponds to yytoken in yacc.c. This is the same number that serves as
   key in b4_symbol(NUM, FIELD).
-- has_type: 0, 1
+
+  In bison, symbols are first assigned increasing numbers in order of
+  appearance (but tokens first, then nterms). After grammar reduction,
+  unused nterms are then renumbered to appear last (i.e., first tokens, then
+  used nterms and finally unused nterms). This final number NUM is the one
+  contained in this field, and it is the one used as key in `b4_symbol(NUM,
+  FIELD)`.
+
+  The code of the rule actions, however, is emitted before we know what
+  symbols are unused, so they use the original numbers. To avoid confusion,
+  they actually use "orig NUM" instead of just "NUM". bison also emits
+  definitions for `b4_symbol(orig NUM, number)` that map from original
+  numbers to the new ones. `b4_symbol` actually resolves `orig NUM` in the
+  other case, i.e., `b4_symbol(orig 42, tag)` would return the tag of the
+  symbols whose original number was 42.
+
+- `has_type`: 0, 1
   Whether has a semantic value.
-- type_tag: string
+
+- `type_tag`: string
   When api.value.type=union, the generated name for the union member.
   yytype_INT etc. for symbols that has_id, otherwise yytype_1 etc.
-- type
+
+- `type`
   If it has a semantic value, its type tag, or, if variant are used,
   its type.
   In the case of api.value.type=union, type is the real type (e.g. int).
-- has_printer: 0, 1
-- printer: string
-- printer_file: string
-- printer_line: integer
+
+- `has_printer`: 0, 1
+- `printer`: string
+- `printer_file`: string
+- `printer_line`: integer
   If the symbol has a printer, everything about it.
-- has_destructor, destructor, destructor_file, destructor_line
+
+- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`
   Likewise.
 
-### b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])
+### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
 Expansion of $$, $1, $3, etc.
 
 The semantic value from a given VAL.
@@ -127,14 +154,14 @@ The semantic value from a given VAL.
 The result can be used safely, it is put in parens to avoid nasty precedence
 issues.
 
-### b4_lhs_value(SYMBOL-NUM, [TYPE])
+### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
 Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.
 
-### b4_rhs_data(RULE-LENGTH, POS)
+### `b4_rhs_data(RULE-LENGTH, POS)`
 The data corresponding to the symbol `#POS`, where the current rule has
 `RULE-LENGTH` symbols on RHS.
 
-### b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])
+### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
 Expansion of `$POS`, where the current rule has `RULE-LENGTH` symbols on
 RHS.
diff --git a/data/skeletons/bison.m4 b/data/skeletons/bison.m4
index 8a33a582..e3591875 100644
--- a/data/skeletons/bison.m4
+++ b/data/skeletons/bison.m4
@@ -389,17 +389,28 @@ m4_define([b4_glr_cc_if],
 #
 # The following macros provide access to symbol related values.
 
-# _b4_symbol(NUM, FIELD)
-# ----------------------
+# __b4_symbol(NUM, FIELD)
+# -----------------------
 # Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
 # undefined.
-m4_define([_b4_symbol],
+m4_define([__b4_symbol],
 [m4_indir([b4_symbol($1, $2)])])
 
 
+# _b4_symbol(NUM, FIELD)
+# ----------------------
+# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
+# undefined.
+m4_define([_b4_symbol],
+[m4_ifdef([b4_symbol($1, number)],
+          [__b4_symbol(m4_indir([b4_symbol($1, number)]), $2)],
+          [__b4_symbol([$1], [$2])])])
+
+
+
 # b4_symbol(NUM, FIELD)
 # ---------------------
-# Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
+# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
 # undefined. If FIELD = id, prepend the token prefix.
 m4_define([b4_symbol],
 [m4_case([$2],
diff --git a/src/output.c b/src/output.c
index 7132b4a9..b674f4ca 100644
--- a/src/output.c
+++ b/src/output.c
@@ -38,6 +38,7 @@
 #include "muscle-tab.h"
 #include "output.h"
 #include "reader.h"
+#include "reduce.h"
 #include "scan-code.h" /* max_left_semantic_context */
 #include "scan-skel.h"
 #include "symtab.h"
@@ -413,6 +414,14 @@ merger_output (FILE *out)
 static void
 prepare_symbol_definitions (void)
 {
+  /* Map "orig NUM" to new numbers. See data/README. */
+  for (symbol_number i = ntokens; i < nsyms + nuseless_nonterminals; ++i)
+    {
+      obstack_printf (&format_obstack, "symbol(orig %d, number)", i);
+      const char *key = obstack_finish0 (&format_obstack);
+      MUSCLE_INSERT_INT (key, nterm_map ? nterm_map[i - ntokens] : i);
+    }
+
   for (int i = 0; i < nsyms; ++i)
     {
       symbol *sym = symbols[i];
diff --git a/src/reduce.c b/src/reduce.c
index 574e9b07..4408e6a2 100644
--- a/src/reduce.c
+++ b/src/reduce.c
@@ -258,22 +258,23 @@ reduce_grammar_tables (void)
 | Remove useless nonterminals. |
 `------------------------------*/
 
+symbol_number *nterm_map = NULL;
+
 static void
 nonterminals_reduce (void)
 {
+  nterm_map = xnmalloc (nvars, sizeof *nterm_map);
   /* Map the nonterminals to their new index: useful first, useless
      afterwards. Kept for later report. */
-
-  symbol_number *nontermmap = xnmalloc (nvars, sizeof *nontermmap);
   {
     symbol_number n = ntokens;
     for (symbol_number i = ntokens; i < nsyms; ++i)
       if (bitset_test (V, i))
-        nontermmap[i - ntokens] = n++;
+        nterm_map[i - ntokens] = n++;
     for (symbol_number i = ntokens; i < nsyms; ++i)
       if (!bitset_test (V, i))
         {
-          nontermmap[i - ntokens] = n++;
+          nterm_map[i - ntokens] = n++;
           if (symbols[i]->content->status != used)
             complain (&symbols[i]->location, Wother,
                       _("nonterminal useless in grammar: %s"),
@@ -281,32 +282,30 @@ nonterminals_reduce (void)
         }
   }
 
-
   /* Shuffle elements of tables indexed by symbol number. */
   {
     symbol **symbols_sorted = xnmalloc (nvars, sizeof *symbols_sorted);
 
     for (symbol_number i = ntokens; i < nsyms; ++i)
-      symbols[i]->content->number = nontermmap[i - ntokens];
+      symbols[i]->content->number = nterm_map[i - ntokens];
     for (symbol_number i = ntokens; i < nsyms; ++i)
-      symbols_sorted[nontermmap[i - ntokens] - ntokens] = symbols[i];
+      symbols_sorted[nterm_map[i - ntokens] - ntokens] = symbols[i];
     for (symbol_number i = ntokens; i < nsyms; ++i)
       symbols[i] = symbols_sorted[i - ntokens];
     free (symbols_sorted);
   }
 
+  /* Update nonterminal numbers in the RHS of the rules. LHS are
+     pointers to the symbol structure, they don't need renumbering. */
   {
     for (rule_number r = 0; r < nrules; ++r)
       for (item_number *rhsp = rules[r].rhs; 0 <= *rhsp; ++rhsp)
         if (ISVAR (*rhsp))
-          *rhsp = symbol_number_as_item_number (nontermmap[*rhsp
-                                                           - ntokens]);
-    accept->content->number = nontermmap[accept->content->number - ntokens];
+          *rhsp = symbol_number_as_item_number (nterm_map[*rhsp - ntokens]);
+    accept->content->number = nterm_map[accept->content->number - ntokens];
   }
 
   nsyms -= nuseless_nonterminals;
   nvars -= nuseless_nonterminals;
-
-  free (nontermmap);
 }
 
@@ -432,4 +431,6 @@ reduce_free (void)
   bitset_free (V);
   bitset_free (V1);
   bitset_free (P);
+  free (nterm_map);
+  nterm_map = NULL;
 }
diff --git a/src/reduce.h b/src/reduce.h
index c3866fc3..9814962d 100644
--- a/src/reduce.h
+++ b/src/reduce.h
@@ -32,6 +32,11 @@ bool reduce_nonterminal_useless_in_grammar (const sym_content *sym);
 
 void reduce_free (void);
 
+/** Map initial nterm numbers to the new ones. Built by
+ * reduce_grammar. Size nvars + nuseless_nonterminals. */
+extern symbol_number *nterm_map;
+
 extern unsigned nuseless_nonterminals;
 extern unsigned nuseless_productions;
+
 #endif /* !REDUCE_H_ */
diff --git a/src/scan-code.l b/src/scan-code.l
index 52349ffb..894cc542 100644
--- a/src/scan-code.l
+++ b/src/scan-code.l
@@ -648,7 +648,7 @@ handle_action_dollar (symbol_list *rule, char *text, location dollar_loc)
           untyped_var_seen = true;
         }
 
-      obstack_printf (&obstack_for_string, "]b4_lhs_value(%d, ",
+      obstack_printf (&obstack_for_string, "]b4_lhs_value(orig %d, ",
                       sym->content.sym->content->number);
       obstack_quote (&obstack_for_string, type_name);
       obstack_sgrow (&obstack_for_string, ")[");
@@ -677,7 +677,9 @@ handle_action_dollar (symbol_list *rule, char *text, location dollar_loc)
                       "]b4_rhs_value(%d, %d, ",
                       effective_rule_length, n);
       if (sym)
-        obstack_printf (&obstack_for_string, "%d, ", sym->content.sym->content->number);
+        obstack_printf (&obstack_for_string, "%s%d, ",
+                        sym->content.sym->content->class == nterm_sym ? "orig " : "",
+                        sym->content.sym->content->number);
       else
         obstack_sgrow (&obstack_for_string, "[], ");
 
diff --git a/tests/reduce.at b/tests/reduce.at
index b96531ce..b9385bb3 100644
--- a/tests/reduce.at
+++ b/tests/reduce.at
@@ -217,6 +217,88 @@ AT_CLEANUP
 
 
 
+## --------------- ##
+## Useless Parts. ##
+## --------------- ##
+
+AT_SETUP([Useless Parts])
+
+# We used to emit code that used symbol numbers before the useless
+# symbol elimination, hence before the renumbering of the useful
+# symbols. As a result, the evaluation of the skeleton failed because
+# it used non existing symbol numbers. Which is the happy scenario:
+# we could use numbers of other existing symbols...
+# http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html
+
+AT_BISON_OPTION_PUSHDEFS
+AT_DATA([[input.y]],
+[[%code {
+  ]AT_YYERROR_DECLARE_EXTERN[
+  ]AT_YYLEX_DECLARE_EXTERN[
+}
+%union { void* ptr; }
+%type <ptr> used1
+%type <ptr> used2
+
+%%
+start
+  : used1
+  ;
+
+used1
+  : used2 { $$ = $1; }
+  ;
+
+unused
+  : used2
+  ;
+
+used2
+  : { $$ = YY_NULLPTR; }
+  ;
+]])
+
+AT_BISON_CHECK([[-fcaret -rall -o input.c input.y]], 0, [],
+[[input.y: warning: 1 nonterminal useless in grammar [-Wother]
+input.y: warning: 1 rule useless in grammar [-Wother]
+input.y:18.1-6: warning: nonterminal useless in grammar: unused [-Wother]
+ unused
+ ^~~~~~
+]])
+
+
+AT_CHECK([[sed -n '/^State 0/q;/^$/!p' input.output]], 0,
+[[Nonterminals useless in grammar
+   unused
+Rules useless in grammar
+    4 unused: used2
+Grammar
+    0 $accept: start $end
+    1 start: used1
+    2 used1: used2
+    3 used2: %empty
+Terminals, with rules where they appear
+$end (0) 0
+error (256)
+Nonterminals, with rules where they appear
+$accept (3)
+    on left: 0
+start (4)
+    on left: 1, on right: 0
+used1 (5)
+    on left: 2, on right: 1
+used2 (6)
+    on left: 3, on right: 2
+]])
+
+# Make sure the generated parser is correct.
+AT_COMPILE([input.o])
+
+AT_BISON_OPTION_POPDEFS
+AT_CLEANUP
+
+
+
 ## ------------------- ##
 ## Reduced Automaton. ##
 ## ------------------- ##
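
Reviewer note (not part of the patch): the renumbering that `nterm_map` records can be illustrated in isolation. The following is a minimal sketch, assuming a plain `bool` array in place of the bitset `V` and a hypothetical helper name `build_nterm_map`; the `symbol_number` typedef is a stand-in and is not Bison's own definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Bison's symbol_number type. */
typedef int symbol_number;

/* Fill MAP so that MAP[i - ntokens] is the new number of the
   nonterminal originally numbered i.  Useful nonterminals keep their
   relative order and come first; useless ones are pushed to the end.
   USEFUL[i - ntokens] plays the role of bitset_test (V, i) in the
   patch.  This helper is illustrative only, it is not Bison code. */
static void
build_nterm_map (symbol_number ntokens, symbol_number nsyms,
                 const bool *useful, symbol_number *map)
{
  assert (ntokens <= nsyms);
  symbol_number n = ntokens;
  /* First pass: number the useful nonterminals. */
  for (symbol_number i = ntokens; i < nsyms; ++i)
    if (useful[i - ntokens])
      map[i - ntokens] = n++;
  /* Second pass: the useless nonterminals take the remaining numbers. */
  for (symbol_number i = ntokens; i < nsyms; ++i)
    if (!useful[i - ntokens])
      map[i - ntokens] = n++;
}
```

With three tokens and five nonterminals whose usefulness flags are `{true, true, true, false, true}` (an assumed layout, chosen to resemble the test grammar), the map comes out as `{3, 4, 5, 7, 6}`: the fourth nonterminal moves to the last slot, and the one after it moves up by one. That shifted entry is the case in which an action emitted with an original number would look up the wrong symbol without the `orig NUM` indirection.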