mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-09 12:23:04 +00:00
gram: fix handling of nterms in actions when some are unused
Since Bison 3.3, semantic values in rule actions (i.e., '$...') are
passed to the m4 backend as the symbol number. Unfortunately, when
there are unused symbols, the symbols are renumbered _after_ the
numbers were used in the rule actions. As a result, the evaluation of
the skeleton failed because it used nonexistent symbol numbers. And
that is the happy scenario: we could have used the numbers of other
existing symbols...

Reported by Balázs Scheidler.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html

Translating the rule actions after the symbol renumbering moves too
many parts in bison. Relying on the symbol identifiers is more
troublesome than it might first seem: some don't have an identifier
(tokens with only a literal string), some might have a complex one
(tokens with a literal string with characters special for M4). Well,
these are tokens, but nterms also have issues: "dummy" nterms (for
midrule actions) are named $@32 etc. which is risky for M4.

Instead, let's simply give M4 the mapping between the old numbers and
the new ones. To avoid confusion between old and new numbers, always
emit pre-renumbering numbers as "orig NUM".

* data/README: Give details about "orig NUM".
* data/skeletons/bison.m4 (__b4_symbol, _b4_symbol): Resolve the
"orig NUM".
* src/output.c (prepare_symbol_definitions): Pass nterm_map to m4.
* src/reduce.h, src/reduce.c (nterm_map): Extract it from
nonterminals_reduce, to make it public.
(reduce_free): Free it.
* src/scan-code.l (handle_action_dollar): When referring to a nterm,
use "orig NUM".
* tests/reduce.at (Useless Parts): New, based on Balázs Scheidler's
report.
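The renumbering that causes the problem can be sketched in C. This is a simplified illustration, not the code of `nonterminals_reduce`: `build_nterm_map` and its parameters are hypothetical names, and the `useful` flags are assumed to be supplied by the caller rather than computed from a grammar.

```c
#include <stdbool.h>

/* Hypothetical sketch: fill MAP so that MAP[i] is the new number of
   the nterm whose original number is NTOKENS + i.  Useful nterms keep
   their relative order and come first; useless ones are moved last.  */
static void
build_nterm_map (const bool *useful, int nvars, int ntokens, int *map)
{
  int n = ntokens;
  /* First pass: number the useful nterms.  */
  for (int i = 0; i < nvars; ++i)
    if (useful[i])
      map[i] = n++;
  /* Second pass: number the useless nterms after them.  */
  for (int i = 0; i < nvars; ++i)
    if (!useful[i])
      map[i] = n++;
}
```

With three tokens and five nterms of which the fourth is useless, the map is {3, 4, 5, 7, 6}: the useless nterm moves from 6 to 7, and the nterm that was 7 becomes 6. An action that recorded the old number 7 therefore needs this map to reach the right symbol.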
4
NEWS
@@ -2,6 +2,10 @@ GNU Bison NEWS
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  Bison 3.3 failed to generate parsers for grammars with unused non terminal
+  symbols.
 
 * Noteworthy changes in release 3.3.1 (2019-01-27) [stable]
 
1
THANKS
@@ -18,6 +18,7 @@ Antonio Silva Correia amsilvacorreia@hotmail.com
 Arnold Robbins arnold@skeeve.com
 Art Haas ahaas@neosoft.com
 Askar Safin safinaskar@mail.ru
+Balázs Scheidler balazs.scheidler@oneidentity.com
 Baron Schwartz baron@sequent.org
 Ben Pfaff blp@cs.stanford.edu
 Benoit Perrot benoit.perrot@epita.fr
71
data/README
@@ -75,48 +75,75 @@ skeletons.
 
 ## Symbols
 
+### `b4_symbol(NUM, FIELD)`
 In order to unify the handling of the various aspects of symbols (tag, type
 name, whether terminal, etc.), bison.exe defines one macro per (token,
 field), where field can `has_id`, `id`, etc.: see
-src/output.c:prepare_symbols_definitions().
+`prepare_symbols_definitions()` in `src/output.c`.
 
-The various FIELDS are:
+The macro `b4_symbol(NUM, FIELD)` gives access to the following FIELDS:
 
-- has_id: 0 or 1.
+- `has_id`: 0 or 1.
+
   Whether the symbol has an id.
-- id: string
-  If has_id, the id. Guaranteed to be usable as a C identifier.
-  Prefixed by api.token.prefix if defined.
-- tag: string.
+
+- `id`: string
+  If has_id, the id (prefixed by api.token.prefix if defined), otherwise
+  defined as empty. Guaranteed to be usable as a C identifier.
+
+- `tag`: string.
   A representation of the symbol. Can be 'foo', 'foo.id', '"foo"' etc.
-- user_number: integer
+
+- `user_number`: integer
   The external number as used by yylex. Can be ASCII code when a character,
   some number chosen by bison, or some user number in the case of
   %token FOO <NUM>. Corresponds to yychar in yacc.c.
-- is_token: 0 or 1
+
+- `is_token`: 0 or 1
   Whether this is a terminal symbol.
-- number: integer
+
+- `number`: integer
   The internal number (computed from the external number by yytranslate).
   Corresponds to yytoken in yacc.c. This is the same number that serves as
   key in b4_symbol(NUM, FIELD).
-- has_type: 0, 1
+
+  In bison, symbols are first assigned increasing numbers in order of
+  appearance (but tokens first, then nterms). After grammar reduction,
+  unused nterms are then renumbered to appear last (i.e., first tokens, then
+  used nterms and finally unused nterms). This final number NUM is the one
+  contained in this field, and it is the one used as key in `b4_symbol(NUM,
+  FIELD)`.
+
+  The code of the rule actions, however, is emitted before we know what
+  symbols are unused, so they use the original numbers. To avoid confusion,
+  they actually use "orig NUM" instead of just "NUM". bison also emits
+  definitions for `b4_symbol(orig NUM, number)` that map from original
+  numbers to the new ones. `b4_symbol` actually resolves `orig NUM` in the
+  other case, i.e., `b4_symbol(orig 42, tag)` would return the tag of the
+  symbols whose original number was 42.
+
+- `has_type`: 0, 1
   Whether has a semantic value.
-- type_tag: string
+
+- `type_tag`: string
   When api.value.type=union, the generated name for the union member.
   yytype_INT etc. for symbols that has_id, otherwise yytype_1 etc.
-- type
+
+- `type`
   If it has a semantic value, its type tag, or, if variant are used,
   its type.
   In the case of api.value.type=union, type is the real type (e.g. int).
-- has_printer: 0, 1
-- printer: string
-- printer_file: string
-- printer_line: integer
+
+- `has_printer`: 0, 1
+- `printer`: string
+- `printer_file`: string
+- `printer_line`: integer
   If the symbol has a printer, everything about it.
-- has_destructor, destructor, destructor_file, destructor_line
+
+- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`
   Likewise.
 
-### b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])
+### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
 Expansion of $$, $1, $<TYPE-TAG>3, etc.
 
 The semantic value from a given VAL.
@@ -127,14 +154,14 @@ The semantic value from a given VAL.
 The result can be used safely, it is put in parens to avoid nasty precedence
 issues.
 
-### b4_lhs_value(SYMBOL-NUM, [TYPE])
+### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
 Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.
 
-### b4_rhs_data(RULE-LENGTH, POS)
+### `b4_rhs_data(RULE-LENGTH, POS)`
 The data corresponding to the symbol `#POS`, where the current rule has
 `RULE-LENGTH` symbols on RHS.
 
-### b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])
+### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
 Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
 on RHS.
 
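The two-step lookup that the README text describes for "orig NUM" keys can be sketched in C. This is only an illustration of the idea, not how the m4 skeleton works internally: `resolve_symbol_key` is a hypothetical helper, and treating an "orig" key for a token (or a missing map) as the identity is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: resolve KEY, written either "NUM" or
   "orig NUM", to a final symbol number.  NTERM_MAP[i] is the new
   number of the nterm whose original number is NTOKENS + i.  */
static int
resolve_symbol_key (const char *key, const int *nterm_map, int ntokens)
{
  static const char prefix[] = "orig ";
  const size_t len = sizeof prefix - 1;

  /* A plain "NUM" is already a final number.  */
  if (strncmp (key, prefix, len) != 0)
    return atoi (key);

  int orig = atoi (key + len);
  /* Assumption: tokens are never renumbered, and without a map the
     numbering is unchanged.  */
  if (orig < ntokens || !nterm_map)
    return orig;
  return nterm_map[orig - ntokens];
}
```

Once the key is resolved to a final number, any other field (tag, type, and so on) can be fetched with that number, which mirrors what `b4_symbol(orig 42, tag)` is documented to do.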
data/skeletons/bison.m4
@@ -389,17 +389,28 @@ m4_define([b4_glr_cc_if],
 #
 # The following macros provide access to symbol related values.
 
-# _b4_symbol(NUM, FIELD)
-# ----------------------
+# __b4_symbol(NUM, FIELD)
+# -----------------------
 # Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
 # undefined.
-m4_define([_b4_symbol],
+m4_define([__b4_symbol],
 [m4_indir([b4_symbol($1, $2)])])
 
 
+# _b4_symbol(NUM, FIELD)
+# ----------------------
+# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
+# undefined.
+m4_define([_b4_symbol],
+[m4_ifdef([b4_symbol($1, number)],
+          [__b4_symbol(m4_indir([b4_symbol($1, number)]), $2)],
+          [__b4_symbol([$1], [$2])])])
+
+
+
 # b4_symbol(NUM, FIELD)
 # ---------------------
-# Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
+# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
 # undefined. If FIELD = id, prepend the token prefix.
 m4_define([b4_symbol],
 [m4_case([$2],
src/output.c
@@ -38,6 +38,7 @@
 #include "muscle-tab.h"
 #include "output.h"
 #include "reader.h"
+#include "reduce.h"
 #include "scan-code.h" /* max_left_semantic_context */
 #include "scan-skel.h"
 #include "symtab.h"
@@ -414,6 +415,14 @@ merger_output (FILE *out)
 static void
 prepare_symbol_definitions (void)
 {
+  /* Map "orig NUM" to new numbers. See data/README. */
+  for (symbol_number i = ntokens; i < nsyms + nuseless_nonterminals; ++i)
+    {
+      obstack_printf (&format_obstack, "symbol(orig %d, number)", i);
+      const char *key = obstack_finish0 (&format_obstack);
+      MUSCLE_INSERT_INT (key, nterm_map ? nterm_map[i - ntokens] : i);
+    }
+
   for (int i = 0; i < nsyms; ++i)
     {
       symbol *sym = symbols[i];
src/reduce.c
@@ -259,13 +259,14 @@ reduce_grammar_tables (void)
 | Remove useless nonterminals. |
 `------------------------------*/
 
+symbol_number *nterm_map = NULL;
+
 static void
 nonterminals_reduce (void)
 {
+  nterm_map = xnmalloc (nvars, sizeof *nterm_map);
   /* Map the nonterminals to their new index: useful first, useless
      afterwards. Kept for later report. */
 
-  symbol_number *nterm_map = xnmalloc (nvars, sizeof *nterm_map);
   {
     symbol_number n = ntokens;
     for (symbol_number i = ntokens; i < nsyms; ++i)
@@ -306,8 +307,6 @@ nonterminals_reduce (void)
 
   nsyms -= nuseless_nonterminals;
   nvars -= nuseless_nonterminals;
-
-  free (nterm_map);
 }
 
 
@@ -433,4 +432,6 @@ reduce_free (void)
   bitset_free (V);
   bitset_free (V1);
   bitset_free (P);
+  free (nterm_map);
+  nterm_map = NULL;
 }
src/reduce.h
@@ -32,6 +32,11 @@ bool reduce_nonterminal_useless_in_grammar (const sym_content *sym);
 
 void reduce_free (void);
 
+/** Map initial nterm numbers to the new ones. Built by
+ * reduce_grammar. Size nvars. */
+extern symbol_number *nterm_map;
+
 extern unsigned nuseless_nonterminals;
 extern unsigned nuseless_productions;
 
 #endif /* !REDUCE_H_ */
src/scan-code.l
@@ -648,7 +648,7 @@ handle_action_dollar (symbol_list *rule, char *text, location dollar_loc)
             untyped_var_seen = true;
         }
 
-      obstack_printf (&obstack_for_string, "]b4_lhs_value(%d, ",
+      obstack_printf (&obstack_for_string, "]b4_lhs_value(orig %d, ",
                       sym->content.sym->content->number);
       obstack_quote (&obstack_for_string, type_name);
       obstack_sgrow (&obstack_for_string, ")[");
@@ -677,7 +677,9 @@ handle_action_dollar (symbol_list *rule, char *text, location dollar_loc)
                           "]b4_rhs_value(%d, %d, ",
                           effective_rule_length, n);
           if (sym)
-            obstack_printf (&obstack_for_string, "%d, ", sym->content.sym->content->number);
+            obstack_printf (&obstack_for_string, "%s%d, ",
+                            sym->content.sym->content->class == nterm_sym ? "orig " : "",
+                            sym->content.sym->content->number);
           else
             obstack_sgrow (&obstack_for_string, "[], ");
 
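The formatting rule in the second hunk above (prefix the number with "orig " only when the symbol is a nterm) can be isolated in a few lines of C. This is a sketch using the standard `snprintf` rather than Bison's obstack routines; `format_symbol_arg` and its parameters are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: write into BUF the symbol-number argument of a
   b4_rhs_value call.  Nterm numbers may change during grammar
   reduction, so they are tagged "orig "; token numbers are emitted
   bare.  Returns what snprintf returns.  */
static int
format_symbol_arg (char *buf, size_t size, bool is_nterm, int number)
{
  return snprintf (buf, size, "%s%d, ", is_nterm ? "orig " : "", number);
}
```

For a nterm numbered 7 this writes `orig 7, ` and for a token numbered 2 it writes `2, `, so the skeleton can tell pre-renumbering numbers from final ones by the prefix alone.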
tests/reduce.at
@@ -199,6 +199,88 @@ AT_CLEANUP
 
 
 
+## --------------- ##
+## Useless Parts. ##
+## --------------- ##
+
+AT_SETUP([Useless Parts])
+
+# We used to emit code that used symbol numbers before the useless
+# symbol elimination, hence before the renumbering of the useful
+# symbols. As a result, the evaluation of the skeleton failed because
+# it used non existing symbol numbers. Which is the happy scenario:
+# we could use numbers of other existing symbols...
+# http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html
+
+AT_BISON_OPTION_PUSHDEFS
+AT_DATA([[input.y]],
+[[%code {
+]AT_YYERROR_DECLARE_EXTERN[
+]AT_YYLEX_DECLARE_EXTERN[
+}
+%union { void* ptr; }
+%type <ptr> used1
+%type <ptr> used2
+
+%%
+start
+  : used1
+  ;
+
+used1
+  : used2 { $$ = $1; }
+  ;
+
+unused
+  : used2
+  ;
+
+used2
+  : { $$ = YY_NULLPTR; }
+  ;
+]])
+
+AT_BISON_CHECK([[-fcaret -rall -o input.c input.y]], 0, [],
+[[input.y: warning: 1 nonterminal useless in grammar [-Wother]
+input.y: warning: 1 rule useless in grammar [-Wother]
+input.y:18.1-6: warning: nonterminal useless in grammar: unused [-Wother]
+ unused
+ ^~~~~~
+]])
+
+
+AT_CHECK([[sed -n '/^State 0/q;/^$/!p' input.output]], 0,
+[[Nonterminals useless in grammar
+   unused
+Rules useless in grammar
+    4 unused: used2
+Grammar
+    0 $accept: start $end
+    1 start: used1
+    2 used1: used2
+    3 used2: %empty
+Terminals, with rules where they appear
+$end (0) 0
+error (256)
+Nonterminals, with rules where they appear
+$accept (3)
+    on left: 0
+start (4)
+    on left: 1, on right: 0
+used1 <ptr> (5)
+    on left: 2, on right: 1
+used2 <ptr> (6)
+    on left: 3, on right: 2
+]])
+
+# Make sure the generated parser is correct.
+AT_COMPILE([input.o])
+
+AT_BISON_OPTION_POPDEFS
+AT_CLEANUP
+
+
+
 ## ------------------- ##
 ## Reduced Automaton. ##
 ## ------------------- ##