gram: fix handling of nterms in actions when some are unused

Since Bison 3.3, semantic values in rule actions (i.e., '$...') are
passed to the m4 backend as the symbol number.  Unfortunately, when
there are unused symbols, the symbols are renumbered _after_ the
numbers were used in the rule actions.  As a result, the evaluation of
the skeleton failed because it used nonexistent symbol numbers.  And
that is the happy scenario: it could instead have silently used the
numbers of other, existing symbols...

Reported by Balázs Scheidler.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html

Translating the rule actions after the symbol renumbering moves too
many parts in bison.  Relying on the symbol identifiers is more
troublesome than it might first seem: some don't have an
identifier (tokens with only a literal string), some might have a
complex one (tokens with a literal string with characters special for
M4).  Well, these are tokens, but nterms also have issues: "dummy"
nterms (for midrule actions) are named $@32 etc. which is risky for
M4.

Instead, let's simply give M4 the mapping between the old numbers and
the new ones.  To avoid confusion between old and new numbers, always
emit pre-renumbering numbers as "orig NUM".
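
To make the idea concrete, here is a minimal sketch in C of such an
old-to-new mapping.  It is not Bison's actual code: the helper names
(renumber_nterms, resolve_orig), the "used" array and the convention
that the map is indexed by "original number - ntokens" are assumptions
made for illustration.  It only follows the ordering described in
data/README: tokens first, then used nterms, then unused nterms.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not Bison's API): fill NTERM_MAP so that
   NTERM_MAP[i] is the final number of the nterm whose original number
   was NTOKENS + i.  USED[i] tells whether that nterm survives grammar
   reduction.  */
void
renumber_nterms (int ntokens, int nnterms, const bool *used, int *nterm_map)
{
  assert (0 <= ntokens && 0 <= nnterms);
  int next = ntokens;
  /* Used nterms come right after the tokens, in their original order.  */
  for (int i = 0; i < nnterms; ++i)
    if (used[i])
      nterm_map[i] = next++;
  /* Unused nterms are pushed to the end.  */
  for (int i = 0; i < nnterms; ++i)
    if (!used[i])
      nterm_map[i] = next++;
}

/* Hypothetical counterpart of resolving "orig NUM": tokens keep their
   number, nterms go through the map.  */
int
resolve_orig (int orig, int ntokens, const int *nterm_map)
{
  return orig < ntokens ? orig : nterm_map[orig - ntokens];
}
```

With 3 tokens and 4 nterms of which the second and fourth are unused,
the nterm originally numbered 4 ends up as 5, while token 1 stays 1.
A rule action that recorded "orig 4" therefore still reaches the right
symbol once the map is consulted.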

* data/README: Give details about "orig NUM".
* data/skeletons/bison.m4 (__b4_symbol, _b4_symbol): Resolve the
"orig NUM".
* src/output.c (prepare_symbol_definitions): Pass nterm_map to m4.
* src/reduce.h, src/reduce.c (nterm_map): Extract it from
nonterminals_reduce, to make it public.
(reduce_free): Free it.
* src/scan-code.l (handle_action_dollar): When referring to a nterm,
use "orig NUM".
* tests/reduce.at (Useless Parts): New, based on Balázs Scheidler's
report.
Akim Demaille
2019-02-02 07:18:00 +01:00
parent 31788ed4c7
commit cacdfc2f6e
9 changed files with 174 additions and 32 deletions


@@ -75,48 +75,75 @@ skeletons.
## Symbols
### `b4_symbol(NUM, FIELD)`
In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), bison.exe defines one macro per (token,
field), where field can be `has_id`, `id`, etc.: see
src/output.c:prepare_symbols_definitions().
`prepare_symbols_definitions()` in `src/output.c`.
The various FIELDS are:
The macro `b4_symbol(NUM, FIELD)` gives access to the following FIELDS:
- `has_id`: 0 or 1.
- has_id: 0 or 1.
Whether the symbol has an id.
- id: string
If has_id, the id. Guaranteed to be usable as a C identifier.
Prefixed by api.token.prefix if defined.
- tag: string.
- `id`: string
If has_id, the id (prefixed by api.token.prefix if defined), otherwise
defined as empty. Guaranteed to be usable as a C identifier.
- `tag`: string.
A representation of the symbol. Can be 'foo', 'foo.id', '"foo"' etc.
- user_number: integer
- `user_number`: integer
The external number as used by yylex. Can be ASCII code when a character,
some number chosen by bison, or some user number in the case of
%token FOO <NUM>. Corresponds to yychar in yacc.c.
- is_token: 0 or 1
- `is_token`: 0 or 1
Whether this is a terminal symbol.
- number: integer
- `number`: integer
The internal number (computed from the external number by yytranslate).
Corresponds to yytoken in yacc.c. This is the same number that serves as
key in b4_symbol(NUM, FIELD).
- has_type: 0, 1
In bison, symbols are first assigned increasing numbers in order of
appearance (but tokens first, then nterms). After grammar reduction,
unused nterms are then renumbered to appear last (i.e., first tokens, then
used nterms and finally unused nterms). This final number NUM is the one
contained in this field, and it is the one used as key in `b4_symbol(NUM,
FIELD)`.
The code of the rule actions, however, is emitted before we know what
symbols are unused, so they use the original numbers. To avoid confusion,
they actually use "orig NUM" instead of just "NUM". bison also emits
definitions for `b4_symbol(orig NUM, number)` that map from original
numbers to the new ones. `b4_symbol` actually resolves `orig NUM` in the
other cases, i.e., `b4_symbol(orig 42, tag)` would return the tag of the
symbol whose original number was 42.
- `has_type`: 0, 1
Whether it has a semantic value.
- type_tag: string
- `type_tag`: string
When api.value.type=union, the generated name for the union member.
yytype_INT etc. for symbols that has_id, otherwise yytype_1 etc.
- type
- `type`
If it has a semantic value, its type tag, or, if variants are used,
its type.
In the case of api.value.type=union, type is the real type (e.g. int).
- has_printer: 0, 1
- printer: string
- printer_file: string
- printer_line: integer
- `has_printer`: 0, 1
- `printer`: string
- `printer_file`: string
- `printer_line`: integer
If the symbol has a printer, everything about it.
- has_destructor, destructor, destructor_file, destructor_line
- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`
Likewise.
### b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])
### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
Expansion of $$, $1, $<TYPE-TAG>3, etc.
The semantic value from a given VAL.
@@ -127,14 +154,14 @@ The semantic value from a given VAL.
The result can be used safely, it is put in parens to avoid nasty precedence
issues.
### b4_lhs_value(SYMBOL-NUM, [TYPE])
### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.
### b4_rhs_data(RULE-LENGTH, POS)
### `b4_rhs_data(RULE-LENGTH, POS)`
The data corresponding to the symbol `#POS`, where the current rule has
`RULE-LENGTH` symbols on RHS.
### b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])
### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
on RHS.


@@ -389,17 +389,28 @@ m4_define([b4_glr_cc_if],
#
# The following macros provide access to symbol related values.
# _b4_symbol(NUM, FIELD)
# ----------------------
# __b4_symbol(NUM, FIELD)
# -----------------------
# Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
# undefined.
m4_define([_b4_symbol],
m4_define([__b4_symbol],
[m4_indir([b4_symbol($1, $2)])])
# _b4_symbol(NUM, FIELD)
# ----------------------
# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
# undefined.
m4_define([_b4_symbol],
[m4_ifdef([b4_symbol($1, number)],
[__b4_symbol(m4_indir([b4_symbol($1, number)]), $2)],
[__b4_symbol([$1], [$2])])])
# b4_symbol(NUM, FIELD)
# ---------------------
# Recover a FIELD about symbol #NUM. Thanks to m4_indir, fails if
# Recover a FIELD about symbol #NUM (or "orig NUM"). Fails if
# undefined. If FIELD = id, prepend the token prefix.
m4_define([b4_symbol],
[m4_case([$2],