This directory contains data needed by Bison.

# Directory content
## Skeletons
Bison skeletons: the general shapes of the different parser kinds, which are
specialized for specific grammars by the bison program.

Currently, the supported skeletons are:

- yacc.c
  It used to be named bison.simple: it corresponds to C Yacc
  compatible LALR(1) parsers.

- lalr1.cc
  Produces a C++ parser class.

- lalr1.java
  Produces a Java parser class.

- glr.c
  A Generalized LR C parser based on Bison's LALR(1) tables.

- glr.cc
  A Generalized LR C++ parser. Actually a C++ wrapper around glr.c.

These skeletons are the only ones supported by the Bison team. Because the
interface between skeletons and the bison program is not finished, *we are
not bound to it*. In particular, Bison is not mature enough for us to
consider that "foreign skeletons" are supported.

## m4sugar
This directory contains M4sugar, sort of an extended library for M4, which
is used by Bison to instantiate the skeletons.

## xslt
This directory contains XSLT programs that transform Bison's XML output into
various formats.

- bison.xsl
  A library of routines used by the other XSLT programs.

- xml2dot.xsl
  Conversion into GraphViz's dot format.

- xml2text.xsl
  Conversion into text.

- xml2xhtml.xsl
  Conversion into XHTML.

# Implementation note about the skeletons

"Skeleton" in Bison parlance means "backend": a skeleton is fed by the bison
executable with LR tables, facts about the symbols, etc. and generates the
output (say parser.cc, parser.hh, location.hh, etc.). Skeletons are only in
charge of generating the parser and its auxiliary files; they do not
generate the XML output, the parser.output reports, nor the graphical
rendering.

The bits of information passed from bison to the backend are named
"muscles". Muscles are passed to M4 via its standard input: it's a set of
m4 definitions. To see them, use `--trace=muscles`.

Except for muscles, whose names are generated by bison, the skeletons have
no constraint at all on the macro names: there is no technical/theoretical
limitation; as long as you generate the output, you can do what you want.
However, of course, that would be a bad idea if, say, the C and C++
skeletons used different approaches and had completely different
implementations. That would be a maintenance nightmare.

Below, we document some of the macros that we use in several of the
skeletons. If you are to write a new skeleton, please implement them for
your language. Overall, be sure to follow the same patterns as the existing
skeletons.

## Symbols

### `b4_symbol(NUM, FIELD)`
In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), bison.exe defines one macro per (token,
field), where field can be `has_id`, `id`, etc.: see
`prepare_symbol_definitions()` in `src/output.c`.

The macro `b4_symbol(NUM, FIELD)` gives access to the following FIELDS:

- `has_id`: 0 or 1.
  Whether the symbol has an id.

- `id`: string
  If `has_id`, the id (prefixed by api.token.prefix if defined); otherwise
  defined as empty. Guaranteed to be usable as a C identifier.

- `tag`: string.
  A representation of the symbol. Can be 'foo', 'foo.id', '"foo"' etc.

- `user_number`: integer
  The external number as used by yylex. Can be the ASCII code for a
  character token, some number chosen by bison, or some user number in the
  case of `%token FOO <NUM>`. Corresponds to yychar in yacc.c.

- `is_token`: 0 or 1
  Whether this is a terminal symbol.

- `number`: integer
  The internal number (computed from the external number by yytranslate).
  Corresponds to yytoken in yacc.c. This is the same number that serves as
  key in b4_symbol(NUM, FIELD).

  In bison, symbols are first assigned increasing numbers in order of
  appearance (but tokens first, then nterms). After grammar reduction,
  unused nterms are then renumbered to appear last (i.e., first tokens, then
  used nterms and finally unused nterms). This final number NUM is the one
  contained in this field, and it is the one used as key in `b4_symbol(NUM,
  FIELD)`.

  The code of the rule actions, however, is emitted before we know what
  symbols are unused, so it uses the original numbers. To avoid confusion,
  it actually uses "orig NUM" instead of just "NUM". bison also emits
  definitions for `b4_symbol(orig NUM, number)` that map from original
  numbers to the new ones. `b4_symbol` also resolves `orig NUM` for the
  other fields, i.e., `b4_symbol(orig 42, tag)` would return the tag of the
  symbol whose original number was 42.

- `has_type`: 0, 1
  Whether it has a semantic value.

- `type_tag`: string
  When api.value.type=union, the generated name for the union member.
  yytype_INT etc. for symbols that have an id, otherwise yytype_1 etc.

- `type`
  If it has a semantic value, its type tag, or, if variants are used,
  its type.
  In the case of api.value.type=union, type is the real type (e.g. int).

- `has_printer`: 0, 1
- `printer`: string
- `printer_file`: string
- `printer_line`: integer
  If the symbol has a printer, everything about it.

- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`
  Likewise.

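The renumbering described under `number` can be pictured with a small model.
The following Python sketch is not Bison code: the symbol names, the set of
used nterms, and the helper `symbol_tag` are made up for illustration. It
numbers tokens first and nterms next, moves the unused nterms last, and
builds the original-to-final mapping that plays the role of the
`b4_symbol(orig NUM, number)` definitions.

```python
# Toy model of symbol renumbering; all names below are hypothetical.
tokens = ["$end", "error", "NUM"]      # tokens are numbered first
nterms = ["exp", "unused1", "term"]    # then nterms, in order of appearance
used_nterms = {"exp", "term"}          # what grammar reduction keeps

# Original numbers, as known when the rule actions are scanned.
original = {name: i for i, name in enumerate(tokens + nterms)}

# Final order: tokens, then used nterms, and finally unused nterms.
final_order = (
    tokens
    + [n for n in nterms if n in used_nterms]
    + [n for n in nterms if n not in used_nterms]
)
final = {name: i for i, name in enumerate(final_order)}

# Counterpart of the `b4_symbol(orig NUM, number)` definitions.
orig_to_final = {original[name]: final[name] for name in original}


def symbol_tag(key):
    """Resolve "orig NUM" or a final NUM to a tag, like b4_symbol(KEY, tag)."""
    if isinstance(key, str) and key.startswith("orig "):
        key = orig_to_final[int(key[len("orig "):])]
    return final_order[int(key)]


print(orig_to_final)
print(symbol_tag("orig 5"), symbol_tag(4))
```

In this model the unused nterm originally numbered 4 ends up last, so
`orig 5` and the final number 4 both designate `term`.
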
### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
Expansion of `$$`, `$1`, `$<TYPE-TAG>3`, etc.

The semantic value from a given VAL.
- `VAL`: some semantic value storage (typically a union), e.g., `yylval`.
- `SYMBOL-NUM`: the symbol number from which we extract the type tag.
- `TYPE-TAG`: the type tag forced by the user with `<TYPE-TAG>`.

The result can be used safely: it is put in parentheses to avoid nasty
precedence issues.

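As a rough model of how the expansion is chosen, assuming a union-based
`VAL` as in the C skeletons, the Python sketch below builds the resulting C
expression as a string. The symbol table `SYMBOL_TYPE` and the function name
`symbol_value` are hypothetical, and returning the bare storage for an
untyped symbol is an assumption of this sketch, not a statement about the
actual macro.

```python
# Hypothetical symbol table: symbol number -> type tag (None if untyped).
SYMBOL_TYPE = {3: "ival", 4: None}


def symbol_value(val, symbol_num=None, type_tag=None):
    """Return a C expression for the semantic value stored in `val`."""
    if type_tag:
        # $<TYPE-TAG>N: the tag forced by the user wins.
        return f"({val}.{type_tag})"
    tag = SYMBOL_TYPE.get(symbol_num) if symbol_num is not None else None
    if tag:
        # Typed symbol: select its union member, wrapped in parentheses.
        return f"({val}.{tag})"
    # Untyped symbol (assumption): the storage itself.
    return val


print(symbol_value("yylval", 3))
print(symbol_value("yylval", 4))
print(symbol_value("yylval", 3, "fval"))
```

The parentheses matter once the expansion is pasted into a larger
expression written by the user, where the surrounding operators are unknown
to the skeleton.
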
### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.

### `b4_rhs_data(RULE-LENGTH, POS)`
The data corresponding to the symbol `#POS`, where the current rule has
`RULE-LENGTH` symbols on its RHS.

### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
on its RHS.

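To make the role of `RULE-LENGTH` and `POS` concrete, here is a small Python
model. It assumes the scheme used by yacc.c, where the values of the RHS
symbols sit on top of the value stack and the symbol `#POS` is found at
offset `POS - RULE-LENGTH` from the top. The rule, the stack contents, and
the helper `rhs_index` are invented for the example.

```python
def rhs_index(rule_length, pos):
    """Offset of the POS-th RHS symbol relative to the top of the stack."""
    return pos - rule_length


# Value stack once the RHS of a hypothetical rule `exp: exp '+' exp`
# has been pushed; "<older>" stands for whatever was there before.
stack = ["<older>", 1, "+", 2]
top = len(stack) - 1

# Fetch $1, $2 and $3 of a rule whose RHS has 3 symbols.
values = [stack[top + rhs_index(3, pos)] for pos in (1, 2, 3)]
print(values)
```

The last RHS symbol always has offset 0 and earlier ones have negative
offsets, which is why both the rule length and the position are needed.
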
-----

Local Variables:
mode: markdown
fill-column: 76
ispell-dictionary: "american"
End:

Copyright (C) 2002, 2008-2015, 2018-2019 Free Software Foundation, Inc.

This file is part of GNU Bison.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.