Since Bison 3.3, semantic values in rule actions (i.e., '$...') are
passed to the m4 backend as the symbol number. Unfortunately, when
there are unused symbols, the symbols are renumbered _after_ the
numbers were used in the rule actions. As a result, the evaluation of
the skeleton failed because it used non existing symbol numbers.
Which is the happy scenario: we could use numbers of other existing
symbols...
Reported by Balázs Scheidler.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html
Translating the rule actions after the symbol renumbering moves too
many parts in bison. Relying on the symbol identifiers is more
troublesome than it might first seem: some don't have an
identifier (tokens with only a literal string), some might have a
complex one (tokens with a literal string with characters special for
M4). Well, these are tokens, but nterms also have issues: "dummy"
nterms (for midrule actions) are named $@32 etc. which is risky for
M4.
Instead, let's simply give M4 the mapping between the old numbers and
the new ones. To avoid confusion between old and new numbers, always
emit pre-renumbering numbers as "orig NUM".
* data/README: Give details about "orig NUM".
* data/skeletons/bison.m4 (__b4_symbol, _b4_symbol): Resolve the
"orig NUM".
* src/output.c (prepare_symbol_definitions): Pass nterm_map to m4.
* src/reduce.h, src/reduce.c (nterm_map): Extract it from
nonterminals_reduce, to make it public.
(reduce_free): Free it.
* src/scan-code.l (handle_action_dollar): When referring to a nterm,
use "orig NUM".
* tests/reduce.at (Useless Parts): New, based Balázs Scheidler's
report.
Reported by Derek Clegg
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00021.html
aux/parser-internal.h:429:12: error: 'syntax_error' has no out-of-line virtual
method definitions; its vtable will be emitted in every translation unit
[-Werror,-Wweak-vtables]
struct syntax_error : std::runtime_error
To avoid this warning, we need syntax_error to have a virtual function
defined in a compilation unit. Let it be the destructor. To comply
with C++98, this dtor should be 'throw()'. Merely making YY_NOEXCEPT
be 'throw()' in C++98 triggers
errors (http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00022.html),
so let's introduce YY_NOTHROW and flag only ~syntax_error with it.
Also, since we now have an explicit dtor, we need to provide an copy
ctor.
* configure.ac (warn_cxx): Add -Wweak-vtables.
* data/skeletons/c++.m4 (YY_NOTHROW): New.
(syntax_error): Declare the dtor, and define the copy ctor.
* data/skeletons/glr.cc, data/skeletons/lalr1.cc (~syntax_error):
Define.
Reported by Derek Clegg.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00004.html
* configure.ac (warn_common): Add -Wimplicit-fallthrough.
This does trigger failures in the test suite.
* data/skeletons/glr.c, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c, tests/c++.at:
Make fall-throws explicit.
Reported by Derek Clegg.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00006.html
Clang does not like this:
template <typename D>
struct basic_symbol : D
{
basic_symbol();
};
struct by_type {};
struct symbol_type : basic_symbol<by_type>
{
symbol_type(){}
};
It gives:
$ clang++-mp-7.0 -Wundefined-func-template foo.cc -c
foo.cc:11:3: warning: instantiation of function 'basic_symbol<by_type>::basic_symbol'
required here, but no definition is available [-Wundefined-func-template]
symbol_type(){}
^
foo.cc:4:3: note: forward declaration of template entity is here
basic_symbol();
^
foo.cc:11:3: note: add an explicit instantiation declaration to suppress this warning
if 'basic_symbol<by_type>::basic_symbol' is explicitly instantiated in
another translation unit
symbol_type(){}
^
1 warning generated.
The same applies for the basic_symbol's destructor and `clear()`.
* configure.ac (warn_cxx): Add -Wundefined-func-template.
This triggered one failure in the test suite:
* tests/headers.at (Sane headers): here, where we check that we can
compile the generated headers in other compilation units than the
parser's.
Add a variant type to make sure that basic_symbol and symbol_type are
properly generated in this case.
* data/skeletons/c++.m4 (basic_symbol): Inline the definitions of the
destructor and of `clear` in the class definition.
This line:
slice<stack_symbol_type, stack_type> slice (yystack_, yylen);
triggers warnings:
parse.h:1790:11: note: shadowed declaration is here
Reported by Frank Heckenbach.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00002.html
* configure.ac (warn_c): Move -Wshadow to...
(warn_common): here.
* data/skeletons/stack.hh (slice): Define as an inner class of stack.
* data/skeletons/lalr1.cc: Adjust.
Rename the variable as 'range' instead of 'slice'.
There are many macros that are defined and used just
once (b4_public_if, b4_abstract_if, etc.). That's overkill. Rather,
let's define a macro to build the "public class YYParser" line.
It appears that the same syntax with "extends", "abstract", etc. is
implemented in the D parser, which looks very fishy...
* data/skeletons/d.m4, data/skeletons/java.m4 (b4_public_if)
(b4_abstract_if, b4_final_if, b4_strictfp_if): Replace with
(b4_parser_class_declaration): this.
* data/skeletons/lalr1.d, data/skeletons/lalr1.java: Adjust.
Commit 90a8537e62 was right, but issued
two error messages. Commit 80ef7e7639
tried to address that by mapping yychar and yytoken to empty, but that
completely breaks the invariants of glr.c. In particular, yygetToken
can be called repeatedly and is expected to return the latest result,
unless yytoken is YYEMPTY. Since the previous attempt was "recording"
that the token was coming from an exception by setting it to YYEMPTY,
instead of getting again the faulty token, we fetched another one.
Rather, revert to the first approach: map yytoken to "invalid token",
but record in yychar the fact that we come from an exception thrown in
the scanner.
* data/skeletons/glr.c (YYFAULTYTOK): New.
(yygetToken): Use it to record syntax errors from the scanner.
* tests/c++.at (Syntax error as exception): In addition to checking
syntax_error with error recovery, make sure it also behaves as
expected without.
The previous name was historical and inconsistent.
* src/muscle-tab.c (define_directive): Use the proper value passing
syntax, based on the muscle kind.
(muscle_percent_variable_update): Use the right value passing syntax.
Migrate from parser_class_name to api.parser.class.
* data/skeletons: Migrate from parser_class_name to api.parser.class.
* doc/bison.texi (%define Summary): Document both parser_class_name
and api.parser.class.
Promote the latter over the former.
This is very debatable. This function is not pure at all, so it could
stick to returning void: that's a common coding style to tell the
difference between "real" (pure) functions and side-effecting
subroutines. However, we already have this style elsewhere (e.g.,
yylex), and I feel the callers are somewhat nice to read this way.
* data/skeletons/glr.c (yygetLRActions): Return the action rather than
passing by pointer.
While at it, fix type of yytoken.
Adjust callers.
Reported by Askar Safin.
https://lists.gnu.org/archive/html/bison-patches/2019-01/msg00000.html
* data/skeletons/glr.c (yygetToken): Return YYEMPTY when an exception
is thrown.
* data/skeletons/lalr1.cc: Log when an exception is caught.
* tests/c++.at (Syntax error as exception): Be sure to recover from
error before triggering another error.
This way, it is easier to make sure its implementation is available in
glr.cc too, which is not the case currently.
* data/skeletons/c++.m4 (b4_public_types_define): Move the
implementation of syntax_error...
(b4_public_types_declare): here.
We used to create a short definition of yy::parser with all the
implementations of its member functions outside. But yy::parser is no
longer short and simple to read. Maintaining each function twice is
painful: a lot of redundancy but different indentation levels, output
which depends on whether we are in a header or not (see
d132c2d545), etc.
Let's simplify this and put the implementations into the class
definition itself.
Discussed in this monologue:
https://lists.gnu.org/archive/html/bison-patches/2018-12/msg00058.html.
* data/skeletons/c++.m4, data/skeletons/lalr1.cc,
* data/skeletons/variant.hh (b4_basic_symbol_constructor_define)
(_b4_token_constructor_declare, b4_token_constructor_declare)
Merge into...
(b4_basic_symbol_constructor_define, _b4_token_constructor_define)
(b4_token_constructor_define): these.
We used to define such auxiliary structures outside the class, mainly
as a matter of style to keep the definition of yy::parser short and
simple. However, now there's a lot more code generated inside the
class definition (e.g., all the token constructors), so the
readability no longer applies.
However, if we move stack (and slice) inside yy::parser, then it
should no longer be needed to change the namespace to have multiple
parsers: changing the class name should suffice.
One common argument against inner classes is that they code bloat. It
hardly applies here, since typically different parsers will have
different semantic value types, hence different actual stack types.
* data/skeletons/lalr1.cc: Invoke b4_stack_define inside yy::parser.
Suggested by Wolfgang Thaller.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00081.html
* data/c++.m4 (basic_symbol, by_type): Instead of provide either move
or copy constructor, always provide the copy one.
* tests/c++.at (C++ Variant-based Symbols Unit Tests): Check it.
Currently the following piece of code crashes (with parse.assert),
because we don't record that s was moved-from, and we invoke its dtor.
{
auto s = parser::make_INT (42);
auto s2 = std::move (s);
}
Reported by Wolfgang Thaller.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00077.html
* data/c++.m4 (by_type): Provide a move-ctor.
(basic_symbol): Be sure not to read a moved-from value.
* tests/c++.at (C++ Variant-based Symbols Unit Tests): Check this case.
Instead of introducing make_symbol (whose name, btw, somewhat
infringes on the user's "name space", if she defines a token named
"symbol"), let's make the construction of symbol_type safer, using
assertions.
For instance with:
%token ':' <std::string> ID <int> INT;
generate:
symbol_type (int token, const std::string&);
symbol_type (int token, const int&);
symbol_type (int token);
It does mean that now named token constructors (make_ID, make_INT,
etc.) go through a useless assert, but I think we can ignore this: I
assume any decent compiler will inline the symbol_type ctor inside the
make_TOKEN functions, which will show that the assert is trivially
verified, hence I expect no code will be emitted for it. And anyway,
that's an assert, NDEBUG controls it.
* data/c++.m4 (symbol_type): Turn into a subclass of
basic_symbol<by_type>.
Declare symbol constructors when variants are enabled.
* data/variant.hh (_b4_type_constructor_declare)
(_b4_type_constructor_define): Replace with...
(_b4_symbol_constructor_declare, _b4_symbol_constructor_def): these.
Generate symbol_type constructors.
* doc/bison.texi (Complete Symbols): Document.
* tests/types.at: Check.
On
%token <int> FOO BAR
we currently generate make_FOO(int) and make_BAR(int). However, in
order to factor their scanners, some users would also like to have
make_symbol(tok, int), where tok is FOO or BAR. To ensure type
safety, add assertions that do check that value type and token type
match. Bind this assertion to the parse.assert %define variable.
Suggested by Frank Heckenbach.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00034.html
Should also match expectations from Аскар Сафин.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00023.html
* data/variant.hh: Use b4_token_visible_if where applicable.
(_b4_type_constructor_declare, _b4_type_constructor_define): New.
Use them.
Bitten by macros, again.
See 680b715518.
* data/variant.hh (_b4_symbol_constructor_declare)
(_b4_symbol_constructor_define): Do not use user types, which can
include commas as in `std::pair<int, int>`, to macros.
* tests/local.at: Adjust the lex related macros to support the
case of token constructors.
* tests/types.at: Also check token constructors on types with commas.
I personally prefer 'non terminal', or 'non-terminal', but
'nonterminal' is the common spelling.
* data/glr.c, src/parse-gram.y, src/symtab.c, src/symtab.h,
* tests/input.at, doc/refcard.tex: here.