Currently glr2.cc does not use 'symbol's everywhere, in various places
it also uses yykind, yyval and yyloc "by hand". So we need two
different calls for user-defined constructors: once for ~symbol,
another for yy_destroy_. Both need to call the user destructors with
different calling conventions.
* data/skeletons/glr2.cc (b4_symbol_action): Rename as...
(b4_symbol_action_for_yyval): this.
(b4_symbol_action): New, taken from lalr1.cc.
(yy_destroy_, yy_symbol_value_print_): Use b4_symbol_action_for_yyval.
I am not aware of people subclassing the parser class, and I fail to
see how this could be useful. Rather than leaving a badly baked
feature (as in glr.cc currently), let's not support it at all, until
someone comes and explains why and how it would be useful.
* data/skeletons/glr2.cc (parser): We need no virtual function members.
It's on purpose that I keep the `this->` now. We'll see later if we
want to remove them.
* data/skeletons/glr2.cc (yygetToken): Move into...
(glr_stack::yyget_token): this.
(b4_lex): Adjust.
Currently we have two classes that actually should be fused together:
parser and glr_stack. Both carry part of the parsing: (i) parser
contains `parse`, which is the top-level of the parsing process, it
uses yygetToken, etc., and (ii) glr_stack takes care of all the
details (dealing with the stack), and also calls yygetToken.
However if we fuse them together, we would get a large parser class,
in the header file. So it is probably better to split this large
class using the pimpl idiom. But then it appears that glr_stack is
very close from being the impl of parser.
Let's improve this.
For a start...
* data/skeletons/glr2.cc (parser::parse): Move to...
(glr_stack::parse): here.
(parser::parse): Use it.
* data/skeletons/glr2.cc: We no longer play dirty tricks with
parse-params, remove the now useless dance around them.
We now have genuine objects for locations, leave the initial action
alone.
* data/skeletons/glr2.cc: Add support for api.token.constructor.
* examples/c++/glr/c++-types.yy: Use it.
* examples/c++/glr/c++-types.test: Adjust expectations for error
messages.
* data/skeletons/c++.m4 (b4_yytranslate_define): Use static_cast
rather than the YY_CAST macro.
Avoids the need to define YY_CAST in the header.
* data/skeletons/glr2.cc: Fix calls to b4_shared_declarations: pass
the type of file we are in.
Don't define YYTRANSLATE.
(parser::yytranslate_): New, as in lalr1.cc.
Adjust to use it.
* tests/glr-regression.at: Adjust.
Instead of tracking the lookahead with yychar, use yytoken. This is
consistent with lalr1.cc, saves us from calls to YYTRANSLATE (except
when calling yylex), and makes it easier to migrate to using
symbol_type.
* data/skeletons/glr2.cc: Replace all uses of `int yychar` with
`symbol_kind_type yytoken`.
(yygetToken): Don't take/return the lookahead's token-kind and
symbol-kind, just work directly on yystack's `yytoken` member.
* tests/glr-regression.at (AT_PRINT_LOOKAHEAD_DECLARE)
(AT_PRINT_LOOKAHEAD_DEFINE): New.
Adjust to the fact that we have yytoken, not yychar, in glr2.cc.
* data/skeletons/glr2.cc (b4_symbol_action): New, so that we use
`yyval` and `yyloc` rather than the `yyvaluep` and `yylocationp`
pointers.
(yy_symbol_value_print_, yy_symbol_print_, yy_destroy_): Use
references rather than pointers.
Drop support for the undocumented, obsolete, `yyoutput` variable.
Adjust callers.
(glr_state::destroy): Don't use yy_symbol_print_ when we don't have a
symbol. Rather, write dedicated code.
(Bison) Variants are extremely picky, which makes them both
annoying (lots of micro-details must be taken care of) and
precious (all the micro-details must be taken care of, in particular
object lifetime).
So (i) each time a semantic value is stored, it must be stored in a
place that exists, and (ii) each time a semantic value is discarded,
its place must have been emptied.
Example of (i)
- new (&yys.value ()) value_type (s->value ());
+ {]b4_variant_if([[
+ new (&yys.value ()) value_type ();
+ ]b4_symbol_variant([yy_accessing_symbol (s->yylrState)],
+ [yys.value ()], [copy], [s->value ()])], [[
+ new (&yys.value ()) value_type (s->value ());]])[
+ }
Example of (ii)
yyparser.yy_destroy_ ("Error: discarding",
- yytoken, &yylval]b4_locations_if([, &yylloc])[);
+ yytoken, &yylval]b4_locations_if([, &yylloc])[);]b4_variant_if([[
+ // Value type destructor.
+ ]b4_symbol_variant([[YYTRANSLATE (this->yychar)]], [[yylval]], [[template destroy]])])[
this->yychar = ]b4_namespace_ref[::]b4_parser_class[::token::]b4_symbol(empty, id)[;
However, in some places we must not be "pure". In particular:
glr_stack_item (const glr_stack_item& other) YY_NOEXCEPT YY_NOTHROW
: is_state_ (other.is_state_)
{
std::memcpy (raw_, other.raw_, union_size);
}
still must use memcpy, because the constructor would change pred, and
it must not. This constructor is used only when resizing the stack,
in which case pred (which is relative) must not be "adjusted".
The result works, but is messy. Its verbosity comes from at least two
factors:
- we don't have support for complete symbols (binding kind, value and
location), and we should at least try to have it. That simplified
lalr1.cc a lot.
- I have not tried to be smart and use 'move' when possible. As a
consequence many places have 'copy' and then 'destroy'. That kind
of clean up can be done once everything appears to be solid.
* data/skeletons/glr2.cc: Be more rigorous in object lifetime.
In particular, don't forget to discard the lookahead when we're done
with it.
Call variant routines where needed.
Deal with plenty of details.
(b4_call_merger): Add support for variants.
Use references in mergers, rather than pointers.
* examples/c++/glr/c++-types.yy: Exercise variants.
This macro is not exposed to users, make start it with 'YY_'.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* src/parse-gram.c, tests/actions.at, tests/c++.at, tests/headers.at,
* tests/local.at (YYUSE): Rename as...
(YY_USE): this.
See "glr.c: log the execution of deferred actions".
* data/skeletons/glr2.cc (yyuserAction): Take yyk as a new argument.
Rename argument yyn as yyrule for clarity.
Log before and after the user action.
Adjust callers to not call YY_REDUCE_PRINT and YY_SYMBOL_PRINT.
Reported by Jot Dot.
https://lists.gnu.org/r/help-bison/2020-12/msg00014.html
* data/skeletons/glr.c, data/skeletons/glr2.cc (b4_call_merger): Use
the symbol's slot, not its type.
* examples/c/glr/c++-types.y: Use explicit per-symbol typing together
with api.value.type=union.
(yylex): Use yytoken_kind_t.
Don't generate C code from bison, leave that to the skeletons.
* src/output.c (merger_output): Emit invocations to b4_call_merger.
* data/skeletons/glr.c, data/skeletons/glr2.cc (b4_call_merger): New.
Now that the lookahead macros (that used yystackp) are out of the way,
there is no reason to continue using a pointer.
* data/skeletons/glr2.cc: Use yystack, a reference, rather that
yystackp, a pointer.
Fix tons of const-correctness issues.
Now that we no longer play dangerous games with macros, we can give
the lookahead's token kind its proper name. The content of yychar
_is_ raw (as opposed to yytoken), there's no reason to pleonasmicate
it (and thus to neologize).
* data/skeletons/glr2.cc (glr_stack::yyrawchar): Rename as...
(glr_stack::yychar): this.
In glr.c, the macros yychar, yylval and yylloc allow to deal with
api.pure: sometimes they point to global variables (impure), sometimes
they point to the member variables (pure).
There's no room for globals in glr2.cc. Besides, they map yychar to
yyrawchar, yylval to yyval, etc. which obfuscates what is actually
going on.
* data/skeletons/glr2.cc (glr_stack::yyval, glr_stack::yyloc): Rename
as...
(glr_stack::yylval, glr_stack::yylloc): these, for clarity.
(yynerrs, yychar, yylval, yylloc, yystackp): Remove these macros.
(b4_yygetToken_call): Remove.
Restore a more natural order: first define the macros and then use
them. Currently, some macros were defined between the moment the
header is issued, and then the implementation file. As a result, it
was possible for the header and the implementation to not use the same
versions of the macros.
* data/skeletons/glr2.cc: Define the macros first, then use them.
* data/skeletons/lalr1.cc: Minor comment and quoting changes.
* data/skeletons/glr2.cc: Define value_type and location_type where
needed, and use them only.
(yyuserMerge): Make it a member function of the glr_state class.
We always refer to the triplet "kind, value, location". All of them
are nouns, and we support api.value.type and api.location.type. On
this regard, "semantic_type" was a poor choice. Make it "value_type".
The test suite was not updated to use value_type, on purpose, to
enforce backward compatibility.
* data/skeletons/c++.m4, data/skeletons/glr.cc, data/skeletons/glr2.cc,
* data/skeletons/variant.hh, doc/bison.texi: Define value_type rather
than semantic_type.
Add a backward compatibility typedef.
* examples/c++/glr/c++-types.yy: Migrate.
* data/skeletons/glr2.cc (glr_state_set::yyremoveDeletes): Use
vector::resize rather than vector::erase.
(glr_state::copyFrom): Merge into...
(glr_state::operator=): here.
Valentin wanted each assignment to be explicit, hence copyFrom rather
that operator=. But in 0a82316e54
(glr2.cc: example: use objects (not pointers) to represent the AST),
in order to get real objects to be processed correctly, we had to
introduce the assignment operator. Afterward, we also introduced a
full implementation of the copy-ctor, independent of copyFrom. As a
result, today the only invocation of copyFrom is from the assignment
operator. Simplify this.
With GCC10, the CI shows tons of warnings such as
(327. actions.at:374: testing Initial location: glr2.cc):
input.cc: In member function 'YYRESULTTAG glr_stack::yyglrReduce(state_set_index, rule_num, bool)':
input.cc:1357:11: error: '<anonymous>.glr_state::yyloc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
1357 | yyloc = other.yyloc;
| ~~~~~~^~~~~~~~~~~~~
This is because we don't have the constructors for locations. But we
should have them! That's only because of glr.cc that ctors were not
enabled by default. In glr2.cc, they should.
That fixes all the warnings when Bison's locations are used. However,
when user-defined locations without constructor are used, we still
have:
550. calc.at:1409: testing Calculator glr2.cc %locations api.location.type={Span} ...
calc.cc: In member function 'YYRESULTTAG glr_stack::yyglrReduce(state_set_index, rule_num, bool)':
calc.cc:1261:11: error: '<anonymous>.glr_state::yyloc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
1261 | yyloc = other.yyloc;
| ~~~~~~^~~~~~~~~~~~~
To address this case, we need glr_state to explicily initialize its
yyloc member.
* data/skeletons/glr2.cc: Use genuine objects, with ctors, for position
and location.
(glr_state): Explicitly initialize yyloc in the constructors.
The copy constructor was (lazily) implemented by a call to copyFrom.
Unfortunately copyFrom reads yyresolved from the destination (and
source), and in the case of the copy-ctor this is random garbagge,
which UBSAN catches:
glr-regr2a.cc:1072:10: runtime error: load of value 7, which is not a valid value for type 'bool'
Rather than defining yyresolved before calling copyFrom, let's just
provide a genuine cpy-ctor for glr_state.
* data/skeletons/glr2.cc (glr_state::glr_state): Implement properly.
In yycompressStack:
while (yyr != YY_NULLPTR)
{
nextFreeItem->check_ ();
yyr->check_();
nextFreeItem->setState(*yyr);
glr_state& nextFreeState = nextFreeItem->getState();
yyr = yyr->pred();
nextFreeState.setPred(&(nextFreeItem - 1)->getState());
setFirstTop(&nextFreeState);
++nextFreeItem;
}
it is possible that nextFreeItem and yyr are actually the same state.
In which case `nextFreeItem->setState(*yyr)` does really bad things.
* data/skeletons/glr2.cc (glr_stack_item::setState): Beware of
self-assignment.
* data/skeletons/glr.c: A bit more doc.
(yypstates): Rename yyst (only occurrence) to yys (commonly used for
yyGLRState).
* data/skeletons/glr2.cc: Ditto.
Prefer '\n' to "\n".
examples/c++/glr/c++-types.cc:721:24: error:
expected the class name after '~' to name a destructor
yysval.YYSTYPE::~semantic_type ();
^
Using a local typedef, for some reaon, result in clang complaining
about a useless local typedef. Since anyway we don't want to keep on
using YYSTYPE and YYLTYPE, it is time to introduce proper typedefs to
reach these guys. And to be slightly in advance of the other
skeletons: use value_type, not semantic_type. This is much more
consistent with our use of the (kind, value, location) triplet.
* data/skeletons/glr2.cc (glr_state::value_type)
(glr_state::location_type): New.
(glr_state::~glr_state): Use value_type to name the dtor.
Currently we are using pointers. The whole point of
glr2.cc (vs. glr.cc) is precisely to allow genuine C++ objects to be
semantic values. Let's make that work.
* data/skeletons/glr2.cc (glr_state::glr_state): Be sure to initialize
yysval.
(glr_state): Add copy-ctor, assignment and dtor.
(glr_state::copyFrom): Be sure to initialize the destination if it was
not.
(glr_state::~glr_state): Destroy the semantic value.
* examples/c++/glr/ast.hh: Rewrite so that we use genuine objects,
rather than a traditional OOP hierarchy that requires to deal with
pointers.
With help from Bruno Belanyi <bruno.belanyi@epita.fr>.
* examples/c++/glr/c++-types.yy: Remove memory management.
Use true objects.
(main): Don't reach yydebug directly.
* examples/c++/glr/local.mk: We need C++11.