I am not aware of people subclassing the parser class, and I fail to
see how this could be useful. Rather than leaving a badly baked
feature (as in glr.cc currently), let's not support it at all, until
someone comes and explains why and how it would be useful.
* data/skeletons/glr2.cc (parser): We need no virtual function members.
It's on purpose that I keep the `this->` now. We'll see later if we
want to remove them.
* data/skeletons/glr2.cc (yygetToken): Move into...
(glr_stack::yyget_token): this.
(b4_lex): Adjust.
Currently we have two classes that actually should be fused together:
parser and glr_stack. Both carry part of the parsing: (i) parser
contains `parse`, which is the top-level of the parsing process, it
uses yygetToken, etc., and (ii) glr_stack takes care of all the
details (dealing with the stack), and also calls yygetToken.
However if we fuse them together, we would get a large parser class,
in the header file. So it is probably better to split this large
class using the pimpl idiom. But then it appears that glr_stack is
very close from being the impl of parser.
Let's improve this.
For a start...
* data/skeletons/glr2.cc (parser::parse): Move to...
(glr_stack::parse): here.
(parser::parse): Use it.
When comparing traces from different machines, the mixture of
stdout/stderr in the output are making things uselessly difficult.
* src/lssi.c, src/state-item.c: Output debug traces on stderr.
* data/skeletons/glr2.cc: We no longer play dirty tricks with
parse-params, remove the now useless dance around them.
We now have genuine objects for locations, leave the initial action
alone.
* data/skeletons/glr2.cc: Add support for api.token.constructor.
* examples/c++/glr/c++-types.yy: Use it.
* examples/c++/glr/c++-types.test: Adjust expectations for error
messages.
* data/skeletons/c++.m4 (b4_yytranslate_define): Use static_cast
rather than the YY_CAST macro.
Avoids the need to define YY_CAST in the header.
* data/skeletons/glr2.cc: Fix calls to b4_shared_declarations: pass
the type of file we are in.
Don't define YYTRANSLATE.
(parser::yytranslate_): New, as in lalr1.cc.
Adjust to use it.
* tests/glr-regression.at: Adjust.
Instead of tracking the lookahead with yychar, use yytoken. This is
consistent with lalr1.cc, saves us from calls to YYTRANSLATE (except
when calling yylex), and makes it easier to migrate to using
symbol_type.
* data/skeletons/glr2.cc: Replace all uses of `int yychar` with
`symbol_kind_type yytoken`.
(yygetToken): Don't take/return the lookahead's token-kind and
symbol-kind, just work directly on yystack's `yytoken` member.
* tests/glr-regression.at (AT_PRINT_LOOKAHEAD_DECLARE)
(AT_PRINT_LOOKAHEAD_DEFINE): New.
Adjust to the fact that we have yytoken, not yychar, in glr2.cc.
* data/skeletons/glr2.cc (b4_symbol_action): New, so that we use
`yyval` and `yyloc` rather than the `yyvaluep` and `yylocationp`
pointers.
(yy_symbol_value_print_, yy_symbol_print_, yy_destroy_): Use
references rather than pointers.
Drop support for the undocumented, obsolete, `yyoutput` variable.
Adjust callers.
(glr_state::destroy): Don't use yy_symbol_print_ when we don't have a
symbol. Rather, write dedicated code.
The locations are actually false: they should include the location of
the semicolon (we print statements), but they don't.
* examples/c++/glr/c++-types.test, examples/c++/glr/c++-types.yy,
* examples/c/glr/c++-types.test, examples/c/glr/c++-types.y,
* tests/cxx-type.at:
Fix locations and adjust expectations.
This will be useful to support changes in glr2.cc.
tests/glr-regression.at
(Incorrect lookahead during deterministic GLR)
(Incorrect lookahead during nondeterministc GLR):
Introduce and use PRINT_LOOKAHEAD.
Without the history, D should not support this option. Before the
removal, 'detailed' and 'verbose' options generated the same code.
* data/skeletons/lalr1.d: Here.
* doc/bison.texi: Adapt tests to use 'detailed' instead of 'verbose'.
* tests/calc.at: Document it.
Currently we display the addresses of the semantic values. Instead,
print the values.
Add support for YY_USE across languages.
* data/skeletons/c.m4, data/skeletons/d.m4 (b4_use): New.
* data/skeletons/bison.m4 (b4_symbol_actions): Use b4_use to be
portable to D.
Add support for %printer, and use it.
* data/skeletons/d.m4: Instead of duplicating what's already in
c-like.m4, include it.
(b4_symbol_action): New.
Differs from the one in bison.m4 in that it uses yyval/yyloc instead
of *yyvaluep and *yylocationp.
* data/skeletons/lalr1.d (yy_symbol_print): Avoid calls to formatting,
just call write directly.
Use the %printer.
* examples/d/calc/calc.y: Specify a printer.
Enable traces when $YYDEBUG is set.
* tests/calc.at: Fix the use of %printer with D.
The D parser implements this feature similarly to the C parser,
by using Gettext. Functions gettext() and dgettext() are
imported using extern(C). The internationalisation uses yysymbol_name
to report the name of the SymbolKinds.
* data/skeletons/d.m4 (SymbolKind.toString.yytranslatable,
SymbolKind.toString.yysymbol_name: New), data/skeletons/lalr1.d: Here.
* doc/bison.texi: Document it.
* tests/calc.at: Test it.
(Bison) Variants are extremely picky, which makes them both
annoying (lots of micro-details must be taken care of) and
precious (all the micro-details must be taken care of, in particular
object lifetime).
So (i) each time a semantic value is stored, it must be stored in a
place that exists, and (ii) each time a semantic value is discarded,
its place must have been emptied.
Example of (i)
- new (&yys.value ()) value_type (s->value ());
+ {]b4_variant_if([[
+ new (&yys.value ()) value_type ();
+ ]b4_symbol_variant([yy_accessing_symbol (s->yylrState)],
+ [yys.value ()], [copy], [s->value ()])], [[
+ new (&yys.value ()) value_type (s->value ());]])[
+ }
Example of (ii)
yyparser.yy_destroy_ ("Error: discarding",
- yytoken, &yylval]b4_locations_if([, &yylloc])[);
+ yytoken, &yylval]b4_locations_if([, &yylloc])[);]b4_variant_if([[
+ // Value type destructor.
+ ]b4_symbol_variant([[YYTRANSLATE (this->yychar)]], [[yylval]], [[template destroy]])])[
this->yychar = ]b4_namespace_ref[::]b4_parser_class[::token::]b4_symbol(empty, id)[;
However, in some places we must not be "pure". In particular:
glr_stack_item (const glr_stack_item& other) YY_NOEXCEPT YY_NOTHROW
: is_state_ (other.is_state_)
{
std::memcpy (raw_, other.raw_, union_size);
}
still must use memcpy, because the constructor would change pred, and
it must not. This constructor is used only when resizing the stack,
in which case pred (which is relative) must not be "adjusted".
The result works, but is messy. Its verbosity comes from at least two
factors:
- we don't have support for complete symbols (binding kind, value and
location), and we should at least try to have it. That simplified
lalr1.cc a lot.
- I have not tried to be smart and use 'move' when possible. As a
consequence many places have 'copy' and then 'destroy'. That kind
of clean up can be done once everything appears to be solid.
* data/skeletons/glr2.cc: Be more rigorous in object lifetime.
In particular, don't forget to discard the lookahead when we're done
with it.
Call variant routines where needed.
Deal with plenty of details.
(b4_call_merger): Add support for variants.
Use references in mergers, rather than pointers.
* examples/c++/glr/c++-types.yy: Exercise variants.
On some experimentation I was running, the test suite was passing, yet
the example crashed when run in verbose mode. Let's add this case to
the test suite.
* tests/cxx-type.at: Run all these tests in verbose mode too.
* examples/c/glr/c++-types.y: Flush stdout so that the logs (on
stderr) and the effective output (on stdout) mix correctly.
While at it, be a bit more const-correct.
This macro is not exposed to users, make start it with 'YY_'.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* src/parse-gram.c, tests/actions.at, tests/c++.at, tests/headers.at,
* tests/local.at (YYUSE): Rename as...
(YY_USE): this.
See "glr.c: log the execution of deferred actions".
* data/skeletons/glr2.cc (yyuserAction): Take yyk as a new argument.
Rename argument yyn as yyrule for clarity.
Log before and after the user action.
Adjust callers to not call YY_REDUCE_PRINT and YY_SYMBOL_PRINT.
Currently deferred reductions are not "verbose" at all: only immediate
reductions are displayed in the YYDEBUG traces. I don't understand
why. Besides it seems actually simpler the install the reduction
traces right around the user action inside yyuserAction rather that
around calls to yyuserAction.
This only trouble is that yyuserAction does not know the stack number
it works on, so we have to pass it. And pass -1 when we are actually
running on a temporary stack.
The glr example, on "T(x) + y;" as input, adds these logs, which
allow to see when the `<cast>` is built:
Stack 0 Entering state 26
Reduced stack 0 by rule 7 (line 108); action deferred. Now in state 7.
Stack 0 Entering state 7
Reading a token
Next token is token '+' (1.6: )
Stack 1 Entering state 27
Reduced stack 1 by rule 13 (line 123); action deferred. Now in state 12.
Stack 1 Entering state 12
Next token is token '+' (1.6: )
Stack 1 dies.
Removing dead stacks.
On stack 0, shifting token '+' (1.6: )
Stack 0 now in state #14
+Reducing stack -1 by rule 6 (line 107):
+ $1 = token identifier (1.3: x)
+-> $$ = nterm expr (1.3: x)
+Reducing stack -1 by rule 7 (line 108):
+ $1 = token typename (1.0: T)
+ $2 = token '(' (1.2: )
+ $3 = nterm expr (1.3: x)
+ $4 = token ')' (1.4: )
+-> $$ = nterm expr (1.0-3: <cast>(x,T))
Returning to deterministic operation.
* data/skeletons/glr.c (yyuserAction): Take yyk as a new argument.
Rename argument yyn as yyrule for clarity.
Log before and after the user action.
Adjust callers to not call YY_REDUCE_PRINT and YY_SYMBOL_PRINT.
The next commit wants to use YY_REDUCE_PRINT above its current
definition. Move it higher.
* data/skeletons/glr.c (yylhsNonterm, YY_REDUCE_PRINT): Make available
earlier.
* examples/c/glr/c++-types.y (node_print): New.
Use YY_LOCATION_PRINT instead of duplicating it.
And actually use it in the action instead of badly duplicating it.
(main): Add proper option support.
* examples/c/glr/c++-types.test: Adjust expectations on locations.
* examples/c++/glr/c++-types.yy: Fix bad iteration.
Currently each time we meet %merge we record this location as the
defining location (and symbol). Instead, record the first definition.
In the generated code we go from
yy0->A = merge (*yy0, *yy1);
to
yy0->S = merge (*yy0, *yy1);
where S was indeed the first symbol, and in the diagnostics we go from
glr-regr18.y:30.18-24: error: result type clash on merge function 'merge': <type2> != <type1>
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:31.13-19: error: result type clash on merge function 'merge': <type3> != <type2>
31 | sym3: %merge<merge> { $$ = 0; } ;
| ^~~~~~~
glr-regr18.y:30.18-24: note: previous declaration
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
to
glr-regr18.y:30.18-24: error: result type clash on merge function 'merge': <type2> != <type1>
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:31.13-19: error: result type clash on merge function 'merge': <type3> != <type1>
31 | sym3: %merge<merge> { $$ = 0; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
where both duplicates are reported against definition 1, rather than
using definition 1 as a reference when diagnosing about definition 2,
and then 2 as a reference for 3.
* src/reader.c (record_merge_function_type): Keep the first definition.
* tests/glr-regression.at: Adjust.
Reported by Jot Dot.
https://lists.gnu.org/r/help-bison/2020-12/msg00014.html
* data/skeletons/glr.c, data/skeletons/glr2.cc (b4_call_merger): Use
the symbol's slot, not its type.
* examples/c/glr/c++-types.y: Use explicit per-symbol typing together
with api.value.type=union.
(yylex): Use yytoken_kind_t.
Don't generate C code from bison, leave that to the skeletons.
* src/output.c (merger_output): Emit invocations to b4_call_merger.
* data/skeletons/glr.c, data/skeletons/glr2.cc (b4_call_merger): New.