When a pstate is used for multiple successive parses, some state may
leak from one run into the following one. That was introduced in
330552ea49 "yacc.c: push: don't clear
the parser state when accepting/rejecting".
Reported by Ryan <dev@splintermail.com>
https://lists.gnu.org/r/bug-bison/2021-03/msg00000.html
* data/skeletons/yacc.c (yypush_parse): We reusing a pstate from a
previous run, do behave as if it were the first run.
* tests/push.at (Pstate reuse): Check this.
The right-shift added in c22902e360
("tables: fix handling for useless tokens") is incorrect. In
particular, we need to reset the "new" bits.
Reported by Balázs Scheidler.
https://github.com/akimd/bison/issues/74
* src/tables.c (pos_set_set): Fix the right-shift.
In some rare conditions, the generated parser can be wrong when there
are useless tokens.
Reported by Balázs Scheidler.
https://github.com/akimd/bison/issues/72
Balázs managed to prove that the bug was introduced in
commit af1c6f973a
Author: Theophile Ranquet <ranquet@lrde.epita.fr>
Date: Tue Nov 13 10:38:49 2012 +0000
tables: use bitsets for a performance boost
Suggested by Yuri at
<http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>.
The improvement is marginal for most grammars, but notable for large
grammars (e.g., PosgreSQL's postgre.y), and very large for the
sample.y grammar submitted by Yuri in
http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html.
Measured with --trace=time -fsyntax-only.
parser action tables postgre.y sample.y
Before 0,129 (44%) 37,095 (99%)
After 0,117 (42%) 5,046 (93%)
* src/tables.c (pos): Replace this set of integer coded as an unsorted
array of integers with...
(pos_set): this bitset.
which was implemented long ago, but that I installed only recently
(March 2019), first published in v3.3.90.
That patch introduces a bitset to represent a set of integers. It
managed negative integers by using a (fixed) base (the smallest
integer to represent). It avoided negative accesses into the bitset
by ignoring integers smaller than the base, under the asumption that
these cases correspond to useless tokens that are ignored anyway.
While it turns out to be true for all the test cases in the test suite
(!), Balázs' use case demonstrates that it is not always the case.
So we need to be able to accept negative integers that are smaller
than the current base.
"Amusingly" enough, the aforementioned patch was visibly unsure about
itself:
/* Store PLACE into POS_SET. PLACE might not belong to the set
of possible values for instance with useless tokens. It
would be more satisfying to eliminate the need for this
'if'. */
This commit needs several improvements in the future:
- support from bitset for bit assignment and shifts
- amortized resizing of pos_set
- test cases
* src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test):
New.
Use them instead of using bitset_set and bitset_test directly.
There were several bugs in pruning that would leave the state-item
graph in an inconsistent state which could cause crashes later on:
- Pruning now happens in one pass instead of two.
- Disabled state-items no longer prune the state-items they transition
to if that state-item has other states that transition to it.
- State-items that transition to disabled state-items are always
pruned even if they have productions.
Reported by Michal Bartkowiak <michal.bartkowiak@nokia.com>
https://lists.gnu.org/r/bug-bison/2021-01/msg00000.html
and Zartaj Majeed
https://github.com/akimd/bison/issues/71
* src/state-item.c (prune_forward, prune_backward): Fuse into...
(prune_state_item): this.
Adjust callers.
Currently each time we meet %merge we record this location as the
defining location (and symbol). Instead, record the first definition.
In the generated code we go from
yy0->A = merge (*yy0, *yy1);
to
yy0->S = merge (*yy0, *yy1);
where S was indeed the first symbol, and in the diagnostics we go from
glr-regr18.y:30.18-24: error: result type clash on merge function 'merge': <type2> != <type1>
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:31.13-19: error: result type clash on merge function 'merge': <type3> != <type2>
31 | sym3: %merge<merge> { $$ = 0; } ;
| ^~~~~~~
glr-regr18.y:30.18-24: note: previous declaration
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
to
glr-regr18.y:30.18-24: error: result type clash on merge function 'merge': <type2> != <type1>
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:31.13-19: error: result type clash on merge function 'merge': <type3> != <type1>
31 | sym3: %merge<merge> { $$ = 0; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
where both duplicates are reported against definition 1, rather than
using definition 1 as a reference when diagnosing about definition 2,
and then 2 as a reference for 3.
* src/reader.c (record_merge_function_type): Keep the first definition.
* tests/glr-regression.at: Adjust.
Reported by Jot Dot.
https://lists.gnu.org/r/help-bison/2020-12/msg00014.html
* data/skeletons/glr.c, data/skeletons/glr2.cc (b4_call_merger): Use
the symbol's slot, not its type.
* examples/c/glr/c++-types.y: Use explicit per-symbol typing together
with api.value.type=union.
(yylex): Use yytoken_kind_t.
Don't generate C code from bison, leave that to the skeletons.
* src/output.c (merger_output): Emit invocations to b4_call_merger.
* data/skeletons/glr.c (b4_call_merger): New.
Symbols are richer than types, and in M4 it is my simpler (and more
common) to deal with symbols rather than types. So let's associate
mergers to a symbol rather than a type name.
* src/reader.h (merger_list): Replace the 'type' member by a symbol
member.
* src/reader.c (record_merge_function_type): Take a symbol as
argument, rather than a type name.
* src/output.c (merger_output): Adjust.
This macro is not exposed to users, make start it with 'YY_'.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* src/parse-gram.c, tests/actions.at, tests/c++.at, tests/headers.at,
* tests/local.at (YYUSE): Rename as...
(YY_USE): this.
When comparing traces from different machines, the mixture of
stdout/stderr in the output are making things uselessly difficult.
* src/lssi.c, src/state-item.c: Output debug traces on stderr.
When using glr.cc, the C function yyparse is an internal detail that
should not be exposed. Users might call it by accident (I did).
* data/skeletons/glr.c (yyparse): When used for glr.cc, rename as yy_parse_impl.
* data/skeletons/glr.cc: Adjust.
The yydefgoto table uses -1 as an invalid for an impossible case (we
never use yydefgoto[0], since it corresponds to the reduction to
$accept, which never happens). Since yydefgoto is a table of state
numbers, this -1 forces a signed type uselessly, which (1) might
trigger compiler warnings when storing a value from yydefgoto into a
state number (nonnegative), and (2) wastes bits which might result in
using a int16 where a uint8 suffices.
Reported by Jot Dot <jotdot@shaw.ca>.
https://lists.gnu.org/r/bug-bison/2020-11/msg00027.html
* src/tables.c (default_goto): Use 0 rather than -1 as invalid value.
* tests/regression.at: Adjust.
Before:
YY_ASSERT (tok == token::YYEOF || tok == token::YYerror || tok == token::YYUNDEF || tok == 120 || tok == 49 || tok == 50 || tok == 51 || tok == 52 || tok == 53 || tok == 54 || tok == 55 || tok == 56 || tok == 57 || tok == 97 || tok == 98);
After:
YY_ASSERT (tok == token::YYEOF
|| (token::YYerror <= tok && tok <= token::YYUNDEF)
|| tok == 120
|| (49 <= tok && tok <= 57)
|| (97 <= tok && tok <= 98));
Clauses are now also wrapped on several lines. This is nicer to read
and diff, but also avoids pushing Visual C++ to its arbitrary
limits (640K and lines of 16380 bytes ought to be enough for anybody,
otherwise make an C2026 error).
The useless parens are there for the dummy warnings about
precedence (in the future, will we also have to put parens in
`1+2*3`?).
* data/skeletons/variant.hh (_b4_filter_tokens, b4_tok_in, b4_tok_in):
New.
(_b4_token_constructor_define): Use them.
Working on the previous commit I realized that YY_ASSERT was used in
the generated headers, so must follow api.prefix to avoid clashes when
multiple C++ parser with variants are used.
Actually many more macros should obey api.prefix (YY_CPLUSPLUS,
YY_COPY, etc.). There was no complaint so far, so it's not urgent
enough for 3.7.4, but it should be addressed in 3.8.
* data/skeletons/variant.hh (b4_assert): New.
Use it.
* tests/local.at (AT_YYLEX_RETURN): Fix.
* tests/headers.at: Make sure variant-based C++ parsers are checked
too.
This test did find that YY_ASSERT escaped renaming (before the fix in
this commit).
In some extreme situations (about 800 tokens), we generate a
single-line assertion long enough for Visual C++ to discard the end of
the line, thus falling into parse ends for the missing `);`. On a
shorter example:
YY_ASSERT (tok == token::TOK_YYEOF || tok == token::TOK_YYerror || tok == token::TOK_YYUNDEF || tok == token::TOK_ASSIGN || tok == token::TOK_MINUS || tok == token::TOK_PLUS || tok == token::TOK_STAR || tok == token::TOK_SLASH || tok == token::TOK_LPAREN || tok == token::TOK_RPAREN);
Whether NDEBUG is used or not is irrelevant, the parser dies anyway.
Reported by Jot Dot <jotdot@shaw.ca>.
https://lists.gnu.org/r/bug-bison/2020-11/msg00002.html
We should avoid emitting lines so long.
We probably should also use a range-based assertion (with extraneous
parens to pacify fascist compilers):
YY_ASSERT ((token::TOK_YYEOF <= tok && tok <= token::TOK_YYUNDEF)
|| (token::TOK_ASSIGN <= tok && ...)
But anyway, we should simply not emit this assertion at all when not
asked for.
* data/skeletons/variant.hh: Do not define, nor use, YY_ASSERT when it
is not enabled.
When generating a C parser, YYEMPTY is present in enum yytokentype but
there is no corresponding #define like there is for the other values.
There is a special case for YYEMPTY in b4_token_enums but no
corresponding case in b4_token_defines.
* data/skeletons/c.m4 (b4_token_defines): Do define YYEMPTY.