When a pstate is used for multiple successive parses, some state may
leak from one run into the following one. That was introduced in
330552ea49 "yacc.c: push: don't clear
the parser state when accepting/rejecting".
Reported by Ryan <dev@splintermail.com>
https://lists.gnu.org/r/bug-bison/2021-03/msg00000.html
* data/skeletons/yacc.c (yypush_parse): We reusing a pstate from a
previous run, do behave as if it were the first run.
* tests/push.at (Pstate reuse): Check this.
The right-shift added in c22902e360
("tables: fix handling for useless tokens") is incorrect. In
particular, we need to reset the "new" bits.
Reported by Balázs Scheidler.
https://github.com/akimd/bison/issues/74
* src/tables.c (pos_set_set): Fix the right-shift.
In some rare conditions, the generated parser can be wrong when there
are useless tokens.
Reported by Balázs Scheidler.
https://github.com/akimd/bison/issues/72
Balázs managed to prove that the bug was introduced in
commit af1c6f973a
Author: Theophile Ranquet <ranquet@lrde.epita.fr>
Date: Tue Nov 13 10:38:49 2012 +0000
tables: use bitsets for a performance boost
Suggested by Yuri at
<http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>.
The improvement is marginal for most grammars, but notable for large
grammars (e.g., PosgreSQL's postgre.y), and very large for the
sample.y grammar submitted by Yuri in
http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html.
Measured with --trace=time -fsyntax-only.
parser action tables postgre.y sample.y
Before 0,129 (44%) 37,095 (99%)
After 0,117 (42%) 5,046 (93%)
* src/tables.c (pos): Replace this set of integer coded as an unsorted
array of integers with...
(pos_set): this bitset.
which was implemented long ago, but that I installed only recently
(March 2019), first published in v3.3.90.
That patch introduces a bitset to represent a set of integers. It
managed negative integers by using a (fixed) base (the smallest
integer to represent). It avoided negative accesses into the bitset
by ignoring integers smaller than the base, under the asumption that
these cases correspond to useless tokens that are ignored anyway.
While it turns out to be true for all the test cases in the test suite
(!), Balázs' use case demonstrates that it is not always the case.
So we need to be able to accept negative integers that are smaller
than the current base.
"Amusingly" enough, the aforementioned patch was visibly unsure about
itself:
/* Store PLACE into POS_SET. PLACE might not belong to the set
of possible values for instance with useless tokens. It
would be more satisfying to eliminate the need for this
'if'. */
This commit needs several improvements in the future:
- support from bitset for bit assignment and shifts
- amortized resizing of pos_set
- test cases
* src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test):
New.
Use them instead of using bitset_set and bitset_test directly.
There were several bugs in pruning that would leave the state-item
graph in an inconsistent state which could cause crashes later on:
- Pruning now happens in one pass instead of two.
- Disabled state-items no longer prune the state-items they transition
to if that state-item has other states that transition to it.
- State-items that transition to disabled state-items are always
pruned even if they have productions.
Reported by Michal Bartkowiak <michal.bartkowiak@nokia.com>
https://lists.gnu.org/r/bug-bison/2021-01/msg00000.html
and Zartaj Majeed
https://github.com/akimd/bison/issues/71
* src/state-item.c (prune_forward, prune_backward): Fuse into...
(prune_state_item): this.
Adjust callers.
Before:
YY_ASSERT (tok == token::YYEOF || tok == token::YYerror || tok == token::YYUNDEF || tok == 120 || tok == 49 || tok == 50 || tok == 51 || tok == 52 || tok == 53 || tok == 54 || tok == 55 || tok == 56 || tok == 57 || tok == 97 || tok == 98);
After:
YY_ASSERT (tok == token::YYEOF
|| (token::YYerror <= tok && tok <= token::YYUNDEF)
|| tok == 120
|| (49 <= tok && tok <= 57)
|| (97 <= tok && tok <= 98));
Clauses are now also wrapped on several lines. This is nicer to read
and diff, but also avoids pushing Visual C++ to its arbitrary
limits (640K and lines of 16380 bytes ought to be enough for anybody,
otherwise make an C2026 error).
The useless parens are there for the dummy warnings about
precedence (in the future, will we also have to put parens in
`1+2*3`?).
* data/skeletons/variant.hh (_b4_filter_tokens, b4_tok_in, b4_tok_in):
New.
(_b4_token_constructor_define): Use them.
Working on the previous commit I realized that YY_ASSERT was used in
the generated headers, so must follow api.prefix to avoid clashes when
multiple C++ parser with variants are used.
Actually many more macros should obey api.prefix (YY_CPLUSPLUS,
YY_COPY, etc.). There was no complaint so far, so it's not urgent
enough for 3.7.4, but it should be addressed in 3.8.
* data/skeletons/variant.hh (b4_assert): New.
Use it.
* tests/local.at (AT_YYLEX_RETURN): Fix.
* tests/headers.at: Make sure variant-based C++ parsers are checked
too.
This test did find that YY_ASSERT escaped renaming (before the fix in
this commit).
In some extreme situations (about 800 tokens), we generate a
single-line assertion long enough for Visual C++ to discard the end of
the line, thus falling into parse ends for the missing `);`. On a
shorter example:
YY_ASSERT (tok == token::TOK_YYEOF || tok == token::TOK_YYerror || tok == token::TOK_YYUNDEF || tok == token::TOK_ASSIGN || tok == token::TOK_MINUS || tok == token::TOK_PLUS || tok == token::TOK_STAR || tok == token::TOK_SLASH || tok == token::TOK_LPAREN || tok == token::TOK_RPAREN);
Whether NDEBUG is used or not is irrelevant, the parser dies anyway.
Reported by Jot Dot <jotdot@shaw.ca>.
https://lists.gnu.org/r/bug-bison/2020-11/msg00002.html
We should avoid emitting lines so long.
We probably should also use a range-based assertion (with extraneous
parens to pacify fascist compilers):
YY_ASSERT ((token::TOK_YYEOF <= tok && tok <= token::TOK_YYUNDEF)
|| (token::TOK_ASSIGN <= tok && ...)
But anyway, we should simply not emit this assertion at all when not
asked for.
* data/skeletons/variant.hh: Do not define, nor use, YY_ASSERT when it
is not enabled.
When generating a C parser, YYEMPTY is present in enum yytokentype but
there is no corresponding #define like there is for the other values.
There is a special case for YYEMPTY in b4_token_enums but no
corresponding case in b4_token_defines.
* data/skeletons/c.m4 (b4_token_defines): Do define YYEMPTY.
Commit af000bab11 ("doc: work around
Texinfo 6.7 bug"), published in 3.4.91, added a dependency on the
"all" target.
This is a super bad idea, since "make all" will run this
target *before* "all", which builds bison. It turns out that this new
dependency actually needed bison to be built. So all the regular
process (i) build $(BUILT_SOURCES) and then (ii) build bison, was
wrecked since some of the $(BUILT_SOURCES) depended on bison...
It was "easy" to see in the logs of "make V=1" because we were
building bison files (such as src/files.o) *before* displaying the
banner for "all-recursive". With this fix, we finally get again the
proper sequence:
rm -f examples/c/reccalc/scan.stamp examples/c/reccalc/scan.stamp.tmp
/opt/local/libexec/gnubin/mkdir -p examples/c/reccalc
touch examples/c/reccalc/scan.stamp.tmp
flex -oexamples/c/reccalc/scan.c --header=examples/c/reccalc/scan.h ./examples/c/reccalc/scan.l
mv examples/c/reccalc/scan.stamp.tmp examples/c/reccalc/scan.stamp
rm -f lib/fcntl.h-t lib/fcntl.h && \
{ echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
...
} > lib/fcntl.h-t && \
mv lib/fcntl.h-t lib/fcntl.h
...
mv -f lib/alloca.h-t lib/alloca.h
make all-recursive
Reported by Mingli Yu <mingli.yu@windriver.com>.
https://github.com/akimd/bison/issues/31https://lists.gnu.org/r/bison-patches/2020-05/msg00055.html
Reported by Claudio Calvelli <bugb@w42.org>.
https://lists.gnu.org/r/bug-bison/2020-09/msg00001.htmlhttps://bugs.gentoo.org/716516
* doc/local.mk (all): Rename as...
(all-local): this.
So that we don't compete with BUILT_SOURCES.
From
foo.y:1.7-11: error: %type redeclaration for bar
1 | %type <foo> bar bar
| ^~~~~
foo.y:1.7-11: note: previous declaration
1 | %type <foo> bar bar
| ^~~~~
to
foo.y:1.17-19: error: %type redeclaration for bar
1 | %type <foo> bar bar
| ^~~
foo.y:1.13-15: note: previous declaration
1 | %type <foo> bar bar
| ^~~
* src/symlist.h, src/symlist.c (symbol_list_type_set): There's no need
for the tag's location, use that of the symbol.
* src/parse-gram.y: Adjust.
* tests/input.at: Adjust.
The code was written on top of buffers of `char[26]`, and then was
changed to use `char *`, yet was still using `sizeof buf`, which
became `sizeof (char *)` instead of `sizeof (char[26])`.
Reported by Dagobert Michelsen.
https://lists.gnu.org/r/bug-bison/2020-07/msg00023.html
* src/glyphs.h, src/glyphs.c: Get rid of uses of `char *`, use only
glyph_buffer_t.
Sometimes, understanding the derivations is difficult, because they
are serialized to fit in one line. For instance, the example taken
from the NEWS file:
%token ID
%%
s: a ID
a: expr
expr: expr ID ',' | "expr"
gave
First example expr • ID ',' ID $end
Shift derivation $accept → [ s → [ a → [ expr → [ expr • ID ',' ] ] ID ] $end ]
Second example expr • ID $end
Reduce derivation $accept → [ s → [ a → [ expr • ] ID ] $end ]
Printing as trees, it gives:
First example expr • ID ',' ID $end
Shift derivation
$accept
↳ s $end
↳ a ID
↳ expr
↳ expr • ID ','
Second example expr • ID $end
Reduce derivation
$accept
↳ s $end
↳ a ID
↳ expr •
* src/glyphs.h, src/glyphs.c (down_arrow, empty, derivation_separator):
New.
* src/derivation.c (derivation_print, derivation_print_impl): Rename
as...
(derivation_print_flat, derivation_print_flat_impl): These.
(fputs_if, derivation_depth, derivation_width, derivation_print_tree)
(derivation_print_tree_impl, derivation_print): New.
* src/counterexample.c (print_counterexample): Adjust.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
When reporting counterexamples for s/r conflicts, put the shift first.
This is more natural, and displays the default resolution first, which
is also what happens for r/r conflicts where the smallest rule number
is displayed first, and "wins".
* src/counterexample.c (counterexample): Add a shift_reduce member.
(new_counterexample): Adjust.
Swap the derivations when this is a s/r conflict.
(print_counterexample): For s/r conflicts, prefer "Shift derivation"
and "Reduce derivation" rather than "First/Second derivation".
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* NEWS, doc/bison.texi: Ditto.
It does not make a lot of sense to use ::= in our counterexamples,
that's not something that belongs to the Bison "vocabulary". Using
the colon makes sense, but it's too discreet. Let's use the arrow,
which we already use in some reports (HTML and Dot).
* src/gram.h (print_dot_fallback): Generalize into...
(print_fallback): this.
(print_arrow): New.
* src/derivation.c: Use it.
* NEWS, tests/conflicts.at, tests/counterexample.at,
* tests/diagnostics.at, tests/report.at: Adjust.
* doc/bison.texi: Ditto.
Unfortunately the literal `→` is output as `↦`. So we need to use
@arrow.
* examples/c/bistromathic/parse.y (user_context): We need the current
line.
(yyreport_syntax_error): Quote the guilty line, with squiggles.
* examples/c/bistromathic/bistromathic.test: Adjust.