This test fails:
748: Incorrect lookahead during nondeterministic GLR: glr2.cc
It consumes lots of stack space, so at some point we need to expand
it. Because of Boolean logic mistakes, we then claim
memory-exhausted (first error). Hence we jump to cleaning the
stack (popall_), calling all the destructors, and at some point we
crash with heap-use-after-free (second error).
This commit fixes the first error. Unfortunately, even though we now
do expand the stack, we crash again with (another)
heap-use-after-free, not addressed here.
Eventually, we should make sure popall_() properly works.
* data/skeletons/glr2.cc (yyexpandGLRStackIfNeeded): Return true iff
success (i.e., memory not exhausted).
And also, remove the incorrect indentation of these comments:
- /* YYR2[YYN] -- Number of symbols on the right hand side of rule YYN. */
+/* YYR2[RULE-NUM] -- Number of symbols on the right-hand side of rule RULE-NUM. */
static const yytype_int8 yyr2[] =
{
0, 2, 4, 0, 2, 1, 1, 1, 3, 2,
I don't remember why this indentation was added (in
0991e29b75), but it seems wrong,
at least for yacc.c. I suspect this was done with lalr1.cc (where
this is embeded in the class definition, so it should be indented),
but today lalr1.cc uses other routines to output these comments.
* data/skeletons/bison.m4 (b4_integral_parser_tables_map): Improve the
wording of the comments of some tables.
* data/skeletons/c.m4 (b4_integral_parser_table_define): Remove
indentation.
A glr_stack_item has "raw" memory to store either a glr_state or a
semantic_option. glr_stack_item::setState stores a state using a copy
assignment. However, this is more like a construction: we are
starting from "raw" memory, so use the placement new operator instead.
While it probably makes no difference when parse.assert is disabled,
it does make one when it is: the constructor properly initialize the
magic number, the assignment does not. So without these changes, the
next commit (which stores genuine objects in semantic values) fails
tests 712 and 730 because of incorrect magic numbers.
* data/skeletons/glr2.cc (glr_stack_item::setState): Build the state,
don't just copy it.
Amusingly enough, glr2.cc still had its core function, yyparse, being
a free function instead of a member function.
* data/skeletons/glr2.cc (yyparse): Remove this free function called
from yyparser::parse. Inline its body into...
(yyparser::parse): this member function.
This requires moving a bit the yychar, etc. macros.
Access to token can be simplified (the
b4_namespace_ref::b4_parser_class prefix is no longer needed).
Remove the useless conditional b4_pure_if: the skeleton is, of course,
pure (no global variables). Make glr2.cc intrinsically pure.
* data/skeletons/glr2.cc (b4_pure_if): Remove definition and uses.
(b4_lex): New.
Stolen from lalr1.cc to avoid needing to use the one from c.m4.
Currently, yycompressStack expects the free items to be states only.
That's not the case.
Fixes 712 and 730 pass. 748 still fails, but later and
differently (heap-use-after-free).
* data/skeletons/glr2.cc (glr_stack_item::setState): New.
(glr_stack_item::yycompressStack): Use it.
* tests/glr-regression.at: Adjust.
A glr_state keeps tracks of its predecessor using an offset relative
to itself (i.e., pointer subtraction). Unfortunately we sometimes
have to compute offsets for pointers that live in different
containers, in particular in yyfillin. In that case there is no
reason for the distance between the two objects to be a multiple of
the object size (0x40 on my machine), and the resulting ptrdiff_t may
be "wrong", i.e., it does allow to recover one from the other. We
cannot use "typed" pointer arithmetics here, the Euclidean division
has it wrong. So use "plain" char* pointers.
Fixes 718 (Duplicate representation of merged trees: glr2.cc) and
examples/c++/glr/c++-types.
Still XFAIL:
712: Improper handling of embedded actions and dollar(-N) in GLR parsers: glr2.cc
730: Incorrectly initialized location for empty right-hand side in GLR: glr2.cc
748: Incorrect lookahead during nondeterministic GLR: glr2.cc
* data/skeletons/glr2.cc (glr_state::as_pointer_): New.
(glr_state::pred): Use it.
* examples/c++/glr/c++-types.test: The test passes.
* tests/glr-regression.at (Duplicate representation of merged trees:
glr2.cc): Passes.
The use of YY_IGNORE_NULL_DEREFERENCE_BEGIN/END in `check_` is to
please GCC 10:
glr-regr8.cc: In member function 'YYRESULTTAG glr_stack::yyresolveValue(glr_state&)':
glr-regr8.cc:1433:21: error: potential null pointer dereference [-Werror=null-dereference]
1433 | YYASSERT (this->magic_ == MAGIC);
| ~~~~~~^~~~~~
glr-regr8.cc:905:40: note: in definition of macro 'YYASSERT'
905 | # define YYASSERT(Condition) ((void) ((Condition) || (abort (), 0)))
| ^~~~~~~~~
* data/skeletons/glr2.cc (glr_state::check_): New.
Use it in the member functions.
We obviously have broken pointer arithmetics that hands us
glr_stack_items that are not glr_stack_items. Have a simple check for
this, to have earlier failures.
* data/skeletons/glr2.cc (glr_stack_item::check_): New.
Use it.
(glr_stack_item::contents): Avoid the useless struct.
Fix minor stylistic issues.
When "tests: glr2.cc: run the glr-regression tests" tests are run,
before this commit the following tests used to loop endlessly:
709: Badly Collapsed GLR States: glr2.cc FAILED (glr-regression.at:123)
715: Improper merging of GLR delayed action sets: glr2.cc FAILED (glr-regression.at:397)
718: Duplicate representation of merged trees: glr2.cc FAILED (glr-regression.at:495)
751: Leaked semantic values when reporting ambiguity: glr2.cc FAILED (glr-regression.at:1632)
After this commit, no test loops and 709, 715, and 751 pass. Only 718
still fails.
* data/skeletons/glr2.cc (yyresolveValue): Add missing incrementation
of the iteration variable.
* data/skeletons/glr2.cc: Use 'const' on variables and applicable
member functions.
Improve comments.
Use references where applicable.
Enforce names_like_this, notLikeThis.
Reduce scopes.
When using glr.cc, the C function yyparse is an internal detail that
should not be exposed. Users might call it by accident (I did).
* data/skeletons/glr.c (yyparse): When used for glr.cc, rename as yy_parse_impl.
* data/skeletons/glr.cc: Adjust.
We were declaring foo_parse_expr but implementing yyparse_expr.
* data/skeletons/yacc.c (_b4_define_sub_yyparse): Use api.prefix for
the parse function name.
This avoids heap allocation and gives minimal costs for the
creation and destruction of the YYParser.Symbol struct if
the location tracking is active.
Suggested by H. S. Teoh.
* data/skeletons/lalr1.d: Here.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y: Adjust.
* tests/calc.at: Test it.
The complete symbol approach was deemed to be the right approach for Dlang.
Now, the user can return from yylex() an instance of YYParser.Symbol structure,
which binds together the TokenKind, the semantic value and the location. Before,
the last two were reported separately to the parser.
Only the user API is changed, Bisons's internal structure is kept the same.
* data/skeletons/d.m4 (struct YYParser.Symbol): New.
* data/skeletons/lalr1.d: Change the return value.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Demonstrate it.
* tests/calc.at, tests/scanner.at: Test it.
When using lookahead correction, the method YYParser.Context.getExpectedTokens
is not annotated with const, because the method calls yylacCheck, which is not
const. Also, because of yylacStack and yylacEstablished, yylacCheck needs to
be called from the context of the parser class, which is sent as parameter to
the Context's constructor.
* data/skeletons/lalr1.d (yylacCheck, yylacEstablish, yylacDiscard,
yylacStack, yylacEstablished): New.
(Context): Use it.
* doc/bison.texi: Document it.
* tests/calc.at: Check it.
Changed from syntax_error to reportSyntaxError to be similar to the Java parser.
* data/skeletons/lalr1.d: Change the function name.
* doc/bison.texi: Document it.
* tests/local.at: Adjust.
* maint:
c++: shorten the assertions that check whether tokens are correct
c++: don't glue functions together
lalr1.cc: YY_ASSERT should use api.prefix
c++: don't use YY_ASSERT at all if parse.assert is disabled
c++: style: follow the Bison m4 quoting pattern
yacc.c: provide the Bison version as an integral macro
regen
style: make conversion of version string to int public
%require: accept version numbers with three parts ("3.7.4")
yacc.c: fix #definition of YYEMPTY
gnulib: update
doc: fix incorrect section title
doc: minor grammar fixes in counterexamples section
Before:
YY_ASSERT (tok == token::YYEOF || tok == token::YYerror || tok == token::YYUNDEF || tok == 120 || tok == 49 || tok == 50 || tok == 51 || tok == 52 || tok == 53 || tok == 54 || tok == 55 || tok == 56 || tok == 57 || tok == 97 || tok == 98);
After:
YY_ASSERT (tok == token::YYEOF
|| (token::YYerror <= tok && tok <= token::YYUNDEF)
|| tok == 120
|| (49 <= tok && tok <= 57)
|| (97 <= tok && tok <= 98));
Clauses are now also wrapped on several lines. This is nicer to read
and diff, but also avoids pushing Visual C++ to its arbitrary
limits (640K and lines of 16380 bytes ought to be enough for anybody,
otherwise make an C2026 error).
The useless parens are there for the dummy warnings about
precedence (in the future, will we also have to put parens in
`1+2*3`?).
* data/skeletons/variant.hh (_b4_filter_tokens, b4_tok_in, b4_tok_in):
New.
(_b4_token_constructor_define): Use them.
Working on the previous commit I realized that YY_ASSERT was used in
the generated headers, so must follow api.prefix to avoid clashes when
multiple C++ parser with variants are used.
Actually many more macros should obey api.prefix (YY_CPLUSPLUS,
YY_COPY, etc.). There was no complaint so far, so it's not urgent
enough for 3.7.4, but it should be addressed in 3.8.
* data/skeletons/variant.hh (b4_assert): New.
Use it.
* tests/local.at (AT_YYLEX_RETURN): Fix.
* tests/headers.at: Make sure variant-based C++ parsers are checked
too.
This test did find that YY_ASSERT escaped renaming (before the fix in
this commit).
In some extreme situations (about 800 tokens), we generate a
single-line assertion long enough for Visual C++ to discard the end of
the line, thus falling into parse ends for the missing `);`. On a
shorter example:
YY_ASSERT (tok == token::TOK_YYEOF || tok == token::TOK_YYerror || tok == token::TOK_YYUNDEF || tok == token::TOK_ASSIGN || tok == token::TOK_MINUS || tok == token::TOK_PLUS || tok == token::TOK_STAR || tok == token::TOK_SLASH || tok == token::TOK_LPAREN || tok == token::TOK_RPAREN);
Whether NDEBUG is used or not is irrelevant, the parser dies anyway.
Reported by Jot Dot <jotdot@shaw.ca>.
https://lists.gnu.org/r/bug-bison/2020-11/msg00002.html
We should avoid emitting lines so long.
We probably should also use a range-based assertion (with extraneous
parens to pacify fascist compilers):
YY_ASSERT ((token::TOK_YYEOF <= tok && tok <= token::TOK_YYUNDEF)
|| (token::TOK_ASSIGN <= tok && ...)
But anyway, we should simply not emit this assertion at all when not
asked for.
* data/skeletons/variant.hh: Do not define, nor use, YY_ASSERT when it
is not enabled.
When generating a C parser, YYEMPTY is present in enum yytokentype but
there is no corresponding #define like there is for the other values.
There is a special case for YYEMPTY in b4_token_enums but no
corresponding case in b4_token_defines.
* data/skeletons/c.m4 (b4_token_defines): Do define YYEMPTY.
Parser.Context class returns a const YYLocation, so Lexer's method
yyerror() needs to receive the location as a const parameter.
Internal error reporting flow is changed to be similar to that of
the other skeletons. Before, case YYERRLAB was calling yyerror()
with the result of yysyntax_error() as the string parameter. As the
custom error message lets the user decide if they want to use
yyerror() or not, this flow needed to be changed. Now, case YYERRLAB
calls yyreportSyntaxError(), that builds the error message using
yysyntaxErrorArguments(). Then yyreportSyntaxError() passes the
error message to the user defined syntax_error() in case of a custom
message, or to yyerror() otherwise.
In the tests in tests/calc.at, the order of the tokens needs to be
changed in order of precedence, so that the D program outputs the
expected tokens in the same order as the other parsers.
* data/skeletons/lalr1.d: Add the custom error message feature.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y: Adjust.
* tests/calc.at, tests/local.at: Test it.
This should have been part of commit "symbols: stop dealing with YYEMPTY
as b4_symbol(-2, ...)" (cd40ec9526).
Give names to all the special symbols: "eof", "error" and "undef".
* data/skeletons/bison.m4 (b4_symbol): Let `b4_symbol(eof, ...)` mean
`b4_symbol(0, ...)`, `b4_symbol(error, ...)` mean `b4_symbol(1, ...)`,
and , `b4_symbol(undef, ...)` mean `b4_symbol(2, ...)`..
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c:
Prefer symbols to numbers.
All methods are now declared as const. Method getExpectedTokens() has now the
functionality of reporting only the number of expected tokens.
* data/skeletons/lalr1.d: Fix Context class.
In D's case, yyerrok() is a private method of the Parser class.
It can be called directly as `yyerrok()` from the grammar rules section.
* data/skeletons/lalr1.d: Add yyerrok().
* examples/d/calc/calc.y, examples/d/simple/calc.y: Demonstrate yyerrok().
* tests/calc.at: Update D tests to use yyerrok().
Vector is synchronized, which is completely useless in our case (and
not even relevant when concurrency matters). No seasoned Java
programmer would use it.
Reported by Uxio Prego.
* data/skeletons/lalr1.java: Replace Vector with ArrayList.
Unfortunately its API is not as rich, and lacks lastElement and
setSize.
Shamelessly stolen from Adrian Vogelsgesang's implementation in
lalr1.cc.
* data/skeletons/lalr1.java (yycdebugNnl, yyLacCheck, yyLacEstablish)
(yyLacDiscard, yylacStack, yylacEstablished): New.
(Context): Use it.
* examples/java/calc/Calc.y: Enable LAC.