The complete symbol approach in yylex removes the need for the methods
semanticVal, startPos and endPos, which were used when the values were
reported separately.
* data/skeletons/lalr1.d: Here.
* doc/bison.texi: Remove sections about the three methods.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Remove the unused methods.
* tests/calc.at, tests/d.at, tests/scanner.at: Test it.
* data/skeletons/d.m4 (b4_public_types_declare): Here.
* data/skeletons/lalr1.d: Adjust.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y: Use it.
* tests/calc.at: Test it.
* examples/d/calc/calc.y (start, end): Replace by this...
(location): new member variable in the Lexer class.
Use it.
* tests/calc.at: Use the defined location variable.
When expanding the GLR stack, none of the pointers were updated to
reflect the new location of the displaced objects.
This fixes
748: Incorrect lookahead during nondeterministic GLR: glr2.cc
* data/skeletons/glr2.cc (yyexpandGLRStack): Update the split point
and the stack tops.
(reduceToOneStack): Factor a bit.
When debugging these parsers, we really need debug traces.
Enable them, and bind them to $YYDEBUG.
* tests/glr-regression.at: Support the YYDEBUG envvar.
As a consequence, now that syntactic ambiguities are reported, adjust
the expected output.
(No users destructors if stack 0 deleted): Don't return 0 on memory
exhaustion, really return the parser's status, and adust expectations.
Currently, yycompressStack expects the free items to be states only.
That's not the case.
Fixes 712 and 730 pass. 748 still fails, but later and
differently (heap-use-after-free).
* data/skeletons/glr2.cc (glr_stack_item::setState): New.
(glr_stack_item::yycompressStack): Use it.
* tests/glr-regression.at: Adjust.
A glr_state keeps tracks of its predecessor using an offset relative
to itself (i.e., pointer subtraction). Unfortunately we sometimes
have to compute offsets for pointers that live in different
containers, in particular in yyfillin. In that case there is no
reason for the distance between the two objects to be a multiple of
the object size (0x40 on my machine), and the resulting ptrdiff_t may
be "wrong", i.e., it does allow to recover one from the other. We
cannot use "typed" pointer arithmetics here, the Euclidean division
has it wrong. So use "plain" char* pointers.
Fixes 718 (Duplicate representation of merged trees: glr2.cc) and
examples/c++/glr/c++-types.
Still XFAIL:
712: Improper handling of embedded actions and dollar(-N) in GLR parsers: glr2.cc
730: Incorrectly initialized location for empty right-hand side in GLR: glr2.cc
748: Incorrect lookahead during nondeterministic GLR: glr2.cc
* data/skeletons/glr2.cc (glr_state::as_pointer_): New.
(glr_state::pred): Use it.
* examples/c++/glr/c++-types.test: The test passes.
* tests/glr-regression.at (Duplicate representation of merged trees:
glr2.cc): Passes.
When installed on master as of 2020-12-05 (on top of "glr2.cc: fix
when the stack is not expandable", almost all the GLR regression tests
fail (with a SEGV):
709: Badly Collapsed GLR States: glr2.cc FAILED (glr-regression.at:130)
712: Improper handling of embedded actions and dollar(-N) in GLR parsers: glr2.cc FAILED (glr-regression.at:275)
715: Improper merging of GLR delayed action sets: glr2.cc FAILED (glr-regression.at:404)
718: Duplicate representation of merged trees: glr2.cc FAILED (glr-regression.at:502)
721: User destructor for unresolved GLR semantic value: glr2.cc FAILED (glr-regression.at:566)
724: User destructor after an error during a split parse: glr2.cc FAILED (glr-regression.at:624)
727: Duplicated user destructor for lookahead: glr2.cc FAILED (glr-regression.at:724)
730: Incorrectly initialized location for empty right-hand side in GLR: glr2.cc FAILED (glr-regression.at:823)
733: No users destructors if stack 0 deleted: glr2.cc FAILED (glr-regression.at:911)
736: Corrupted semantic options if user action cuts parse: glr2.cc FAILED (glr-regression.at:974)
739: Undesirable destructors if user action cuts parse: glr2.cc FAILED (glr-regression.at:1042)
742: Leaked semantic values if user action cuts parse: glr2.cc FAILED (glr-regression.at:1173)
748: Incorrect lookahead during nondeterministic GLR: glr2.cc FAILED (glr-regression.at:1546)
751: Leaked semantic values when reporting ambiguity: glr2.cc FAILED (glr-regression.at:1639)
754: Leaked lookahead after nondeterministic parse syntax error: glr2.cc FAILED (glr-regression.at:1710)
757: Uninitialized location when reporting ambiguity: glr2.cc FAILED (glr-regression.at:1794)
766: Predicates: glr2.cc FAILED (glr-regression.at:2045)
These pass:
745: Incorrect lookahead during deterministic GLR: glr2.cc ok
760: Missed %merge type warnings when LHS type is declared later: glr2.cc ok
763: Ambiguity reports: glr2.cc ok
With Valentin Tolmer's "glr2.cc: Fix memory corruption bug" commit,
these test fail "gracefully":
712: Improper handling of embedded actions and dollar(-N) in GLR parsers: glr2.cc FAILED (glr-regression.at:268)
730: Incorrectly initialized location for empty right-hand side in GLR: glr2.cc FAILED (glr-regression.at:816)
748: Incorrect lookahead during nondeterministic GLR: glr2.cc FAILED (glr-regression.at:1539)
And these do not end:
709: Badly Collapsed GLR States: glr2.cc FAILED (glr-regression.at:123)
715: Improper merging of GLR delayed action sets: glr2.cc FAILED (glr-regression.at:397)
718: Duplicate representation of merged trees: glr2.cc FAILED (glr-regression.at:495)
751: Leaked semantic values when reporting ambiguity: glr2.cc FAILED (glr-regression.at:1632)
With "tests: glr2.cc: run the glr-regression tests", none loop, and
709, 715, and 751 pass. Only 718 still fails.
* tests/glr-regression.at: Run all the tests with glr2.cc.
* tests/local.at (AT_GLR2_CC_IF): New.
* tests/glr-regression.at: Adjust the tests to be more independent of
the language, and run them with glr.cc.
Some tests relied on yychar, yylval and yylloc being global variables:
pass arguments instead.
* tests/glr-regression.at: Instead of using AT_BISON_CHECK and
AT_COMPILE, use AT_FULL_COMPILE. This is shorter, and makes it easier
to add support for other programming languages.
* tests/glr-regression.at: Use %expect and %expect-rr in the grammar
files, rather than accepting diagnostics.
This will make it easier to support other programming languages.
The yydefgoto table uses -1 as an invalid for an impossible case (we
never use yydefgoto[0], since it corresponds to the reduction to
$accept, which never happens). Since yydefgoto is a table of state
numbers, this -1 forces a signed type uselessly, which (1) might
trigger compiler warnings when storing a value from yydefgoto into a
state number (nonnegative), and (2) wastes bits which might result in
using a int16 where a uint8 suffices.
Reported by Jot Dot <jotdot@shaw.ca>.
https://lists.gnu.org/r/bug-bison/2020-11/msg00027.html
* src/tables.c (default_goto): Use 0 rather than -1 as invalid value.
* tests/regression.at: Adjust.
This avoids heap allocation and gives minimal costs for the
creation and destruction of the YYParser.Symbol struct if
the location tracking is active.
Suggested by H. S. Teoh.
* data/skeletons/lalr1.d: Here.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y: Adjust.
* tests/calc.at: Test it.
The complete symbol approach was deemed to be the right approach for Dlang.
Now, the user can return from yylex() an instance of YYParser.Symbol structure,
which binds together the TokenKind, the semantic value and the location. Before,
the last two were reported separately to the parser.
Only the user API is changed, Bisons's internal structure is kept the same.
* data/skeletons/d.m4 (struct YYParser.Symbol): New.
* data/skeletons/lalr1.d: Change the return value.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Demonstrate it.
* tests/calc.at, tests/scanner.at: Test it.
When using lookahead correction, the method YYParser.Context.getExpectedTokens
is not annotated with const, because the method calls yylacCheck, which is not
const. Also, because of yylacStack and yylacEstablished, yylacCheck needs to
be called from the context of the parser class, which is sent as parameter to
the Context's constructor.
* data/skeletons/lalr1.d (yylacCheck, yylacEstablish, yylacDiscard,
yylacStack, yylacEstablished): New.
(Context): Use it.
* doc/bison.texi: Document it.
* tests/calc.at: Check it.
Changed from syntax_error to reportSyntaxError to be similar to the Java parser.
* data/skeletons/lalr1.d: Change the function name.
* doc/bison.texi: Document it.
* tests/local.at: Adjust.
* maint:
c++: shorten the assertions that check whether tokens are correct
c++: don't glue functions together
lalr1.cc: YY_ASSERT should use api.prefix
c++: don't use YY_ASSERT at all if parse.assert is disabled
c++: style: follow the Bison m4 quoting pattern
yacc.c: provide the Bison version as an integral macro
regen
style: make conversion of version string to int public
%require: accept version numbers with three parts ("3.7.4")
yacc.c: fix #definition of YYEMPTY
gnulib: update
doc: fix incorrect section title
doc: minor grammar fixes in counterexamples section
Working on the previous commit I realized that YY_ASSERT was used in
the generated headers, so must follow api.prefix to avoid clashes when
multiple C++ parser with variants are used.
Actually many more macros should obey api.prefix (YY_CPLUSPLUS,
YY_COPY, etc.). There was no complaint so far, so it's not urgent
enough for 3.7.4, but it should be addressed in 3.8.
* data/skeletons/variant.hh (b4_assert): New.
Use it.
* tests/local.at (AT_YYLEX_RETURN): Fix.
* tests/headers.at: Make sure variant-based C++ parsers are checked
too.
This test did find that YY_ASSERT escaped renaming (before the fix in
this commit).
Parser.Context class returns a const YYLocation, so Lexer's method
yyerror() needs to receive the location as a const parameter.
Internal error reporting flow is changed to be similar to that of
the other skeletons. Before, case YYERRLAB was calling yyerror()
with the result of yysyntax_error() as the string parameter. As the
custom error message lets the user decide if they want to use
yyerror() or not, this flow needed to be changed. Now, case YYERRLAB
calls yyreportSyntaxError(), that builds the error message using
yysyntaxErrorArguments(). Then yyreportSyntaxError() passes the
error message to the user defined syntax_error() in case of a custom
message, or to yyerror() otherwise.
In the tests in tests/calc.at, the order of the tokens needs to be
changed in order of precedence, so that the D program outputs the
expected tokens in the same order as the other parsers.
* data/skeletons/lalr1.d: Add the custom error message feature.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y: Adjust.
* tests/calc.at, tests/local.at: Test it.
In D's case, yyerrok() is a private method of the Parser class.
It can be called directly as `yyerrok()` from the grammar rules section.
* data/skeletons/lalr1.d: Add yyerrok().
* examples/d/calc/calc.y, examples/d/simple/calc.y: Demonstrate yyerrok().
* tests/calc.at: Update D tests to use yyerrok().
examples/java/calc/Calc.java:1531: warning: [deprecation] Integer(String) in Integer has been deprecated
yylval = new Integer(st.sval);
^
* examples/java/calc/Calc.y, examples/java/simple/Calc.y,
* tests/calc.at, tests/scanner.at: Use Integer.parseInt.
* tests/calc.at: Add tests for LAC in pull and push parsers.
Skip LAC: line from the logs.
* tests/local.at (reportSyntaxError): Output the error message in a
single call, to avoid having the error message on stderr be
interrupted by the debug traces of LAC in getExpectedTokens.
When printing items, it is clearer to put the dot after %emtpy rather
than before:
0 $accept: . unit "end of file"
1 unit: . assignments exp
- 2 assignments: . %empty
+ 2 assignments: %empty .
3 | . assignments assignment
Also, use the Unicode characters if they are supported.
* src/gram.c (item_print): Put the dot after %emtpy.
* tests/conflicts.at, tests/reduce.at, tests/report.at: Adjust.
* tests/local.at (AT_TOKEN_TRANSLATE_IF): New, moved from...
* tests/calc.at: here.
Instead of sorting per feature (main, yylex, calc.y) and then by
language, do the converse, so that C bits are together, etc.
Currently the core of the initial state is limited to the single rule
on $accept.
* src/lr0.c (generate_states): There may now be several rules on
$accept.
* src/graphviz.c (conclude_red): Recognize "final" transitions by the
fact that we reduce to "$accept".
* src/print.c (print_reduction): Likewise.
* src/print-xml.c (print_reduction): Likewise.
Now that the parser can read several start symbols, let's process
them, and create the corresponding rules.
* src/parse-gram.y (grammar_declaration): Accept a list of start symbols.
* src/reader.h, src/reader.c (grammar_start_symbol_set): Rename as...
(grammar_start_symbols_set): this.
* src/reader.h, src/reader.c (start_flag): Replace with...
(start_symbols): this.
* src/reader.c (grammar_start_symbols_set): Build a list of start
symbols.
(switching_token, create_start_rules): New.
(check_and_convert_grammar): Use them to turn the list of start
symbols into a set of rules.
* src/reduce.c (nonterminals_reduce): Don't complain about $accept,
it's an internal detail.
(reduce_grammar): Complain about all the start symbols that don't
derive sentences.
* src/symtab.c (startsymbol, startsymbol_loc): Remove, replaced by
start_symbols.
symbols_pack): Move the check about the start symbols
to...
* src/symlist.c (check_start_symbols): here.
Adjust to multiple start symbols.
* tests/reduce.at (Empty Language): Generalize into...
(Bad start symbols): this.
* data/skeletons/lalr1.d: Change the return value.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Adjust.
* tests/scanner.at: Adjust.
* tests/calc.at (_AT_DATA_CALC_Y(d)): New, extracted from...
(_AT_DATA_CALC_Y(c)): here.
The two grammars have been sufficiently different to be separated.
Still trying to be them together results in a maintenance burden. For
the same reason, instead of specifying the results for D and for the
rest, compute the expected results with D from the regular case.