This is something that has always bothered me: with pure parsers (and
they all should be) the user does not have (easy) access to yynerrs
at the end of the parse. In the case of error recovery, that's the
only direct means to know if there were errors. The usual approach is
to have the user maintain a counter incremented each time yyerror
is called.
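The traditional workaround can be sketched as follows. This is a minimal illustration, not Bison output: `error_count` and `fake_parse` are hypothetical, and the one-argument `yyerror` prototype is an assumption (the real one depends on options such as `%locations` and `%parse-param`).

```c
#include <stdio.h>

/* Hypothetical user-side counter: the usual workaround for not having
   access to yynerrs once a pure parser returns.  */
static int error_count = 0;

/* Assumed one-argument yyerror; the real prototype depends on the
   grammar's options.  Each call reports and counts one error.  */
void
yyerror (const char *msg)
{
  fprintf (stderr, "%s\n", msg);
  ++error_count;
}

/* Hypothetical stand-in for yyparse: it recovers from two syntax
   errors, so it still returns 0 (success).  */
static int
fake_parse (void)
{
  yyerror ("syntax error");
  yyerror ("syntax error");
  return 0;
}
```

The caller must consult both the status and `error_count`: a zero status alone does not rule out recovered errors.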
So here, also capture yynerrs in the return value of the start-symbol
parsing functions.
* data/skeletons/yacc.c (yy_parse_impl_t): New.
(yy_parse_impl): Use it.
(b4_accept): Fill it.
* examples/c/lexcalc/parse.y, examples/c/lexcalc/scan.l: No longer
pass nerrs as lex- and parse-param; just use the resulting yynerrs.
bistromathic and reccalc both demonstrate %param.
For each start symbol, generate a parsing function with a richer
return value than yyparse's plain int status. Reserve a place for the
returned semantic value, so that the caller does not have to pass a
pointer as argument to "return" that value. This also makes the call to
the parsing function independent of whether a given start-symbol is typed.
For instance, if the grammar file contains:
    %type <int> expression
    %start input expression
(so "input" is valueless) we get:
    typedef struct
    {
      int yystatus;
    } yyparse_input_t;
    yyparse_input_t yyparse_input (void);
    typedef struct
    {
      int yyvalue;
      int yystatus;
    } yyparse_expression_t;
    yyparse_expression_t yyparse_expression (void);
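A caller of such a function might look like the sketch below. It assumes the struct also carries the yynerrs count captured by the commit that stores yynerrs in the return value (the member name `yynerrs` is an assumption), and the body of `yyparse_expression` is a hypothetical stub standing in for the generated parser; `eval_expression` is likewise illustrative.

```c
#include <stdio.h>

/* Result type for the "expression" start symbol, following the example
   above, plus an assumed yynerrs member.  */
typedef struct
{
  int yyvalue;   /* semantic value of "expression" (typed as int) */
  int yystatus;  /* 0 on success, like yyparse's return value */
  int yynerrs;   /* number of syntax errors, recovered ones included */
} yyparse_expression_t;

/* Hypothetical stub: pretend we parsed an expression worth 3 after
   recovering from one syntax error.  */
static yyparse_expression_t
yyparse_expression (void)
{
  yyparse_expression_t res = { .yyvalue = 3, .yystatus = 0, .yynerrs = 1 };
  return res;
}

/* No pointer argument is needed to get the value back, and errors are
   visible even when yystatus reports success.  Returns 1 and stores
   the value only for an error-free parse.  */
static int
eval_expression (int *out)
{
  yyparse_expression_t res = yyparse_expression ();
  if (res.yystatus != 0 || res.yynerrs != 0)
    {
      fprintf (stderr, "%d syntax error(s)\n", res.yynerrs);
      return 0;
    }
  *out = res.yyvalue;
  return 1;
}
```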
This commit also changes the implementation of the parser termination:
when there are multiple start symbols, it is the initial rules that
explicitly YYACCEPT. They do that after having exported the
start-symbol's value (if it is typed):
    switch (yyn)
      {
      case 1: /* $accept: YY_EXPRESSION expression $end */
        { ((*yyvalue).TOK_expression) = (yyvsp[-1].TOK_expression); YYACCEPT; }
        break;
      case 2: /* $accept: YY_INPUT input $end */
        { YYACCEPT; }
        break;
      }
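The effect of these start-rule actions can be modeled in isolation. This is a heavily simplified, hypothetical sketch: `YYACCEPT` (a jump in the real skeleton) is replaced by a return code, and `run_action` and the enum are illustrative names; only the rule numbers and the `TOK_expression` union member mirror the generated code above.

```c
/* Value-stack element; only the member used by the example above.  */
typedef union
{
  int TOK_expression;
} YYSTYPE;

/* Hypothetical stand-ins for "keep parsing" and YYACCEPT.  */
enum { ACTION_CONTINUE, ACTION_ACCEPT };

/* Run the action of rule YYN.  YYVSP points to the top of the value
   stack ($end), so yyvsp[-1] is the start symbol's value.  YYVALUE is
   where that value is exported before accepting.  */
static int
run_action (int yyn, const YYSTYPE *yyvsp, YYSTYPE *yyvalue)
{
  switch (yyn)
    {
    case 1: /* $accept: YY_EXPRESSION expression $end */
      yyvalue->TOK_expression = yyvsp[-1].TOK_expression;
      return ACTION_ACCEPT;
    case 2: /* $accept: YY_INPUT input $end */
      /* "input" is valueless: nothing to export.  */
      return ACTION_ACCEPT;
    default:
      return ACTION_CONTINUE;
    }
}
```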
I have tried several ways to deal with termination, and this is the
one that seems best to me. It is also the most natural.
* src/scan-code.h, src/scan-code.l (obstack_for_actions): New.
* src/reader.c (grammar_rule_check_and_complete): Generate the actions
of the rules for each start symbol.
* data/skeletons/bison.m4 (b4_symbol_slot): New, with safer semantics
than type and type_tag.
* data/skeletons/yacc.c (b4_accept): New.
Generates the body of the action of the start rules.
(_b4_declare_sub_yyparse): For each start symbol define a dedicated
return type for its parsing function.
Adjust the declaration of its parsing function.
(_b4_define_sub_yyparse): Adjust the definition of the function.
* examples/c/lexcalc/parse.y: Check the case of valueless symbols.
* examples/c/lexcalc/lexcalc.test: Check start symbols.
Currently this example crashes on input such as "T (x) + y;".
The same example with glr.c works properly.
* examples/c++/glr/Makefile, examples/c++/glr/README.md,
* examples/c++/glr/c++-types.test, examples/c++/glr/c++-types.yy,
* examples/c++/glr/local.mk, examples/c++/local.mk: New.
Based on examples/c/glr/c++-types.y.
* data/skeletons/lalr1.d: Change the return value.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Adjust.
* tests/scanner.at: Adjust.
* tests/calc.at (_AT_DATA_CALC_Y(d)): New, extracted from...
(_AT_DATA_CALC_Y(c)): here.
The two grammars have become sufficiently different to be separated.
Still trying to keep them together results in a maintenance burden. For
the same reason, instead of specifying the results for D and for the
rest, compute the expected results with D from the regular case.
The D skeleton was not properly supporting @1 etc.
Reported by Adela Vais.
https://lists.gnu.org/r/bison-patches/2020-09/msg00049.html
* data/skeletons/d.m4 (b4_rhs_location): Fix it.
* tests/calc.at: Check the support of @n for all the skeletons.
This is consistent with --defines being deprecated in favor of
--header. The directive %defines is also too similar to %define.
And %header matches nicely with api.header.name.
* src/scan-gram.l (%defines): Deprecate in favor of %header.
(%header): Scan it.
* src/parse-gram.y (PERCENT_DEFINES): Replace with...
(PERCENT_HEADER): this.
* data/skeletons/lalr1.java
* doc/bison.texi
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/java.at, tests/local.at, tests/output.at,
* tests/synclines.at, tests/types.at:
Convert most tests to check %header instead of %defines.
The name "defines" is incorrect: the generated file contains far more
than just #defines.
* src/getargs.h, src/getargs.c (-H, --header): New option.
With an optional argument, just like --defines, --xml, etc.
(defines_flag): Rename as...
(header_flag): this.
Adjust dependencies.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c:
Adjust.
* examples, doc/bison.texi: Adjust.
* tests/headers.at, tests/local.at, tests/output.at: Convert most
tests from using --defines to using --header.
231. conflicts.at:1096: testing Syntax error in consistent error state: glr2.cc ...
tests/conflicts.at:1096: $CXX $CXXFLAGS $CPPFLAGS $LDFLAGS -o input input.cc $LIBS
input.cc: In member function 'YYRESULTTAG glr_stack::yyresolveValue(glr_state*)':
input.cc:2674:36: error: 'yysval' may be used uninitialized in this function [-Werror=uninitialized]
Do not initialize the variable: this way ASAN can really make sure we
do set it to a proper value.
If we initialized it, ASAN would report nothing.
* data/skeletons/c.m4 (YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN): Disable
GCC 4.6's -Wuninitialized.
* data/skeletons/glr2.cc: Disable the warning locally.
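A scoped suppression in the spirit of `YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN`/`END` can be sketched as below. The macro names, version checks, and `pick` function are assumptions for illustration, not the skeleton's actual text. `#pragma GCC diagnostic push/pop` first appeared in GCC 4.6 and `-Wmaybe-uninitialized` in GCC 4.7, hence the two cases. As the commit argues, the variable itself stays uninitialized so that dynamic checkers can still flag a genuine missing assignment.

```c
/* Hypothetical scoped suppression of uninitialized-variable warnings.  */
#if (defined __GNUC__ && !defined __clang__ \
     && 406 <= __GNUC__ * 100 + __GNUC_MINOR__)
# if __GNUC__ * 100 + __GNUC_MINOR__ < 407
/* GCC 4.6: no -Wmaybe-uninitialized; -Wuninitialized covers the case.  */
#  define IGNORE_MAYBE_UNINITIALIZED_BEGIN                  \
     _Pragma ("GCC diagnostic push")                        \
     _Pragma ("GCC diagnostic ignored \"-Wuninitialized\"")
# else
#  define IGNORE_MAYBE_UNINITIALIZED_BEGIN                  \
     _Pragma ("GCC diagnostic push")                        \
     _Pragma ("GCC diagnostic ignored \"-Wuninitialized\"") \
     _Pragma ("GCC diagnostic ignored \"-Wmaybe-uninitialized\"")
# endif
# define IGNORE_MAYBE_UNINITIALIZED_END _Pragma ("GCC diagnostic pop")
#else
# define IGNORE_MAYBE_UNINITIALIZED_BEGIN
# define IGNORE_MAYBE_UNINITIALIZED_END
#endif

/* VAL is deliberately left uninitialized.  A compiler that cannot prove
   it is set on every path reaching the copy may warn; the warning is
   silenced only around that copy.  */
static int
pick (int which, int a, int b)
{
  int val;
  if (which == 0)
    val = a;
  else if (which == 1)
    val = b;
  else
    return -1;
  IGNORE_MAYBE_UNINITIALIZED_BEGIN
  int copy = val;
  IGNORE_MAYBE_UNINITIALIZED_END
  return copy;
}
```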
231. conflicts.at:1096: testing Syntax error in consistent error state: glr2.cc ...
tests/conflicts.at:1096: $CXX $CXXFLAGS $CPPFLAGS $LDFLAGS -o input input.cc $LIBS
input.cc: In function 'int yyparse(yy::parser&)':
input.cc:3147:41: error: 'yyarg' may be used uninitialized in this function [-Werror=maybe-uninitialized]
return yytnamerr_ (yytname_[yysymbol]);
^
input.cc:2058:34: note: 'yyarg' was declared here
yy::parser::symbol_kind_type yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
^
* data/skeletons/glr2.cc (yyreportSyntaxError): Initialize yyarg.
Fix a warning triggered in GCC (at least from 4.6 to 4.9):
input.cc: In constructor 'glr_stack_item::glr_stack_item(bool)':
input.cc:1371:5: error: declaration of 'is_state' shadows a member of 'this' [-Werror=shadow]
: is_state_(is_state)
^
* data/skeletons/glr2.cc (glr_stack_item): Alpha-convert.
On the CI, tests fail with GCC 4.6 to GCC 6 as follows:
tests/synclines.at:440: COLUMNS=1000; export COLUMNS; NO_TERM_HYPERLINKS=1; export NO_TERM_HYPERLINKS; bison --color=no -fno-caret -o \"\\\"\".cc \"\\\"\".y
tests/synclines.at:440: $CXX $CXXFLAGS $CPPFLAGS $LDFLAGS -o \"\\\"\" \"\\\"\".cc $LIBS
stderr:
"\"".cc: In member function 'glr_state& glr_stack_item::getState()':
"\"".cc:1404:47: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
return *reinterpret_cast<glr_state*>(&raw_);
^
"\"".cc: In member function 'const glr_state& glr_stack_item::getState() const':
"\"".cc:1408:53: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
return *reinterpret_cast<const glr_state*>(&raw_);
^
"\"".cc: In member function 'semantic_option& glr_stack_item::getOption()':
"\"".cc:1413:53: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
return *reinterpret_cast<semantic_option*>(&raw_);
^
"\"".cc: In member function 'const semantic_option& glr_stack_item::getOption() const':
"\"".cc:1417:59: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
return *reinterpret_cast<const semantic_option*>(&raw_);
^
See also be6fa942ac.
* data/skeletons/glr2.cc (glr_stack_item): Use a temporary void*
variable to avoid type-punning issues with reinterpret_cast.
Currently the compiler attributes are defined in
b4_shared_declarations (which ends up in the header if there is one,
otherwise in the implementation file). This is not needed: only the
implementation file needs them.
Besides, glr2.cc was also defining these macros in the implementation
file, so we had two definitions.
* data/skeletons/glr.cc, data/skeletons/glr2.cc: Define the compiler
attribute macros only in the implementation files.
* tests/regression.at (Lex and parse params): Generate a header, to
make it easy to check that the header is self-sufficient.
input.cc: In constructor 'glr_stack_item::glr_stack_item(bool)':
input.cc:1423:5: error: declaration of 'isState' shadows a member of 'this' [-Werror=shadow]
: isState_(isState) {
^
test.cc:1165:45: error: declaration of 'begin' shadows a member of 'this' [-Werror=shadow]
test.cc:1167:45: error: declaration of 'end' shadows a member of 'this' [-Werror=shadow]
* data/skeletons/glr2.cc (isState): Rename as...
(is_state): this.
Formatting changes.
(reduceToOneStack): Rename variables to avoid name clashes.
The yyerror stand-alone function was used to bounce from glr.c's call
to yyerror to glr.cc's parser.error. Now that glr.c is out of the
way, just directly use parser.error.
* data/skeletons/glr2.cc (yyerror): Remove.
Adjust callers.
(b4_yyerror_args, b4_lyyerror_args, b4_pure_formals): Remove.
Now unused.
* data/skeletons/glr2.cc: Fix some documentation.
Be consistent between class/struct.
(yydoAction, yyresolveAction): Avoid passing yyparser where useless.
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.
* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
* data/skeletons/c++.m4, data/skeletons/glr.c, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c:
Be more accurate about yychar and yytoken.
Don't name local variables as if they were members.
Currently when a push parser finishes its parsing (i.e., it did not
return YYPUSH_MORE), it also clears its state. It is therefore
impossible to see if it had parse errors.
In the context of autocompletion, because error recovery might have
fired, the parser is actually already in a different state. For
instance on `(1 + + <TAB>` in the bistromathic, because there's a
`exp: "(" error ")"` recovery rule, `1 + +` tokens have already been
popped, replaced by `error`, and autocompletion thinks we are ready
for the closing ")". So here, we would like to see if there was a
syntax error, yet `yynerrs` was cleared.
In the case of a successful parse, we still have a problem: if error
recovery succeeded, we won't know it, since, again, `yynerrs` is
cleared.
It seems much more natural to leave the parser state available for
analysis when there is a failure.
To reuse the parser, we should either:
1. provide an explicit means to reinitialize a parser state for future
parses.
2. automatically reset the parser state when it is used in a new
parse.
Option 2 requires checking whether we need to reinitialize the parser
each time we call `yypush_parse`, i.e., each time we give a new token.
This seems expensive compared to Option 1, but benchmarks revealed no
difference. Option 1 is incompatible with the documentation
("After `yypush_parse` returns a status other than `YYPUSH_MORE`, the
parser instance `yyps` may be reused for a new parse.").
So Option 2 wins, reusing the private `yynew` member to record that a
parse was finished, and therefore that the state must be reset in the
next call to `yypush_parse`.
While at it, this implementation now reuses the previously enlarged
stacks from one parse to another.
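Option 2 can be sketched with a toy push parser. The member names mimic those mentioned in the commit (`yynew`, `yynerrs`, stack bottom and top), but the types, functions, return codes, and token protocol are all hypothetical: token 0 ends a parse, negative tokens count as syntax errors, and every other token is simply pushed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical return codes; not Bison's YYPUSH_MORE value.  */
enum { PUSH_DONE, PUSH_MORE };

typedef struct
{
  int yynew;           /* nonzero: reset before handling the next token */
  int yynerrs;         /* errors of the current (or last) parse */
  int *yyss;           /* stack bottom, kept across parses */
  int *yyssp;          /* stack top */
  size_t yystacksize;  /* allocated capacity of yyss */
} pstate;

/* Reset for a new parse: move the top back to the bottom, but keep the
   (possibly enlarged) stack storage.  */
static void
pstate_clear (pstate *ps)
{
  ps->yynerrs = 0;
  ps->yyssp = ps->yyss;
  ps->yynew = 0;
}

static pstate *
pstate_new (void)
{
  pstate *ps = malloc (sizeof *ps);
  if (!ps)
    return NULL;
  ps->yystacksize = 4;
  ps->yyss = malloc (ps->yystacksize * sizeof *ps->yyss);
  if (!ps->yyss)
    {
      free (ps);
      return NULL;
    }
  pstate_clear (ps);
  return ps;
}

static void
pstate_delete (pstate *ps)
{
  if (ps)
    {
      free (ps->yyss);
      free (ps);
    }
}

/* Option 2: test yynew on every call, i.e., for every token.  */
static int
push_parse (pstate *ps, int token)
{
  if (ps->yynew)
    pstate_clear (ps);
  if (token == 0)
    {
      /* Finished: defer the reset so yynerrs stays inspectable.  */
      ps->yynew = 1;
      return PUSH_DONE;
    }
  if (token < 0)
    ++ps->yynerrs;
  ptrdiff_t depth = ps->yyssp - ps->yyss;
  if ((size_t) depth == ps->yystacksize)
    {
      /* Grow the stack; the larger storage survives later resets.  */
      size_t size = 2 * ps->yystacksize;
      int *p = realloc (ps->yyss, size * sizeof *p);
      if (!p)
        abort ();
      ps->yyss = p;
      ps->yyssp = p + depth;
      ps->yystacksize = size;
    }
  *ps->yyssp++ = token;
  return PUSH_MORE;
}
```

After `PUSH_DONE`, a client such as an autocompletion routine can still read `yynerrs` before pushing the first token of the next parse.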
* data/skeletons/yacc.c (yypstate_new): Set up the stacks in their
initial configurations (setting their bottom to the stack array), and
use yypstate_clear to reset them (moving their top to their bottom).
(yypstate_delete): Adjust.
(yypush_parse): At the beginning, clear yypstate if needed, and at the
end, record when yypstate needs to be cleared.
* examples/c/bistromathic/parse.y (expected_tokens): Do not propose
autocompletion when there are parse errors.
* examples/c/bistromathic/bistromathic.test: Check that case.
The previous commit ("yacc.c: declare and initialize and the same
time") made b4_initialize_parser_state_variables useless.
* data/skeletons/yacc.c (b4_initialize_parser_state_variables): Inline
into...
(yypstate_clear): here.
In order to factor the code of push and pull parsers, the declaration
of the parser's state variables was common (as local variables in
pull parsers, and as struct members in push parsers). This results in
rather poor style in pull parsers, with first the variable declarations,
and then their initializations.
The initialization is about to differ between push and pull parsers,
so it is no longer worth keeping both cases together.
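The difference between the two cases can be sketched with a trimmed-down, hypothetical set of state variables (the real skeleton has many more, including the stacks). Locals in a pull parser can be declared and initialized at once, whereas struct members in a push parser are declared in the type and initialized elsewhere. All names below are illustrative.

```c
/* Pull parser: locals are declared and initialized together.  */
static int
pull_parse_sketch (void)
{
  int yystate = 0;
  int yyerrstatus = 0;
  int yynerrs = 0;
  /* ... the parsing loop would update these ... */
  (void) yystate;
  (void) yyerrstatus;
  return yynerrs;
}

/* Push parser: the same variables are struct members, so declaration
   and initialization necessarily happen in different places.  */
typedef struct
{
  int yystate;
  int yyerrstatus;
  int yynerrs;
} push_state_sketch;

static void
push_state_sketch_init (push_state_sketch *ps)
{
  ps->yystate = 0;
  ps->yyerrstatus = 0;
  ps->yynerrs = 0;
}
```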
* data/skeletons/yacc.c (b4_declare_parser_state_variables): Accept an
argument, and when it is set, initialize the variables.
Adjust dependencies.