* maint:
c++: shorten the assertions that check whether tokens are correct
c++: don't glue functions together
lalr1.cc: YY_ASSERT should use api.prefix
c++: don't use YY_ASSERT at all if parse.assert is disabled
c++: style: follow the Bison m4 quoting pattern
yacc.c: provide the Bison version as an integral macro
regen
style: make conversion of version string to int public
%require: accept version numbers with three parts ("3.7.4")
yacc.c: fix #definition of YYEMPTY
gnulib: update
doc: fix incorrect section title
doc: minor grammar fixes in counterexamples section
IELR needs to rule out the successors of the kernel items of the
initial state (`$accept: input • $end`). In the case of multistart,
this condition must be expressed differently: the mere item index does
not suffice.
* src/ielr.c (ielr_item_has_lookahead, ielr_compute_lookaheads): Don't
rely on the item index to check whether is_successor_of_initial_item.
It is certainly more costly than just checking the item index, but (i)
we need to compute the rule anyway, so it's not very much more costly,
and (ii) in ielr_item_has_lookahead, this situation is actually
impossible, so an optimizing compiler reading the assertions should
actually avoid this computation.
Checking that an item index is > 1 means ruling out `$accept: • input
$end` and `$accept: input • $end`. But actually only the latter is
possible there, i.e., we're checking whether this item is about a
successor of a (kernel) item of the initial state ($accept: input •
$end).
* src/ielr.c (is_successor_of_initial_item): Use a variable to name
this condition.
* src/gram.h (rule_is_initial): New.
* src/graphviz.c, src/print-xml.c, src/print.c, src/lalr.c: Use it.
Some of these occurrences were incorrect (checking whether this is
rule 0), and not behaving properly in the case of multistart.
* src/lalr.c: Remove incorrect comment (subsumed anyway by the
(correct) one in the header.
(set_goto_map): More debug traces.
(map_goto): Add an assertion.
* upstream/maint:
doc: fix typo
maint: post-release administrivia
version 3.7.3
build: don't link bison against libreadline
gnulib: update
glr.cc: fix: use symbol_name
build: fix a concurrent build issue in examples
When printing items, it is clearer to put the dot after %emtpy rather
than before:
0 $accept: . unit "end of file"
1 unit: . assignments exp
- 2 assignments: . %empty
+ 2 assignments: %empty .
3 | . assignments assignment
Also, use the Unicode characters if they are supported.
* src/gram.c (item_print): Put the dot after %emtpy.
* tests/conflicts.at, tests/reduce.at, tests/report.at: Adjust.
After all, why not?
* src/reader.c (switching_token): Use symbol_id_get.
(check_start_symbols): Require that the start symbol is a token only
if it's the only one.
* examples/c/lexcalc/parse.y: Let NUM be a start symbol.
For each start symbol, generate a parsing function with a richer
return value than the usual of yyparse. Reserve a place for the
returned semantic value, in order to avoid having to pass a pointer as
argument to "return" that value. This also makes the call to the
parsing function independent of whether a given start-symbol is typed.
For instance, if the grammar file contains:
%type <int> expression
%start input expression
(so "input" is valueless) we get
typedef struct
{
int yystatus;
} yyparse_input_t;
yyparse_input_t yyparse_input (void);
typedef struct
{
int yyvalue;
int yystatus;
} yyparse_expression_t;
yyparse_expression_t yyparse_expression (void);
This commit also changes the implementation of the parser termination:
when there are multiple start symbols, it is the initial rules that
explicitly YYACCEPT. They do that after having exported the
start-symbol's value (if it is typed):
switch (yyn)
{
case 1: /* $accept: YY_EXPRESSION expression $end */
{ ((*yyvalue).TOK_expression) = (yyvsp[-1].TOK_expression); YYACCEPT; }
break;
case 2: /* $accept: YY_INPUT input $end */
{ YYACCEPT; }
break;
I have tried several ways to deal with termination, and this is the
one that appears the best one to me. It is also the most natural.
* src/scan-code.h, src/scan-code.l (obstack_for_actions): New.
* src/reader.c (grammar_rule_check_and_complete): Generate the actions
of the rules for each start symbol.
* data/skeletons/bison.m4 (b4_symbol_slot): New, with safer semantics
than type and type_tag.
* data/skeletons/yacc.c (b4_accept): New.
Generates the body of the action of the start rules.
(_b4_declare_sub_yyparse): For each start symbol define a dedicated
return type for its parsing function.
Adjust the declaration of its parsing function.
(_b4_define_sub_yyparse): Adjust the definition of the function.
* examples/c/lexcalc/parse.y: Check the case of valueless symbols.
* examples/c/lexcalc/lexcalc.test: Check start symbols.
So far we were not checking the generated rule 0 at all. Now there
can be several of them. Instead of not checking at all, let's be more
selective on the check to run on them.
* src/reader.c (grammar_rule_check_and_complete): Don't check for
value usage for generated rules, it is ok to have a valued start
symbol, in which case it is ok for the generated rule ("accept: start
$end {}") to not use $1.
(packgram): Call grammar_rule_check_and_complete for all the rules.
Currently the core of the initial state is limited to the single rule
on $accept.
* src/lr0.c (generate_states): There may now be several rules on
$accept.
* src/graphviz.c (conclude_red): Recognize "final" transitions by the
fact that we reduce to "$accept".
* src/print.c (print_reduction): Likewise.
* src/print-xml.c (print_reduction): Likewise.
Now that the parser can read several start symbols, let's process
them, and create the corresponding rules.
* src/parse-gram.y (grammar_declaration): Accept a list of start symbols.
* src/reader.h, src/reader.c (grammar_start_symbol_set): Rename as...
(grammar_start_symbols_set): this.
* src/reader.h, src/reader.c (start_flag): Replace with...
(start_symbols): this.
* src/reader.c (grammar_start_symbols_set): Build a list of start
symbols.
(switching_token, create_start_rules): New.
(check_and_convert_grammar): Use them to turn the list of start
symbols into a set of rules.
* src/reduce.c (nonterminals_reduce): Don't complain about $accept,
it's an internal detail.
(reduce_grammar): Complain about all the start symbols that don't
derive sentences.
* src/symtab.c (startsymbol, startsymbol_loc): Remove, replaced by
start_symbols.
symbols_pack): Move the check about the start symbols
to...
* src/symlist.c (check_start_symbols): here.
Adjust to multiple start symbols.
* tests/reduce.at (Empty Language): Generalize into...
(Bad start symbols): this.
This is consistent with --defines being deprecated in favor of
--header. The directive %defines is also too similar to %define.
And %header matches nicely with api.header.name.
* src/scan-gram.l (%defines): Deprecate to %header.
(%header): Scan it.
* src/parse-gram.y (PERCENT_DEFINES): Replace with...
(PERCENT_HEADER): this.
* data/skeletons/lalr1.java
* doc/bison.texi
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/java.at, tests/local.at, tests/output.at,
* tests/synclines.at, tests/types.at:
Convert most tests to check %header instead of %defines.
The name "defines" is incorrect, the generated file contains far more
than just #defines.
* src/getargs.h, src/getargs.c (-H, --header): New option.
With optional argument, just like --defines, --xml, etc.
(defines_flag): Rename as...
(header_flag): this.
Adjust dependencies.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c:
Adjust.
* examples, doc/bison.texi: Adjust.
* tests/headers.at, tests/local.at, tests/output.at: Convert most
tests from using --defines to using --header.
On a case such as
%%
exp
: empty "a"
| "a" empty
empty
: %empty
we used to display
warning: shift/reduce conflict on token "a" [-Wcounterexamples]
Example: • "a"
Shift derivation
exp
↳ 2: • "a" empty
↳ 2: ε
Example: • "a"
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: •
where the shift derivation shows an item "2: empty → ε", with an
explicit "ε", but the reduce derivation shows "3: empty → •", without
"ε".
For consistency, let's always show ε/%empty in rules with an empty
rhs:
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: ε •
* src/derivation.c (derivation_width, derivation_print_tree_impl):
Always show ε/%empty in counterexamples.
* tests/diagnostics.at: Check that case.
* tests/conflicts.at, tests/counterexample.at: Adjust.
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.
* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
An assertion failed when the last character is a '\' and we're in a
character or a string.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00009.html
* src/scan-gram.l: Catch unterminated escapes.
* tests/input.at (Unexpected end of file): New.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00008.html
On an empty such as
%token FOO
BAR
FOO 0
%%
input: %empty
we crash because when we find FOO 0, we decrement ntokens (since FOO
was discovered to be EOF, which is already known to be a token, so we
increment ntokens for it, and need to cancel this). This "works well"
when EOF is properly defined in one go, but here it is first defined
and later only assign token code 0. In the meanwhile BAR was given
the token number that we just decremented.
To fix this, assign symbol numbers after parsing, not during parsing,
so that we also saw all the explicit token codes. To maintain the
current numbers (I'd like to keep no difference in the output, not
just equivalence), we need to make sure the symbols are numbered in
the same order: that of appearance in the source file. So we need the
locations to be correct, which was almost the case, except for nterms
that appeared several times as LHS (i.e., several times as "foo:
..."). Fixing the use of location_of_lhs sufficed (it appears it was
intended for this use, but its implementation was unfinished: it was
always set to "false" only).
* src/symtab.c (symbol_location_as_lhs_set): Update location_of_lhs.
(symbol_code_set): Remove broken hack that decremented ntokens.
(symbol_class_set, dummy_symbol_get): Don't set number, ntokens and
nnterms.
(symbol_check_defined): Do it.
(symbols): Don't count nsyms here.
Actually, don't count nsyms at all: let it be done in...
* src/reader.c (check_and_convert_grammar): here. Define nsyms from
ntokens and nnterms after parsing.
* tests/input.at (EOF redeclared): New.
* examples/c/bistromathic/bistromathic.test: Adjust the traces: in
"%nterm <double> exp %% input: ...", exp used to be numbered before
input.
* cfg.mk (_space_before_paren_exempt): Be less laxist.
* src/output.c, src/reader.c: Fix space before paren issues.
Pacify the warnings where applicable.
Older versions of GCC (4.1.2 here) don't like repeated typedefs.
CC src/bison-parse-simulation.o
src/parse-simulation.c:61: error: redefinition of typedef 'parse_state'
src/parse-simulation.h:74: error: previous declaration of 'parse_state' was here
make: *** [Makefile:7876: src/bison-parse-simulation.o] Error 1
Reported by Nelson H. F. Beebe.
* src/parse-simulation.c (parse_state): Don't typedef,
parse-simulation.h did it already.
This reverts commit d0bec3175f (which
should have read "We have a clash...", not "With have a clash...").
Now that `max()` was renamed `max_int()`, we can use `max` again, as
elsewhere in the code.
* src/counterexample.c (visited_hasher): Alpha reconversion.