When comparing traces from different machines, the mixture of
stdout/stderr in the output makes things needlessly difficult.
* src/lssi.c, src/state-item.c: Output debug traces on stderr.
This macro is not exposed to users, so make it start with 'YY_'.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* src/parse-gram.c, tests/actions.at, tests/c++.at, tests/headers.at,
* tests/local.at (YYUSE): Rename as...
(YY_USE): this.
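For illustration, here is a minimal sketch of what such a macro is
for.  The definition below is an assumption based on the common
"(void)" cast idiom, not a copy of the skeleton, and "sum" is a
hypothetical function: its local variable is only read by an
assertion, so it would be reported as unused when NDEBUG is defined.

```c
#include <assert.h>

/* Assumed definition: evaluate E and discard the result, so that the
   compiler does not warn about an otherwise unused variable.  */
#define YY_USE(E) ((void) (E))

/* Hypothetical function: sum the SIZE first elements of VALUES.  */
int
sum (const int *values, int size)
{
  int nonnegative = 0 <= size;
  /* With NDEBUG the assertion vanishes, and NONNEGATIVE would be
     unused without the YY_USE below.  */
  assert (nonnegative);
  YY_USE (nonnegative);
  int res = 0;
  for (int i = 0; i < size; ++i)
    res += values[i];
  return res;
}
```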
Currently each time we meet %merge we record this location as the
defining location (and symbol). Instead, record the first definition.
In the generated code we go from
yy0->A = merge (*yy0, *yy1);
to
yy0->S = merge (*yy0, *yy1);
where S was indeed the first symbol, and in the diagnostics we go from
glr-regr18.y:30.18-24: error: result type clash on merge function 'merge': <type2> != <type1>
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:31.13-19: error: result type clash on merge function 'merge': <type3> != <type2>
31 | sym3: %merge<merge> { $$ = 0; } ;
| ^~~~~~~
glr-regr18.y:30.18-24: note: previous declaration
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
to
glr-regr18.y:30.18-24: error: result type clash on merge function 'merge': <type2> != <type1>
30 | sym2: sym3 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
glr-regr18.y:31.13-19: error: result type clash on merge function 'merge': <type3> != <type1>
31 | sym3: %merge<merge> { $$ = 0; } ;
| ^~~~~~~
glr-regr18.y:29.18-24: note: previous declaration
29 | sym1: sym2 %merge<merge> { $$ = $1; } ;
| ^~~~~~~
where both duplicates are reported against definition 1, rather than
using definition 1 as a reference when diagnosing about definition 2,
and then 2 as a reference for 3.
* src/reader.c (record_merge_function_type): Keep the first definition.
* tests/glr-regression.at: Adjust.
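The "keep the first definition" policy can be sketched as follows.
This is a simplified illustration, not the code of src/reader.c: the
"merger" record, the fixed-size table and "record_merge" are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record of a merge function: the symbol and
   the line of its first %merge use.  */
typedef struct
{
  const char *name;
  const char *symbol;
  int line;
} merger;

enum { MAX_MERGERS = 16 };
static merger mergers[MAX_MERGERS];
static size_t nmergers = 0;

/* Record a use of %merge<NAME> on SYMBOL at LINE.  Return the entry
   holding the first definition: later uses never overwrite it, so
   every clash can be reported against that one.  */
const merger *
record_merge (const char *name, const char *symbol, int line)
{
  for (size_t i = 0; i < nmergers; ++i)
    if (strcmp (mergers[i].name, name) == 0)
      return &mergers[i];
  assert (nmergers < MAX_MERGERS);
  mergers[nmergers] = (merger) { name, symbol, line };
  return &mergers[nmergers++];
}
```

With the three uses from glr-regr18.y (lines 29, 30 and 31), every
call returns the entry recorded for line 29.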
Don't generate C code from bison, leave that to the skeletons.
* src/output.c (merger_output): Emit invocations to b4_call_merger.
* data/skeletons/glr.c, data/skeletons/glr2.cc (b4_call_merger): New.
Symbols are richer than types, and in M4 it is much simpler (and more
common) to deal with symbols rather than types. So let's associate
mergers to a symbol rather than a type name.
* src/reader.h (merger_list): Replace the 'type' member by a symbol
member.
* src/reader.c (record_merge_function_type): Take a symbol as
argument, rather than a type name.
* src/output.c (merger_output): Adjust.
And also, remove the incorrect indentation of these comments:
- /* YYR2[YYN] -- Number of symbols on the right hand side of rule YYN. */
+/* YYR2[RULE-NUM] -- Number of symbols on the right-hand side of rule RULE-NUM. */
static const yytype_int8 yyr2[] =
{
0, 2, 4, 0, 2, 1, 1, 1, 3, 2,
I don't remember why this indentation was added (in
0991e29b75), but it seems wrong,
at least for yacc.c. I suspect this was done with lalr1.cc (where
this is embedded in the class definition, so it should be indented),
but today lalr1.cc uses other routines to output these comments.
* data/skeletons/bison.m4 (b4_integral_parser_tables_map): Improve the
wording of the comments of some tables.
* data/skeletons/c.m4 (b4_integral_parser_table_define): Remove
indentation.
src/reader.c: In function 'grammar_start_symbols_add':
src/reader.c:67:24: error: declaration of 'dup' shadows a global declaration [-Werror=shadow]
* src/reader.c (grammar_start_symbols_add): Rename dup as dupl.
The yydefgoto table uses -1 as an invalid value for an impossible case (we
never use yydefgoto[0], since it corresponds to the reduction to
$accept, which never happens). Since yydefgoto is a table of state
numbers, this -1 forces a signed type uselessly, which (1) might
trigger compiler warnings when storing a value from yydefgoto into a
state number (nonnegative), and (2) wastes bits which might result in
using an int16 where a uint8 suffices.
Reported by Jot Dot <jotdot@shaw.ca>.
https://lists.gnu.org/r/bug-bison/2020-11/msg00027.html
* src/tables.c (default_goto): Use 0 rather than -1 as invalid value.
* tests/regression.at: Adjust.
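To make point (2) concrete, here is a small sketch of how a single -1
widens the type of a table.  "narrowest_type" is a hypothetical
helper written for this illustration; it is not how Bison selects
its table types.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the narrowest integer type able to hold
   every value of TABLE, which has SIZE elements.  */
const char *
narrowest_type (const int *table, int size)
{
  assert (0 < size);
  int min = table[0];
  int max = table[0];
  for (int i = 1; i < size; ++i)
    {
      if (table[i] < min)
        min = table[i];
      if (max < table[i])
        max = table[i];
    }
  if (0 <= min)
    /* No negative value: an unsigned type is enough.  */
    return max <= 255 ? "uint8" : max <= 65535 ? "uint16" : "int";
  else
    /* A single negative value forces a signed type.  */
    return -128 <= min && max <= 127 ? "int8"
      : -32768 <= min && max <= 32767 ? "int16" : "int";
}
```

A table holding state numbers up to 200 fits in a uint8 when the
invalid value is 0, but needs an int16 when it is -1.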
* maint:
c++: shorten the assertions that check whether tokens are correct
c++: don't glue functions together
lalr1.cc: YY_ASSERT should use api.prefix
c++: don't use YY_ASSERT at all if parse.assert is disabled
c++: style: follow the Bison m4 quoting pattern
yacc.c: provide the Bison version as an integral macro
regen
style: make conversion of version string to int public
%require: accept version numbers with three parts ("3.7.4")
yacc.c: fix #definition of YYEMPTY
gnulib: update
doc: fix incorrect section title
doc: minor grammar fixes in counterexamples section
IELR needs to rule out the successors of the kernel items of the
initial state (`$accept: input • $end`). In the case of multistart,
this condition must be expressed differently: the mere item index does
not suffice.
* src/ielr.c (ielr_item_has_lookahead, ielr_compute_lookaheads): Don't
rely on the item index to check whether is_successor_of_initial_item.
It is certainly more costly than just checking the item index, but (i)
we need to compute the rule anyway, so it's not very much more costly,
and (ii) in ielr_item_has_lookahead, this situation is actually
impossible, so an optimizing compiler reading the assertions should
actually avoid this computation.
Checking that an item index is > 1 means ruling out `$accept: • input
$end` and `$accept: input • $end`. But actually only the latter is
possible there, i.e., we're checking whether this item is about a
successor of a (kernel) item of the initial state ($accept: input •
$end).
* src/ielr.c (is_successor_of_initial_item): Use a variable to name
this condition.
* src/gram.h (rule_is_initial): New.
* src/graphviz.c, src/print-xml.c, src/print.c, src/lalr.c: Use it.
Some of these occurrences were incorrect (checking whether this is
rule 0), and not behaving properly in the case of multistart.
* src/lalr.c: Remove incorrect comment (subsumed anyway by the
(correct) one in the header).
(set_goto_map): More debug traces.
(map_goto): Add an assertion.
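As an illustration of why checking for rule 0 is wrong with
multistart, here is a sketch with much-simplified, hypothetical
types.  Comparing the left-hand side with $accept is an assumption
about how such a predicate can be written, not a copy of src/gram.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified grammar types.  */
typedef struct { const char *tag; } symbol;
typedef struct { const symbol *lhs; int number; } rule;

const symbol accept_symbol = { "$accept" };

/* Whether R is one of the initial rules.  With multistart there is
   one such rule per start symbol, so testing "r->number == 0" would
   only catch the first of them; look at the left-hand side instead.  */
bool
rule_is_initial (const rule *r)
{
  assert (r != NULL);
  return r->lhs == &accept_symbol;
}
```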
* upstream/maint:
doc: fix typo
maint: post-release administrivia
version 3.7.3
build: don't link bison against libreadline
gnulib: update
glr.cc: fix: use symbol_name
build: fix a concurrent build issue in examples
When printing items, it is clearer to put the dot after %empty rather
than before:
0 $accept: . unit "end of file"
1 unit: . assignments exp
- 2 assignments: . %empty
+ 2 assignments: %empty .
3 | . assignments assignment
Also, use the Unicode characters if they are supported.
* src/gram.c (item_print): Put the dot after %empty.
* tests/conflicts.at, tests/reduce.at, tests/report.at: Adjust.
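The new layout can be sketched as follows.  "item_format" and
"append" are hypothetical helpers for this illustration only: they
write to a caller-supplied buffer, use the ASCII dot, and leave out
the rule numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append S to BUF, whose capacity is SIZE.  */
static void
append (char *buf, size_t size, const char *s)
{
  assert (strlen (buf) + strlen (s) < size);
  strcat (buf, s);
}

/* Hypothetical, simplified item printer: write "LHS: rhs . rhs" into
   BUF, with the dot before the symbol at index DOT.  An empty
   right-hand side is shown as "%empty ." (dot after, not before).  */
void
item_format (char *buf, size_t size, const char *lhs,
             const char *const *rhs, int rhs_size, int dot)
{
  assert (0 < size && 0 <= dot && dot <= rhs_size);
  buf[0] = '\0';
  append (buf, size, lhs);
  append (buf, size, ":");
  if (rhs_size == 0)
    append (buf, size, " %empty .");
  else
    for (int i = 0; i <= rhs_size; ++i)
      {
        if (i == dot)
          append (buf, size, " .");
        if (i < rhs_size)
          {
            append (buf, size, " ");
            append (buf, size, rhs[i]);
          }
      }
}
```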
After all, why not?
* src/reader.c (switching_token): Use symbol_id_get.
(check_start_symbols): Require that the start symbol is a token only
if it's the only one.
* examples/c/lexcalc/parse.y: Let NUM be a start symbol.
For each start symbol, generate a parsing function with a richer
return value than the usual one of yyparse. Reserve a place for the
returned semantic value, in order to avoid having to pass a pointer as
argument to "return" that value. This also makes the call to the
parsing function independent of whether a given start-symbol is typed.
For instance, if the grammar file contains:
%type <int> expression
%start input expression
(so "input" is valueless) we get
typedef struct
{
int yystatus;
} yyparse_input_t;
yyparse_input_t yyparse_input (void);
typedef struct
{
int yyvalue;
int yystatus;
} yyparse_expression_t;
yyparse_expression_t yyparse_expression (void);
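A caller might consume such a result as sketched below.  The struct
has the same shape as the generated one above, but
"yyparse_expression" is here a stub standing in for the generated
parser (it pretends the input was the expression "42"), and
"expression_value_or" is a hypothetical caller.

```c
/* Same shape as the generated return type shown above.  */
typedef struct
{
  int yyvalue;
  int yystatus;
} yyparse_expression_t;

/* Stub standing in for the generated parser (an assumption of this
   sketch): it behaves as if it had successfully parsed "42".  */
yyparse_expression_t
yyparse_expression (void)
{
  yyparse_expression_t res = { 42, 0 };
  return res;
}

/* Hypothetical caller: return the parsed value on success, FALLBACK
   otherwise.  No pointer is needed to fetch the value.  */
int
expression_value_or (int fallback)
{
  yyparse_expression_t res = yyparse_expression ();
  /* As with yyparse, a zero status is taken to mean success.  */
  return res.yystatus == 0 ? res.yyvalue : fallback;
}
```

A caller of yyparse_input would look the same, except that its
result only has the yystatus member.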
This commit also changes the implementation of the parser termination:
when there are multiple start symbols, it is the initial rules that
explicitly YYACCEPT. They do that after having exported the
start-symbol's value (if it is typed):
switch (yyn)
{
case 1: /* $accept: YY_EXPRESSION expression $end */
{ ((*yyvalue).TOK_expression) = (yyvsp[-1].TOK_expression); YYACCEPT; }
break;
case 2: /* $accept: YY_INPUT input $end */
{ YYACCEPT; }
break;
I have tried several ways to deal with termination, and this is the
one that seems best to me. It is also the most natural.
* src/scan-code.h, src/scan-code.l (obstack_for_actions): New.
* src/reader.c (grammar_rule_check_and_complete): Generate the actions
of the rules for each start symbol.
* data/skeletons/bison.m4 (b4_symbol_slot): New, with safer semantics
than type and type_tag.
* data/skeletons/yacc.c (b4_accept): New.
Generates the body of the action of the start rules.
(_b4_declare_sub_yyparse): For each start symbol define a dedicated
return type for its parsing function.
Adjust the declaration of its parsing function.
(_b4_define_sub_yyparse): Adjust the definition of the function.
* examples/c/lexcalc/parse.y: Check the case of valueless symbols.
* examples/c/lexcalc/lexcalc.test: Check start symbols.
So far we were not checking the generated rule 0 at all. Now there
can be several of them. Instead of not checking at all, let's be more
selective about the checks to run on them.
* src/reader.c (grammar_rule_check_and_complete): Don't check for
value usage for generated rules, it is ok to have a valued start
symbol, in which case it is ok for the generated rule ("accept: start
$end {}") to not use $1.
(packgram): Call grammar_rule_check_and_complete for all the rules.
Currently the core of the initial state is limited to the single rule
on $accept.
* src/lr0.c (generate_states): There may now be several rules on
$accept.
* src/graphviz.c (conclude_red): Recognize "final" transitions by the
fact that we reduce to "$accept".
* src/print.c (print_reduction): Likewise.
* src/print-xml.c (print_reduction): Likewise.
Now that the parser can read several start symbols, let's process
them, and create the corresponding rules.
* src/parse-gram.y (grammar_declaration): Accept a list of start symbols.
* src/reader.h, src/reader.c (grammar_start_symbol_set): Rename as...
(grammar_start_symbols_set): this.
* src/reader.h, src/reader.c (start_flag): Replace with...
(start_symbols): this.
* src/reader.c (grammar_start_symbols_set): Build a list of start
symbols.
(switching_token, create_start_rules): New.
(check_and_convert_grammar): Use them to turn the list of start
symbols into a set of rules.
* src/reduce.c (nonterminals_reduce): Don't complain about $accept,
it's an internal detail.
(reduce_grammar): Complain about all the start symbols that don't
derive sentences.
* src/symtab.c (startsymbol, startsymbol_loc): Remove, replaced by
start_symbols.
(symbols_pack): Move the check about the start symbols
to...
* src/symlist.c (check_start_symbols): here.
Adjust to multiple start symbols.
* tests/reduce.at (Empty Language): Generalize into...
(Bad start symbols): this.
This is consistent with --defines being deprecated in favor of
--header. The directive %defines is also too similar to %define.
And %header matches nicely with api.header.name.
* src/scan-gram.l (%defines): Deprecate to %header.
(%header): Scan it.
* src/parse-gram.y (PERCENT_DEFINES): Replace with...
(PERCENT_HEADER): this.
* data/skeletons/lalr1.java
* doc/bison.texi
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/java.at, tests/local.at, tests/output.at,
* tests/synclines.at, tests/types.at:
Convert most tests to check %header instead of %defines.
The name "defines" is incorrect: the generated file contains far more
than just #defines.
* src/getargs.h, src/getargs.c (-H, --header): New option.
With optional argument, just like --defines, --xml, etc.
(defines_flag): Rename as...
(header_flag): this.
Adjust dependencies.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c:
Adjust.
* examples, doc/bison.texi: Adjust.
* tests/headers.at, tests/local.at, tests/output.at: Convert most
tests from using --defines to using --header.
On a case such as
%%
exp
: empty "a"
| "a" empty
empty
: %empty
we used to display
warning: shift/reduce conflict on token "a" [-Wcounterexamples]
Example: • "a"
Shift derivation
exp
↳ 2: • "a" empty
↳ 2: ε
Example: • "a"
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: •
where the shift derivation shows an item "2: empty → ε", with an
explicit "ε", but the reduce derivation shows "3: empty → •", without
"ε".
For consistency, let's always show ε/%empty in rules with an empty
rhs:
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: ε •
* src/derivation.c (derivation_width, derivation_print_tree_impl):
Always show ε/%empty in counterexamples.
* tests/diagnostics.at: Check that case.
* tests/conflicts.at, tests/counterexample.at: Adjust.