Now that the parser can read several start symbols, let's process
them, and create the corresponding rules.
* src/parse-gram.y (grammar_declaration): Accept a list of start symbols.
* src/reader.h, src/reader.c (grammar_start_symbol_set): Rename as...
(grammar_start_symbols_set): this.
* src/reader.h, src/reader.c (start_flag): Replace with...
(start_symbols): this.
* src/reader.c (grammar_start_symbols_set): Build a list of start
symbols.
(switching_token, create_start_rules): New.
(check_and_convert_grammar): Use them to turn the list of start
symbols into a set of rules.
* src/reduce.c (nonterminals_reduce): Don't complain about $accept,
it's an internal detail.
(reduce_grammar): Complain about all the start symbols that don't
derive sentences.
* src/symtab.c (startsymbol, startsymbol_loc): Remove, replaced by
start_symbols.
symbols_pack): Move the check about the start symbols
to...
* src/symlist.c (check_start_symbols): here.
Adjust to multiple start symbols.
* tests/reduce.at (Empty Language): Generalize into...
(Bad start symbols): this.
* data/skeletons/lalr1.d: Change the return value.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Adjust.
* tests/scanner.at: Adjust.
* tests/calc.at (_AT_DATA_CALC_Y(d)): New, extracted from...
(_AT_DATA_CALC_Y(c)): here.
The two grammars have been sufficiently different to be separated.
Still trying to be them together results in a maintenance burden. For
the same reason, instead of specifying the results for D and for the
rest, compute the expected results with D from the regular case.
The D skeleton was not properly supporting @1 etc.
Reported by Adela Vais.
https://lists.gnu.org/r/bison-patches/2020-09/msg00049.html
* data/skeletons/d.m4 (b4_rhs_location): Fix it.
* tests/calc.at: Check the support of @n for all the skeletons.
This is consistent with --defines being deprecated in favor of
--header. The directive %defines is also too similar to %define.
And %header matches nicely with api.header.name.
* src/scan-gram.l (%defines): Deprecate to %header.
(%header): Scan it.
* src/parse-gram.y (PERCENT_DEFINES): Replace with...
(PERCENT_HEADER): this.
* data/skeletons/lalr1.java
* doc/bison.texi
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/java.at, tests/local.at, tests/output.at,
* tests/synclines.at, tests/types.at:
Convert most tests to check %header instead of %defines.
The name "defines" is incorrect, the generated file contains far more
than just #defines.
* src/getargs.h, src/getargs.c (-H, --header): New option.
With optional argument, just like --defines, --xml, etc.
(defines_flag): Rename as...
(header_flag): this.
Adjust dependencies.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c:
Adjust.
* examples, doc/bison.texi: Adjust.
* tests/headers.at, tests/local.at, tests/output.at: Convert most
tests from using --defines to using --header.
Currently the compiler attributes are defined in
b4_shared_declarations (that can in the header if it exists, otherwise
in the implementation file). This is not needed, only the
implementation file needs them.
Besides, glr2.cc was also defining these macros in the implementation
file, so we had two definitions.
* data/skeletons/glr.cc, data/skeletons/glr2.cc: Define the compiler
attribute macros only in the implementation files.
* tests/regression.at (Lex and parse params): Generate a header, to
make it easy to check that the header is self-sufficient.
On a case such as
%%
exp
: empty "a"
| "a" empty
empty
: %empty
we used to display
warning: shift/reduce conflict on token "a" [-Wcounterexamples]
Example: • "a"
Shift derivation
exp
↳ 2: • "a" empty
↳ 2: ε
Example: • "a"
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: •
where the shift derivation shows an item "2: empty → ε", with an
explicit "ε", but the reduce derivation shows "3: empty → •", without
"ε".
For consistency, let's always show ε/%empty in rules with an empty
rhs:
Reduce derivation
exp
↳ 1: empty "a"
↳ 3: ε •
* src/derivation.c (derivation_width, derivation_print_tree_impl):
Always show ε/%empty in counterexamples.
* tests/diagnostics.at: Check that case.
* tests/conflicts.at, tests/counterexample.at: Adjust.
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.
* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
An assertion failed when the last character is a '\' and we're in a
character or a string.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00009.html
* src/scan-gram.l: Catch unterminated escapes.
* tests/input.at (Unexpected end of file): New.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00008.html
On an empty such as
%token FOO
BAR
FOO 0
%%
input: %empty
we crash because when we find FOO 0, we decrement ntokens (since FOO
was discovered to be EOF, which is already known to be a token, so we
increment ntokens for it, and need to cancel this). This "works well"
when EOF is properly defined in one go, but here it is first defined
and later only assign token code 0. In the meanwhile BAR was given
the token number that we just decremented.
To fix this, assign symbol numbers after parsing, not during parsing,
so that we also saw all the explicit token codes. To maintain the
current numbers (I'd like to keep no difference in the output, not
just equivalence), we need to make sure the symbols are numbered in
the same order: that of appearance in the source file. So we need the
locations to be correct, which was almost the case, except for nterms
that appeared several times as LHS (i.e., several times as "foo:
..."). Fixing the use of location_of_lhs sufficed (it appears it was
intended for this use, but its implementation was unfinished: it was
always set to "false" only).
* src/symtab.c (symbol_location_as_lhs_set): Update location_of_lhs.
(symbol_code_set): Remove broken hack that decremented ntokens.
(symbol_class_set, dummy_symbol_get): Don't set number, ntokens and
nnterms.
(symbol_check_defined): Do it.
(symbols): Don't count nsyms here.
Actually, don't count nsyms at all: let it be done in...
* src/reader.c (check_and_convert_grammar): here. Define nsyms from
ntokens and nnterms after parsing.
* tests/input.at (EOF redeclared): New.
* examples/c/bistromathic/bistromathic.test: Adjust the traces: in
"%nterm <double> exp %% input: ...", exp used to be numbered before
input.
From
foo.y:1.7-11: error: %type redeclaration for bar
1 | %type <foo> bar bar
| ^~~~~
foo.y:1.7-11: note: previous declaration
1 | %type <foo> bar bar
| ^~~~~
to
foo.y:1.17-19: error: %type redeclaration for bar
1 | %type <foo> bar bar
| ^~~
foo.y:1.13-15: note: previous declaration
1 | %type <foo> bar bar
| ^~~
* src/symlist.h, src/symlist.c (symbol_list_type_set): There's no need
for the tag's location, use that of the symbol.
* src/parse-gram.y: Adjust.
* tests/input.at: Adjust.
We crash if the input contains a string containing a NUL byte.
Reported by Suhwan Song.
https://lists.gnu.org/r/bug-bison/2020-07/msg00051.html
* src/flex-scanner.h (STRING_FREE): Avoid accidental use of
last_string.
* src/scan-gram.l: Don't call STRING_FREE without calling
STRING_FINISH first.
* tests/input.at (Invalid inputs): Check that case.
Fix 6b78e50cef, "cex: make "rerun with
'-Wcex'" a note instead of a warning"
* tests/conflicts.at (-W versus %expect and %expect-rr): Fix
expectations.
Currently the suggestion to rerun is a -Wother warning:
warning: 2 shift/reduce conflicts [-Wconflicts-sr]
warning: rerun with option '-Wcounterexamples' to generate conflict counterexamples [-Wother]
Instead, let's attach it as a subnote of the diagnostic (in the
current case, -Wconflicts-sr):
warning: 2 shift/reduce conflicts [-Wconflicts-sr]
note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
* src/conflicts.c (conflicts_print): Do that.
Adjust the test suite.
From
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First derivation
a
`-> A b .
Second derivation
a
`-> A b
`-> b .
to
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example: A b .
First reduce derivation
a
`-> A b .
Second reduce derivation
a
`-> A b
`-> b .
* src/counterexample.c (print_counterexample): here.
Compute the width of the labels to properly align the values.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
Now that the derivation is no longer printed on one line, aligning the
example and the derivation is no longer useful. It can actually be
harmful, as it makes the overall structure less clear.
* src/derivation.h, src/derivation.c (derivation_print_leaves): Remove
the `prefix` argument.
* src/counterexample.c (print_counterexample): Put the example next to
its label.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
Now that we use complain, the "sections" are clearer.
* src/counterexample.c (print_counterexample): Use the empty line only
in reports.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at: Adjust.
This is more consistent, and brings benefits: users know that these
diagnostics are attached to -Wcounterexamples, and they can also click
on the hyperlink if permitted by their terminal.
We go from
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Reduce/reduce conflict on token $end:
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
to
warning: 1 reduce/reduce conflict [-Wconflicts-rr]
input.y: warning: reduce/reduce conflict on token $end [-Wcounterexamples]
Example A b .
First derivation a -> [ A b . ]
Second derivation a -> [ A b -> [ b . ] ]
with an hyperlink on -Wcounterexamples.
* src/counterexample.c (counterexample_report_reduce_reduce):
Use complain.
* tests/counterexample.at, tests/diagnostics.at, tests/report.at:
Adjust.
* src/complain.c (begin_hyperlink, end_hyperlink): New.
(warnings_print_categories): Use them.
* tests/local.at (AT_SET_ENV): Disable hyperlinks in the tests, they
contain random id's, and brackets (which is not so nice for M4).
Sometimes, understanding the derivations is difficult, because they
are serialized to fit in one line. For instance, the example taken
from the NEWS file:
%token ID
%%
s: a ID
a: expr
expr: expr ID ',' | "expr"
gave
First example expr • ID ',' ID $end
Shift derivation $accept → [ s → [ a → [ expr → [ expr • ID ',' ] ] ID ] $end ]
Second example expr • ID $end
Reduce derivation $accept → [ s → [ a → [ expr • ] ID ] $end ]
Printing as trees, it gives:
First example expr • ID ',' ID $end
Shift derivation
$accept
↳ s $end
↳ a ID
↳ expr
↳ expr • ID ','
Second example expr • ID $end
Reduce derivation
$accept
↳ s $end
↳ a ID
↳ expr •
* src/glyphs.h, src/glyphs.c (down_arrow, empty, derivation_separator):
New.
* src/derivation.c (derivation_print, derivation_print_impl): Rename
as...
(derivation_print_flat, derivation_print_flat_impl): These.
(fputs_if, derivation_depth, derivation_width, derivation_print_tree)
(derivation_print_tree_impl, derivation_print): New.
* src/counterexample.c (print_counterexample): Adjust.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
When reporting counterexamples for s/r conflicts, put the shift first.
This is more natural, and displays the default resolution first, which
is also what happens for r/r conflicts where the smallest rule number
is displayed first, and "wins".
* src/counterexample.c (counterexample): Add a shift_reduce member.
(new_counterexample): Adjust.
Swap the derivations when this is a s/r conflict.
(print_counterexample): For s/r conflicts, prefer "Shift derivation"
and "Reduce derivation" rather than "First/Second derivation".
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* NEWS, doc/bison.texi: Ditto.
Currently we use both names. Let's stick to the short one.
* src/AnnotationList.c, src/conflicts.c, src/counterexample.c,
* src/getargs.c, src/getargs.h, src/graphviz.c, src/ielr.c,
* src/lalr.c, src/print-graph.c, src/print-xml.c, src/print.c,
* src/state-item.c, src/state.c, src/state.h, src/tables.c:
s/lookahead_token/lookahead/gi.
It does not make a lot of sense to use ::= in our counterexamples,
that's not something that belongs to the Bison "vocabulary". Using
the colon makes sense, but it's too discreet. Let's use the arrow,
which we already use in some reports (HTML and Dot).
* src/gram.h (print_dot_fallback): Generalize into...
(print_fallback): this.
(print_arrow): New.
* src/derivation.c: Use it.
* NEWS, tests/conflicts.at, tests/counterexample.at,
* tests/diagnostics.at, tests/report.at: Adjust.
* doc/bison.texi: Ditto.
Unfortunately the literal `→` is output as `↦`. So we need to use
@arrow.
Currently when we output useless rules, they appear before the
grammar, but using the same invocation. As a result, the anchor is
defined twice, and the wrong one, being first, is honored.
* data/xslt/xml2xhtml.xsl (rule): Take a new 'anchor' parameter to
decide whether being an anchor, or a target.
Let it be true when output the grammar.
* tests/report.at: Adjust.
The text and Dot reports are expected to be identical when generated
directly (--report, --graph) or indirectly (via XML). The xml
testsuite had not be run for ages, let it catch up a bit.
* src/print-xml.c: Pass the type of the symbols.
* data/xslt/xml2text.xsl
Catch up with the new layout.
Display the symbol types.
Use '•', not '.'
* tests/local.at: Smash '•' to '.' when matching against the direct
text report.
* tests/report.at: Adjust XML expectations.
Reported by Martin Blais and Yuriy Solodkyy.
https://lists.gnu.org/r/help-bison/2020-05/msg00011.htmlhttps://lists.gnu.org/r/bug-bison/2020-06/msg00038.html
While at it, modernize filename_type as api.filename.type and document
it properly.
* data/skeletons/c++.m4 (filename_type): Rename as...
(api.filename.type): this.
Default to const std::string.
* data/skeletons/location.cc (position, location): Expose the
filename_type type.
Use api.filename.type.
* doc/bison.texi (%define Summary): Document api.filename.type.
(C++ Location Values): Document position::filename_type.
* src/muscle-tab.c (muscle_percent_variable_update): Ensure backward
compatibility.
* tests/c++.at: Check that using const file names is ok.
tests/input.at: Check backward compat.
AFAICT, "dotted rule" is a more frequent synonym of "item" than
"pointed rule". So let's migrate to using "dot" only.
* doc/bison.texi: Use dot/'•' rather than point/'.'.
* src/print-xml.c (print_core): Use dot rather than point. This is
not backward compatible, but AFAICT, we don't have actual user of the
XML output (but ourselves). So...
* data/xslt/xml2dot.xsl, data/xslt/xml2text.xsl,
* data/xslt/xml2xhtml.xsl, tests/report.at: ... adjust.
It makes no sense, and is actually confusing, to display twice the
same example with no visible difference.
* src/complain.h, src/complain.c (is_styled): New.
* src/counterexample.c (print_counterexample): Display the unified
example a second time only if it makes a difference.
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* tests/diagnostics.at: Make sure we do display the unifying examples
twice when colors are enabled. And check those colors.
I implemented this to print A ::= [ ], but A ::= [ %empty ] might be
clearer.
* src/parse-simulation.c (nullable_closure): Don't generate null
nonterminal derivations as leaves.
* src/derivation.c (derivation_print_impl): Don't print seperator
spaces for null nonterminal.
* tests/counterexample.at: Update test results.
This was a hack to make it easier for people to migrate from yacc.c to
lalr1.cc and from glr.c to glr.cc: when set, YYSTYPE and YYLTYPE were
`#defined`. It was never documented (just mentioned in NEWS for Bison
2.2, 2006-05-19), but was used to simplify the test suite. Stop that:
adjust the test suite to the skeletons, not the converse.
In C++ use yy::parser::semantic_type, yy::parser::location_type, and
yy::parser::token::MY_TOKEN, instead of YYSTYPE, YYLTYPE and MY_TOKEN.
* data/skeletons/glr.cc, data/skeletons/lalr1.cc: Remove its support.
* tests/actions.at, tests/c++.at, tests/calc.at: Adjust.