It makes no sense, and is actually confusing, to display twice the
same example with no visible difference.
* src/complain.h, src/complain.c (is_styled): New.
* src/counterexample.c (print_counterexample): Display the unified
example a second time only if it makes a difference.
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* tests/diagnostics.at: Make sure we do display the unifying examples
twice when colors are enabled. And check those colors.
Yet, don't change the structure identifier to avoid introducing
conflicts in Vincent Imbimbo's PR (which, amusingly enough, is about
conflicts).
* src/symtab.c: here.
* tests/diagnostics.at, tests/input.at: Adjust.
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommandation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
Currenly we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines. This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.
* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
- 4 | %token foo <error>123
+ 4 | %token foo <error>123</error>
| <error>^~~~~~</error>
* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
Since Bison 2.7, output was indented four spaces for explanatory
statements. For example:
input.y:2.7-13: error: %type redeclaration for exp
input.y:1.7-11: previous declaration
Since the introduction of caret-diagnostics, it became less clear.
Remove the indentation and display submessages as in GCC:
input.y:2.7-13: error: %type redeclaration for exp
2 | %type <float> exp
| ^~~~~~~
input.y:1.7-11: note: previous declaration
1 | %type <int> exp
| ^~~~~
* src/complain.h (SUB_INDENT): Remove.
(warnings): Add "note" to the enum.
* src/complain.h, src/complain.c (complain_indent): Replace by...
(subcomplain): this.
Adjust all dependencies.
* tests/actions.at, tests/diagnostics.at, tests/glr-regression.at,
* tests/input.at, tests/named-refs.at, tests/regression.at:
Adjust expectations.
We used to display the unexpected token first:
$ bison foo.y
foo.y:1.8-13: error: syntax error, unexpected %token, expecting character literal or identifier or <tag>
1 | %token %token
| ^~~~~~
GCC uses a different format:
$ gcc-mp-9 foo.c
foo.c:1:5: error: expected identifier or '(' before ')' token
1 | int()()()
| ^
and so does Clang:
$ clang-mp-9.0 foo.c
foo.c:1:5: error: expected identifier or '('
int()()()
^
1 error generated.
They display the unexpected token last (or not at all). Also, they
don't waste width with "syntax error". Let's try that. It gives, for
the same example as above:
$ bison foo.y
foo.y:1.8-13: error: expected character literal or identifier or <tag> before %token
1 | %token %token
| ^~~~~~
* src/complain.h, src/complain.c (syntax_error): New.
* src/parse-gram.y (yyreport_syntax_error): Use it.
* tests/diagnostics.at (Locations from M4, Tabulations and multibyte
characters from M4): These tests are actually checking a message
coming from C, not from M4. Replace with...
(Complaints from M4): This.
Currently there are two globals denoting the input file: grammar_file
is the one from the command line, and current_file which might change
because of #line. Use only the former.
* src/complain.c (error_message): here.
* tests/diagnostics.at: Adjust.
Let's make a difference between places where Perl is required for the
test (AT_PERL_REQUIRE), and the places where it's used to run the
test, but it's not not to run the test (AT_PERL_CHECK).
* tests/local.at (AT_REQUIRE): New.
(AT_PERL_CHECK, AT_PERL_REQUIRE): New.
Use them where appropriate.
* tests/local.mk ($(TESTSUITE)): Beware not to start the line with
'-pi' if Perl is empty, as Make understands this as "it's ok to fail".
Which it is not.
My previous tests (with ./configure PERL=false) have been fooled by
configure, that managed to find perl anyway. This time, I ran this on
a Fedora in Docker, without Perl.
* tests/calc.at, tests/diagnostics.at, tests/headers.at,
* tests/input.at, tests/local.at, tests/named-refs.at,
* tests/output.at, tests/regression.at, tests/skeletons.at,
* tests/synclines.at, tests/torture.at: Don't require Perl.
Because the checking of the grammar is made by phases after the whole
grammar was read, we sometimes have diagnostics that look weird. In
some case, within one type of checking, the entities are not checked
in the order in which they appear in the file. For instance, checking
symbols is done on the list of symbols sorted by tag:
foo.y:1.20-22: warning: symbol BAR is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.16-18: warning: symbol QUX is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
Let's sort them by location instead:
foo.y:1.16-18: warning: symbol 'QUX' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.20-22: warning: symbol 'BAR' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
* src/location.h (location_cmp): Be robust to empty file names.
* src/symtab.c (symbol_cmp): Sort by location.
* tests/input.at: Adjust expectations.
From
input.y:1.17-19: warning: symbol baz is used, but is not defined as a token and has no rules [-Wother]
1 | %printer {} foo baz
| ^~~
to
input.y:1.17-19: warning: symbol 'baz' is used, but is not defined as a token and has no rules; did you mean 'bar'? [-Wother]
1 | %printer {} foo baz
| ^~~
| bar
* bootstrap.conf: We need fstrcmp.
* src/symtab.c (symbol_from_uniqstr_fuzzy): New.
(complain_symbol_undeclared): Use it.
* tests/diagnostics.at (Suggestions): New.
* data/bison-default.css (insertion): Rename as...
(fixit-insert): this, as this is what GCC uses.
This commit adds the suggestion in green, on the line below the
caret-and-tildes.
foo.y:1.1-14: warning: deprecated directive: '%error-verbose', use '%define parse.error verbose' [-Wdeprecated]
1 | %error-verbose
| ^~~~~~~~~~~~~~
| %define parse.error verbose
The current approach, with location_caret_suggestion, is fragile:
there's a protocol of calls to the complain functions which is strict.
We should rather have a richer structure describing the diagnostics,
including with submessages such as the suggestions, passed in the end
to the routines in charge of formatting and printing them.
* src/location.h, src/location.c (location_caret_suggestion): New.
* src/complain.c (deprecated_directive): Use it.
* tests/diagnostics.at, tests/input.at: Adjust expectations.
We used to treat lone CRs (\r, aka ^M) as regular NLs (\n), probably
to please Classic MacOS. As of today, it makes more sense to treat \r
like a plain white space character.
https://lists.gnu.org/archive/html/bison-patches/2019-09/msg00027.html
* src/scan-gram.l (no_cr_read): Remove. Instead, use...
(eol): this new abbreviation denoting end-of-line.
* src/location.c (caret_getc): New.
(location_caret): Use it.
* tests/diagnostics.at (Carriage return): Adjust expectations.
(CR NL): New.
When the input file contains lone CRs (aka, ^M, \r), the locations see
a new line. Diagnostics look only at \n as end-of-line, so sometimes
there is an offset in diagnostics. Worse yet: sometimes we loop
endlessly waiting for \n to come from a continuous stream of EOF.
Fix that:
- check for EOF
- beware not to call end_use_class if begin_use_class was not
called (which would abort). This could happen if the actual
line is shorter that the expected one.
Prompted by a (private) report from Marc Schönefeld.
* src/location.c (location_caret): here.
* tests/diagnostics.at (Carriage return): New.
https://lists.gnu.org/archive/html/bison-patches/2019-08/msg00007.html
When Bison is started with a flag that suppresses warning messages, the
error_message() function can produce a few gigabytes of indentation
because of a dangling pointer.
* src/complain.c (error_message): Don't reset indent_ptr here, but...
(complain_indent): here.
* tests/diagnostics.at (Indentation with message suppression): Check
this case.
It is more consistent with --color=html, --color=test, etc.
* src/getargs.h, src/getargs.c (style_debug): Rename as...
(color_debug): this.
(getargs_colors): Rename --style=debug as --color=debug.
Adjust dependencies.
* cfg.mk: Disable checks where needed (e.g., we do want to check the
behavior with tabs).
(sc_at_parser_check): Remove. Unfortunately since
a11c144609 we no longer use the './'
prefix to run programs in the current directory. That was so that we
could run Java programs like the other, although they are no run with
the `./` prefix (see 967a59d2c0).
As a consequence this sc check no longer makes sense.
However, since now AT_PARSER_CHECK passes the `./` prefix itself, this
sc-check was superfluous.
* examples/c/reccalc/scan.l: Use memcpy, not strncpy.
* src/ielr.c, src/reader.c: Obfuscate "lr(0)" so that the sc-check for
"space before paren" does not fire.
* tests/diagnostics.at: Avoid space-tab, use tab-tab.
Currently we pass only the columns based on the screen-width, which is
important for the carets. But we don't pass the bytes-based columns,
which is important for the colors. Pass both.
* src/muscle-tab.c (muscle_boundary_grow): Also pass the byte-based column.
* src/location.c (location_caret): Clarify.
(boundary_set_from_string): Adjust to the new format.
* tests/diagnostics.at (Tabulations and multibyte characters from M4): New.
Locations issued from M4 need the byte-based column for the
diagnostics to work properly. Currently they were unassigned, which
typically resulted in partially non-colored diagnostics.
* src/location.c (boundary_set_from_string): Fix the parsed location.
* src/muscle-tab.c (muscle_percent_define_default): Set the byte values.
* tests/diagnostics.at (Locations from M4): New.
The "identifier and colon" of a rule is implemented as a single token,
but whose location is only that of the identifier (so that messages
about the lhs of a rule are accurate). When reducing empty rules, the
default location is the single point location on the end of the
previous symbol. As a consequence, when Bison parses a grammar, the
location of the right-hand side of an empty rule is based on the
lhs, *independently of the position of the colon*. And the colon can
be way farther, separated by comments, white spaces, including empty
lines.
As a result, some messages look really bad. For instance:
$ cat foo.y
%%
foo : /* empty */
bar
: /* empty */
gives
$ bison -Wall foo.y
foo.y:2.4: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:3.4: warning: empty rule without %empty [-Wempty-rule]
3 | bar
| ^
The carets are not at the right column, not even the right line.
This commit passes the colon "again" after the "id colon" token, which
gives more accurate locations for these messages:
$ bison -Wall foo.y
foo.y:2.10: warning: empty rule without %empty [-Wempty-rule]
2 | foo : /* empty */
| ^
foo.y:4.2: warning: empty rule without %empty [-Wempty-rule]
4 | : /* empty */
| ^
* src/scan-gram.l (SC_AFTER_IDENTIFIER): Rollback the colon, so that
we scan it again afterwards.
(INITIAL): Scan colons.
* src/parse-gram.y (COLON): New.
(rules): Parse the colon after the rule's id_colon (and possible
named reference).
* tests/actions.at, tests/conflicts.at, tests/diagnostics.at,
* tests/existing.at: Adjust.
Currently, when we quote the source file, we indent it with one space,
and preserve tabulations, so there is a discrepancy and the visual
rendering is bad. One way out is to indent with a tab instead of a
space, but then this space can be used for more information. This is
what GCC9 does. Let's play copy cats.
See
https://lists.gnu.org/archive/html/bison-patches/2019-04/msg00025.htmlhttps://developers.redhat.com/blog/2019/03/08/usability-improvements-in-gcc-9/https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html#Guidelines-for-Diagnostics
* src/location.c (location_caret): Prefix quoted lines with the line
number and a pipe, fitting 8 columns.
* tests/actions.at, tests/c++.at, tests/conflicts.at,
* tests/diagnostics.at, tests/input.at, tests/java.at,
* tests/named-refs.at, tests/reduce.at, tests/regression.at,
* tests/sets.at: Adjust expectations.
Partly by "./build-aux/update-test tests/testsuite.dir/*/testsuite.log"
repeatedly, and partly by hand.
This is a pity: efforts were invested in computing correctly the
number of screen columns consumed by multibyte characters, but the
routines that do that were fed by single-byte inputs...
As a consequence Bison never displayed correctly locations when there
are multibyte characters.
* src/scan-gram.l (mbchar): New.
Use it instead of . in the catch-all clause.
* tests/diagnostics.at (Tabulations): Enhance into...
(Tabulations and multibyte characters): this.
Single point locations (equal boundaries) are troublesome, and we were
incorrectly ending the style in their case. Which results in an abort
in libtextstyle.
There is also a confusion between columns as displayed on the
screen (which take into account multibyte characters and tabulations),
and the number of bytes. Counting the screen-column
incrementally (character by character) is uneasy (because of multibyte
characters), and I don't want to maintain a buffer of the current line
when displaying the diagnostic. So I believe the simplest solution is
to track the byte number in addition to the screen column.
* src/location.h, src/location.c (boundary): Add the byte-column.
Adjust dependencies.
* src/getargs.c, src/scan-gram.l: Adjust.
* tests/diagnostics.at: Check zero-width locations.
Enable checking of styles even when libtextstyle is not installed.
* src/getargs.h, src/getargs.c (style_debug): New.
(getargs_colors): Set it when --style=debug.
* src/complain.c (begin_use_class, end_use_class): Use it.
* tests/diagnostics.at: New.