Commit Graph

370 Commits

Author SHA1 Message Date
Valentin Tolmer
ef09bf065a glr2.cc: fork glr.cc to a c++ version
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.

* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
2020-08-30 10:45:21 +02:00
Akim Demaille
b801b7b670 fix: unterminated \-escape
An assertion failed when the last character is a '\' and we're in a
character or a string.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00009.html

* src/scan-gram.l: Catch unterminated escapes.
* tests/input.at (Unexpected end of file): New.
2020-08-08 07:53:33 +02:00
Akim Demaille
b7aab2dbad fix: crash when redefining the EOF token
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00008.html

On an empty such as

    %token FOO
           BAR
           FOO 0
    %%
    input: %empty

we crash because when we find FOO 0, we decrement ntokens (since FOO
was discovered to be EOF, which is already known to be a token, so we
increment ntokens for it, and need to cancel this).  This "works well"
when EOF is properly defined in one go, but here it is first defined
and later only assign token code 0.  In the meanwhile BAR was given
the token number that we just decremented.

To fix this, assign symbol numbers after parsing, not during parsing,
so that we also saw all the explicit token codes.  To maintain the
current numbers (I'd like to keep no difference in the output, not
just equivalence), we need to make sure the symbols are numbered in
the same order: that of appearance in the source file.  So we need the
locations to be correct, which was almost the case, except for nterms
that appeared several times as LHS (i.e., several times as "foo:
...").  Fixing the use of location_of_lhs sufficed (it appears it was
intended for this use, but its implementation was unfinished: it was
always set to "false" only).

* src/symtab.c (symbol_location_as_lhs_set): Update location_of_lhs.
(symbol_code_set): Remove broken hack that decremented ntokens.
(symbol_class_set, dummy_symbol_get): Don't set number, ntokens and
nnterms.
(symbol_check_defined): Do it.
(symbols): Don't count nsyms here.
Actually, don't count nsyms at all: let it be done in...
* src/reader.c (check_and_convert_grammar): here.  Define nsyms from
ntokens and nnterms after parsing.
* tests/input.at (EOF redeclared): New.

* examples/c/bistromathic/bistromathic.test: Adjust the traces: in
"%nterm <double> exp %% input: ...", exp used to be numbered before
input.
2020-08-07 07:30:06 +02:00
Akim Demaille
cb65553449 diagnostics: better location for type redeclarations
From

    foo.y:1.7-11: error: %type redeclaration for bar
        1 | %type <foo> bar bar
          |       ^~~~~
    foo.y:1.7-11: note: previous declaration
        1 | %type <foo> bar bar
          |       ^~~~~

to

    foo.y:1.17-19: error: %type redeclaration for bar
        1 | %type <foo> bar bar
          |                 ^~~
    foo.y:1.13-15: note: previous declaration
        1 | %type <foo> bar bar
          |             ^~~

* src/symlist.h, src/symlist.c (symbol_list_type_set): There's no need
for the tag's location, use that of the symbol.
* src/parse-gram.y: Adjust.
* tests/input.at: Adjust.
2020-08-01 08:54:46 +02:00
Akim Demaille
be95a4fe29 scanner: don't crash on strings containing a NUL byte
We crash if the input contains a string containing a NUL byte.
Reported by Suhwan Song.
https://lists.gnu.org/r/bug-bison/2020-07/msg00051.html

* src/flex-scanner.h (STRING_FREE): Avoid accidental use of
last_string.
* src/scan-gram.l: Don't call STRING_FREE without calling
STRING_FINISH first.
* tests/input.at (Invalid inputs): Check that case.
2020-07-28 19:01:48 +02:00
Akim Demaille
6b78e50cef cex: make "rerun with '-Wcex'" a note instead of a warning
Currently the suggestion to rerun is a -Wother warning:

    warning: 2 shift/reduce conflicts [-Wconflicts-sr]
    warning: rerun with option '-Wcounterexamples' to generate conflict counterexamples [-Wother]

Instead, let's attach it as a subnote of the diagnostic (in the
current case, -Wconflicts-sr):

    warning: 2 shift/reduce conflicts [-Wconflicts-sr]
    note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

* src/conflicts.c (conflicts_print): Do that.
Adjust the test suite.
2020-07-21 18:57:56 +02:00
Akim Demaille
eeafc706e8 c++: by default, use const std::string for file names
Reported by Martin Blais and Yuriy Solodkyy.
https://lists.gnu.org/r/help-bison/2020-05/msg00011.html
https://lists.gnu.org/r/bug-bison/2020-06/msg00038.html

While at it, modernize filename_type as api.filename.type and document
it properly.

* data/skeletons/c++.m4 (filename_type): Rename as...
(api.filename.type): this.
Default to const std::string.
* data/skeletons/location.cc (position, location): Expose the
filename_type type.
Use api.filename.type.
* doc/bison.texi (%define Summary): Document api.filename.type.
(C++ Location Values): Document position::filename_type.
* src/muscle-tab.c (muscle_percent_variable_update): Ensure backward
compatibility.
* tests/c++.at: Check that using const file names is ok.
tests/input.at: Check backward compat.
2020-06-27 10:06:00 +02:00
Akim Demaille
c857ed4f72 style: prefer 'FOO ()' to 'FOO' for function-like macros
* src/flex-scanner.h (STRING_GROW, STRING_FINISH, STRING_FREE):
Make them function-like macros.
Adjust dependencies.
2020-06-13 15:20:56 +02:00
Akim Demaille
b0bb4cde2e cex: suggest -Wcounterexamples when there are unexpected conflicts
Suggesting -Wcounterexamples when there are conflicts is probably not
what the user wants.  If she knows her conflicts and has set
%expect/%expect-rr appropriately, we shouldn't warn.

The commit also swaps the counterexamples and the report of conflicts,
into, IMHO, a more natural order: from

    Shift/reduce conflict on token B:
    1:    3 a: A .
    1:    8 y: A . B
    Example              A • B C
    First derivation     s ::=[ a ::=[ A • ] x ::=[ B C ] ]
    Example              A • B C
    Second derivation    s ::=[ y ::=[ A • B ] c ::=[ C ] ]

    input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
    input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]

to

    input.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
    Shift/reduce conflict on token B:
    1:    3 a: A .
    1:    8 y: A . B
    Example              A • B C
    First derivation     s ::=[ a ::=[ A • ] x ::=[ B C ] ]
    Example              A • B C
    Second derivation    s ::=[ y ::=[ A • B ] c ::=[ C ] ]

    input.y:4.4: warning: rule useless in parser due to conflicts [-Wother]

* src/conflicts.c (rule_conflicts_print): Rename as...
(report_rule_expectation_mismatches): this.
Move the handling of report_counterexamples to...
(conflicts_print): Here.
Display this warning when applicable.
2020-06-10 09:51:39 +02:00
Joshua Watt
dd878d1851 bison: add command line option to map file prefixes
Teaches bison about a new command line option, --file-prefix-map OLD=NEW
(based on the -ffile-prefix-map option from GCC) which causes it to
replace and file path of OLD in the text of the output file with NEW,
mainly for header guards and comments. The primary use of this is to
make builds reproducible with different input paths, and in particular
the debugging information produced when the source code is compiled. For
example, a distro may know that the bison source code will be located at
"/usr/src/bison" and thus can generate bison files that are reproducible
with the following command:

    bison --output=/build/bison/parse.c -d --file-prefix-map=/build/bison/=/usr/src/bison/ parse.y

Importantly, this will change the header guards and #line directives
from:

    #ifndef YY_BUILD_BISON_PARSE_H
    #line 100 "/build/bison/parse.h"

to

    #ifndef YY_USR_SRC_BISON_PARSE_H
    #line 100 "/usr/src/bison/parse.h"

which is reproducible.

See https://lists.gnu.org/r/bison-patches/2020-05/msg00016.html
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>

* src/files.h, src/files.c (spec_mapped_header_file)
(mapped_dir_prefix, map_file_name, add_prefix_map): New.
* src/getargs.c (-M, --file-prefix-map): New option.
* src/output.c (prepare): Define b4_mapped_dir_prefix and
b4_spec_header_file.
* src/scan-skel.l (@ofile@): Output the mapped file name.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/location.cc,
* data/skeletons/yacc.c:
Adjust.
* doc/bison.texi: Document.
* tests/input.at, tests/output.at: Check.
2020-05-24 15:17:15 +02:00
Akim Demaille
e7aff57122 style: rename user_token_number as code
This should have been done in 3.6, but I wanted to avoid introducing
conflicts into Vincent's work on counterexamples.  It turns out it's
completely orthogonal.

* data/README.md, data/skeletons/bison.m4, data/skeletons/c++.m4,
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/java.m4,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/variant.hh, data/skeletons/yacc.c, src/conflicts.c,
* src/derives.c, src/gram.c, src/gram.h, src/output.c,
* src/parse-gram.c, src/parse-gram.y, src/print-xml.c, src/print.c,
* src/reader.c, src/symtab.c, src/symtab.h, tests/input.at,
* tests/types.at:
s/user_token_number/code/g.
Plus minor changes.
2020-05-23 08:43:58 +02:00
Akim Demaille
ff4d67ede8 CI: add GCC 10 and Clang 10
* .travis.yml: Here.
* tests/input.at, tests/regression.at: Beware of clang's -Wdocumentation.
2020-05-17 08:28:12 +02:00
Akim Demaille
465babb635 fix: do not emit nested comments
With input such as

    %token<fl> yVL_CLOCK "/*verilator sc_clock*/"

we generate

    yVL_CLOCK = 610,      /* "/*verilator sc_clock*/"  */

which is invalid since the comment will actually be closed on the
first "*/".  Let's turn "*/" into "*\/" to avoid this.  But GCC will
also warn about "/*" inside a comment, so let's "escape" it too.

Reported by Huang Rui.
https://github.com/akimd/bison/issues/38

* data/skeletons/c-like.m4 (_b4_comment): Escape comment delimiters in
comments.
* tests/input.at (Torturing the Scanner): Check thes cases.
* tests/m4.at: New.
2020-05-17 08:28:12 +02:00
Akim Demaille
cd4e799da4 error: rename the error token from YYERRCODE to YYerror
See https://lists.gnu.org/r/bison-patches/2020-04/msg00162.html.

* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.cc,
* data/skeletons/lalr1.java, doc/bison.texi,
* examples/c/bistromathic/parse.y, src/scan-gram.l, src/symtab.c
(YYERRCODE): Rename as...
(YYerror): this.
Adjust dependencies.
2020-04-28 07:54:07 +02:00
Akim Demaille
7346163840 dogfooding: use YYERRCODE in our scanner
* src/scan-gram.l: Use it.
* tests/input.at: Adjust.
2020-04-27 08:21:50 +02:00
Akim Demaille
89c4e1becf scanner: avoid spurious errors about empty character literals
On an invalid character literal such as "'\777'" we used to produce
two errors:

    input.y:2.9-12: error: invalid number after \-escape: 777
    input.y:2.8-13: error: empty character literal

Get rid of the second one.

* src/scan-gram.l (STRING_GROW_ESCAPE): New.
* tests/input.at: Adjust.
2020-04-27 08:06:49 +02:00
Akim Demaille
3262747c5b scanner: bad character literals are errors
* src/scan-gram.l: These are errors, not warnings.
* tests/input.at: Adjust.
2020-04-27 07:17:04 +02:00
Akim Demaille
286d0755f8 all: prefer YYERRCODE to YYERROR
We will not keep YYERRCODE anyway, it causes backward compatibility
issues.  So as a first step, let all the skeletons use that name,
until we have a better one.

* data/skeletons/bison.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c, doc/bison.texi, tests/headers.at,
* tests/input.at:
here.
2020-04-26 15:09:51 +02:00
Akim Demaille
ffa46e6516 skeletons: clarify the tag of special tokens
From

    GRAM_EOF = 0,                  /* $end  */
    GRAM_ERRCODE = 1,              /* error  */
    GRAM_UNDEF = 2,                /* $undefined  */

to

    GRAM_EOF = 0,                  /* "end of file"  */
    GRAM_ERRCODE = 1,              /* error  */
    GRAM_UNDEF = 2,                /* "invalid token"  */

* src/output.c (symbol_tag): New.
Use it to pass the token names and the symbol tags to the skeletons.

* tests/input.at: Adjust.
2020-04-12 13:56:44 +02:00
Akim Demaille
a555b41990 diagnostics: replace "user token number" by "token code"
Yet, don't change the structure identifier to avoid introducing
conflicts in Vincent Imbimbo's PR (which, amusingly enough, is about
conflicts).

* src/symtab.c: here.
* tests/diagnostics.at, tests/input.at: Adjust.
2020-04-12 13:56:44 +02:00
Akim Demaille
e50de09886 tokens: properly define the YYEOF token kind
Currently EOF is handled in an adhoc way, with a #define YYEOF 0 in
the implementation file.  As a result, the user has to define her own
EOF token if she wants to use it, which is a pity.

Give the $end token a visible kind name, YYEOF.  Except that in C,
where enums are not scoped, we would have collisions between all the
definitions of YYEOFs in the header files, so in C, make it
<api.PREFIX>EOF.

* data/skeletons/c.m4 (YYEOF): Override its name to avoid collisions.
Unless the user already gave it a different name.
* data/skeletons/glr.c (YYEOF): Remove.
Use ]b4_symbol(0, [id])[ instead.
Add support for "pre_epilogue", for glr.cc.
* data/skeletons/glr.cc: Remove dead code (never emitted #undefs).
* data/skeletons/yacc.c
* src/parse-gram.c
* src/reader.c
* src/symtab.c
* tests/actions.at
* tests/input.at
2020-04-12 13:56:44 +02:00
Akim Demaille
95421df67b tokens: define the "$undefined" token kind
* data/skeletons/bison.m4 (b4_symbol_token_kind): Give a definition to
$undefined.
(b4_token_visible_if): $undefined has an id.
* src/output.c (prepare_symbol_definitions): Stop lying: $undefined
_is_ a token.
* tests/input.at: Adjust.
2020-04-12 13:56:43 +02:00
Akim Demaille
a4ed94bc13 tokens: properly define the "error" token kind
There are people out there that do use YYERRCODE (the token kind of
the error token).  See for instance
3812012bb7/unixODBC-2.3.2/Drivers/nn/yylex.c.

Currently, YYERRCODE is defined by yacc.c in an adhoc way as a #define
in the *.c file only.  It belongs with the other token kinds.

YYERRCODE is not a nice name, it does not fit in our naming scheme.
YYERROR would be more logical, but it collides with the YYERROR macro.
Shall we keep the same name in all the skeletons?  Besides, to avoid
collisions in C, we need to apply the api prefix: YYERRCODE is
actually <PREFIX>ERRCODE.  This is not needed in the other languages.

* data/skeletons/bison.m4 (b4_symbol_token_kind): New.
Map the error token to "YYERRCODE".
* data/skeletons/yacc.c (YYERRCODE): Don't define it, it's handled by...
* src/output.c (prepare_symbol_definitions): this.
* tests/input.at (Redefining the error token): Check it.
2020-04-12 13:56:43 +02:00
Akim Demaille
951da960e6 merge branch 'maint'
* upstream/maint:
  maint: post-release administrivia
  version 3.5.3
  news: update for 3.5.3
  yacc.c: make sure we properly propagated the user's number for error
  diagnostics: don't crash because of repeated definitions of error
  style: initialize some struct members
  diagnostics: beware of zero-width characters
  diagnostics: be sure to close the styling when lines are too short
  muscles: fix incorrect decoding of $
  code: be robust to reference with invalid tags
  build: fix typo
  doc: update recommandation for libtextstyle
  style: comment changes
  examples: use consistently the GFDL header for readmes
  style: remove useless declarations
  typo: succesful -> successful
  README: point to tests/bison, and document --trace
  gnulib: update
  maint: post-release administrivia
2020-03-08 10:13:16 +01:00
Akim Demaille
e3812bb8c3 yacc.c: make sure we properly propagated the user's number for error
* data/skeletons/yacc.c (YYERRCODE): Be truthful.
* tests/input.at (Redefining the error token): Check that.
2020-03-08 08:10:11 +01:00
Akim Demaille
cfcd823e16 diagnostics: don't crash because of repeated definitions of error
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:

    The token error shall be reserved for error handling. The name
    error can be used in grammar rules. It indicates places where the
    parser can recover from a syntax error. The default value of error
    shall be 256. Its value can be changed using a %token
    declaration. The lexical analyzer should not return the value of
    error.

I think this feature is useless, the user should not have to deal with
that.  The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number.  In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers.  So
this feature is useless.

Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition.  At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.

Rather, the location of the first user definition of "error" should
become its defining location.

Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html

* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
2020-03-08 08:10:11 +01:00
Akim Demaille
b82b387da9 muscles: fix incorrect decoding of $
Bug introduced in 458171e6df.
https://lists.gnu.org/archive/html/bison-patches/2013-11/msg00009.html

Reported by Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00010.html

* src/muscle-tab.c (COMMON_DECODE): "$" is coded as "$][", not "$[][".
* tests/input.at ("%define" enum variables): Check that case.
2020-03-07 07:45:10 +01:00
Akim Demaille
641e326303 code: be robust to reference with invalid tags
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.

* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
2020-03-06 17:29:26 +01:00
Victor Morales Cayuela
e09a72eeb0 diagnostics: modernize the display of submessages
Since Bison 2.7, output was indented four spaces for explanatory
statements.  For example:

    input.y:2.7-13: error: %type redeclaration for exp
    input.y:1.7-11:     previous declaration

Since the introduction of caret-diagnostics, it became less clear.
Remove the indentation and display submessages as in GCC:

    input.y:2.7-13: error: %type redeclaration for exp
        2 | %type <float> exp
          |       ^~~~~~~
    input.y:1.7-11: note: previous declaration
        1 | %type <int> exp
          |       ^~~~~

* src/complain.h (SUB_INDENT): Remove.
(warnings): Add "note" to the enum.
* src/complain.h, src/complain.c (complain_indent): Replace by...
(subcomplain): this.
Adjust all dependencies.
* tests/actions.at, tests/diagnostics.at, tests/glr-regression.at,
* tests/input.at, tests/named-refs.at, tests/regression.at:
Adjust expectations.
2020-02-15 08:28:40 +01:00
Akim Demaille
77bdcc6f0c parse.error: document and diagnose the incompatibility with %token-table
* doc/bison.texi (Tokens from Literals): Move to code using
%token-table to...
(Decl Summary: %token-table): here.
* data/skeletons/bison.m4: Implement mutual exclusion.
* tests/input.at: Check it.
* doc/local.mk: Be robust to the removal of doc/.
2020-02-10 20:15:46 +01:00
Akim Demaille
fc2191f137 diagnostics: modernize bison's syntax errors
We used to display the unexpected token first:

    $ bison foo.y
    foo.y:1.8-13: error: syntax error, unexpected %token, expecting character literal or identifier or <tag>
        1 | %token %token
          |        ^~~~~~

GCC uses a different format:

    $ gcc-mp-9 foo.c
    foo.c:1:5: error: expected identifier or '(' before ')' token
        1 | int()()()
          |     ^

and so does Clang:

    $ clang-mp-9.0 foo.c
    foo.c:1:5: error: expected identifier or '('
    int()()()
        ^
    1 error generated.

They display the unexpected token last (or not at all).  Also, they
don't waste width with "syntax error".  Let's try that.  It gives, for
the same example as above:

    $ bison foo.y
    foo.y:1.8-13: error: expected character literal or identifier or <tag> before %token
        1 | %token %token
          |        ^~~~~~

* src/complain.h, src/complain.c (syntax_error): New.
* src/parse-gram.y (yyreport_syntax_error): Use it.
2020-01-23 08:30:28 +01:00
Akim Demaille
2cc361387c diagnostics: translate bison's own tokens
As a test case, support translations in Bison itself.

* src/parse-gram.y: Mark the translatable tokens.
While at it, use clearer names.
* tests/input.at: Adjust expectations.
2020-01-23 08:26:28 +01:00
Adrian Vogelsgesang
4ab2cf7450 larlr1.cc: Reject unsupported values for parse.lac
Just as the yacc.c skeleton, the lalr1.cc skeleton should reject
invalid values for parse.lac.

* data/skeletons/lalr1.cc: check validity of parse.lac
* tests/input.at: new test cases
2020-01-21 06:57:21 +01:00
Adrian Vogelsgesang
172f103c1e larlr1.cc: Reject unsupported values for parse.lac
Just as the yacc.c skeleton, the lalr1.cc skeleton should reject
invalid values for parse.lac.

* data/skeletons/lalr1.cc: check validity of parse.lac
* tests/input.at: new test cases
2020-01-21 06:22:27 +01:00
Akim Demaille
c67daa9a97 package: bump copyrights to 2020
Run 'make update-copyright'.
2020-01-10 19:16:23 +01:00
Akim Demaille
8036635251 package: bump copyrights to 2020
Run 'make update-copyright'.
2020-01-05 10:26:35 +01:00
Akim Demaille
8a910107b3 diagnostics: complain about undeclared string tokens
String literals, which allow for better error messages, are (too)
liberally accepted by Bison, which might result in silent errors.  For
instance

    %type <exVal> cond "condition"

does not define “condition” as a string alias to 'cond' (nonterminal
symbols do not have string aliases).  It is rather equivalent to

    %nterm <exVal> cond
    %token <exVal> "condition"

i.e., it gives the type 'exVal' to the "condition" token, which was
clearly not the intention.

Introduce -Wdangling-alias to catch this.

* src/complain.h, src/complain.c: Add support for -Wdangling-alias.
(argmatch_warning_args): Sort.
* src/symtab.c (symbol_check_defined): Complain about dangling
aliases.
* doc/bison.texi: Document it.
* tests/input.at (Dangling aliases): New test.
2019-11-17 18:27:42 +01:00
Akim Demaille
28d1ca8f48 diagnostics: yacc reserves %type to nonterminals
On

    %token TOKEN1
    %type  <ival> TOKEN1 TOKEN2 't'
    %token TOKEN2
    %%
    expr:

bison -Wyacc gives

    input.y:2.15-20: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type  <ival> TOKEN1 TOKEN2 't'
          |               ^~~~~~
    input.y:2.29-31: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type  <ival> TOKEN1 TOKEN2 't'
          |                             ^~~
    input.y:2.22-27: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type  <ival> TOKEN1 TOKEN2 't'
          |                      ^~~~~~

The messages appear to be out of order, but they are emitted when the
error is found.

* src/symtab.h (symbol_class): Add pct_type_sym, used to denote
symbols appearing in %type.
* src/symtab.c (complain_pct_type_on_token): New.
(symbol_class_set): Check that %type is not applied to tokens.
(symbol_check_defined): pct_type_sym also means undefined.
* src/parse-gram.y (symbol_decl.1): Set the class to pct_type_sym.
* src/reader.c (grammar_current_rule_begin): pct_type_sym also means
undefined.
* tests/input.at (Yacc's %type): New.
2019-11-17 09:45:25 +01:00
Paul Eggert
052215a138 bison: check for int overflow when scanning
* src/scan-gram.l: Include errno.h, for errno.
(scan_integer, handle_syncline): Check for integer overflow.
* tests/input.at (too-large.y): Adjust to match new diagnostics.
2019-10-17 11:51:20 -07:00
Akim Demaille
8631f35bf9 tests: factor the generation of files without the final eol
AFAICT Autotest 2.69 still does not support AT_DATA without the final
eol.

* tests/local.at (AT_DATA_NO_FINAL_EOL): New.
* tests/input.at: Use it.
2019-10-13 09:55:44 +02:00
Akim Demaille
c483b6593f tests: refactor the handling of Perl
Let's make a difference between places where Perl is required for the
test (AT_PERL_REQUIRE), and the places where it's used to run the
test, but it's not not to run the test (AT_PERL_CHECK).

* tests/local.at (AT_REQUIRE): New.
(AT_PERL_CHECK, AT_PERL_REQUIRE): New.
Use them where appropriate.

* tests/local.mk ($(TESTSUITE)): Beware not to start the line with
'-pi' if Perl is empty, as Make understands this as "it's ok to fail".
Which it is not.
2019-10-13 09:22:05 +02:00
Akim Demaille
0c56c195e0 tests: be really robust to Perl missing
My previous tests (with ./configure PERL=false) have been fooled by
configure, that managed to find perl anyway.  This time, I ran this on
a Fedora in Docker, without Perl.

* tests/calc.at, tests/diagnostics.at, tests/headers.at,
* tests/input.at, tests/local.at, tests/named-refs.at,
* tests/output.at, tests/regression.at, tests/skeletons.at,
* tests/synclines.at, tests/torture.at: Don't require Perl.
2019-10-11 06:53:45 +02:00
Akim Demaille
f41e0cf73c tests: do not depend on config.h
Currently we face test suite failures in different environments,
because of a conflict between the definitions of isnan by gnulib, and
by the C++ library:

    262. headers.at:186: testing Sane headers: %locations %debug c++ ...
    ./headers.at:186: COLUMNS=1000; export COLUMNS;  bison --color=no -fno-caret -d -o input.cc input.y
    ./headers.at:186: $CXX $CXXFLAGS $CPPFLAGS  -c -o input.o input.cc
    stderr:
    In file included from /usr/include/c++/4.8.2/cmath:44:0,
                     from /usr/include/c++/4.8.2/random:38,
                     from /usr/include/c++/4.8.2/bits/stl_algo.h:65,
                     from /usr/include/c++/4.8.2/algorithm:62,
                     from location.hh:41,
                     from input.hh:90,
                     from input.cc:50:
    /u/cs/fac/eggert/src/gnu/bison/lib/math.h: In function 'bool isnan(double)':
    /u/cs/fac/eggert/src/gnu/bison/lib/math.h:2849:1: error: new declaration 'bool isnan(double)'
     _GL_MATH_CXX_REAL_FLOATING_DECL_2 (isnan, isnan, bool)
     ^
    In file included from /usr/include/features.h:375:0,
                     from /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/os_defines.h:39,
                     from /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/c++config.h:2097,
                     from /usr/include/c++/4.8.2/cstdlib:41,
                     from input.hh:48,
                     from input.cc:50:
    /usr/include/bits/mathcalls.h:235:1: error: ambiguates old declaration 'int isnan(double)'
     __MATHDECL_1 (int,isnan,, (_Mdouble_ __value)) __attribute__ ((__const__));
     ^

There might be something to do in gnulib about this, but I believe
that gnulib should not be used in the test suite in the first place.

The test suite should work with other compilers than the one used to
compile the package.  For a start, Bison sources are more
demanding (C99) than the generated parsers.  Last time I tried, tcc
for example, was not able to compile Bison, yet our generated parsers
should compile cleanly with it.

Besides the problem at hand is with the C++ compiler, with is not the
one used to set up gnulib at configuration-time (config.h is mainly
built from probing the C compiler).

We should really not depend on gnulib in tests.

This was introduced in 2001 to check whether including
stdlib.h/string.h is safe thanks to STDC_HEADERS
(2ce1014469).  Today, we assume at least
a C90 compiler, it should be safe enough.

* tests/local.at, tests/testsuite.h: Do not include config.h.
* tests/atlocal.in (conftest.cc): Likewise.
(CPPFLAGS): Do not expose lib/, as because of this we might picked up
gnulib replacement headers for system headers.

* tests/input.at: Use int instead of ptrdiff_t, for easier portability
(some machine on the CI did not find ptrdiff_t).
* tests/c++.at: Add missing include for getchar.
2019-10-10 17:53:48 +02:00
Akim Demaille
fec13ce2db diagnostics: sort symbols per location
Because the checking of the grammar is made by phases after the whole
grammar was read, we sometimes have diagnostics that look weird.  In
some case, within one type of checking, the entities are not checked
in the order in which they appear in the file.  For instance, checking
symbols is done on the list of symbols sorted by tag:

    foo.y:1.20-22: warning: symbol BAR is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                    ^~~
    foo.y:1.16-18: warning: symbol QUX is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                ^~~

Let's sort them by location instead:

    foo.y:1.16-18: warning: symbol 'QUX' is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                ^~~
    foo.y:1.20-22: warning: symbol 'BAR' is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                    ^~~

* src/location.h (location_cmp): Be robust to empty file names.
* src/symtab.c (symbol_cmp): Sort by location.
* tests/input.at: Adjust expectations.
2019-10-06 09:54:25 +02:00
Akim Demaille
126c4622de style: isolate complain_symbol_undeclared
* src/symtab.c (complain_symbol_undeclared): New.
Use it.
Use quote on the guilty symbol (like GCC does, and we also do
elsewhere).
* tests/input.at: Adjust.
2019-10-06 09:54:25 +02:00
Akim Demaille
0b585c49ae diagnostics: display suggested update after the caret-info
This commit adds the suggestion in green, on the line below the
caret-and-tildes.

    foo.y:1.1-14: warning: deprecated directive: '%error-verbose', use '%define parse.error verbose' [-Wdeprecated]
        1 | %error-verbose
          | ^~~~~~~~~~~~~~
          | %define parse.error verbose

The current approach, with location_caret_suggestion, is fragile:
there's a protocol of calls to the complain functions which is strict.
We should rather have a richer structure describing the diagnostics,
including with submessages such as the suggestions, passed in the end
to the routines in charge of formatting and printing them.

* src/location.h, src/location.c (location_caret_suggestion): New.
* src/complain.c (deprecated_directive): Use it.
* tests/diagnostics.at, tests/input.at: Adjust expectations.
2019-10-06 08:07:57 +02:00
Akim Demaille
d96fff6115 tests: be robust to -DNDEBUG
input.y: In function 'yylex':
input.y:67:7: error: unused variable 'input_elts' [-Werror=unused-variable]
   int input_elts = sizeof input / sizeof input[0];
       ^~~~~~~~~~
cc1: all warnings being treated as errors

* tests/input.at, tests/local.at: Avoid that.
2019-10-03 09:27:40 +02:00
Paul Eggert
133edcd248 Prefer signed to unsigned integers
This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
* NEWS: Document the API change.
* boostrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l (<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>)
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. 	Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
2019-10-02 17:11:33 -07:00
Akim Demaille
8c18e3f18c api.token.raw: cannot be used with character literals
* src/parse-gram.y (CHAR): api.token.raw and character literals are
mutually exclusive.
* tests/input.at (Character literals and api.token.raw): New.
2019-09-14 10:09:08 +02:00
Akim Demaille
d94d83e10b style: tidy yacc.c
* data/skeletons/yacc.c: Include 'c.m4' first.
Then sort the handling of %define variables.
* tests/input.at: Adjust.
2019-09-14 09:55:17 +02:00