Commit Graph

2596 Commits

Author SHA1 Message Date
Akim Demaille
cfcd823e16 diagnostics: don't crash because of repeated definitions of error
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:

    The token error shall be reserved for error handling. The name
    error can be used in grammar rules. It indicates places where the
    parser can recover from a syntax error. The default value of error
    shall be 256. Its value can be changed using a %token
    declaration. The lexical analyzer should not return the value of
    error.

I think this feature is useless, the user should not have to deal with
that.  The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number.  In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers.  So
this feature is useless.

Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition.  At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.

Rather, the location of the first user definition of "error" should
become its defining location.

Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html

* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
2020-03-08 08:10:11 +01:00
Akim Demaille
2f02d9beae style: initialize some struct members
* src/symtab.c (sym_content_new): Initialize all the location members.
Not needed by the code, but disturbing values when using a debugger.
2020-03-08 08:10:11 +01:00
Akim Demaille
b638603477 diagnostics: beware of zero-width characters
Currenly we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines.  This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.

* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
2020-03-08 08:10:11 +01:00
Akim Demaille
e21ff47f5d diagnostics: be sure to close the styling when lines are too short
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
    -    4 | %token foo <error>123
    +    4 | %token foo <error>123</error>
           |            <error>^~~~~~</error>

* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
2020-03-07 10:01:52 +01:00
Akim Demaille
b82b387da9 muscles: fix incorrect decoding of $
Bug introduced in 458171e6df.
https://lists.gnu.org/archive/html/bison-patches/2013-11/msg00009.html

Reported by Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00010.html

* src/muscle-tab.c (COMMON_DECODE): "$" is coded as "$][", not "$[][".
* tests/input.at ("%define" enum variables): Check that case.
2020-03-07 07:45:10 +01:00
Akim Demaille
641e326303 code: be robust to reference with invalid tags
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.

* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
2020-03-06 17:29:26 +01:00
Akim Demaille
666df338a7 style: comment changes
* src/symtab.h, src/lr0.c: here.
2020-03-06 08:32:03 +01:00
Akim Demaille
b493c173c9 style: remove useless declarations
* src/reader.h: Don't duplicate what parse-gram.h already exposes.
* src/lr0.h: Remove useless include.
2020-03-06 08:30:21 +01:00
Adrian Vogelsgesang
aab3feb5a1 typo: succesful -> successful
* data/skeletons/lalr1.cc: here
* etc/bench.pl.in: here
* src/location.c: and here.
2020-03-06 08:29:58 +01:00
Akim Demaille
cc3760ef51 news: 3.5.2
* NEWS: Update.
2020-02-13 18:25:11 +01:00
Akim Demaille
04f3bfc596 regen 2020-01-10 19:21:35 +01:00
Akim Demaille
c67daa9a97 package: bump copyrights to 2020
Run 'make update-copyright'.
2020-01-10 19:16:23 +01:00
Akim Demaille
d55f240991 parser: pretend we are Bison 3.5
* src/parse-gram.y: Accept we're Bison 3.5.
2019-12-08 16:03:36 +01:00
Akim Demaille
6dca1eb950 regen 2019-12-06 08:27:55 +01:00
Akim Demaille
9e9e49224f diagnostics: style changes
* src/complain.h, src/complain.c: Comment changes.
* src/scan-skel.l: Reduce scopes.
* data/skeletons/bison.m4: Factor diagnostic functions.
2019-12-02 19:35:01 +01:00
Akim Demaille
ad32ec64c8 style: pacify syntax-check
* cfg.mk: No need to translate *.md files.
* data/skeletons/glr.c, data/skeletons/yacc.c: Fix space issues.
2019-11-20 07:10:27 +01:00
Akim Demaille
8a910107b3 diagnostics: complain about undeclared string tokens
String literals, which allow for better error messages, are (too)
liberally accepted by Bison, which might result in silent errors.  For
instance

    %type <exVal> cond "condition"

does not define “condition” as a string alias to 'cond' (nonterminal
symbols do not have string aliases).  It is rather equivalent to

    %nterm <exVal> cond
    %token <exVal> "condition"

i.e., it gives the type 'exVal' to the "condition" token, which was
clearly not the intention.

Introduce -Wdangling-alias to catch this.

* src/complain.h, src/complain.c: Add support for -Wdangling-alias.
(argmatch_warning_args): Sort.
* src/symtab.c (symbol_check_defined): Complain about dangling
aliases.
* doc/bison.texi: Document it.
* tests/input.at (Dangling aliases): New test.
2019-11-17 18:27:42 +01:00
Akim Demaille
28d1ca8f48 diagnostics: yacc reserves %type to nonterminals
On

    %token TOKEN1
    %type  <ival> TOKEN1 TOKEN2 't'
    %token TOKEN2
    %%
    expr:

bison -Wyacc gives

    input.y:2.15-20: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type  <ival> TOKEN1 TOKEN2 't'
          |               ^~~~~~
    input.y:2.29-31: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type  <ival> TOKEN1 TOKEN2 't'
          |                             ^~~
    input.y:2.22-27: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
        2 | %type  <ival> TOKEN1 TOKEN2 't'
          |                      ^~~~~~

The messages appear to be out of order, but they are emitted when the
error is found.

* src/symtab.h (symbol_class): Add pct_type_sym, used to denote
symbols appearing in %type.
* src/symtab.c (complain_pct_type_on_token): New.
(symbol_class_set): Check that %type is not applied to tokens.
(symbol_check_defined): pct_type_sym also means undefined.
* src/parse-gram.y (symbol_decl.1): Set the class to pct_type_sym.
* src/reader.c (grammar_current_rule_begin): pct_type_sym also means
undefined.
* tests/input.at (Yacc's %type): New.
2019-11-17 09:45:25 +01:00
Akim Demaille
60ebd8e210 regen 2019-11-16 12:54:44 +01:00
kaneko y
3765e3e790 gram.c: Fix condition of aver
* src/gram.c (grammar_dump): Fix condition of aver.
What we want to check is that rhs is followed by its rule.
2019-11-12 08:39:28 +01:00
Yuichiro Kaneko
17d34c231b gram.c: also print terminals in grammar_dump
* src/gram.c (grammar_dump): Print terminals likewise non terminals.
* tests/sets.at (Reduced Grammar): Update test case to catch up the
change and add a test case where prec and assoc are used.
2019-11-11 10:37:30 +01:00
Akim Demaille
cce6c998b6 diagnostics: add missing translation
* src/muscle-tab.c (muscle_percent_define_check_kind): Here.
2019-11-03 09:24:12 +01:00
Akim Demaille
c53b379784 style: fix cpp indentation
Reported by syntax-check.

* src/system.h: here.
2019-10-29 09:00:46 +01:00
Akim Demaille
8228d96d33 reader: reduce the "scope" of global variables
We have too many global variables, adding structure would help.  For a
start, let's hide some of the variables closer to their usage.

* src/getargs.c, src/files.h (current_file): Move to...
* src/scan-gram.c: here.
* src/scan-gram.h (gram_in, gram__flex_debug): Remove, make them
private to the scanner.
* src/reader.h, src/reader.c (reader): Take a grammar file as argument.
Move the handling of scanner variables to...
* src/scan-gram.l (gram_scanner_open, gram_scanner_close): here.
(gram_scanner_initialize): Remove, replaced by gram_scanner_open.
* src/main.c: Adjust.
2019-10-26 10:39:01 +02:00
Akim Demaille
a5fc4e3b44 regen 2019-10-26 10:39:01 +02:00
Akim Demaille
3be912e4af parser: use grammar_file instead of current_file
* src/parse-gram (%initial-action): here.
(handle_skeleton): Don't depend on the current file name to look for
"local" skeletons (subject to changes coming from "#lines"): depend
only on the initial file name, the one given on the command line.
2019-10-26 10:38:39 +02:00
Akim Demaille
4b4e532748 diagnostics: use grammar_file instead of current_file
Currently there are two globals denoting the input file: grammar_file
is the one from the command line, and current_file which might change
because of #line.  Use only the former.

* src/complain.c (error_message): here.
* tests/diagnostics.at: Adjust.
2019-10-26 09:11:40 +02:00
Akim Demaille
6e7d8ba6a7 reader: let symtab deal with the symbols
* src/reader.c (reader): Move the setting up of the builtin symbols to...
* src/symtab.c (symbols_new): here.
2019-10-25 07:48:07 +02:00
Akim Demaille
c680300a29 style: remove incorrect comment
Reported by Paul Eggert.

* src/system.h: here.
2019-10-25 07:41:38 +02:00
Akim Demaille
fa9871a2fb diagnostics: simplify location handling
Locations start at line 1.  Don't accept line 0.

* src/location.c (location_print): Don't print locations with line 0.
(location_caret): Simplify.
2019-10-24 18:00:43 +02:00
Akim Demaille
76597d01f3 build: reenable -Wtype-limits
See https://lists.gnu.org/archive/html/bug-bison/2019-10/msg00061.html
to https://lists.gnu.org/archive/html/bug-bison/2019-10/msg00073.html.

Paul Eggert's changes in gnulib do fix the issue for modern GCCs (7,
8, 9) on macOS.  Unfortunately these warnings are back on the
CI (GNU/Linux) with GCC 4.6, 4.7, (not 4.8) and 4.9.

Disable the warning locally.

* configure.ac (warn_common, warn_tests): Remove -Wtype-limits.
* src/system.h (IGNORE_TYPE_LIMITS_BEGIN, IGNORE_TYPE_LIMITS_END): New.
* src/InadequacyList.c, src/parse-gram.c, src/parse-gram.y,
* src/symtab.c: Use it.
2019-10-24 08:50:14 +02:00
Akim Demaille
bc5efb558d build: remove dmalloc support
Today sanitizers are a better alternative.

* m4/dmalloc.m4: Remove.
* configure.ac, src/system.h: Adjust.
2019-10-24 07:22:17 +02:00
Yuichiro Kaneko
3945beb1d2 style: update comment in reader.c
rrhs and rlhs were removed by b2ed6e5826.

* src/reader.c (packgram): Update comment.
2019-10-23 08:32:06 +02:00
Akim Demaille
048730c691 style: pacify syntax-check
* doc/.gitignore, src/complain.c, src/getargs.c,
* src/output.c: here.
2019-10-22 10:40:12 +02:00
Akim Demaille
ec64a0bc7e main: also free memory on errors
* src/derives.c (derives_free): Beware of NULL.
* src/main.c (main): Let the 'finish' label include memory release.
2019-10-21 17:18:32 +02:00
Akim Demaille
d76ea5ce06 style: reduce scope in derives
* src/derives.c: here.
And prefer prefix to postfix increment.
2019-10-21 17:18:32 +02:00
Akim Demaille
97d6da0c5b parser: clarify version checking
* src/parse-gram.y: Use the same conventions for gnulib as elsewhere:
<header.h>.
(str_to_version): New.
(handle_require): Use it.
Prefer < to >.
2019-10-20 17:57:28 +02:00
Paul Eggert
693e69f289 regen 2019-10-17 11:51:20 -07:00
Paul Eggert
8a4ec5d4e4 bison: check for int overflow in token numbers
* src/symtab.c: Include intprops.h
(symbol_user_token_number_set): Don’t allow user_token_number ==
INT_MAX because too much other code adds 1 to the user token number.
(symbols_token_translations_init): Complain on integer overflow
instead of indulging in undefined behavior.
2019-10-17 11:51:20 -07:00
Paul Eggert
052215a138 bison: check for int overflow when scanning
* src/scan-gram.l: Include errno.h, for errno.
(scan_integer, handle_syncline): Check for integer overflow.
* tests/input.at (too-large.y): Adjust to match new diagnostics.
2019-10-17 11:51:20 -07:00
Paul Eggert
15c1b913cf bison: check version numbers more carefully
* src/parse-gram.y: Include intprops.h.
(handle_require): Don’t indulge in undefined behavior if the major
or minor number is out of range.  Instead, check that the
resulting value is nonnegative, fits in int, and that the minor
number is less than 100.  Also, check that a number was parsed.
2019-10-17 11:51:20 -07:00
Akim Demaille
d9d37a1196 i18n: don't push too hard for '…'
Suggested by Paul Eggert.

* src/location.c (ellipsis): Clarify comment for translators.
2019-10-12 10:43:53 +02:00
Akim Demaille
c3db1394a1 regen 2019-10-11 08:52:04 +02:00
Akim Demaille
2c66acfec0 diagnostics: prefer "…" to "..." if the locale supports it
* src/location.c (ellipsis, ellipsize): New.
Use them.
2019-10-10 21:57:50 +02:00
Paul Eggert
5463291a91 Use “least” types for integers in Yacc tables
This changes the Yacc skeleton to use “least” integer types to
keep tables smaller on some platforms, which should lessen cache
pressure.  Since Bison uses the Yacc skeleton, it follows suit.
* data/skeletons/yacc.c: Include limits.h and stdint.h if this
seems to be needed.
(yytype_uint8, yytype_int8, yytype_uint16, yytype_int16):
If available, use GCC predefined macros __INT_MAX__ etc. to select
a “least” type, as this avoids namespace hassles.  Otherwise, if
available fall back on selecting a “least” type via the C99 macros
INT_MAX, INT_LEAST8_MAX, etc.  Otherwise, fall further back on one of
the builtin C99 types signed char, short, and int.  Make sure that
any selected type promotes to int.  Ignore any macros YYTYPE_INT16,
YYTYPE_INT8, YYTYPE_UINT16, YYTYPE_UINT8 defined by the user.
(ptrdiff_t, PTRDIFF_MAX): Simplify in the light of the above.
(yytype_uint8, yytype_uint16): Do not assume that unsigned char
and unsigned short promote to int, as this isn’t true on some
platforms (e.g., TI TMS320C55x).
* src/parse-gram.y (YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16)
(YYTYPE_UINT8): Remove, as these are no longer effective.
2019-10-07 00:08:19 -07:00
Akim Demaille
58302c6079 regen 2019-10-06 17:48:51 +02:00
Akim Demaille
9e6c5328d3 diagnostics: also show suggested %empty
* src/reader.c (grammar_rule_check_and_complete): Suggest to add %empty.
* tests/actions.at, tests/diagnostics.at: Adjust expectations.
2019-10-06 12:15:12 +02:00
Akim Demaille
fec13ce2db diagnostics: sort symbols per location
Because the checking of the grammar is made by phases after the whole
grammar was read, we sometimes have diagnostics that look weird.  In
some case, within one type of checking, the entities are not checked
in the order in which they appear in the file.  For instance, checking
symbols is done on the list of symbols sorted by tag:

    foo.y:1.20-22: warning: symbol BAR is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                    ^~~
    foo.y:1.16-18: warning: symbol QUX is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                ^~~

Let's sort them by location instead:

    foo.y:1.16-18: warning: symbol 'QUX' is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                ^~~
    foo.y:1.20-22: warning: symbol 'BAR' is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                    ^~~

* src/location.h (location_cmp): Be robust to empty file names.
* src/symtab.c (symbol_cmp): Sort by location.
* tests/input.at: Adjust expectations.
2019-10-06 09:54:25 +02:00
Akim Demaille
be3cf406af diagnostics: suggest fixes for undeclared symbols
From

    input.y:1.17-19: warning: symbol baz is used, but is not defined as a token and has no rules [-Wother]
         1 | %printer {} foo baz
           |                 ^~~

to

    input.y:1.17-19: warning: symbol 'baz' is used, but is not defined as a token and has no rules; did you mean 'bar'? [-Wother]
        1 | %printer {} foo baz
          |                 ^~~
          |                 bar

* bootstrap.conf: We need fstrcmp.
* src/symtab.c (symbol_from_uniqstr_fuzzy): New.
(complain_symbol_undeclared): Use it.
* tests/diagnostics.at (Suggestions): New.
* data/bison-default.css (insertion): Rename as...
(fixit-insert): this, as this is what GCC uses.
2019-10-06 09:54:25 +02:00
Akim Demaille
126c4622de style: isolate complain_symbol_undeclared
* src/symtab.c (complain_symbol_undeclared): New.
Use it.
Use quote on the guilty symbol (like GCC does, and we also do
elsewhere).
* tests/input.at: Adjust.
2019-10-06 09:54:25 +02:00