Commit Graph

2641 Commits

Author SHA1 Message Date
Akim Demaille
04904e4d28 regen 2020-04-01 08:31:48 +02:00
Akim Demaille
d7f39ac507 regen 2020-04-01 08:31:48 +02:00
Akim Demaille
f3c18c8e80 yacc.c: also define a symbol number for the empty token
This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:

    input.c: In function 'yyparse':
    input.c:1107:7: error: conversion to 'unsigned int' from 'int'
                           may change the sign of the result
                           [-Werror=sign-conversion]
     1107 |   yyn += yytoken;
          |       ^~
    input.c:1107:10: error: conversion to 'int' from 'unsigned int'
                            may change the sign of the result
                            [-Werror=sign-conversion]
     1107 |   yyn += yytoken;
          |          ^~~~~~~
    input.c:1108:47: error: comparison of integer expressions of
                            different signedness:
                            'yytype_int8' {aka 'const signed char'} and
                            'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
                            [-Werror=sign-compare]
     1108 |   if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
          |                                               ^~
    input.c:702:25: error: operand of ?: changes signedness from 'int'
                           to 'unsigned int' due to unsignedness of
                           other operand [-Werror=sign-compare]
      702 | #define YYEMPTY         (-2)
          |                         ^~~~
    input.c:1220:33: note: in expansion of macro 'YYEMPTY'
     1220 |   yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
          |                                 ^~~~~~~
    input.c:1220:41: error: unsigned conversion from 'int' to
                            'unsigned int' changes value
                            from '-2' to '4294967294'
                            [-Werror=sign-conversion]
     1220 |   yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
          |                                         ^

Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits.  We could actually even simply use "0" and shift
the rest, which would allow to write "!yytoken" to mean really
"yytoken != YYEMPTY".

* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.

* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.

* tests/regression.at: Adjust.
2020-04-01 08:31:48 +02:00
Akim Demaille
00c80bc96c yacc.c: use yysymbol_type_t instead of int for yytoken
Now that we have a proper type for internal symbol numbers, let's use
it.  More code needs conversion, e.g., printers and destructors, but
they are shared with glr.c, which is not ready yet for this change.

It will also help us deal with warnings such as (GCC9 on GNU/Linux):

    input.c: In function 'int yyparse()':
    input.c:475:37: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
      475 |   (0 <= (YYX) && (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYSYMBOL_YYUNDEF)
          |    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    input.c:1024:17: note: in expansion of macro 'YYTRANSLATE'
     1024 |       yytoken = YYTRANSLATE (yychar);
          |                 ^~~~~~~~~~~

* data/skeletons/yacc.c (yytranslate, yysymbol_name)
(yyparse_context_t, yyexpected_tokens, yypstate_expected_tokens)
(yysyntax_error_arguments):
Use yysymbol_type_t instead of int.
2020-04-01 08:31:48 +02:00
Akim Demaille
f62f1db298 regen 2020-04-01 08:31:48 +02:00
Akim Demaille
50517d578c regen 2020-03-28 15:13:27 +01:00
Akim Demaille
17a9542c4f regen 2020-03-28 15:13:27 +01:00
Akim Demaille
4192de1f41 bison: avoid using yysyntax_error_arguments
* src/parse-gram.y (yyreport_syntax_error): Use yyparse_context_token
and yyexpected_tokens.
2020-03-28 15:13:27 +01:00
Akim Demaille
84b1972c96 yacc.c: use negative numbers for errors in auxiliary functions
yyparse returns 0, 1, 2 since ages (accept, reject, memory exhausted).
Some of our auxiliary functions such as yy_lac and
yyreport_syntax_error also need to return error codes and also use 0,
1, 2.  Because it uses yy_lac, yyexpected_tokens also needs to return
"problem", "memory exhausted", but in case of success, it needs to
return the number of tokens, so it cannot use 1 and 2 as error code.
Currently it uses -1 and -2, which is later converted into 1 and 2 as
yacc.c expects it.

Let's simplify this and use consistently -1 and -2 for auxiliary
functions that are not exposed (or not yet exposed) to the user.  In
particular this will save the user from having to convert
yyexpected_tokens's -2 into yyreport_syntax_error's 2: both return -1
or -2.

* data/skeletons/yacc.c (yy_lac, yyreport_syntax_error)
(yy_lac_stack_realloc): Return -1, -2 for errors instead of 1, 2.
Adjust callers.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Do take
error codes into account.
Issue a syntax error message even if we ran out of memory.
* src/parse-gram.y, tests/local.at (yyreport_syntax_error): Adjust.
2020-03-23 07:02:36 +01:00
Akim Demaille
1079595b2a style: reduce length of private constant
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/yacc.c
(YYERROR_VERBOSE_ARGS_MAXIMUM): Rename as...
(YYARGS_MAX): this.
* src/parse-gram.y (YYERROR_VERBOSE_ARGS_MAXIMUM): Rename as...
(ARGS_MAX): this.
2020-03-23 07:02:34 +01:00
Akim Demaille
466fb66578 regen 2020-03-17 19:21:24 +01:00
Akim Demaille
951da960e6 merge branch 'maint'
* upstream/maint:
  maint: post-release administrivia
  version 3.5.3
  news: update for 3.5.3
  yacc.c: make sure we properly propagated the user's number for error
  diagnostics: don't crash because of repeated definitions of error
  style: initialize some struct members
  diagnostics: beware of zero-width characters
  diagnostics: be sure to close the styling when lines are too short
  muscles: fix incorrect decoding of $
  code: be robust to reference with invalid tags
  build: fix typo
  doc: update recommandation for libtextstyle
  style: comment changes
  examples: use consistently the GFDL header for readmes
  style: remove useless declarations
  typo: succesful -> successful
  README: point to tests/bison, and document --trace
  gnulib: update
  maint: post-release administrivia
2020-03-08 10:13:16 +01:00
Akim Demaille
cfcd823e16 diagnostics: don't crash because of repeated definitions of error
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:

    The token error shall be reserved for error handling. The name
    error can be used in grammar rules. It indicates places where the
    parser can recover from a syntax error. The default value of error
    shall be 256. Its value can be changed using a %token
    declaration. The lexical analyzer should not return the value of
    error.

I think this feature is useless, the user should not have to deal with
that.  The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number.  In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers.  So
this feature is useless.

Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition.  At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.

Rather, the location of the first user definition of "error" should
become its defining location.

Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html

* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
2020-03-08 08:10:11 +01:00
Akim Demaille
2f02d9beae style: initialize some struct members
* src/symtab.c (sym_content_new): Initialize all the location members.
Not needed by the code, but disturbing values when using a debugger.
2020-03-08 08:10:11 +01:00
Akim Demaille
b638603477 diagnostics: beware of zero-width characters
Currenly we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines.  This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.

* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
2020-03-08 08:10:11 +01:00
Akim Demaille
e21ff47f5d diagnostics: be sure to close the styling when lines are too short
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
    -    4 | %token foo <error>123
    +    4 | %token foo <error>123</error>
           |            <error>^~~~~~</error>

* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
2020-03-07 10:01:52 +01:00
Akim Demaille
b82b387da9 muscles: fix incorrect decoding of $
Bug introduced in 458171e6df.
https://lists.gnu.org/archive/html/bison-patches/2013-11/msg00009.html

Reported by Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00010.html

* src/muscle-tab.c (COMMON_DECODE): "$" is coded as "$][", not "$[][".
* tests/input.at ("%define" enum variables): Check that case.
2020-03-07 07:45:10 +01:00
Akim Demaille
641e326303 code: be robust to reference with invalid tags
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.

* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
2020-03-06 17:29:26 +01:00
Akim Demaille
666df338a7 style: comment changes
* src/symtab.h, src/lr0.c: here.
2020-03-06 08:32:03 +01:00
Akim Demaille
b493c173c9 style: remove useless declarations
* src/reader.h: Don't duplicate what parse-gram.h already exposes.
* src/lr0.h: Remove useless include.
2020-03-06 08:30:21 +01:00
Adrian Vogelsgesang
aab3feb5a1 typo: succesful -> successful
* data/skeletons/lalr1.cc: here
* etc/bench.pl.in: here
* src/location.c: and here.
2020-03-06 08:29:58 +01:00
Akim Demaille
30d01b21e7 c++: minor fixes
Address compiler warnings such as

    warning: declaration of 'yyla' shadows a member of 'yy::parser::context' [-Wshadow]

* data/skeletons/lalr1.cc (context): Don't use the same names for
variables and members.
Use foo_ for private members, as in parser.
Also, use the + trick in array accesses to please ICC and provide it
with an int.
2020-02-27 21:31:09 +01:00
Adrian Vogelsgesang
368fcf0af5 typo: succesful -> successful
* data/skeletons/lalr1.cc: here
* etc/bench.pl.in: here
* src/location.c: here
* tests/calc.at: and here
2020-02-27 18:10:39 +01:00
Akim Demaille
296660304c style: comment changes
* src/symtab.h, src/lr0.c: here.
2020-02-23 08:25:53 +01:00
Akim Demaille
323731ff74 style: avoid using 'this' as an identifier
LLDB insists on parsing 'this' as a C++ keyword, even when debugging a
C program.

* src/symtab.c: Please the dictator.
2020-02-23 08:25:53 +01:00
Akim Demaille
6f7f949708 style: remove useless declarations
* src/reader.h: Don't duplicate what parse-gram.h already exposes.
* src/lr0.h: Remove useless include.
2020-02-23 08:25:53 +01:00
Akim Demaille
7a28659495 regen 2020-02-15 08:28:57 +01:00
Victor Morales Cayuela
e09a72eeb0 diagnostics: modernize the display of submessages
Since Bison 2.7, output was indented four spaces for explanatory
statements.  For example:

    input.y:2.7-13: error: %type redeclaration for exp
    input.y:1.7-11:     previous declaration

Since the introduction of caret-diagnostics, it became less clear.
Remove the indentation and display submessages as in GCC:

    input.y:2.7-13: error: %type redeclaration for exp
        2 | %type <float> exp
          |       ^~~~~~~
    input.y:1.7-11: note: previous declaration
        1 | %type <int> exp
          |       ^~~~~

* src/complain.h (SUB_INDENT): Remove.
(warnings): Add "note" to the enum.
* src/complain.h, src/complain.c (complain_indent): Replace by...
(subcomplain): this.
Adjust all dependencies.
* tests/actions.at, tests/diagnostics.at, tests/glr-regression.at,
* tests/input.at, tests/named-refs.at, tests/regression.at:
Adjust expectations.
2020-02-15 08:28:40 +01:00
Akim Demaille
cc3760ef51 news: 3.5.2
* NEWS: Update.
2020-02-13 18:25:11 +01:00
Akim Demaille
8637f2c7d6 build: pacify syntax-check
* src/complain.c: Fix indentation.
* cfg.mk: Using strcmp is ok in the tests.
Test cases and examples don't need Bison's PO support.
2020-02-10 20:46:56 +01:00
Akim Demaille
6946149701 regen 2020-02-10 20:42:23 +01:00
Akim Demaille
18a7cfc7cf java: make the syntax error format string translatable
The error format should be translated, but contrary to the case of
C/C++, we cannot just depend on macros to adapt on the
presence/absence of '_'.  Let's consider that the message format is to
be translated iff there are some internationalized tokens.

* src/output.c (prepare_symbol_names): Define b4_has_translations.
* data/skeletons/java.m4 (b4_trans): New.
* data/skeletons/lalr1.java: Use it to emit translatable or not the
format string.
2020-02-08 11:24:53 +01:00
Akim Demaille
52db24b2bc java: add support for parse.error=detailed
In Java there is no need for N_ and yytranslate_.  So instead of
hard-coding the use of N_ in the table of the symbol names, rely on
b4_symbol_translate.

* src/output.c (prepare_symbol_names): Use b4_symbol_translate instead
of N_.
* data/skeletons/c.m4 (b4_symbol_translate): New.
* data/skeletons/lalr1.java (yysymbolName): New.
Use it.
* examples/java/calc/Calc.y: Use parse.error=detailed.
* tests/calc.at: Check parse.error=detailed.
2020-02-08 11:24:53 +01:00
Akim Demaille
e6b0612f91 bison: pretend to 3.6 already
* src/parse-gram.y: here.
2020-01-26 13:29:18 +01:00
Akim Demaille
81849520cd regen 2020-01-23 08:30:51 +01:00
Akim Demaille
fc2191f137 diagnostics: modernize bison's syntax errors
We used to display the unexpected token first:

    $ bison foo.y
    foo.y:1.8-13: error: syntax error, unexpected %token, expecting character literal or identifier or <tag>
        1 | %token %token
          |        ^~~~~~

GCC uses a different format:

    $ gcc-mp-9 foo.c
    foo.c:1:5: error: expected identifier or '(' before ')' token
        1 | int()()()
          |     ^

and so does Clang:

    $ clang-mp-9.0 foo.c
    foo.c:1:5: error: expected identifier or '('
    int()()()
        ^
    1 error generated.

They display the unexpected token last (or not at all).  Also, they
don't waste width with "syntax error".  Let's try that.  It gives, for
the same example as above:

    $ bison foo.y
    foo.y:1.8-13: error: expected character literal or identifier or <tag> before %token
        1 | %token %token
          |        ^~~~~~

* src/complain.h, src/complain.c (syntax_error): New.
* src/parse-gram.y (yyreport_syntax_error): Use it.
2020-01-23 08:30:28 +01:00
Akim Demaille
6fb362c87a regen 2020-01-23 08:26:33 +01:00
Akim Demaille
46ab1d0cbe diagnostics: report syntax errors in color
* src/parse-gram.y (parse.error): Set to 'custom'.
(yyreport_syntax_error): New.
* data/bison-default.css (.expected, .unexpected): New.
* tests/diagnostics.at: Adjust.
2020-01-23 08:26:33 +01:00
Akim Demaille
f54a5b303b regen 2020-01-23 08:26:33 +01:00
Akim Demaille
2cc361387c diagnostics: translate bison's own tokens
As a test case, support translations in Bison itself.

* src/parse-gram.y: Mark the translatable tokens.
While at it, use clearer names.
* tests/input.at: Adjust expectations.
2020-01-23 08:26:28 +01:00
Akim Demaille
e6d1289f4a diagnostics: handle -fno-caret in the called functions
Don't force callers of location_caret to have to deal with flags that
disable it.

* src/location.h, src/location.c (location_caret)
(location_caret_suggestion): Early return if disabled.
* src/complain.c: Simplify.
2020-01-22 22:31:41 +01:00
Akim Demaille
6ada985ff3 parsers: issue tname with i18n markup
Some users would like to avoid having to "parse" the *.y file to find
the strings to translate.  Let's issue the translatable tokens with N_
to allow "parsing" the generated parsers instead.

See
https://lists.gnu.org/archive/html/bison-patches/2019-01/msg00015.html

* src/output.c (prepare_symbol_names): Issue symbol_names with N_()
markup.
2020-01-19 21:23:11 +01:00
Akim Demaille
1db962716a regen 2020-01-19 21:23:11 +01:00
Akim Demaille
9096955fba parsers: support translatable token aliases
In addition to

    %token NUM "number"

accept

    %token NUM _("number")

in which case the token will be translated in error messages.
Do not use _() in the output if there are no translatable tokens.

* src/symtab.h, src/symtab.c (symbol): Add a 'translatable' member.
* src/parse-gram.y (TSTRING): New token.
(string_as_id.opt): Replace with...
(alias): this.
Use it.
* src/scan-gram.l (SC_ESCAPED_TSTRING): New start conditions, to match
TSTRINGs.
* src/output.c (prepare_symbols): Define b4_translatable if there are
translatable strings.

* data/skeletons/glr.c, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c (yytnamerr): Receive b4_translatable, and use it.
2020-01-19 21:23:11 +01:00
Akim Demaille
d9df62bfcd yacc.c: escape trigraphs in detailed parse.error
* src/output.c (escape_trigraphs, xescape_trigraphs): New.
(prepare_symbol_names): Use it.
* tests/regression.at: Check the handling of trigraphs with
parse.error = detailed.
2020-01-19 21:22:41 +01:00
Akim Demaille
adac9a17f0 regen 2020-01-19 14:51:14 +01:00
Akim Demaille
3b4b157369 bison: use detailed error messages
* #: .
2020-01-19 14:51:14 +01:00
Akim Demaille
5c332d70d7 regen 2020-01-19 14:51:14 +01:00
Akim Demaille
f443673450 yacc.c: add support for parse.error detailed
"detailed" error messages are almost like "verbose", except that we
don't double escape them, they don't get inner quotes, we don't use
yytnamerr, and we hide the table.

"custom" is exposed with the "detailed" tokens, not the "verbose"
ones: they are not double-quoted.

Because there's a risk that some people use yytname even without
"verbose", let's keep yytname (instead of yys_name) in "simple"
parse.error.

* src/output.c (prepare_symbol_names): Be ready to output symbol names
unquoted.
(prepare_symbol_names): Output both the old tname table, and the new
symbol_names one.
* data/skeletons/bison.m4: Accept 'detailed'.
* data/skeletons/yacc.c: When parse.error is 'detailed', don't emit
yytname and yytnamerr, just yysymbol_name with the table inside.
* tests/calc.at: Adjust.
2020-01-19 14:51:14 +01:00
Akim Demaille
8e6233353f c: use yysymbol_name in traces
Only parse.error verbose and simple will get the original yytname: the
other options will rely on a different table.  So let's move on top of
the yysymbol_name function.

* data/skeletons/c.m4 (yy_symbol_print): Use yysymbol_name.
* data/skeletons/glr.c (yytokenName): Rename as...
(yysymbol_name): this.
The change of naming scheme is unfortunate, but it's definitely glr.c
which is "wrong".
2020-01-19 14:51:14 +01:00