Currently we display the addresses of the semantic values. Instead,
print the values.
Add support for YY_USE across languages.
* data/skeletons/c.m4, data/skeletons/d.m4 (b4_use): New.
* data/skeletons/bison.m4 (b4_symbol_actions): Use b4_use to be
portable to D.
Add support for %printer, and use it.
* data/skeletons/d.m4: Instead of duplicating what's already in
c-like.m4, include it.
(b4_symbol_action): New.
Differs from the one in bison.m4 in that it uses yyval/yyloc instead
of *yyvaluep and *yylocationp.
* data/skeletons/lalr1.d (yy_symbol_print): Avoid calls to formatting,
just call write directly.
Use the %printer.
* examples/d/calc/calc.y: Specify a printer.
Enable traces when $YYDEBUG is set.
* tests/calc.at: Fix the use of %printer with D.
This macro is not exposed to users, make start it with 'YY_'.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* src/parse-gram.c, tests/actions.at, tests/c++.at, tests/headers.at,
* tests/local.at (YYUSE): Rename as...
(YY_USE): this.
And also, remove the incorrect indentation of these comments:
- /* YYR2[YYN] -- Number of symbols on the right hand side of rule YYN. */
+/* YYR2[RULE-NUM] -- Number of symbols on the right-hand side of rule RULE-NUM. */
static const yytype_int8 yyr2[] =
{
0, 2, 4, 0, 2, 1, 1, 1, 3, 2,
I don't remember why this indentation was added (in
0991e29b75), but it seems wrong,
at least for yacc.c. I suspect this was done with lalr1.cc (where
this is embeded in the class definition, so it should be indented),
but today lalr1.cc uses other routines to output these comments.
* data/skeletons/bison.m4 (b4_integral_parser_tables_map): Improve the
wording of the comments of some tables.
* data/skeletons/c.m4 (b4_integral_parser_table_define): Remove
indentation.
* maint:
c++: shorten the assertions that check whether tokens are correct
c++: don't glue functions together
lalr1.cc: YY_ASSERT should use api.prefix
c++: don't use YY_ASSERT at all if parse.assert is disabled
c++: style: follow the Bison m4 quoting pattern
yacc.c: provide the Bison version as an integral macro
regen
style: make conversion of version string to int public
%require: accept version numbers with three parts ("3.7.4")
yacc.c: fix #definition of YYEMPTY
gnulib: update
doc: fix incorrect section title
doc: minor grammar fixes in counterexamples section
When generating a C parser, YYEMPTY is present in enum yytokentype but
there is no corresponding #define like there is for the other values.
There is a special case for YYEMPTY in b4_token_enums but no
corresponding case in b4_token_defines.
* data/skeletons/c.m4 (b4_token_defines): Do define YYEMPTY.
This should have been part of commit "symbols: stop dealing with YYEMPTY
as b4_symbol(-2, ...)" (cd40ec9526).
Give names to all the special symbols: "eof", "error" and "undef".
* data/skeletons/bison.m4 (b4_symbol): Let `b4_symbol(eof, ...)` mean
`b4_symbol(0, ...)`, `b4_symbol(error, ...)` mean `b4_symbol(1, ...)`,
and , `b4_symbol(undef, ...)` mean `b4_symbol(2, ...)`..
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c:
Prefer symbols to numbers.
The name "defines" is incorrect, the generated file contains far more
than just #defines.
* src/getargs.h, src/getargs.c (-H, --header): New option.
With optional argument, just like --defines, --xml, etc.
(defines_flag): Rename as...
(header_flag): this.
Adjust dependencies.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c:
Adjust.
* examples, doc/bison.texi: Adjust.
* tests/headers.at, tests/local.at, tests/output.at: Convert most
tests from using --defines to using --header.
231. conflicts.at:1096: testing Syntax error in consistent error state: glr2.cc ...
tests/conflicts.at:1096: $CXX $CXXFLAGS $CPPFLAGS $LDFLAGS -o input input.cc $LIBS
input.cc: In member function 'YYRESULTTAG glr_stack::yyresolveValue(glr_state*)':
input.cc:2674:36: error: 'yysval' may be used uninitialized in this function [-Werror=uninitialized]
Do not initialize the variable: this way ASAN can really make sure we
do set it to a proper value.
If we initialize it, ASAN would report nothing.
* data/skeletons/c.m4 (YY_IGNORE_MAYBE_UNINITIALIZED_BEGIN): Disable
GCC 4.6's -Wuninitialized.
* data/skeletons/glr2.cc: Disable the warning locally.
* data/skeletons/glr2.cc: Fix some documentation.
Be consistent between class/struct.
(yydoAction, yyresolveAction): Avoid passing yyparser where useless.
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.
* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
For instance, in the case of Bison's own parser:
- case 40:
+ case 40: /* grammar_declaration: "%code" "identifier" "{...}" */
{
muscle_percent_code_grow ((yyvsp[-1].ID), (yylsp[-1]),
translate_code_braceless ((yyvsp[0].BRACED_CODE), (yylsp[0])),
(yylsp[0]));
code_scanner_last_string_free ();
}
break;
* data/skeletons/c.m4: Modified.
* data/skeletons/d.m4: Modified.
* data/skeletons/java.m4: Modified.
* src/output.c (output_escaped): New.
(quoted_output): Use it, and rename as...
(output_quoted): this.
Adjust dependencies.
(rule_output): New.
(user_actions_output): Use it.
* data/skeletons/c.m4, data/skeletons/d.m4, data/skeletons/java.m4
(b4_case): Add support for $3, an optional comment.
This should have been done in 3.6, but I wanted to avoid introducing
conflicts into Vincent's work on counterexamples. It turns out it's
completely orthogonal.
* data/README.md, data/skeletons/bison.m4, data/skeletons/c++.m4,
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/java.m4,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/variant.hh, data/skeletons/yacc.c, src/conflicts.c,
* src/derives.c, src/gram.c, src/gram.h, src/output.c,
* src/parse-gram.c, src/parse-gram.y, src/print-xml.c, src/print.c,
* src/reader.c, src/symtab.c, src/symtab.h, tests/input.at,
* tests/types.at:
s/user_token_number/code/g.
Plus minor changes.
c.m4 contains a definition of _Noreturn which is modeled after
gnulib's lib/_Noreturn.h. The latter was recently
changed (b61bf2f0e8) to not using
[[noreturn]] at all, because the uses of _Noreturn in gnulib are
sometimes incompatible with the rules of [[noreturn]].
As a result glr.cc started to use _Noreturn in C++, which clang
refuses (all the glr.cc tests currently fail with Clang++).
* data/skeletons/c.m4 (b4_attribute_define): Restore the definition of
_Noreturn as [[noreturn]] in modern C++.
The generated code uses _Noreturn in places where [[noreturn]] is
valid.
Reported by Paul Eggert.
* src/getargs.c: We don't need it anyway, since we use _Noreturn.
* data/skeletons/c.m4: While at it, update the definition of _Noreturn
stolen from gnulib.
For instance test 386, "glr.cc api.value.type={double}":
types.at:366: $CXX $CXXFLAGS $CPPFLAGS $LDFLAGS -o test test.cc $LIBS
stderr:
test.cc: In function 'ptrdiff_t yysplitStack(yyGLRStack*, ptrdiff_t)':
test.cc:490:4: error: 'PTRDIFF_MAX' was not declared in this scope
(PTRDIFF_MAX < SIZE_MAX ? PTRDIFF_MAX : YY_CAST (ptrdiff_t, SIZE_MAX))
^
test.cc:1805:37: note: in expansion of macro 'YYSIZEMAX'
ptrdiff_t half_max_capacity = YYSIZEMAX / 2 / state_size;
^~~~~~~~~
test.cc:490:4: note: suggested alternative: '__PTRDIFF_MAX__'
(PTRDIFF_MAX < SIZE_MAX ? PTRDIFF_MAX : YY_CAST (ptrdiff_t, SIZE_MAX))
^
test.cc:1805:37: note: in expansion of macro 'YYSIZEMAX'
ptrdiff_t half_max_capacity = YYSIZEMAX / 2 / state_size;
^~~~~~~~~
The failing tests are using glr.cc only, which I don't understand, the
problem is rather in glr.c, so I would expect glr.c tests to also fail.
Reported by Bruno Haible.
https://lists.gnu.org/archive/html/bug-bison/2020-05/msg00053.html
* data/skeletons/yacc.c: Move the block that defines
YYPTRDIFF_T/YYPTRDIFF_MAXIMUM, YYSIZE_T/YYSIZE_MAXIMUM, and
YYSIZEOF to...
* data/skeletons/c.m4 (b4_sizes_types_define): Here.
(b4_c99_int_type): Also take care of the #undefinition of short.
* data/skeletons/yacc.c, data/skeletons/glr.c: Use
b4_sizes_types_define.
* data/skeletons/glr.c: Adjust to use YYPTRDIFF_T/YYPTRDIFF_MAXIMUM,
YYSIZE_T/YYSIZE_MAXIMUM.
I have been hesitating a lot before doing it ---after all the user
must not use this kind, so what's the point of showing it in
yytoken_kind_t. And eventually I chose to play it safe with the
typing system and make it possible to use yytoken_kind_t for all the
tokens, even the "empty token".
* data/skeletons/c.m4: Give an id and a tag to YYEMPTY.
(b4_token_enums): Define YYEMPTY.
* data/skeletons/c++.m4 (b4_token_enums): Define YYEMPTY.
* data/skeletons/glr.c, data/skeletons/glr.cc, data/skeletons/yacc.c:
(YYEMPTY): Remove.
Use b4_symbol(-2, id) instead.
Currently EOF is handled in an adhoc way, with a #define YYEOF 0 in
the implementation file. As a result, the user has to define her own
EOF token if she wants to use it, which is a pity.
Give the $end token a visible kind name, YYEOF. Except that in C,
where enums are not scoped, we would have collisions between all the
definitions of YYEOFs in the header files, so in C, make it
<api.PREFIX>EOF.
* data/skeletons/c.m4 (YYEOF): Override its name to avoid collisions.
Unless the user already gave it a different name.
* data/skeletons/glr.c (YYEOF): Remove.
Use ]b4_symbol(0, [id])[ instead.
Add support for "pre_epilogue", for glr.cc.
* data/skeletons/glr.cc: Remove dead code (never emitted #undefs).
* data/skeletons/yacc.c
* src/parse-gram.c
* src/reader.c
* src/symtab.c
* tests/actions.at
* tests/input.at
* data/skeletons/bison.m4 (b4_symbol_token_kind): Give a definition to
$undefined.
(b4_token_visible_if): $undefined has an id.
* src/output.c (prepare_symbol_definitions): Stop lying: $undefined
_is_ a token.
* tests/input.at: Adjust.
* data/skeletons/bison.m4 (b4_symbol_kind): Dispatch on the UNDEF
token number rather than its name.
* data/skeletons/c++.m4, data/skeletons/c.m4, data/skeletons/java.m4:
Comment changes.
* data/skeletons/bison.m4, data/skeletons/c++.m4, data/skeletons/c.m4,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java:
Refer to the "kind" of a symbol, not its "type", where appropriate.
Now that yacc.c and glr.c both know yysymbol_type_t, convert the
common routines.
* data/skeletons/c.m4 (yydestruct, yy_symbol_value_print)
(yy_symbol_print): Use yysymbol_type_t instead of int.
* data/skeletons/glr.c: Use yySymbol where appropriate.
* data/skeletons/yacc.c (YY_ACCESSING_SYMBOL): New wrapper around
yystos.
Use it.
* tests/local.at (yyreport_syntax_error): Use yysymbol_type_t where
appropriate.
This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:
input.c: In function 'yyparse':
input.c:1107:7: error: conversion to 'unsigned int' from 'int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~
input.c:1107:10: error: conversion to 'int' from 'unsigned int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~~~~~~
input.c:1108:47: error: comparison of integer expressions of
different signedness:
'yytype_int8' {aka 'const signed char'} and
'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
[-Werror=sign-compare]
1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
| ^~
input.c:702:25: error: operand of ?: changes signedness from 'int'
to 'unsigned int' due to unsignedness of
other operand [-Werror=sign-compare]
702 | #define YYEMPTY (-2)
| ^~~~
input.c:1220:33: note: in expansion of macro 'YYEMPTY'
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^~~~~~~
input.c:1220:41: error: unsigned conversion from 'int' to
'unsigned int' changes value
from '-2' to '4294967294'
[-Werror=sign-conversion]
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^
Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits. We could actually even simply use "0" and shift
the rest, which would allow to write "!yytoken" to mean really
"yytoken != YYEMPTY".
* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.
* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.
* tests/regression.at: Adjust.
There's a number of advantage in exposing the symbol (internal)
numbers:
- custom error messages can use them to decide how to represent a
given symbol, or a set of symbols.
- we need something similar in uses of yyexpected_tokens. For
instance, currently, bistromathic's completion() reads:
int ntokens = expected_tokens (line, tokens, YYNTOKENS);
[...]
for (int i = 0; i < ntokens; ++i)
if (tokens[i] == YYTRANSLATE (TOK_VAR))
[...]
else if (tokens[i] == YYTRANSLATE (TOK_FUN))
[...]
else
[...]
- now that it's a compile-time expression, we can easily build static
tables, switch, etc.
- some users depended on the ability to get the token number from a
symbol to write test cases for their scanners. But Bison 3.5
removed the table this feature depended upon (a reverse
yytranslate). Now they can check against the actual symbol number,
without having pay (space and time) a conversion.
See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and
https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.
- it helps us clearly separate the internal symbol numbers from the
external token numbers, whose difference is sometimes blurred in the
code when values coincide (e.g. "yychar = yytoken = YYEOF").
- it allows us to get rid of ugly macros with inconsistent names such
as YYUNDEFTOK and YYTERROR, and to group related definitions
together.
- similarly it provides a clean access to the $accept symbol (which
proves convenient in a current experimentation of mine with several
%start symbols).
Let's declare this type as a private type (in the *.c file, not
the *.h one). So it does not need to be influenced by the api prefix.
* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
These macros have been extremely useful when we had to support K&R C,
which we dropped long ago. Now, they merely make the code uselessly
hard to read.
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/yacc.c:
Stop using b4_function_define.
In Java there is no need for N_ and yytranslate_. So instead of
hard-coding the use of N_ in the table of the symbol names, rely on
b4_symbol_translate.
* src/output.c (prepare_symbol_names): Use b4_symbol_translate instead
of N_.
* data/skeletons/c.m4 (b4_symbol_translate): New.
* data/skeletons/lalr1.java (yysymbolName): New.
Use it.
* examples/java/calc/Calc.y: Use parse.error=detailed.
* tests/calc.at: Check parse.error=detailed.
Only parse.error verbose and simple will get the original yytname: the
other options will rely on a different table. So let's move on top of
the yysymbol_name function.
* data/skeletons/c.m4 (yy_symbol_print): Use yysymbol_name.
* data/skeletons/glr.c (yytokenName): Rename as...
(yysymbol_name): this.
The change of naming scheme is unfortunate, but it's definitely glr.c
which is "wrong".
We still have a few old C casts in lalr1.cc, let's get rid of them.
Reported by Frank Heckenbach.
Actually, let's monitor all our casts using easy to grep macros.
Let's use these macros to use the C++ standard casts when we are in
C++.
* data/skeletons/c.m4 (b4_cast_define): New.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/stack.hh,
* data/skeletons/yacc.c:
Use it and/or its casts.
* tests/actions.at, tests/cxx-type.at,
* tests/glr-regression.at, tests/headers.at, tests/torture.at,
* tests/types.at:
Use YY_CAST instead of C casts.
* configure.ac (warn_cxx): Add -Wold-style-cast.
* doc/bison.texi: Disable it.
Sun C 5.12 defines __SUNPRO_C to 0x5120 but diagnoses
‘__attribute__ ((__unused__))’. Change the ifdefs to use
the same method as Gnulib in this area.
* data/skeletons/c.m4 (YY_ATTRIBUTE): Remove, since
not all attributes were added in the same compiler version.
(YY_ATTRIBUTE_PURE, YY_ATTRIBUTE_UNUSED):
Use specific GCC version for each attribute.
Pay no attention to __SUNPRO_C.
* tests/headers.at (Several parsers): Tighten tests accordingly.
Oracle Solaris Studio 12.3 (Sun C 5.12 2011/11/16) by default does
not conform to C99; it defines __STDC_VERSION__ to be 199409L, so
the Bison code does not include <stdint.h> (not required by C89
amendment 1) even though this compiler does have <stdint.h>. On
this platform <limits.h> defines INT_LEAST8_MAX (POSIX allows
this) so the skeleton got confused and thought that <stdint.h> had
been included even though it wasn’t.
* data/skeletons/c.m4 (b4_c99_int_type_define) [!__PTRDIFF_MAX__]:
Always include <limits.h>.
(YY_STDINT_H): Define when <stdint.h> was included.
All uses of expressions like ‘defined INT_LEAST8_MAX’ changed to
‘defined YY_STDINT_H’, since Sun C 5.12 <limits.h> defines macros
like INT_LEAST8_MAX but does not declare types like int_least8_t.