And also, remove the incorrect indentation of these comments:
- /* YYR2[YYN] -- Number of symbols on the right hand side of rule YYN. */
+/* YYR2[RULE-NUM] -- Number of symbols on the right-hand side of rule RULE-NUM. */
static const yytype_int8 yyr2[] =
{
0, 2, 4, 0, 2, 1, 1, 1, 3, 2,
I don't remember why this indentation was added (in
0991e29b75), but it seems wrong,
at least for yacc.c. I suspect this was done with lalr1.cc (where
this is embeded in the class definition, so it should be indented),
but today lalr1.cc uses other routines to output these comments.
* data/skeletons/bison.m4 (b4_integral_parser_tables_map): Improve the
wording of the comments of some tables.
* data/skeletons/c.m4 (b4_integral_parser_table_define): Remove
indentation.
* maint:
c++: shorten the assertions that check whether tokens are correct
c++: don't glue functions together
lalr1.cc: YY_ASSERT should use api.prefix
c++: don't use YY_ASSERT at all if parse.assert is disabled
c++: style: follow the Bison m4 quoting pattern
yacc.c: provide the Bison version as an integral macro
regen
style: make conversion of version string to int public
%require: accept version numbers with three parts ("3.7.4")
yacc.c: fix #definition of YYEMPTY
gnulib: update
doc: fix incorrect section title
doc: minor grammar fixes in counterexamples section
This should have been part of commit "symbols: stop dealing with YYEMPTY
as b4_symbol(-2, ...)" (cd40ec9526).
Give names to all the special symbols: "eof", "error" and "undef".
* data/skeletons/bison.m4 (b4_symbol): Let `b4_symbol(eof, ...)` mean
`b4_symbol(0, ...)`, `b4_symbol(error, ...)` mean `b4_symbol(1, ...)`,
and , `b4_symbol(undef, ...)` mean `b4_symbol(2, ...)`..
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c:
Prefer symbols to numbers.
Currently `b4_percent_define_ifdef([foo])` assigns a default value to
`foo` when invoked. As a consequence, skeletons such as lalr1.d
cannot specify their specific default values: `foo` was defined in
bison.m4.
Instead, provide `foo` with a default value when `b4_foo_if` is
invoked.
I could not measure a runtime difference between both cases.
* data/skeletons/bison.m4 (_b4_percent_define_define): New.
Helps getting rid of spurious indentation that resulted in spurious
white space in the output.
(b4_percent_define_if_define): Move the definition to...
(_b4_percent_define_if_define): when the defined macros is called.
For each start symbol, generate a parsing function with a richer
return value than the usual of yyparse. Reserve a place for the
returned semantic value, in order to avoid having to pass a pointer as
argument to "return" that value. This also makes the call to the
parsing function independent of whether a given start-symbol is typed.
For instance, if the grammar file contains:
%type <int> expression
%start input expression
(so "input" is valueless) we get
typedef struct
{
int yystatus;
} yyparse_input_t;
yyparse_input_t yyparse_input (void);
typedef struct
{
int yyvalue;
int yystatus;
} yyparse_expression_t;
yyparse_expression_t yyparse_expression (void);
This commit also changes the implementation of the parser termination:
when there are multiple start symbols, it is the initial rules that
explicitly YYACCEPT. They do that after having exported the
start-symbol's value (if it is typed):
switch (yyn)
{
case 1: /* $accept: YY_EXPRESSION expression $end */
{ ((*yyvalue).TOK_expression) = (yyvsp[-1].TOK_expression); YYACCEPT; }
break;
case 2: /* $accept: YY_INPUT input $end */
{ YYACCEPT; }
break;
I have tried several ways to deal with termination, and this is the
one that appears the best one to me. It is also the most natural.
* src/scan-code.h, src/scan-code.l (obstack_for_actions): New.
* src/reader.c (grammar_rule_check_and_complete): Generate the actions
of the rules for each start symbol.
* data/skeletons/bison.m4 (b4_symbol_slot): New, with safer semantics
than type and type_tag.
* data/skeletons/yacc.c (b4_accept): New.
Generates the body of the action of the start rules.
(_b4_declare_sub_yyparse): For each start symbol define a dedicated
return type for its parsing function.
Adjust the declaration of its parsing function.
(_b4_define_sub_yyparse): Adjust the definition of the function.
* examples/c/lexcalc/parse.y: Check the case of valueless symbols.
* examples/c/lexcalc/lexcalc.test: Check start symbols.
The name "defines" is incorrect, the generated file contains far more
than just #defines.
* src/getargs.h, src/getargs.c (-H, --header): New option.
With optional argument, just like --defines, --xml, etc.
(defines_flag): Rename as...
(header_flag): this.
Adjust dependencies.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c:
Adjust.
* examples, doc/bison.texi: Adjust.
* tests/headers.at, tests/local.at, tests/output.at: Convert most
tests from using --defines to using --header.
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.
* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
Instead of generating switch statements with numbers, let's use the
symbol kinds. Not only is this more readable, it also makes reading
diff easier, as a change in symbol numbers won't have such a large
effect on the implementation of symbol actions.
* data/skeletons/bison.m4 (_b4_symbol_case): Use the symbol kind
rather than its number.
This should have been done in 3.6, but I wanted to avoid introducing
conflicts into Vincent's work on counterexamples. It turns out it's
completely orthogonal.
* data/README.md, data/skeletons/bison.m4, data/skeletons/c++.m4,
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/java.m4,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/variant.hh, data/skeletons/yacc.c, src/conflicts.c,
* src/derives.c, src/gram.c, src/gram.h, src/output.c,
* src/parse-gram.c, src/parse-gram.y, src/print-xml.c, src/print.c,
* src/reader.c, src/symtab.c, src/symtab.h, tests/input.at,
* tests/types.at:
s/user_token_number/code/g.
Plus minor changes.
In Bison 3.6.2, the comments with brackets lose their brackets, for
improper m4 quotation.
* data/skeletons/bison.m4 (b4_gsub): New.
* data/skeletons/c-like.m4 (_b4_comment): Use it.
* tests/m4.at: Check b4_gsub.
We will not keep YYERRCODE anyway, it causes backward compatibility
issues. So as a first step, let all the skeletons use that name,
until we have a better one.
* data/skeletons/bison.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c, doc/bison.texi, tests/headers.at,
* tests/input.at:
here.
In C/C++, N_ is a no-op. Define it if the user didn't.
Suggested by Frank Heckenbach.
https://lists.gnu.org/r/bug-bison/2020-04/msg00010.html
* src/output.c (prepare_symbol_names): Rename has_translations as
has_translations_flag.
* data/skeletons/bison.m4 (b4_has_translations_if): New.
* data/skeletons/java.m4 (b4_trans): Use it.
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/yacc.c
(N_): Provide a default definition.
Currently EOF is handled in an adhoc way, with a #define YYEOF 0 in
the implementation file. As a result, the user has to define her own
EOF token if she wants to use it, which is a pity.
Give the $end token a visible kind name, YYEOF. Except that in C,
where enums are not scoped, we would have collisions between all the
definitions of YYEOFs in the header files, so in C, make it
<api.PREFIX>EOF.
* data/skeletons/c.m4 (YYEOF): Override its name to avoid collisions.
Unless the user already gave it a different name.
* data/skeletons/glr.c (YYEOF): Remove.
Use ]b4_symbol(0, [id])[ instead.
Add support for "pre_epilogue", for glr.cc.
* data/skeletons/glr.cc: Remove dead code (never emitted #undefs).
* data/skeletons/yacc.c
* src/parse-gram.c
* src/reader.c
* src/symtab.c
* tests/actions.at
* tests/input.at
* data/skeletons/bison.m4 (b4_symbol_token_kind): Give a definition to
$undefined.
(b4_token_visible_if): $undefined has an id.
* src/output.c (prepare_symbol_definitions): Stop lying: $undefined
_is_ a token.
* tests/input.at: Adjust.
There are people out there that do use YYERRCODE (the token kind of
the error token). See for instance
3812012bb7/unixODBC-2.3.2/Drivers/nn/yylex.c.
Currently, YYERRCODE is defined by yacc.c in an adhoc way as a #define
in the *.c file only. It belongs with the other token kinds.
YYERRCODE is not a nice name, it does not fit in our naming scheme.
YYERROR would be more logical, but it collides with the YYERROR macro.
Shall we keep the same name in all the skeletons? Besides, to avoid
collisions in C, we need to apply the api prefix: YYERRCODE is
actually <PREFIX>ERRCODE. This is not needed in the other languages.
* data/skeletons/bison.m4 (b4_symbol_token_kind): New.
Map the error token to "YYERRCODE".
* data/skeletons/yacc.c (YYERRCODE): Don't define it, it's handled by...
* src/output.c (prepare_symbol_definitions): this.
* tests/input.at (Redefining the error token): Check it.
* data/skeletons/bison.m4 (b4_symbol_kind): Dispatch on the UNDEF
token number rather than its name.
* data/skeletons/c++.m4, data/skeletons/c.m4, data/skeletons/java.m4:
Comment changes.
* data/skeletons/bison.m4, data/skeletons/c++.m4, data/skeletons/c.m4,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java:
Refer to the "kind" of a symbol, not its "type", where appropriate.
Currently we define enumerators only for symbols that have an
identifier. That rules out tokens such as '+', and nonterminals such
as foo-bar and foo.bar. As a consequence we are taking chances: the
compiler might compile yysymbol_type_t as too small an integral type
for some symbol codes.
* data/skeletons/bison.m4 (b4_symbol_sid): Forge a unique symbol
identifier for symbols that don't have an ID.
This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:
input.c: In function 'yyparse':
input.c:1107:7: error: conversion to 'unsigned int' from 'int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~
input.c:1107:10: error: conversion to 'int' from 'unsigned int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~~~~~~
input.c:1108:47: error: comparison of integer expressions of
different signedness:
'yytype_int8' {aka 'const signed char'} and
'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
[-Werror=sign-compare]
1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
| ^~
input.c:702:25: error: operand of ?: changes signedness from 'int'
to 'unsigned int' due to unsignedness of
other operand [-Werror=sign-compare]
702 | #define YYEMPTY (-2)
| ^~~~
input.c:1220:33: note: in expansion of macro 'YYEMPTY'
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^~~~~~~
input.c:1220:41: error: unsigned conversion from 'int' to
'unsigned int' changes value
from '-2' to '4294967294'
[-Werror=sign-conversion]
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^
Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits. We could actually even simply use "0" and shift
the rest, which would allow to write "!yytoken" to mean really
"yytoken != YYEMPTY".
* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.
* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.
* tests/regression.at: Adjust.
There's a number of advantage in exposing the symbol (internal)
numbers:
- custom error messages can use them to decide how to represent a
given symbol, or a set of symbols.
- we need something similar in uses of yyexpected_tokens. For
instance, currently, bistromathic's completion() reads:
int ntokens = expected_tokens (line, tokens, YYNTOKENS);
[...]
for (int i = 0; i < ntokens; ++i)
if (tokens[i] == YYTRANSLATE (TOK_VAR))
[...]
else if (tokens[i] == YYTRANSLATE (TOK_FUN))
[...]
else
[...]
- now that it's a compile-time expression, we can easily build static
tables, switch, etc.
- some users depended on the ability to get the token number from a
symbol to write test cases for their scanners. But Bison 3.5
removed the table this feature depended upon (a reverse
yytranslate). Now they can check against the actual symbol number,
without having pay (space and time) a conversion.
See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and
https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.
- it helps us clearly separate the internal symbol numbers from the
external token numbers, whose difference is sometimes blurred in the
code when values coincide (e.g. "yychar = yytoken = YYEOF").
- it allows us to get rid of ugly macros with inconsistent names such
as YYUNDEFTOK and YYTERROR, and to group related definitions
together.
- similarly it provides a clean access to the $accept symbol (which
proves convenient in a current experimentation of mine with several
%start symbols).
Let's declare this type as a private type (in the *.c file, not
the *.h one). So it does not need to be influenced by the api prefix.
* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
* doc/bison.texi (Tokens from Literals): Move to code using
%token-table to...
(Decl Summary: %token-table): here.
* data/skeletons/bison.m4: Implement mutual exclusion.
* tests/input.at: Check it.
* doc/local.mk: Be robust to the removal of doc/.
parse.error has more than two possible values.
* data/skeletons/bison.m4 (b4_error_verbose_if, b4_error_verbose_flag):
Remove.
(b4_parse_error_case, b4_parse_error_bmatch): New.
Adjust dependencies.
We used to emit:
/** Token number,to be returned by the scanner. */
static final int NUM = 258;
/** Token number,to be returned by the scanner. */
static final int NEG = 259;
with no space after the comma. Fix that.
* data/skeletons/bison.m4 (b4_token_format): Quote where appropriate.
"detailed" error messages are almost like "verbose", except that we
don't double escape them, they don't get inner quotes, we don't use
yytnamerr, and we hide the table.
"custom" is exposed with the "detailed" tokens, not the "verbose"
ones: they are not double-quoted.
Because there's a risk that some people use yytname even without
"verbose", let's keep yytname (instead of yys_name) in "simple"
parse.error.
* src/output.c (prepare_symbol_names): Be ready to output symbol names
unquoted.
(prepare_symbol_names): Output both the old tname table, and the new
symbol_names one.
* data/skeletons/bison.m4: Accept 'detailed'.
* data/skeletons/yacc.c: When parse.error is 'detailed', don't emit
yytname and yytnamerr, just yysymbol_name with the table inside.
* tests/calc.at: Adjust.
When parse.error is custom, let users define a yyreport_syntax_error
function, and use it.
* data/skeletons/bison.m4 (b4_error_verbose_if): Accept 'custom'.
* data/skeletons/yacc.c: Implement it.
* examples/c/calc/calc.y: Experiment with it.
Bison used to feature %raw, documented as follows:
@item %raw
The output file @file{@var{name}.h} normally defines the tokens with
Yacc-compatible token numbers. If this option is specified, the
internal Bison numbers are used instead. (Yacc-compatible numbers start
at 257 except for single character tokens; Bison assigns token numbers
sequentially for all tokens starting at 3.)
Unfortunately, as far as I can tell, it never worked: token numbers
are indeed changed in the generated tables (from external token number
to internal), yet the code was still applying the mapping from
external token numbers to internal token numbers.
This commit reintroduces the feature as it was expected to be.
* data/skeletons/bison.m4 (b4_token_format): When api.token.raw is
enabled, use the internal token number.
* data/skeletons/yacc.c (yytranslate): Don't emit if api.token.raw is
enabled.
(YYTRANSLATE): Adjust.
Reported by Balázs Scheidler.
* data/skeletons/c.m4 (b4_location_type_define): Use api.location.type
if defined.
* doc/bison.texi: Document it.
* tests/local.at (AT_C_IF, AT_LANG_CASE): New.
Support Span in C.
* tests/calc.at (Span): Convert it to be usable in C and C++.
Check api.location.type with yacc.c and glr.c.