Now that yacc.c and glr.c both know yysymbol_type_t, convert the
common routines.
* data/skeletons/c.m4 (yydestruct, yy_symbol_value_print)
(yy_symbol_print): Use yysymbol_type_t instead of int.
* data/skeletons/glr.c: Use yySymbol where appropriate.
* data/skeletons/yacc.c (YY_ACCESSING_SYMBOL): New wrapper around
yystos.
Use it.
* tests/local.at (yyreport_syntax_error): Use yysymbol_type_t where
appropriate.
Apply the same changes as in yacc.c. Now yySymbol and yysymbol_type_t
are aliases. We will remove the former later, to avoid cluttering
this commit.
* data/skeletons/glr.c: Use b4_declare_symbol_enum.
Use YYSYMBOL_YYEOF etc. where appropriate.
(YYUNDEFTOK, YYTERROR): Remove.
(YYTRANSLATE, yySymbol, yyexpected_tokens, yysyntax_error_arguments):
Adjust.
(yy_accessing_symbol): New.
Use it where appropriate.
This triggers warnings with several compilers. For instance ICC fills
the logs with pages and pages of
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
And so does G++9 when compiling yacc.c's (C) output
input.c:545:8: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
input.c:545:15: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
Clang++ is no exception
input.c:545:8: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
input.c:545:15: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
At some point we could use yysymbol_type_t's enumerators to define
yytranslate. Meanwhile...
* data/skeletons/yacc.c (yytranslate): Use the original integral type
to define it.
(YYTRANSLATE): Cast the result into yysymbol_type_t.
Currently we define enumerators only for symbols that have an
identifier. That rules out tokens such as '+', and nonterminals such
as foo-bar and foo.bar. As a consequence we are taking chances: the
compiler might compile yysymbol_type_t as too small an integral type
for some symbol codes.
* data/skeletons/bison.m4 (b4_symbol_sid): Forge a unique symbol
identifier for symbols that don't have an ID.
This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:
input.c: In function 'yyparse':
input.c:1107:7: error: conversion to 'unsigned int' from 'int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~
input.c:1107:10: error: conversion to 'int' from 'unsigned int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~~~~~~
input.c:1108:47: error: comparison of integer expressions of
different signedness:
'yytype_int8' {aka 'const signed char'} and
'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
[-Werror=sign-compare]
1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
| ^~
input.c:702:25: error: operand of ?: changes signedness from 'int'
to 'unsigned int' due to unsignedness of
other operand [-Werror=sign-compare]
702 | #define YYEMPTY (-2)
| ^~~~
input.c:1220:33: note: in expansion of macro 'YYEMPTY'
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^~~~~~~
input.c:1220:41: error: unsigned conversion from 'int' to
'unsigned int' changes value
from '-2' to '4294967294'
[-Werror=sign-conversion]
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^
Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits. We could actually even simply use "0" and shift
the rest, which would allow us to write "!yytoken" to really mean
"yytoken == YYEMPTY".
* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.
* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.
* tests/regression.at: Adjust.
Now that we have a proper type for internal symbol numbers, let's use
it. More code needs conversion, e.g., printers and destructors, but
they are shared with glr.c, which is not ready yet for this change.
It will also help us deal with warnings such as (GCC9 on GNU/Linux):
input.c: In function 'int yyparse()':
input.c:475:37: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
475 | (0 <= (YYX) && (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYSYMBOL_YYUNDEF)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
input.c:1024:17: note: in expansion of macro 'YYTRANSLATE'
1024 | yytoken = YYTRANSLATE (yychar);
| ^~~~~~~~~~~
* data/skeletons/yacc.c (yytranslate, yysymbol_name)
(yyparse_context_t, yyexpected_tokens, yypstate_expected_tokens)
(yysyntax_error_arguments):
Use yysymbol_type_t instead of int.
There are a number of advantages in exposing the symbol (internal)
numbers:
- custom error messages can use them to decide how to represent a
given symbol, or a set of symbols.
- we need something similar in uses of yyexpected_tokens. For
instance, currently, bistromathic's completion() reads:
int ntokens = expected_tokens (line, tokens, YYNTOKENS);
[...]
for (int i = 0; i < ntokens; ++i)
if (tokens[i] == YYTRANSLATE (TOK_VAR))
[...]
else if (tokens[i] == YYTRANSLATE (TOK_FUN))
[...]
else
[...]
- now that it's a compile-time expression, we can easily build static
tables, switch, etc.
- some users depended on the ability to get the token number from a
symbol to write test cases for their scanners. But Bison 3.5
removed the table this feature depended upon (a reverse
yytranslate). Now they can check against the actual symbol number,
without having to pay (in space and time) for a conversion.
See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and
https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.
- it helps us clearly separate the internal symbol numbers from the
external token numbers, whose difference is sometimes blurred in the
code when values coincide (e.g. "yychar = yytoken = YYEOF").
- it allows us to get rid of ugly macros with inconsistent names such
as YYUNDEFTOK and YYTERROR, and to group related definitions
together.
- similarly it provides a clean access to the $accept symbol (which
proves convenient in a current experiment of mine with several
%start symbols).
Let's declare this type as a private type (in the *.c file, not
the *.h one), so that it does not need to be influenced by the api
prefix.
* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
* data/skeletons/lalr1.java (yysyntaxErrorArguments): Move it from the
context, to the parser object.
Generate only for detailed and verbose error messages.
* tests/local.at (AT_YYERROR_DEFINE(java)): Use yyexpectedTokens
instead.
We could just "inline yysyntax_error_arguments back" in the routines
it was originally extracted from, but I think the code is nicer to
read this way.
* data/skeletons/glr.c (yysyntax_error_arguments): Generate only for
detailed and verbose error messages.
* data/skeletons/yacc.c: Likewise.
* data/skeletons/lalr1.cc (parser::context::yysyntax_error_arguments):
Move as...
(parser::yysyntax_error_arguments_): this.
And only for detailed and verbose error messages.
Because glr.c shares the same testing routines, we also need to
convert it.
* data/skeletons/glr.c (yyparse_context_token): New.
* tests/local.at (yyreport_syntax_error): here.
Suggested by Adrian Vogelsgesang.
https://lists.gnu.org/archive/html/bison-patches/2020-02/msg00069.html
* data/skeletons/lalr1.java (Context.EMPTY, Context.getToken): New.
(Context.yyntokens): Rename as...
(Context.NTOKENS): this.
Because (i) all the Java coding styles recommend upper case for
constants, and (ii) the Java Skeleton exposes Lexer.EOF, not
Lexer.YYEOF.
* data/skeletons/yacc.c (yyparse_context_token): New.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Don't use
yysyntax_error_arguments.
* examples/java/calc/Calc.y (yyreportSyntaxError): Likewise.
yyparse has returned 0, 1, 2 for ages (accept, reject, memory
exhausted). Some of our auxiliary functions such as yy_lac and
yyreport_syntax_error also need to return error codes and also use 0,
1, 2. Because it uses yy_lac, yyexpected_tokens also needs to return
"problem", "memory exhausted", but in case of success, it needs to
return the number of tokens, so it cannot use 1 and 2 as error codes.
Currently it uses -1 and -2, which are later converted into 1 and 2 as
yacc.c expects.
Let's simplify this and consistently use -1 and -2 for auxiliary
functions that are not exposed (or not yet exposed) to the user. In
particular this will save the user from having to convert
yyexpected_tokens's -2 into yyreport_syntax_error's 2: both return -1
or -2.
* data/skeletons/yacc.c (yy_lac, yyreport_syntax_error)
(yy_lac_stack_realloc): Return -1, -2 for errors instead of 1, 2.
Adjust callers.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Do take
error codes into account.
Issue a syntax error message even if we ran out of memory.
* src/parse-gram.y, tests/local.at (yyreport_syntax_error): Adjust.
In push parsers, when asking for the list of expected tokens at some
point, it makes no sense to build a yyparse_context_t: the yypstate
alone suffices (the only difference being the lookahead). Instead of
forcing the user to build a useless shell around yypstate, let's offer
yypstate_expected_tokens.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00025.html.
* data/skeletons/yacc.c (yypstate): Declare earlier, so that we can
use it for...
(yypstate_expected_tokens): this new function, when in push parsers.
Adjust dependencies.
* examples/c/bistromathic/parse.y: Simplify: use
yypstate_expected_tokens.
Style fixes.
Reduce scopes (reported by Joel E. Denny).
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommandation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
Currently pstate_new does not set up its variables; this task is left
to yypush_parse. This was probably to share more code with usual pull
parsers, where these (local) variables are indeed initialized by
yyparse.
But as a consequence yyexpected_tokens crashes at the very beginning
of the parse, since, for instance, the stacks are not even set up.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00001.html.
The fix could have been very simple, but the documentation actually makes
it very clear that we can reuse a pstate for several parses:
After yypush_parse returns a status other than YYPUSH_MORE, the
parser instance yyps may be reused for a new parse.
so we need to restore the parser to its pristine state so that (i) it
is ready to run the next parse, (ii) it properly supports
yyexpected_tokens for the next run.
* data/skeletons/yacc.c (b4_initialize_parser_state_variables): New,
extracted from the top of yyparse/yypush_parse.
(yypstate_clear): New.
(yypstate_new): Use it when push parsers are enabled.
Define after the yyps macros so that we can use the same code as the
regular pull parsers.
(yyparse): Use it when push parsers are _not_ enabled.
* examples/c/bistromathic/bistromathic.test: Check the completion on
the beginning of the line.
These macros were extremely useful when we had to support K&R C,
which we dropped long ago. Now, they merely make the code uselessly
hard to read.
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/yacc.c:
Stop using b4_function_define.
The current implementation of parser::context keeps a copy of the
lookahead. This is troublesome since we support move-only types.
Besides, while GCC is happy with the current implementation, Clang
complains that the ctor it needs to build the copy of the lookahead is
not yet available.
461. calc.at:1120: testing Calculator C++ %defines %locations parse.error=verbose %name-prefix "calc" %verbose ...
calc.at:1120: COLUMNS=1000; export COLUMNS; bison --color=no -fno-caret -Wno-deprecated -o calc.cc calc.y
calc.at:1120: $CXX $CXXFLAGS $CPPFLAGS $LDFLAGS -o calc calc.cc calc-lex.cc calc-main.cc $LIBS
stderr:
In file included from calc-lex.cc:7:
calc.hh:351:12: error: instantiation of function 'calc::parser::basic_symbol<calc::parser::by_type>::basic_symbol' required here, but no definition is available [-Werror,-Wundefined-func-template]
struct symbol_type : basic_symbol<by_type>
^
calc.hh:273:7: note: forward declaration of template entity is here
basic_symbol (const basic_symbol& that);
^
calc.hh:351:12: note: add an explicit instantiation declaration to suppress this warning if 'calc::parser::basic_symbol<calc::parser::by_type>::basic_symbol' is explicitly instantiated in another translation unit
struct symbol_type : basic_symbol<by_type>
^
1 error generated.
In file included from calc-main.cc:7:
calc.hh:351:12: error: instantiation of function 'calc::parser::basic_symbol<calc::parser::by_type>::basic_symbol' required here, but no definition is available [-Werror,-Wundefined-func-template]
struct symbol_type : basic_symbol<by_type>
^
calc.hh:273:7: note: forward declaration of template entity is here
basic_symbol (const basic_symbol& that);
^
calc.hh:351:12: note: add an explicit instantiation declaration to suppress this warning if 'calc::parser::basic_symbol<calc::parser::by_type>::basic_symbol' is explicitly instantiated in another translation unit
struct symbol_type : basic_symbol<by_type>
^
1 error generated.
stdout:
calc.at:1120: exit code was 1, expected 0
461. calc.at:1120: 461. Calculator C++ %defines %locations parse.error=verbose %name-prefix "calc" %verbose (calc.at:1120): FAILED (calc.at:1120)
* data/skeletons/lalr1.cc (context::yyla_): Make it a const-ref.
Move the implementation out of the declaration.
Address compiler warnings such as
warning: declaration of 'yyla' shadows a member of 'yy::parser::context' [-Wshadow]
* data/skeletons/lalr1.cc (context): Don't use the same names for
variables and members.
Use foo_ for private members, as in parser.
Also, use the + trick in array accesses to please ICC and provide it
with an int.
* data/skeletons/lalr1.cc: Add support here.
* tests/calc.at: Add test cases.
* tests/local.at: Add a yyreport_syntax_error implementation
for C++ test cases.
Prefer b4_parse_error_case over the adhoc solution
`m4_case + b4_percent_define_get`. Same for b4_parse_error_bmatch.
* data/skeletons/glr.c: here
* data/skeletons/yacc.c: here
We used to emit:
/** Token number,to be returned by the scanner. */
static final int NUM = 258;
/** Token number,to be returned by the scanner. */
static final int NEG = 259;
with no space after the comma. Fix that.
* data/skeletons/bison.m4 (b4_token_format): Quote where appropriate.
* data/skeletons/lalr1.java (Context): Make data members private.
(Context.getLocation): New.
* examples/java/calc/Calc.y, tests/java.at, tests/local.at: Adjust.
* doc/bison.texi (Tokens from Literals): Move the code using
%token-table to...
(Decl Summary: %token-table): here.
* data/skeletons/bison.m4: Implement mutual exclusion.
* tests/input.at: Check it.
* doc/local.mk: Be robust to the removal of doc/.
This reverts commit ebab1ffca8.
This commit removed "useless" initializers, going from
/* YYPACT[STATE-NUM] -- Index in YYTABLE of the portion describing
STATE-NUM. */
private static final byte yypact_[] = yypact_init ();
private static final byte[] yypact_init ()
{
return new byte[]
{
25, -7, -8, 37, -8, 40, -8, 20, -8, 61,
-8, -8, 3, 9, 51, -8, -8, -2, -2, -2,
-2, -2, -2, -8, -8, -8, 1, 66, 66, 3,
3, 3
};
}
to
/* YYPACT[STATE-NUM] -- Index in YYTABLE of the portion describing
STATE-NUM. */
private static final byte[] yypact_ =
{
25, -7, -8, 37, -8, 40, -8, 20, -8, 61,
-8, -8, 3, 9, 51, -8, -8, -2, -2, -2,
-2, -2, -2, -8, -8, -8, 1, 66, 66, 3,
3, 3
};
But it turns out that this was on purpose, to work around the 64KB
limitation in JVM methods. It was introduced on 2008-11-10 by
Di-an Jan in 09ccae9b18: "Work around
Java's ``code too large'' problem for parser tables". See
https://lists.gnu.org/r/help-bison/2008-11/msg00004.html. A real
test, where we would hit the JVM limitation, would be nice.
To avoid further regressions, add comments.
parse.error has more than two possible values.
* data/skeletons/bison.m4 (b4_error_verbose_if, b4_error_verbose_flag):
Remove.
(b4_parse_error_case, b4_parse_error_bmatch): New.
Adjust dependencies.