* data/skeletons/bison.m4, data/skeletons/c++.m4, data/skeletons/c.m4,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java:
Refer to the "kind" of a symbol, not its "type", where appropriate.
The first name is too long. We already have `yypstate`, so
`yypcontext` is ok. We are also migrating to using `*_t` for our
types.
* NEWS, data/skeletons/glr.c, data/skeletons/yacc.c, doc/bison.texi,
* examples/c/bistromathic/parse.y, src/parse-gram.y, tests/local.at:
(yyparse_context_t, yyparse_context_location, yyparse_context_token):
Rename as...
(yypcontext_t, yypcontext_location, yypcontext_token): these.
Now that yacc.c and glr.c both know yysymbol_type_t, convert the
common routines.
* data/skeletons/c.m4 (yydestruct, yy_symbol_value_print)
(yy_symbol_print): Use yysymbol_type_t instead of int.
* data/skeletons/glr.c: Use yySymbol where appropriate.
* data/skeletons/yacc.c (YY_ACCESSING_SYMBOL): New wrapper around
yystos.
Use it.
* tests/local.at (yyreport_syntax_error): Use yysymbol_type_t where
appropriate.
This triggers warnings with several compilers. For instance ICC fills
the logs with pages and pages of
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
And so does G++9 when compiling yacc.c's (C) output
input.c:545:8: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
input.c:545:15: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
Clang++ is no exception
input.c:545:8: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
input.c:545:15: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
At some point we could use yysymbol_type_t's enumerators to define
yytranslate. Meanwhile...
* data/skeletons/yacc.c (yytranslate): Use the original integral type
to define it.
(YYTRANSLATE): Cast the result into yysymbol_type_t.
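The approach above can be sketched in isolation. This is not the
generated code: `sym_kind_t`, `translate_table`, `MAX_USER_TOKEN`,
`TRANSLATE` and `translate` are hypothetical names chosen for the
illustration. The table keeps an integral element type, so its integer
initializers are accepted by C and C++ compilers alike, and the
conversion to the enum happens only at lookup.

```c
#include <assert.h>

/* Hypothetical symbol kinds; a generated parser would define its own. */
typedef enum { SK_EOF = 0, SK_ERROR = 1, SK_UNDEF = 2, SK_NUM = 3 } sym_kind_t;

/* Largest external token number covered by the table (assumed here). */
#define MAX_USER_TOKEN 3

/* Plain integral element type: integer initializers need no conversion. */
const signed char translate_table[] = { 0, 2, 2, 3 };

/* The cast to the enum type happens once, at the point of lookup. */
#define TRANSLATE(X)                          \
  (0 <= (X) && (X) <= MAX_USER_TOKEN          \
   ? (sym_kind_t) translate_table[X]          \
   : SK_UNDEF)

sym_kind_t translate(int token)
{
  /* Sanity check: the table must cover 0..MAX_USER_TOKEN. */
  assert(sizeof translate_table / sizeof translate_table[0]
         == MAX_USER_TOKEN + 1);
  return TRANSLATE(token);
}
```

Out-of-range token numbers map to the "undefined" kind instead of
indexing past the table.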
This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:
input.c: In function 'yyparse':
input.c:1107:7: error: conversion to 'unsigned int' from 'int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~
input.c:1107:10: error: conversion to 'int' from 'unsigned int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~~~~~~
input.c:1108:47: error: comparison of integer expressions of
different signedness:
'yytype_int8' {aka 'const signed char'} and
'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
[-Werror=sign-compare]
1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
| ^~
input.c:702:25: error: operand of ?: changes signedness from 'int'
to 'unsigned int' due to unsignedness of
other operand [-Werror=sign-compare]
702 | #define YYEMPTY (-2)
| ^~~~
input.c:1220:33: note: in expansion of macro 'YYEMPTY'
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^~~~~~~
input.c:1220:41: error: unsigned conversion from 'int' to
'unsigned int' changes value
from '-2' to '4294967294'
[-Werror=sign-conversion]
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^
Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits. We could actually even simply use "0" and shift
the rest, which would allow us to write "!yytoken" to mean really
"yytoken != YYEMPTY".
* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.
* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.
* tests/regression.at: Adjust.
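The signedness hazard discussed above can be reproduced in a few lines.
This is only a sketch: `EMPTY`, `empty_kind_t`, `KIND_EMPTY`,
`empty_as_unsigned` and `is_empty` are hypothetical stand-ins, not the
names used by the skeletons. It shows why an `int` constant equal to -2
does not mix well with an unsigned type, and how a dedicated enumerator
keeps comparisons within a single type.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for an "empty" marker defined as a plain int. */
#define EMPTY (-2)

/* Hypothetical symbol enum with its own "empty" enumerator. */
typedef enum { KIND_EMPTY = -2, KIND_EOF = 0, KIND_ERROR = 1 } empty_kind_t;

/* Converting the int -2 to unsigned wraps modulo UINT_MAX + 1. */
unsigned int empty_as_unsigned(void)
{
  unsigned int u = (unsigned int) EMPTY;
  assert(u == UINT_MAX - 1u);
  return u;
}

/* Both operands have the enum type: no int/unsigned mixing. */
int is_empty(empty_kind_t k)
{
  return k == KIND_EMPTY;
}
```

With a 32-bit `unsigned int`, the wrapped value is the 4294967294 shown
in the GCC diagnostics above.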
Now that we have a proper type for internal symbol numbers, let's use
it. More code needs conversion, e.g., printers and destructors, but
they are shared with glr.c, which is not ready yet for this change.
It will also help us deal with warnings such as (GCC9 on GNU/Linux):
input.c: In function 'int yyparse()':
input.c:475:37: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
475 | (0 <= (YYX) && (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYSYMBOL_YYUNDEF)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
input.c:1024:17: note: in expansion of macro 'YYTRANSLATE'
1024 | yytoken = YYTRANSLATE (yychar);
| ^~~~~~~~~~~
* data/skeletons/yacc.c (yytranslate, yysymbol_name)
(yyparse_context_t, yyexpected_tokens, yypstate_expected_tokens)
(yysyntax_error_arguments):
Use yysymbol_type_t instead of int.
There are a number of advantages in exposing the symbol (internal)
numbers:
- custom error messages can use them to decide how to represent a
given symbol, or a set of symbols.
- we need something similar in uses of yyexpected_tokens. For
instance, currently, bistromathic's completion() reads:
int ntokens = expected_tokens (line, tokens, YYNTOKENS);
[...]
for (int i = 0; i < ntokens; ++i)
if (tokens[i] == YYTRANSLATE (TOK_VAR))
[...]
else if (tokens[i] == YYTRANSLATE (TOK_FUN))
[...]
else
[...]
- now that it's a compile-time expression, we can easily build static
tables, switch, etc.
- some users depended on the ability to get the token number from a
symbol to write test cases for their scanners. But Bison 3.5
removed the table this feature depended upon (a reverse
yytranslate). Now they can check against the actual symbol number,
without having to pay (in space and time) for a conversion.
See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and
https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.
- it helps us clearly separate the internal symbol numbers from the
external token numbers, whose difference is sometimes blurred in the
code when values coincide (e.g. "yychar = yytoken = YYEOF").
- it allows us to get rid of ugly macros with inconsistent names such
as YYUNDEFTOK and YYTERROR, and to group related definitions
together.
- similarly it provides a clean access to the $accept symbol (which
proves convenient in a current experiment of mine with several
%start symbols).
Let's declare this type as a private type (in the *.c file, not
the *.h one). So it does not need to be influenced by the api prefix.
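As an illustration of the "static tables, switch, etc." point, here is a
minimal sketch. Every name in it (`sym_t`, `completion_t`,
`completion_table`, `completion_from_table`, `completion_from_switch`)
is hypothetical; a real symbol enum would be generated from the grammar.
Since the enumerators are compile-time constants, they can serve both as
array designators and as case labels.

```c
#include <assert.h>

/* Hypothetical symbol kinds; SYM_COUNT is the number of kinds. */
typedef enum { SYM_EOF, SYM_NUM, SYM_VAR, SYM_FUN, SYM_COUNT } sym_t;

/* Hypothetical completion strategies for an interactive front end. */
typedef enum { COMPLETE_NONE, COMPLETE_VARIABLES, COMPLETE_FUNCTIONS } completion_t;

/* Static table indexed by symbol kind (C99 designated initializers). */
const completion_t completion_table[SYM_COUNT] = {
  [SYM_EOF] = COMPLETE_NONE,
  [SYM_NUM] = COMPLETE_NONE,
  [SYM_VAR] = COMPLETE_VARIABLES,
  [SYM_FUN] = COMPLETE_FUNCTIONS
};

completion_t completion_from_table(sym_t s)
{
  assert((int) s >= 0 && (int) s < SYM_COUNT);
  return completion_table[s];
}

/* The same decision written as a switch over the enumerators. */
completion_t completion_from_switch(sym_t s)
{
  switch (s)
    {
    case SYM_VAR: return COMPLETE_VARIABLES;
    case SYM_FUN: return COMPLETE_FUNCTIONS;
    default:      return COMPLETE_NONE;
    }
}
```

Neither form needs a run-time conversion from token numbers.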
* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
We could just "inline yysyntax_error_arguments back" in the routines
it was originally extracted from, but I think the code is nicer to
read this way.
* data/skeletons/glr.c (yysyntax_error_arguments): Generate only for
detailed and verbose error messages.
* data/skeletons/yacc.c: Likewise.
* data/skeletons/lalr1.cc (parser::context::yysyntax_error_arguments):
Move as...
(parser::yysyntax_error_arguments_): this.
And only for detailed and verbose error messages.
Suggested by Adrian Vogelsgesang.
https://lists.gnu.org/archive/html/bison-patches/2020-02/msg00069.html
* data/skeletons/lalr1.java (Context.EMPTY, Context.getToken): New.
(Context.yyntokens): Rename as...
(Context.NTOKENS): this.
Because (i) all the Java coding styles recommend upper case for
constants, and (ii) the Java Skeleton exposes Lexer.EOF, not
Lexer.YYEOF.
* data/skeletons/yacc.c (yyparse_context_token): New.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Don't use
yysyntax_error_arguments.
* examples/java/calc/Calc.y (yyreportSyntaxError): Likewise.
yyparse has returned 0, 1, 2 for ages (accept, reject, memory exhausted).
Some of our auxiliary functions such as yy_lac and
yyreport_syntax_error also need to return error codes and also use 0,
1, 2. Because it uses yy_lac, yyexpected_tokens also needs to return
"problem", "memory exhausted", but in case of success, it needs to
return the number of tokens, so it cannot use 1 and 2 as error codes.
Currently it uses -1 and -2, which is later converted into 1 and 2 as
yacc.c expects it.
Let's simplify this and use consistently -1 and -2 for auxiliary
functions that are not exposed (or not yet exposed) to the user. In
particular this will save the user from having to convert
yyexpected_tokens's -2 into yyreport_syntax_error's 2: both return -1
or -2.
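The convention can be sketched as follows. This is not the skeleton's
code: `AUX_PROBLEM`, `AUX_NOMEM`, `list_expected` and `report_error` are
hypothetical names, and a too-small output buffer merely stands in for
"memory exhausted". The point is that a function whose success value is
a count can still signal errors with negative codes, and that a caller
can forward those codes unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes shared by the auxiliary functions. */
enum { AUX_PROBLEM = -1, AUX_NOMEM = -2 };

/* Return the number of items copied, or a negative error code. */
int list_expected(const int *candidates, int ncandidates,
                  int *out, int capacity)
{
  assert(capacity >= 0);
  if (candidates == NULL || out == NULL || ncandidates < 0)
    return AUX_PROBLEM;
  if (capacity < ncandidates)
    return AUX_NOMEM;   /* stand-in for "memory exhausted" */
  for (int i = 0; i < ncandidates; ++i)
    out[i] = candidates[i];
  return ncandidates;
}

/* Return 0 on success; negative codes are forwarded as they are. */
int report_error(const int *candidates, int ncandidates)
{
  int expected[4];
  int n = list_expected(candidates, ncandidates, expected, 4);
  if (n < 0)
    return n;           /* no conversion between conventions */
  return 0;
}
```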
* data/skeletons/yacc.c (yy_lac, yyreport_syntax_error)
(yy_lac_stack_realloc): Return -1, -2 for errors instead of 1, 2.
Adjust callers.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Do take
error codes into account.
Issue a syntax error message even if we ran out of memory.
* src/parse-gram.y, tests/local.at (yyreport_syntax_error): Adjust.
In push parsers, when asking for the list of expected tokens at some
point, it makes no sense to build a yyparse_context_t: the yypstate
alone suffices (the only difference being the lookahead). Instead of
forcing the user to build a useless shell around yypstate, let's offer
yypstate_expected_tokens.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00025.html.
* data/skeletons/yacc.c (yypstate): Declare earlier, so that we can
use it for...
(yypstate_expected_tokens): this new function, when in push parsers.
Adjust dependencies.
* examples/c/bistromathic/parse.y: Simplify: use
yypstate_expected_tokens.
Style fixes.
Reduce scopes (reported by Joel E. Denny).
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommandation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
Currently pstate_new does not set up its variables; this task is left
to yypush_parse. This was probably to share more code with usual pull
parsers, where these (local) variables are indeed initialized by
yyparse.
But as a consequence yyexpected_tokens crashes at the very beginning
of the parse, since, for instance, the stacks are not even set up.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00001.html.
The fix could have been very simple, but the documentation actually makes
it very clear that we can reuse a pstate for several parses:
After yypush_parse returns a status other than YYPUSH_MORE, the
parser instance yyps may be reused for a new parse.
so we need to restore the parser to its pristine state so that (i) it
is ready to run the next parse, (ii) it properly supports
yyexpected_tokens for the next run.
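The idea can be sketched with a toy state object. All names below
(`demo_pstate`, `demo_pstate_clear`, `demo_pstate_new`, `demo_push`,
`demo_finish`) are hypothetical and the structure is far simpler than a
real parser state. The same "clear" routine runs at creation and at the
end of a parse, so the object is always in a usable state.

```c
#include <assert.h>
#include <stdlib.h>

enum { DEMO_STACK_MAX = 16 };

/* Hypothetical, much simplified push-parser state. */
typedef struct
{
  int stack[DEMO_STACK_MAX];
  int top;     /* index of the top of the state stack */
  int nerrs;   /* number of errors seen in the current parse */
} demo_pstate;

/* Put the state back into its pristine condition. */
void demo_pstate_clear(demo_pstate *ps)
{
  assert(ps != NULL);
  ps->top = 0;
  ps->stack[0] = 0;   /* initial state */
  ps->nerrs = 0;
}

/* Allocate a state that is immediately ready to be queried. */
demo_pstate *demo_pstate_new(void)
{
  demo_pstate *ps = (demo_pstate *) malloc(sizeof *ps);
  if (ps == NULL)
    return NULL;
  demo_pstate_clear(ps);
  return ps;
}

/* Push a state; return 0 on success, -2 when the stack is full. */
int demo_push(demo_pstate *ps, int state)
{
  if (ps->top + 1 >= DEMO_STACK_MAX)
    return -2;
  ps->stack[++ps->top] = state;
  return 0;
}

/* End a parse: reset the state so that it can be reused. */
int demo_finish(demo_pstate *ps, int status)
{
  demo_pstate_clear(ps);
  return status;
}
```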
* data/skeletons/yacc.c (b4_initialize_parser_state_variables): New,
extracted from the top of yyparse/yypush_parse.
(yypstate_clear): New.
(yypstate_new): Use it when push parsers are enabled.
Define after the yyps macros so that we can use the same code as the
regular pull parsers.
(yyparse): Use it when push parsers are _not_ enabled.
* examples/c/bistromathic/bistromathic.test: Check the completion on
the beginning of the line.
These macros were extremely useful when we had to support K&R C,
which we dropped long ago. Now, they merely make the code uselessly
hard to read.
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/yacc.c:
Stop using b4_function_define.
Prefer b4_parse_error_case over the adhoc solution
`m4_case + b4_percent_define_get`. Same for b4_parse_error_bmatch.
* data/skeletons/glr.c: here
* data/skeletons/yacc.c: here
parse.error has more than two possible values.
* data/skeletons/bison.m4 (b4_error_verbose_if, b4_error_verbose_flag):
Remove.
(b4_parse_error_case, b4_parse_error_bmatch): New.
Adjust dependencies.
The C, C++ and D skeletons used to show the stack right after popping
the stack during the reduction. Now that the stack is printed after
reaching a new state, that has become useless:
Entering state 1
Stack now 0 1
Reducing stack by rule 5 (line 83):
$1 = token "number" (1)
-> $$ = nterm exp (1)
Stack now 0
Entering state 8
Stack now 0 8
Remove the "Stack now 0" line.
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c:
Here.
Currently, if we have long rules and series of shifts, we stack states
without showing the stack. Let's be more incremental, and do as the
Java skeleton does.
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/yacc.c:
Here.
Adjust test cases.
* tests/torture.at (AT_DATA_STACK_TORTURE): Disable stack traces: this
test produces a very large stack, and showing the stack each time we
shift a token goes quadratic.
The Java skeleton displays
Reading a token:
Next token is token "number" (1)
while the others display
Reading a token: Next token is token "number" (1)
When generating logs in the scanner, the first part is separated from
the second, and the end of the scanner logs has the second part
pasted in. So let's propagate the Java way, but with the colon.
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c: Do it.
Adjust test cases and doc.
When building the test cases, emitting code in the epilogue is very
constraining. Let's make it simpler thanks to %code epilogue.
However, I don't want to document this: it is bad style to use it (we
should avoid having too many ways to write the same thing,
TI!MTOWTDI), just put your code in the true epilogue section.
* data/skeletons/glr.c, data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c: Implement support for %code epilogue.
Remove useless comments.
* tests/calc.at, tests/java.at: Simplify.
In addition to
%token NUM "number"
accept
%token NUM _("number")
in which case the token will be translated in error messages.
Do not use _() in the output if there are no translatable tokens.
* src/symtab.h, src/symtab.c (symbol): Add a 'translatable' member.
* src/parse-gram.y (TSTRING): New token.
(string_as_id.opt): Replace with...
(alias): this.
Use it.
* src/scan-gram.l (SC_ESCAPED_TSTRING): New start conditions, to match
TSTRINGs.
* src/output.c (prepare_symbols): Define b4_translatable if there are
translatable strings.
* data/skeletons/glr.c, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c (yytnamerr): Receive b4_translatable, and use it.
"detailed" error messages are almost like "verbose", except that we
don't double escape them, they don't get inner quotes, we don't use
yytnamerr, and we hide the table.
"custom" is exposed with the "detailed" tokens, not the "verbose"
ones: they are not double-quoted.
Because there's a risk that some people use yytname even without
"verbose", let's keep yytname (instead of yys_name) in "simple"
parse.error.
* src/output.c (prepare_symbol_names): Be ready to output symbol names
unquoted.
(prepare_symbol_names): Output both the old tname table, and the new
symbol_names one.
* data/skeletons/bison.m4: Accept 'detailed'.
* data/skeletons/yacc.c: When parse.error is 'detailed', don't emit
yytname and yytnamerr, just yysymbol_name with the table inside.
* tests/calc.at: Adjust.
Currently we get warnings with GCC 4.8 when running the
maintainer-check-g++ tests:
143. skeletons.at:85: testing Installed skeleton file names ...
../../tests/skeletons.at:120: COLUMNS=1000; export COLUMNS; bison --color=no -fno-caret --skeleton=yacc.c -o input-cmd-line.c input-cmd-line.y
../../tests/skeletons.at:121: $CC $CFLAGS $CPPFLAGS $LDFLAGS -o input-cmd-line input-cmd-line.c $LIBS
stderr:
input-cmd-line.c: In function 'int yysyntax_error(long int*, char**, const yyparse_context_t*)':
input-cmd-line.c:977:52: error: conversion to 'int' from 'long int' may alter its value [-Werror=conversion]
YYSIZEOF (yyarg) / YYSIZEOF (*yyarg));
^
cc1plus: all warnings being treated as errors
stdout:
../../tests/skeletons.at:121: exit code was 1, expected 0
and
429. calc.at:823: testing Calculator parse.error=custom %locations api.prefix={calc} ...
../../tests/calc.at:823: COLUMNS=1000; export COLUMNS; bison --color=no -fno-caret -Wno-deprecated -o calc.c calc.y
../../tests/calc.at:823: $CC $CFLAGS $CPPFLAGS $LDFLAGS -o calc calc.c $LIBS
stderr:
calc.y: In function 'int yyreport_syntax_error(const yyparse_context_t*)':
calc.y:157:58: error: conversion to 'int' from 'long unsigned int' may alter its value [-Werror=conversion]
int n = yysyntax_error_arguments (ctx, arg, sizeof arg / sizeof *arg);
^
cc1plus: all warnings being treated as errors
stdout:
../../tests/calc.at:823: exit code was 1, expected 0
We could use a cast to avoid the warning, but it becomes too
cluttered. We can also use YYPTRDIFF_T, but that forces the user to
use YYPTRDIFF_T too, although this is an array of tokens, which is
limited by YYNTOKENS, an int. So let's completely avoid this warning.
* data/skeletons/yacc.c, tests/local.at (yyreport_syntax_error): Avoid
relying on sizeof to compute the array capacity.
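A minimal sketch of the idea, with hypothetical names (`ARGS_MAX`,
`fill_arguments`, `count_arguments`) and made-up data: the capacity is
an `int` constant declared once and used both for the array declaration
and for the call, so no `sizeof` result has to be narrowed to `int`.

```c
#include <assert.h>

/* Capacity as an int constant rather than a sizeof expression. */
enum { ARGS_MAX = 5 };

/* Store at most `capacity` items in `args`; return how many were stored. */
int fill_arguments(int *args, int capacity)
{
  static const int candidates[] = { 3, 4, 7 };   /* made-up values */
  const int ncandidates = 3;
  assert(args != 0 && capacity >= 0);
  int n = 0;
  while (n < ncandidates && n < capacity)
    {
      args[n] = candidates[n];
      ++n;
    }
  return n;
}

int count_arguments(void)
{
  int args[ARGS_MAX];
  /* ARGS_MAX is already an int: nothing to convert, nothing to cast. */
  return fill_arguments(args, ARGS_MAX);
}
```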
Enhance the calculator tests: show that passing arguments to yyerror
works.
* tests/calc.at: Add a new parse-param, nerrs, which counts the number
of syntax errors in a run.
* tests/local.at: Adjust to handle the new 'nerrs' argument, when
present.
The custom error reporting function should see the user's additional
arguments. Let's experiment with passing them as arguments to
yyreport_syntax_error, but maybe storing them in the context would be
a better alternative.
* data/skeletons/yacc.c (yyreport_syntax_error): Handle the
parse-params.
* tests/calc.at, tests/local.at: Adjust.
Provide users with a means to query for the currently allowed tokens.
Could be used for autocompletion for instance.
* data/skeletons/yacc.c (yyexpected_tokens): New, extracted from
yysyntax_error_arguments.
* examples/c/calc/calc.y (PRINT_EXPECTED_TOKENS): New.
Use it.
When parse.error is custom, let users define a yyreport_syntax_error
function, and use it.
* data/skeletons/bison.m4 (b4_error_verbose_if): Accept 'custom'.
* data/skeletons/yacc.c: Implement it.
* examples/c/calc/calc.y: Experiment with it.
That allows users to cover more cases, such as easily filtering some
arguments they don't want to expose. But they now have to call
yysymbol_name explicitly.
* data/skeletons/yacc.c (yysyntax_error_arguments, yysyntax_error):
Deal with symbol numbers instead of symbol names.
Isolate a function that returns the list of expected and unexpected
tokens. It will be exposed to users willing to customize their error
messages.
* data/skeletons/yacc.c (yyparse_context_t): New.
(yyerror_message_arguments): New, extracted from yysyntax_error.