* data/skeletons/lalr1.java (yysyntaxErrorArguments): Move it from the
context to the parser object.
Generate only for detailed and verbose error messages.
* tests/local.at (AT_YYERROR_DEFINE(java)): Use yyexpectedTokens
instead.
We could just inline yysyntax_error_arguments back into the routines
it was originally extracted from, but I think the code is nicer to
read this way.
* data/skeletons/glr.c (yysyntax_error_arguments): Generate only for
detailed and verbose error messages.
* data/skeletons/yacc.c: Likewise.
* data/skeletons/lalr1.cc (parser::context::yysyntax_error_arguments):
Move as...
(parser::yysyntax_error_arguments_): this.
And only for detailed and verbose error messages.
Because glr.c shares the same testing routines, we also need to
convert it.
* data/skeletons/glr.c (yyparse_context_token): New.
* tests/local.at (yyreport_syntax_error): here.
Suggested by Adrian Vogelsgesang.
https://lists.gnu.org/archive/html/bison-patches/2020-02/msg00069.html
* data/skeletons/lalr1.java (Context.EMPTY, Context.getToken): New.
(Context.yyntokens): Rename as...
(Context.NTOKENS): this.
Because (i) all the Java coding styles recommend upper case for
constants, and (ii) the Java Skeleton exposes Lexer.EOF, not
Lexer.YYEOF.
* data/skeletons/yacc.c (yyparse_context_token): New.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Don't use
yysyntax_error_arguments.
* examples/java/calc/Calc.y (yyreportSyntaxError): Likewise.
yyparse has returned 0, 1, 2 for ages (accept, reject, memory
exhausted).  Some of our auxiliary functions such as yy_lac and
yyreport_syntax_error also need to return error codes and also use 0,
1, 2.  Because it uses yy_lac, yyexpected_tokens also needs to return
"problem" and "memory exhausted", but in case of success it needs to
return the number of tokens, so it cannot use 1 and 2 as error codes.
Currently it uses -1 and -2, which are later converted into 1 and 2 as
yacc.c expects.
Let's simplify this and consistently use -1 and -2 for auxiliary
functions that are not exposed (or not yet exposed) to the user.  In
particular this will save the user from having to convert
yyexpected_tokens's -2 into yyreport_syntax_error's 2: both return -1
or -2.
* data/skeletons/yacc.c (yy_lac, yyreport_syntax_error)
(yy_lac_stack_realloc): Return -1, -2 for errors instead of 1, 2.
Adjust callers.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Do take
error codes into account.
Issue a syntax error message even if we ran out of memory.
* src/parse-gram.y, tests/local.at (yyreport_syntax_error): Adjust.
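The convention above can be sketched as follows.  This is a minimal
illustration, not the skeleton's code: count_expected, report_error and
the AUX_* names are hypothetical stand-ins, and the token numbers are
made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes shared by auxiliary (not user-facing)
   routines.  */
enum { AUX_PROBLEM = -1, AUX_ENOMEM = -2 };

/* Hypothetical stand-in for a routine such as yyexpected_tokens: on
   success return the number of tokens stored in ARG (>= 0), otherwise
   a negative error code.  STATUS simulates the result of the
   underlying check (0 for success).  */
static int
count_expected (int status, int arg[], int argn)
{
  assert (arg != NULL && 0 <= argn);
  if (status < 0)
    return status;          /* Propagate -1 or -2 unchanged.  */
  int n = 0;
  for (int i = 0; i < argn && i < 3; ++i)
    arg[n++] = 100 + i;     /* Made-up token numbers.  */
  return n;
}

/* Hypothetical stand-in for a routine such as yyreport_syntax_error:
   it uses the same negative codes, so nothing has to be converted.  */
static int
report_error (int status)
{
  int arg[5];
  int n = count_expected (status, arg, 5);
  if (n < 0)
    return n;
  return 0;
}
```

Since a nonnegative result is always a count, the caller can forward a
negative result as is.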
The cost of the file layer is large and makes benchmarks too coarse,
as seen in the following example, first with a file, then with a
literal string:
0. %skeleton "yacc.c" %define parse.lac full
1. %skeleton "yacc-v1.c" %define nofinal %define parse.lac full
2. %skeleton "yacc-v2.c" %define nofinal %define parse.lac full
3. %skeleton "yacc-v3.c" %define nofinal %define parse.lac full
4. %skeleton "yacc.c"
5. %skeleton "yacc-v1.c" %define nofinal
6. %skeleton "yacc-v2.c" %define nofinal
7. %skeleton "yacc-v3.c" %define nofinal
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y0 32558 ns 32537 ns 21228
BM_y1 32400 ns 32369 ns 21233
BM_y2 33485 ns 33464 ns 20625
BM_y3 32139 ns 32125 ns 21446
BM_y4 31343 ns 31329 ns 21747
BM_y5 31344 ns 31317 ns 22035
BM_y6 31287 ns 31255 ns 22039
BM_y7 31387 ns 31373 ns 22178
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y0 10642 ns 10634 ns 63601
BM_y1 10657 ns 10654 ns 63625
BM_y2 10441 ns 10432 ns 65957
BM_y3 10558 ns 10554 ns 64546
BM_y4 9521 ns 9516 ns 72011
BM_y5 9179 ns 9157 ns 75028
BM_y6 9360 ns 9356 ns 73770
BM_y7 9365 ns 9359 ns 72609
Of course, at the same time it is less realistic: most users read
files rather than strings, so it might lead us to pay attention to
costs most people don't see.
* etc/bench.pl.in (&calc_input): Output into a file given as argument.
Output in C syntax.
(&generate_grammar_calc): Use it.
Simplify the grammar: remove operators we don't care about.
Rewrite the scanner to work on a char* instead of a FILE*.
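A scanner that works on a char* instead of a FILE* might look like the
sketch below.  It is not the scanner generated by etc/bench.pl.in: the
scan function and the TOK_* names are hypothetical, and only integers
and single-character operators are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for a tiny calculator.  */
enum { TOK_EOF = 0, TOK_NUM = 256 };

/* Scan one token from *INPUT and advance the cursor.  Numbers yield
   TOK_NUM with their value in *VAL; any other character is returned
   as itself.  At the end of the string, return TOK_EOF and leave the
   cursor in place.  */
static int
scan (const char **input, int *val)
{
  assert (input != NULL && *input != NULL && val != NULL);
  const char *cp = *input;
  while (*cp == ' ' || *cp == '\t')
    ++cp;
  int res;
  if (*cp == '\0')
    res = TOK_EOF;
  else if (isdigit ((unsigned char) *cp))
    {
      *val = 0;
      while (isdigit ((unsigned char) *cp))
        *val = *val * 10 + (*cp++ - '0');
      res = TOK_NUM;
    }
  else
    res = (unsigned char) *cp++;
  *input = cp;
  return res;
}
```

With no stdio call in the loop, the measured time is dominated by the
scanner and parser rather than by the file layer.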
* etc/bench.pl.in (&bench_with_timethese): Also use y$i, as in
&bench_with_gbenchmark.
(&generate_grammar_calc): Don't add a prefix, let the callers do it.
* etc/bench.pl.in (&compiler): New, extracted from...
(&compile): here.
Don't link when using gbm.
(&calc_input): Don't make massive input for micro
benchmarks.
(&generate_grammar_calc): When using gbm, use api.prefix to avoid name
collisions.
Be ready to issue BENCHMARKS instead of a main.
(&bench): Rename as...
(&bench_with_timethese): this.
(&bench_with_gbenchmark): New.
(&bench): New.
Dispatch on these two.
In push parsers, when asking for the list of expected tokens at some
point, it makes no sense to build a yyparse_context_t: the yypstate
alone suffices (the only difference being the lookahead). Instead of
forcing the user to build a useless shell around yypstate, let's offer
yypstate_expected_tokens.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00025.html.
* data/skeletons/yacc.c (yypstate): Declare earlier, so that we can
use it for...
(yypstate_expected_tokens): this new function, when in push parsers.
Adjust dependencies.
* examples/c/bistromathic/parse.y: Simplify: use
yypstate_expected_tokens.
Style fixes.
Reduce scopes (reported by Joel E. Denny).
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommendation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:
The token error shall be reserved for error handling. The name
error can be used in grammar rules. It indicates places where the
parser can recover from a syntax error. The default value of error
shall be 256. Its value can be changed using a %token
declaration. The lexical analyzer should not return the value of
error.
I think this feature is useless: the user should not have to deal with
that.  The intent is probably to give the user a means to use 256 if
she wants to, provided "error" cleared the path first by being
assigned another number.  In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers, so
the feature serves no purpose.
Yet it is valid, and if the user assigns a token number to "error"
twice, then the second time we want to complain about it and want to
show the original definition.  At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.
Rather, the location of the first user definition of "error" should
become its defining location.
Reported by Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html
* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
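The idea of the fix can be sketched as follows.  This is not the code
of src/symtab.c: the location and symbol types and the declare_token
function are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { const char *file; int line; } location;

typedef struct
{
  const char *tag;
  const location *loc;  /* NULL for a built-in symbol such as "error".  */
  bool declared;        /* Whether the user declared it already.  */
  int code;
} symbol;

/* Declare SYM with CODE at LOC.  Return NULL on success.  If SYM was
   already declared by the user, change nothing and return the location
   of that first declaration, to be quoted in the diagnostic.  */
static const location *
declare_token (symbol *sym, int code, const location *loc)
{
  assert (sym != NULL && loc != NULL);
  if (!sym->declared)
    {
      /* First user declaration: it becomes the defining location,
         even if the symbol is built in.  */
      sym->declared = true;
      sym->loc = loc;
      sym->code = code;
      return NULL;
    }
  return sym->loc;
}
```

On a second declaration the returned location is never NULL, so the
diagnostic always has something to quote.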
Currently we rely on the (visual) width of the characters to decide
where to open and close the styling of the quoted lines.  This breaks
when we deal with zero-width characters: we cannot just rely on
(visual) columns, we need to know whether we are before, inside, or
after the highlighted portion.
* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
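The three-valued state can be illustrated with the sketch below.  It is
not the code of location_caret: the glyph type, the quote_line and
append functions and the <e> markers are hypothetical, and the widths
are given by hand instead of being computed.  COL_END is taken to be
the first column after the highlighted portion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A character of the quoted line, with its visual width (0 or 1).  */
typedef struct { const char *text; int width; } glyph;

enum style_state { style_before, style_inside, style_after };

/* Append S to OUT (of size OUTSIZE), truncating if needed.  */
static void
append (char *out, size_t outsize, const char *s)
{
  size_t len = strlen (out);
  if (len + 1 < outsize)
    snprintf (out + len, outsize - len, "%s", s);
}

/* Copy the N glyphs of LINE into OUT, wrapping the columns
   [COL_START, COL_END) in <e>...</e>.  */
static void
quote_line (const glyph *line, int n, int col_start, int col_end,
            char *out, size_t outsize)
{
  assert (out != NULL && 0 < outsize);
  enum style_state state = style_before;
  int col = 1;
  out[0] = '\0';
  for (int i = 0; i < n; ++i)
    {
      /* Open at most once, even if a zero-width glyph keeps us on
         the same column.  */
      if (state == style_before && col_start <= col)
        {
          append (out, outsize, "<e>");
          state = style_inside;
        }
      append (out, outsize, line[i].text);
      col += line[i].width;
      if (state == style_inside && col_end <= col)
        {
          append (out, outsize, "</e>");
          state = style_after;
        }
    }
  /* The line ended before COL_END: close anyway.  */
  if (state == style_inside)
    append (out, outsize, "</e>");
}
```

A test on the column alone would open the style a second time after a
zero-width glyph, since the column did not advance; the state prevents
that.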
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
- 4 | %token foo <error>123
+ 4 | %token foo <error>123</error>
| <error>^~~~~~</error>
* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.
* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
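The rule can be sketched with a hand-written scan.  This is not the
Flex pattern used in src/scan-code.l: tag_length is a hypothetical
helper that receives the text just after the '<'.

```c
#include <assert.h>
#include <stddef.h>

/* S points just after the '<' of a "$<tag>$" reference.  Return the
   length of the tag, or 0 if it is empty or never closed.  A '>' that
   directly follows a '-' belongs to "->" and does not close the tag,
   so a tag cannot end with '-'.  */
static size_t
tag_length (const char *s)
{
  assert (s != NULL);
  for (size_t i = 0; s[i] != '\0' && s[i] != '\n'; ++i)
    if (s[i] == '>' && !(0 < i && s[i - 1] == '-'))
      return i;
  return 0;
}
```

With this rule "a->b>$" yields the tag a->b, while "->$" has no
closing '>' and is rejected.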