The cost of the file layer is large and makes benchmarks too coarse,
as seen for in following example, first with a file, then with a
literal string:
0. %skeleton "yacc.c" %define parse.lac full
1. %skeleton "yacc-v1.c" %define nofinal %define parse.lac full
2. %skeleton "yacc-v2.c" %define nofinal %define parse.lac full
3. %skeleton "yacc-v3.c" %define nofinal %define parse.lac full
4. %skeleton "yacc.c"
5. %skeleton "yacc-v1.c" %define nofinal
6. %skeleton "yacc-v2.c" %define nofinal
7. %skeleton "yacc-v3.c" %define nofinal
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y0 32558 ns 32537 ns 21228
BM_y1 32400 ns 32369 ns 21233
BM_y2 33485 ns 33464 ns 20625
BM_y3 32139 ns 32125 ns 21446
BM_y4 31343 ns 31329 ns 21747
BM_y5 31344 ns 31317 ns 22035
BM_y6 31287 ns 31255 ns 22039
BM_y7 31387 ns 31373 ns 22178
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y0 10642 ns 10634 ns 63601
BM_y1 10657 ns 10654 ns 63625
BM_y2 10441 ns 10432 ns 65957
BM_y3 10558 ns 10554 ns 64546
BM_y4 9521 ns 9516 ns 72011
BM_y5 9179 ns 9157 ns 75028
BM_y6 9360 ns 9356 ns 73770
BM_y7 9365 ns 9359 ns 72609
Of course, at the same time it is less realistic: most users read
files rather that strings, so it might lead to us to pay attention to
costs most people don't see.
* etc/bench.pl.in (&calc_input): Output into a file given as argument.
Output in C syntax.
(&generate_grammar_calc): Use it.
Simplify the grammar: remove operators we don't care about.
Rewrite the scanner to work on a char* instead of a FILE*.
* etc/bench.pl.in (&bench_with_timethese): Also use y$i, as in
&bench_with_gbenchmark.
(&generate_grammar_calc): Don't add a prefix, let the callers do it.
* etc/bench.pl.in (&compiler): New, extracted from...
(&compile): here.
Don't link when using gbm.
(&calc_input): Don't make massive input for micro
benchmarks.
(&generate_grammar_calc): When using gbm, use api.prefix to avoid name
collisions.
Be ready to issue BENCHMARKS instead of a main.
(&bench): Rename as...
(&bench_with_timethese): this.
(&bench_with_gbenchmark): New.
(&bench): New.
Dispatch on these two.
In push parsers, when asking for the list of expected tokens at some
point, it makes no sense to build a yyparse_context_t: the yypstate
alone suffices (the only difference being the lookahead). Instead of
forcing the user to build a useless shell around yypstate, let's offer
yypstate_expected_tokens.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00025.html.
* data/skeletons/yacc.c (yypstate): Declare earlier, so that we can
use it for...
(yypstate_expected_tokens): this new function, when in push parsers.
Adjust dependencies.
* examples/c/bistromathic/parse.y: Simplify: use
yypstate_expected_tokens.
Style fixes.
Reduce scopes (reported by Joel E. Denny).
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommandation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:
The token error shall be reserved for error handling. The name
error can be used in grammar rules. It indicates places where the
parser can recover from a syntax error. The default value of error
shall be 256. Its value can be changed using a %token
declaration. The lexical analyzer should not return the value of
error.
I think this feature is useless, the user should not have to deal with
that. The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number. In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers. So
this feature is useless.
Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition. At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.
Rather, the location of the first user definition of "error" should
become its defining location.
Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html
* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
Currenly we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines. This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.
* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
- 4 | %token foo <error>123
+ 4 | %token foo <error>123</error>
| <error>^~~~~~</error>
* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.
* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
Currently pstate_new does not set up its variables, this task is left
to yypush_parse. This was probably to share more code with usual pull
parsers, where these (local) variables are indeed initialized by
yyparse.
But as a consequence yyexpected_tokens crashes at the very beginning
of the parse, since, for instance, the stacks are not even set up.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00001.html.
The fix could have very simple, but the documentation actually makes
it very clear that we can reuse a pstate for several parses:
After yypush_parse returns a status other than YYPUSH_MORE, the
parser instance yyps may be reused for a new parse.
so we need to restore the parser to its pristine state so that (i) it
is ready to run the next parse, (ii) it properly supports
yyexpected_tokens for the next run.
* data/skeletons/yacc.c (b4_initialize_parser_state_variables): New,
extracted from the top of yyparse/yypush_parse.
(yypstate_clear): New.
(yypstate_new): Use it when push parsers are enabled.
Define after the yyps macros so that we can use the same code as the
regular pull parsers.
(yyparse): Use it when push parsers are _not_ enabled.
* examples/c/bistromathic/bistromathic.test: Check the completion on
the beginning of the line.
Currently completion on "at" proposes only "atan", but does not
actually complete "at" into "atan".
* examples/c/bistromathic/parse.y (completion): Install the lcp in
matches[0].
* examples/c/bistromathic/bistromathic.test: Check that case.
Currently "(1+<TAB>" does not work as expected, because "+" is not a
word breaking character.
* examples/c/bistromathic/parse.y (init_readline): Specify our word
breaking characters.
* examples/c/bistromathic/bistromathic.test: Avoid trailing spaces.
These macros have been extremely useful when we had to support K&R C,
which we dropped long ago. Now, they merely make the code uselessly
hard to read.
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/yacc.c:
Stop using b4_function_define.
Let's use GNU readline and its TAB autocompletion to demonstrate the
use of yyexpected_tokens.
This shows a number of weaknesses in our current approach:
- some macros (yyssp, etc.) from push parsers "leak" in user code, we
need to undefine them
- the context needed by yyexpected_tokens does not need the token,
yypstate actually suffices
- yypstate is not properly setup when first allocated, which results
in a crash of yyexpected_tokens if fired before a first token was
read. We should move initialization from yypush_parse into
yypstate_new.
* examples/c/bistromathic/parse.y (yylex): Take input as a string, not
a file.
(EXIT): New token.
(input): Adjust to work only on a line.
(line): Remove.
(symbol_count, process_line, expected_tokens, completion)
(init_readline): New.
* examples/c/bistromathic/bistromathic.test: Adjust expectations.
This example will soon use GNU readline, so its scanner should be easy
to use (concurrently) on strings, not streams. This is not a place
where Flex shines, and anyway, these are examples of Bison, not Flex.
There's already lexcalc and reccalc that demonstrate the use of Flex.
* examples/c/bistromathic/scan.l: Remove.
* examples/c/bistromathic/parse.y (yylex): New.
Adjust dependencies.