* tests/local.mk (recheck): Look at the per-test logs, not the overall
log, which, when interrupted, contains only information about... the
tests that passed.
GCC 4.2 dies with
src/InadequacyList.c: In function 'InadequacyList__new_conflict':
src/InadequacyList.c:37: error: #pragma GCC diagnostic not allowed inside functions
src/InadequacyList.c:37: error: #pragma GCC diagnostic not allowed inside functions
src/InadequacyList.c:40: error: #pragma GCC diagnostic not allowed inside functions
Reported by Evan Lavelle.
See https://lists.gnu.org/r/bug-bison/2020-03/msg00021.html
and https://trac.macports.org/ticket/59927.
* src/system.h (GCC_VERSION): New.
Use it to control IGNORE_TYPE_LIMITS_BEGIN and
IGNORE_TYPE_LIMITS_END.
Because of the insane current implementation of glr.cc, things are a
bit nasty. We will rename symbol_number_type as symbol_type_type
later, to keep this commit small.
* data/skeletons/c++.m4 (b4_declare_symbol_enum): New.
Also define YYNTOKENS to avoid type clashes when yyntokens_ was
actually defined in another enum.
Use it.
(symbol_number_type): Be an alias of symbol_type_type.
Use YYSYMBOL_YYEMPTY and the like.
Use symbol_number_type where appropriate.
(empty_symbol): Remove.
(yytranslate_): Use symbol_number_type, not token_number_type.
* data/skeletons/lalr1.cc: Use symbol_number_type where appropriate.
Adjust to the replacement of empty_symbol by YYSYMBOL_YYEMPTY.
(yy_error_token_, yy_undef_token_, yyeof_, yyntokens_): Remove.
Adjust dependencies.
* data/skeletons/glr.cc: Use symbol_number_type where appropriate.
Forward definitions of YYSYMBOL_YYEMPTY, etc. to glr.c.
* tests/headers.at: Accept YYNTOKENS and other YYSYMBOL_*.
* tests/local.at (AT_YYERROR_DEFINE(c++)): Use symbol_number_type.
Now that yacc.c and glr.c both know yysymbol_type_t, convert the
common routines.
* data/skeletons/c.m4 (yydestruct, yy_symbol_value_print)
(yy_symbol_print): Use yysymbol_type_t instead of int.
* data/skeletons/glr.c: Use yySymbol where appropriate.
* data/skeletons/yacc.c (YY_ACCESSING_SYMBOL): New wrapper around
yystos.
Use it.
* tests/local.at (yyreport_syntax_error): Use yysymbol_type_t where
appropriate.
Apply the same changes as in yacc.c. Now yySymbol and yysymbol_type_t
are aliases. We will remove the former later, to avoid cluttering
this commit.
* data/skeletons/glr.c: Use b4_declare_symbol_enum.
Use YYSYMBOL_YYEOF etc. where appropriate.
(YYUNDEFTOK, YYTERROR): Remove.
(YYTRANSLATE, yySymbol, yyexpected_tokens, yysyntax_error_arguments):
Adjust.
(yy_accessing_symbol): New.
Use it where appropriate.
This triggers warnings with several compilers. For instance ICC fills
the logs with pages and pages of
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
And so does G++9 when compiling yacc.c's (C) output
input.c:545:8: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
input.c:545:15: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
Clang++ is no exception
input.c:545:8: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
input.c:545:15: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
At some point we could use yysymbol_type_t's enumerators to define
yytranslate. Meanwhile...
* data/skeletons/yacc.c (yytranslate): Use the original integral type
to define it.
(YYTRANSLATE): Cast the result into yysymbol_type_t.
Currently we define enumerators only for symbols that have an
identifier. That rules out tokens such as '+', and nonterminals such
as foo-bar and foo.bar. As a consequence we are taking chances: the
compiler might compile yysymbol_type_t as too small an integral type
for some symbol codes.
* data/skeletons/bison.m4 (b4_symbol_sid): Forge a unique symbol
identifier for symbols that don't have an ID.
This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:
input.c: In function 'yyparse':
input.c:1107:7: error: conversion to 'unsigned int' from 'int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~
input.c:1107:10: error: conversion to 'int' from 'unsigned int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~~~~~~
input.c:1108:47: error: comparison of integer expressions of
different signedness:
'yytype_int8' {aka 'const signed char'} and
'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
[-Werror=sign-compare]
1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
| ^~
input.c:702:25: error: operand of ?: changes signedness from 'int'
to 'unsigned int' due to unsignedness of
other operand [-Werror=sign-compare]
702 | #define YYEMPTY (-2)
| ^~~~
input.c:1220:33: note: in expansion of macro 'YYEMPTY'
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^~~~~~~
input.c:1220:41: error: unsigned conversion from 'int' to
'unsigned int' changes value
from '-2' to '4294967294'
[-Werror=sign-conversion]
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^
Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits. We could actually even simply use "0" and shift
the rest, which would allow to write "!yytoken" to mean really
"yytoken != YYEMPTY".
* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.
* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.
* tests/regression.at: Adjust.
Now that we have a proper type for internal symbol numbers, let's use
it. More code needs conversion, e.g., printers and destructors, but
they are shared with glr.c, which is not ready yet for this change.
It will also help us deal with warnings such as (GCC9 on GNU/Linux):
input.c: In function 'int yyparse()':
input.c:475:37: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
475 | (0 <= (YYX) && (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYSYMBOL_YYUNDEF)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
input.c:1024:17: note: in expansion of macro 'YYTRANSLATE'
1024 | yytoken = YYTRANSLATE (yychar);
| ^~~~~~~~~~~
* data/skeletons/yacc.c (yytranslate, yysymbol_name)
(yyparse_context_t, yyexpected_tokens, yypstate_expected_tokens)
(yysyntax_error_arguments):
Use yysymbol_type_t instead of int.
There's a number of advantage in exposing the symbol (internal)
numbers:
- custom error messages can use them to decide how to represent a
given symbol, or a set of symbols.
- we need something similar in uses of yyexpected_tokens. For
instance, currently, bistromathic's completion() reads:
int ntokens = expected_tokens (line, tokens, YYNTOKENS);
[...]
for (int i = 0; i < ntokens; ++i)
if (tokens[i] == YYTRANSLATE (TOK_VAR))
[...]
else if (tokens[i] == YYTRANSLATE (TOK_FUN))
[...]
else
[...]
- now that it's a compile-time expression, we can easily build static
tables, switch, etc.
- some users depended on the ability to get the token number from a
symbol to write test cases for their scanners. But Bison 3.5
removed the table this feature depended upon (a reverse
yytranslate). Now they can check against the actual symbol number,
without having pay (space and time) a conversion.
See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and
https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.
- it helps us clearly separate the internal symbol numbers from the
external token numbers, whose difference is sometimes blurred in the
code when values coincide (e.g. "yychar = yytoken = YYEOF").
- it allows us to get rid of ugly macros with inconsistent names such
as YYUNDEFTOK and YYTERROR, and to group related definitions
together.
- similarly it provides a clean access to the $accept symbol (which
proves convenient in a current experimentation of mine with several
%start symbols).
Let's declare this type as a private type (in the *.c file, not
the *.h one). So it does not need to be influenced by the api prefix.
* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
* tests/local.mk (recheck): Look at the per-test logs, not the overall
log, which, when interrupted, contains only information about... the
tests that passed.
* data/skeletons/lalr1.java (yysyntaxErrorArguments): Move it from the
context, to the parser object.
Generate only for detailed and verbose error messages.
* tests/local.at (AT_YYERROR_DEFINE(java)): Use yyexpectedTokens
instead.
We could just "inline yysyntax_error_arguments back" in the routines
it was originally extracted from, but I think the code is nicer to
read this way.
* data/skeletons/glr.c (yysyntax_error_arguments): Generate only for
detailed and verbose error messages.
* data/skeletons/yacc.c: Likewise.
* data/skeletons/lalr1.cc (parser::context::yysyntax_error_arguments):
Move as...
(parser::yysyntax_error_arguments_): this.
And only for detailed and verbose error messages.
Because glr.c shares the same testing routines, we also need to
convert it.
* data/skeletons/glr.c (yyparse_context_token): New.
* tests/local.at (yyreport_syntax_error): here.
Suggested by Adrian Vogelsgesang.
https://lists.gnu.org/archive/html/bison-patches/2020-02/msg00069.html
* data/skeletons/lalr1.java (Context.EMPTY, Context.getToken): New.
(Context.yyntokens): Rename as...
(Context.NTOKENS): this.
Because (i) all the Java coding styles recommend upper case for
constants, and (ii) the Java Skeleton exposes Lexer.EOF, not
Lexer.YYEOF.
* data/skeletons/yacc.c (yyparse_context_token): New.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Don't use
yysyntax_error_arguments.
* examples/java/calc/Calc.y (yyreportSyntaxError): Likewise.
yyparse returns 0, 1, 2 since ages (accept, reject, memory exhausted).
Some of our auxiliary functions such as yy_lac and
yyreport_syntax_error also need to return error codes and also use 0,
1, 2. Because it uses yy_lac, yyexpected_tokens also needs to return
"problem", "memory exhausted", but in case of success, it needs to
return the number of tokens, so it cannot use 1 and 2 as error code.
Currently it uses -1 and -2, which is later converted into 1 and 2 as
yacc.c expects it.
Let's simplify this and use consistently -1 and -2 for auxiliary
functions that are not exposed (or not yet exposed) to the user. In
particular this will save the user from having to convert
yyexpected_tokens's -2 into yyreport_syntax_error's 2: both return -1
or -2.
* data/skeletons/yacc.c (yy_lac, yyreport_syntax_error)
(yy_lac_stack_realloc): Return -1, -2 for errors instead of 1, 2.
Adjust callers.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Do take
error codes into account.
Issue a syntax error message even if we ran out of memory.
* src/parse-gram.y, tests/local.at (yyreport_syntax_error): Adjust.
The cost of the file layer is large and makes benchmarks too coarse,
as seen for in following example, first with a file, then with a
literal string:
0. %skeleton "yacc.c" %define parse.lac full
1. %skeleton "yacc-v1.c" %define nofinal %define parse.lac full
2. %skeleton "yacc-v2.c" %define nofinal %define parse.lac full
3. %skeleton "yacc-v3.c" %define nofinal %define parse.lac full
4. %skeleton "yacc.c"
5. %skeleton "yacc-v1.c" %define nofinal
6. %skeleton "yacc-v2.c" %define nofinal
7. %skeleton "yacc-v3.c" %define nofinal
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y0 32558 ns 32537 ns 21228
BM_y1 32400 ns 32369 ns 21233
BM_y2 33485 ns 33464 ns 20625
BM_y3 32139 ns 32125 ns 21446
BM_y4 31343 ns 31329 ns 21747
BM_y5 31344 ns 31317 ns 22035
BM_y6 31287 ns 31255 ns 22039
BM_y7 31387 ns 31373 ns 22178
--------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------
BM_y0 10642 ns 10634 ns 63601
BM_y1 10657 ns 10654 ns 63625
BM_y2 10441 ns 10432 ns 65957
BM_y3 10558 ns 10554 ns 64546
BM_y4 9521 ns 9516 ns 72011
BM_y5 9179 ns 9157 ns 75028
BM_y6 9360 ns 9356 ns 73770
BM_y7 9365 ns 9359 ns 72609
Of course, at the same time it is less realistic: most users read
files rather that strings, so it might lead to us to pay attention to
costs most people don't see.
* etc/bench.pl.in (&calc_input): Output into a file given as argument.
Output in C syntax.
(&generate_grammar_calc): Use it.
Simplify the grammar: remove operators we don't care about.
Rewrite the scanner to work on a char* instead of a FILE*.
* etc/bench.pl.in (&bench_with_timethese): Also use y$i, as in
&bench_with_gbenchmark.
(&generate_grammar_calc): Don't add a prefix, let the callers do it.
* etc/bench.pl.in (&compiler): New, extracted from...
(&compile): here.
Don't link when using gbm.
(&calc_input): Don't make massive input for micro
benchmarks.
(&generate_grammar_calc): When using gbm, use api.prefix to avoid name
collisions.
Be ready to issue BENCHMARKS instead of a main.
(&bench): Rename as...
(&bench_with_timethese): this.
(&bench_with_gbenchmark): New.
(&bench): New.
Dispatch on these two.