This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:
input.c: In function 'yyparse':
input.c:1107:7: error: conversion to 'unsigned int' from 'int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~
input.c:1107:10: error: conversion to 'int' from 'unsigned int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~~~~~~
input.c:1108:47: error: comparison of integer expressions of
different signedness:
'yytype_int8' {aka 'const signed char'} and
'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
[-Werror=sign-compare]
1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
| ^~
input.c:702:25: error: operand of ?: changes signedness from 'int'
to 'unsigned int' due to unsignedness of
other operand [-Werror=sign-compare]
702 | #define YYEMPTY (-2)
| ^~~~
input.c:1220:33: note: in expansion of macro 'YYEMPTY'
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^~~~~~~
input.c:1220:41: error: unsigned conversion from 'int' to
'unsigned int' changes value
from '-2' to '4294967294'
[-Werror=sign-conversion]
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^
Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits. We could actually even simply use "0" and shift
the rest, which would allow to write "!yytoken" to mean really
"yytoken != YYEMPTY".
* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.
* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.
* tests/regression.at: Adjust.
There's a number of advantage in exposing the symbol (internal)
numbers:
- custom error messages can use them to decide how to represent a
given symbol, or a set of symbols.
- we need something similar in uses of yyexpected_tokens. For
instance, currently, bistromathic's completion() reads:
int ntokens = expected_tokens (line, tokens, YYNTOKENS);
[...]
for (int i = 0; i < ntokens; ++i)
if (tokens[i] == YYTRANSLATE (TOK_VAR))
[...]
else if (tokens[i] == YYTRANSLATE (TOK_FUN))
[...]
else
[...]
- now that it's a compile-time expression, we can easily build static
tables, switch, etc.
- some users depended on the ability to get the token number from a
symbol to write test cases for their scanners. But Bison 3.5
removed the table this feature depended upon (a reverse
yytranslate). Now they can check against the actual symbol number,
without having pay (space and time) a conversion.
See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and
https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.
- it helps us clearly separate the internal symbol numbers from the
external token numbers, whose difference is sometimes blurred in the
code when values coincide (e.g. "yychar = yytoken = YYEOF").
- it allows us to get rid of ugly macros with inconsistent names such
as YYUNDEFTOK and YYTERROR, and to group related definitions
together.
- similarly it provides a clean access to the $accept symbol (which
proves convenient in a current experimentation of mine with several
%start symbols).
Let's declare this type as a private type (in the *.c file, not
the *.h one). So it does not need to be influenced by the api prefix.
* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommandation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
Currently pstate_new does not set up its variables, this task is left
to yypush_parse. This was probably to share more code with usual pull
parsers, where these (local) variables are indeed initialized by
yyparse.
But as a consequence yyexpected_tokens crashes at the very beginning
of the parse, since, for instance, the stacks are not even set up.
See https://lists.gnu.org/r/bison-patches/2020-03/msg00001.html.
The fix could have very simple, but the documentation actually makes
it very clear that we can reuse a pstate for several parses:
After yypush_parse returns a status other than YYPUSH_MORE, the
parser instance yyps may be reused for a new parse.
so we need to restore the parser to its pristine state so that (i) it
is ready to run the next parse, (ii) it properly supports
yyexpected_tokens for the next run.
* data/skeletons/yacc.c (b4_initialize_parser_state_variables): New,
extracted from the top of yyparse/yypush_parse.
(yypstate_clear): New.
(yypstate_new): Use it when push parsers are enabled.
Define after the yyps macros so that we can use the same code as the
regular pull parsers.
(yyparse): Use it when push parsers are _not_ enabled.
* examples/c/bistromathic/bistromathic.test: Check the completion on
the beginning of the line.
These macros have been extremely useful when we had to support K&R C,
which we dropped long ago. Now, they merely make the code uselessly
hard to read.
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/yacc.c:
Stop using b4_function_define.
Since Bison 2.7, output was indented four spaces for explanatory
statements. For example:
input.y:2.7-13: error: %type redeclaration for exp
input.y:1.7-11: previous declaration
Since the introduction of caret-diagnostics, it became less clear.
Remove the indentation and display submessages as in GCC:
input.y:2.7-13: error: %type redeclaration for exp
2 | %type <float> exp
| ^~~~~~~
input.y:1.7-11: note: previous declaration
1 | %type <int> exp
| ^~~~~
* src/complain.h (SUB_INDENT): Remove.
(warnings): Add "note" to the enum.
* src/complain.h, src/complain.c (complain_indent): Replace by...
(subcomplain): this.
Adjust all dependencies.
* tests/actions.at, tests/diagnostics.at, tests/glr-regression.at,
* tests/input.at, tests/named-refs.at, tests/regression.at:
Adjust expectations.
This revealed a number of things I had not realized:
- the Java location tracking was aliasing the same pair of positions
for all the symbols (see previous commit).
- in impure parsers, it's quite easy to use incorrect locations for
diagnostics, since yyerror uses yylloc, which is the location of the
lookahead, not that of the current lhs. So we need something like
{
YYLTYPE old_yylloc = yylloc;
yylloc = @$;
yyerror (]AT_PARAM_IF([result, count, nerrs, ])[buf);
yylloc = old_yylloc;
}
Maybe we should do that little yylloc dance in the skeleton instead
of leaving it to the user? It might be costly... But that's only
for users of the impure parsers, which are asking for trouble
anyway.
- in glr.cc invoking yyerror is somewhat cumbersome: the C++ interface
is not available as we are in yyparse (which in C), and yyerror is
used by glr.cc itself to bind it to the user's parser::error. If we
call yyerror, we need:
yyerror (]AT_LOCATION_IF([[&@$, ]])[yyparser, ]AT_PARAM_IF([result, count, nerrs, ])[msg);
However calling yy::parser::error is easier, once we know that the
current parser object is available as 'yyparser'. Which also saves
us from having to pass the parse-params ourselves:
yyparser.error (]AT_LOCATION_IF([[@$, ]])[msg);
* tests/calc.at: Invoke yyerror by hand, instead of using fprintf etc.
Adjust expectations.
* data/skeletons/lalr1.java (yyexpectedTokens)
(yysyntaxErrorArguments): Make them methods of Context.
(Context.yysymbolName): New.
* tests/local.at: Adjust.
Unfortunately in the Java skeleton the user cannot override the way
locations are displayed, and locations don't know the structure of the
positions. So they cannot implement the tricks used in the C/C++
skeletons to display "1.1" instead of "1.1-1.2".
* tests/local.at (Java): Add support for column tracking in the
locations, as we did in examples/java/calc.
* tests/calc.at: Use AT_CALC_YYLEX.
* data/skeletons/glr.c (YYASSERT): Rename as...
(YY_ASSERT): this, for consistency with yacc.c, and also to emphasize
the fact that this is not for the end user (YY_ prefix).
* tests/glr-regression.at: Define parse.assert.
It is not used. And its implementation was wrong when api.token.raw
was defined, as it was still mapping to the external token numbers,
instead of the internal ones. Besides it was provided only when
api.token.constructor is defined, yet always declared.
* data/skeletons/c++.m4 (by_type::token): Remove, useless.
* data/skeletons/lalr1.d, data/skeletons/lalr1.java (yytoken_number_):
Remove, useless.
Was used in ancient C skeletons to support YYPRINT, long obsoleted by
%printer.
* data/skeletons/location.cc: Remove the u (for unsigned) suffix from
the initial line and column.
* NEWS: AFAICT, only C++ backends have their location types changed.
* data/skeletons/yacc.c (YYPTRDIFF_T, YYPTRDIFF_MAXIMUM):
Default to long, not int.
(yy_lac_stack_realloc, yy_lac, yytnamerr, yyparse):
Avoid casts to YYPTRDIFF_T that were masking the problem.
I no longer agree with that item, there are indeed two things to
report: lack of definition, and being useless. We could have either
one without the other, they are not directly related.
input.c: In function 'int yyparse()':
input.c: error: conversion to 'long int' from 'long unsigned int'
may change the sign of the result [-Werror=sign-conversion]
yyes_capacity = sizeof yyesa / sizeof *yyes;
^
cc1plus: all warnings being treated as errors
* data/skeletons/yacc.c: here.
This is an experiment. Maybe more styles will be used (in which case
a short-hand function will be useful), maybe it will be just reverted.
* data/bison-default.css (.traces0): New.
* src/lalr.c (lalr): Use it.
* cfg.mk: Disable checks where needed (e.g., we do want to check the
behavior with tabs).
(sc_at_parser_check): Remove. Unfortunately since
a11c144609 we no longer use the './'
prefix to run programs in the current directory. That was so that we
could run Java programs like the other, although they are no run with
the `./` prefix (see 967a59d2c0).
As a consequence this sc check no longer makes sense.
However, since now AT_PARSER_CHECK passes the `./` prefix itself, this
sc-check was superfluous.
* examples/c/reccalc/scan.l: Use memcpy, not strncpy.
* src/ielr.c, src/reader.c: Obfuscate "lr(0)" so that the sc-check for
"space before paren" does not fire.
* tests/diagnostics.at: Avoid space-tab, use tab-tab.
We were using the gnulib's gettext module with tricks in
bootstrap.conf to avoid useless files. Instead, use gnulib's
gettext-h module.
* .travis.yml: Force Gettext 0.18.3 on Trusty.
* bootstrap.conf: Use gettext-h instead of gettext.
(excluded_files): Remove.
* configure.ac (AM_GNU_GETTEXT_VERSION): Bump to 0.19.
Currently we have no simple example: rpcalc in reverse Polish, mfcalc
has functions, and lexcalc is using lex.
* examples/c/calc/Makefile, examples/c/calc/calc.y,
* examples/c/calc/calc.test, examples/c/calc/local.mk: New.