yacc.c: also define a symbol number for the empty token

This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:

    input.c: In function 'yyparse':
    input.c:1107:7: error: conversion to 'unsigned int' from 'int'
                           may change the sign of the result
                           [-Werror=sign-conversion]
     1107 |   yyn += yytoken;
          |       ^~
    input.c:1107:10: error: conversion to 'int' from 'unsigned int'
                            may change the sign of the result
                            [-Werror=sign-conversion]
     1107 |   yyn += yytoken;
          |          ^~~~~~~
    input.c:1108:47: error: comparison of integer expressions of
                            different signedness:
                            'yytype_int8' {aka 'const signed char'} and
                            'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
                            [-Werror=sign-compare]
     1108 |   if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
          |                                               ^~
    input.c:702:25: error: operand of ?: changes signedness from 'int'
                           to 'unsigned int' due to unsignedness of
                           other operand [-Werror=sign-compare]
      702 | #define YYEMPTY         (-2)
          |                         ^~~~
    input.c:1220:33: note: in expansion of macro 'YYEMPTY'
     1220 |   yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
          |                                 ^~~~~~~
    input.c:1220:41: error: unsigned conversion from 'int' to
                            'unsigned int' changes value
                            from '-2' to '4294967294'
                            [-Werror=sign-conversion]
     1220 |   yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
          |                                         ^

Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits.  We could actually even simply use "0" and shift
the rest, which would allow to write "!yytoken" to mean really
"yytoken != YYEMPTY".

* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.

* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.

* tests/regression.at: Adjust.
This commit is contained in:
Akim Demaille
2020-03-28 15:37:09 +01:00
parent 00c80bc96c
commit f3c18c8e80
6 changed files with 25 additions and 5 deletions

16
TODO
View File

@@ -53,6 +53,22 @@ would actually also make the following point gracefully handled (status of
YYERRCODE, YYUNDEFTOK, etc.). Possibly we could also define YYEMPTY (twice:
as a token and as a symbol). And YYEOF.
It seems to work well. Yet we have a weird case: the "error" token:
enum yysymbol_type_t
{
YYSYMBOL_YYEMPTY = -2,
YYSYMBOL_YYEOF = 0,
YYSYMBOL_error = 1,
YYSYMBOL_YYUNDEF = 2,
YYSYMBOL_YYACCEPT = 61,
...
YYSYMBOL_error looks weird. We should maybe rename this as
"YYSYMBOL_YYERROR", even though it should not be confonded with the YYERROR
macro.
** Consistency
YYUNDEFTOK is an internal symbol number, as YYTERROR.
But YYERRCODE is an external token number.