* maint:
maint: post-release administrivia
version 3.5.4
examples: reccalc: really compile cleanly in C99
news: announce that Bison 3.6 drops YYERROR_VERBOSE
news: update for 3.5.4
style: fix spellos
typo: succesful -> successful
package: improve the readme
java: check and fix support for api.token.raw
java: style: prefer 'int[] foo' to 'int foo[]'
build: fix syntax-check issues
tests: recheck: work properly when the test suite was interrupted
doc: c++: promote api.token.raw
build: fix compatibility with old compilers
examples: reccalc: compile cleanly in C99
The first name is too long. We already have `yypstate`, so
`yypcontext` is ok. We are also migrating to using `*_t` for our
types.
* NEWS, data/skeletons/glr.c, data/skeletons/yacc.c, doc/bison.texi,
* examples/c/bistromathic/parse.y, src/parse-gram.y, tests/local.at:
(yyparse_context_t, yyparse_context_location, yyparse_context_token):
Rename as...
(yypcontext_t, yypcontext_location, yypcontext_token): these.
The Java enums are very different from the C model. As a consequence,
one cannot "build" an enum directly from an integer, we must retrieve
it. That's the purpose of the SymbolType.get class method.
* data/skeletons/java.m4 (b4_symbol_enum, b4_case_code_symbol)
(b4_declare_symbol_enum): New.
* data/skeletons/lalr1.java: Use SymbolType,
SymbolType.YYSYMBOL_YYEMPTY, etc.
* examples/java/calc/Calc.y, tests/local.at: Adjust.
* tests/local.at (AT_LANG_MATCH, AT_YYERROR_DECLARE(java))
(AT_YYERROR_DECLARE_EXTERN(java), AT_PARSER_CLASS): New.
(AT_MAIN_DEFINE(java)): Use AT_PARSER_CLASS.
* tests/scanner.at: Add a test for Java.
* data/skeletons/lalr1.java (yytranslate_): Cast the result.
* tests/local.mk (recheck): Look at the per-test logs, not the overall
log, which, when interrupted, contains only information about... the
tests that passed.
Because of the insane current implementation of glr.cc, things are a
bit nasty. We will rename symbol_number_type as symbol_type_type
later, to keep this commit small.
* data/skeletons/c++.m4 (b4_declare_symbol_enum): New.
Also define YYNTOKENS to avoid type clashes when yyntokens_ was
actually defined in another enum.
Use it.
(symbol_number_type): Be an alias of symbol_type_type.
Use YYSYMBOL_YYEMPTY and the like.
Use symbol_number_type where appropriate.
(empty_symbol): Remove.
(yytranslate_): Use symbol_number_type, not token_number_type.
* data/skeletons/lalr1.cc: Use symbol_number_type where appropriate.
Adjust to the replacement of empty_symbol by YYSYMBOL_YYEMPTY.
(yy_error_token_, yy_undef_token_, yyeof_, yyntokens_): Remove.
Adjust dependencies.
* data/skeletons/glr.cc: Use symbol_number_type where appropriate.
Forward definitions of YYSYMBOL_YYEMPTY, etc. to glr.c.
* tests/headers.at: Accept YYNTOKENS and other YYSYMBOL_*.
* tests/local.at (AT_YYERROR_DEFINE(c++)): Use symbol_number_type.
Now that yacc.c and glr.c both know yysymbol_type_t, convert the
common routines.
* data/skeletons/c.m4 (yydestruct, yy_symbol_value_print)
(yy_symbol_print): Use yysymbol_type_t instead of int.
* data/skeletons/glr.c: Use yySymbol where appropriate.
* data/skeletons/yacc.c (YY_ACCESSING_SYMBOL): New wrapper around
yystos.
Use it.
* tests/local.at (yyreport_syntax_error): Use yysymbol_type_t where
appropriate.
This triggers warnings with several compilers. For instance ICC fills
the logs with pages and pages of
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
input.c(477): error: a value of type "int" cannot be used to initialize an entity of type "const yysymbol_type_t={yysymbol_type_t}"
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
^
And so does G++9 when compiling yacc.c's (C) output
input.c:545:8: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
input.c:545:15: error: invalid conversion from 'int' to 'yysymbol_type_t' [-fpermissive]
545 | 0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
| ^
| |
| int
Clang++ is no exception
input.c:545:8: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
input.c:545:15: error: cannot initialize an array element of type 'const yysymbol_type_t' with an rvalue of type 'int'
0, 5, 9, 2, 2, 2, 2, 2, 2, 2,
^
At some point we could use yysymbol_type_t's enumerators to define
yytranslate. Meanwhile...
* data/skeletons/yacc.c (yytranslate): Use the original integral type
to define it.
(YYTRANSLATE): Cast the result into yysymbol_type_t.
This is not only cleaner, it also protects us from mixing signed
values (YYEMPTY is #defined as -2) with unsigned types (the
yysymbol_type_t enum is typically compiled as a small unsigned).
For instance GCC 9:
input.c: In function 'yyparse':
input.c:1107:7: error: conversion to 'unsigned int' from 'int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~
input.c:1107:10: error: conversion to 'int' from 'unsigned int'
may change the sign of the result
[-Werror=sign-conversion]
1107 | yyn += yytoken;
| ^~~~~~~
input.c:1108:47: error: comparison of integer expressions of
different signedness:
'yytype_int8' {aka 'const signed char'} and
'yysymbol_type_t' {aka 'enum yysymbol_type_t'}
[-Werror=sign-compare]
1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
| ^~
input.c:702:25: error: operand of ?: changes signedness from 'int'
to 'unsigned int' due to unsignedness of
other operand [-Werror=sign-compare]
702 | #define YYEMPTY (-2)
| ^~~~
input.c:1220:33: note: in expansion of macro 'YYEMPTY'
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^~~~~~~
input.c:1220:41: error: unsigned conversion from 'int' to
'unsigned int' changes value
from '-2' to '4294967294'
[-Werror=sign-conversion]
1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
| ^
Eventually, it might be interesting to move away from -2 (which is the
only possible negative symbol number) and use the next available
number, to save bits. We could actually even simply use "0" and shift
the rest, which would allow to write "!yytoken" to mean really
"yytoken != YYEMPTY".
* data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY.
* data/skeletons/yacc.c: Use it.
* src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not
YYEMPTY, when dealing with a symbol.
* tests/regression.at: Adjust.
* tests/local.mk (recheck): Look at the per-test logs, not the overall
log, which, when interrupted, contains only information about... the
tests that passed.
* data/skeletons/lalr1.java (yysyntaxErrorArguments): Move it from the
context, to the parser object.
Generate only for detailed and verbose error messages.
* tests/local.at (AT_YYERROR_DEFINE(java)): Use yyexpectedTokens
instead.
Because glr.c shares the same testing routines, we also need to
convert it.
* data/skeletons/glr.c (yyparse_context_token): New.
* tests/local.at (yyreport_syntax_error): here.
Suggested by Adrian Vogelsgesang.
https://lists.gnu.org/archive/html/bison-patches/2020-02/msg00069.html
* data/skeletons/lalr1.java (Context.EMPTY, Context.getToken): New.
(Context.yyntokens): Rename as...
(Context.NTOKENS): this.
Because (i) all the Java coding styles recommend upper case for
constants, and (ii) the Java Skeleton exposes Lexer.EOF, not
Lexer.YYEOF.
* data/skeletons/yacc.c (yyparse_context_token): New.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Don't use
yysyntax_error_arguments.
* examples/java/calc/Calc.y (yyreportSyntaxError): Likewise.
yyparse returns 0, 1, 2 since ages (accept, reject, memory exhausted).
Some of our auxiliary functions such as yy_lac and
yyreport_syntax_error also need to return error codes and also use 0,
1, 2. Because it uses yy_lac, yyexpected_tokens also needs to return
"problem", "memory exhausted", but in case of success, it needs to
return the number of tokens, so it cannot use 1 and 2 as error code.
Currently it uses -1 and -2, which is later converted into 1 and 2 as
yacc.c expects it.
Let's simplify this and use consistently -1 and -2 for auxiliary
functions that are not exposed (or not yet exposed) to the user. In
particular this will save the user from having to convert
yyexpected_tokens's -2 into yyreport_syntax_error's 2: both return -1
or -2.
* data/skeletons/yacc.c (yy_lac, yyreport_syntax_error)
(yy_lac_stack_realloc): Return -1, -2 for errors instead of 1, 2.
Adjust callers.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Do take
error codes into account.
Issue a syntax error message even if we ran out of memory.
* src/parse-gram.y, tests/local.at (yyreport_syntax_error): Adjust.
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommandation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:
The token error shall be reserved for error handling. The name
error can be used in grammar rules. It indicates places where the
parser can recover from a syntax error. The default value of error
shall be 256. Its value can be changed using a %token
declaration. The lexical analyzer should not return the value of
error.
I think this feature is useless, the user should not have to deal with
that. The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number. In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers. So
this feature is useless.
Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition. At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.
Rather, the location of the first user definition of "error" should
become its defining location.
Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html
* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
Currenly we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines. This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.
* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
- 4 | %token foo <error>123
+ 4 | %token foo <error>123</error>
| <error>^~~~~~</error>
* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.
* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
* data/skeletons/lalr1.cc: added support here
* tests/calc.at: added test cases
* tests/local.at: added yyreport_syntax_error implementation
for C++ test cases
Since Bison 2.7, output was indented four spaces for explanatory
statements. For example:
input.y:2.7-13: error: %type redeclaration for exp
input.y:1.7-11: previous declaration
Since the introduction of caret-diagnostics, it became less clear.
Remove the indentation and display submessages as in GCC:
input.y:2.7-13: error: %type redeclaration for exp
2 | %type <float> exp
| ^~~~~~~
input.y:1.7-11: note: previous declaration
1 | %type <int> exp
| ^~~~~
* src/complain.h (SUB_INDENT): Remove.
(warnings): Add "note" to the enum.
* src/complain.h, src/complain.c (complain_indent): Replace by...
(subcomplain): this.
Adjust all dependencies.
* tests/actions.at, tests/diagnostics.at, tests/glr-regression.at,
* tests/input.at, tests/named-refs.at, tests/regression.at:
Adjust expectations.
This revealed a number of things I had not realized:
- the Java location tracking was aliasing the same pair of positions
for all the symbols (see previous commit).
- in impure parsers, it's quite easy to use incorrect locations for
diagnostics, since yyerror uses yylloc, which is the location of the
lookahead, not that of the current lhs. So we need something like
{
YYLTYPE old_yylloc = yylloc;
yylloc = @$;
yyerror (]AT_PARAM_IF([result, count, nerrs, ])[buf);
yylloc = old_yylloc;
}
Maybe we should do that little yylloc dance in the skeleton instead
of leaving it to the user? It might be costly... But that's only
for users of the impure parsers, which are asking for trouble
anyway.
- in glr.cc invoking yyerror is somewhat cumbersome: the C++ interface
is not available as we are in yyparse (which in C), and yyerror is
used by glr.cc itself to bind it to the user's parser::error. If we
call yyerror, we need:
yyerror (]AT_LOCATION_IF([[&@$, ]])[yyparser, ]AT_PARAM_IF([result, count, nerrs, ])[msg);
However calling yy::parser::error is easier, once we know that the
current parser object is available as 'yyparser'. Which also saves
us from having to pass the parse-params ourselves:
yyparser.error (]AT_LOCATION_IF([[@$, ]])[msg);
* tests/calc.at: Invoke yyerror by hand, instead of using fprintf etc.
Adjust expectations.
* data/skeletons/lalr1.java (Context): Make data members private.
(Context.getLocation): New.
* examples/java/calc/Calc.y, tests/java.at, tests/local.at: Adjust.
* doc/bison.texi (Tokens from Literals): Move to code using
%token-table to...
(Decl Summary: %token-table): here.
* data/skeletons/bison.m4: Implement mutual exclusion.
* tests/input.at: Check it.
* doc/local.mk: Be robust to the removal of doc/.
* data/skeletons/lalr1.java (yyexpectedTokens)
(yysyntaxErrorArguments): Make them methods of Context.
(Context.yysymbolName): New.
* tests/local.at: Adjust.
In Java there is no need for N_ and yytranslate_. So instead of
hard-coding the use of N_ in the table of the symbol names, rely on
b4_symbol_translate.
* src/output.c (prepare_symbol_names): Use b4_symbol_translate instead
of N_.
* data/skeletons/c.m4 (b4_symbol_translate): New.
* data/skeletons/lalr1.java (yysymbolName): New.
Use it.
* examples/java/calc/Calc.y: Use parse.error=detailed.
* tests/calc.at: Check parse.error=detailed.
Unfortunately in the Java skeleton the user cannot override the way
locations are displayed, and locations don't know the structure of the
positions. So they cannot implement the tricks used in the C/C++
skeletons to display "1.1" instead of "1.1-1.2".
* tests/local.at (Java): Add support for column tracking in the
locations, as we did in examples/java/calc.
* tests/calc.at: Use AT_CALC_YYLEX.
Soon calculator tests for Java will move from java.at to calc.at.
Which implies improving the Java testing infrastructure in
local.at (for instance really tracking columns in positions, not just
token number). Detach java.at from local.at.
* tests/java.at (AT_JAVA_POSITION_DEFINE_OLD): New.
Use it.
The C, C++ and D skeletons used to show the stack right after popping
the stack during the reduction. Now that the stack is printed after
reaching a new state, that has become useless:
Entering state 1
Stack now 0 1
Reducing stack by rule 5 (line 83):
$1 = token "number" (1)
-> $$ = nterm exp (1)
Stack now 0
Entering state 8
Stack now 0 8
Remove the "Stack now 0" line.
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c:
Here.
Currently, if we have long rules and series of shift, we stack states
without showing stack. Let's be more incremental, and do how the Java
skeleton does.
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/yacc.c:
Here.
Adjust test cases.
* tests/torture.at (AT_DATA_STACK_TORTURE): Disable stack traces: this
test produces a very large stack, and showing the stack each time we
shift a token goes quadatric.
The Java skeleton displays
Reading a token:
Next token is token "number" (1)
while the other display
Reading a token: Next token is token "number" (1)
When generating logs in the scanner, the first part is separated from
the second, and the end of the scanner logs have the second part
pasted in. So let's propagate the Java way, but with the colon.
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c: Do it.
Adjust test cases and doc.
* tests/local.at (AT_LANG_MATCH): New.
(AT_YYERROR_DECLARE(java), AT_YYERROR_DECLARE_EXTERN(java)): New.
* tests/calc.at: The grammar file for Java is quite different for the
others, and continuing to assemble it from pieces makes the grammar
file hard to understand. Let's also dispatch on the language to
assemble it, and isolate Java from the others.
Most of this comes from java.at.