mirror of https://git.savannah.gnu.org/git/bison.git, synced 2026-03-09 12:23:04 +00:00
todo: update
@@ -1,28 +1,24 @@
* Bison 3.6
** Documentation
- yyexpected_tokens in all the languages.
- remove yysyntax_error_arguments.
- YYNOMEM
- i18n in Java

** Java
Check api.token.raw

** Naming conventions
yysyntax_error_arguments should be yy_syntax_error_arguments, since it's a
private implementation detail.

There's no good reason to use the "yy" prefix in parser::context, is there?
See also the case of Java. We should keep the prefix for private
implementation details, but maybe not for public APIs.

** User token number, internal symbol number, external token number, etc.
There is some confusion over these terms, which is even a problem for
translators. We need something clear, especially if we provide access to
the symbol numbers (which would be useful for custom error messages).

We could use "number" and "code".

Update: the current best options would be "token kind" and "symbol kind",
instead of "token type" and "symbol type".

*** The documentation

You can explicitly specify the numeric code for a token type...
@@ -50,75 +46,18 @@ uses "user token number" in most places.
*** M4
Make it consistent with the rest (it uses "user_number" and "number").

** Symbol numbers
Giving names to symbol numbers would be useful in custom error messages. It
would also let us handle the following point gracefully (the status of
YYERRCODE, YYUNDEFTOK, etc.). Possibly we could also define YYEMPTY (twice:
as a token and as a symbol), and YYEOF.

** Consistency
YYUNDEFTOK is an internal symbol number, as is YYTERROR.
But YYERRCODE is an external token number.

** Java: EOF
We should be able to redefine EOF as we do in C.

** Java: calc.at
Stop hard-coding "Calc". Adjust local.at (look for FIXME).

** Java: _
We must not use _ as an identifier in Java, since it is becoming a keyword
in Java 9.

  examples/java/calc/Calc.java:998: warning: '_' used as an identifier
  "$end", "error", "$undefined", _("end of line"), _("number"), "'='",
                                 ^
  (use of '_' as an identifier might not be supported in releases after Java SE 8)

** doc
I feel it's ugly to use the GNU style to declare functions in the doc. It
generates tons of white space on the page and may contribute to bad page
breaks.

** improve syntax errors (UTF-8, internationalization)
Bison's output depends on the current locale. For instance, this grammar:

  %define parse.error verbose
  %code top {
    #include <stdio.h>
    #include <stdlib.h>
    void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
    int yylex(void) { return 0; }
  }
  %%
  exp: "↦" | "🎅🐃" | '\n'
  %%
  int main(void) { return yyparse(); }

gives different results with and without LC_ALL=C:

  $ LC_ALL=C /opt/local/bin/bison /tmp/mangle.y -o ascii.c
  $ /opt/local/bin/bison /tmp/mangle.y -o utf8.c
  $ diff -u ascii.c utf8.c -I#line
  --- ascii.c 2019-01-12 08:15:35.878010093 +0100
  +++ utf8.c 2019-01-12 08:15:38.856495929 +0100
  @@ -415,9 +415,8 @@
      First, the terminals, then, starting at YYNTOKENS, nonterminals. */
   static const char *const yytname[] =
   {
  -  "$end", "error", "$undefined", "\"\\342\\206\\246\"",
  -  "\"\\360\\237\\216\\205\\360\\237\\220\\203\"", "'\\n'", "$accept",
  -  "exp", YY_NULLPTR
  +  "$end", "error", "$undefined", "\"↦\"", "\"🎅🐃\"", "'\\n'",
  +  "$accept", "exp", YY_NULLPTR
   };
   #endif

  $ gcc ascii.c -o ascii && ./ascii
  syntax error, unexpected $end, expecting "\342\206\246" or "\360\237\216\205\360\237\220\203" or '\n'
  $ gcc utf8.c -o utf8 && ./utf8
  syntax error, unexpected $end, expecting ↦ or 🎅🐃 or '\n'

While we are at it, we should stop using "$end" by default in favor of
"end of file", "end of input", or something similar. See how lalr1.java
does that.