todo: update

Akim Demaille
2020-04-04 19:27:07 +02:00
parent 4e26809ab9
commit 4e3c06b0f8

TODO

@@ -1,28 +1,24 @@
* Bison 3.6
** Documentation
- yyexpected_tokens in all the languages.
- remove yysyntax_error_arguments.
- YYNOMEM
- i18n in Java
** Java
Check api.token.raw
** Naming conventions
yysyntax_error_arguments should be yy_syntax_error_arguments, since it's a
private implementation detail.
There's no good reason to use the "yy" prefix in parser::context, is there?
See also the case of Java. We should keep the prefix for private
implementation details, but maybe not for public APIs.
** User token number, internal symbol number, external token number, etc.
There is some confusion over these terms, which is even a problem for
translators. We need something clear, especially if we provide access to
the symbol numbers (which would be useful for custom error messages).
We could use "number" and "code".
Update: the current best options would be "token kind" and "symbol kind",
instead of "token type" and "symbol type".
*** The documentation
You can explicitly specify the numeric code for a token type...
@@ -50,75 +46,18 @@ uses "user token number" in most places.
*** M4
Make it consistent with the rest (it uses "user_number" and "number").
** Symbol numbers
Giving names to symbol numbers would be useful in custom error messages. It
would also allow the following point (the status of YYERRCODE, YYUNDEFTOK,
etc.) to be handled gracefully. We could also define YYEMPTY (twice: as a
token and as a symbol), and YYEOF.
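A minimal C sketch of what names for symbol numbers could buy. The `enum symbol_kind`, the `symbol_name` table and `format_syntax_error` are hypothetical illustrations, not identifiers Bison generates. A custom error reporter can refer to symbols by name rather than by raw number, while producing the same "syntax error, unexpected X, expecting Y or Z" shape as the verbose messages.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical named symbol numbers; Bison does not generate these names. */
enum symbol_kind
{
  SYMBOL_EOF,      /* end of input */
  SYMBOL_ERROR,    /* the error token */
  SYMBOL_UNDEF,    /* unknown token */
  SYMBOL_NUMBER,
  SYMBOL_LPAREN,
  SYMBOL_COUNT     /* number of symbols above */
};

/* Printable name of each symbol, indexed by enum symbol_kind. */
static const char *const symbol_name[SYMBOL_COUNT] =
{
  "end of file", "error", "invalid token", "number", "'('"
};

/* Write "syntax error, unexpected U[, expecting E1 or E2 ...]" into BUF.
   Like snprintf, return the length the full message needs (the output is
   truncated if SIZE is too small), or -1 on an output error.  */
static int
format_syntax_error (char *buf, size_t size, enum symbol_kind unexpected,
                     const enum symbol_kind *expected, int nexpected)
{
  int n = snprintf (buf, size, "syntax error, unexpected %s",
                    symbol_name[unexpected]);
  if (n < 0)
    return -1;
  size_t used = (size_t) n;
  for (int i = 0; i < nexpected; ++i)
    {
      /* Write only while there is room left; otherwise just measure. */
      size_t room = used < size ? size - used : 0;
      n = snprintf (room ? buf + used : NULL, room, "%s%s",
                    i == 0 ? ", expecting " : " or ",
                    symbol_name[expected[i]]);
      if (n < 0)
        return -1;
      used += (size_t) n;
    }
  return (int) used;
}
```

With `expected` holding `SYMBOL_NUMBER` and `SYMBOL_LPAREN`, reporting an unexpected `SYMBOL_EOF` yields "syntax error, unexpected end of file, expecting number or '('".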
** Consistency
YYUNDEFTOK is an internal symbol number, as is YYTERROR.
But YYERRCODE is an external token number.
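To make the distinction concrete, here is a toy translation from external token numbers (what `yylex` returns) to internal symbol numbers (what the parser tables index). It loosely resembles the translation step in Bison's C parsers, but every name and value below is an assumption chosen for illustration. The "error" token ends up with two different numbers, one in each space, which is why YYUNDEFTOK and YYERRCODE are easy to mix up.

```c
/* Hypothetical internal symbol numbers (what the parser tables index). */
enum
{
  SYMBOL_EOF = 0,
  SYMBOL_ERROR = 1,   /* internal, like YYTERROR */
  SYMBOL_UNDEF = 2,   /* internal, like YYUNDEFTOK */
  SYMBOL_NUMBER = 3,
  SYMBOL_PLUS = 4
};

/* Hypothetical external token numbers (what yylex returns):
   0 is end of input, single characters are their own code,
   and named tokens get codes above 255. */
enum
{
  TOKEN_EOF = 0,
  TOKEN_ERROR = 256,  /* external, like YYERRCODE */
  TOKEN_UNDEF = 257,
  TOKEN_NUMBER = 258
};

/* Map an external token number to an internal symbol number.
   Every unknown token collapses onto the "undefined" symbol. */
static int
translate (int token)
{
  switch (token)
    {
    case TOKEN_EOF:    return SYMBOL_EOF;
    case TOKEN_ERROR:  return SYMBOL_ERROR;
    case TOKEN_NUMBER: return SYMBOL_NUMBER;
    case '+':          return SYMBOL_PLUS;
    default:           return SYMBOL_UNDEF;  /* includes TOKEN_UNDEF */
    }
}
```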
** Java: EOF
We should be able to redefine EOF like we do in C.
** Java: calc.at
Stop hard-coding "Calc". Adjust local.at (look for FIXME).
** Java: _
We must not use _ as an identifier in Java: it is a keyword as of Java 9.
examples/java/calc/Calc.java:998: warning: '_' used as an identifier
    "$end", "error", "$undefined", _("end of line"), _("number"), "'='",
                                   ^
  (use of '_' as an identifier might not be supported in releases after Java SE 8)
** doc
I feel it's ugly to use the GNU style to declare functions in the doc. It
generates tons of white space in the page, and may contribute to bad page
breaks.
** improve syntax errors (UTF-8, internationalization)
The code Bison generates depends on the current locale. For instance, the
following grammar:
%define parse.error verbose
%code top {
#include <stdio.h>
#include <stdlib.h>
void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
int yylex() { return 0; }
}
%%
exp: "↦" | "🎅🐃" | '\n'
%%
int main() { return yyparse(); }
gives different results with and without LC_ALL=C: in the C locale, the
non-ASCII token names end up as octal escapes.
$ LC_ALL=C /opt/local/bin/bison /tmp/mangle.y -o ascii.c
$ /opt/local/bin/bison /tmp/mangle.y -o utf8.c
$ diff -u ascii.c utf8.c -I#line
--- ascii.c 2019-01-12 08:15:35.878010093 +0100
+++ utf8.c 2019-01-12 08:15:38.856495929 +0100
@@ -415,9 +415,8 @@
    First, the terminals, then, starting at YYNTOKENS, nonterminals. */
 static const char *const yytname[] =
 {
-  "$end", "error", "$undefined", "\"\\342\\206\\246\"",
-  "\"\\360\\237\\216\\205\\360\\237\\220\\203\"", "'\\n'", "$accept",
-  "exp", YY_NULLPTR
+  "$end", "error", "$undefined", "\"↦\"", "\"🎅🐃\"", "'\\n'",
+  "$accept", "exp", YY_NULLPTR
 };
 #endif
$ gcc ascii.c -o ascii && ./ascii
syntax error, unexpected $end, expecting "\342\206\246" or "\360\237\216\205\360\237\220\203" or '\n'
$ gcc utf8.c -o utf8 && ./utf8
syntax error, unexpected $end, expecting ↦ or 🎅🐃 or '\n'
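Nothing is lost in ascii.c; the token names are only spelled differently. For instance `\342\206\246` is the octal form of the bytes E2 86 A6, the UTF-8 encoding of ↦ (U+21A6). The hypothetical helper below (not part of Bison) decodes three-digit octal escapes and deliberately leaves every other escape sequence alone; it only serves to show that the escaped names denote the same characters as the names in utf8.c.

```c
#include <stddef.h>

/* Decode three-digit octal escapes such as \342 into the raw byte they
   denote; every other character, including other backslash sequences
   such as \n, is copied unchanged.  Writes at most SIZE bytes including
   the terminating NUL and returns the number of bytes written before it. */
static size_t
decode_octal_escapes (const char *in, char *out, size_t size)
{
  size_t used = 0;
  if (size == 0)
    return 0;
  while (*in != '\0' && used + 1 < size)
    {
      if (in[0] == '\\'
          && in[1] >= '0' && in[1] <= '3'
          && in[2] >= '0' && in[2] <= '7'
          && in[3] >= '0' && in[3] <= '7')
        {
          /* \ddd is a byte value in base 8: d1*64 + d2*8 + d3. */
          out[used++] = (char) ((in[1] - '0') * 64
                                + (in[2] - '0') * 8
                                + (in[3] - '0'));
          in += 4;
        }
      else
        out[used++] = *in++;
    }
  out[used] = '\0';
  return used;
}
```

Applied to the escaped name of the second token, `\360\237\216\205\360\237\220\203`, it yields the eight bytes of 🎅 (U+1F385) followed by 🐃 (U+1F403).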
While at it, we should stop using "$end" by default, in favor of "end of
file", "end of input", or similar. See how lalr1.java does that.