mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-19 01:03:04 +00:00
TODO: more updates
TODO
@@ -76,8 +76,9 @@ have it?
 
 ** clean up (Akim Demaille)
 Do not work on these items now, as I (Akim) have branches with a lot of
-changes in this area, and no desire to have to fix conflicts. These
-cleaning up will happen after my branches have been merged.
+changes in this area (hitting several files), and no desire to have to fix
+conflicts. Addressing these items will happen after my branches have been
+merged.
 
 *** lalr.c
 Introduce a goto struct, and use it in place of from_state/to_state.
@@ -128,6 +129,84 @@ $ ./tests/testsuite -l | grep errors | sed q
 38: input.at:1730 errors
 
 * Short term
+** Stop indentation in diagnostics
+Before Bison 2.7, we printed "flatly" the dependencies in long diagnostics:
+
+input.y:2.7-12: %type redeclaration for exp
+input.y:1.7-12: previous declaration
+
+In Bison 2.7, we indented them
+
+input.y:2.7-12: error: %type redeclaration for exp
+input.y:1.7-12: previous declaration
+
+Later we quoted the source in the diagnostics, and today we have:
+
+/tmp/foo.y:1.12-14: warning: symbol FOO redeclared [-Wother]
+1 | %token FOO FOO
+| ^~~
+/tmp/foo.y:1.8-10: previous declaration
+1 | %token FOO FOO
+| ^~~
+
+The indentation is no longer helping. We should probably get rid of it, or
+maybe keep it only when -fno-caret. GCC displays this as a "note":
+
+$ g++-mp-9 -Wall /tmp/foo.c -c
+/tmp/foo.c:1:10: error: redefinition of 'int foo'
+1 | int foo, foo;
+| ^~~
+/tmp/foo.c:1:5: note: 'int foo' previously declared here
+1 | int foo, foo;
+| ^~~
+
+Likewise for Clang, contrary to what I believed (because "note:" is written
+in black, so it doesn't show in my terminal :-)
+
+$ clang++-mp-8.0 -Wall /tmp/foo.c -c
+clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
+/tmp/foo.c:1:10: error: redefinition of 'foo'
+int foo, foo;
+^
+/tmp/foo.c:1:5: note: previous definition is here
+int foo, foo;
+^
+1 error generated.
+
+** Better design for diagnostics
+The current implementation of diagnostics is adhoc, it grew organically. It
+works as a series of calls to several functions, with dependency of the
+latter calls on the former. For instance:
+
+complain (&sym->location,
+sym->content->status == needed ? complaint : Wother,
+_("symbol %s is used, but is not defined as a token"
+" and has no rules; did you mean %s?"),
+quote_n (0, sym->tag),
+quote_n (1, best->tag));
+if (feature_flag & feature_caret)
+location_caret_suggestion (sym->location, best->tag, stderr);
+
+We should rewrite this in a more FP way:
+
+1. build a rich structure that denotes the (complete) diagnostic.
+"Complete" in the sense that it also contains the suggestions, the list
+of possible matches, etc.
+
+2. send this to the pretty-printing routine. The diagnostic structure
+should be sufficient so that we can generate all the 'format' of
+diagnostics, including the fixits.
+
+If properly done, this diagnostic module can be detached from Bison and be
+put in gnulib. It could be used, for instance, for errors caught by
+xgettext.
+
+There's certainly already something alike in GCC. At least that's the
+impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
+page:
+
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
+
 ** consistency
 token vs terminal
 
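The two-step design proposed in the "Better design for diagnostics" item of the hunk above (first build a complete diagnostic value, then hand it to one pretty-printer) could look roughly like the sketch below. This is only an illustration: the `diagnostic` struct, `severity` enum, `diagnostic_format` function, and the "suggestion:" output line are all hypothetical names and formats invented here, not part of Bison or gnulib. Dependent diagnostics are printed flat, as GCC-style notes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types: none of these names exist in Bison.  */
typedef enum { SEV_NOTE, SEV_WARNING, SEV_ERROR } severity;

typedef struct diagnostic diagnostic;
struct diagnostic
{
  const char *location;    /* e.g. "input.y:2.7-12" */
  severity sev;
  const char *message;
  const char *suggestion;  /* replacement text for a fix-it, or NULL */
  const diagnostic *note;  /* dependent diagnostic, or NULL */
};

static const char *
severity_name (severity s)
{
  switch (s)
    {
    case SEV_NOTE:    return "note";
    case SEV_WARNING: return "warning";
    default:          return "error";
    }
}

/* Step 2: render the complete diagnostic D and its chain of notes
   into BUF (of SIZE bytes), one flat line per entry.  Return the
   number of bytes written; a line that does not fit is dropped.  */
static size_t
diagnostic_format (const diagnostic *d, char *buf, size_t size)
{
  size_t len = 0;
  if (size == 0)
    return 0;
  buf[0] = '\0';
  for (; d; d = d->note)
    {
      int n = snprintf (buf + len, size - len, "%s: %s: %s\n",
                        d->location, severity_name (d->sev), d->message);
      if (n < 0 || (size_t) n >= size - len)
        {
          buf[len] = '\0';      /* Drop the partial line.  */
          break;
        }
      len += (size_t) n;
      if (d->suggestion)
        {
          n = snprintf (buf + len, size - len, "%s: suggestion: %s\n",
                        d->location, d->suggestion);
          if (n < 0 || (size_t) n >= size - len)
            {
              buf[len] = '\0';
              break;
            }
          len += (size_t) n;
        }
    }
  return len;
}
```

Step 1 is then just filling in the struct at the point where the problem is detected. Because the printer sees the whole value at once, other output formats (carets, fix-its, machine-readable output) would be alternative printers over the same data rather than extra calls at every call site.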
@@ -137,6 +216,11 @@ itself uses int (for yylen for instance), yet stack is based on size_t.
 
 Maybe locations should also move to ints.
 
+Paul Eggert already covered most of this. But before publishing these
+changes, we need to ask our C++ users if they agree with that change, or if
+we need some migration path. Could be a %define variable, or simply
+%require "3.5".
+
 ** Graphviz display code thoughts
 The code for the --graph option is over two files: print_graph, and
 graphviz. This is because Bison used to also produce VCG graphs, but since
@@ -156,9 +240,6 @@ Little effort seems to have been given to factoring these files and their
 rint{,-xml} counterpart. We would very much like to re-use the pretty format
 of states from .output for the graphs, etc.
 
-Also, the underscore in print_graph.[ch] isn't very fitting considering the
-dashes in the other filenames.
-
 Since graphviz dies on medium-to-big grammars, maybe consider an other tool?
 
 ** push-parser
@@ -296,12 +377,17 @@ we do the same in yacc.c.
 as we don't lose bits to padding. For instance the typical stack for states
 will use 8 bits, while it is likely to consume 32 bits in a struct.
 
-We need trustworth benching for Bison, for all our backends.
+We need trustworthy benchmarks for Bison, for all our backends. Akim has a
+few things scattered around; we need to put them in the repo, and make them
+more useful.
 
 ** yysyntax_error
 The code bw glr.c and yacc.c is really alike, we can certainly factor
 some parts.
 
+This should be worked on when we also address the expected improvements for
+error generation (e.g., i18n).
+
 
 * Report
 
@@ -342,23 +428,25 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France
 * Extensions
 ** Multiple start symbols
 Would be very useful when parsing closely related languages. The idea is to
-declared several start symbols, for instance
+declare several start symbols, for instance
 
-%start: stmt expr
+%start stmt expr
 %%
 stmt: ...
 expr: ...
 
-and to generate parse, parse_stmt and parse_expr. Technically, the above
-grammar would be transformed into
+and to generate parse(), parse_stmt() and parse_expr(). Technically, the
+above grammar would be transformed into
 
-%start: yy_start
+%start yy_start
+%token YY_START_STMT YY_START_EXPR
+%%
 yy_start: YY_START_STMT stmt | YY_START_EXPR expr
 
-so that there are no conflicts in the grammar (as would undoubtedly happen
-with yy_start: stmt | expr). Then all that remains to do is to adjust the
-skeletons so that this initial token (YY_START_STMT, YY_START_EXPR) be
-shifted first.
+so that there are no new conflicts in the grammar (as would undoubtedly
+happen with yy_start: stmt | expr). Then adjust the skeletons so that this
+initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
+corresponding parse function.
 
 ** Better error messages
 The users are not provided with enough tools to forge their error messages.
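The "shift the initial token first" step of the Multiple start symbols item in the hunk above can be pictured as a thin wrapper around the scanner: each generated entry point would prime the wrapper with its own start token, which is delivered before any real input. The sketch below only illustrates that idea. The `lexer` struct, `lexer_init`, `lexer_next`, and the token codes are hypothetical, and a real skeleton might instead push the token directly inside the parse function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes, in the style of a generated header.  */
enum token
{
  TOK_EOF = 0,
  YY_START_STMT = 258,
  YY_START_EXPR,
  NUM,
  PLUS
};

/* Hypothetical scanner state: a pending start token in front of an
   array of real tokens terminated by TOK_EOF.  */
typedef struct
{
  int start_token;   /* initial token still to deliver, or -1 */
  const int *input;  /* real tokens, terminated by TOK_EOF */
  size_t pos;        /* index of the next real token */
} lexer;

/* What parse_expr() would do before parsing: select the start
   symbol by choosing the token to inject.  */
static lexer
lexer_init (int start_token, const int *input)
{
  lexer lx = { start_token, input, 0 };
  return lx;
}

/* Return the injected start token on the first call, then the real
   tokens.  Keeps returning TOK_EOF once the input is exhausted.  */
static int
lexer_next (lexer *lx)
{
  if (lx->start_token != -1)
    {
      int t = lx->start_token;
      lx->start_token = -1;
      return t;
    }
  int t = lx->input[lx->pos];
  if (t != TOK_EOF)
    ++lx->pos;
  return t;
}
```

With this arrangement the parser for `yy_start: YY_START_STMT stmt | YY_START_EXPR expr` sees exactly one of the two start tokens as its first lookahead, so the choice between stmt and expr is made on that token and never on real input.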
@@ -379,8 +467,8 @@ https://lists.gnu.org/archive/html/bison-patches/2015-09/msg00000.html
 However, there are many other things to do before having such a feature,
 because I don't want a % equivalent to #include (which we all learned to
 hate). I want something that builds "modules" of grammars, and assembles
-them together, paying attention to keep separate bits separates, in
-pseudo name spaces.
+them together, paying attention to keep separate bits separated, in pseudo
+name spaces.
 
 ** Push parsers
 There is demand for push parsers in Java and C++. And GLR I guess.
@@ -463,6 +551,7 @@ It is unfortunate that there is a total order for precedence. It
 makes it impossible to have modular precedence information. We should
 move to partial orders (sounds like series/parallel orders to me).
 
+This is a prerequisite for modules.
 
 * $undefined
 From Hans: