TODO: more updates

2026-06-08 00:32:35 +00:00 · 2019-10-15 08:28:15 +02:00
parent ee35055b49
commit b47340982b
1 changed files with 106 additions and 17 deletions
@@ -76,8 +76,9 @@ have it?
 ** clean up (Akim Demaille)
 Do not work on these items now, as I (Akim) have branches with a lot of
-changes in this area, and no desire to have to fix conflicts.  These
-cleaning up will happen after my branches have been merged.
+changes in this area (hitting several files), and no desire to have to fix
+conflicts.  Addressing these items will happen after my branches have been
 merged.
 *** lalr.c
 Introduce a goto struct, and use it in place of from_state/to_state.
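A minimal sketch of what such a goto struct could look like. It assumes hypothetical names (goto_t, goto_find) and plain int typedefs, not Bison's actual state_number/symbol_number definitions. For self-containment it also stores the nonterminal and searches linearly. lalr.c groups gotos per nonterminal (via goto_map), so the real struct may only need the two states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Bison's own typedefs.  */
typedef int state_number;
typedef int symbol_number;

/* One goto transition: in state FROM, on nonterminal SYM, go to TO.
   A single array of these replaces the parallel arrays
   from_state[i] / to_state[i].  */
typedef struct
{
  state_number from;
  state_number to;
  symbol_number sym;
} goto_t;

/* Return the index of the goto leaving FROM on SYM, or -1 if there is
   none.  GOTOS has N entries.  Linear search, for illustration only.  */
static ptrdiff_t
goto_find (const goto_t *gotos, size_t n,
           state_number from, symbol_number sym)
{
  assert (gotos || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (gotos[i].from == from && gotos[i].sym == sym)
      return (ptrdiff_t) i;
  return -1;
}
```

With the struct, gotos[i].from and gotos[i].to travel together, so there is no risk of indexing one array with the wrong counter.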
@@ -128,6 +129,84 @@ $ ./tests/testsuite -l | grep errors | sed q
  38: input.at:1730      errors
 * Short term
 ** Stop indentation in diagnostics
 Before Bison 2.7, we printed the dependencies in long diagnostics "flatly":
    input.y:2.7-12: %type redeclaration for exp
    input.y:1.7-12: previous declaration
 In Bison 2.7, we indented them:
    input.y:2.7-12: error: %type redeclaration for exp
    input.y:1.7-12:     previous declaration
 Later we quoted the source in the diagnostics, and today we have:
    /tmp/foo.y:1.12-14: warning: symbol FOO redeclared [-Wother]
        1 | %token FOO FOO
          |            ^~~
    /tmp/foo.y:1.8-10:      previous declaration
        1 | %token FOO FOO
          |        ^~~
 The indentation no longer helps.  We should probably get rid of it, or
 maybe keep it only with -fno-caret.  GCC displays this as a "note":
    $ g++-mp-9 -Wall /tmp/foo.c -c
    /tmp/foo.c:1:10: error: redefinition of 'int foo'
        1 | int foo, foo;
          |          ^~~
    /tmp/foo.c:1:5: note: 'int foo' previously declared here
        1 | int foo, foo;
          |     ^~~
 Likewise for Clang, contrary to what I believed (because "note:" is written
 in black, so it doesn't show in my terminal :-)
    $ clang++-mp-8.0 -Wall /tmp/foo.c -c
    clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
    /tmp/foo.c:1:10: error: redefinition of 'foo'
    int foo, foo;
             ^
    /tmp/foo.c:1:5: note: previous definition is here
    int foo, foo;
        ^
    1 error generated.
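A minimal sketch of the proposal, under assumptions: the secondary line gets a GCC-style "note:" prefix, and the indentation survives only when carets are disabled. format_secondary and its caret parameter are hypothetical, and the indentation width is arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the secondary line of a diagnostic (such
   as "previous declaration") into BUF, which holds SIZE bytes.  With
   CARET set, use a GCC-style "note:" prefix; otherwise keep the current
   indented style (the indentation width here is arbitrary).  Returns
   what snprintf returns.  */
static int
format_secondary (char *buf, size_t size, const char *loc,
                  const char *msg, int caret)
{
  assert (buf && loc && msg);
  if (caret)
    return snprintf (buf, size, "%s: note: %s", loc, msg);
  else
    return snprintf (buf, size, "%s:      %s", loc, msg);
}
```

With carets on, the quoted source lines already separate the entries visually, so the "note:" label, not the indentation, carries the relationship.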
 ** Better design for diagnostics
 The current implementation of diagnostics is ad hoc; it grew organically.  It
 works as a series of calls to several functions, where the later calls
 depend on the former ones.  For instance:
      complain (&sym->location,
                sym->content->status == needed ? complaint : Wother,
                _("symbol %s is used, but is not defined as a token"
                  " and has no rules; did you mean %s?"),
                quote_n (0, sym->tag),
                quote_n (1, best->tag));
      if (feature_flag & feature_caret)
        location_caret_suggestion (sym->location, best->tag, stderr);
 We should rewrite this in a more functional (FP) way:
 1. Build a rich structure that denotes the (complete) diagnostic.
   "Complete" in the sense that it also contains the suggestions, the list
   of possible matches, etc.
 2. Send this to the pretty-printing routine.  The diagnostic structure
   should be sufficient to generate all the formats of diagnostics,
   including the fix-its.
 If properly done, this diagnostic module can be detached from Bison and be
 put in gnulib.  It could be used, for instance, for errors caught by
 xgettext.
 There's certainly already something similar in GCC.  At least that's the
 impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
 page:
 https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
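A minimal sketch of steps 1 and 2, under stated assumptions: the type and function names (diagnostic, diag_note, diag_add_note, diag_format) are hypothetical, notes are capped at a fixed count, the "fix-it:" output line is an invented format, caret quoting is omitted, and the result goes into a buffer rather than stderr so it can be checked.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types: a "complete" diagnostic, built before anything
   is printed.  */
typedef struct
{
  const char *loc;            /* e.g., "/tmp/foo.y:1.8-10" */
  const char *msg;
} diag_note;

enum { DIAG_MAX_NOTES = 4 };

typedef struct
{
  const char *severity;       /* "error", "warning", ... */
  const char *loc;
  const char *msg;
  const char *warning_flag;   /* e.g., "-Wother", or NULL */
  const char *suggestion;     /* fix-it replacement text, or NULL */
  diag_note notes[DIAG_MAX_NOTES];
  int nnotes;
} diagnostic;

static void
diag_add_note (diagnostic *d, const char *loc, const char *msg)
{
  assert (d->nnotes < DIAG_MAX_NOTES);
  d->notes[d->nnotes].loc = loc;
  d->notes[d->nnotes].msg = msg;
  d->nnotes++;
}

/* Append printf-style output to BUF (capacity SIZE).  *LEN is the
   current length; it stays below SIZE, truncating if needed.  */
static void
append (char *buf, size_t size, size_t *len, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int n = vsnprintf (buf + *len, size - *len, fmt, ap);
  va_end (ap);
  if (n > 0)
    *len = (size_t) n < size - *len ? *len + (size_t) n : size - 1;
}

/* One pretty-printer for the structure; other back ends (JSON, fix-its
   only, ...) could read the same data.  SIZE must be positive.  */
static void
diag_format (const diagnostic *d, char *buf, size_t size)
{
  assert (size > 0);
  size_t len = 0;
  buf[0] = '\0';
  append (buf, size, &len, "%s: %s: %s", d->loc, d->severity, d->msg);
  if (d->warning_flag)
    append (buf, size, &len, " [%s]", d->warning_flag);
  append (buf, size, &len, "\n");
  if (d->suggestion)
    append (buf, size, &len, "%s: fix-it: %s\n", d->loc, d->suggestion);
  for (int i = 0; i < d->nnotes; ++i)
    append (buf, size, &len, "%s: note: %s\n",
            d->notes[i].loc, d->notes[i].msg);
}
```

The point of the split is that diag_format never has to know in which order the caller discovered the pieces; the "did you mean" suggestion and the notes are just fields.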
 ** consistency
 token vs terminal
@@ -137,6 +216,11 @@ itself uses int (for yylen for instance), yet stack is based on size_t.
 Maybe locations should also move to ints.
 Paul Eggert already covered most of this.  But before publishing these
 changes, we need to ask our C++ users if they agree with that change, or if
 we need some migration path.  Could be a %define variable, or simply
 %require "3.5".
 ** Graphviz display code thoughts
 The code for the --graph option is spread over two files: print_graph and
 graphviz. This is because Bison used to also produce VCG graphs, but since
@@ -156,9 +240,6 @@ Little effort seems to have been given to factoring these files and their
 print{,-xml} counterpart. We would very much like to re-use the pretty format
 of states from .output for the graphs, etc.
 Also, the underscore in print_graph.[ch] isn't very fitting considering the
 dashes in the other filenames.
 Since graphviz dies on medium-to-big grammars, maybe consider another tool?
 ** push-parser
@@ -296,12 +377,17 @@ we do the same in yacc.c.
 as we don't lose bits to padding.  For instance the typical stack for states
 will use 8 bits, while it is likely to consume 32 bits in a struct.
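To make the padding argument concrete, a small sketch comparing the per-entry cost of one array of structs with that of separate arrays. stack_elt and both functions are hypothetical, and int stands in for the semantic value type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical combined stack element: the state fits in 8 bits, but
   the struct is padded up to the alignment of VALUE.  */
typedef struct
{
  uint8_t state;
  int value;                  /* stand-in for the semantic value type */
} stack_elt;

/* Bytes per stack entry with a single array of stack_elt.  */
static size_t
bytes_per_entry_combined (void)
{
  return sizeof (stack_elt);
}

/* Bytes per stack entry with separate arrays, uint8_t states[N] and
   int values[N]: elements of each array are packed, no padding.  */
static size_t
bytes_per_entry_split (void)
{
  return sizeof (uint8_t) + sizeof (int);
}
```

On common ABIs where int is 4 bytes and 4-byte aligned, the combined layout typically costs 8 bytes per entry against 5 for the split one; the exact figures are implementation-defined, but the split layout can never be larger.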
-We need trustworth benching for Bison, for all our backends.
+We need trustworthy benchmarks for Bison, for all our backends.  Akim has a
 few things scattered around; we need to put them in the repo, and make them
 more useful.
 ** yysyntax_error
 The code in glr.c and yacc.c is very similar; we can certainly factor out
 some parts.
 This should be worked on when we also address the expected improvements for
 error generation (e.g., i18n).
 * Report
@@ -342,23 +428,25 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France
 * Extensions
 ** Multiple start symbols
 Would be very useful when parsing closely related languages.  The idea is to
-declared several start symbols, for instance
-    %start: stmt expr
+declare several start symbols, for instance
+    %start stmt expr
    %%
    stmt: ...
    expr: ...
-and to generate parse, parse_stmt and parse_expr.  Technically, the above
-grammar would be transformed into
-   %start: yy_start
+and to generate parse(), parse_stmt() and parse_expr().  Technically, the
+above grammar would be transformed into
+   %start yy_start
   %token YY_START_STMT YY_START_EXPR
   %%
   yy_start: YY_START_STMT stmt | YY_START_EXPR expr
-so that there are no conflicts in the grammar (as would undoubtedly happen
-with yy_start: stmt | expr).  Then all that remains to do is to adjust the
-skeletons so that this initial token (YY_START_STMT, YY_START_EXPR) be
-shifted first.
+so that there are no new conflicts in the grammar (as would undoubtedly
+happen with yy_start: stmt | expr).  Then adjust the skeletons so that this
+initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
+corresponding parse function.
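Until the skeletons do this, users can approximate it by hand with a scanner that returns the start token first. A minimal sketch, assuming hypothetical token numbers and names (END_OF_INPUT, NUM, input, pending_start, prepare_parse); it is not the skeleton change proposed above, which would shift the token inside the generated parse functions.

```c
#include <assert.h>

/* Hypothetical token numbers; a generated parser defines its own.  */
enum { END_OF_INPUT = 0, YY_START_STMT = 258, YY_START_EXPR = 259,
       NUM = 260 };

/* Hypothetical input: a token array terminated by END_OF_INPUT.  */
static const int *input;

/* Start token still to be returned before any real token, or 0.  */
static int pending_start;

/* Scanner seen by the single parser built from the yy_start grammar:
   it returns the pending start token first, then the real tokens, then
   END_OF_INPUT on every later call.  */
static int
yylex (void)
{
  if (pending_start)
    {
      int tok = pending_start;
      pending_start = 0;
      return tok;
    }
  return *input == END_OF_INPUT ? END_OF_INPUT : *input++;
}

/* What parse_stmt() or parse_expr() would do before running the shared
   parser: select the entry point by queuing its start token.  */
static void
prepare_parse (const int *tokens, int start_token)
{
  assert (tokens);
  assert (start_token == YY_START_STMT || start_token == YY_START_EXPR);
  input = tokens;
  pending_start = start_token;
}
```

Since YY_START_STMT and YY_START_EXPR only ever appear first, the rule yy_start: YY_START_STMT stmt | YY_START_EXPR expr picks the branch on the first token, without the conflicts of yy_start: stmt | expr.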
 ** Better error messages
 The users are not provided with enough tools to forge their error messages.
@@ -379,8 +467,8 @@ https://lists.gnu.org/archive/html/bison-patches/2015-09/msg00000.html
 However, there are many other things to do before having such a feature,
 because I don't want a % equivalent to #include (which we all learned to
 hate).  I want something that builds "modules" of grammars, and assembles
-them together, paying attention to keep separate bits separates, in
-pseudo name spaces.
+them together, paying attention to keep separate bits separated, in pseudo
+name spaces.
 ** Push parsers
 There is demand for push parsers in Java and C++.  And GLR I guess.
@@ -463,6 +551,7 @@ It is unfortunate that there is a total order for precedence.  It
 makes it impossible to have modular precedence information.  We should
 move to partial orders (sounds like series/parallel orders to me).
 This is a prerequisite for modules.
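A minimal sketch of what a partial order could mean in practice. It assumes precedence is declared as "a binds tighter than b" edges forming a DAG. Two symbols compare only if one is reachable from the other; otherwise they are unordered, and precedence cannot resolve a conflict between them. The names, the fixed symbol count and the adjacency matrix are hypothetical, and this is a generic partial order, not specifically a series/parallel one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: NSYM symbols; above[a][b] is true when some module
   declared "a binds tighter than b".  The declared edges must form a
   DAG; the precedence order is their transitive closure.  */
enum { NSYM = 4 };

typedef enum { PREC_UNORDERED, PREC_LOWER, PREC_HIGHER } prec_cmp;

/* Depth-first search: is B equal to A or reachable from it?  */
static bool
reach (bool above[NSYM][NSYM], int a, int b, bool seen[NSYM])
{
  if (a == b)
    return true;
  seen[a] = true;
  for (int c = 0; c < NSYM; ++c)
    if (above[a][c] && !seen[c] && reach (above, c, b, seen))
      return true;
  return false;
}

/* True if A binds strictly tighter than B.  */
static bool
tighter (bool above[NSYM][NSYM], int a, int b)
{
  assert (0 <= a && a < NSYM && 0 <= b && b < NSYM);
  bool seen[NSYM] = { false };
  return a != b && reach (above, a, b, seen);
}

/* Compare A and B; PREC_UNORDERED means precedence cannot decide.  */
static prec_cmp
prec_compare (bool above[NSYM][NSYM], int a, int b)
{
  if (tighter (above, a, b))
    return PREC_HIGHER;
  if (tighter (above, b, a))
    return PREC_LOWER;
  return PREC_UNORDERED;
}
```

With a total order, a module adding one operator has to place it relative to every other operator; here, a symbol declared by an unrelated module simply stays unordered with respect to the others.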
 * $undefined
 From Hans: