TODO: more updates

2026-04-23 18:19:38 +00:00 · 2019-10-15 08:28:15 +02:00
parent ee35055b49
commit b47340982b
1 changed files with 106 additions and 17 deletions
@@ -76,8 +76,9 @@ have it?

 ** clean up (Akim Demaille)
 Do not work on these items now, as I (Akim) have branches with a lot of
-changes in this area, and no desire to have to fix conflicts.  These
-cleaning up will happen after my branches have been merged.
+changes in this area (hitting several files), and no desire to have to fix
+conflicts.  Addressing these items will happen after my branches have been
+merged.

 *** lalr.c
 Introduce a goto struct, and use it in place of from_state/to_state.
@@ -128,6 +129,84 @@ $ ./tests/testsuite -l | grep errors | sed q
  38: input.at:1730      errors

 * Short term
+** Stop indentation in diagnostics
+Before Bison 2.7, we printed "flatly" the dependencies in long diagnostics:
+
+    input.y:2.7-12: %type redeclaration for exp
+    input.y:1.7-12: previous declaration
+
+In Bison 2.7, we indented them:
+
+    input.y:2.7-12: error: %type redeclaration for exp
+    input.y:1.7-12:     previous declaration
+
+Later we quoted the source in the diagnostics, and today we have:
+
+    /tmp/foo.y:1.12-14: warning: symbol FOO redeclared [-Wother]
+        1 | %token FOO FOO
+          |            ^~~
+    /tmp/foo.y:1.8-10:      previous declaration
+        1 | %token FOO FOO
+          |        ^~~
+
+The indentation is no longer helping.  We should probably get rid of it, or
+maybe keep it only with -fno-caret.  GCC displays this as a "note":
+
+    $ g++-mp-9 -Wall /tmp/foo.c -c
+    /tmp/foo.c:1:10: error: redefinition of 'int foo'
+        1 | int foo, foo;
+          |          ^~~
+    /tmp/foo.c:1:5: note: 'int foo' previously declared here
+        1 | int foo, foo;
+          |     ^~~
+
+Likewise for Clang, contrary to what I believed (because "note:" is written
+in black, so it doesn't show in my terminal :-):
+
+    $ clang++-mp-8.0 -Wall /tmp/foo.c -c
+    clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
+    /tmp/foo.c:1:10: error: redefinition of 'foo'
+    int foo, foo;
+             ^
+    /tmp/foo.c:1:5: note: previous definition is here
+    int foo, foo;
+        ^
+    1 error generated.
+
+** Better design for diagnostics
+The current implementation of diagnostics is ad hoc; it grew organically.
+It works as a series of calls to several functions, where the later calls
+depend on the earlier ones.  For instance:
+
+      complain (&sym->location,
+                sym->content->status == needed ? complaint : Wother,
+                _("symbol %s is used, but is not defined as a token"
+                  " and has no rules; did you mean %s?"),
+                quote_n (0, sym->tag),
+                quote_n (1, best->tag));
+      if (feature_flag & feature_caret)
+        location_caret_suggestion (sym->location, best->tag, stderr);
+
+We should rewrite this in a more functional-programming (FP) style:
+
+1. build a rich structure that denotes the (complete) diagnostic.
+   "Complete" in the sense that it also contains the suggestions, the list
+   of possible matches, etc.
+
+2. send this to the pretty-printing routine.  The diagnostic structure
+   should be sufficient so that we can generate all the 'formats' of
+   diagnostics, including the fixits.
+
+If properly done, this diagnostic module can be detached from Bison and be
+put in gnulib.  It could be used, for instance, for errors caught by
+xgettext.
+
+There's certainly already something similar in GCC.  At least that's the
+impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
+page:
+
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
+
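To make the two steps of "Better design for diagnostics" concrete, here is a
minimal sketch in C.  Everything in it is an assumption made for
illustration: the types (loc, diag_part, diagnostic), the functions
(diag_add_note, diag_format), the fixed MAX_NOTES bound and the "fix-it:"
output line do not exist in Bison.  It only shows the shape of the idea:
first build one value that holds the whole diagnostic (main message,
attached notes, optional fixit), then pass it to a single printing routine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* All names here are hypothetical; none of them exist in Bison.  */
typedef enum { SEV_NOTE, SEV_WARNING, SEV_ERROR } severity;

typedef struct
{
  const char *file;
  int line, col_begin, col_end;
} loc;

/* One located message: the main complaint or an attached note.  */
typedef struct
{
  severity sev;
  loc where;
  const char *message;
} diag_part;

enum { MAX_NOTES = 4 };

/* Step 1: the complete diagnostic, built before anything is printed.  */
typedef struct
{
  diag_part main;
  diag_part notes[MAX_NOTES];
  int n_notes;
  const char *fixit;            /* suggested replacement, or NULL */
} diagnostic;

static void
diag_add_note (diagnostic *d, loc where, const char *message)
{
  assert (d->n_notes < MAX_NOTES);
  d->notes[d->n_notes++] = (diag_part) { SEV_NOTE, where, message };
}

static const char *
severity_name (severity s)
{
  switch (s)
    {
    case SEV_NOTE:    return "note";
    case SEV_WARNING: return "warning";
    default:          return "error";
    }
}

/* Append one part to BUF at offset USED; return the new offset.  */
static size_t
append_part (char *buf, size_t size, size_t used, const diag_part *p)
{
  if (used >= size)
    return used;
  int n = snprintf (buf + used, size - used, "%s:%d.%d-%d: %s: %s\n",
                    p->where.file, p->where.line,
                    p->where.col_begin, p->where.col_end,
                    severity_name (p->sev), p->message);
  return n < 0 ? used : used + (size_t) n;
}

/* Step 2: one routine renders the whole structure (flat, no indentation).
   The result may exceed SIZE if the output was truncated.  */
static size_t
diag_format (const diagnostic *d, char *buf, size_t size)
{
  size_t used = append_part (buf, size, 0, &d->main);
  for (int i = 0; i < d->n_notes; ++i)
    used = append_part (buf, size, used, &d->notes[i]);
  if (d->fixit && used < size)
    {
      int n = snprintf (buf + used, size - used, "fix-it: \"%s\"\n", d->fixit);
      if (n > 0)
        used += (size_t) n;
    }
  return used;
}
```

With such a structure, choosing between the flat, indented, or "note:" styles
becomes a decision local to the printer, and other output formats could be
added without touching the call sites.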
 ** consistency
 token vs terminal

@@ -137,6 +216,11 @@ itself uses int (for yylen for instance), yet stack is based on size_t.

 Maybe locations should also move to ints.

+Paul Eggert already covered most of this.  But before publishing these
+changes, we need to ask our C++ users if they agree with that change, or if
+we need some migration path.  Could be a %define variable, or simply
+%require "3.5".
+
 ** Graphviz display code thoughts
 The code for the --graph option is over two files: print_graph, and
 graphviz. This is because Bison used to also produce VCG graphs, but since
@@ -156,9 +240,6 @@ Little effort seems to have been given to factoring these files and their
 print{,-xml} counterpart. We would very much like to re-use the pretty format
 of states from .output for the graphs, etc.

-Also, the underscore in print_graph.[ch] isn't very fitting considering the
-dashes in the other filenames.
-
 Since graphviz dies on medium-to-big grammars, maybe consider another tool?

 ** push-parser
@@ -296,12 +377,17 @@ we do the same in yacc.c.
 as we don't lose bits to padding.  For instance the typical stack for states
 will use 8 bits, while it is likely to consume 32 bits in a struct.

-We need trustworth benching for Bison, for all our backends.
+We need trustworthy benchmarks for Bison, for all our backends.  Akim has a
+few things scattered around; we need to put them in the repo, and make them
+more useful.

 ** yysyntax_error
 The code in glr.c and yacc.c is really alike; we can certainly factor
 some parts.

+This should be worked on when we also address the expected improvements for
+error generation (e.g., i18n).
+

 * Report

@@ -342,23 +428,25 @@ LORIA, INRIA Nancy - Grand Est, Nancy, France
 * Extensions
 ** Multiple start symbols
 Would be very useful when parsing closely related languages.  The idea is to
-declared several start symbols, for instance
+declare several start symbols, for instance

-    %start: stmt expr
+    %start stmt expr
    %%
    stmt: ...
    expr: ...

-and to generate parse, parse_stmt and parse_expr.  Technically, the above
-grammar would be transformed into
+and to generate parse(), parse_stmt() and parse_expr().  Technically, the
+above grammar would be transformed into

-   %start: yy_start
+   %start yy_start
+   %token YY_START_STMT YY_START_EXPR
+   %%
   yy_start: YY_START_STMT stmt | YY_START_EXPR expr

-so that there are no conflicts in the grammar (as would undoubtedly happen
-with yy_start: stmt | expr).  Then all that remains to do is to adjust the
-skeletons so that this initial token (YY_START_STMT, YY_START_EXPR) be
-shifted first.
+so that there are no new conflicts in the grammar (as would undoubtedly
+happen with yy_start: stmt | expr).  Then adjust the skeletons so that this
+initial token (YY_START_STMT, YY_START_EXPR) be shifted first in the
+corresponding parse function.
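
The item above wants the skeletons to shift the initial token themselves.
Until that exists, the same effect can be approximated on the scanner side;
the sketch below shows that workaround, not the proposed skeleton change.
All names (start_scanner, scanner_init, scanner_next) and the token codes
are hypothetical, and a pre-tokenized array stands in for a real scanner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real parser would take them from the
   generated header.  */
enum
{
  TOK_EOF = 0,
  YY_START_STMT = 258,
  YY_START_EXPR = 259,
  TOK_ID = 260
};

/* Scanner wrapper that delivers the start token before any real token.  */
typedef struct
{
  int start_token;      /* pending start token, 0 once delivered */
  const int *tokens;    /* remaining real tokens, ending with TOK_EOF */
} start_scanner;

static start_scanner
scanner_init (int start_token, const int *tokens)
{
  assert (start_token == YY_START_STMT || start_token == YY_START_EXPR);
  return (start_scanner) { start_token, tokens };
}

/* What a yylex-like function would return on each call.  */
static int
scanner_next (start_scanner *s)
{
  if (s->start_token)
    {
      /* First call: hand out the token that selects the start symbol.  */
      int t = s->start_token;
      s->start_token = 0;
      return t;
    }
  if (*s->tokens == TOK_EOF)
    return TOK_EOF;
  return *s->tokens++;
}
```

A parse_expr()-like entry point would then initialize the scanner with
YY_START_EXPR and run the ordinary parser, so that the rule
"yy_start: YY_START_EXPR expr" is the only one that can match.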

 ** Better error messages
 The users are not provided with enough tools to forge their error messages.
@@ -379,8 +467,8 @@ https://lists.gnu.org/archive/html/bison-patches/2015-09/msg00000.html
 However, there are many other things to do before having such a feature,
 because I don't want a % equivalent to #include (which we all learned to
 hate).  I want something that builds "modules" of grammars, and assembles
-them together, paying attention to keep separate bits separates, in
-pseudo name spaces.
+them together, paying attention to keep separate bits separated, in pseudo
+name spaces.

 ** Push parsers
 There is demand for push parsers in Java and C++.  And GLR I guess.
@@ -463,6 +551,7 @@ It is unfortunate that there is a total order for precedence.  It
 makes it impossible to have modular precedence information.  We should
 move to partial orders (sounds like series/parallel orders to me).

+This is a prerequisite for modules.

 * $undefined
 From Hans: