tables: fix handling for useless tokens

In some rare conditions, the generated parser can be wrong when there
are useless tokens.

Reported by Balázs Scheidler.
https://github.com/akimd/bison/issues/72

Balázs managed to prove that the bug was introduced in

    commit af1c6f973a
    Author: Theophile Ranquet <ranquet@lrde.epita.fr>
    Date:   Tue Nov 13 10:38:49 2012 +0000

    tables: use bitsets for a performance boost

    Suggested by Yuri at
    <http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>.

    The improvement is marginal for most grammars, but notable for large
    grammars (e.g., PosgreSQL's postgre.y), and very large for the
    sample.y grammar submitted by Yuri in
    http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html.
    Measured with --trace=time -fsyntax-only.

    parser action tables    postgre.y     sample.y
    Before                 0,129 (44%)  37,095 (99%)
    After                  0,117 (42%)   5,046 (93%)

    * src/tables.c (pos): Replace this set of integer coded as an unsorted
    array of integers with...
    (pos_set): this bitset.

which was implemented long ago, but that I installed only recently
(March 2019), first published in v3.3.90.

That patch introduces a bitset to represent a set of integers.  It
managed negative integers by using a (fixed) base (the smallest
integer to represent).  It avoided negative accesses into the bitset
by ignoring integers smaller than the base, under the asumption that
these cases correspond to useless tokens that are ignored anyway.
While it turns out to be true for all the test cases in the test suite
(!), Balázs' use case demonstrates that it is not always the case.

So we need to be able to accept negative integers that are smaller
than the current base.

"Amusingly" enough, the aforementioned patch was visibly unsure about
itself:

    /* Store PLACE into POS_SET.  PLACE might not belong to the set
       of possible values for instance with useless tokens.  It
       would be more satisfying to eliminate the need for this
       'if'.  */

This commit needs several improvements in the future:
- support from bitset for bit assignment and shifts
- amortized resizing of pos_set
- test cases

* src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test):
New.
Use them instead of using bitset_set and bitset_test directly.
This commit is contained in:
Akim Demaille
2021-01-23 18:40:15 +01:00
parent 2c294c1325
commit c22902e360
2 changed files with 79 additions and 11 deletions

7
NEWS
View File

@@ -8,13 +8,18 @@ GNU Bison NEWS
In some cases counterexample generation could crash. This is fixed.
*** Fix Table Generation
In some very rare conditions, when there are many useless tokens, it was
possible to generate incorrect parsers.
*** GLR parsers now support %merge together with api.value.type=union.
*** C++ parsers use noexcept in more places.
*** Generated parsers avoid some warnings about signedness issues.
*** C-language parsers no avoid warnings from pedantic clang.
*** C-language parsers now avoid warnings from pedantic clang.
*** C-language parsers now work around quirks of HP-UX 11.23 (2003).