Commit Graph

7328 Commits

Author SHA1 Message Date
Akim Demaille
330552ea49 yacc.c: push: don't clear the parser state when accepting/rejecting
Currently when a push parser finishes its parsing (i.e., it did not
return YYPUSH_MORE), it also clears its state.  It is therefore
impossible to see if it had parse errors.

In the context of autocompletion, because error recovery might have
fired, the parser is actually already in a different state.  For
instance on `(1 + + <TAB>` in the bistromathic, because there's a
`exp: "(" error ")"` recovery rule, `1 + +` tokens have already been
popped, replaced by `error`, and autocompletions think we are ready
for the closing ")".  So here, we would like to see if there was a
syntax error, yet `yynerrs` was cleared.

In the case of a successful parse, we still have a problem: if error
recovery succeeded, we won't know it, since, again, `yynerrs` is
clearer.

It seems much more natural to leave the parser state available for
analysis when there is a failure.

To reuse the parser, we should either:

1. provide an explicit means to reinitialize a parser state for future
   parses.

2. automatically reset the parser state when it is used in a new
   parse.

Option 2 requires to check whether we need to reinitialize the parser
each time we call `yypush_parse`, i.e., each time we give a new token.
This seems expensive compared to Option 1, but benchmarks revealed no
difference.  Option 1 is incompatible with the documentation
("After `yypush_parse` returns a status other than `YYPUSH_MORE`, the
parser instance `yyps` may be reused for a new parse.").

So Option 2 wins, reusing the private `yynew` member to record that a
parse was finished, and therefore that the state must reset in the
next call to `yypull_parse`.

While at it, this implementation now reuses the previously enlarged
stacks from one parse to another.

* data/skeletons/yacc.c (yypstate_new): Set up the stacks in their
initial configurations (setting their bottom to the stack array), and
use yypstate_clear to reset them (moving their top to their bottom).
(yypstate_delete): Adjust.
(yypush_parse): At the beginning, clear yypstate if needed, and at the
end, record when yypstate needs to be clearer.

* examples/c/bistromathic/parse.y (expected_tokens): Do not propose
autocompletion when there are parse errors.
* examples/c/bistromathic/bistromathic.test: Check that case.
2020-06-29 19:36:41 +02:00
Akim Demaille
7c609859ee bistromathic: don't display undefined locations
Currently, completion when there is a syntax error shows broken
locations.

* examples/c/bistromathic/parse.y (expected_tokens): Initialize the
location.
* examples/c/bistromathic/bistromathic.test: Check that.
2020-06-29 19:10:05 +02:00
Akim Demaille
ed10c308fa yacc.c: simplify initialization of push parsers
The previous commit ("yacc.c: declare and initialize and the same
time") made b4_initialize_parser_state_variables useless.

* data/skeletons/yacc.c (b4_initialize_parser_state_variables): Inline
into...
(yypstate_clear): here.
2020-06-29 19:10:05 +02:00
Akim Demaille
b91566edd1 regen 2020-06-29 19:10:05 +02:00
Akim Demaille
29520abb3b yacc.c: declare and initialize and the same time
In order to factor the code of push and pull parsers, the declaration
of the parser's state variable was common (being local variable in
pull parsers, and struct members in push parsers).  This result in
rather poor style in pull parser, with first variable declarations,
and then their initializations.

The initialization is about to differ between push and pull parsers,
so it is no longer worth keeping both cases together.

* data/skeletons/yacc.c (b4_declare_parser_state_variables): Accept an
argument, and when it is set, initialize the variables.
Adjust dependencies.
2020-06-29 19:10:05 +02:00
Akim Demaille
2491de1eef yacc.c: style changes in push mode
* data/skeletons/yacc.c: here.
2020-06-29 19:10:05 +02:00
Akim Demaille
ec207d1bb2 yacc.c: simplify yypull_parse
Currently yypull_parse takes a yypstate* as argument, and accepts it
to be NULL.  This does not seem to make a lot of sense: rather it is
its callers that should do that.

I believe this is historical: yypull_parse was introduced
first (c3d503425f), with yyparse being a
macro.  So yyparse could hardly deal with memory allocation properly.
In 7172e23e8f that yyparse was turned
into a genuine function.  At that point, it should have allocated its
own yypstate*, which would have left yypull_parse deal with only one
single non-null ypstate* argument.

Fortunately, it is nowhere documented that it is valid to pass NULL to
yypull_parse.  It is now forbidden.

* data/skeletons/yacc.c (yypull_parse): Don't allocate a yypstate.
Needs a location to issue the error message.
(yyparse): Allocate the yypstate.
2020-06-29 19:10:05 +02:00
Akim Demaille
688b3404a2 doc: tidy the text files
* etc/README: Rename/reformat as...
* etc/README.md: this.
And ship it.
2020-06-29 19:10:05 +02:00
Akim Demaille
cd6ef1e7d7 bench: simplify the rand target
* etc/bench.pl.in: There is no need to recompile the bench cases
themselves.
2020-06-29 19:10:05 +02:00
Akim Demaille
2b518d621f bench: make it easy to edit the generated files
* etc/bench.pl.in (&compile): Generate rules that compile the
generated files independently of the source files.
2020-06-29 19:08:15 +02:00
Akim Demaille
1ae4f1d329 tests: don't use $VERBOSE
It is used by the test suite itself, which results in this test
failing.

* tests/c++.at: Use $DEBUG, not $VERBOSE.
2020-06-29 06:45:44 +02:00
Akim Demaille
160df55c56 doc: overhaul of the readmes
* README-hacking.md (Working from the Repository): Make it first to
make it easier to find the instructions to build from the repo.
(Implementation Notes): New.
* README: Provide more links.
2020-06-28 14:57:41 +02:00
Akim Demaille
e0b0a67b86 java: rename package as api.package
* data/skeletons/lalr1.java: here.
* doc/bison.texi: Update.
* src/muscle-tab.c: Ensure backward compat.
* tests/java.at: Check it.
2020-06-28 09:49:00 +02:00
Akim Demaille
0e5cbd38b2 style: shift/reduce, not shift-reduce
* src/reader.c: here.
2020-06-28 08:33:24 +02:00
Akim Demaille
feb0bb0a59 style: rename endtoken as eoftoken
* src/symtab.h, src/symtab.c (endtoken): Rename as...
(eoftoken): this.
Adjust dependencies.
2020-06-27 17:31:59 +02:00
Akim Demaille
d796e11f8f news: fixes
Reported by Jacob L. Mandelson.

* NEWS: here.
2020-06-27 17:04:50 +02:00
Akim Demaille
0895858d8e style: use 'nonterminal' consistently
* doc/bison.texi: Formatting changes.
* src/gram.h, src/gram.c (nvars): Rename as...
(nnterms): this.
Adjust dependencies.
(section): New.  Use it.
Replace "non terminal" and "non-terminal" by "nonterminal".
2020-06-27 11:39:32 +02:00
Akim Demaille
4efb2f7bd2 doc: parse.assert in C++ requires RTTI
* doc/bison.texi (%define Summary): Say it.
2020-06-27 10:31:59 +02:00
Akim Demaille
eeafc706e8 c++: by default, use const std::string for file names
Reported by Martin Blais and Yuriy Solodkyy.
https://lists.gnu.org/r/help-bison/2020-05/msg00011.html
https://lists.gnu.org/r/bug-bison/2020-06/msg00038.html

While at it, modernize filename_type as api.filename.type and document
it properly.

* data/skeletons/c++.m4 (filename_type): Rename as...
(api.filename.type): this.
Default to const std::string.
* data/skeletons/location.cc (position, location): Expose the
filename_type type.
Use api.filename.type.
* doc/bison.texi (%define Summary): Document api.filename.type.
(C++ Location Values): Document position::filename_type.
* src/muscle-tab.c (muscle_percent_variable_update): Ensure backward
compatibility.
* tests/c++.at: Check that using const file names is ok.
tests/input.at: Check backward compat.
2020-06-27 10:06:00 +02:00
Akim Demaille
cf6d8d0631 ielr: fix crash on memory management
Reported by Dwight Guth.
https://lists.gnu.org/r/bug-bison/2020-06/msg00037.html

* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
Beware that SBITSET__FOR_EACH nests _two_ for-loops, so "break" does
not actually break out of it.
That was the only occurrence in the code.
* src/Sbitset.h (SBITSET__FOR_EACH): Warn passersby.
2020-06-27 08:16:07 +02:00
Akim Demaille
8f44164443 style: factor the access to a rule from its items
* src/counterexample.c (item_rule): Move to...
* src/counterexample.h: here.
* src/AnnotationList.c, src/counterexample.c, src/ielr.c: Use it.
2020-06-25 19:36:07 +02:00
Akim Demaille
1001f48416 style: clean up nullable
* src/nullable.c: Reduce scopes.
Prefer `r` to `rules_ruleno`, which is truly an ugly name.
2020-06-25 19:36:07 +02:00
Akim Demaille
3be228f64c style: clean up ielr
* src/AnnotationList.c, src/ielr.c: Fix include order.
Prefer `res` to `result`.
Reduce scopes.
Be free of the oldish 76 cols limitation when it clutters too much the
code.
Denest when possible (we're starving for horizontal width).
2020-06-25 19:30:06 +02:00
Akim Demaille
670c7e7a75 don't use strlen to compute visual width
* src/output.c (prepare_symbol_names): Use mbswidth.
2020-06-23 08:27:26 +02:00
Akim Demaille
c4b1a2b68f doc: use dot/'•' rather than point/'.'
AFAICT, "dotted rule" is a more frequent synonym of "item" than
"pointed rule".  So let's migrate to using "dot" only.

* doc/bison.texi: Use dot/'•' rather than point/'.'.

* src/print-xml.c (print_core): Use dot rather than point.  This is
not backward compatible, but AFAICT, we don't have actual user of the
XML output (but ourselves).  So...
* data/xslt/xml2dot.xsl, data/xslt/xml2text.xsl,
* data/xslt/xml2xhtml.xsl, tests/report.at: ... adjust.
2020-06-23 07:37:29 +02:00
Akim Demaille
b65bd16e45 cex: display all the S/R conflicts, not just one per (state, rule)
Before this commit, on

    %%
    exp
    : "if" exp "then" exp
    | "if" exp "then" exp "else" exp
    | exp "+" exp
    | "num"

we used to not display the third counterexample below:

    Shift/reduce conflict on token "+":
      Example              exp "+" exp . "+" exp
      First derivation     exp ::=[ exp ::=[ exp "+" exp . ] "+" exp ]
      Second derivation    exp ::=[ exp "+" exp ::=[ exp . "+" exp ] ]

    Shift/reduce conflict on token "else":
      Example              "if" exp "then" "if" exp "then" exp . "else" exp
      First derivation     exp ::=[ "if" exp "then" exp ::=[ "if" exp "then" exp . ] "else" exp ]
      Second derivation    exp ::=[ "if" exp "then" exp ::=[ "if" exp "then" exp . "else" exp ] ]

    Shift/reduce conflict on token "+":
      Example              "if" exp "then" exp . "+" exp
      First derivation     exp ::=[ exp ::=[ "if" exp "then" exp . ] "+" exp ]
      Second derivation    exp ::=[ "if" exp "then" exp ::=[ exp . "+" exp ] ]

    Shift/reduce conflict on token "+":
      Example              "if" exp "then" exp "else" exp . "+" exp
      First derivation     exp ::=[ exp ::=[ "if" exp "then" exp "else" exp . ] "+" exp ]
      Second derivation    exp ::=[ "if" exp "then" exp "else" exp ::=[ exp . "+" exp ] ]

* src/counterexample.c (counterexample_report_state): Don't stop of
the first conflicts.
* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.
2020-06-23 06:56:04 +02:00
Akim Demaille
0f120354b6 cex: don't display twice unifying examples if there is no color
It makes no sense, and is actually confusing, to display twice the
same example with no visible difference.

* src/complain.h, src/complain.c (is_styled): New.
* src/counterexample.c (print_counterexample): Display the unified
example a second time only if it makes a difference.
* tests/conflicts.at, tests/counterexample.at, tests/report.at: Adjust.
* tests/diagnostics.at: Make sure we do display the unifying examples
twice when colors are enabled.  And check those colors.
2020-06-22 19:33:30 +02:00
Vincent Imbimbo
69e3b405d9 cex: fix reporting of null nonterminals
I implemented this to print A ::= [ ], but A ::= [ %empty ] might be
clearer.

* src/parse-simulation.c (nullable_closure): Don't generate null
nonterminal derivations as leaves.
* src/derivation.c (derivation_print_impl): Don't print seperator
spaces for null nonterminal.
* tests/counterexample.at: Update test results.
2020-06-22 07:11:31 +02:00
Akim Demaille
3dd8f2305a cex: use the bullet in HTML
* data/xslt/xml2xhtml.xsl: here.
2020-06-22 07:02:29 +02:00
Akim Demaille
9e75066819 cex: style changes
* src/counterexample.c: Simplify a bit.
* src/parse-simulation.c, src/parse-simulation.h: Enforce coding style.
2020-06-19 08:02:18 +02:00
Akim Demaille
efb65daa36 c++: get rid of global_tokens_and_yystype
This was a hack to make it easier for people to migrate from yacc.c to
lalr1.cc and from glr.c to glr.cc: when set, YYSTYPE and YYLTYPE were
`#defined`.  It was never documented (just mentioned in NEWS for Bison
2.2, 2006-05-19), but was used to simplify the test suite.  Stop that:
adjust the test suite to the skeletons, not the converse.

In C++ use yy::parser::semantic_type, yy::parser::location_type, and
yy::parser::token::MY_TOKEN, instead of YYSTYPE, YYLTYPE and MY_TOKEN.

* data/skeletons/glr.cc, data/skeletons/lalr1.cc: Remove its support.
* tests/actions.at, tests/c++.at, tests/calc.at: Adjust.
2020-06-16 08:14:42 +02:00
Akim Demaille
e077bf1ebc cex: don't assume the terminal supports "•"
Use of print_unicode_char suggested by Bruno Haible.
https://lists.gnu.org/r/bug-gettext/2020-06/msg00012.html

* src/gram.h (print_dot_fallback, print_dot): New.
* src/gram.c, src/derivation.c: Use it.
* tests/counterexample.at, tests/report.at: Adjust the test suite.
* .travis.yml, README-hacking.md: Adjust.
2020-06-16 07:58:40 +02:00
Akim Demaille
c35e829a76 cex: also include in the report on --report=counterexamples
And let --report=all include the counterexamples.

* src/getargs.h, src/getargs.c (report_cex): New.
* src/main.c: Compute counterexamples when -rcex is specified.
* src/print.c: Include the counterexamples when -rcex is specified.

* tests/conflicts.at, tests/existing.at, tests/local.at: Adjust.
2020-06-16 07:30:46 +02:00
Akim Demaille
d4f854e5b2 cex: also include the counterexamples in the report
The report is the best place to show the details about
counterexamples, since we have the state right under the nose.

For instance:

State 7

    1 exp: exp . "⊕" exp
    2    | exp . "+" exp
    2    | exp "+" exp .  [$end, "+", "⊕"]
    3    | exp . "+" exp
    3    | exp "+" exp .  [$end, "+", "⊕"]

    "⊕"  shift, and go to state 6

    $end      reduce using rule 2 (exp)
    $end      [reduce using rule 3 (exp)]
    "+"       reduce using rule 2 (exp)
    "+"       [reduce using rule 3 (exp)]
    "⊕"       [reduce using rule 2 (exp)]
    "⊕"       [reduce using rule 3 (exp)]
    $default  reduce using rule 2 (exp)

    Conflict between rule 2 and token "+" resolved as reduce (%left "+").

    Shift/reduce conflict on token "⊕":
        2 exp: exp "+" exp .
        1 exp: exp . "⊕" exp
      Example                  exp "+" exp • "⊕" exp
      First derivation         exp ::=[ exp ::=[ exp "+" exp • ] "⊕" exp ]
      Example                  exp "+" exp • "⊕" exp
      Second derivation        exp ::=[ exp "+" exp ::=[ exp • "⊕" exp ] ]

    Reduce/reduce conflict on tokens $end, "+", "⊕":
        2 exp: exp "+" exp .
        3 exp: exp "+" exp .
      Example                  exp "+" exp •
      First derivation         exp ::=[ exp "+" exp • ]
      Example                  exp "+" exp •
      Second derivation        exp ::=[ exp "+" exp • ]

    Shift/reduce conflict on token "⊕":
        3 exp: exp "+" exp .
        1 exp: exp . "⊕" exp
      Example                  exp "+" exp • "⊕" exp
      First derivation         exp ::=[ exp ::=[ exp "+" exp • ] "⊕" exp ]
      Example                  exp "+" exp • "⊕" exp
      Second derivation        exp ::=[ exp "+" exp ::=[ exp • "⊕" exp ] ]

* src/conflicts.h, src/conflicts.c (has_conflicts): New.
* src/counterexample.h, src/counterexample.c (print_counterexample):
Add a `prefix` argument.
(counterexample_report_shift_reduce)
(counterexample_report_reduce_reduce): Show the items when there's a
prefix.
* src/state-item.h, src/state-item.c (print_state_item):
Add a `prefix` argument.
* src/derivation.h, src/derivation.c (derivation_print)
(derivation_print_leaves): Add a prefix argument.
* src/print.c (print_state): When -Wcex is enabled, show the
conflicts.
* tests/report.at: Adjust.
2020-06-16 07:30:26 +02:00
Akim Demaille
35c0fe6789 cex: indent the diagnostics to highlight the structure
Instead of

    Shift/reduce conflict on token D:
    Example              A a • D
    First derivation     s ::=[ A a a ::=[ b ::=[ c ::=[ • ] ] ] d ::=[ D ] ]
    Example              A a • D
    Second derivation    s ::=[ A a d ::=[ • D ] ]

display

    Shift/reduce conflict on token D:
      Example              A a • D
      First derivation     s ::=[ A a a ::=[ b ::=[ c ::=[ • ] ] ] d ::=[ D ] ]
      Example              A a • D
      Second derivation    s ::=[ A a d ::=[ • D ] ]

* src/counterexample.c (print_counterexample): Indent.
* tests/counterexample.at: Adjust.
2020-06-16 07:29:46 +02:00
Akim Demaille
22f62414f9 cex: don't report the items
Showing the items (with the state numbers) is really something we
should restrict to the report.

* src/counterexample.c (counterexample_report_shift_reduce)
(counterexample_report_reduce_reduce): Don't show the pointed rules,
we will do that in the report.
* tests/counterexample.at: Adjust.
2020-06-16 07:29:46 +02:00
Akim Demaille
9206b15c4e cex: make sure traces go to stderr
* src/parse-simulation.h, src/parse-simulation.c (print_parse_state):
here.
2020-06-16 07:29:46 +02:00
Akim Demaille
5edac5e15a cex: add an argument to the reporting functions to specify the stream
* src/conflicts.c (find_state_item_number, report_state_counterexamples):
Move to...
* src/counterexample.h, src/counterexample.c (find_state_item_number)
(counterexample_report_state): this.
Add support for `out` as an argument.
(counterexample_report_reduce_reduce, counterexample_report_shift_reduce):
Accept an `out` argument, and be static.
2020-06-16 07:29:46 +02:00
Akim Demaille
1c3189734c style: more uses of const
* src/print.c, src/state.h, src/state.c: here.
2020-06-16 07:29:46 +02:00
Akim Demaille
c662b23735 Merge 'maint'
* upstream/maint:
  maint: post-release administrivia
  version 3.6.4
  glr.cc: don't leak glr.c/glr.cc scaffolding to the user

Some fixes were needed to adjust to recent changes in glr.cc and
glr.c.

* data/skeletons/glr.cc: Stop messing with the user's epilogue to
insert glr.cc code.  We need that code to be inserted _before_ the
user's epilogue, not after.  So define b4_glr_cc_pre_epilogue.
* data/skeletons/glr.c: Use it.
2020-06-16 07:16:00 +02:00
Akim Demaille
627fecb19e maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2020-06-15 20:39:01 +02:00
Akim Demaille
2a069f22c6 version 3.6.4
* NEWS: Record release date.
v3.6.4
2020-06-15 20:18:50 +02:00
Akim Demaille
3f4ffea6f2 glr.cc: don't leak glr.c/glr.cc scaffolding to the user
Until we have a decent reimplementation of glr.cc, we have to use
tricks to shoehorn C++ symbols to the C engine of glr.c.  Some of them
are done via #define.  Unfortunately in Bison 3.6 some of these we
done in the header file, which broke valid user code.

Reported by Egor Pugin.
https://lists.gnu.org/r/bug-bison/2020-06/msg00003.html

* data/skeletons/glr.cc: Stop playing tricks with b4_pre_epilogue.
(b4_glr_cc_setup, b4_glr_cc_cleanup): New.
Much cleaner way to instal glr.cc's scaffolding around glr.c.
* data/skeletons/glr.c: Adjust to use them.
2020-06-15 20:18:47 +02:00
Akim Demaille
251e1b137f reports: the column width differs from the byte count
From

    "number"          shift, and go to state 1
    "Ñùṃéℝô"  shift, and go to state 2

to

    "number"  shift, and go to state 1
    "Ñùṃéℝô"  shift, and go to state 2

* src/print.c: Use mbswidth, not strlen, to compute visual columns.
* tests/report.at: Adjust.
2020-06-13 17:21:51 +02:00
Akim Demaille
efbcadeca7 reports: don't escape the labels
Currently we use "quotearg" to escape the strings output in Dot.  As a
result, if the user's locale is C for instance, all the non-ASCII are
escaped.  Unfortunately graphviz does not interpret this style of
escaping.

For instance:

    5 -> 2 [style=solid label="\"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\""]

was displayed as a sequence of numbers.  We now output:

    5 -> 2 [style=solid label="\"Ñùṃéℝô\""]

independently of the user's locale.

* src/system.h (obstack_backslash): New.
* src/graphviz.h, src/graphviz.c (escape): Remove, use
obstack_backslash instead.
* src/print-graph.c: Likewise.
* tests/report.at: Adjust.
2020-06-13 16:58:13 +02:00
Akim Demaille
e4d33cf579 regen 2020-06-13 16:58:03 +02:00
Akim Demaille
5855da4722 parser: keep string aliases as the user wrote it
Currently our scanner decodes all the escapes in the strings, and we
later reescape the strings when we emit them.

This is troublesome, as we do not respect the user input.  For
instance, when the user writes in UTF-8, we destroy her string when we
write it back.  And this shows everywhere: in the reports we show the
escaped string instead of the actual alias:

    0 $accept: . exp $end
    1 exp: . exp "\342\212\225" exp
    2    | . exp "+" exp
    3    | . exp "+" exp
    4    | . "number"
    5    | . "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"

    "number"                                                    shift, and go to state 1
    "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"  shift, and go to state 2

This commit preserves the user's exact spelling of the string aliases,
instead of interpreting the escapes and then reescaping.  The report
now shows:

    0 $accept: . exp $end
    1 exp: . exp "⊕" exp
    2    | . exp "+" exp
    3    | . exp "+" exp
    4    | . "number"
    5    | . "Ñùṃéℝô"

    "number"          shift, and go to state 1
    "Ñùṃéℝô"  shift, and go to state 2

Likewise, the XML (and therefore HTML) outputs are fixed.

* src/scan-gram.l (STRING, TSTRING): Do not interpret the escapes in
the resulting string.
* src/parse-gram.y (unquote, parser_init, parser_free, unquote_free)
(handle_defines, handle_language, obstack_for_unquote): New.
Use them to unquote where needed.
* tests/regression.at, tests/report.at: Update.
2020-06-13 16:56:40 +02:00
Akim Demaille
5d5e1df1dc tests: check reports with conflicts and UTF-8
This is to record the current state of the report, which escapes the
UTF-8 characters (as parse.error="verbose" does), but shouldn't (as
parse.error="detailed" does).

* tests/report.at: here.
2020-06-13 15:58:32 +02:00
Akim Demaille
cef13e11f5 style: factor common bits about string scanning
* src/scan-gram.l: here.
2020-06-13 15:20:56 +02:00
Akim Demaille
b7fbfd050e style: introduce & use STRING_1GROW
* src/flex-scanner.h (STRING_1GROW): New.
* src/scan-gram.l, src/scan-skel.l: Use it.
2020-06-13 15:20:56 +02:00