Commit Graph

2925 Commits

Author SHA1 Message Date
Akim Demaille
5b19f91ccf multistart: check duplicates
* src/symlist.h, src/symlist.c (symbol_list_find_symbol)
(symbol_list_last): New.
(symbol_list_append): Use symbol_list_last.
* src/reader.c (grammar_start_symbols_add): Check and discard duplicates.
* tests/input.at (Duplicate %start symbol): New.
* tests/reduce.at (Bad start symbols): Add the multistart keyword.
2020-11-30 16:48:03 +01:00
Akim Demaille
7fe9205b9f style: change the format of a debugging function
* src/symlist.c (symbol_list_syms_print): Use braces to make traces
easier to read.
2020-11-22 16:01:05 +01:00
Akim Demaille
d798851e48 style: rename grammar_start_symbols_set as grammar_start_symbols_add
* src/reader.h, src/reader.c (grammar_start_symbols_set): Rename as...
(grammar_start_symbols_add): this.
Adjust dependencies.
2020-11-22 11:18:20 +01:00
Akim Demaille
23472033ee Merge branch 'maint'
* maint:
  c++: shorten the assertions that check whether tokens are correct
  c++: don't glue functions together
  lalr1.cc: YY_ASSERT should use api.prefix
  c++: don't use YY_ASSERT at all if parse.assert is disabled
  c++: style: follow the Bison m4 quoting pattern
  yacc.c: provide the Bison version as an integral macro
  regen
  style: make conversion of version string to int public
  %require: accept version numbers with three parts ("3.7.4")
  yacc.c: fix #definition of YYEMPTY
  gnulib: update
  doc: fix incorrect section title
  doc: minor grammar fixes in counterexamples section
2020-11-13 07:01:19 +01:00
Akim Demaille
21c147b6e5 yacc.c: provide the Bison version as an integral macro
Suggested by Balazs Scheidler.
https://github.com/akimd/bison/issues/55

* src/muscle-tab.c (muscle_init): Move/rename `b4_version` to/as...
* src/output.c (prepare): `b4_version_string`.
Also define `b4_version`.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/d.m4,
* data/skeletons/java.m4: Adjust.
* doc/bison.texi: Document it.
2020-11-11 09:08:57 +01:00
Akim Demaille
d3c575a6c6 regen 2020-11-11 08:47:23 +01:00
Akim Demaille
d8b49e2b73 style: make conversion of version string to int public
* src/parse-gram.y (str_to_version): Rename as/move to...
* src/strversion.h, src/strversion.c (strversion_to_int): these new
files.
2020-11-11 08:47:23 +01:00
Akim Demaille
14c65a35f0 %require: accept version numbers with three parts ("3.7.4")
* src/parse-gram.y (str_to_version): Support three parts.
* data/skeletons/location.cc, data/skeletons/stack.hh:
Adjust.
2020-11-11 08:47:23 +01:00
Akim Demaille
25ded505a3 ielr: fix incorrect function call
* src/ielr.c: s/rule_is_accepting/rule_is_initial/.
2020-11-10 07:15:43 +01:00
Akim Demaille
6c0ba6089a ielr: more comments and logs
* src/ielr.c: More comments.
(state_list_print): New.
2020-11-10 07:08:07 +01:00
Akim Demaille
70ee8c77a8 multistart: fix IELR computations
IELR needs to rule out the successors of the kernel items of the
initial state (`$accept: input • $end`).  In the case of multistart,
this condition must be expressed differently: the mere item index does
not suffice.

* src/ielr.c (ielr_item_has_lookahead, ielr_compute_lookaheads): Don't
rely on the item index to check whether is_successor_of_initial_item.
It is certainly more costly than just checking the item index, but (i)
we need to compute the rule anyway, so it's not very much more costly,
and (ii) in ielr_item_has_lookahead, this situation is actually
impossible, so an optimizing compiler reading the assertions should
actually avoid this computation.
2020-11-10 07:08:07 +01:00
Akim Demaille
e9a43ed4ae ielr: make some conditions about items easier to understand
Checking that an item index is > 1 means ruling out `$accept: • input
$end` and `$accept: input • $end`.  But actually only the latter is
possible there, i.e., we're checking whether this item is about a
successor of a (kernel) item of the initial state ($accept: input •
$end).

* src/ielr.c (is_successor_of_initial_item): Use a variable to name
this condition.
2020-11-10 07:08:07 +01:00
Akim Demaille
a38d0b9145 multistart: introduce and use rule_is_initial
* src/gram.h (rule_is_initial): New.
* src/graphviz.c, src/print-xml.c, src/print.c, src/lalr.c: Use it.
Some of these occurrences were incorrect (checking whether this is
rule 0), and not behaving properly in the case of multistart.
2020-11-10 07:08:03 +01:00
Akim Demaille
4b0cd01fb7 style: comment and formatting changes, and fixes
* examples/c/lexcalc/parse.y: Fix option handling.
* src/gram.h: Clarify comments.
* src/ielr.c: Fix indentation.
* src/print.c, src/state.h: More comments.
2020-11-08 13:42:15 +01:00
Akim Demaille
0328cbad64 lalr: add assertions
* src/lalr.c: Remove incorrect comment (subsumed anyway by the
(correct) one in the header.
(set_goto_map): More debug traces.
(map_goto): Add an assertion.
2020-11-08 08:25:20 +01:00
Akim Demaille
98691fcd2d Merge branch 'maint'
* upstream/maint:
  doc: fix typo
  maint: post-release administrivia
  version 3.7.3
  build: don't link bison against libreadline
  gnulib: update
  glr.cc: fix: use symbol_name
  build: fix a concurrent build issue in examples
2020-10-14 21:12:45 +02:00
Akim Demaille
bc5e4541da build: don't link bison against libreadline
Reported by Paul Smith <psmith@gnu.org>.
https://lists.gnu.org/r/bug-bison/2020-10/msg00001.html

* src/local.mk (src_bison_LDADD): here.
2020-10-13 06:57:33 +02:00
Akim Demaille
36143b5ecc report: put the dot after %empty in items
When printing items, it is clearer to put the dot after %emtpy rather
than before:

     0 $accept: . unit "end of file"
     1 unit: . assignments exp
-    2 assignments: . %empty
+    2 assignments: %empty .
     3            | . assignments assignment

Also, use the Unicode characters if they are supported.

* src/gram.c (item_print): Put the dot after %emtpy.
* tests/conflicts.at, tests/reduce.at, tests/report.at: Adjust.
2020-10-07 06:28:52 +02:00
Akim Demaille
f4d33ff4b4 yacc.c: also count calls to YYERROR in yynerrs
* data/skeletons/yacc.c: here.
2020-09-27 11:58:27 +02:00
Akim Demaille
683040b324 multistart: allow tokens as start symbols
After all, why not?

* src/reader.c (switching_token): Use symbol_id_get.
(check_start_symbols): Require that the start symbol is a token only
if it's the only one.
* examples/c/lexcalc/parse.y: Let NUM be a start symbol.
2020-09-27 09:44:23 +02:00
Akim Demaille
d9cf99b6a5 multistart: use b4_accept instead of action post-processing
For each start symbol, generate a parsing function with a richer
return value than the usual of yyparse.  Reserve a place for the
returned semantic value, in order to avoid having to pass a pointer as
argument to "return" that value.  This also makes the call to the
parsing function independent of whether a given start-symbol is typed.

For instance, if the grammar file contains:

    %type <int> expression
    %start input expression

(so "input" is valueless) we get

    typedef struct
    {
      int yystatus;
    } yyparse_input_t;

    yyparse_input_t yyparse_input (void);

    typedef struct
    {
      int yyvalue;
      int yystatus;
    } yyparse_expression_t;

    yyparse_expression_t yyparse_expression (void);

This commit also changes the implementation of the parser termination:
when there are multiple start symbols, it is the initial rules that
explicitly YYACCEPT.  They do that after having exported the
start-symbol's value (if it is typed):

  switch (yyn)
    {
  case 1: /* $accept: YY_EXPRESSION expression $end  */
  { ((*yyvalue).TOK_expression) = (yyvsp[-1].TOK_expression); YYACCEPT; }
    break;

  case 2: /* $accept: YY_INPUT input $end  */
  { YYACCEPT; }
    break;

I have tried several ways to deal with termination, and this is the
one that appears the best one to me.  It is also the most natural.

* src/scan-code.h, src/scan-code.l (obstack_for_actions): New.
* src/reader.c (grammar_rule_check_and_complete): Generate the actions
of the rules for each start symbol.

* data/skeletons/bison.m4 (b4_symbol_slot): New, with safer semantics
than type and type_tag.
* data/skeletons/yacc.c (b4_accept): New.
Generates the body of the action of the start rules.
(_b4_declare_sub_yyparse): For each start symbol define a dedicated
return type for its parsing function.
Adjust the declaration of its parsing function.
(_b4_define_sub_yyparse): Adjust the definition of the function.

* examples/c/lexcalc/parse.y: Check the case of valueless symbols.
* examples/c/lexcalc/lexcalc.test: Check start symbols.
2020-09-27 09:44:18 +02:00
Akim Demaille
a6805bb8d9 multistart: adjust reader checks for generated rules
So far we were not checking the generated rule 0 at all.  Now there
can be several of them.  Instead of not checking at all, let's be more
selective on the check to run on them.

* src/reader.c (grammar_rule_check_and_complete): Don't check for
value usage for generated rules, it is ok to have a valued start
symbol, in which case it is ok for the generated rule ("accept: start
$end {}") to not use $1.
(packgram): Call grammar_rule_check_and_complete for all the rules.
2020-09-27 09:23:51 +02:00
Akim Demaille
05d6b54703 multistart: pass the list of start symbols to the backend
* src/output.c (start_symbols_output): New.
(muscles_output): Use it.
2020-09-27 09:23:51 +02:00
Akim Demaille
85ccc1bab3 multistart: adjust computation of initial core and adjust reports
Currently the core of the initial state is limited to the single rule
on $accept.

* src/lr0.c (generate_states): There may now be several rules on
$accept.

* src/graphviz.c (conclude_red): Recognize "final" transitions by the
fact that we reduce to "$accept".
* src/print.c (print_reduction): Likewise.
* src/print-xml.c (print_reduction): Likewise.
2020-09-27 09:23:51 +02:00
Akim Demaille
4646be7db4 regen 2020-09-27 09:23:51 +02:00
Akim Demaille
8eaddf326b multistart: turn start symbols into rules on $accept
Now that the parser can read several start symbols, let's process
them, and create the corresponding rules.

* src/parse-gram.y (grammar_declaration): Accept a list of start symbols.
* src/reader.h, src/reader.c (grammar_start_symbol_set): Rename as...
(grammar_start_symbols_set): this.

* src/reader.h, src/reader.c (start_flag): Replace with...
(start_symbols): this.
* src/reader.c (grammar_start_symbols_set): Build a list of start
symbols.
(switching_token, create_start_rules): New.
(check_and_convert_grammar): Use them to turn the list of start
symbols into a set of rules.
* src/reduce.c (nonterminals_reduce): Don't complain about $accept,
it's an internal detail.
(reduce_grammar): Complain about all the start symbols that don't
derive sentences.

* src/symtab.c (startsymbol, startsymbol_loc): Remove, replaced by
start_symbols.
symbols_pack): Move the check about the start symbols
to...
* src/symlist.c (check_start_symbols): here.
Adjust to multiple start symbols.
* tests/reduce.at (Empty Language): Generalize into...
(Bad start symbols): this.
2020-09-27 09:23:51 +02:00
Akim Demaille
db68f61595 regen 2020-09-27 09:23:51 +02:00
Akim Demaille
7eca26e87b parser: expose a list of symbols
* src/parse-gram.y (%type): Also use current_class.
(symbol_decl.1): Rename as...
(symbols.1): this.
2020-09-27 09:23:51 +02:00
Akim Demaille
e50ec28153 reader: get ready to create several initial rules
* src/reader.c (create_start_rule): New.
Use it.
2020-09-27 09:23:50 +02:00
Akim Demaille
f7f2c99c28 gram: more debugging information
* src/gram.c (ritem_print): Show indices in ritem.
2020-09-27 09:23:50 +02:00
Akim Demaille
72946549ed style: formatting changes
* src/scan-code.l: here.
2020-09-20 08:23:28 +02:00
Akim Demaille
bad4fc09a7 style: introduce parse_positional_ref
* src/scan-code.l: here.
2020-09-20 08:23:28 +02:00
Akim Demaille
aac79ca103 style: clarify the way state kernels (aka cores) are built
Use state_list_append in a more natural way.

* src/lr0.c (generate_states): Here.
2020-09-20 08:23:28 +02:00
Akim Demaille
843f99886c style: reorder and comment
* src/reader.h: here.
2020-09-20 08:23:28 +02:00
Akim Demaille
0711dca9d9 add support for --html
* bootstrap.conf: We need the "execute" module.
* src/files.h, src/files.c (spec_html_file, html_flag): New.
* src/getargs.h, src/getargs.c (--html): New.
* src/print-xml.h, src/print-xml.c (print_html): New.
* src/main.c: Use them.
* tests/output.at, tests/report.at: Check --html.
2020-09-19 17:49:03 +02:00
Akim Demaille
f5d4b64909 regen 2020-09-19 17:49:03 +02:00
Akim Demaille
b327f38832 deprecate %defines in favor of %header
This is consistent with --defines being deprecated in favor of
--header.  The directive %defines is also too similar to %define.
And %header matches nicely with api.header.name.

* src/scan-gram.l (%defines): Deprecate to %header.
(%header): Scan it.
* src/parse-gram.y (PERCENT_DEFINES): Replace with...
(PERCENT_HEADER): this.
* data/skeletons/lalr1.java
* doc/bison.texi
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/java.at, tests/local.at, tests/output.at,
* tests/synclines.at, tests/types.at:
Convert most tests to check %header instead of %defines.
2020-09-19 17:49:03 +02:00
Akim Demaille
75c3746ce2 options: rename --defines as --header
The name "defines" is incorrect, the generated file contains far more
than just #defines.

* src/getargs.h, src/getargs.c (-H, --header): New option.
With optional argument, just like --defines, --xml, etc.
(defines_flag): Rename as...
(header_flag): this.
Adjust dependencies.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c:
Adjust.
* examples, doc/bison.texi: Adjust.
* tests/headers.at, tests/local.at, tests/output.at: Convert most
tests from using --defines to using --header.
2020-09-19 08:31:49 +02:00
Akim Demaille
325ec7d324 cex: always show ε/%empty in counterexamples
On a case such as
    %%
    exp
    : empty "a"
    | "a" empty

    empty
    : %empty

we used to display

    warning: shift/reduce conflict on token "a" [-Wcounterexamples]
    Example: • "a"
    Shift derivation
      exp
      ↳ 2: • "a" empty
                 ↳ 2: ε
    Example: • "a"
    Reduce derivation
      exp
      ↳ 1: empty  "a"
           ↳ 3: •

where the shift derivation shows an item "2: empty → ε", with an
explicit "ε", but the reduce derivation shows "3: empty → •", without
"ε".

For consistency, let's always show ε/%empty in rules with an empty
rhs:

    Reduce derivation
      exp
      ↳ 1: empty    "a"
           ↳ 3: ε •

* src/derivation.c (derivation_width, derivation_print_tree_impl):
Always show ε/%empty in counterexamples.
* tests/diagnostics.at: Check that case.
* tests/conflicts.at, tests/counterexample.at: Adjust.
2020-09-02 07:31:55 +02:00
Akim Demaille
3c36d871fa cex: display the rule numbers
From

    Example: "if" expr "then" "if" expr "then" stmt • "else" stmt
    Shift derivation
      if_stmt
      ↳ "if" expr "then" stmt
                         ↳ if_stmt
                           ↳ "if" expr "then" stmt • "else" stmt
    Reduce derivation
      if_stmt
      ↳ "if" expr "then" stmt                        "else" stmt
                         ↳ if_stmt
                           ↳ "if" expr "then" stmt •

to

    Example: "if" expr "then" "if" expr "then" stmt • "else" stmt
    Shift derivation
      if_stmt
      ↳ 3: "if" expr "then" stmt
                            ↳ 2: if_stmt
                                 ↳ 4: "if" expr "then" stmt • "else" stmt
    Example: "if" expr "then" "if" expr "then" stmt • "else" stmt
    Reduce derivation
      if_stmt
      ↳ 4: "if" expr "then" stmt                              "else" stmt
                            ↳ 2: if_stmt
                                 ↳ 3: "if" expr "then" stmt •

* src/state-item.h, src/state-item.c (state_item_rule): New.
* src/derivation.h, src/derivation.c (struct derivation): Add a rule
member.
Adjust dependencies.
* src/counterexample.c, src/parse-simulation.c: Pass the rule to
derivation_new.
* src/derivation.c (fprintf_if): New.
(derivation_width, derivation_print_tree_impl): Take the rule number
into account.

* tests/conflicts.at, tests/counterexample.at, tests/diagnostics.at,
* tests/report.at: Adjust.

* doc/bison.texi: Adjust.
2020-08-30 19:20:49 +02:00
Valentin Tolmer
ef09bf065a glr2.cc: fork glr.cc to a c++ version
This is a fork of glr.cc to be c++-first instead of a wrapper around
glr.c.

* data/skeletons/glr2.cc: New.
* data/skeletons/bison.m4, data/skeletons/c++.m4: Adjust.
* data/skeletons/c.m4 (b4_user_args_no_comma): New.
* src/reader.c (grammar_rule_check_and_complete): glr2.cc is C++.
* tests/actions.at, tests/c++.at, tests/calc.at, tests/conflicts.at,
* tests/input.at, tests/local.at, tests/regression.at, tests/scanner.at,
* tests/synclines.at, tests/types.at: Also check glr2.cc.
2020-08-30 10:45:21 +02:00
Akim Demaille
b801b7b670 fix: unterminated \-escape
An assertion failed when the last character is a '\' and we're in a
character or a string.
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00009.html

* src/scan-gram.l: Catch unterminated escapes.
* tests/input.at (Unexpected end of file): New.
2020-08-08 07:53:33 +02:00
Akim Demaille
b7aab2dbad fix: crash when redefining the EOF token
Reported by Agency for Defense Development.
https://lists.gnu.org/r/bug-bison/2020-08/msg00008.html

On an empty such as

    %token FOO
           BAR
           FOO 0
    %%
    input: %empty

we crash because when we find FOO 0, we decrement ntokens (since FOO
was discovered to be EOF, which is already known to be a token, so we
increment ntokens for it, and need to cancel this).  This "works well"
when EOF is properly defined in one go, but here it is first defined
and later only assign token code 0.  In the meanwhile BAR was given
the token number that we just decremented.

To fix this, assign symbol numbers after parsing, not during parsing,
so that we also saw all the explicit token codes.  To maintain the
current numbers (I'd like to keep no difference in the output, not
just equivalence), we need to make sure the symbols are numbered in
the same order: that of appearance in the source file.  So we need the
locations to be correct, which was almost the case, except for nterms
that appeared several times as LHS (i.e., several times as "foo:
...").  Fixing the use of location_of_lhs sufficed (it appears it was
intended for this use, but its implementation was unfinished: it was
always set to "false" only).

* src/symtab.c (symbol_location_as_lhs_set): Update location_of_lhs.
(symbol_code_set): Remove broken hack that decremented ntokens.
(symbol_class_set, dummy_symbol_get): Don't set number, ntokens and
nnterms.
(symbol_check_defined): Do it.
(symbols): Don't count nsyms here.
Actually, don't count nsyms at all: let it be done in...
* src/reader.c (check_and_convert_grammar): here.  Define nsyms from
ntokens and nnterms after parsing.
* tests/input.at (EOF redeclared): New.

* examples/c/bistromathic/bistromathic.test: Adjust the traces: in
"%nterm <double> exp %% input: ...", exp used to be numbered before
input.
2020-08-07 07:30:06 +02:00
Akim Demaille
89e42ffb4b style: fix missing space before paren
* cfg.mk (_space_before_paren_exempt): Be less laxist.
* src/output.c, src/reader.c: Fix space before paren issues.
Pacify the warnings where applicable.
2020-08-07 07:30:06 +02:00
Akim Demaille
6aae4a7378 style: fix comments and more debug trace
* src/location.c, src/symtab.h, src/symtab.c: here.
2020-08-07 07:30:06 +02:00
Akim Demaille
7d4a4300c2 style: more uses of const
* src/symtab.c: here.
2020-08-07 07:30:06 +02:00
Akim Demaille
0a5bfb4fda portability: multiple typedefs
Older versions of GCC (4.1.2 here) don't like repeated typedefs.

      CC       src/bison-parse-simulation.o
    src/parse-simulation.c:61: error: redefinition of typedef 'parse_state'
    src/parse-simulation.h:74: error: previous declaration of 'parse_state' was here
    make: *** [Makefile:7876: src/bison-parse-simulation.o] Error 1

Reported by Nelson H. F. Beebe.

* src/parse-simulation.c (parse_state): Don't typedef,
parse-simulation.h did it already.
2020-08-03 07:30:35 +02:00
Akim Demaille
12d0b15679 style: revert "avoid warnings with GCC 4.6"
This reverts commit d0bec3175f (which
should have read "We have a clash...", not "With have a clash...").
Now that `max()` was renamed `max_int()`, we can use `max` again, as
elsewhere in the code.

* src/counterexample.c (visited_hasher): Alpha reconversion.
2020-08-02 10:20:23 +02:00
Akim Demaille
2f8a874215 portability: we use termios.h and sys/ioctl.h
Reported by Maarten De Braekeleer.
https://lists.gnu.org/r/bison-patches/2020-07/msg00079.html

* bootstrap.conf (gnulib_modules): Add termios and sys_ioctl.
2020-08-02 08:36:49 +02:00
Maarten De Braekeleer
ad6f600bb1 portability: rename accept to acceptsymbol because of MSVC
MSVC already defines this symbol.

* src/symtab.h, src/symtab.c (accept): Rename as...
(acceptsymbol): this.
Adjust dependencies.
2020-08-02 08:32:57 +02:00