Commit Graph

2436 Commits

Author SHA1 Message Date
Akim Demaille
58ae95670b style: rename spec_defines_file as spec_header_file
The variable spec_defines_file denotes the name of the generated
header.  Its name is derived from --defines/%defines, whose name in
turn is derived from the fact that the header, in Yacc, contained the

Not only does the header now contain a lot more than just the token
definitions, but we no longer even generate macros, but an enum...

Let's modernize our vocabulary.

* src/files.h, src/files.c (spec_defines_file): Rename as...
(spec_header_file): this.
2019-03-17 16:36:05 +01:00
Akim Demaille
4e19ab9fcd yacc.c: provide a means to include the header in the implementation
Currently when --defines is used, we generate a header, and paste an
exact copy of it into the generated parser implementation file.  Let's
provide a means to #include it instead.

We don't do it by default because of the Autotools' ylwrap.  This
program wraps invocations of yacc (that uses a fixed output name:
y.tab.c, y.tab.h, y.output) to support a more modern naming
scheme (dir/foo.y -> dir/foo.tab.c, dir/foo.tab.h, etc.).  It does
that by renaming the generated files, and then by running sed to
propagate these renamings inside the files themselves.

Unfortunately Automake's Makefiles uses Bison as if it were Yacc (with
--yacc or with -o y.tab.c) and invoke bison via ylwrap.  As a
consequence, as far as Bison is concerned, the output files are
y.tab.c and y.tab.h, so it emits '#include "y.tab.h"'.  So far, so
good.  But now ylwrap processes this '#include "y.tab.h"' into
'#include "dir/foo.tab.h"', which is not guaranteed to always work.

So, let's do the Right Thing when the output file is not y.tab.c, in
which case the user should %define api.header.include.  Binding this
behavior to --yacc is tempting, but we recently told people to stop
using --yacc (as it also enables the Yacc warnings), but rather to use
-o y.tab.c.

Yacc.c is the only skeleton concerned: all the others do include their
header.

* data/skeletons/yacc.c (b4_header_include_if): New.
(api.header.include): Provide a default value when the output is not
y.tab.c.
* src/parse-gram.y (api.header.include): Define.
2019-03-17 16:36:05 +01:00
Akim Demaille
35add841ee address warnings from GCC's UB sanitizer
Running with CC='gcc-mp-8 -fsanitize=undefined' revealed Undefined
Behaviors.
https://lists.gnu.org/archive/html/bison-patches/2019-03/msg00008.html

* src/state.c (errs_new): Don't call memcpy with NULL as source.
* src/location.c (add_column_width): Don't assume that the column
argument is nonnegative: the scanner sometimes "backtracks" (e.g., see
ROLLBACK_CURRENT_TOKEN and DEPRECATED) in which case we can have
negative column numbers (temporarily).
Found in test 3 (Invalid inputs).
2019-03-17 13:21:25 +01:00
Akim Demaille
f6e38d7ac9 diagnostics: use libtextstyle for colored output
Bruno Haible released libtextstyle, a library for colored output based
on CSS.  Let's use it to generate colored diagnostics, provided
libtextstyle is available.

See
https://lists.gnu.org/archive/html/bug-gnulib/2019-01/msg00176.html
https://lists.gnu.org/archive/html/bison-patches/2019-02/msg00073.html
https://lists.gnu.org/archive/html/bison-patches/2019-02/msg00084.html
https://lists.gnu.org/archive/html/bison-patches/2019-03/msg00007.html

* bootstrap.conf (gnulib_modules): Use libtextstyle when possible.
* data/diagnostics.css: New.
* src/complain.c (begin_use_class, end_use_class, flush)
(severity_style, complain_init_color): New.
Use them.
* src/getargs.c (getargs_colors): New.
(getargs): Use it.
Skip --color and --style.
* src/location.h, src/location.c (location_print): Use a style.

* tests/bison.in: Force --color=yes when stderr is a tty.
* tests/local.at: Disable colors during the test suite.
* tests/input.at: Adjust expectations to the extra options passed on
the command line.
2019-03-16 16:46:17 +01:00
Akim Demaille
855fbf1c11 style: clean up complain.c
* src/complain.c (severity_prefix): New.
(error_message): Take the severity as argument, instead of the prefix.
2019-03-16 16:46:17 +01:00
Akim Demaille
d57751d2fb lalr: clarify the count of lookaheads
* src/lalr.c (state_lookahead_tokens_count): Remove wierd `+=` that is
actually an `=`.
2019-02-28 06:47:19 +01:00
Akim Demaille
e062b9f70d lalr: clarify the API
* src/state.h, src/state.c (state_reduction_find): Clarify.
Die on errors.
* src/lalr.c (goto_list_new): New.
Use it.
2019-02-28 06:47:19 +01:00
Akim Demaille
c837141832 lalr: improve traces
* src/lalr.c (follows_print): Just print the symbol tag.
Take and print a title.
Indent the output.
Use it to print the various steps of the computation.
(lookahead_tokens_print): Fix a lie: the number displayed is not the
number of tokens.
Don't display states that don't even have reductions.
2019-02-28 06:47:19 +01:00
Akim Demaille
a415a78d71 lalr: print the 'reads' relation
* src/relation.h, src/relation.c (relation_print): Accept and use a
title.
Don't print empty rows.
Indent the output.
Adjust dependencies.
* src/lalr.c (initialize_goto_follows): Print 'reads' in traces.
2019-02-27 19:06:32 +01:00
Akim Demaille
5255b919ae style: comment changes
* src/lr0.c: here.
2019-02-27 19:06:32 +01:00
Akim Demaille
d04962f788 style: eliminate useless indirection
* src/relation.h, src/relation.c (relation_digraph): Don't take the
biteetv as a pointer, it is already a pointer (as it's an array).
2019-02-25 06:19:55 +01:00
Akim Demaille
ec8142391a style: rename function for clarity
Commit db34f79889 renames the variable F
as goto_follows, but forgot to rename this function.

* src/lalr.c (initialize_F): Rename as...
(initialize_goto_follows): this.
2019-02-25 06:19:55 +01:00
Akim Demaille
59bec5fade lalr: more debug traces
I need to be able to read includes and goto_follows.

* src/relation.h, src/relation.c (relation_print): Provide a means to
pretty-print the nodes of the relation.
* src/lalr.c (goto_print, follows_print): New.
(set_goto_map): Use goto_print.
(build_relations): Show INCLUDES.
(compute_FOLLOWS): Rename as...
(compute_follows): this.
Show FOLLOWS.
2019-02-25 06:19:54 +01:00
Akim Demaille
5230e610fc style: minor changes
* examples/c/calc/calc.y, src/lalr.c: Reduce scope.
* src/gram.c: Prefer < to >.
2019-02-24 19:08:01 +01:00
Akim Demaille
b81419a9fd style: clarify the computation of the lookback edges
* src/lalr.c (build_relations): Reduce the scopes.
Instead of keeping rp alive in two different loops, clarify the second
one by having an index on the path we traverse (i.e., use that index
to compute the source state _and_ the symbol that labels the
transition).
This allows to turn an obscure 'while'-loop in a clearer (IMHO)
'for'-loop.  We also consume more variables (by introducing p instead
of making more side effects on length), but we're in 2019, I don't
think this matters.  What does matter is that (IMHO again), this is
now clearer.
Also, use clearer names.
2019-02-24 19:07:32 +01:00
Akim Demaille
2b9ee006d8 style: scope reduction in tables.c
* src/tables.c: here.
* src/lalr.c: Prefer < to >.
2019-02-24 12:00:44 +01:00
Akim Demaille
bd55d43333 graph: prefer *.gv to *.dot
Reported by Hans Åberg.
https://lists.gnu.org/archive/html/help-bison/2019-02/msg00064.html

* src/files.c (spec_graph_file): Use `*.gv` when 3.4 or better,
otherwise `*.dot`.
* src/parse-gram.y (handle_require): Pretend we are already 3.4.
* doc/bison.texi: Adjust.
* tests/local.at, tests/output.at: Exercise this.
2019-02-21 06:46:07 +01:00
Akim Demaille
d7ec136ffb style: move pkgdatadir to files.*
Let's move it to a more logical place.

* src/output.h, src/output.c (pkgdatadir): Move to...
* src/files.h, src/files.c: here.
2019-02-16 07:26:16 +01:00
Akim Demaille
dbdf2878ab style: rename cleanup_caret as caret_free
* src/location.c, src/location.h, src/main.c: here.
2019-02-14 18:53:01 +01:00
Akim Demaille
8654fca058 style: avoid default in switch on enums
* src/assoc.c (assoc_to_string): here.
2019-02-14 06:27:03 +01:00
Akim Demaille
fb83319d9c style: comment and names changes in map_goto
* src/lalr.h, src/lalr.c: Use clearer names.
2019-02-12 06:19:10 +01:00
Akim Demaille
ad7d8af6d1 style: factor printing of rules
* src/gram.h, src/gram.c (rule_print): New.
Use it.
2019-02-09 08:59:55 +01:00
Akim Demaille
f293345aa8 style: use lower case for variable names
* src/relation.c (INDEX, VERTICES): Rename as...
(indexes, vertices): these.
2019-02-09 08:58:12 +01:00
Akim Demaille
e18ad5a96b style: scope reduction in relation.c 2019-02-09 08:58:12 +01:00
Akim Demaille
dd232b95b7 report: stop counting uselessly
* src/print.c (print_nonterminal_symbols): Replace left_count and
right_count with on_left and on_right.
2019-02-09 08:23:50 +01:00
Akim Demaille
51861998c7 report: clean up its format
The format is inconsistent.  For instance most sections are
indented (including "Terminals unused in grammar" for instance), but
the sections "Terminals, with rules where they appear" and
"Nonterminals, with rules where they appear" are not.  Let's indent
them.  Also, these two sections try to wrap the output to avoid lines
too long.  Yet we don't do that in the rest of the file, for instance
when listing the lookaheads of an item.

For instance in the case of Bison's parse-gram.output we go from:

    Terminals, with rules where they appear

    "end of file" (0) 0
    error (256) 28 88
    "string" <char*> (258) 9 13 16 17 20 23 24 109 116
    [...]

    Nonterminals, with rules where they appear

    $accept (58)
        on left: 0
    input (59)
        on left: 1, on right: 0
    prologue_declarations (60)
        on left: 2 3, on right: 1 3
    prologue_declaration (61)
        on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24
        25 26 27 28 29, on right: 3
    [...]

to

    Terminals, with rules where they appear

    "end of file" (0) 0
    error (256) 28 88
    "string" <char*> (258) 9 13 16 17 20 23 24 109 116
    [...]

    Nonterminals, with rules where they appear

        $accept (58)
            on left: 0
        input (59)
            on left: 1
            on right: 0
        prologue_declarations (60)
            on left: 2 3
            on right: 1 3
        prologue_declaration (61)
            on left: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25 26 27 28 29
            on right: 3
    [...]

* src/print.c (END_TEST): Remove.
(print_terminal_symbols): Don't try to wrap the output.
(print_nonterminal_symbols): Likewise.
Make two different lines for occurrences on the left, and occurrence
on the rhs of the rules.
Indent by 4 and 8, not 3.
* src/reduce.c (reduce_output): Indent by 4, not 3.

* tests/conflicts.at, tests/existing.at, tests/reduce.at,
* tests/regression.at, tests/report.at:
Adjust.
2019-02-09 08:23:50 +01:00
Akim Demaille
e346210c03 add LR(0) output
This should not be used to generate parsers.  My point is actually to
facilitate debugging (when tweaking the generation of the LR(0)
automaton for instance, not carying -yet- about lookaheads).

* src/reader.c (prepare_percent_define_front_end_variables): Add lr(0).
* src/conflicts.c (set_conflicts): Be robust to reds not having
lookaheads at all.
* src/ielr.c (LrType, lr_type_get): Adjust.
(ielr): Implement support for LR(0).
* src/lalr.c (lalr_free): Don't free LA when it's not computed.
2019-02-05 19:02:09 +01:00
Akim Demaille
0d44f83fcc style: scope reduction in derives.c
* src/derives.c: here.
2019-02-05 08:45:52 +01:00
Akim Demaille
40b5f89ee0 style: comment changes and refactoring in state.c
* src/state.h, src/state.c: Comment changes.
(transitions_to): Take a state* as argument.
* src/lalr.h, src/lalr.c: Comment changes.
(initialize_F): Use clear variable names.
2019-02-05 08:45:52 +01:00
Akim Demaille
cf96d1b0af Merge branch maint
* maint:
  maint: post-release administrivia
  version 3.3.2
  style: minor fixes
  NEWS: named constructors are preferable to symbol_type ctors
  gram: fix handling of nterms in actions when some are unused
  style: rename local variable
  CI: update the ICC serial number for travis-ci.org
2019-02-03 15:23:54 +01:00
Akim Demaille
334cb8f222 style: minor fixes
* NEWS, src/reduce.c, src/reduce.h: Use 'nonterminal'.
Fix comments.
2019-02-03 14:42:22 +01:00
Akim Demaille
cacdfc2f6e gram: fix handling of nterms in actions when some are unused
Since Bison 3.3, semantic values in rule actions (i.e., '$...') are
passed to the m4 backend as the symbol number.  Unfortunately, when
there are unused symbols, the symbols are renumbered _after_ the
numbers were used in the rule actions.  As a result, the evaluation of
the skeleton failed because it used non existing symbol numbers.
Which is the happy scenario: we could use numbers of other existing
symbols...

Reported by Balázs Scheidler.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html

Translating the rule actions after the symbol renumbering moves too
many parts in bison.  Relying on the symbol identifiers is more
troublesome than it might first seem: some don't have an
identifier (tokens with only a literal string), some might have a
complex one (tokens with a literal string with characters special for
M4).  Well, these are tokens, but nterms also have issues: "dummy"
nterms (for midrule actions) are named $@32 etc. which is risky for
M4.

Instead, let's simply give M4 the mapping between the old numbers and
the new ones.  To avoid confusion between old and new numbers, always
emit pre-renumbering numbers as "orig NUM".

* data/README: Give details about "orig NUM".
* data/skeletons/bison.m4 (__b4_symbol, _b4_symbol): Resolve the
"orig NUM".
* src/output.c (prepare_symbol_definitions): Pass nterm_map to m4.
* src/reduce.h, src/reduce.c (nterm_map): Extract it from
nonterminals_reduce, to make it public.
(reduce_free): Free it.
* src/scan-code.l (handle_action_dollar): When referring to a nterm,
use "orig NUM".
* tests/reduce.at (Useless Parts): New, based Balázs Scheidler's
report.
2019-02-03 10:05:53 +01:00
Akim Demaille
48429252c1 style: reduce scopes
* src/symlist.c (symbol_list_free): New.
2019-02-03 07:28:57 +01:00
Akim Demaille
d459a5b8e6 style: prefer snprintf to sprintf
* src/symtab.c (dummy_symbol_get): There's no need for the buffer to
be so big and static.
Use snprintf for safety.
2019-02-03 07:28:57 +01:00
Akim Demaille
9566232422 style: comment and name changes
* src/output.c (prepare_symbol_names): here.
* src/reader.c: Remove obsolete comment.
* src/scan-code.l: Use || for Boolean or.
2019-02-02 17:32:10 +01:00
Akim Demaille
dc654a925c style: comment changes
* src/reader.c, src/scan-code.l: here.
2019-02-02 17:32:04 +01:00
Akim Demaille
31788ed4c7 style: rename local variable
* src/reduce.c (nonterminals_reduce): Rename nontermmap as nterm_map.
We will expose it.
2019-02-02 16:37:25 +01:00
Akim Demaille
781d2b02de gram: detect and report (in debug traces) useless chain rules
A rule is a useless chain iff it's a chain (aka unit, or injection)
rule (i.e., the RHS has length 1), and it's useless (it has no used
defined semantic action).

* src/gram.h, src/gram.c (rule_useless_chain_p): New.
(grammar_dump): Report useless chain rules.
* tests/sets.at: Check the traces.
2019-01-30 07:08:09 +01:00
Akim Demaille
8b5fc2143f lr(0): more debug traces
* src/lr0.c (core_print, kernel_print): New.
Use them.
2019-01-30 07:08:09 +01:00
Akim Demaille
5670677cb6 lr(0): remove useless conditional
* src/lr0.c (new_itemsets): There's no harm in setting a Boolean
several times.
2019-01-30 07:08:08 +01:00
Akim Demaille
32b9dcecc7 style: sort includes and avoid assignments
* src/symtab.c: Sort includes.
* src/gram.c (grammar_rules_print_xml): Avoid assignments to define
'usefulness'.
2019-01-30 07:08:00 +01:00
Akim Demaille
ac12b725ea style: use item_rule
* src/print-graph.c, src/print-xml.c: here.
2019-01-30 07:06:48 +01:00
Akim Demaille
e1783bc686 gram: factor the printing of items and the computation of their rule
There are several places where we need to recover the rule from an
item, let's factor that into item_rule.  We also want to print items
in a nice way: we do it when generating the *output file, but it is
also useful in debug messages.

* src/gram.h, src/gram.c (item_rule, item_print): New.
* src/print.c (print_core): Use them.
* src/state.h, src/state.c: Propagate constness.
2019-01-30 07:06:48 +01:00
Akim Demaille
c4f143eb96 style: scope reduction in print-xml
* src/print-xml.c: here.
2019-01-30 07:06:48 +01:00
Akim Demaille
3075d96d44 style: comment changes
* src/lr0.c, src/state.c, src/state.h: here.
2019-01-28 07:00:23 +01:00
Akim Demaille
7fec997ecf closure: initialize it once for all
The memory allocated by 'closure' (and some data such as 'fderives')
is used to computed a state's full itemset from its core.  This is
needed during the construction of the LR(0) automaton, and the memory
is reclaimed immediately afterwards.

Unfortunately the reports (graph, text, xml) also need this
information when describing the states with their full itemsets.  As a
consequence the memory was allocated again, fderives computed again
too, and more --trace reports are generated which only duplicate what
was already reported.

Stop that.  It does mean that we release the memory later (hence the
peak memory usage is higher now), but I don't think that's a problem
today.

* src/lr0.c (generate_states): Don't call closure_free.
* src/state.c (states_free): Do it here.
(for symmetry with closure_new which is called in generate_states).
* src/print-graph.c, src/print-xml.c, src/print.c: You can now expect
the closure module to be functional.
2019-01-28 06:57:31 +01:00
Akim Demaille
7355a35e4b style: rename closure_* functions as closure_*
This is more consistent with the other files.

* closure.h, closure.c (new_closure, free_closure): Rename as...
(closure_new, closure_free): this.
Adjust dependencies.
2019-01-28 06:47:33 +01:00
Akim Demaille
e585377e68 lr0: use a bitset for the set of "shiftable symbols"
This will make it easier to add new elements (that might already be
part of shift_symbol) without having to worry about the size of
shift_symbol (which is currently a fixed size vector).

I could not measure any significant differences in performances in the
generation of LR(0) automaton (benched on gramamrs of Ruby, C, and C++).

* src/lr0.c (shift_symbol): Make it a bitset.
2019-01-28 06:47:33 +01:00
Akim Demaille
9cd7bd4d5f add -fsyntax-only
When debugging Bison itself, this is very handy, especially when
tweaking the frontend badly enough to break the backends. It can also
be used to check a grammar.

* src/getargs.h, src/getargs.c (feature_syntax_only): New.
(feature_args, feature_types): Adjust.
* src/main.c (main): Use it.
2019-01-28 06:47:07 +01:00
Akim Demaille
a108d84f88 style: beware of collisions on status
* src/symtab.h (status): Rename as...
(declaration_status): this, to avoid colliding with status, the
argument of 'usage'.
'status' seems a tad too general to be used only here.
2019-01-27 20:07:08 +01:00