Reported by Askar Safin.
https://lists.gnu.org/archive/html/bison-patches/2019-01/msg00000.html
* data/skeletons/glr.c (yygetToken): Return YYEMPTY when an exception
is thrown.
* data/skeletons/lalr1.cc: Log when an exception is caught.
* tests/c++.at (Syntax error as exception): Be sure to recover from
error before triggering another error.
* tests/local.at (AT_LOCATION_TYPE_IF): Turn into...
(AT_LOCATION_TYPE_SPAN_IF): this.
Adjust dependencies.
* tests/headers.at (Several parsers): Add another C++ parser,
which uses the first C++ parser's locations.
Suggested by Wolfgang Thaller.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00081.html
* data/c++.m4 (basic_symbol, by_type): Instead of provide either move
or copy constructor, always provide the copy one.
* tests/c++.at (C++ Variant-based Symbols Unit Tests): Check it.
Currently the following piece of code crashes (with parse.assert),
because we don't record that s was moved-from, and we invoke its dtor.
{
auto s = parser::make_INT (42);
auto s2 = std::move (s);
}
Reported by Wolfgang Thaller.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00077.html
* data/c++.m4 (by_type): Provide a move-ctor.
(basic_symbol): Be sure not to read a moved-from value.
* tests/c++.at (C++ Variant-based Symbols Unit Tests): Check this case.
* tests/c++.at: Use AT_YYLEX_PROTOTYPE etc.
Which requires that we use the same argument names (lvalp, etc.).
* tests/local.at (AT_NAME_PREFIX): Fix regex.
Instead of introducing make_symbol (whose name, btw, somewhat
infringes on the user's "name space", if she defines a token named
"symbol"), let's make the construction of symbol_type safer, using
assertions.
For instance with:
%token ':' <std::string> ID <int> INT;
generate:
symbol_type (int token, const std::string&);
symbol_type (int token, const int&);
symbol_type (int token);
It does mean that now named token constructors (make_ID, make_INT,
etc.) go through a useless assert, but I think we can ignore this: I
assume any decent compiler will inline the symbol_type ctor inside the
make_TOKEN functions, which will show that the assert is trivially
verified, hence I expect no code will be emitted for it. And anyway,
that's an assert, NDEBUG controls it.
* data/c++.m4 (symbol_type): Turn into a subclass of
basic_symbol<by_type>.
Declare symbol constructors when variants are enabled.
* data/variant.hh (_b4_type_constructor_declare)
(_b4_type_constructor_define): Replace with...
(_b4_symbol_constructor_declare, _b4_symbol_constructor_def): these.
Generate symbol_type constructors.
* doc/bison.texi (Complete Symbols): Document.
* tests/types.at: Check.
On
%token <int> FOO BAR
we currently generate make_FOO(int) and make_BAR(int). However, in
order to factor their scanners, some users would also like to have
make_symbol(tok, int), where tok is FOO or BAR. To ensure type
safety, add assertions that do check that value type and token type
match. Bind this assertion to the parse.assert %define variable.
Suggested by Frank Heckenbach.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00034.html
Should also match expectations from Аскар Сафин.
http://lists.gnu.org/archive/html/bug-bison/2018-12/msg00023.html
* data/variant.hh: Use b4_token_visible_if where applicable.
(_b4_type_constructor_declare, _b4_type_constructor_define): New.
Use them.
Bitten by macros, again.
See 680b715518.
* data/variant.hh (_b4_symbol_constructor_declare)
(_b4_symbol_constructor_define): Do not use user types, which can
include commas as in `std::pair<int, int>`, to macros.
* tests/local.at: Adjust the lex related macros to support the
case of token constructors.
* tests/types.at: Also check token constructors on types with commas.
Prompted by Rici Lake.
http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html
We have four classes of directives that declare symbols: %nterm,
%type, %token, and the family of %left etc. Currently not all of them
support the possibility to have several type tags (`<type>`), and not
all of them support the fact of not having any type tag at all
(%type). Let's unify this.
- %type
POSIX Yacc specifies that %type is for nonterminals only. However,
some Bison users want to use it for both tokens and nterms
(actually, Bison's own grammar does this in several places, e.g.,
CHAR). So it should accept char/string literals.
As a consequence cannot be used to declare tokens with their alias:
`%type foo "foo"` would be ambiguous (are we defining foo = "foo",
or are these two different symbols?)
POSIX specifies that it is OK to use %type without a type tag. I'm
not sure what it means, but we support it.
- %token
Accept token declarations with number and string literal:
(ID|CHAR) NUM? STRING?.
- %left, etc.
They cannot be the same as %token, because we accept to declare the
symbol with %token, and to then qualify its precedence with %left.
Then `%left foo "foo"` would also be ambiguous: foo="foo", or two
symbols.
They cannot be simply a list of identifiers, but POSIX Yacc says we
can declare token numbers here. I personally think this is a bad
idea, precedence management is tricky in itself and should not be
cluttered with token declaration issues.
We used to accept declaring a token number on a string literal here
(e.g., `%left "token" 1`). This is abnormal. Either the feature is
useful, and then it should be supported in %token, or it's useless
and we should not support it in corner cases.
- %nterm
Obviously cannot accept tokens, nor char/string literals. Does not
exist in POSIX Yacc, but since %type also works for terminals, it is
a nice option to have.
* src/parse-gram.y: Avoid relying on side effects. For instance, get
rid of current_type, rather, build the list of symbols and iterate
over it to assign the type.
It's not always possible/convenient. For instance, we still use
current_class.
Prefer "decl" to "def", since in the rest of the implementation we
actually "declare" symbols, we don't "define" them.
(token_decls, token_decls_for_prec, symbol_decls, nterm_decls): New.
Use them for %token, %left, %type and %nterm.
* src/symlist.h, src/symlist.c (symbol_list_type_set): New.
* tests/regression.at b/tests/regression.at
(Token number in precedence declaration): We no longer accept
to give a number to string literals.
After having spent quite some time on cleaning the handling of symbol
declarations in the grammar files, I believe we should keep it.
It looks like it's a duplicate of %type, but it is not. While POSIX
Yacc requires %type to apply only to nonterminal symbols, it appears
that both byacc and bison accept it for tokens too. And some
experienced users do actually expect this feature to group
symbols (terminal or not) by type ("On the other hand, it is generally
more useful IMHO to group terminals and non-terminals with the same
type tag together",
http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html).
Even Bison's own parser does this today (see CHAR).
Basically reverts 7928c3e6fb.
* src/scan-gram.l (%nterm): Dedeprecate, but issue a Wyacc warning.
* tests/input.at: Adjust expectations.
(Yacc warnings on symbols): New.
* src/symtab.c (symbol_class_set): Fix error introduced in
20b0746793.
I personally prefer 'non terminal', or 'non-terminal', but
'nonterminal' is the common spelling.
* data/glr.c, src/parse-gram.y, src/symtab.c, src/symtab.h,
* tests/input.at, doc/refcard.tex: here.
Currently our error messages include both "symbol redeclared" and
"symbol redefined", and they mean something different. This is
obscure, let's make this clearer.
I think the idea between 'definition' vs. 'declaration' is that in the
case of the nonterminals, the actual definition is its set of rules,
so %nterm would be about declaration. The case of %token is less
clear.
* src/symtab.c (complain_class_redefined): New.
(symbol_class_set): Use it.
Simplify the logic of this function to clearly skip its body when the
preconditions are not met.
* tests/input.at (Symbol class redefinition): New.
* examples/java/Calc.y: New, based on test 495: "Calculator
parse.error=verbose %locations".
* examples/java/Calc.test, examples/java/local.mk: New.
* configure.ac (ENABLE_JAVA): New.
* examples/test (prog): Be ready to run Java programs.
Reported by Rici Lake.
http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html
* src/complain.h: Formatting change.
* src/parse-gram.y (id): Reject character literals used in a context
for non-terminals.
* tests/input.at (Invalid %nterm uses): Check that.
Reported by Uxio Prego.
https://lists.gnu.org/archive/html/help-bison/2018-11/msg00031.html
We also need to move the unreachable 'goto' to a reachable place,
otherwise clang complains about the code being unreachable anyway.
See also https://bugs.llvm.org/show_bug.cgi?id=39736.
Interestingly, we don't have to apply that trick to
`#define YYCDEBUG if (false) std::cerr`, clang does not warn when the
code comes from macro expansion.
* configure.ac: Use -Wunreachable-code when supported.
* data/lalr1.cc, data/yacc.c: Pacify clang's warning about `if (0)`
by using a macro.
Another possibility was to move this statement to a reachable place.
* tests/actions.at, tests/c++.at: Avoid generating unreachable code.
Currently on a grammar such as
exp : a '1' | a '2' | a '3' | b '1' | b '2' | b '3'
a:
b:
we count only one rr-conflict on the `b:` rule, i.e., we expect:
b: %expect-rr 1
although there are 3 conflicts in total. That's because in the
conflicted state we count only a single conflict, not three (one for
each of the lookaheads: '1', '2', '3').
State 0
0 $accept: . exp $end
1 exp: . a '1'
2 | . a '2'
3 | . a '3'
4 | . b '1'
5 | . b '2'
6 | . b '3'
7 a: . %empty ['1', '2', '3']
8 b: . %empty ['1', '2', '3']
'1' reduce using rule 7 (a)
'1' [reduce using rule 8 (b)]
'2' reduce using rule 7 (a)
'2' [reduce using rule 8 (b)]
'3' reduce using rule 7 (a)
'3' [reduce using rule 8 (b)]
$default reduce using rule 7 (a)
exp go to state 1
a go to state 2
b go to state 3
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html.
* src/conflicts.c (rule_has_state_rr_conflicts): Rename as...
(count_rule_state_sr_conflicts): this.
DWIM.
(count_rule_rr_conflicts): Adjust.
* tests/conflicts.at (%expect-rr in grammar rules)
(%expect-rr too much in grammar rules)
(%expect-rr not enough in grammar rules): New.
On a grammar such as
exp: "num" | "num" | "num"
we currently report only one RR conflict, instead of two.
This bug is present since the origins of Bison
commit 08089d5d35
Author: David MacKenzie <djm@djmnet.org>
Date: Tue Apr 20 05:42:52 1993 +0000
Initial revision
and was preserved in
commit 676385e29c
Author: Paul Hilfinger <Hilfinger@CS.Berkeley.EDU>
Date: Fri Jun 28 02:26:44 2002 +0000
Initial check-in introducing experimental GLR parsing. See entry in
ChangeLog dated 2002-06-27 from Paul Hilfinger for details.
See
https://lists.gnu.org/archive/html/bison-patches/2018-11/msg00011.html
* src/conflicts.h, src/conflicts.c (count_state_rr_conflicts)
(count_rr_conflicts): Use only the correct count of conflicts.
* tests/glr-regression.at: Fix expectations.
Currently on a grammar such as
exp: "number" | exp "+" exp | exp "*" exp
we count only one sr-conflict for both binary rules, i.e., we expect:
exp: "number" | exp "+" exp %expect 1 | exp "*" exp %expect 1
although there are 4 conflicts in total. That's because in the states
in conflict, for instance that for the "+" rule:
State 6
2 exp: exp . "+" exp
2 | exp "+" exp . [$end, "+", "*"]
3 | exp . "*" exp
"+" shift, and go to state 4
"*" shift, and go to state 5
"+" [reduce using rule 2 (exp)]
"*" [reduce using rule 2 (exp)]
$default reduce using rule 2 (exp)
we count only a single conflict, although there are two (one on "+"
and another with "*").
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html.
* src/conflicts.c (rule_has_state_sr_conflicts): Rename as...
(count_rule_state_sr_conflicts): this.
DWIM.
(count_rule_sr_conflicts): Adjust.
* tests/conflicts.at (%expect in grammar rules): New.
This change allows one to document (and check) which rules participate
in shift/reduce and reduce/reduce conflicts. This is particularly
important GLR parsers, where conflicts are a normal occurrence. For
example,
%glr-parser
%expect 1
%%
...
argument_list:
arguments %expect 1
| arguments ','
| %empty
;
arguments:
expression
| argument_list ',' expression
;
...
Looking at the output from -v, one can see that the shift-reduce
conflict here is due to the fact that the parser does not know whether
to reduce arguments to argument_list until it sees the token AFTER the
following ','. By marking the rule with %expect 1 (because there is a
conflict in one state), we document the source of the 1 overall shift-
reduce conflict.
In GLR parsers, we can use %expect-rr in a rule for reduce/reduce
conflicts. In this case, we mark each of the conflicting rules. For
example,
%glr-parser
%expect-rr 1
%%
stmt:
target_list '=' expr ';'
| expr_list ';'
;
target_list:
target
| target ',' target_list
;
target:
ID %expect-rr 1
;
expr_list:
expr
| expr ',' expr_list
;
expr:
ID %expect-rr 1
| ...
;
In a statement such as
x, y = 3, 4;
the parser must reduce x to a target or an expr, but does not know
which until it sees the '='. So we notate the two possible reductions
to indicate that each conflicts in one rule.
See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00105.html.
* doc/bison.texi (Suppressing Conflict Warnings): Document %expect,
%expect-rr in grammar rules.
* src/conflicts.c (count_state_rr_conflicts): Adjust comment.
(rule_has_state_sr_conflicts): New static function.
(count_rule_sr_conflicts): New static function.
(rule_nast_state_rr_conflicts): New static function.
(count_rule_rr_conflicts): New static function.
(rule_conflicts_print): New static function.
(conflicts_print): Also use rule_conflicts_print to report on individual
rules.
* src/gram.h (struct rule): Add new fields expected_sr_conflicts,
expected_rr_conflicts.
* src/reader.c (grammar_midrule_action): Transfer expected_sr_conflicts,
expected_rr_conflicts to new rule, and turn off in current_rule.
(grammar_current_rule_expect_sr): New function.
(grammar_current_rule_expect_rr): New function.
(packgram): Transfer expected_sr_conflicts, expected_rr_conflicts
to new rule.
* src/reader.h (grammar_current_rule_expect_sr): New function.
(grammar_current_rule_expect_rr): New function.
* src/symlist.c (symbol_list_sym_new): Initialize expected_sr_conflicts,
expected_rr_conflicts.
* src/symlist.h (struct symbol_list): Add new fields expected_sr_conflicts,
expected_rr_conflicts.
* tests/conflicts.at: Add tests "%expect in grammar rule not enough",
"%expect in grammar rule right.", "%expect in grammar rule too much."
We may generate code such as
basic_symbol (typename Base::kind_type t, YY_RVREF (std::pair<int,int>) v);
which, of course, breaks, because YY_RVREF sees two arguments. Let's
not play tricks with _VA_ARGS__, I'm unsure about it portability.
Anyway, I plan to change more things in this area.
Reported by Sébastien Villemot.
http://lists.gnu.org/archive/html/bug-bison/2018-11/msg00014.html
* data/variant.hh (b4_basic_symbol_constructor_declare)
(b4_basic_symbol_constructor_define): Don't use macro on user types.
* tests/types.at: Check that we support pairs.
* tests/local.at (AT_LANG_FOR_EACH_STD): New.
(AT_REQUIRE_CXX_VERSION): Rename as...
(AT_REQUIRE_CXX_STD): this.
Accept an argument for what to do when the requirement is not met.
* tests/types.at (api.value.type): Check all the C++ stds.
It is unfortunate that %error_verbose was properly diagnosed as
obsoleted by "%define parse.error verbose", but %error-verbose was
not.
* src/parse-gram.y (%error-verbose): Remove support.
* src/scan-gram.l: Do it here instead, with a warning.
* tests/input.at (Deprecated directives): Check it.
These tests are skipped with GCC:
"\"".c:1:5: error: function declaration isn't a prototype [-Werror=strict-prototypes]
int main() { return 0; }
^~~~
* tests/synclines.at: Stop writing C++ in C.
* tests/local.at: Formatting changes.