bison

mirror of https://git.savannah.gnu.org/git/bison.git synced 2026-07-25 14:30:32 +00:00

Author	SHA1	Message	Date
Akim Demaille	dbe499e936	regen	2018-12-16 12:27:28 +01:00
Akim Demaille	1d5956f87f	symbols: clean up their parsing Prompted by Rici Lake. http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html We have four classes of directives that declare symbols: %nterm, %type, %token, and the family of %left etc. Currently not all of them support the possibility to have several type tags (`<type>`), and not all of them support the fact of not having any type tag at all (%type). Let's unify this. - %type POSIX Yacc specifies that %type is for nonterminals only. However, some Bison users want to use it for both tokens and nterms (actually, Bison's own grammar does this in several places, e.g., CHAR). So it should accept char/string literals. As a consequence cannot be used to declare tokens with their alias: `%type foo "foo"` would be ambiguous (are we defining foo = "foo", or are these two different symbols?) POSIX specifies that it is OK to use %type without a type tag. I'm not sure what it means, but we support it. - %token Accept token declarations with number and string literal: (ID\|CHAR) NUM? STRING?. - %left, etc. They cannot be the same as %token, because we accept to declare the symbol with %token, and to then qualify its precedence with %left. Then `%left foo "foo"` would also be ambiguous: foo="foo", or two symbols. They cannot be simply a list of identifiers, but POSIX Yacc says we can declare token numbers here. I personally think this is a bad idea, precedence management is tricky in itself and should not be cluttered with token declaration issues. We used to accept declaring a token number on a string literal here (e.g., `%left "token" 1`). This is abnormal. Either the feature is useful, and then it should be supported in %token, or it's useless and we should not support it in corner cases. - %nterm Obviously cannot accept tokens, nor char/string literals. Does not exist in POSIX Yacc, but since %type also works for terminals, it is a nice option to have. * src/parse-gram.y: Avoid relying on side effects. 
For instance, get rid of current_type, rather, build the list of symbols and iterate over it to assign the type. It's not always possible/convenient. For instance, we still use current_class. Prefer "decl" to "def", since in the rest of the implementation we actually "declare" symbols, we don't "define" them. (token_decls, token_decls_for_prec, symbol_decls, nterm_decls): New. Use them for %token, %left, %type and %nterm. * src/symlist.h, src/symlist.c (symbol_list_type_set): New. * tests/regression.at b/tests/regression.at (Token number in precedence declaration): We no longer accept to give a number to string literals.	2018-12-16 12:27:28 +01:00
Akim Demaille	fdceb6330f	symbols: set tag_seen when assigning a type to symbols * src/reader.h, src/reader.c (tag_seen): Move to... * src/symtab.h, src/symtab.c: here. (symbol_type_set): Set it to true. * src/parse-gram.y: Don't.	2018-12-15 17:41:25 +01:00
Akim Demaille	465a47d46b	parser: warn about string literals in Yacc mode * src/scan-gram.l (scan_integer): Warn. * tests/input.at (Yacc warnings on symbols): Check.	2018-12-14 05:10:31 +01:00
Akim Demaille	953a95695a	parser: warn about hexadecimal token numbers in Yacc mode * src/scan-gram.l (scan_integer): Warn. * tests/input.at (Yacc warnings on symbols): Check.	2018-12-14 05:10:31 +01:00
Akim Demaille	aadf6c0bf3	parser: reprecate %nterm back After having spent quite some time on cleaning the handling of symbol declarations in the grammar files, I believe we should keep it. It looks like it's a duplicate of %type, but it is not. While POSIX Yacc requires %type to apply only to nonterminal symbols, it appears that both byacc and bison accept it for tokens too. And some experienced users do actually expect this feature to group symbols (terminal or not) by type ("On the other hand, it is generally more useful IMHO to group terminals and non-terminals with the same type tag together", http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html). Even Bison's own parser does this today (see CHAR). Basically reverts `7928c3e6fb`. * src/scan-gram.l (%nterm): Dedeprecate, but issue a Wyacc warning. * tests/input.at: Adjust expectations. (Yacc warnings on symbols): New. * src/symtab.c (symbol_class_set): Fix error introduced in `20b0746793`.	2018-12-14 05:10:18 +01:00
Akim Demaille	d68f05d75c	style: s/non-terminal/nonterminal/ I personally prefer 'non terminal', or 'non-terminal', but 'nonterminal' is the common spelling. * data/glr.c, src/parse-gram.y, src/symtab.c, src/symtab.h, * tests/input.at, doc/refcard.tex: here.	2018-12-11 06:55:41 +01:00
Akim Demaille	b05aa7be2e	style: rename error functions for clarity * src/symtab.c (symbol_redeclaration, semantic_type_redeclaration) (user_token_number_redeclaration): Rename as... (complain_symbol_redeclared, complain_semantic_type_redeclared) (complain_user_token_number_redeclared): this.	2018-12-11 06:55:35 +01:00
Akim Demaille	20b0746793	parser: improve the error message for symbol class redefinition Currently our error messages include both "symbol redeclared" and "symbol redefined", and they mean something different. This is obscure, let's make this clearer. I think the idea between 'definition' vs. 'declaration' is that in the case of the nonterminals, the actual definition is its set of rules, so %nterm would be about declaration. The case of %token is less clear. * src/symtab.c (complain_class_redefined): New. (symbol_class_set): Use it. Simplify the logic of this function to clearly skip its body when the preconditions are not met. * tests/input.at (Symbol class redefinition): New.	2018-12-11 06:53:25 +01:00
Akim Demaille	4cbdcaa572	regen	2018-12-09 13:55:05 +01:00
Akim Demaille	1e6a68858a	regen	2018-12-09 12:50:53 +01:00
Akim Demaille	17730b0287	parser: minor refactoring * src/parse-gram.y (symbol.prec): Reuse int.opt.	2018-12-09 12:50:53 +01:00
Akim Demaille	157f12c483	parser: move checks inside the called functions Revamping the handling of the symbols in the grammar is much more delicate than I anticipated. Let's first move things around for clarity. * src/symtab.c (symbol_make_alias): Don't accept to alias non-terminals. (symbol_user_token_number_set): Don't accept user token numbers for non-terminals. Don't do anything in case of redefinition, instead of trying to update. The flow is easier to follow this way.	2018-12-09 12:50:53 +01:00
Akim Demaille	401afe5cc2	parser: fix incorrect condition to raise a syntax error * src/parse-gram.y (symbol_def): Fix test.	2018-12-06 17:50:54 +01:00
Akim Demaille	156140dfc3	style: scope reduction in ielr.c * src/ielr.c: here.	2018-12-05 07:12:12 +01:00
Akim Demaille	4176584062	style: scope reduction in lalr.c * src/lalr.c: here.	2018-12-05 06:49:06 +01:00
Akim Demaille	22b2c286ff	d: add experimental support for the D language * configure.ac (ENABLE_D): New. * src/getargs.c (valid_languages): Add d.	2018-12-04 20:29:33 +01:00
Akim Demaille	f539a56620	regen	2018-12-03 18:42:00 +01:00
Akim Demaille	c44a782a4e	backend: revamp the handling of symbol types Currently it is the front end that passes the symbol types to the backend. For instance: %token <ival> NUM %type <ival> exp1 exp2 exp1: NUM { $$ = $1; } exp2: NUM { $<ival>$ = $<ival>1; } In both cases, $$ and $1 are passed to the backend as having type 'ival' resulting in code like `val.ival`. This is troublesome in the case of api.value.type=union, since in that case the code is this: %define api.value.type union %token <int> NUM %type <int> exp1 exp2 exp1: NUM { $$ = $1; } exp2: NUM { $<int>$ = $<int>1; } because in this case, since the backend does not know the symbol being processed, it is forced to generate casts in both cases: `*(int*)(&val)`. This is unfortunate in the first case (exp1) where there is no reason at all to use a cast instead of `val.NUM` and `val.exp1`. So instead delegate the computation of the actual value type to the backend: pass $<ival>$ as `symbol-number, ival` and $$ as `symbol-number, NULL`, instead of passing `ival` before. * src/scan-code.l (handle_action_dollar): Find the symbol the action is about, not just its type. Pass both symbol-number, and explicit type tag ($<tag>n when there is one) to b4_lhs_value and b4_rhs_value. * data/bison.m4 (b4_symbol_action): adjust to the new signature to b4_dollar_pushdef. * data/c-like.m4 (_b4_dollar_dollar, b4_dollar_pushdef): Accept the symbol-number as new argument. * data/c.m4 (b4_symbol_value): Accept the symbol-number as new argument, and use it. (b4_symbol_value_union): Accept the symbol-number as new argument, and use it to prefer reading a union member rather than casting the union. * data/yacc.c (b4_lhs_value, b4_rhs_value): Accept the new symbol-number argument. Adjust uses of b4_dollar_pushdef. * data/glr.c (b4_lhs_value, b4_rhs_value): Adjust. * data/lalr1.cc (b4_symbol_value_template, b4_lhs_value): Adjust to the new symbol-number argument. 
* data/variant.hh (b4_symbol_value, b4_symbol_value_template): Accept the new symbol-number argument. * data/java.m4 (b4_symbol_value, b4_rhs_data): New. (b4_rhs_value): Use them. * data/lalr1.java: Adjust to b4_dollar_pushdef, and use b4_rhs_data.	2018-12-03 18:40:26 +01:00
Akim Demaille	e40db8976c	style: comment and formatting changes * data/bison.m4, data/c++.m4, data/glr.c, data/java.m4, data/lalr1.cc, * data/yacc.c, src/scan-code.l: Fix comments. Prefer POS to denote the position of a symbol in a rule, since NUM is also used to denote symbol numbers.	2018-12-03 08:42:26 +01:00
Akim Demaille	3422ee7435	style: unsigned int -> unsigned See https://lists.gnu.org/archive/html/bison-patches/2018-08/msg00027.html * src/output.c (muscle_insert_unsigned_int_table): Rename as... (muscle_insert_unsigned_table): this.	2018-12-01 11:13:08 +01:00
Akim Demaille	e1094c4c09	output: restore yyrhs and yyprhs This was demanded several times. See for instance: - David M. Warme https://lists.gnu.org/archive/html/help-bison/2011-04/msg00003.html - box12009 http://lists.gnu.org/archive/html/bug-bison/2016-10/msg00001.html Basically, this reverts: - commit `3d3bc1fe30` Get rid of (yy)rhs and (yy)prhs - commit `d333175f63` Avoid compiler warning. Note that since these tables are not needed in the generated parsers, no skeleton requests them. This change only brings back their definition to M4, making it possible to user-defined skeletons to use these tables. * src/output.c (muscle_insert_item_number_table): Define. (prepare_rules): Generate the rhs and prhs tables.	2018-12-01 11:12:59 +01:00
Akim Demaille	060da943bd	regen	2018-11-30 06:10:21 +01:00
Akim Demaille	b7577ea6f6	parser: shorten side-effects on current_type * src/parse-gram.y (tag.opt): Don't change current_type. Rather, return its value. Adjust dependencies.	2018-11-30 06:07:56 +01:00
Akim Demaille	6220e96e76	style: reduce scopes * src/symlist.c: here.	2018-11-30 06:04:03 +01:00
Akim Demaille	b1d6c42ae5	regen	2018-11-29 06:16:20 +01:00
Akim Demaille	8e092082cb	parser: factor the symbol definition * src/parse-gram.y (int.opt, string_as_id.opt): New. (symbol_def): Use it.	2018-11-29 06:16:20 +01:00
Akim Demaille	2c5e933672	parser: improve location of string alias errors * src/parse-gram.y (symbol_def): Pass the right location for symbol_make_alias. * tests/regression.at (Duplicate string): Move to... * tests/input.at: here. (Token collisions): New.	2018-11-29 06:16:20 +01:00
Akim Demaille and Akim Demaille	d92ed9d9f7	diagnostics: complain about Bison directives when -Wyacc * src/complain.h, src/complain.c (bison_directive): New. * src/scan-gram.l (BISON_DIRECTIVE): New. Use it for Bison extensions.	2018-11-29 06:16:20 +01:00
Akim Demaille	0e9eade009	regen	2018-11-27 08:32:49 +01:00
Akim Demaille	9686b585e7	%nterm: do not accept character literals Reported by Rici Lake. http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html * src/complain.h: Formatting change. * src/parse-gram.y (id): Reject character literals used in a context for non-terminals. * tests/input.at (Invalid %nterm uses): Check that.	2018-11-27 08:25:38 +01:00
Akim Demaille	4bddd33439	%nterm: do not accept numbers nor string alias Reported by Rici Lake. http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html * src/parse-gram.y (symbol_def): Refuse string aliases and numbers for non-terminals. (prologue_declaration): Recover from errors ended with ';'. * tests/input.at (Invalid %nterm uses): New.	2018-11-27 08:25:38 +01:00
Akim Demaille	bcecfbafab	gnulib: update to use its bitsets Bison's bitset were moved to gnulib. * lib/abitset.c, lib/abitset.h, lib/bbitset.h, lib/bitset.c, * lib/bitset.h, lib/ebitset.c, lib/ebitset.h, lib/lbitset.c, * lib/bitset_stats.c, lib/bitset_stats.h, lib/bitsetv-print.c, * lib/bitsetv-print.h, lib/bitsetv.c, lib/bitsetv.h, * lib/lbitset.h, lib/vbitset.c, lib/vbitset.h: Remove. * gnulib: Update. * bootstrap.conf, lib/local.mk: Adjust.	2018-11-26 06:33:45 +01:00
Akim Demaille	9ffed56cd9	regen	2018-11-25 11:27:08 +01:00
Akim Demaille	7ded5bb764	%expect-rr: tune the number of conflicts per rule Currently on a grammar such as exp : a '1' \| a '2' \| a '3' \| b '1' \| b '2' \| b '3' a: b: we count only one rr-conflict on the `b:` rule, i.e., we expect: b: %expect-rr 1 although there are 3 conflicts in total. That's because in the conflicted state we count only a single conflict, not three (one for each of the lookaheads: '1', '2', '3'). State 0 0 $accept: . exp $end 1 exp: . a '1' 2 \| . a '2' 3 \| . a '3' 4 \| . b '1' 5 \| . b '2' 6 \| . b '3' 7 a: . %empty ['1', '2', '3'] 8 b: . %empty ['1', '2', '3'] '1' reduce using rule 7 (a) '1' [reduce using rule 8 (b)] '2' reduce using rule 7 (a) '2' [reduce using rule 8 (b)] '3' reduce using rule 7 (a) '3' [reduce using rule 8 (b)] $default reduce using rule 7 (a) exp go to state 1 a go to state 2 b go to state 3 See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html. * src/conflicts.c (rule_has_state_rr_conflicts): Rename as... (count_rule_state_sr_conflicts): this. DWIM. (count_rule_rr_conflicts): Adjust. * tests/conflicts.at (%expect-rr in grammar rules) (%expect-rr too much in grammar rules) (%expect-rr not enough in grammar rules): New.	2018-11-22 08:34:10 +01:00
Akim Demaille	ad0b4661d1	%expect-rr: fix the computation of the overall number of conflicts On a grammar such as exp: "num" \| "num" \| "num" we currently report only one RR conflict, instead of two. This bug is present since the origins of Bison commit `08089d5d35` Author: David MacKenzie <[email protected]> Date: Tue Apr 20 05:42:52 1993 +0000 Initial revision and was preserved in commit `676385e29c` Author: Paul Hilfinger <[email protected]> Date: Fri Jun 28 02:26:44 2002 +0000 Initial check-in introducing experimental GLR parsing. See entry in ChangeLog dated 2002-06-27 from Paul Hilfinger for details. See https://lists.gnu.org/archive/html/bison-patches/2018-11/msg00011.html * src/conflicts.h, src/conflicts.c (count_state_rr_conflicts) (count_rr_conflicts): Use only the correct count of conflicts. * tests/glr-regression.at: Fix expectations.	2018-11-22 08:34:07 +01:00
Akim Demaille	e51fd547ca	%expect: tune the number of conflicts per rule Currently on a grammar such as exp: "number" \| exp "+" exp \| exp "*" exp we count only one sr-conflict for both binary rules, i.e., we expect: exp: "number" \| exp "+" exp %expect 1 \| exp "*" exp %expect 1 although there are 4 conflicts in total. That's because in the states in conflict, for instance that for the "+" rule: State 6 2 exp: exp . "+" exp 2 \| exp "+" exp . [$end, "+", "*"] 3 \| exp . "*" exp "+" shift, and go to state 4 "*" shift, and go to state 5 "+" [reduce using rule 2 (exp)] "*" [reduce using rule 2 (exp)] $default reduce using rule 2 (exp) we count only a single conflict, although there are two (one on "+" and another with "*"). See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00106.html. * src/conflicts.c (rule_has_state_sr_conflicts): Rename as... (count_rule_state_sr_conflicts): this. DWIM. (count_rule_sr_conflicts): Adjust. * tests/conflicts.at (%expect in grammar rules): New.	2018-11-21 22:10:35 +01:00
Akim Demaille	4ebebcc438	regen	2018-11-21 22:10:35 +01:00
Akim Demaille and Akim Demaille	2b2556b41c	style: reduce scopes * src/conflicts.c, src/reader.c: Minor style changes.	2018-11-21 22:08:47 +01:00
Paul Hilfinger and Akim Demaille	b34b12c4f9	allow %expect and %expect-rr modifiers on individual rules This change allows one to document (and check) which rules participate in shift/reduce and reduce/reduce conflicts. This is particularly important for GLR parsers, where conflicts are a normal occurrence. For example, %glr-parser %expect 1 %% ... argument_list: arguments %expect 1 \| arguments ',' \| %empty ; arguments: expression \| argument_list ',' expression ; ... Looking at the output from -v, one can see that the shift-reduce conflict here is due to the fact that the parser does not know whether to reduce arguments to argument_list until it sees the token AFTER the following ','. By marking the rule with %expect 1 (because there is a conflict in one state), we document the source of the 1 overall shift-reduce conflict. In GLR parsers, we can use %expect-rr in a rule for reduce/reduce conflicts. In this case, we mark each of the conflicting rules. For example, %glr-parser %expect-rr 1 %% stmt: target_list '=' expr ';' \| expr_list ';' ; target_list: target \| target ',' target_list ; target: ID %expect-rr 1 ; expr_list: expr \| expr ',' expr_list ; expr: ID %expect-rr 1 \| ... ; In a statement such as x, y = 3, 4; the parser must reduce x to a target or an expr, but does not know which until it sees the '='. So we notate the two possible reductions to indicate that each conflicts in one rule. See https://lists.gnu.org/archive/html/bison-patches/2013-02/msg00105.html. * doc/bison.texi (Suppressing Conflict Warnings): Document %expect, %expect-rr in grammar rules. * src/conflicts.c (count_state_rr_conflicts): Adjust comment. (rule_has_state_sr_conflicts): New static function. (count_rule_sr_conflicts): New static function. (rule_has_state_rr_conflicts): New static function. (count_rule_rr_conflicts): New static function. (rule_conflicts_print): New static function. (conflicts_print): Also use rule_conflicts_print to report on individual rules. 
* src/gram.h (struct rule): Add new fields expected_sr_conflicts, expected_rr_conflicts. * src/reader.c (grammar_midrule_action): Transfer expected_sr_conflicts, expected_rr_conflicts to new rule, and turn off in current_rule. (grammar_current_rule_expect_sr): New function. (grammar_current_rule_expect_rr): New function. (packgram): Transfer expected_sr_conflicts, expected_rr_conflicts to new rule. * src/reader.h (grammar_current_rule_expect_sr): New function. (grammar_current_rule_expect_rr): New function. * src/symlist.c (symbol_list_sym_new): Initialize expected_sr_conflicts, expected_rr_conflicts. * src/symlist.h (struct symbol_list): Add new fields expected_sr_conflicts, expected_rr_conflicts. * tests/conflicts.at: Add tests "%expect in grammar rule not enough", "%expect in grammar rule right.", "%expect in grammar rule too much."	2018-11-21 22:08:47 +01:00
Akim Demaille	ebb92c0545	regen	2018-11-20 20:04:06 +01:00
Akim Demaille	e0de1020ea	style: avoid lengthy actions We also lack a consistent naming for directive implementations. `directive_skeleton` is too long, `percent_skeleton` is not very nice looking, `process_skeleton` looks ambiguous, `do_skeleton` is somewhat ambiguous too, but seems a better track. * src/parse-gram.y (version_check): Rename as... (do_require): this. (do_skeleton): New. Use it.	2018-11-20 20:03:01 +01:00
Akim Demaille	a52723e3e8	style: formatting changes * src/scan-gram.l: here.	2018-11-13 07:46:08 +01:00
Akim Demaille	4810ed8107	regen	2018-11-12 07:41:46 +01:00
Akim Demaille	35b8e0e947	parser: deprecate %error-verbose It is unfortunate that %error_verbose was properly diagnosed as obsoleted by "%define parse.error verbose", but %error-verbose was not. * src/parse-gram.y (%error-verbose): Remove support. * src/scan-gram.l: Do it here instead, with a warning. * tests/input.at (Deprecated directives): Check it.	2018-11-12 07:41:46 +01:00
Akim Demaille	7928c3e6fb	parser: deprecate %nterm It has several weaknesses. Reported by Rici Lake. http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html * src/scan-gram.l: here.	2018-11-12 07:28:20 +01:00
Akim Demaille	3d601616da	regen	2018-11-10 17:03:36 +01:00
Akim Demaille	bda2bed459	reader: no longer accept %define variable names in quotes It was never documented. * src/parse-gram.y (variable): Here.	2018-11-10 17:02:50 +01:00
Akim Demaille	3ae81aa338	dogfooding: use api.value.type union * src/parse-gram.y (api.value.type): Set to union. Replace occurrences of %union with explicit %types. * src/scan-gram.l: Adjust yylval's field names. (RETURN_VALUE): No longer needs the Field argument. Use it more.	2018-11-10 17:02:50 +01:00
Akim Demaille	eee37354b5	scanner: simplify use of gettext * src/scan-gram.l (unexpected_end): Leave the actual call to gettext to the caller.	2018-11-10 17:02:50 +01:00
