bison

mirror of https://git.savannah.gnu.org/git/bison.git synced 2026-03-09 12:23:04 +00:00

Author	SHA1	Message	Date
Akim Demaille	95421df67b	tokens: define the "$undefined" token kind * data/skeletons/bison.m4 (b4_symbol_token_kind): Give a definition to $undefined. (b4_token_visible_if): $undefined has an id. * src/output.c (prepare_symbol_definitions): Stop lying: $undefined _is_ a token. * tests/input.at: Adjust.	2020-04-12 13:56:43 +02:00
Akim Demaille	a4ed94bc13	tokens: properly define the "error" token kind There are people out there that do use YYERRCODE (the token kind of the error token). See for instance `3812012bb7/unixODBC-2.3.2/Drivers/nn/yylex.c`. Currently, YYERRCODE is defined by yacc.c in an adhoc way as a #define in the .c file only. It belongs with the other token kinds. YYERRCODE is not a nice name, it does not fit in our naming scheme. YYERROR would be more logical, but it collides with the YYERROR macro. Shall we keep the same name in all the skeletons? Besides, to avoid collisions in C, we need to apply the api prefix: YYERRCODE is actually <PREFIX>ERRCODE. This is not needed in the other languages. * data/skeletons/bison.m4 (b4_symbol_token_kind): New. Map the error token to "YYERRCODE". * data/skeletons/yacc.c (YYERRCODE): Don't define it, it's handled by... * src/output.c (prepare_symbol_definitions): this. * tests/input.at (Redefining the error token): Check it.	2020-04-12 13:56:43 +02:00
Akim Demaille	07726f1178	tokens: style: minor fixes * data/skeletons/bison.m4 (b4_symbol_kind): Dispatch on the UNDEF token number rather than its name. * data/skeletons/c++.m4, data/skeletons/c.m4, data/skeletons/java.m4: Comment changes.	2020-04-12 13:56:43 +02:00
Akim Demaille	007e1b5f0a	symbols: minor fixes * data/skeletons/bison.m4 (b4_symbol_kind): Series of _ are useless, one is enough. * data/skeletons/c.m4 (b4_token_enum): Fix overquoting.	2020-04-10 18:33:02 +02:00
Akim Demaille	bbb9750b3e	skeletons: introduce api.symbol.prefix * data/skeletons/bison.m4 (b4_symbol_prefix): New. (b4_symbol_kind): Use it. * data/skeletons/c++.m4, data/skeletons/c.m4, data/skeletons/d.m4 * data/skeletons/java.m4 (api.symbol.prefix): Provide a default value. * data/skeletons/glr.c, data/skeletons/glr.cc, data/skeletons/lalr1.cc, * data/skeletons/lalr1.d, data/skeletons/lalr1.java, data/skeletons/yacc.c: Adjust: use b4_symbol_prefix instead of YYSYMBOL_.	2020-04-07 08:40:16 +02:00
Akim Demaille	e657f04b62	c: make the symbol kind definition nicer to read From enum yysymbol_kind_t { YYSYMBOL_YYEMPTY = -2, YYSYMBOL_YYEOF = 0, YYSYMBOL_YYERROR = 1, YYSYMBOL_YYUNDEF = 2, to enum yysymbol_kind_t { YYSYMBOL_YYEMPTY = -2, YYSYMBOL_YYEOF = 0, /* "end of file" */ YYSYMBOL_YYERROR = 1, /* error */ YYSYMBOL_YYUNDEF = 2, /* $undefined */ * data/skeletons/bison.m4 (b4_last_symbol): New. (b4_symbol_enum, b4_symbol_enums): Reformat the output. * data/skeletons/c.m4	2020-04-06 18:43:34 +02:00
Akim Demaille	10e61eec6d	c: make the token kind definition nicer to read From enum gram_tokentype { GRAM_EOF = 0, STRING = 3, TSTRING = 4, PERCENT_TOKEN = 5, To enum gram_tokentype { GRAM_EOF = 0, /* "end of file" */ STRING = 3, /* "string" */ TSTRING = 4, /* "translatable string" */ PERCENT_TOKEN = 5, /* "%token" */ * data/skeletons/bison.m4 (b4_last_enum_token): New. * data/skeletons/c.m4 (b4_token_enum, b4_token_enums): Show the corresponding symbol.	2020-04-06 18:43:34 +02:00
Akim Demaille	f0bb82ae9e	skeletons: use consistently "kind" instead of "type" in the code * data/skeletons/bison.m4, data/skeletons/c++.m4, data/skeletons/c.m4, * data/skeletons/glr.cc, data/skeletons/lalr1.cc, * data/skeletons/lalr1.d, data/skeletons/lalr1.java: Refer to the "kind" of a symbol, not its "type", where appropriate.	2020-04-05 19:14:39 +02:00
Akim Demaille	2b7bde9d13	m4: rename b4_symbol_sid as b4_symbol_kind * data/skeletons/bison.m4, data/skeletons/c++.m4, data/skeletons/c.m4, * data/skeletons/d.m4, data/skeletons/java.m4 (b4_symbol_sid): Rename as... (b4_symbol_kind): this. Adjust dependencies. * data/README.md: Document the kind.	2020-04-05 14:56:19 +02:00
Akim Demaille	fd37eb057e	yysymbol_type_t: always assign an enumerator Currently we define enumerators only for symbols that have an identifier. That rules out tokens such as '+', and nonterminals such as foo-bar and foo.bar. As a consequence we are taking chances: the compiler might compile yysymbol_type_t as too small an integral type for some symbol codes. * data/skeletons/bison.m4 (b4_symbol_sid): Forge a unique symbol identifier for symbols that don't have an ID.	2020-04-01 08:31:48 +02:00
Akim Demaille	75a605454d	yacc.c: prefer YYSYMBOL_YYERROR to YYSYMBOL_error * data/skeletons/bison.m4 (b4_symbol_sid): Map "error" to YYSYMBOL_YYERROR. * data/skeletons/yacc.c: Adjust.	2020-04-01 08:31:48 +02:00
Akim Demaille	f3c18c8e80	yacc.c: also define a symbol number for the empty token This is not only cleaner, it also protects us from mixing signed values (YYEMPTY is #defined as -2) with unsigned types (the yysymbol_type_t enum is typically compiled as a small unsigned). For instance GCC 9: input.c: In function 'yyparse': input.c:1107:7: error: conversion to 'unsigned int' from 'int' may change the sign of the result [-Werror=sign-conversion] 1107 | yyn += yytoken; | ^~ input.c:1107:10: error: conversion to 'int' from 'unsigned int' may change the sign of the result [-Werror=sign-conversion] 1107 | yyn += yytoken; | ^~~~~~~ input.c:1108:47: error: comparison of integer expressions of different signedness: 'yytype_int8' {aka 'const signed char'} and 'yysymbol_type_t' {aka 'enum yysymbol_type_t'} [-Werror=sign-compare] 1108 | if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken) | ^~ input.c:702:25: error: operand of ?: changes signedness from 'int' to 'unsigned int' due to unsignedness of other operand [-Werror=sign-compare] 702 | #define YYEMPTY (-2) | ^~~~ input.c:1220:33: note: in expansion of macro 'YYEMPTY' 1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar); | ^~~~~~~ input.c:1220:41: error: unsigned conversion from 'int' to 'unsigned int' changes value from '-2' to '4294967294' [-Werror=sign-conversion] 1220 | yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar); | ^ Eventually, it might be interesting to move away from -2 (which is the only possible negative symbol number) and use the next available number, to save bits. We could actually even simply use "0" and shift the rest, which would allow to write "!yytoken" to mean really "yytoken != YYEMPTY". * data/skeletons/c.m4 (b4_declare_symbol_enum): Define YYSYMBOL_YYEMPTY. * data/skeletons/yacc.c: Use it. * src/parse-gram.y (yyreport_syntax_error): Use YYSYMBOL_YYEMPTY, not YYEMPTY, when dealing with a symbol. * tests/regression.at: Adjust.	2020-04-01 08:31:48 +02:00
Akim Demaille	3ba001baac	yacc.c: introduce an enum that defines the symbol's number There's a number of advantage in exposing the symbol (internal) numbers: - custom error messages can use them to decide how to represent a given symbol, or a set of symbols. - we need something similar in uses of yyexpected_tokens. For instance, currently, bistromathic's completion() reads: int ntokens = expected_tokens (line, tokens, YYNTOKENS); [...] for (int i = 0; i < ntokens; ++i) if (tokens[i] == YYTRANSLATE (TOK_VAR)) [...] else if (tokens[i] == YYTRANSLATE (TOK_FUN)) [...] else [...] - now that it's a compile-time expression, we can easily build static tables, switch, etc. - some users depended on the ability to get the token number from a symbol to write test cases for their scanners. But Bison 3.5 removed the table this feature depended upon (a reverse yytranslate). Now they can check against the actual symbol number, without having pay (space and time) a conversion. See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html. - it helps us clearly separate the internal symbol numbers from the external token numbers, whose difference is sometimes blurred in the code when values coincide (e.g. "yychar = yytoken = YYEOF"). - it allows us to get rid of ugly macros with inconsistent names such as YYUNDEFTOK and YYTERROR, and to group related definitions together. - similarly it provides a clean access to the $accept symbol (which proves convenient in a current experimentation of mine with several %start symbols). Let's declare this type as a private type (in the .c file, not the .h one). So it does not need to be influenced by the api prefix. * data/skeletons/bison.m4 (b4_symbol_sid): New. (b4_symbol): Use it. * data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New. * data/skeletons/yacc.c: Use b4_declare_symbol_enum. (YYUNDEFTOK, YYTERROR): Remove. Use the corresponding symbol enum instead.	2020-04-01 08:31:33 +02:00
Akim Demaille	4140320a0a	style: comment changes about token numbers * data/skeletons/bison.m4, data/skeletons/c.m4: here.	2020-03-30 08:41:12 +02:00
Akim Demaille	77bdcc6f0c	parse.error: document and diagnose the incompatibility with %token-table * doc/bison.texi (Tokens from Literals): Move to code using %token-table to... (Decl Summary: %token-table): here. * data/skeletons/bison.m4: Implement mutual exclusion. * tests/input.at: Check it. * doc/local.mk: Be robust to the removal of doc/.	2020-02-10 20:15:46 +01:00
Akim Demaille	bc74b4b15a	skeletons: avoid b4_error_verbose_if, which is confusing parse.error has more than two possible values. * data/skeletons/bison.m4 (b4_error_verbose_if, b4_error_verbose_flag): Remove. (b4_parse_error_case, b4_parse_error_bmatch): New. Adjust dependencies.	2020-02-10 07:24:38 +01:00
Akim Demaille	8dd8137c38	skeletons: decorelate %token-table from verbose error messages Reported by Adrian Vogelsgesang. * data/skeletons/bison.m4: Here. * data/skeletons/lalr1.cc: Adjust.	2020-02-10 07:24:38 +01:00
Akim Demaille	650b253843	m4: fix b4_token_format We used to emit: /** Token number,to be returned by the scanner. */ static final int NUM = 258; /** Token number,to be returned by the scanner. */ static final int NEG = 259; with no space after the comma. Fix that. * data/skeletons/bison.m4 (b4_token_format): Quote where appropriate.	2020-02-08 11:24:53 +01:00
Akim Demaille	f443673450	yacc.c: add support for parse.error detailed "detailed" error messages are almost like "verbose", except that we don't double escape them, they don't get inner quotes, we don't use yytnamerr, and we hide the table. "custom" is exposed with the "detailed" tokens, not the "verbose" ones: they are not double-quoted. Because there's a risk that some people use yytname even without "verbose", let's keep yytname (instead of yys_name) in "simple" parse.error. * src/output.c (prepare_symbol_names): Be ready to output symbol names unquoted. (prepare_symbol_names): Output both the old tname table, and the new symbol_names one. * data/skeletons/bison.m4: Accept 'detailed'. * data/skeletons/yacc.c: When parse.error is 'detailed', don't emit yytname and yytnamerr, just yysymbol_name with the table inside. * tests/calc.at: Adjust.	2020-01-19 14:51:14 +01:00
Akim Demaille	cda1934606	yacc.c: add custom error message generation When parse.error is custom, let users define a yyreport_syntax_error function, and use it. * data/skeletons/bison.m4 (b4_error_verbose_if): Accept 'custom'. * data/skeletons/yacc.c: Implement it. * examples/c/calc/calc.y: Experiment with it.	2020-01-17 06:49:59 +01:00
Akim Demaille	8036635251	package: bump copyrights to 2020 Run 'make update-copyright'.	2020-01-05 10:26:35 +01:00
Akim Demaille	fc2040a750	c++: fix comments for %code blocks In a project of mine, vcsn, this commit fixes the following comments. --- /tmp/parse.hh 2019-12-08 15:51:24.792934703 +0100 +++ lib/vcsn/rat/parse.hh 2019-12-08 16:00:59.137107503 +0100 @@ -43,7 +43,7 @@ #ifndef YY_YY_USERS_AKIM_SRC_LRDE_2_LIB_VCSN_RAT_PARSE_HH_INCLUDED # define YY_YY_USERS_AKIM_SRC_LRDE_2_LIB_VCSN_RAT_PARSE_HH_INCLUDED -// // "%code requires" blocks. +// "%code requires" blocks. #line 20 "/Users/akim/src/lrde/2/lib/vcsn/rat/parse.yy" #include <iostream> @@ -1851,7 +1851,7 @@ -// // "%code provides" blocks. +// "%code provides" blocks. #line 60 "/Users/akim/src/lrde/2/lib/vcsn/rat/parse.yy" #define YY_DECL_(Class) \ * data/skeletons/bison.m4 (b4_percent_code_get): Pass an expanded string to b4_comment.	2019-12-08 16:03:36 +01:00
Akim Demaille	9e9e49224f	diagnostics: style changes * src/complain.h, src/complain.c: Comment changes. * src/scan-skel.l: Reduce scopes. * data/skeletons/bison.m4: Factor diagnostic functions.	2019-12-02 19:35:01 +01:00
Akim Demaille	9861bcc540	api.token.raw: implement Bison used to feature %raw, documented as follows: @item %raw The output file @file{@var{name}.h} normally defines the tokens with Yacc-compatible token numbers. If this option is specified, the internal Bison numbers are used instead. (Yacc-compatible numbers start at 257 except for single character tokens; Bison assigns token numbers sequentially for all tokens starting at 3.) Unfortunately, as far as I can tell, it never worked: token numbers are indeed changed in the generated tables (from external token number to internal), yet the code was still applying the mapping from external token numbers to internal token numbers. This commit reintroduces the feature as it was expected to be. * data/skeletons/bison.m4 (b4_token_format): When api.token.raw is enabled, use the internal token number. * data/skeletons/yacc.c (yytranslate): Don't emit if api.token.raw is enabled. (YYTRANSLATE): Adjust.	2019-09-14 09:55:17 +02:00
Akim Demaille	1161649446	preserve the indentation in the ouput Preserve the actions' initial indentation. For instance, on | %define api.value.type {int} | %% | exp: exp '/' exp { if ($3) | $$ = $1 + $3; | else | $$ = 0; } we used to generate | { if (yyvsp[0]) | yyval = yyvsp[-2] + yyvsp[0]; | else | yyval = 0; } now we produce | { if (yyvsp[0]) | yyval = yyvsp[-2] + yyvsp[0]; | else | yyval = 0; } See https://lists.gnu.org/archive/html/bison-patches/2019-06/msg00012.html. * data/skeletons/bison.m4 (b4_symbol_action): Output the code in column 0, leave indentation matters to the C code. * src/output.c (user_actions_output): Preserve the incoming indentation in the output. (prepare_symbol_definitions): Likewise for %printer/%destructor. * tests/synclines.at (Output columns): New.	2019-07-02 07:38:52 +02:00
Akim Demaille	9260e5ca4f	api.location.type: support it in C Reported by Balázs Scheidler. * data/skeletons/c.m4 (b4_location_type_define): Use api.location.type if defined. * doc/bison.texi: Document it. * tests/local.at (AT_C_IF, AT_LANG_CASE): New. Support Span in C. * tests/calc.at (Span): Convert it to be usable in C and C++. Check api.location.type with yacc.c and glr.c.	2019-04-25 20:20:59 +02:00
Akim Demaille	0f193d2d21	no-lines: avoid leaving an empty line instead of the syncline Currently, with --no-lines, instead of "#line file line\n", we emit "\n". Let's emit nothing. * data/skeletons/bison.m4 (b4_syncline): Emit at end-of-line when enabled. * data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.cc, * data/skeletons/lalr1.cc, src/output.c: Use dnl after b4_syncline to avoid spurious empty lines. * tests/synclines.at (Sync Lines): Make sure that --no-lines is like grep -v #line. * tests/calc.at: Make sure that a rich grammar file behaves properly with %no-lines.	2019-04-03 19:20:39 +02:00
Akim Demaille	9832fdd6ef	java: use full locations for diagnostics about destructors Currently we use the syncline to report errors about a symbol's destructor/printer. This is not accurate (only file and line), and this is incorrect: the file name is double quotes (a recent change, needed to make sure we escape properly double quotes in it). And worst of all: with --no-line, b4_syncline expands to nothing. Rather, push the locations into the backend, and use them. * src/muscle-tab.h, src/muscle-tab.c (muscle_location_grow): Make it public. * src/output.c (prepare_symbol_definitions): Use it to pubish the location of the printer and destructor. * data/skeletons/lalr1.java: Use complain_at instead of complain. * tests/java.at (Java invalid directives): Adjust expectations. * data/skeletons/bison.m4 (b4_symbol_action_location): Remove. We should not use b4_syncline this way.	2019-04-03 19:20:39 +02:00
Akim Demaille	91bbf4219d	simplify the generated #line Currently we generate things like: #line 683 "src/parse-gram.y" /* yacc.c:316 */ The first part is of course very important: compilers point the users to their grammar file rather than into the generated parser. The second part points to the place in the skeletons that generated this piece of code. This dependency on the Bison skeletons generates lots of useless 'git diff'. This location is useless for the regular user (who does not care about the skeletons) and is actually not useful for Bison developpers too (I never used this to locate the code in skeletons that generated output). So disable it completely. If someone thinks this was actually useful, a %define variable should be provided to control the level of verbosity of '#line', in replacement of --no-lines. So now, generate: #line 683 "src/parse-gram.y" * data/skeletons/bison.m4 (b4_sync_end): Emit nothing.	2019-03-16 10:12:09 +01:00
Akim Demaille	cacdfc2f6e	gram: fix handling of nterms in actions when some are unused Since Bison 3.3, semantic values in rule actions (i.e., '$...') are passed to the m4 backend as the symbol number. Unfortunately, when there are unused symbols, the symbols are renumbered _after_ the numbers were used in the rule actions. As a result, the evaluation of the skeleton failed because it used non existing symbol numbers. Which is the happy scenario: we could use numbers of other existing symbols... Reported by Balázs Scheidler. http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html Translating the rule actions after the symbol renumbering moves too many parts in bison. Relying on the symbol identifiers is more troublesome than it might first seem: some don't have an identifier (tokens with only a literal string), some might have a complex one (tokens with a literal string with characters special for M4). Well, these are tokens, but nterms also have issues: "dummy" nterms (for midrule actions) are named $@32 etc. which is risky for M4. Instead, let's simply give M4 the mapping between the old numbers and the new ones. To avoid confusion between old and new numbers, always emit pre-renumbering numbers as "orig NUM". * data/README: Give details about "orig NUM". * data/skeletons/bison.m4 (__b4_symbol, _b4_symbol): Resolve the "orig NUM". * src/output.c (prepare_symbol_definitions): Pass nterm_map to m4. * src/reduce.h, src/reduce.c (nterm_map): Extract it from nonterminals_reduce, to make it public. (reduce_free): Free it. * src/scan-code.l (handle_action_dollar): When referring to a nterm, use "orig NUM". * tests/reduce.at (Useless Parts): New, based Balázs Scheidler's report.	2019-02-03 10:05:53 +01:00
Akim Demaille	665c5d688c	style: formatting changes * data/skeletons/lalr1.cc: Add dnl. * data/skeletons/bison.m4: Comment the use of dnl.	2019-01-26 10:46:33 +01:00
Akim Demaille	2471733f1a	package: bump copyrights to 2019	2019-01-05 14:58:05 +01:00
Akim Demaille	d07564af63	style: remove stray empty lines * data/skeletons/glr.c, data/skeletons/glr.cc: here. * data/skeletons/bison.m4 (b4_glr_cc_if): Move it here.	2019-01-02 08:01:48 +01:00
Akim Demaille	112ccb5ed7	package: move skeletons into data/skeletons * data/bison.m4, data/c++-skel.m4, data/c++.m4, data/c-like.m4, * data/c-skel.m4, data/c.m4, data/d-skel.m4, data/d.m4, data/glr.c, * data/glr.cc, data/java-skel.m4, data/java.m4, data/lalr1.cc, * data/lalr1.d, data/lalr1.java, data/location.cc, data/stack.hh, * data/variant.hh, data/yacc.c: Move to... * data/skeletons: here. Use b4_skeletonsdir instead of b4_pkgdatadir. * data/local.mk, src/output.c: Adjust.	2018-12-25 07:47:51 +01:00

34 Commits