For instance, in the case of Bison's own parser:
- case 40:
+ case 40: /* grammar_declaration: "%code" "identifier" "{...}" */
{
muscle_percent_code_grow ((yyvsp[-1].ID), (yylsp[-1]),
translate_code_braceless ((yyvsp[0].BRACED_CODE), (yylsp[0])),
(yylsp[0]));
code_scanner_last_string_free ();
}
break;
* data/skeletons/c.m4: Modified.
* data/skeletons/d.m4: Modified.
* data/skeletons/java.m4: Modified.
* src/output.c (output_escaped): New.
(quoted_output): Use it, and rename as...
(output_quoted): this.
Adjust dependencies.
(rule_output): New.
(user_actions_output): Use it.
* data/skeletons/c.m4, data/skeletons/d.m4, data/skeletons/java.m4
(b4_case): Add support for $3, an optional comment.
Teaches bison about a new command line option, --file-prefix-map OLD=NEW
(based on the -ffile-prefix-map option from GCC) which causes it to
replace and file path of OLD in the text of the output file with NEW,
mainly for header guards and comments. The primary use of this is to
make builds reproducible with different input paths, and in particular
the debugging information produced when the source code is compiled. For
example, a distro may know that the bison source code will be located at
"/usr/src/bison" and thus can generate bison files that are reproducible
with the following command:
bison --output=/build/bison/parse.c -d --file-prefix-map=/build/bison/=/usr/src/bison/ parse.y
Importantly, this will change the header guards and #line directives
from:
#ifndef YY_BUILD_BISON_PARSE_H
#line 100 "/build/bison/parse.h"
to
#ifndef YY_USR_SRC_BISON_PARSE_H
#line 100 "/usr/src/bison/parse.h"
which is reproducible.
See https://lists.gnu.org/r/bison-patches/2020-05/msg00016.html
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
* src/files.h, src/files.c (spec_mapped_header_file)
(mapped_dir_prefix, map_file_name, add_prefix_map): New.
* src/getargs.c (-M, --file-prefix-map): New option.
* src/output.c (prepare): Define b4_mapped_dir_prefix and
b4_spec_header_file.
* src/scan-skel.l (@ofile@): Output the mapped file name.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/location.cc,
* data/skeletons/yacc.c:
Adjust.
* doc/bison.texi: Document.
* tests/input.at, tests/output.at: Check.
This should have been done in 3.6, but I wanted to avoid introducing
conflicts into Vincent's work on counterexamples. It turns out it's
completely orthogonal.
* data/README.md, data/skeletons/bison.m4, data/skeletons/c++.m4,
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/java.m4,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/variant.hh, data/skeletons/yacc.c, src/conflicts.c,
* src/derives.c, src/gram.c, src/gram.h, src/output.c,
* src/parse-gram.c, src/parse-gram.y, src/print-xml.c, src/print.c,
* src/reader.c, src/symtab.c, src/symtab.h, tests/input.at,
* tests/types.at:
s/user_token_number/code/g.
Plus minor changes.
Let --trace=m4-early dump all the logs from the start (as --trace=m4
used to do), and have --trace=m4 now start traces only when actually
working of the user's grammar.
Can make a big difference in the case of small inputs. E.g.
$ bison -S tests/testsuite.dir/001/input.m4 tests/testsuite.dir/001/input.y --trace=m4 |& wc
3952 19446 251068
$ bison -S tests/testsuite.dir/001/input.m4 tests/testsuite.dir/001/input.y --trace=m4-early |& wc
19491 131904 1830495
* data/skeletons/traceon.m4: New.
* src/getargs.h, src/getargs.c: Introduce --trace=m4-early.
* src/output.c (output_skeleton): Adjust for --trace=m4 and --trace=m4-early.
Commit a4ede8f85b ("package: make bison
a relocatable package") made Bison relocatable, but in fact it still
contains one absolute reference: the M4 variable, which points to the
M4 program. Let's fix that by using relocate(), see if an M4 binary is
available at the relocated location, and otherwise fallback to the
original M4 location.
See https://lists.gnu.org/r/bison-patches/2020-05/msg00078.html,
and https://lists.gnu.org/r/bison-patches/2020-05/msg00087.html.
* src/files.h, src/files.c (m4path): New.
* src/output.c: Use it.
In C/C++, N_ is a no-op. Define it if the user didn't.
Suggested by Frank Heckenbach.
https://lists.gnu.org/r/bug-bison/2020-04/msg00010.html
* src/output.c (prepare_symbol_names): Rename has_translations as
has_translations_flag.
* data/skeletons/bison.m4 (b4_has_translations_if): New.
* data/skeletons/java.m4 (b4_trans): Use it.
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/yacc.c
(N_): Provide a default definition.
* src/output.c (prepare_symbol_names): Don't play tricks with the
symbols, it's quite too late.
(has_translations): Move to...
* src/symtab.c: here.
(symbols_pack): Use it to enable translation for special symbols.
The name "$end" is nice in the report, in particular it avoids that
pointed-rules (aka items) be too long. It also helps keeping them
"standard".
But it is bad in error messages, we should report "end of file" (or
maybe "end of input", this is debatable). So, unless the user already
defined the alias for the error token herself, make it "end of file".
It should even be translated if the user already translated some
tokens, so that there is now no strong reason to redefine the $end
token.
* src/output.c (prepare_symbol_names): Issue "end of file" instead of
"$end".
* data/skeletons/lalr1.java (yytnamerr_): Remove the renaming hack.
* build-aux/update-test: Accept files with names containing a "+",
such as c++.at.
* tests/actions.at, tests/c++.at, tests/conflicts.at,
* tests/glr-regression.at, tests/regression.at, tests/skeletons.at:
Adjust.
* data/skeletons/bison.m4 (b4_symbol_token_kind): Give a definition to
$undefined.
(b4_token_visible_if): $undefined has an id.
* src/output.c (prepare_symbol_definitions): Stop lying: $undefined
_is_ a token.
* tests/input.at: Adjust.
There are people out there that do use YYERRCODE (the token kind of
the error token). See for instance
3812012bb7/unixODBC-2.3.2/Drivers/nn/yylex.c.
Currently, YYERRCODE is defined by yacc.c in an adhoc way as a #define
in the *.c file only. It belongs with the other token kinds.
YYERRCODE is not a nice name, it does not fit in our naming scheme.
YYERROR would be more logical, but it collides with the YYERROR macro.
Shall we keep the same name in all the skeletons? Besides, to avoid
collisions in C, we need to apply the api prefix: YYERRCODE is
actually <PREFIX>ERRCODE. This is not needed in the other languages.
* data/skeletons/bison.m4 (b4_symbol_token_kind): New.
Map the error token to "YYERRCODE".
* data/skeletons/yacc.c (YYERRCODE): Don't define it, it's handled by...
* src/output.c (prepare_symbol_definitions): this.
* tests/input.at (Redefining the error token): Check it.
The error format should be translated, but contrary to the case of
C/C++, we cannot just depend on macros to adapt on the
presence/absence of '_'. Let's consider that the message format is to
be translated iff there are some internationalized tokens.
* src/output.c (prepare_symbol_names): Define b4_has_translations.
* data/skeletons/java.m4 (b4_trans): New.
* data/skeletons/lalr1.java: Use it to emit translatable or not the
format string.
In Java there is no need for N_ and yytranslate_. So instead of
hard-coding the use of N_ in the table of the symbol names, rely on
b4_symbol_translate.
* src/output.c (prepare_symbol_names): Use b4_symbol_translate instead
of N_.
* data/skeletons/c.m4 (b4_symbol_translate): New.
* data/skeletons/lalr1.java (yysymbolName): New.
Use it.
* examples/java/calc/Calc.y: Use parse.error=detailed.
* tests/calc.at: Check parse.error=detailed.
Some users would like to avoid having to "parse" the *.y file to find
the strings to translate. Let's issue the translatable tokens with N_
to allow "parsing" the generated parsers instead.
See
https://lists.gnu.org/archive/html/bison-patches/2019-01/msg00015.html
* src/output.c (prepare_symbol_names): Issue symbol_names with N_()
markup.
In addition to
%token NUM "number"
accept
%token NUM _("number")
in which case the token will be translated in error messages.
Do not use _() in the output if there are no translatable tokens.
* src/symtab.h, src/symtab.c (symbol): Add a 'translatable' member.
* src/parse-gram.y (TSTRING): New token.
(string_as_id.opt): Replace with...
(alias): this.
Use it.
* src/scan-gram.l (SC_ESCAPED_TSTRING): New start conditions, to match
TSTRINGs.
* src/output.c (prepare_symbols): Define b4_translatable if there are
translatable strings.
* data/skeletons/glr.c, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c (yytnamerr): Receive b4_translatable, and use it.
* src/output.c (escape_trigraphs, xescape_trigraphs): New.
(prepare_symbol_names): Use it.
* tests/regression.at: Check the handling of trigraphs with
parse.error = detailed.
"detailed" error messages are almost like "verbose", except that we
don't double escape them, they don't get inner quotes, we don't use
yytnamerr, and we hide the table.
"custom" is exposed with the "detailed" tokens, not the "verbose"
ones: they are not double-quoted.
Because there's a risk that some people use yytname even without
"verbose", let's keep yytname (instead of yys_name) in "simple"
parse.error.
* src/output.c (prepare_symbol_names): Be ready to output symbol names
unquoted.
(prepare_symbol_names): Output both the old tname table, and the new
symbol_names one.
* data/skeletons/bison.m4: Accept 'detailed'.
* data/skeletons/yacc.c: When parse.error is 'detailed', don't emit
yytname and yytnamerr, just yysymbol_name with the table inside.
* tests/calc.at: Adjust.
This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
* NEWS: Document the API change.
* boostrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l (<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>)
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
Some members are called foo_location, others are foo_loc. Stick to
the latter.
* src/gram.h, src/location.h, src/location.c, src/output.c,
* src/parse-gram.y, src/reader.h, src/reader.c, src/reduce.c,
* src/scan-gram.l, src/symlist.h, src/symlist.c, src/symtab.h,
* src/symtab.c:
Use _loc consistently, not _location.
Currently, with --no-lines, instead of "#line file line\n", we emit
"\n". Let's emit nothing.
* data/skeletons/bison.m4 (b4_syncline): Emit at end-of-line when enabled.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, src/output.c: Use dnl after b4_syncline to
avoid spurious empty lines.
* tests/synclines.at (Sync Lines): Make sure that --no-lines is like
grep -v #line.
* tests/calc.at: Make sure that a rich grammar file behaves properly
with %no-lines.
Currently we use the syncline to report errors about a symbol's
destructor/printer. This is not accurate (only file and line), and
this is incorrect: the file name is double quotes (a recent change,
needed to make sure we escape properly double quotes in it). And
worst of all: with --no-line, b4_syncline expands to nothing.
Rather, push the locations into the backend, and use them.
* src/muscle-tab.h, src/muscle-tab.c (muscle_location_grow): Make it
public.
* src/output.c (prepare_symbol_definitions): Use it to pubish the
location of the printer and destructor.
* data/skeletons/lalr1.java: Use complain_at instead of complain.
* tests/java.at (Java invalid directives): Adjust expectations.
* data/skeletons/bison.m4 (b4_symbol_action_location): Remove.
We should not use b4_syncline this way.
The variable spec_defines_file denotes the name of the generated
header. Its name is derived from --defines/%defines, whose name in
turn is derived from the fact that the header, in Yacc, contained the
Not only does the header now contain a lot more than just the token
definitions, but we no longer even generate macros, but an enum...
Let's modernize our vocabulary.
* src/files.h, src/files.c (spec_defines_file): Rename as...
(spec_header_file): this.
* maint:
maint: post-release administrivia
version 3.3.2
style: minor fixes
NEWS: named constructors are preferable to symbol_type ctors
gram: fix handling of nterms in actions when some are unused
style: rename local variable
CI: update the ICC serial number for travis-ci.org
Since Bison 3.3, semantic values in rule actions (i.e., '$...') are
passed to the m4 backend as the symbol number. Unfortunately, when
there are unused symbols, the symbols are renumbered _after_ the
numbers were used in the rule actions. As a result, the evaluation of
the skeleton failed because it used non existing symbol numbers.
Which is the happy scenario: we could use numbers of other existing
symbols...
Reported by Balázs Scheidler.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00044.html
Translating the rule actions after the symbol renumbering moves too
many parts in bison. Relying on the symbol identifiers is more
troublesome than it might first seem: some don't have an
identifier (tokens with only a literal string), some might have a
complex one (tokens with a literal string with characters special for
M4). Well, these are tokens, but nterms also have issues: "dummy"
nterms (for midrule actions) are named $@32 etc. which is risky for
M4.
Instead, let's simply give M4 the mapping between the old numbers and
the new ones. To avoid confusion between old and new numbers, always
emit pre-renumbering numbers as "orig NUM".
* data/README: Give details about "orig NUM".
* data/skeletons/bison.m4 (__b4_symbol, _b4_symbol): Resolve the
"orig NUM".
* src/output.c (prepare_symbol_definitions): Pass nterm_map to m4.
* src/reduce.h, src/reduce.c (nterm_map): Extract it from
nonterminals_reduce, to make it public.
(reduce_free): Free it.
* src/scan-code.l (handle_action_dollar): When referring to a nterm,
use "orig NUM".
* tests/reduce.at (Useless Parts): New, based Balázs Scheidler's
report.
We should use -ffixit and --update to clean files with duplicate
directives. And we should complain only once about duplicate obsolete
directives: keep only the "duplicate" warning. Let's start with %yacc.
For instance on:
%fixed-output_files
%fixed-output-files
%yacc
%%
exp:
This run of bison:
$ bison /tmp/foo.y -u
foo.y:1.1-19: warning: deprecated directive, use '%fixed-output-files' [-Wdeprecated]
%fixed-output_files
^~~~~~~~~~~~~~~~~~~
foo.y:2.1-19: warning: duplicate directive [-Wother]
%fixed-output-files
^~~~~~~~~~~~~~~~~~~
foo.y:1.1-19: previous declaration
%fixed-output_files
^~~~~~~~~~~~~~~~~~~
foo.y:3.1-5: warning: duplicate directive [-Wother]
%yacc
^~~~~
foo.y:1.1-19: previous declaration
%fixed-output_files
^~~~~~~~~~~~~~~~~~~
bison: file 'foo.y' was updated (backup: 'foo.y~')
gives:
%fixed-output-files
%%
exp:
* src/location.h, src/location.c (location_empty): New.
* src/complain.h, src/complain.c (duplicate_directive): New.
* src/getargs.h, src/getargs.c (yacc_flag): Instead of a Boolean, be
the location of the definition.
Update dependencies.
* src/scan-gram.l (%yacc, %fixed-output-files): Move the handling of
its warnings to...
* src/parse-gram.y (do_yacc): This new function.
* tests/input.at (Deprecated Directives): Adjust expectations.
Suggested by David Barto
https://lists.gnu.org/archive/html/help-bison/2015-02/msg00004.html
and Victor Zverovich.
https://lists.gnu.org/archive/html/bison-patches/2018-10/msg00121.html
This is very easy to do, thanks to work by Bruno Haible in gnulib.
See "Supporting Relocation" in gnulib's documentation.
* bootstrap.conf: We need relocatable-prog and relocatable-script (for yacc).
* src/yacc.in: New.
* configure.ac, src/local.mk: Instantiate it.
* src/main.c, src/output.c (main, pkgdatadir): Use relocatable2.
* doc/bison.texi (FAQ): Document it.
This was demanded several times. See for instance:
- David M. Warme
https://lists.gnu.org/archive/html/help-bison/2011-04/msg00003.html
- box12009
http://lists.gnu.org/archive/html/bug-bison/2016-10/msg00001.html
Basically, this reverts:
- commit 3d3bc1fe30
Get rid of (yy)rhs and (yy)prhs
- commit d333175f63
Avoid compiler warning.
Note that since these tables are not needed in the generated parsers,
no skeleton requests them. This change only brings back their
definition to M4, making it possible to user-defined skeletons to use
these tables.
* src/output.c (muscle_insert_item_number_table): Define.
(prepare_rules): Generate the rhs and prhs tables.
The files stack.hh and position.hh are deprecated. Rather than
devoting specify %define variables to discard them (api.position.file
and api.stack.file), and rather than having to use special rules when
api.location.file is used, let's simply decide that from %require
"3.2" onwards, these files will not be generated.
The only noticeable thing here is that, in order to be able to check
the behavior of %require "3.2", to have this version (which is still
3.1-*) to accept %require "3.2".
* src/gram.h, src/gram.c (required_version): New.
* src/parse-gram.y (version_check): Set it.
* src/output.c (prepare): Pass it m4.
* data/bison.m4 (b4_required_version_if): Receive it and use it.
* data/location.cc, data/stack.hh: Replace the api.*.file with only
required version comparison.
* tests/input.at: No longer check api.stack.file and api.position.file.
* NEWS, doc/bison.texi: Don't mention them.
Document the %require 3.2 behavior.
* tests/output.at: Use %require 3.2 instead.