* data/skeletons/bison.m4 (b4_symbol_token_kind): Give a definition to
$undefined.
(b4_token_visible_if): $undefined has an id.
* src/output.c (prepare_symbol_definitions): Stop lying: $undefined
_is_ a token.
* tests/input.at: Adjust.
* upstream/maint:
maint: post-release administrivia
version 3.5.3
news: update for 3.5.3
yacc.c: make sure we properly propagated the user's number for error
diagnostics: don't crash because of repeated definitions of error
style: initialize some struct members
diagnostics: beware of zero-width characters
diagnostics: be sure to close the styling when lines are too short
muscles: fix incorrect decoding of $
code: be robust to reference with invalid tags
build: fix typo
doc: update recommandation for libtextstyle
style: comment changes
examples: use consistently the GFDL header for readmes
style: remove useless declarations
typo: succesful -> successful
README: point to tests/bison, and document --trace
gnulib: update
maint: post-release administrivia
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:
The token error shall be reserved for error handling. The name
error can be used in grammar rules. It indicates places where the
parser can recover from a syntax error. The default value of error
shall be 256. Its value can be changed using a %token
declaration. The lexical analyzer should not return the value of
error.
I think this feature is useless, the user should not have to deal with
that. The intend is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number. In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers. So
this feature is useless.
Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition. At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.
Rather, the location of the first user definition of "error" should
become its defining location.
Reported byg Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html
* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
Since Bison 2.7, output was indented four spaces for explanatory
statements. For example:
input.y:2.7-13: error: %type redeclaration for exp
input.y:1.7-11: previous declaration
Since the introduction of caret-diagnostics, it became less clear.
Remove the indentation and display submessages as in GCC:
input.y:2.7-13: error: %type redeclaration for exp
2 | %type <float> exp
| ^~~~~~~
input.y:1.7-11: note: previous declaration
1 | %type <int> exp
| ^~~~~
* src/complain.h (SUB_INDENT): Remove.
(warnings): Add "note" to the enum.
* src/complain.h, src/complain.c (complain_indent): Replace by...
(subcomplain): this.
Adjust all dependencies.
* tests/actions.at, tests/diagnostics.at, tests/glr-regression.at,
* tests/input.at, tests/named-refs.at, tests/regression.at:
Adjust expectations.
In addition to
%token NUM "number"
accept
%token NUM _("number")
in which case the token will be translated in error messages.
Do not use _() in the output if there are no translatable tokens.
* src/symtab.h, src/symtab.c (symbol): Add a 'translatable' member.
* src/parse-gram.y (TSTRING): New token.
(string_as_id.opt): Replace with...
(alias): this.
Use it.
* src/scan-gram.l (SC_ESCAPED_TSTRING): New start conditions, to match
TSTRINGs.
* src/output.c (prepare_symbols): Define b4_translatable if there are
translatable strings.
* data/skeletons/glr.c, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c (yytnamerr): Receive b4_translatable, and use it.
String literals, which allow for better error messages, are (too)
liberally accepted by Bison, which might result in silent errors. For
instance
%type <exVal> cond "condition"
does not define “condition” as a string alias to 'cond' (nonterminal
symbols do not have string aliases). It is rather equivalent to
%nterm <exVal> cond
%token <exVal> "condition"
i.e., it gives the type 'exVal' to the "condition" token, which was
clearly not the intention.
Introduce -Wdangling-alias to catch this.
* src/complain.h, src/complain.c: Add support for -Wdangling-alias.
(argmatch_warning_args): Sort.
* src/symtab.c (symbol_check_defined): Complain about dangling
aliases.
* doc/bison.texi: Document it.
* tests/input.at (Dangling aliases): New test.
On
%token TOKEN1
%type <ival> TOKEN1 TOKEN2 't'
%token TOKEN2
%%
expr:
bison -Wyacc gives
input.y:2.15-20: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~~~~
input.y:2.29-31: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~
input.y:2.22-27: warning: POSIX yacc reserves %type to nonterminals [-Wyacc]
2 | %type <ival> TOKEN1 TOKEN2 't'
| ^~~~~~
The messages appear to be out of order, but they are emitted when the
error is found.
* src/symtab.h (symbol_class): Add pct_type_sym, used to denote
symbols appearing in %type.
* src/symtab.c (complain_pct_type_on_token): New.
(symbol_class_set): Check that %type is not applied to tokens.
(symbol_check_defined): pct_type_sym also means undefined.
* src/parse-gram.y (symbol_decl.1): Set the class to pct_type_sym.
* src/reader.c (grammar_current_rule_begin): pct_type_sym also means
undefined.
* tests/input.at (Yacc's %type): New.
See https://lists.gnu.org/archive/html/bug-bison/2019-10/msg00061.html
to https://lists.gnu.org/archive/html/bug-bison/2019-10/msg00073.html.
Paul Eggert's changes in gnulib do fix the issue for modern GCCs (7,
8, 9) on macOS. Unfortunately these warnings are back on the
CI (GNU/Linux) with GCC 4.6, 4.7, (not 4.8) and 4.9.
Disable the warning locally.
* configure.ac (warn_common, warn_tests): Remove -Wtype-limits.
* src/system.h (IGNORE_TYPE_LIMITS_BEGIN, IGNORE_TYPE_LIMITS_END): New.
* src/InadequacyList.c, src/parse-gram.c, src/parse-gram.y,
* src/symtab.c: Use it.
* src/symtab.c: Include intprops.h
(symbol_user_token_number_set): Don’t allow user_token_number ==
INT_MAX because too much other code adds 1 to the user token number.
(symbols_token_translations_init): Complain on integer overflow
instead of indulging in undefined behavior.
Because the checking of the grammar is made by phases after the whole
grammar was read, we sometimes have diagnostics that look weird. In
some case, within one type of checking, the entities are not checked
in the order in which they appear in the file. For instance, checking
symbols is done on the list of symbols sorted by tag:
foo.y:1.20-22: warning: symbol BAR is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.16-18: warning: symbol QUX is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
Let's sort them by location instead:
foo.y:1.16-18: warning: symbol 'QUX' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
foo.y:1.20-22: warning: symbol 'BAR' is used, but is not defined as a token and has no rules [-Wother]
1 | %destructor {} QUX BAR
| ^~~
* src/location.h (location_cmp): Be robust to empty file names.
* src/symtab.c (symbol_cmp): Sort by location.
* tests/input.at: Adjust expectations.
From
input.y:1.17-19: warning: symbol baz is used, but is not defined as a token and has no rules [-Wother]
1 | %printer {} foo baz
| ^~~
to
input.y:1.17-19: warning: symbol 'baz' is used, but is not defined as a token and has no rules; did you mean 'bar'? [-Wother]
1 | %printer {} foo baz
| ^~~
| bar
* bootstrap.conf: We need fstrcmp.
* src/symtab.c (symbol_from_uniqstr_fuzzy): New.
(complain_symbol_undeclared): Use it.
* tests/diagnostics.at (Suggestions): New.
* data/bison-default.css (insertion): Rename as...
(fixit-insert): this, as this is what GCC uses.
* src/symtab.c (complain_symbol_undeclared): New.
Use it.
Use quote on the guilty symbol (like GCC does, and we also do
elsewhere).
* tests/input.at: Adjust.
Both are stored in a hash, and back in the days, we used to iterate
over these tables using hash_do_for_each. However, the order of
traversal was not deterministic, which was a nuisance for
deterministic output (and therefore also a problem for tests). So at
some point (83b60c97ee) we generated a
sorted list of these symbols, and symbols_do actually iterated on that
list. But we kept the constraints of using hash_do_for_each, which
requires a lot of ceremonial code, and makes it hard/unnatural to
preserve data between iterations (see the next commit).
Alas, this is C, not C++.
Let's remove this abstraction, and directly iterate on the sorted
tables.
* src/symtab.c (symbols_do): Remove.
Adjust callers to use a simple for-loop instead.
(table_sort): New.
(symbols_check_defined): Use it.
(symbol_check_defined_processor, symbol_pack_processor)
(semantic_type_check_defined_processor, symbol_translation_processor):
Remove.
Simplify the corresponding functions (that no longer need to return a
bool).
This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
* NEWS: Document the API change.
* boostrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l (<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>)
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
With
%token EOF 0 EOF 0
we get
input.y:3.14-16: warning: symbol EOF redeclared [-Wother]
3 | %token EOF 0 EOF 0
| ^~~
input.y:3.8-10: previous declaration
3 | %token EOF 0 EOF 0
| ^~~
Assertion failed: (nsyms == ntokens + nvars), function check_and_convert_grammar,
file /Users/akim/src/gnu/bison/src/reader.c, line 839.
Reported by Marc Schönefeld.
* src/symtab.c (symbol_user_token_number_set): Register only the
first definition of the end of input token.
* tests/input.at (Symbol redeclared): Check that case.
hash_initialize returns NULL when out of memory. Check for it, and
die cleanly instead of crashing.
Reported by 江 祖铭 (Zu-Ming Jiang).
https://lists.gnu.org/archive/html/bug-bison/2019-08/msg00015.html
* src/muscle-tab.c, src/state.c, src/symtab.c, src/uniqstr.c:
Check the value returned by hash_initialize.
Some members are called foo_location, others are foo_loc. Stick to
the latter.
* src/gram.h, src/location.h, src/location.c, src/output.c,
* src/parse-gram.y, src/reader.h, src/reader.c, src/reduce.c,
* src/scan-gram.l, src/symlist.h, src/symlist.c, src/symtab.h,
* src/symtab.c:
Use _loc consistently, not _location.
Reported by wcventure.
http://lists.gnu.org/archive/html/bug-bison/2019-03/msg00008.html
* src/symtab.c (complain_class_redeclared): Don't print empty
locations.
There can only be empty locations for predefined symbols. And the
only symbol that is lexically available is the error token. So this
appears to be the only possible way to have an error involving an
empty location.
* tests/input.at (Symbol class redefinition): Check it.
After having spent quite some time on cleaning the handling of symbol
declarations in the grammar files, I believe we should keep it.
It looks like it's a duplicate of %type, but it is not. While POSIX
Yacc requires %type to apply only to nonterminal symbols, it appears
that both byacc and bison accept it for tokens too. And some
experienced users do actually expect this feature to group
symbols (terminal or not) by type ("On the other hand, it is generally
more useful IMHO to group terminals and non-terminals with the same
type tag together",
http://lists.gnu.org/archive/html/bug-bison/2018-10/msg00000.html).
Even Bison's own parser does this today (see CHAR).
Basically reverts 7928c3e6fb.
* src/scan-gram.l (%nterm): Dedeprecate, but issue a Wyacc warning.
* tests/input.at: Adjust expectations.
(Yacc warnings on symbols): New.
* src/symtab.c (symbol_class_set): Fix error introduced in
20b0746793.
I personally prefer 'non terminal', or 'non-terminal', but
'nonterminal' is the common spelling.
* data/glr.c, src/parse-gram.y, src/symtab.c, src/symtab.h,
* tests/input.at, doc/refcard.tex: here.
Currently our error messages include both "symbol redeclared" and
"symbol redefined", and they mean something different. This is
obscure, let's make this clearer.
I think the idea between 'definition' vs. 'declaration' is that in the
case of the nonterminals, the actual definition is its set of rules,
so %nterm would be about declaration. The case of %token is less
clear.
* src/symtab.c (complain_class_redefined): New.
(symbol_class_set): Use it.
Simplify the logic of this function to clearly skip its body when the
preconditions are not met.
* tests/input.at (Symbol class redefinition): New.
Revamping the handling of the symbols is the grammar is much more
delicate than I anticipated. Let's first move things around for
clarity.
* src/symtab.c (symbol_make_alias): Don't accept to alias
non-terminals.
(symbol_user_token_number_set): Don't accept user token numbers
for non-terminals.
Don't do anything in case of redefinition, instead of trying to
update. The flow is eaier to follow this way.
* origin/maint:
maint: post-release administrivia
version 3.0.4
gnulib: update
build: re-enable compiler warnings, and fix them
tests: c++: fix a C++03 conformance issue
tests: fix a title
c++: reserve 200 slots in the parser's stack
tests: be more robust to unrecognized synclines, and try to recognize xlc
tests: fix C++ conformance
build: fix some warnings
build: avoid infinite recursions on include_next
There are warnings (-Wextra) in generated C++ code:
ltlparse.cc: In member function 'ltlyy::parser::symbol_number_type
ltlyy::parser::by_state::type_get() const':
ltlparse.cc:452:33: warning: enumeral and non-enumeral type in
conditional expression
return state == empty_state ? empty_symbol : yystos_[state];
Reported by Alexandre Duret-Lutz.
It turns out that -Wall and -Wextra were disabled because of a stupid
typo.
* configure.ac: Fix the stupid typo.
* data/lalr1.cc, src/AnnotationList.c, src/InadequacyList.c,
* src/ielr.c, src/print.c, src/scan-code.l, src/symlist.c,
* src/symlist.h, src/symtab.c, src/tables.c, tests/actions.at,
* tests/calc.at, tests/cxx-type.at, tests/glr-regression.at,
* tests/named-refs.at, tests/torture.at:
Fix warnings, mostly issues about variables used only with assertions,
which are disabled with -DNDEBUG.
Currently on the following grammar:
%type <foo> foo
%%
start: foo | bar | "baz"
foo: foo
bar: bar
bison reports:
warning: 2 nonterminals useless in grammar [-Wother]
warning: 4 rules useless in grammar [-Wother]
1.13-15: warning: nonterminal useless in grammar: foo [-Wother]
%type <foo> foo
^^^
3.14-16: warning: nonterminal useless in grammar: bar [-Wother]
start: foo | bar | "baz"
^^^
[...]
i.e., the location of the first occurrence of a symbol is taken as its
definition point. In the case of nonterminals, the first occurrence
as a left-hand side of a rule makes more sense:
warning: 2 nonterminals useless in grammar [-Wother]
warning: 4 rules useless in grammar [-Wother]
4.1-3: warning: nonterminal useless in grammar: foo [-Wother]
foo: foo
^^^
5.1-3: warning: nonterminal useless in grammar: bar [-Wother]
bar: bar
^^^
[...]
* src/symtab.h, src/symtab.c (symbol::location_of_lhs): New.
(symbol_location_as_lhs_set): New.
* src/parse-gram.y (current_lhs): Use it.
* tests/reduce.at: Update locations.
* origin/maint:
build: don't try to generate docs when cross-compiling
package: fix a reporter's name
%union: fix the support for named %union
package: bump to 2015
flex: don't trust YY_USER_INIT
yacc.c: fix broken union when api.value.type=union and %defines are used
doc: fix missing xref
gnulib: update
location: remove some ugly debugging code traces
build: use abort to pacify compiler errors
package: bump to 2014
doc: specify documentation encoding
This completes and fixes a728075710.
Reported by Valentin Tolmer.
Before it Bison used to put the properties of the symbols
(associativity, printer, etc.) in the 'symbol' structure. An
identifier-named token (FOO) and its string-named alias ("foo")
duplicated these properties, and symbol_check_alias_consistency()
checked that both had compatible properties and fused them, at the end
of the parsing of the grammar.
The commit a728075710 introduces a
sym_content structure that keeps all these properties, and ensures
that both aliases point to the same sym_content (instead of
duplicating). However, it removed symbol_check_alias_consistency,
which resulted in the non-fusion of *existing* properties:
%token FOO "foo"
%left FOO %left "foo"
was properly diagnosed as a redeclaration, but
%left FOO %left "foo"
%token FOO "foo"
was not, as the properties of FOO and "foo" were not checked before
fusion. It certainly also means that
%left "foo"
%token FOO "foo"
did not transfer properly the associativity to FOO.
The fix is simple: reintroduce symbol_check_alias_consistency (under a
better name, symbol_merge_properties) and call it where appropriate.
Also, that commit made USER_NUMBER_HAS_STRING_ALIAS useless, but left
it.
* src/symtab.h (USER_NUMBER_HAS_STRING_ALIAS): Remove, unused.
Adjust dependencies.
* src/symtab.c (symbol_merge_properties): New, based on the former
symbol_check_alias_consistency.
* tests/input.at: Re-enable tests that we now pass.
* origin/maint:
package: install the examples
package: install README and the like in docdir
diagnostics: fix the order of multiple declarations reports
symbol: provide an easy means to compare them in source order
Conflicts:
src/symtab.c
tests/input.at
* tests/input.at: Comment out a test that master currently does not
pass (because of a728075710).