In some rare conditions, the generated parser can be wrong when there
are useless tokens.
Reported by Balázs Scheidler.
https://github.com/akimd/bison/issues/72
Balázs managed to prove that the bug was introduced in
commit af1c6f973a
Author: Theophile Ranquet <ranquet@lrde.epita.fr>
Date: Tue Nov 13 10:38:49 2012 +0000
tables: use bitsets for a performance boost
Suggested by Yuri at
<http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>.
The improvement is marginal for most grammars, but notable for large
grammars (e.g., PosgreSQL's postgre.y), and very large for the
sample.y grammar submitted by Yuri in
http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html.
Measured with --trace=time -fsyntax-only.
parser action tables postgre.y sample.y
Before 0,129 (44%) 37,095 (99%)
After 0,117 (42%) 5,046 (93%)
* src/tables.c (pos): Replace this set of integer coded as an unsorted
array of integers with...
(pos_set): this bitset.
which was implemented long ago, but that I installed only recently
(March 2019), first published in v3.3.90.
That patch introduces a bitset to represent a set of integers. It
managed negative integers by using a (fixed) base (the smallest
integer to represent). It avoided negative accesses into the bitset
by ignoring integers smaller than the base, under the asumption that
these cases correspond to useless tokens that are ignored anyway.
While it turns out to be true for all the test cases in the test suite
(!), Balázs' use case demonstrates that it is not always the case.
So we need to be able to accept negative integers that are smaller
than the current base.
"Amusingly" enough, the aforementioned patch was visibly unsure about
itself:
/* Store PLACE into POS_SET. PLACE might not belong to the set
of possible values for instance with useless tokens. It
would be more satisfying to eliminate the need for this
'if'. */
This commit needs several improvements in the future:
- support from bitset for bit assignment and shifts
- amortized resizing of pos_set
- test cases
* src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test):
New.
Use them instead of using bitset_set and bitset_test directly.
The yydefgoto table uses -1 as an invalid for an impossible case (we
never use yydefgoto[0], since it corresponds to the reduction to
$accept, which never happens). Since yydefgoto is a table of state
numbers, this -1 forces a signed type uselessly, which (1) might
trigger compiler warnings when storing a value from yydefgoto into a
state number (nonnegative), and (2) wastes bits which might result in
using a int16 where a uint8 suffices.
Reported by Jot Dot <jotdot@shaw.ca>.
https://lists.gnu.org/r/bug-bison/2020-11/msg00027.html
* src/tables.c (default_goto): Use 0 rather than -1 as invalid value.
* tests/regression.at: Adjust.
Currently we use both names. Let's stick to the short one.
* src/AnnotationList.c, src/conflicts.c, src/counterexample.c,
* src/getargs.c, src/getargs.h, src/graphviz.c, src/ielr.c,
* src/lalr.c, src/print-graph.c, src/print-xml.c, src/print.c,
* src/state-item.c, src/state.c, src/state.h, src/tables.c:
s/lookahead_token/lookahead/gi.
This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
* NEWS: Document the API change.
* boostrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l (<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>)
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
Suggested by Yuri at
<http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>.
The improvement is marginal for most grammars, but notable for large
grammars (e.g., PosgreSQL's postgre.y), and very large for the
sample.y grammar submitted by Yuri in
http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html.
Measured with --trace=time -fsyntax-only.
parser action tables postgre.y sample.y
Before 0,129 (44%) 37,095 (99%)
After 0,117 (42%) 5,046 (93%)
* src/tables.c (pos): Replace this set of integer coded as an unsorted
array or integers with...
(pos_set): this bitset.
* origin/maint:
maint: post-release administrivia
version 3.0.4
gnulib: update
build: re-enable compiler warnings, and fix them
tests: c++: fix a C++03 conformance issue
tests: fix a title
c++: reserve 200 slots in the parser's stack
tests: be more robust to unrecognized synclines, and try to recognize xlc
tests: fix C++ conformance
build: fix some warnings
build: avoid infinite recursions on include_next
There are warnings (-Wextra) in generated C++ code:
ltlparse.cc: In member function 'ltlyy::parser::symbol_number_type
ltlyy::parser::by_state::type_get() const':
ltlparse.cc:452:33: warning: enumeral and non-enumeral type in
conditional expression
return state == empty_state ? empty_symbol : yystos_[state];
Reported by Alexandre Duret-Lutz.
It turns out that -Wall and -Wextra were disabled because of a stupid
typo.
* configure.ac: Fix the stupid typo.
* data/lalr1.cc, src/AnnotationList.c, src/InadequacyList.c,
* src/ielr.c, src/print.c, src/scan-code.l, src/symlist.c,
* src/symlist.h, src/symtab.c, src/tables.c, tests/actions.at,
* tests/calc.at, tests/cxx-type.at, tests/glr-regression.at,
* tests/named-refs.at, tests/torture.at:
Fix warnings, mostly issues about variables used only with assertions,
which are disabled with -DNDEBUG.
* origin/maint:
build: don't try to generate docs when cross-compiling
package: fix a reporter's name
%union: fix the support for named %union
package: bump to 2015
flex: don't trust YY_USER_INIT
yacc.c: fix broken union when api.value.type=union and %defines are used
doc: fix missing xref
gnulib: update
location: remove some ugly debugging code traces
build: use abort to pacify compiler errors
package: bump to 2014
doc: specify documentation encoding
Rather than having duplicate info in the symbol and the alias that has
to be resolved later on, both the symbol and the alias have a common
pointer to a separate structure containing this info.
* src/symtab.h (sym_content): New structure.
* src/symtab.c (sym_content_new, sym_content_free, symbol_free): New
* src/AnnotationList.c, src/conflicts.c, src/gram.c, src/gram.h,
* src/graphviz.c, src/ielr.c, src/output.c, src/parse-gram.y, src/print.c
* src/print-xml.c, src/print_graph.c, src/reader.c, src/reduce.c,
* src/state.h, src/symlist.c, src/symtab.c, src/symtab.h, src/tables.c:
Adjust.
* tests/input.at: Fix expectations (order changes).
* src/tables.c (save_column, pack_vector): Reduce the scope to
emphasize the structure of the code.
Rename the returned value "res" to make understanding easier.
* doc/bison.texinfo: Space change.
* src/system.h (STREQ, STRNEQ): New.
* src/files.c, src/ielr.c, src/lalr.c, src/muscle-tab.c,
* src/output.c, src/print.c, src/print_graph.c,
* src/reader.c, src/scan-skel.l, src/tables.c,
* src/uniqstr.c:
Use them.
* src/scan-gram.l: Do not use streq.h, use system.h's STREQ.
* cfg.mk: The documentation is an exception.
This change was made by applying emacs' untabify function to
nearly all files in Bison's repository. Required tabs in make
files, ChangeLog, regexps, and test code were manually skipped.
Other notable exceptions and changes are listed below.
* bootstrap: Skip because we sync this with gnulib.
* data/m4sugar/foreach.m4
* data/m4sugar/m4sugar.m4: Skip because we sync these with
Autoconf.
* djgpp: Skip because I don't know how to test djgpp properly, and
this code appears to be unmaintained anyway.
* README-hacking (Hacking): Specify that tabs should be avoided
where not required.
States that shift the error token do not have default reductions,
and GLR disables some default reductions, so "all" was a misnomer.
* doc/bison.texinfo (%define Summary): Update.
(Default Reductions): Update.
* src/print.c (print_reductions): Update.
* src/reader.c (prepare_percent_define_front_end_variables):
Update.
* src/tables.c (action_row): Update.
* tests/input.at (%define enum variables): Update.
* tests/reduce.at (%define lr.default-reductions): Update.
(cherry picked from commit d815ec4a62)
* data/bison.m4 (b4_integral_parser_tables_map): Don't mention
zero values in the YYTABLE comments.
* data/glr.c (yytable_value_is_error): Don't check for zero
value.
* data/lalr1.cc (yy_table_value_is_error_): Likewise.
* data/yacc.c (yytable_value_is_error): Likewise.
* data/lalr1.java (yy_table_value_is_error_): Likewise.
(yysyntax_error): Fix typo in code: use yytable_ not yycheck_.
* src/tables.h: In header comments, explain why it's useless to
check for a zero value in yytable.
Its value describes the states that are permitted to contain
default rules: "all", "consistent", or "accepting".
* src/reader.c (reader): Default lr.default_rules to "all".
Check for a valid lr.default_rules value.
* src/lalr.c (state_lookahead_tokens_count): If lr.default_rules
is "accepting", then only mark the accepting state as
consistent.
(initialize_LA): Tell state_lookahead_tokens_count whether
lr.default_rules is "accepting".
* src/tables.c (action_row): If lr.default_rules is not "all",
then disable default rules in inconsistent states.
* src/print.c (print_reductions): Use this opportunity to
perform some assertions about whether lr.default_rules was
obeyed correctly.
* tests/local.at (AT_TEST_TABLES_AND_PARSE): New macro that
helps with checking the parser tables for a grammar.
* tests/input.at (%define lr.default_rules invalid values): New
test group.
* tests/reduce.at (AT_TEST_LR_DEFAULT_RULES): New macro using
AT_TEST_TABLES_AND_PARSE.
(`no %define lr.default_rules'): New test group generated by
AT_TEST_LR_DEFAULT_RULES.
(`%define lr.default_rules "all"'): Likewise.
(`%define lr.default_rules "consistent"'): Likewise.
(`%define lr.default_rules "accepting"'): Likewise.
<http://lists.gnu.org/archive/html/bug-bison/2006-09/msg00018.html>.
* src/parse-gram.y (symbol_declaration): Don't put statements
before declarations; it's not portable to C89.
* src/scan-code.l (handle_action_at): Likewise.
* src/scan-code.l: Always initialize braces_level; the old code
left it uninitialized and therefore had undefined behavior.
Don't attempt to redefine 'assert', since it runs afoul of
systems where standard headers (mistakenly) include <assert.h>.
Instead, define and use our own alternative, called 'aver'.
* src/reader.c: Don't include assert.h, since we no longer
use assert.
* src/scan-code.l: Likewise.
* src/system.h (assert): Remove, replacing with....
(aver): New function, taking a bool arg. All uses changed.
* src/tables.c (pack_vector): Ensure that aver arg is bool,
not merely an integer.
`look_ahead'. Discussed starting at
<http://lists.gnu.org/archive/html/bison-patches/2006-01/msg00049.html>
and then at
<http://lists.gnu.org/archive/html/bison-patches/2006-06/msg00017.html>.
* NEWS: For the next release, note the change to `--report'.
* TODO, doc/bison.1: Update English.
* doc/bison.texinfo: Update English.
(Understanding Your Parser, Bison Options): Document as
`--report=lookahead' rather than `--report=look-ahead'.
* src/conflicts.c: Update English in comments.
(lookahead_set): Rename from look_ahead_set.
(flush_reduce): Rename argument look_ahead_tokens to lookahead_tokens.
(resolve_sr_conflict): Rename local look_ahead_tokens to
lookahead_tokens, and update other uses.
(flush_shift, set_conflicts, conflicts_solve, count_sr_conflicts,
count_rr_conflicts, conflicts_free): Update uses.
* src/getargs.c (report_args): Move "lookahead" before alternate
spellings.
(report_types): Update uses.
(usage): For `--report' usage description, state `lookahead' spelling
rather than `look-ahead'.
* src/getargs.h (report.report_lookahead_tokens): Rename from
report_look_ahead_tokens.
* src/lalr.c: Update English in comments.
(compute_lookahead_tokens): Rename from compute_look_ahead_tokens.
(state_lookahead_tokens_count): Rename from
state_look_ahead_tokens_count.
Rename local n_look_ahead_tokens to n_lookahead_tokens.
(lookahead_tokens_print): Rename from look_ahead_tokens_print.
Rename local n_look_ahead_tokens to n_lookahead_tokens.
Update other uses.
Update English in output.
(add_lookback_edge, initialize_LA, lalr, lalr_free): Update uses.
* src/print.c: Update English in comments.
(lookahead_set): Rename from look_ahead_set.
(print_reduction): Rename argument lookahead_token from
look_ahead_token.
(print_core, state_default_rule, print_reductions, print_results):
Update uses.
* src/print_graph.c: Update English in comments.
(print_core): Update uses.
* src/state.c: Update English in comments.
(reductions_new): Update uses.
(state_rule_lookahead_tokens_print): Rename from
state_rule_look_ahead_tokens_print, and update other uses.
* src/state.h: Update English in comments.
(reductions.lookahead_tokens): Rename from look_ahead_tokens.
(state_rule_lookahead_tokens_print): Rename from
state_rule_look_ahead_tokens_print.
* src/tables.c: Update English in comments.
(conflict_row, action_row): Update uses.
* tests/glr-regression.at
(Incorrect lookahead during deterministic GLR,
Incorrect lookahead during nondeterministic GLR): Rename
print_look_ahead to print_lookahead.
* tests/torture.at: Update English in comments.
(AT_DATA_LOOKAHEAD_TOKENS_GRAMMAR): Rename from
AT_DATA_LOOK_AHEAD_TOKENS_GRAMMAR.
(Many lookahead tokens): Update uses.
* data/glr.c: Update English in comments.
* lalr1.cc: Likewise.
* yacc.c: Likewise.
* src/conflicts.h: Likewise.
* src/lalr.h: Likewise.
* src/main.c: Likewise.
* src/output.c: Likewise.
* src/parse-gram.c: Likewise.
* src/tables.h: Likewise.
* tests/calc.at: Likewise.