Currently, if we have long rules and series of shift, we stack states
without showing stack. Let's be more incremental, and do how the Java
skeleton does.
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/yacc.c:
Here.
Adjust test cases.
* tests/torture.at (AT_DATA_STACK_TORTURE): Disable stack traces: this
test produces a very large stack, and showing the stack each time we
shift a token goes quadatric.
The Java skeleton displays
Reading a token:
Next token is token "number" (1)
while the other display
Reading a token: Next token is token "number" (1)
When generating logs in the scanner, the first part is separated from
the second, and the end of the scanner logs have the second part
pasted in. So let's propagate the Java way, but with the colon.
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c: Do it.
Adjust test cases and doc.
Just as the yacc.c skeleton, the lalr1.cc skeleton should reject
invalid values for parse.lac.
* data/skeletons/lalr1.cc: check validity of parse.lac
* tests/input.at: new test cases
Let's have C be the reference, and match it elsewhere. Maybe C is too
verbose and some adjustments are needed, but then that would be done
in another batch of patches.
* data/skeletons/lalr1.cc: Print the stack once we popped after
YYERROR, and before emptying the stack at the end of parsing.
Currently the C and C++ parse traces differ in the order in which the
stack is displayed: bottom up in C, top down in C++. Let's stick to
the C order.
* data/skeletons/stack.hh (stack::iterator, stack::const_iterator)
(begin, end): Be forward, not backward.
Now that we use small integral types, possibly unsigned (e.g.,
unsigned char), to store state numbers, using -1 to denote an empty
state (i.e., a state that stores no semantical value) is very
dangerous: it will be confused with state 255, which might be
non-empty.
Rather than allocating a larger range of state numbers to keep the
empty-state apart, let's use the number of a state known to store no
value. The initial state, numbered 0, seems to fit perfectly the job.
Reported by Frank Heckenbach.
https://lists.gnu.org/archive/html/bug-bison/2019-11/msg00016.html
* data/skeletons/lalr1.cc (empty_state): Be 0.
Reported by Frank Heckenbach.
https://lists.gnu.org/archive/html/bug-bison/2019-11/msg00016.html
The cast is needed when yytranslate_'s argument type is token_type,
i.e., when api.token.constructor is defined.
373. types.at:138: testing lalr1.cc api.value.type=variant api.token.constructor ...
======== Testing with C++ standard flags: ''
../../tests/types.at:138: bison --color=no -fno-caret -o test.cc test.y
../../tests/types.at:138: $CXX $CXXFLAGS $CPPFLAGS $LDFLAGS -o test test.cc $LIBS
stderr:
test.cc:966:16: error: result of comparison of constant 257 with
expression of type 'yy::parser::token_type'
(aka 'yy::parser::token::yytokentype') is always true
[-Werror,-Wtautological-constant-out-of-range-compare]
else if (t <= user_token_number_max_)
~ ^ ~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
It is because it is expected that when api.token.constructor is
defined, only symbol constructors will be used, that yytranslate_ then
takes a token_type. But it is wrong: we still allow literal
characters in this case, as demonstrated by test 373 for instance.
%define api.value.type variant
%define api.token.constructor
%token <std::pair<int, int>> '1' '2';
[...]
static yy::parser::symbol_type yylex ()
{
static char const input[] = "12";
int res = input[toknum++];
typedef yy::parser::symbol_type symbol;
if (res)
return symbol (res, std::make_pair (res - '0', res - '0' + 1));
else
return symbol (res);
}
So let yytranslate_ always take an int, which makes the cast truly
useless.
* data/skeletons/c++.m4, data/skeletons/lalr1.cc (yytranslate_): here.
The C++ implementation of LAC did not skip the $undefined token,
probably because it was not exposed. Expose it, and use clearer
names.
* data/skeletons/c++.m4: Don't define undef_token_ in yytranslate_,
but...
* data/skeletons/lalr1.cc (yy_undef_token_): here.
Use a more precise type to define yy_undef_token_ and yy_error_token_.
Unfortunately we move from a compile-time value defined via an enum to
a static const member. Eventually we should make it constexpr.
Make LAC implementation more alike yacc.c's one.
It is not used at all. We will remove it also from yacc.c, but
later (see TODO).
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java (yyerrcode_):
Remove.
We still have a few old C casts in lalr1.cc, let's get rid of them.
Reported by Frank Heckenbach.
Actually, let's monitor all our casts using easy to grep macros.
Let's use these macros to use the C++ standard casts when we are in
C++.
* data/skeletons/c.m4 (b4_cast_define): New.
* data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, data/skeletons/stack.hh,
* data/skeletons/yacc.c:
Use it and/or its casts.
* tests/actions.at, tests/cxx-type.at,
* tests/glr-regression.at, tests/headers.at, tests/torture.at,
* tests/types.at:
Use YY_CAST instead of C casts.
* configure.ac (warn_cxx): Add -Wold-style-cast.
* doc/bison.texi: Disable it.
This skeleton uses a single stack of state structures, so it is less
likely to benefit from a stack size reduction than yacc.c (which uses
several stacks: state number, value and location). But it will reduce
the size of the LAC stack.
This skeleton was already using int for state numbers, so, contrary to
yacc.c, this brings nothing for large automata.
Overall, it is still nicer to make the skeletons alike.
* data/skeletons/lalr1.cc (state_type): Here.
On the CI with GCC 6:
examples/c++/calc++/parser.cc:845:5: error: 'ptrdiff_t' was not declared in this scope
ptrdiff_t yycount = 0;
^~~~~~~~~
examples/c++/calc++/parser.cc:845:5: note: suggested alternatives:
/usr/include/x86_64-linux-gnu/c++/6/bits/c++config.h:202:28: note: 'std::ptrdiff_t'
typedef __PTRDIFF_TYPE__ ptrdiff_t;
^~~~~~~~~
* data/skeletons/lalr1.cc: Qualify ptrdiff_t and size_t with std::.
This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
* NEWS: Document the API change.
* boostrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l (<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>)
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
* NEWS: Mention this.
* data/skeletons/c.m4 (b4_int_type):
Prefer char if it will do, and prefer signed types to unsigned if
either will do.
* data/skeletons/glr.c (yy_reduce_print): No need to
convert rule line to unsigned long.
(yyrecoverSyntaxError): Put action into an int to
avoid GCC warning of using a char subscript.
* data/skeletons/lalr1.cc (yy_lac_check_, yysyntax_error_):
Prefer ptrdiff_t to size_t.
* data/skeletons/yacc.c (b4_int_type):
Prefer signed types to unsigned if either will do.
* data/skeletons/yacc.c (b4_declare_parser_state_variables):
(YYSTACK_RELOCATE, YYCOPY, yy_lac_stack_realloc, yy_lac)
(yytnamerr, yysyntax_error, yyparse): Prefer ptrdiff_t to size_t.
(YYPTRDIFF_T, YYPTRDIFF_MAXIMUM): New macros.
(YYSIZE_T): Fix "! defined YYSIZE_T" typo.
(YYSIZE_MAXIMUM): Take the minimum of PTRDIFF_MAX and SIZE_MAX.
(YYSIZEOF): New macro.
(YYSTACK_GAP_MAXIMUM, YYSTACK_BYTES, YYSTACK_RELOCATE)
(yy_lac_stack_realloc, yyparse): Use it.
(YYCOPY, yy_lac_stack_realloc): Cast to YYSIZE_T to pacify GCC.
(yy_reduce_print): Use int instead of unsigned long when int
will do.
(yy_lac_stack_realloc): Prefer long to unsigned long when
either will do.
* tests/regression.at: Adjust to these changes.
* upstream/maint:
c++: add copy ctors for compatibility with the IAR compiler
CI: show git status
CI: disable ICC
tests: pass -jN from Make to the test suite
quotearg: avoid leaks
maint: post-release administrivia
Reported by Andreas Damm.
https://savannah.gnu.org/support/?110032
* data/skeletons/lalr1.cc (stack_symbol_type::operator=): New
overload, const, to please the IAR C++ compiler (version ca 2013).
Implement lookahead correction (LAC) for the C++ skeleton. LAC is a
mechanism to make sure that we report the correct list of expected
tokens if a syntax error occurs. So far, LAC was only supported for
the C skeleton "yacc.c".
* data/skeletons/lalr1.cc: Add LAC support.
* doc/bison.texi: Update.
Currently, with --no-lines, instead of "#line file line\n", we emit
"\n". Let's emit nothing.
* data/skeletons/bison.m4 (b4_syncline): Emit at end-of-line when enabled.
* data/skeletons/bison.m4, data/skeletons/c.m4, data/skeletons/glr.cc,
* data/skeletons/lalr1.cc, src/output.c: Use dnl after b4_syncline to
avoid spurious empty lines.
* tests/synclines.at (Sync Lines): Make sure that --no-lines is like
grep -v #line.
* tests/calc.at: Make sure that a rich grammar file behaves properly
with %no-lines.
The variable spec_defines_file denotes the name of the generated
header. Its name is derived from --defines/%defines, whose name in
turn is derived from the fact that the header, in Yacc, contained the
Not only does the header now contain a lot more than just the token
definitions, but we no longer even generate macros, but an enum...
Let's modernize our vocabulary.
* src/files.h, src/files.c (spec_defines_file): Rename as...
(spec_header_file): this.
Reported by Derek Clegg
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00021.html
aux/parser-internal.h:429:12: error: 'syntax_error' has no out-of-line virtual
method definitions; its vtable will be emitted in every translation unit
[-Werror,-Wweak-vtables]
struct syntax_error : std::runtime_error
To avoid this warning, we need syntax_error to have a virtual function
defined in a compilation unit. Let it be the destructor. To comply
with C++98, this dtor should be 'throw()'. Merely making YY_NOEXCEPT
be 'throw()' in C++98 triggers
errors (http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00022.html),
so let's introduce YY_NOTHROW and flag only ~syntax_error with it.
Also, since we now have an explicit dtor, we need to provide an copy
ctor.
* configure.ac (warn_cxx): Add -Wweak-vtables.
* data/skeletons/c++.m4 (YY_NOTHROW): New.
(syntax_error): Declare the dtor, and define the copy ctor.
* data/skeletons/glr.cc, data/skeletons/lalr1.cc (~syntax_error):
Define.
Reported by Derek Clegg.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00004.html
* configure.ac (warn_common): Add -Wimplicit-fallthrough.
This does trigger failures in the test suite.
* data/skeletons/glr.c, data/skeletons/lalr1.cc,
* data/skeletons/yacc.c, tests/c++.at:
Make fall-throws explicit.
This line:
slice<stack_symbol_type, stack_type> slice (yystack_, yylen);
triggers warnings:
parse.h:1790:11: note: shadowed declaration is here
Reported by Frank Heckenbach.
http://lists.gnu.org/archive/html/bug-bison/2019-01/msg00002.html
* configure.ac (warn_c): Move -Wshadow to...
(warn_common): here.
* data/skeletons/stack.hh (slice): Define as an inner class of stack.
* data/skeletons/lalr1.cc: Adjust.
Rename the variable as 'range' instead of 'slice'.
The previous name was historical and inconsistent.
* src/muscle-tab.c (define_directive): Use the proper value passing
syntax, based on the muscle kind.
(muscle_percent_variable_update): Use the right value passing syntax.
Migrate from parser_class_name to api.parser.class.
* data/skeletons: Migrate from parser_class_name to api.parser.class.
* doc/bison.texi (%define Summary): Document both parser_class_name
and api.parser.class.
Promote the latter over the former.
Reported by Askar Safin.
https://lists.gnu.org/archive/html/bison-patches/2019-01/msg00000.html
* data/skeletons/glr.c (yygetToken): Return YYEMPTY when an exception
is thrown.
* data/skeletons/lalr1.cc: Log when an exception is caught.
* tests/c++.at (Syntax error as exception): Be sure to recover from
error before triggering another error.
We used to create a short definition of yy::parser with all the
implementations of its member functions outside. But yy::parser is no
longer short and simple to read. Maintaining each function twice is
painful: a lot of redundancy but different indentation levels, output
which depends on whether we are in a header or not (see
d132c2d545), etc.
Let's simplify this and put the implementations into the class
definition itself.
Discussed in this monologue:
https://lists.gnu.org/archive/html/bison-patches/2018-12/msg00058.html.
* data/skeletons/c++.m4, data/skeletons/lalr1.cc,
* data/skeletons/variant.hh (b4_basic_symbol_constructor_define)
(_b4_token_constructor_declare, b4_token_constructor_declare)
Merge into...
(b4_basic_symbol_constructor_define, _b4_token_constructor_define)
(b4_token_constructor_define): these.
We used to define such auxiliary structures outside the class, mainly
as a matter of style to keep the definition of yy::parser short and
simple. However, now there's a lot more code generated inside the
class definition (e.g., all the token constructors), so the
readability no longer applies.
However, if we move stack (and slice) inside yy::parser, then it
should no longer be needed to change the namespace to have multiple
parsers: changing the class name should suffice.
One common argument against inner classes is that they code bloat. It
hardly applies here, since typically different parsers will have
different semantic value types, hence different actual stack types.
* data/skeletons/lalr1.cc: Invoke b4_stack_define inside yy::parser.