Both are stored in a hash, and back in the day, we used to iterate
over these tables using hash_do_for_each. However, the order of
traversal was not deterministic, which was a nuisance for
deterministic output (and therefore also a problem for tests). So at
some point (83b60c97ee) we generated a
sorted list of these symbols, and symbols_do actually iterated on that
list. But we kept the constraints of using hash_do_for_each, which
requires a lot of ceremonial code, and makes it hard/unnatural to
preserve data between iterations (see the next commit).
Alas, this is C, not C++.
Let's remove this abstraction, and directly iterate on the sorted
tables.
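A minimal sketch of the idea, assuming a hypothetical 'sym' type and
helper names (none of them are Bison's): sort the table once, then use
a plain for-loop, so that state such as a counter is an ordinary local
variable instead of something smuggled through a callback argument.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Bison's symbol type.  */
typedef struct
{
  const char *tag;
  int number;
} sym;

/* qsort comparator: order symbols by tag.  */
static int
sym_cmp (const void *a, const void *b)
{
  const sym *const *sa = a;
  const sym *const *sb = b;
  return strcmp ((*sa)->tag, (*sb)->tag);
}

/* Sort the table once; later traversals are deterministic.  */
static void
sym_table_sort (sym **table, size_t n)
{
  qsort (table, n, sizeof *table, sym_cmp);
}

/* Number the symbols in sorted order.  The counter is plain local
   state carried across iterations: no callback, no void* argument.
   Return the number of symbols numbered.  */
static int
sym_table_number (sym **table, size_t n)
{
  int next = 0;
  for (size_t i = 0; i < n; ++i)
    table[i]->number = next++;
  return next;
}
```

With hash_do_for_each, the counter would have to live in a struct
passed through the processor's void* parameter.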
* src/symtab.c (symbols_do): Remove.
Adjust callers to use a simple for-loop instead.
(table_sort): New.
(symbols_check_defined): Use it.
(symbol_check_defined_processor, symbol_pack_processor)
(semantic_type_check_defined_processor, symbol_translation_processor):
Remove.
Simplify the corresponding functions (that no longer need to return a
bool).
This commit adds the suggestion in green, on the line below the
caret-and-tildes.
    foo.y:1.1-14: warning: deprecated directive: '%error-verbose', use '%define parse.error verbose' [-Wdeprecated]
        1 | %error-verbose
          | ^~~~~~~~~~~~~~
          | %define parse.error verbose
The current approach, with location_caret_suggestion, is fragile:
there's a protocol of calls to the complain functions which is strict.
We should rather have a richer structure describing the diagnostics,
including with submessages such as the suggestions, passed in the end
to the routines in charge of formatting and printing them.
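One possible shape for such a structure, as a sketch only: the
'diagnostic' type and 'diagnostic_format' below are hypothetical and do
not exist in Bison. The point is that the message and its optional
suggestion travel together, so the formatter no longer depends on the
order of calls.

```c
#include <stdio.h>

/* Hypothetical richer diagnostic; none of these names exist in
   Bison.  */
typedef struct
{
  const char *location;   /* E.g., "foo.y:1.1-14".  */
  const char *message;
  const char *suggestion; /* NULL when there is none.  */
} diagnostic;

/* Format D into BUF (of size SIZE).  Return snprintf's result.  */
static int
diagnostic_format (char *buf, size_t size, const diagnostic *d)
{
  if (d->suggestion != NULL)
    return snprintf (buf, size, "%s: warning: %s, use '%s'",
                     d->location, d->message, d->suggestion);
  return snprintf (buf, size, "%s: warning: %s",
                   d->location, d->message);
}
```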
* src/location.h, src/location.c (location_caret_suggestion): New.
* src/complain.c (deprecated_directive): Use it.
* tests/diagnostics.at, tests/input.at: Adjust expectations.
* src/location.c (caret_set_file): New.
Store the current line's length in caret_info.line_len.
Pay attention to fseek's return value.
Extracted from...
(location_caret): here.
* src/location.c (caret_set_file):
* src/scan-code.l (contains_dot_or_dash):
Do not quietly convert pointer to bool, as Oracle Developer Studio
12.6 complains and it is arguably confusing style anyway.
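For illustration, with a hypothetical helper (not the actual
contains_dot_or_dash): compare the pointer against NULL explicitly
rather than relying on the implicit conversion to bool.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does S contain a dot or a dash?  */
static bool
has_dot_or_dash (const char *s)
{
  /* Implicit form, which some compilers complain about:
       return strpbrk (s, ".-");
     The explicit comparison states the intent.  */
  return strpbrk (s, ".-") != NULL;
}
```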
This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
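A small sketch of the difference, in plain C with hypothetical helper
names (this is not code from the patch, and it does not use the
intprops macros): unsigned arithmetic wraps silently, whereas a signed
computation can be checked, by hand or by a sanitizer.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned subtraction wraps around modulo 2^N.  That is well
   defined, hence invisible to overflow checkers.  */
static unsigned
unsigned_diff (unsigned a, unsigned b)
{
  return a - b;
}

/* Store A + B in *RES and return true, or return false if the sum
   would overflow int.  The test is done before adding, so no
   undefined behavior is ever triggered.  */
static bool
add_checked (int a, int b, int *res)
{
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return false;
  *res = a + b;
  return true;
}
```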
* NEWS: Document the API change.
* bootstrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l (<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>)
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
* bootstrap.conf: We need winsz-ioctl and winsz-termios.
* src/location.c (columns): Use winsize to get the number of
columns.
Code taken from the GNU Coreutils.
* src/location.h, src/location.c (caret_init): New.
* src/complain.c (complain_init): Call it.
* tests/bison.in: Export COLUMNS so that users of tests/bison can
enjoy proper line truncation.
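A portable sketch of the COLUMNS side of this, with hypothetical
function names and an assumed fallback of 80. The winsize query
(ioctl with TIOCGWINSZ) is deliberately left out to keep the example
standard C, and the precedence between the two sources is an
assumption here.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a COLUMNS-style value.  Return FALLBACK when VALUE is NULL,
   empty, not a positive decimal integer, or too large for int.  */
static int
columns_from_string (const char *value, int fallback)
{
  if (value == NULL || *value == '\0')
    return fallback;
  char *end;
  errno = 0;
  long n = strtol (value, &end, 10);
  if (errno != 0 || *end != '\0' || n <= 0 || n > INT_MAX)
    return fallback;
  return (int) n;
}

/* Screen width: $COLUMNS if usable, else 80 (assumed default).  */
static int
screen_columns (void)
{
  return columns_from_string (getenv ("COLUMNS"), 80);
}
```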
* src/location.c (min_int, columns): New.
(location_caret): Compute the line width. Based on it, compute how
many columns must be skipped before the quoted location and truncated
after, to fit the screen width.
* tests/local.at (AT_QUELL_VALGRIND): Transform into...
(AT_SET_ENV_IF, AT_SET_ENV): these.
Define COLUMNS to protect the test suite from the user's environment.
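To make the "columns skipped before" part concrete, here is one
possible policy, not necessarily the one location_caret implements:
shift the line left just enough to show the end of the location, but
never hide its start. Columns are assumed 0-based with an exclusive
end, and columns_to_skip is a hypothetical name.

```c
/* Smaller of A and B.  */
static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* How many leading columns to drop so that the location
   [LOC_START, LOC_END) fits in WIDTH columns.  Columns are 0-based
   and LOC_END is exclusive.  */
static int
columns_to_skip (int loc_start, int loc_end, int width)
{
  if (loc_end <= width)
    return 0;                   /* Already visible.  */
  /* Show the end of the location, but never hide its start.  */
  return min_int (loc_start, loc_end - width);
}
```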
So far diagnostics were cheating: in addition to the 'column' field of
locations (based on actual screen width per multibyte characters and
on tabulation expansion), the scanner sets the 'byte' field.
Diagnostics used this byte count to decide where to insert (color)
style.
We want to be able to truncate the quoted lines when they are too
wide to fit the screen. This requires that the diagnostics learn how
to count columns; the byte-in-boundary trick no longer works.
Bytes are still used for fix-its.
* bootstrap.conf: We need mbfile for mbf_getc.
* src/location.c (caret_info): We need an mbfile.
(caret_set_file): Initialize it.
(caret_getc): Convert to mbfile.
(location_caret): Instead of relying on the byte position to decide
where to insert the color style, count the current column using
boundary_compute.
* src/location.c (caret_info): Replace file and line with pos, a
boundary. This will allow us to use features of the boundary type,
such as boundary_compute.
The handling of the contributions of the tabulations in the columns is
buried inside location_compute. We will soon want to use the
boundary part of the computation (to compute the current column number
each time we read a multibyte char).
* src/location.c (boundary_compute): New, extracted from...
(location_compute): here.
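A simplified sketch of such a column computation, under assumptions
that are this example's own: the input is valid UTF-8, every code point
is one column wide, and tab stops are every 8 columns. The function
name is hypothetical.

```c
#include <stddef.h>

/* Column (1-based) reached after reading the N bytes of S, starting
   from column 1.  Assumes valid UTF-8, one column per code point,
   and tab stops every 8 columns.  */
static int
column_after (const char *s, size_t n)
{
  int column = 1;
  for (size_t i = 0; i < n; ++i)
    {
      unsigned char c = (unsigned char) s[i];
      if (c == '\t')
        /* Advance to the next tab stop.  */
        column += 8 - (column - 1) % 8;
      else if ((c & 0xC0) != 0x80)
        /* Count every byte but UTF-8 continuation bytes.  */
        ++column;
    }
  return column;
}
```

Bytes and columns agree on plain ASCII, and diverge as soon as a tab
or a multibyte character shows up.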
We used to treat lone CRs (\r, aka ^M) as regular NLs (\n), probably
to please Classic MacOS. As of today, it makes more sense to treat \r
like a plain white space character.
https://lists.gnu.org/archive/html/bison-patches/2019-09/msg00027.html
* src/scan-gram.l (no_cr_read): Remove. Instead, use...
(eol): this new abbreviation denoting end-of-line.
* src/location.c (caret_getc): New.
(location_caret): Use it.
* tests/diagnostics.at (Carriage return): Adjust expectations.
(CR NL): New.
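One plausible reading of such a getc wrapper, sketched on top of stdio
rather than on Bison's actual reader (getc_smash_crnl is a hypothetical
name): a CR NL pair is returned as a single \n, and a lone CR is
returned unchanged, as an ordinary character.

```c
#include <stdio.h>

/* Like getc, but return a CR NL pair as a single '\n'.  A lone CR is
   returned unchanged.  */
static int
getc_smash_crnl (FILE *f)
{
  int c = getc (f);
  if (c == '\r')
    {
      int next = getc (f);
      if (next == '\n')
        return '\n';
      /* Not a CR NL pair: put back what we peeked at.  */
      if (next != EOF)
        ungetc (next, f);
    }
  return c;
}
```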
When the input file contains lone CRs (aka, ^M, \r), the locations see
a new line. Diagnostics look only at \n as end-of-line, so sometimes
there is an offset in diagnostics. Worse yet: sometimes we loop
endlessly waiting for \n to come from a continuous stream of EOF.
Fix that:
- check for EOF
- take care not to call end_use_class if begin_use_class was not
called (which would abort). This could happen if the actual
line is shorter than the expected one.
Prompted by a (private) report from Marc Schönefeld.
* src/location.c (location_caret): here.
* tests/diagnostics.at (Carriage return): New.
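The EOF check boils down to the following pattern, shown here as a
stand-alone sketch (skip_line is a hypothetical helper, not the code
in location_caret): a loop that waits for \n must also stop on EOF,
otherwise a last line without a newline spins forever.

```c
#include <stdio.h>

/* Skip the rest of the current line of F.  Return the number of
   bytes consumed, including the '\n' if there is one.  Stopping on
   EOF as well as on '\n' keeps the loop finite.  */
static long
skip_line (FILE *f)
{
  long n = 0;
  for (;;)
    {
      int c = getc (f);
      if (c == EOF)
        break;
      ++n;
      if (c == '\n')
        break;
    }
  return n;
}
```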
With
    %token EOF 0 EOF 0
we get
    input.y:3.14-16: warning: symbol EOF redeclared [-Wother]
        3 | %token EOF 0 EOF 0
          |              ^~~
    input.y:3.8-10: previous declaration
        3 | %token EOF 0 EOF 0
          |        ^~~
    Assertion failed: (nsyms == ntokens + nvars), function check_and_convert_grammar,
    file /Users/akim/src/gnu/bison/src/reader.c, line 839.
Reported by Marc Schönefeld.
* src/symtab.c (symbol_user_token_number_set): Register only the
first definition of the end of input token.
* tests/input.at (Symbol redeclared): Check that case.
hash_initialize returns NULL when out of memory. Check for it, and
die cleanly instead of crashing.
Reported by 江 祖铭 (Zu-Ming Jiang).
https://lists.gnu.org/archive/html/bug-bison/2019-08/msg00015.html
* src/muscle-tab.c, src/state.c, src/symtab.c, src/uniqstr.c:
Check the value returned by hash_initialize.
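The pattern, sketched with a generic stand-in so that the example does
not depend on gnulib (check_alloc is a hypothetical name, and the
message and exit status are assumptions): wrap the call whose result
may be NULL, and die with a message instead of dereferencing it later.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return P if it is non-null; otherwise report memory exhaustion and
   exit.  A stand-in for guarding an allocator that returns NULL on
   failure, as hash_initialize does.  */
static void *
check_alloc (void *p)
{
  if (p == NULL)
    {
      fputs ("memory exhausted\n", stderr);
      exit (EXIT_FAILURE);
    }
  return p;
}
```

A call site would then read something like
table = check_alloc (hash_initialize (...)).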
https://lists.gnu.org/archive/html/bison-patches/2019-08/msg00007.html
When Bison is started with a flag that suppresses warning messages, the
error_message() function can produce a few gigabytes of indentation
because of a dangling pointer.
* src/complain.c (error_message): Don't reset indent_ptr here, but...
(complain_indent): here.
* tests/diagnostics.at (Indentation with message suppression): Check
this case.
See the previous commit. This option should be removed, -o suffices.
* src/getargs.c (FIXED_OUTPUT_FILES): New.
Add support for it.
(getargs): Define loc, and use it.
This is safer when we need to pass a pointer to a location.
The name fixed-output-files is pretty clear: generate y.tab.c, as Yacc
does. So let's detach this from %yacc which does more: it requires
POSIX Yacc behavior.
This directive has been obsolete since December 29th 2001
(8c9a50bee1). It does not show in the
doc. I don't want to spend more time on improving its diagnostics; it
could be removed just as well as far as I'm concerned.
* src/scan-gram.l, src/parse-gram.y (%fixed-output-files): Detach from
%yacc.
Years ago we moved from 'look-ahead' to 'lookahead', and that alias
was kept for backward compatibility. But now that we use argmatch to
generate the documentation, that value clutters the doc.
* src/getargs.c (argmatch_report_args): Remove the
--report=look-aheads alias.
The doc says that -Dfoo=bar is the same as %define foo "bar". It is
not: the quotes are not added (and it makes a difference).
* doc/bison.texi (Tuning the Parser): Fix the definition of -D/-F.
* src/getargs.c (usage): Likewise.
Let's clarify --help: use clearer "section" names, as in the doc.
Move --yacc to where it belongs.
* src/getargs.c (usage): Rename "Parser" as "Tuning the Parser", as in
the doc.
Rename "Output" as "Output Files".
Move --yacc to "Tuning the Parser".
* doc/bison.texi: Likewise.
It can now generate the usage message.
* src/complain.h (feature_fixit_parsable): Rename as...
(feature_fixit): this, for column economy.
Adjust dependencies.
(warning_usage): New.
Use it.
* src/complain.h, src/complain.c, src/getargs.h, src/getargs.c:
Use ARGMATCH_DEFINE_GROUP instead of the older interface.
The code is inconsistent: sometimes we pass by value, sometimes by
reference. Let's stick to the latter, which is more conventional for
large values in C.
* src/scan-code.l: Pass locations by reference.
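To illustrate with a hypothetical, simplified location type (four
ints, not Bison's real definition): by value the whole struct is copied
at each call, by reference only a pointer is passed, and const records
that the callee does not modify it.

```c
/* Hypothetical, simplified location; Bison's real type differs.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} loc;

/* By value: the whole struct is copied at each call.  */
static int
loc_lines_by_value (loc l)
{
  return l.last_line - l.first_line + 1;
}

/* By reference: only a pointer is passed, and const documents that
   the callee leaves *L untouched.  */
static int
loc_lines (const loc *l)
{
  return l->last_line - l->first_line + 1;
}
```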
Sadly enough, AFAIK, there were never answers to the "More user
feedback will help to stabilize it" sentences. Remove them.
* src/getargs.c: IELR, canonical LR and XML output are here to stay,
and they are no more experimental than some other features.
* doc/bison.texi: Likewise.
Also remove "experimental" warning for Java, LAC, LR tuning options,
and named references.