Commit Graph

2551 Commits

Author SHA1 Message Date
Akim Demaille
58302c6079 regen 2019-10-06 17:48:51 +02:00
Akim Demaille
9e6c5328d3 diagnostics: also show suggested %empty
* src/reader.c (grammar_rule_check_and_complete): Suggest to add %empty.
* tests/actions.at, tests/diagnostics.at: Adjust expectations.
2019-10-06 12:15:12 +02:00
Akim Demaille
fec13ce2db diagnostics: sort symbols per location
Because the checking of the grammar is made by phases after the whole
grammar was read, we sometimes have diagnostics that look weird.  In
some case, within one type of checking, the entities are not checked
in the order in which they appear in the file.  For instance, checking
symbols is done on the list of symbols sorted by tag:

    foo.y:1.20-22: warning: symbol BAR is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                    ^~~
    foo.y:1.16-18: warning: symbol QUX is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                ^~~

Let's sort them by location instead:

    foo.y:1.16-18: warning: symbol 'QUX' is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                ^~~
    foo.y:1.20-22: warning: symbol 'BAR' is used, but is not defined as a token and has no rules [-Wother]
        1 | %destructor {} QUX BAR
          |                    ^~~

* src/location.h (location_cmp): Be robust to empty file names.
* src/symtab.c (symbol_cmp): Sort by location.
* tests/input.at: Adjust expectations.
2019-10-06 09:54:25 +02:00
Akim Demaille
be3cf406af diagnostics: suggest fixes for undeclared symbols
From

    input.y:1.17-19: warning: symbol baz is used, but is not defined as a token and has no rules [-Wother]
         1 | %printer {} foo baz
           |                 ^~~

to

    input.y:1.17-19: warning: symbol 'baz' is used, but is not defined as a token and has no rules; did you mean 'bar'? [-Wother]
        1 | %printer {} foo baz
          |                 ^~~
          |                 bar

* bootstrap.conf: We need fstrcmp.
* src/symtab.c (symbol_from_uniqstr_fuzzy): New.
(complain_symbol_undeclared): Use it.
* tests/diagnostics.at (Suggestions): New.
* data/bison-default.css (insertion): Rename as...
(fixit-insert): this, as this is what GCC uses.
2019-10-06 09:54:25 +02:00
Akim Demaille
126c4622de style: isolate complain_symbol_undeclared
* src/symtab.c (complain_symbol_undeclared): New.
Use it.
Use quote on the guilty symbol (like GCC does, and we also do
elsewhere).
* tests/input.at: Adjust.
2019-10-06 09:54:25 +02:00
Akim Demaille
dd64eaf9db style: simplify the handling of symbol and semantic_type tables
Both are stored in a hash, and back in the days, we used to iterate
over these tables using hash_do_for_each.  However, the order of
traversal was not deterministic, which was a nuisance for
deterministic output (and therefore also a problem for tests).  So at
some point (83b60c97ee) we generated a
sorted list of these symbols, and symbols_do actually iterated on that
list.  But we kept the constraints of using hash_do_for_each, which
requires a lot of ceremonial code, and makes it hard/unnatural to
preserve data between iterations (see the next commit).

Alas, this is C, not C++.

Let's remove this abstraction, and directly iterate on the sorted
tables.

* src/symtab.c (symbols_do): Remove.
Adjust callers to use a simple for-loop instead.
(table_sort): New.
(symbols_check_defined): Use it.
(symbol_check_defined_processor, symbol_pack_processor)
(semantic_type_check_defined_processor, symbol_translation_processor):
Remove.
Simplify the corresponding functions (that no longer need to return a
bool).
2019-10-06 09:54:20 +02:00
Akim Demaille
0b585c49ae diagnostics: display suggested update after the caret-info
This commit adds the suggestion in green, on the line below the
caret-and-tildes.

    foo.y:1.1-14: warning: deprecated directive: '%error-verbose', use '%define parse.error verbose' [-Wdeprecated]
        1 | %error-verbose
          | ^~~~~~~~~~~~~~
          | %define parse.error verbose

The current approach, with location_caret_suggestion, is fragile:
there's a protocol of calls to the complain functions which is strict.
We should rather have a richer structure describing the diagnostics,
including with submessages such as the suggestions, passed in the end
to the routines in charge of formatting and printing them.

* src/location.h, src/location.c (location_caret_suggestion): New.
* src/complain.c (deprecated_directive): Use it.
* tests/diagnostics.at, tests/input.at: Adjust expectations.
2019-10-06 08:07:57 +02:00
Akim Demaille
37c4d0b175 diagnostics: isolate caret_set_column
* src/location.c (caret_info): Add width and skip members.
(caret_set_column): New.
Use it.
2019-10-06 08:07:57 +02:00
Akim Demaille
56bcccbc51 diagnostics: isolate caret_set_file
* src/location.c (caret_set_file): New.
Store the current line's length in caret_info.line_len.
Pay attention to fseek's return value.
Extracted from...
(location_caret): here.
2019-10-06 08:07:57 +02:00
Paul Eggert
8f5aaa0e04 Avoid quiet conversion of pointer to bool
* src/location.c (caret_set_file):
* src/scan-code.l (contains_dot_or_dash):
Do not quietly convert pointer to bool, as Oracle Developer Studio
12.6 complains and it is arguably confusing style anyway.
2019-10-05 01:19:39 -07:00
Paul Eggert
b75b055288 Port ARGMATCH_DEFINE_GROUP calls to C99
* src/complain.c, src/getargs.c: Omit ‘;’ after call
to ARGMATCH_DEFINE_GROUP, as C99 does not allow ‘;’ there.
2019-10-05 01:19:39 -07:00
Paul Eggert
67dcef357c regen 2019-10-02 17:11:33 -07:00
Paul Eggert
133edcd248 Prefer signed to unsigned integers
This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
* NEWS: Document the API change.
* boostrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l (<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>)
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. 	Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
2019-10-02 17:11:33 -07:00
Akim Demaille
67bff62e31 diagnostics: get the screen width from the terminal
* bootstrap.conf: We need winsz-ioctl and winsz-termios.
* src/location.c (columns): Use winsize to get the number of
columns.
Code taken from the GNU Coreutils.
* src/location.h, src/location.c (caret_init): New.
* src/complain.c (complain_init): Call it.
* tests/bison.in: Export COLUMNS so that users of tests/bison can
enjoy proper line truncation.
2019-09-22 09:12:08 +02:00
Akim Demaille
5f45cb05f1 diagnostics: don't print ellipsis on the caret line
From

    9 | ...TUVWXYZ  ABCDEFGHIJKLMNOPQRSTUVWXYZ  ABCDEFGHIJKL
      | ...         ^~~~~~~~~~~~~~~~~~~~~~~~~~

to

    9 | ...TUVWXYZ  ABCDEFGHIJKLMNOPQRSTUVWXYZ  ABCDEFGHI...
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~

* src/location.c (location_caret): here.
* tests/diagnostics.at: Adjust expectations.
2019-09-22 09:12:08 +02:00
Akim Demaille
b61b0eb9ac diagnostics: also show truncation at the end of line with "..."
From

    9 | ...TUVWXYZ  ABCDEFGHIJKLMNOPQRSTUVWXYZ  ABCDEFGHIJKL
      | ...         ^~~~~~~~~~~~~~~~~~~~~~~~~~

to

    9 | ...TUVWXYZ  ABCDEFGHIJKLMNOPQRSTUVWXYZ  ABCDEFGHI...
      | ...         ^~~~~~~~~~~~~~~~~~~~~~~~~~

* src/location.c (location_caret): here.
* tests/diagnostics.at: Adjust expectations.
2019-09-22 09:12:08 +02:00
Akim Demaille
f716484627 diagnostics: truncate quoted sources to fit the screen
* src/location.c (min_int, columns): New.
(location_caret): Compute the line width.  Based on it, compute how
many columns must be skipped before the quoted location and truncated
after, to fit the sceen width.
* tests/local.at (AT_QUELL_VALGRIND): Transform into...
(AT_SET_ENV_IF, AT_SET_ENV): these.
Define COLUMNS to protect the test suite from the user's environment.
2019-09-22 09:12:08 +02:00
Akim Demaille
945b917da2 diagnostics: learn how to count column number with multibyte chars
So far diagnostics were cheating: in addition to the 'column' field of
locations (based on actual screen width per multibyte characters and
on tabulation expansion), the scanner sets the 'byte' field.
Diagnostics used this byte count to decide where to insert (color)
style.

We want to be able to truncate the quoted lines when there are too
wide to fit the screen.  This requires that the diagnostics learn how
to count columns, the byte-in-boundary trick no longer works.

Bytes are still used for fix-its.

* bootstrap.conf: We need mbfile for mbf_getc.
* src/location.c (caret_info): We need an mbfile.
(caret_set_file): Initialize it.
(caret_getc): Convert to mbfile.
(location_caret): Instead of relying on the byte position to decide
where to insert the color style, count the current column using
boundary_compute.
2019-09-22 09:12:08 +02:00
Akim Demaille
1ef407d923 diagnostics: style: rename member for clariy
* src/location.c (caret_info): Now that we no longer have a 'file'
member (see previous commit), rename 'source' as 'file'.
2019-09-22 09:12:08 +02:00
Akim Demaille
576b863e91 diagnostics: style: use a boundary to track the caret_info
* src/location.c (caret_info): Replace file and line with pos, a
boundary.  This will allow us to use features of the boundary type,
such as boundary_compute.
2019-09-22 09:12:08 +02:00
Akim Demaille
2274c34e91 diagnostics: extract boundary_compute from location_compute
The handling of the contributions of the tabulations in the columns is
burried inside location_compute.  We will soon be willing to use the
boundary part of the computation (to compute the current column number
each time we read a multibyte char).

* src/location.c (boundary_compute): New, extracted from...
(location_compute): here.
2019-09-22 09:12:08 +02:00
Akim Demaille
fccab9bc40 diagnostics: style: add caret_set_file
To make the following commits easier to read.

* src/location.c (caret_set_file): New.
2019-09-22 09:12:08 +02:00
Akim Demaille
488607534a diagnostics: style: minor changes
* src/location.c (location_caret): Factor two branches of an if.
2019-09-22 09:12:08 +02:00
Akim Demaille
4901ee115b quotearg: avoid leaks
Reported by Tomasz Kłoczko.
https://lists.gnu.org/archive/html/bug-bison/2019-09/msg00008.html

* src/main.c (main): Free quotearg's memory later.
2019-09-21 15:01:45 +02:00
Akim Demaille
569125a6bf regen 2019-09-14 10:09:08 +02:00
Akim Demaille
8ac28ba1f0 parser: use api.token.raw
* src/parse-gram.y: Here.
2019-09-14 10:09:08 +02:00
Akim Demaille
8c18e3f18c api.token.raw: cannot be used with character literals
* src/parse-gram.y (CHAR): api.token.raw and character literals are
mutually exclusive.
* tests/input.at (Character literals and api.token.raw): New.
2019-09-14 10:09:08 +02:00
Akim Demaille
32dff87c1d diagnostics: fix use of complain_indent
* src/symtab.c (symbol_class_set): Here.
* tests/diagnostics.at, tests/input.at, tests/regression.at: Adjust
expectations.
2019-09-14 09:47:49 +02:00
Akim Demaille
19da501e06 input: stop treating lone CRs as end-of-lines
We used to treat lone CRs (\r, aka ^M) as regular NLs (\n), probably
to please Classic MacOS.  As of today, it makes more sense to treat \r
like a plain white space character.

https://lists.gnu.org/archive/html/bison-patches/2019-09/msg00027.html

* src/scan-gram.l (no_cr_read): Remove.  Instead, use...
(eol): this new abbreviation denoting end-of-line.
* src/location.c (caret_getc): New.
(location_caret): Use it.
* tests/diagnostics.at (Carriage return): Adjust expectations.
(CR NL): New.
2019-09-14 09:23:47 +02:00
Akim Demaille
4eed3a0f0c diagnostics: beware of unexpected EOF when quoting the source file
When the input file contains lone CRs (aka, ^M, \r), the locations see
a new line.  Diagnostics look only at \n as end-of-line, so sometimes
there is an offset in diagnostics.  Worse yet: sometimes we loop
endlessly waiting for \n to come from a continuous stream of EOF.

Fix that:
- check for EOF
- beware not to call end_use_class if begin_use_class was not
  called (which would abort).  This could happen if the actual
  line is shorter that the expected one.

Prompted by a (private) report from Marc Schönefeld.

* src/location.c (location_caret): here.
* tests/diagnostics.at (Carriage return): New.
2019-09-12 07:02:46 +02:00
Akim Demaille
84a6621c78 gnulib: update
Contains the creation of the xhash module.
https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00046.html

* src/muscle-tab.c, src/state.c, src/symtab.c, src/uniqstr.c:
Use hash_xinitialize.
2019-09-11 09:07:27 +02:00
Akim Demaille
a9499e6ea2 regen 2019-09-08 08:58:55 +02:00
Akim Demaille
7d701f4378 fix: don't die when EOF token is defined twice
With

    %token EOF 0 EOF 0

we get

    input.y:3.14-16: warning: symbol EOF redeclared [-Wother]
        3 | %token EOF 0 EOF 0
          |              ^~~
    input.y:3.8-10: previous declaration
        3 | %token EOF 0 EOF 0
          |        ^~~
    Assertion failed: (nsyms == ntokens + nvars), function check_and_convert_grammar,
        file /Users/akim/src/gnu/bison/src/reader.c, line 839.

Reported by Marc Schönefeld.

* src/symtab.c (symbol_user_token_number_set): Register only the
first definition of the end of input token.
* tests/input.at (Symbol redeclared): Check that case.
2019-09-07 17:09:43 +02:00
Akim Demaille
378963b139 tests: check token redeclaration
* src/symtab.c (symbol_class_set): Report previous definitions when
redeclared.
* tests/input.at (Symbol redeclared): New.
2019-09-07 17:09:43 +02:00
Akim Demaille
989a7aa865 check for memory exhaustion
hash_initialize returns NULL when out of memory.  Check for it, and
die cleanly instead of crashing.

Reported by 江 祖铭 (Zu-Ming Jiang).
https://lists.gnu.org/archive/html/bug-bison/2019-08/msg00015.html

* src/muscle-tab.c, src/state.c, src/symtab.c, src/uniqstr.c:
Check the value returned by hash_initialize.
2019-09-01 17:53:22 +02:00
László Várady
6d81b91ae0 diagnostics: avoid global variables
* src/complain.c (indent_ptr): Remove.
(error_message, complains): Take indent as an argument.
Adjust callers.
2019-08-18 09:40:58 -05:00
László Várady
9145bd0b61 diagnostics: fix invalid error message indentation
https://lists.gnu.org/archive/html/bison-patches/2019-08/msg00007.html

When Bison is started with a flag that suppresses warning messages, the
error_message() function can produce a few gigabytes of indentation
because of a dangling pointer.

* src/complain.c (error_message): Don't reset indent_ptr here, but...
(complain_indent): here.
* tests/diagnostics.at (Indentation with message suppression): Check
this case.
2019-08-18 09:40:44 -05:00
Akim Demaille
cbdc22af10 diagnostics: use the modern argmatch interface
* src/complain.h (warnings): Remove Werror.
Adjust dependencies.
Sort.
Remove useless comments (see the doc in argmatch group).
* src/complain.c (warnings_args, warnings_types): Remove.
(warning_argmatch): Use argmatch_warning_value.
(warnings_print_categories): Use argmatch_warning_argument.
2019-07-26 07:57:15 +02:00
Akim Demaille
e29ac453d0 --fixed-output-files: detach from --yacc
See the previous commit.  This option should be removed, -o suffices.

* src/getargs.c (FIXED_OUTPUT_FILES): New.
Add support for it.
(getargs): Define loc, and use it.
This is safer when we need to pass a pointer to a location.
2019-07-07 15:59:54 +02:00
Akim Demaille
44a56b20ac %fixed-output-files: detach from %yacc
The name fixed-output-files is pretty clear: generate y.tab.c, as Yacc
does.  So let's detach this from %yacc which does more: it requires
POSIX Yacc behavior.

This directive is obsolete since December 29th 2001
8c9a50bee1.  It does not show in the
doc.  I don't want to spend more time on improving its diagnostics, it
could be removed just as well as far as I'm concerned.

* src/scan-gram.l, src/parse-gram.y (%fixed-output-files): Detach from
%yacc.
2019-07-07 15:54:20 +02:00
Akim Demaille
f99956b550 style: clarify control flow
* src/getargs.c (language_argmatch): Initialize msg.
Check it instead of relying on a return.
2019-07-07 15:01:45 +02:00
Akim Demaille
1f02348d6c remove MS-DOS support
DJGPP support was dropped in Bison 3.3
(c239e53bab).

AS_FILE_NAME was introduced in
ae40480115.

* src/getargs.c (AS_FILE_NAME): Remove.
2019-07-07 14:38:49 +02:00
Akim Demaille
421ff03018 style: declare options in the same order as in --help
* src/getargs.c (long_options): here.
2019-07-07 14:27:39 +02:00
Akim Demaille
5d3468e0d1 regen 2019-07-07 14:03:37 +02:00
Akim Demaille
9bdefd7984 style: comment change
* src/getargs.c: here.
2019-07-07 12:13:30 +02:00
Akim Demaille
d233a2e314 doc: remove the --report=look-aheads alias
Years ago we moved from 'look-ahead' to 'lookahead', and that alias
was kept for backward compatibility.  But now that we use argmatch to
generate the documentation, that value clutters the doc.

* src/getargs.c (argmatch_report_args): Remove the
--report=look-aheads alias.
2019-07-07 08:11:35 +02:00
Akim Demaille
d90023af5f doc: fix inaccuracies wrt --define and --force-define
The doc says that -Dfoo=bar is the same as %define foo "bar".  It is
not: the quotes are not added (and it makes a difference).

* doc/bison.texi (Tuning the Parser): Fix the definition of -D/-F
* src/getargs.c (usage): Likewise.
2019-07-07 08:11:35 +02:00
Akim Demaille
964c6508b1 doc: put diagnostics related options together
* doc/bison.texi (Diagnostics): New section.
Move --warning, --color and --style there.
* src/getargs.c (usage): Likewise.
2019-07-07 08:11:35 +02:00
Akim Demaille
4e3c6f59cc doc: move -y's documentation into "Tuning the Parser"
Let's clarify --help: use clearer "section" names, as in the doc.
Move --yacc to where it belongs.

* src/getargs.c (usage): Rename "Parser" as "Tuning the Parser", as in
the doc.
Rename "Output" as "Output Files"
Move --yacc to "Tuning the Parser".
* doc/bison.texi: Likewise.
2019-07-07 08:01:37 +02:00
Akim Demaille
801582b410 doc: document colorized diagnostics
* src/getargs.c (argmatch_color_group): New.
(usage): Document --color and --style.
* doc/bison.texi (Bison Options): Split into three subsections.
Document --color and --style.
2019-07-07 08:01:37 +02:00