Commit Graph

7004 Commits

Author SHA1 Message Date
Akim Demaille
00c80bc96c yacc.c: use yysymbol_type_t instead of int for yytoken
Now that we have a proper type for internal symbol numbers, let's use
it.  More code needs conversion, e.g., printers and destructors, but
they are shared with glr.c, which is not ready yet for this change.

It will also help us deal with warnings such as (GCC9 on GNU/Linux):

    input.c: In function 'int yyparse()':
    input.c:475:37: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
      475 |   (0 <= (YYX) && (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYSYMBOL_YYUNDEF)
          |    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    input.c:1024:17: note: in expansion of macro 'YYTRANSLATE'
     1024 |       yytoken = YYTRANSLATE (yychar);
          |                 ^~~~~~~~~~~

* data/skeletons/yacc.c (yytranslate, yysymbol_name)
(yyparse_context_t, yyexpected_tokens, yypstate_expected_tokens)
(yysyntax_error_arguments):
Use yysymbol_type_t instead of int.
2020-04-01 08:31:48 +02:00
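The warning quoted in the commit above comes from a conditional expression whose two arms have different types: an integer read from a table, and an enumerator. The following is a minimal sketch of that pattern and of one way to make both arms agree. It assumes a toy grammar with three symbols; `demo_symbol_t`, `demo_translate_table`, `demo_translate` and `DEMO_MAXUTOK` are hypothetical names, not the skeleton's actual code.

```c
/* Hypothetical internal symbol numbers for a toy grammar.  */
typedef enum
{
  DEMO_SYM_EOF = 0,
  DEMO_SYM_UNDEF = 1,
  DEMO_SYM_NUM = 2
} demo_symbol_t;

/* Largest external token number the toy scanner may return.  */
enum { DEMO_MAXUTOK = 2 };

/* Maps external token numbers (the index) to internal symbol numbers.
   The table stores small integers, not enumerators.  */
static const signed char demo_translate_table[] =
{
  DEMO_SYM_EOF, DEMO_SYM_NUM, DEMO_SYM_UNDEF
};

/* Thanks to the cast, both arms of the conditional have type
   demo_symbol_t, so there is no enum/int mix left for a C++ compiler
   to warn about.  */
demo_symbol_t
demo_translate (int token)
{
  return (0 <= token && token <= DEMO_MAXUTOK
          ? (demo_symbol_t) demo_translate_table[token]
          : DEMO_SYM_UNDEF);
}
```

Out-of-range token numbers, negative ones included, map to the "undefined" symbol, as the `YYTRANSLATE` macro in the quoted diagnostic does.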
Akim Demaille
f62f1db298 regen 2020-04-01 08:31:48 +02:00
Akim Demaille
3ba001baac yacc.c: introduce an enum that defines the symbol's number
There are a number of advantages in exposing the symbol (internal)
numbers:

- custom error messages can use them to decide how to represent a
  given symbol, or a set of symbols.

- we need something similar in uses of yyexpected_tokens.  For
  instance, currently, bistromathic's completion() reads:

    int ntokens = expected_tokens (line, tokens, YYNTOKENS);
    [...]
    for (int i = 0; i < ntokens; ++i)
      if (tokens[i] == YYTRANSLATE (TOK_VAR))
      [...]
      else if (tokens[i] == YYTRANSLATE (TOK_FUN))
      [...]
      else
      [...]

- now that it's a compile-time expression, we can easily build static
  tables, switch, etc.

- some users depended on the ability to get the token number from a
  symbol to write test cases for their scanners.  But Bison 3.5
  removed the table this feature depended upon (a reverse
  yytranslate).  Now they can check against the actual symbol number,
  without having to pay (in space and time) for a conversion.
  See https://lists.gnu.org/r/bug-bison/2020-01/msg00001.html, and
  https://lists.gnu.org/archive/html/bug-bison/2020-03/msg00015.html.

- it helps us clearly separate the internal symbol numbers from the
  external token numbers, whose difference is sometimes blurred in the
  code when values coincide (e.g. "yychar = yytoken = YYEOF").

- it allows us to get rid of ugly macros with inconsistent names such
  as YYUNDEFTOK and YYTERROR, and to group related definitions
  together.

- similarly it provides a clean access to the $accept symbol (which
  proves convenient in a current experimentation of mine with several
  %start symbols).

Let's declare this type as a private type (in the *.c file, not
the *.h one).  So it does not need to be influenced by the api prefix.

* data/skeletons/bison.m4 (b4_symbol_sid): New.
(b4_symbol): Use it.
* data/skeletons/c.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/yacc.c: Use b4_declare_symbol_enum.
(YYUNDEFTOK, YYTERROR): Remove.
Use the corresponding symbol enum instead.
2020-04-01 08:31:33 +02:00
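The "static tables, switch" advantage listed in the commit above can be sketched as follows. This is only an illustration under assumed names: the enumerators, `demo_label` and `demo_needs_name_lookup` are hypothetical, and a real parser would use the enum generated by the skeleton instead.

```c
#include <stddef.h>

/* Hypothetical symbol numbers; DEMO_NSYMS counts them.  */
typedef enum
{
  DEMO_SYM_EOF,
  DEMO_SYM_VAR,
  DEMO_SYM_FUN,
  DEMO_SYM_NUM,
  DEMO_NSYMS
} demo_symbol_t;

/* A static table indexed by symbol number.  Its size is known at
   compile time because the enumerators are constant expressions.
   Entries are listed in the same order as the enumerators.  */
static const char *const demo_symbol_label[DEMO_NSYMS] =
{
  "end of input",   /* DEMO_SYM_EOF */
  "variable",       /* DEMO_SYM_VAR */
  "function",       /* DEMO_SYM_FUN */
  "number"          /* DEMO_SYM_NUM */
};

/* Returns the label of SYM, or NULL if SYM is out of range.  */
const char *
demo_label (demo_symbol_t sym)
{
  return (unsigned) sym < DEMO_NSYMS ? demo_symbol_label[sym] : NULL;
}

/* Returns 1 for the symbols a completion routine would expand from a
   table of names, 0 for the others.  A switch replaces the chain of
   comparisons against translated token numbers.  */
int
demo_needs_name_lookup (demo_symbol_t sym)
{
  switch (sym)
    {
    case DEMO_SYM_VAR:
    case DEMO_SYM_FUN:
      return 1;
    default:
      return 0;
    }
}
```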
Akim Demaille
4140320a0a style: comment changes about token numbers
* data/skeletons/bison.m4, data/skeletons/c.m4: here.
2020-03-30 08:41:12 +02:00
Akim Demaille
af19fd7e0f tests: recheck: work properly when the test suite was interrupted
* tests/local.mk (recheck): Look at the per-test logs, not the overall
log, which, when interrupted, contains only information about... the
tests that passed.
2020-03-30 08:41:12 +02:00
Akim Demaille
2c74872991 java: move away from _ for internationalization
The "_" is becoming a keyword in Java, which causes tons of warnings
currently in our test suite.  GNU Gettext is now using "i18n" instead
of "_"
(https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=commitdiff;h=e89fea36545f27487d9652a13e6a0adbea1117d0).

* data/skeletons/java.m4: Use "i18n", not "_".
* examples/java/calc/Calc.y, tests/calc.at: Adjust.
2020-03-30 08:03:10 +02:00
Akim Demaille
50517d578c regen 2020-03-28 15:13:27 +01:00
Akim Demaille
59d820d1ef c: use YYNOMEM instead of -2
See 84b1972c96.

* data/skeletons/glr.c, data/skeletons/yacc.c (YYNOMEM): New.
Use it.
2020-03-28 15:13:27 +01:00
Akim Demaille
90f0500ef8 todo: update
* TODO (Token Number): We have to clean this.
(Naming conventions, Symbol numbers): New.
(Bad styling): Addressed in e21ff47f5d.
2020-03-28 15:13:27 +01:00
Akim Demaille
17a9542c4f regen 2020-03-28 15:13:27 +01:00
Akim Demaille
b7045aa706 java: make yysyntaxErrorArguments a private detail
* data/skeletons/lalr1.java (yysyntaxErrorArguments): Move it from the
context, to the parser object.
Generate only for detailed and verbose error messages.
* tests/local.at (AT_YYERROR_DEFINE(java)): Use yyexpectedTokens
instead.
2020-03-28 15:13:27 +01:00
Akim Demaille
ee56b6e0f2 skeletons: make yysyntax_error_arguments a private detail
We could just "inline yysyntax_error_arguments back" in the routines
it was originally extracted from, but I think the code is nicer to
read this way.

* data/skeletons/glr.c (yysyntax_error_arguments): Generate only for
detailed and verbose error messages.
* data/skeletons/yacc.c: Likewise.
* data/skeletons/lalr1.cc (parser::context::yysyntax_error_arguments):
Move as...
(parser::yysyntax_error_arguments_): this.
And only for detailed and verbose error messages.
2020-03-28 15:13:27 +01:00
Akim Demaille
1edc98f793 lalr1.cc: avoid using yysyntax_error_arguments
* data/skeletons/lalr1.cc (context::token): New.
* tests/local.at (yyreport_syntax_error): Don't use
yysyntax_error_arguments.
2020-03-28 15:13:27 +01:00
Akim Demaille
4192de1f41 bison: avoid using yysyntax_error_arguments
* src/parse-gram.y (yyreport_syntax_error): Use yyparse_context_token
and yyexpected_tokens.
2020-03-28 15:13:27 +01:00
Akim Demaille
00b0d02955 tests: yacc.c: avoid yysyntax_error_arguments
Because glr.c shares the same testing routines, we also need to
convert it.

* data/skeletons/glr.c (yyparse_context_token): New.
* tests/local.at (yyreport_syntax_error): here.
2020-03-28 15:13:27 +01:00
Akim Demaille
1045c8d0ef examples: don't use yysyntax_error_arguments
Suggested by Adrian Vogelsgesang.
https://lists.gnu.org/archive/html/bison-patches/2020-02/msg00069.html

* data/skeletons/lalr1.java (Context.EMPTY, Context.getToken): New.
(Context.yyntokens): Rename as...
(Context.NTOKENS): this.
Because (i) all the Java coding styles recommend upper case for
constants, and (ii) the Java Skeleton exposes Lexer.EOF, not
Lexer.YYEOF.
* data/skeletons/yacc.c (yyparse_context_token): New.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Don't use
yysyntax_error_arguments.
* examples/java/calc/Calc.y (yyreportSyntaxError): Likewise.
2020-03-28 15:13:27 +01:00
Akim Demaille
ef8965b5f5 skeletons: fix incorrect type for translatable tokens
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/yacc.c:
Fix confusion between the "translatable" and the "translate" tables.
2020-03-28 15:13:27 +01:00
Akim Demaille
84b1972c96 yacc.c: use negative numbers for errors in auxiliary functions
yyparse has returned 0, 1, 2 for ages (accept, reject, memory exhausted).
Some of our auxiliary functions such as yy_lac and
yyreport_syntax_error also need to return error codes and also use 0,
1, 2.  Because it uses yy_lac, yyexpected_tokens also needs to return
"problem", "memory exhausted", but in case of success, it needs to
return the number of tokens, so it cannot use 1 and 2 as error code.
Currently it uses -1 and -2, which is later converted into 1 and 2 as
yacc.c expects it.

Let's simplify this and use consistently -1 and -2 for auxiliary
functions that are not exposed (or not yet exposed) to the user.  In
particular this will save the user from having to convert
yyexpected_tokens's -2 into yyreport_syntax_error's 2: both return -1
or -2.

* data/skeletons/yacc.c (yy_lac, yyreport_syntax_error)
(yy_lac_stack_realloc): Return -1, -2 for errors instead of 1, 2.
Adjust callers.
* examples/c/bistromathic/parse.y (yyreport_syntax_error): Do take
error codes into account.
Issue a syntax error message even if we ran out of memory.
* src/parse-gram.y, tests/local.at (yyreport_syntax_error): Adjust.
2020-03-23 07:02:36 +01:00
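The convention described in the commit above (a count on success, -1 or -2 on failure) can be sketched like this. All names are hypothetical, and a buffer that is too small stands in for "memory exhausted"; this is not the skeleton's code.

```c
/* Hypothetical error codes for auxiliary functions.  They are
   negative, so they can never be mistaken for a token count.  */
enum { DEMO_ERROR = -1, DEMO_NOMEM = -2 };

/* Copies the NCANDIDATES candidate tokens into OUT, which has room
   for CAPACITY entries.  Returns the number of tokens on success,
   DEMO_NOMEM when OUT is too small, DEMO_ERROR on invalid arguments.  */
int
demo_expected_tokens (const int *candidates, int ncandidates,
                      int *out, int capacity)
{
  if (!candidates || !out || ncandidates < 0)
    return DEMO_ERROR;
  if (capacity < ncandidates)
    return DEMO_NOMEM;
  for (int i = 0; i < ncandidates; ++i)
    out[i] = candidates[i];
  return ncandidates;
}

/* A reporting routine built on top of demo_expected_tokens.  Because
   both functions share the same negative codes, failures are simply
   forwarded: no conversion from -2 to 2 is needed.  */
int
demo_report (const int *candidates, int ncandidates)
{
  int tokens[4];
  int n = demo_expected_tokens (candidates, ncandidates, tokens, 4);
  if (n < 0)
    return n;
  /* A real reporter would print the N tokens here.  */
  return 0;
}
```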
Akim Demaille
1079595b2a style: reduce length of private constant
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/yacc.c
(YYERROR_VERBOSE_ARGS_MAXIMUM): Rename as...
(YYARGS_MAX): this.
* src/parse-gram.y (YYERROR_VERBOSE_ARGS_MAXIMUM): Rename as...
(ARGS_MAX): this.
2020-03-23 07:02:34 +01:00
Akim Demaille
e364bcdbc5 doc: c++: promote api.token.raw
* doc/bison.texi (Calc++ Parser): Here.
2020-03-23 07:02:32 +01:00
Akim Demaille
5a8db8a739 bench: calc: no need for super long inputs
* etc/bench.pl.in ($iterations): Restore initial value, -1, meaning
"at least one second".
($calc_input): There is no need to generate 400 lines.
2020-03-22 15:59:22 +01:00
Akim Demaille
5acc29041e bench: calc: work on a string instead of a file
The cost of the file layer is large and makes benchmarks too coarse,
as seen in the following example, first with a file, then with a
literal string:

    0. %skeleton "yacc.c" %define parse.lac full
    1. %skeleton "yacc-v1.c" %define nofinal %define parse.lac full
    2. %skeleton "yacc-v2.c" %define nofinal %define parse.lac full
    3. %skeleton "yacc-v3.c" %define nofinal %define parse.lac full
    4. %skeleton "yacc.c"
    5. %skeleton "yacc-v1.c" %define nofinal
    6. %skeleton "yacc-v2.c" %define nofinal
    7. %skeleton "yacc-v3.c" %define nofinal
    --------------------------------------------------
    Benchmark           Time           CPU Iterations
    --------------------------------------------------
    BM_y0           32558 ns      32537 ns      21228
    BM_y1           32400 ns      32369 ns      21233
    BM_y2           33485 ns      33464 ns      20625
    BM_y3           32139 ns      32125 ns      21446
    BM_y4           31343 ns      31329 ns      21747
    BM_y5           31344 ns      31317 ns      22035
    BM_y6           31287 ns      31255 ns      22039
    BM_y7           31387 ns      31373 ns      22178
    --------------------------------------------------
    Benchmark           Time           CPU Iterations
    --------------------------------------------------
    BM_y0           10642 ns      10634 ns      63601
    BM_y1           10657 ns      10654 ns      63625
    BM_y2           10441 ns      10432 ns      65957
    BM_y3           10558 ns      10554 ns      64546
    BM_y4            9521 ns       9516 ns      72011
    BM_y5            9179 ns       9157 ns      75028
    BM_y6            9360 ns       9356 ns      73770
    BM_y7            9365 ns       9359 ns      72609

Of course, at the same time it is less realistic: most users read
files rather than strings, so it might lead us to pay attention to
costs most people don't see.

* etc/bench.pl.in (&calc_input): Output into a file given as argument.
Output in C syntax.
(&generate_grammar_calc): Use it.
Simplify the grammar: remove operators we don't care about.
Rewrite the scanner to work on a char* instead of a FILE*.
2020-03-22 15:59:22 +01:00
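For readers unfamiliar with the idea, a scanner that works on a `char*` rather than a `FILE*` might look like the sketch below. It is not the scanner generated by `etc/bench.pl.in`; `demo_lex` and the token numbers are hypothetical, and only unsigned decimal numbers and single-character operators are handled.

```c
#include <ctype.h>

/* Hypothetical token numbers for a toy calculator.  */
enum { DEMO_TOK_EOF = 0, DEMO_TOK_NUM = 258 };

/* Scans one token from *INPUT and advances *INPUT past it.
   Stores the value of a number in *VALUE.  Any other character,
   such as '+', is returned as itself.  */
int
demo_lex (const char **input, int *value)
{
  const char *cp = *input;
  int token;

  /* Skip blanks.  */
  while (*cp == ' ' || *cp == '\t')
    ++cp;

  if (*cp == '\0')
    token = DEMO_TOK_EOF;       /* Stay on the terminator.  */
  else if (isdigit ((unsigned char) *cp))
    {
      int v = 0;
      while (isdigit ((unsigned char) *cp))
        v = v * 10 + (*cp++ - '0');
      *value = v;
      token = DEMO_TOK_NUM;
    }
  else
    token = (unsigned char) *cp++;

  *input = cp;
  return token;
}
```

The whole scanner state is the cursor, so a benchmark can rerun the same input by resetting one pointer, with no I/O layer involved.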
Akim Demaille
5b0b0a1e08 bench: add a "latest" symlink
* etc/bench.pl.in: here.
2020-03-22 15:59:14 +01:00
Akim Demaille
1c694e08cc bench: use the same prefix in both bench methods
* etc/bench.pl.in (&bench_with_timethese): Also use y$i, as in
&bench_with_gbenchmark.
(&generate_grammar_calc): Don't add a prefix, let the callers do it.
2020-03-22 15:59:13 +01:00
Akim Demaille
4cfb067d93 bench: use a C++-11 compiler
See https://github.com/google/benchmark#a-faster-keeprunning-loop.

* etc/bench.pl.in ($cxx): Be C++11.
(&bench_with_gbenchmark): Adjust.
2020-03-22 15:59:13 +01:00
Akim Demaille
cf60d0a617 bench: create a README file with benches
* etc/bench.pl.in (&bench_with_gbenchmark): Here.
2020-03-22 15:59:13 +01:00
Akim Demaille
c0e8489605 bench: calc: add support for google benchmark
* etc/bench.pl.in (&compiler): New, extracted from...
(&compile): here.
Don't link when using gbm.
(&calc_input): Don't make massive input for micro
benchmarks.
(&generate_grammar_calc): When using gbm, use api.prefix to avoid name
collisions.
Be ready to issue BENCHMARKS instead of a main.
(&bench): Rename as...
(&bench_with_timethese): this.
(&bench_with_gbenchmark): New.
(&bench): New.
Dispatch on these two.
2020-03-21 18:19:14 +01:00
Akim Demaille
788b1a6858 bench: better error messages on invalid input
* etc/bench.pl.in: here.
2020-03-21 18:17:09 +01:00
Akim Demaille
56414791e9 bench: simplify the calc grammar
* etc/bench.pl.in (generate_grammar_calc): We don't need global_result
etc.
2020-03-21 18:17:02 +01:00
Akim Demaille
675dcf1962 bench: die clearly on incorrect --grammar arguments
* etc/bench.pl.in (getopt): here.
2020-03-21 14:52:41 +01:00
Akim Demaille
466fb66578 regen 2020-03-17 19:21:24 +01:00
Akim Demaille
cbb967dbad yacc.c: style: prefer switch to if
* data/skeletons/yacc.c: Prefer switch to decode yy_lac's return value.
2020-03-17 19:21:07 +01:00
Akim Demaille
44ac18d136 yacc.c: yypstate_expected_tokens
In push parsers, when asking for the list of expected tokens at some
point, it makes no sense to build a yyparse_context_t: the yypstate
alone suffices (the only difference being the lookahead).  Instead of
forcing the user to build a useless shell around yypstate, let's offer
yypstate_expected_tokens.

See https://lists.gnu.org/r/bison-patches/2020-03/msg00025.html.

* data/skeletons/yacc.c (yypstate): Declare earlier, so that we can
use it for...
(yypstate_expected_tokens): this new function, when in push parsers.
Adjust dependencies.
* examples/c/bistromathic/parse.y: Simplify: use
yypstate_expected_tokens.
Style fixes.
Reduce scopes (reported by Joel E. Denny).
2020-03-17 19:20:13 +01:00
Akim Demaille
0c3dd3a669 examples: bistromathic: simplify
* examples/c/bistromathic/parse.y (expected_tokens): Remove useless "break".
2020-03-09 07:24:33 +01:00
Akim Demaille
951da960e6 merge branch 'maint'
* upstream/maint:
  maint: post-release administrivia
  version 3.5.3
  news: update for 3.5.3
  yacc.c: make sure we properly propagated the user's number for error
  diagnostics: don't crash because of repeated definitions of error
  style: initialize some struct members
  diagnostics: beware of zero-width characters
  diagnostics: be sure to close the styling when lines are too short
  muscles: fix incorrect decoding of $
  code: be robust to reference with invalid tags
  build: fix typo
  doc: update recommandation for libtextstyle
  style: comment changes
  examples: use consistently the GFDL header for readmes
  style: remove useless declarations
  typo: succesful -> successful
  README: point to tests/bison, and document --trace
  gnulib: update
  maint: post-release administrivia
2020-03-08 10:13:16 +01:00
Akim Demaille
15ea35019f maint: post-release administrivia
* NEWS: Add header line for next release.
* .prev-version: Record previous version.
* cfg.mk (old_NEWS_hash): Auto-update.
2020-03-08 08:50:10 +01:00
Akim Demaille
f49684a577 version 3.5.3
* NEWS: Record release date.
v3.5.3
2020-03-08 08:30:41 +01:00
Akim Demaille
044ad1288c news: update for 3.5.3 2020-03-08 08:17:13 +01:00
Akim Demaille
e3812bb8c3 yacc.c: make sure we properly propagated the user's number for error
* data/skeletons/yacc.c (YYERRCODE): Be truthful.
* tests/input.at (Redefining the error token): Check that.
2020-03-08 08:10:11 +01:00
Akim Demaille
cfcd823e16 diagnostics: don't crash because of repeated definitions of error
According to https://www.unix.com/man-page/POSIX/1posix/yacc/, the
user is allowed to specify her user number for the error token:

    The token error shall be reserved for error handling. The name
    error can be used in grammar rules. It indicates places where the
    parser can recover from a syntax error. The default value of error
    shall be 256. Its value can be changed using a %token
    declaration. The lexical analyzer should not return the value of
    error.

I think this feature is useless, the user should not have to deal with
that.  The intent is probably to give the user a means to use 256 if
she wants to, but provided "error" cleared the path first by being
assigned another number.  In the case of Bison, 256 is assigned to
"error" at the end if the user did not use it for a token of hers.  So
this feature is useless.

Yet it is valid, and if the user assigns twice a token number to
"error", then the second time we want to complain about it and want to
show the original definition.  At this point, we try to display the
built-in definition of "error", whose location is NULL, and we crash.

Rather, the location of the first user definition of "error" should
become its defining location.

Reported by Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00007.html

* src/symtab.c (symbol_class_set): If this is a declaration and the
symbol was not declared yet, keep this as defining location.
* tests/input.at (Redefining the error token): New.
2020-03-08 08:10:11 +01:00
Akim Demaille
2f02d9beae style: initialize some struct members
* src/symtab.c (sym_content_new): Initialize all the location members.
Not needed by the code, but disturbing values when using a debugger.
2020-03-08 08:10:11 +01:00
Akim Demaille
b638603477 diagnostics: beware of zero-width characters
Currently we rely on (visual) width of the characters to decide where
to open and close the styling of the quoted lines.  This breaks when
we deal with zero-width characters: we cannot just rely on (visual)
columns, we need to know whether we are before, inside, or after the
highlighted portion.

* src/location.c (location_caret): col_end: no longer add 1, "regular"
characters have a width of 1, only 0-width characters have 0-width.
opened: replace with 'state', a three-valued enum.
Don't reopen the style if we already did.
* tests/diagnostics.at (Zero-width characters): New.
2020-03-08 08:10:11 +01:00
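The three-valued state mentioned in the commit above can be sketched as follows. This is a simplified illustration, not the code of `location_caret`: `demo_highlight` is a hypothetical function, `<` and `>` stand in for opening and closing the style, and the caller supplies the visual width of each character.

```c
#include <stddef.h>

/* Where we are with respect to the highlighted portion.  */
typedef enum { DEMO_BEFORE, DEMO_INSIDE, DEMO_AFTER } demo_state_t;

/* Copies TEXT into OUT, opening the style ('<') at column COL_BEGIN
   and closing it ('>') at column COL_END (columns are 1-based, and
   COL_END is exclusive).  WIDTH[i] is the visual width of TEXT[i].
   OUT needs room for strlen (TEXT) + 3 bytes.
   Returns the length of the result.  */
size_t
demo_highlight (const char *text, const int *width,
                int col_begin, int col_end, char *out)
{
  demo_state_t state = DEMO_BEFORE;
  int col = 1;
  size_t n = 0;
  for (size_t i = 0; text[i]; ++i)
    {
      /* The state, not the column alone, decides whether to open: a
         zero-width character does not advance COL, so a test on the
         column only would open the style twice.  */
      if (state == DEMO_BEFORE && col_begin <= col)
        {
          out[n++] = '<';
          state = DEMO_INSIDE;
        }
      else if (state == DEMO_INSIDE && col_end <= col)
        {
          out[n++] = '>';
          state = DEMO_AFTER;
        }
      out[n++] = text[i];
      col += width[i];
    }
  /* The line ended before COL_END: close the style anyway.  */
  if (state == DEMO_INSIDE)
    out[n++] = '>';
  out[n] = '\0';
  return n;
}
```

With the text `a_b`, where `_` stands in for a zero-width character (widths 1, 0, 1), and columns 2 to 3, the function writes `a<_b>`. The style is opened once even though two characters sit on column 2, and it is closed even though the line ends before the state changes.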
Akim Demaille
e21ff47f5d diagnostics: be sure to close the styling when lines are too short
bar.y:4.12-17: <error>error:</error> redefining user token number of foo
    -    4 | %token foo <error>123
    +    4 | %token foo <error>123</error>
           |            <error>^~~~~~</error>

* src/location.c (location_caret): Be sure to close.
* tests/diagnostics.at (Line is too short, and then you die): New.
2020-03-07 10:01:52 +01:00
Akim Demaille
b82b387da9 muscles: fix incorrect decoding of $
Bug introduced in 458171e6df.
https://lists.gnu.org/archive/html/bison-patches/2013-11/msg00009.html

Reported by Ahcheong Lee.
https://lists.gnu.org/r/bug-bison/2020-03/msg00010.html

* src/muscle-tab.c (COMMON_DECODE): "$" is coded as "$][", not "$[][".
* tests/input.at ("%define" enum variables): Check that case.
2020-03-07 07:45:10 +01:00
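As an illustration of the encoding named in the commit above, here is a minimal decoder that turns each `$][` back into `$`. It is a sketch only: `demo_decode` is a hypothetical function that handles this single escape, whereas `COMMON_DECODE` in `src/muscle-tab.c` is a macro working in a different setting.

```c
/* Decodes SRC into DST, turning each "$][" back into "$".
   DST needs room for strlen (SRC) + 1 bytes.
   Returns the decoded length, or -1 if a '$' is not followed by "][",
   in which case the contents of DST are unspecified.  */
int
demo_decode (const char *src, char *dst)
{
  int n = 0;
  while (*src)
    {
      if (*src == '$')
        {
          /* The second test is reached only if src[1] is ']', so we
             never read past the terminator.  */
          if (src[1] != ']' || src[2] != '[')
            return -1;
          src += 3;
          dst[n++] = '$';
        }
      else
        dst[n++] = *src++;
    }
  dst[n] = '\0';
  return n;
}
```

Under this sketch, an input written with the other spelling, `$[][`, is rejected rather than decoded.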
Akim Demaille
641e326303 code: be robust to reference with invalid tags
Because we want to support $<a->b>$, we must accept -> in type tags,
and reject $<->$, as it is unfinished.
Reported by Ahcheong Lee.

* src/scan-code.l (yylex): Make sure "tag" does not end with -, since
-> does not close the tag.
* tests/input.at (Stray $ or @): Check this.
2020-03-06 17:29:26 +01:00
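The rule in the commit above (a `>` that is part of `->` does not close the tag) can be sketched with a small hand-written function. The actual fix is in the Flex scanner `src/scan-code.l`; `demo_tag_length` is a hypothetical helper written only to show the rule.

```c
/* S is the text that follows "$<".  Returns the length of the type
   tag, or -1 if the tag is empty or unfinished.  A '>' preceded by
   '-' belongs to "->" and does not close the tag, so a tag can never
   end with '-'.  */
int
demo_tag_length (const char *s)
{
  for (int i = 0; s[i]; ++i)
    if (s[i] == '>' && (i == 0 || s[i - 1] != '-'))
      return 0 < i ? i : -1;
  return -1;
}
```

For `$<a->b>$` the function is called on `a->b>$` and returns 4, the length of `a->b`. For `$<->$` it is called on `->$` and returns -1, since the only `>` is part of `->` and nothing closes the tag.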
Akim Demaille
192e9fdf77 build: fix typo
* build-aux/cross-options.pl: here.
2020-03-06 08:32:26 +01:00
Akim Demaille
a4a3f08c11 doc: update recommandation for libtextstyle
* README: here.
2020-03-06 08:32:18 +01:00
Akim Demaille
666df338a7 style: comment changes
* src/symtab.h, src/lr0.c: here.
2020-03-06 08:32:03 +01:00
Akim Demaille
b437b16603 examples: use consistently the GFDL header for readmes
* examples/c++/README.md, examples/c++/calc++/README.md,
* examples/c/calc/README.md, examples/c/lexcalc/README.md,
* examples/c/reccalc/README.md:
Prefer the GFDL banner to the GPL one.
2020-03-06 08:31:34 +01:00
Akim Demaille
b493c173c9 style: remove useless declarations
* src/reader.h: Don't duplicate what parse-gram.h already exposes.
* src/lr0.h: Remove useless include.
2020-03-06 08:30:21 +01:00