The yychar variable was keeping the external form of the token (the
TokenKind). As the D parser translates the token to its internal
form (the SymbolKind) inside the struct Symbol, there is no need for
yychar anymore.
* data/examples/lalr1.d (yychar): Remove.
Use only yytoken.
This avoids heap allocation and gives minimal costs for the
creation and destruction of the YYParser.Symbol struct if
the location tracking is active.
Suggested by H. S. Teoh.
* data/skeletons/lalr1.d: Here.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y: Adjust.
* tests/calc.at: Test it.
The complete symbol approach was deemed to be the right approach for Dlang.
Now, the user can return from yylex() an instance of YYParser.Symbol structure,
which binds together the TokenKind, the semantic value and the location. Before,
the last two were reported separately to the parser.
Only the user API is changed, Bisons's internal structure is kept the same.
* data/skeletons/d.m4 (struct YYParser.Symbol): New.
* data/skeletons/lalr1.d: Change the return value.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Demonstrate it.
* tests/calc.at, tests/scanner.at: Test it.
When using lookahead correction, the method YYParser.Context.getExpectedTokens
is not annotated with const, because the method calls yylacCheck, which is not
const. Also, because of yylacStack and yylacEstablished, yylacCheck needs to
be called from the context of the parser class, which is sent as parameter to
the Context's constructor.
* data/skeletons/lalr1.d (yylacCheck, yylacEstablish, yylacDiscard,
yylacStack, yylacEstablished): New.
(Context): Use it.
* doc/bison.texi: Document it.
* tests/calc.at: Check it.
Changed from syntax_error to reportSyntaxError to be similar to the Java parser.
* data/skeletons/lalr1.d: Change the function name.
* doc/bison.texi: Document it.
* tests/local.at: Adjust.
Parser.Context class returns a const YYLocation, so Lexer's method
yyerror() needs to receive the location as a const parameter.
Internal error reporting flow is changed to be similar to that of
the other skeletons. Before, case YYERRLAB was calling yyerror()
with the result of yysyntax_error() as the string parameter. As the
custom error message lets the user decide if they want to use
yyerror() or not, this flow needed to be changed. Now, case YYERRLAB
calls yyreportSyntaxError(), that builds the error message using
yysyntaxErrorArguments(). Then yyreportSyntaxError() passes the
error message to the user defined syntax_error() in case of a custom
message, or to yyerror() otherwise.
In the tests in tests/calc.at, the order of the tokens needs to be
changed in order of precedence, so that the D program outputs the
expected tokens in the same order as the other parsers.
* data/skeletons/lalr1.d: Add the custom error message feature.
* doc/bison.texi: Document it.
* examples/d/calc/calc.y: Adjust.
* tests/calc.at, tests/local.at: Test it.
This should have been part of commit "symbols: stop dealing with YYEMPTY
as b4_symbol(-2, ...)" (cd40ec9526).
Give names to all the special symbols: "eof", "error" and "undef".
* data/skeletons/bison.m4 (b4_symbol): Let `b4_symbol(eof, ...)` mean
`b4_symbol(0, ...)`, `b4_symbol(error, ...)` mean `b4_symbol(1, ...)`,
and , `b4_symbol(undef, ...)` mean `b4_symbol(2, ...)`..
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/glr.cc,
* data/skeletons/glr2.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c:
Prefer symbols to numbers.
All methods are now declared as const. Method getExpectedTokens() has now the
functionality of reporting only the number of expected tokens.
* data/skeletons/lalr1.d: Fix Context class.
In D's case, yyerrok() is a private method of the Parser class.
It can be called directly as `yyerrok()` from the grammar rules section.
* data/skeletons/lalr1.d: Add yyerrok().
* examples/d/calc/calc.y, examples/d/simple/calc.y: Demonstrate yyerrok().
* tests/calc.at: Update D tests to use yyerrok().
This will provide the user an interface for creating custom error messages.
* data/skeletons/lalr1.d: Add the Context class.
* doc/bison.texi: Document it.
* data/skeletons/lalr1.d: Change the return value.
* examples/d/calc/calc.y, examples/d/simple/calc.y: Adjust.
* tests/scanner.at: Adjust.
* tests/calc.at (_AT_DATA_CALC_Y(d)): New, extracted from...
(_AT_DATA_CALC_Y(c)): here.
The two grammars have been sufficiently different to be separated.
Still trying to be them together results in a maintenance burden. For
the same reason, instead of specifying the results for D and for the
rest, compute the expected results with D from the regular case.
* data/skeletons/c++.m4, data/skeletons/glr.c, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c:
Be more accurate about yychar and yytoken.
Don't name local variables as if they were members.
This should have been done in 3.6, but I wanted to avoid introducing
conflicts into Vincent's work on counterexamples. It turns out it's
completely orthogonal.
* data/README.md, data/skeletons/bison.m4, data/skeletons/c++.m4,
* data/skeletons/c.m4, data/skeletons/glr.c, data/skeletons/java.m4,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/variant.hh, data/skeletons/yacc.c, src/conflicts.c,
* src/derives.c, src/gram.c, src/gram.h, src/output.c,
* src/parse-gram.c, src/parse-gram.y, src/print-xml.c, src/print.c,
* src/reader.c, src/symtab.c, src/symtab.h, tests/input.at,
* tests/types.at:
s/user_token_number/code/g.
Plus minor changes.
I'm quite pleased to see that the tricky case of glr.c was already
prepared by the changes to support syntax_error exceptions. Better
yet, it is actually syntax_error that becomes a special case of the
general pattern: make yytoken be YYERRCODE.
* data/skeletons/glr.c (YYFAULTYTOK): Remove the now useless (Basil)
Faulty token.
Instead, use the error token.
* data/skeletons/lalr1.d, data/skeletons/lalr1.java: When computing
the action, first check the case of the error token.
* tests/calc.at: Check cases for the error token symbols before and
after it.
We will not keep YYERRCODE anyway, it causes backward compatibility
issues. So as a first step, let all the skeletons use that name,
until we have a better one.
* data/skeletons/bison.m4, data/skeletons/glr.c,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c, doc/bison.texi, tests/headers.at,
* tests/input.at:
here.
Not all the symbols have a fixed symbol code. UNDEF's one is fixed:
-2.
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/yacc.c: here.
* data/skeletons/bison.m4, data/skeletons/c++.m4, data/skeletons/c.m4,
* data/skeletons/glr.cc, data/skeletons/lalr1.cc,
* data/skeletons/lalr1.d, data/skeletons/lalr1.java:
Refer to the "kind" of a symbol, not its "type", where appropriate.
* data/skeletons/d.m4 (b4_symbol_enum, b4_declare_symbol_enum): New.
* data/skeletons/lalr1.d: Use them.
Use SymbolType, SymbolType.YYSYMBOL_YYEMPTY etc. where appropriate.
(undef_token_, token_number_type, yy_error_token_): Remove.
parse.error has more than two possible values.
* data/skeletons/bison.m4 (b4_error_verbose_if, b4_error_verbose_flag):
Remove.
(b4_parse_error_case, b4_parse_error_bmatch): New.
Adjust dependencies.
The C, C++ and D skeletons used to show the stack right after popping
the stack during the reduction. Now that the stack is printed after
reaching a new state, that has become useless:
Entering state 1
Stack now 0 1
Reducing stack by rule 5 (line 83):
$1 = token "number" (1)
-> $$ = nterm exp (1)
Stack now 0
Entering state 8
Stack now 0 8
Remove the "Stack now 0" line.
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c:
Here.
Currently, if we have long rules and series of shift, we stack states
without showing stack. Let's be more incremental, and do how the Java
skeleton does.
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/yacc.c:
Here.
Adjust test cases.
* tests/torture.at (AT_DATA_STACK_TORTURE): Disable stack traces: this
test produces a very large stack, and showing the stack each time we
shift a token goes quadatric.
The Java skeleton displays
Reading a token:
Next token is token "number" (1)
while the other display
Reading a token: Next token is token "number" (1)
When generating logs in the scanner, the first part is separated from
the second, and the end of the scanner logs have the second part
pasted in. So let's propagate the Java way, but with the colon.
* data/skeletons/glr.c, data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java, data/skeletons/yacc.c: Do it.
Adjust test cases and doc.
When building the test cases, emitting code in the epilogue is very
constraining. Let's make it simpler thanks to %code epilogue.
However, I don't want to document this: it is bad style to use it (we
should avoid having too many ways to write the same thing,
TI!MTOWTDI), just put your code in the true epilogue section.
* data/skeletons/glr.c, data/skeletons/lalr1.d, data/skeletons/lalr1.java,
* data/skeletons/yacc.c: Implement support for %code epilogue.
Remove useless comments.
* tests/calc.at, tests/java.at: Simplify.
* data/skeletons/lalr1.d, data/skeletons/lalr1.java: Don't expose
yyuser_token_number_max_ and yyundef_token_. Do as in C++: scope them
into yytranslate_, and only when api.token.raw is not defined.
(yyterror_): Rename as...
(yy_error_token_): this.
* data/skeletons/lalr1.d (token_number_type): New.
Use it.
Can't be done in the Java backend, as Java does not have type aliases.
* data/skeletons/lalr1.d, data/skeletons/lalr1.java (yytoken_number_):
Remove, useless.
Was used in ancient C skeletons to support YYPRINT, long obsoleted by
%printer.
It is not used at all. We will remove it also from yacc.c, but
later (see TODO).
* data/skeletons/lalr1.cc, data/skeletons/lalr1.d,
* data/skeletons/lalr1.java (yyerrcode_):
Remove.
This changes the traces from
Reading a token:
Now at end of input.
to
Reading a token:
Next token is token $end (7FFEE56E6474)
which is ok. Actually it is even better, as it gives the location
when locations are enabled, and is clearer when rules explicitly use
the EOF token.
* data/skeletons/lalr1.d (yytranslate_): Handle eof here, as is done
in lalr1.cc.
* configure.ac (DCFLAGS): Pass -g.
* data/skeletons/d.m4 (b4_locations_if): Remove, let bison.m4's one do
its job.
* data/skeletons/lalr1.d (position): Leave filename empty by default.
(position::toString): Don't print empty file names.
(location::this): New ctor.
(location::toString): Match the implementations of C/C++.
(yy_semantic_null): Leave undefined, the previous implementation does
not compile.
* tests/calc.at: Improve the implementation for D.
Enable more checks, in particular using locations.
* tests/local.at (AT_YYERROR_DEFINE(d)): Fix its implementation.