yysyntax_error: fix for consistent error with lookahead.

* NEWS (2.5): Document.
* data/yacc.c (yysyntax_error): In a verbose syntax error
message while in a consistent state with a default action (which
must be an error action given that yysyntax_error is being
invoked), continue to drop the expected token list, but don't
drop the unexpected token unless there actually is no lookahead.
Moreover, handle that internally instead of returning 1 to tell
the caller to do it.  With that meaning of 1 gone, renumber
return codes more usefully.
(yyparse, yypush_parse): Update yysyntax_error usage.  Most
importantly, set yytoken to YYEMPTY when there's no lookahead.
* data/glr.c (yyreportSyntaxError): As in yacc.c, don't drop the
unexpected token unless there actually is no lookahead.
* data/lalr1.cc (yy::parser::parse): If there's no lookahead,
set yytoken to yyempty_ before invoking yysyntax_error_.
(yy::parser::yysyntax_error_): Again, don't drop the unexpected
token unless there actually is no lookahead.
* data/lalr1.java (YYParser::parse): If there's no lookahead,
set yytoken to yyempty_ before invoking yysyntax_error.
(YYParser::yysyntax_error): Again, don't drop the unexpected
token unless there actually is no lookahead.
* tests/conflicts.at (%error-verbose and consistent
errors): Extend test group to further reveal how the previous
use of the simple "syntax error" message was too general.  Test
yacc.c, glr.c, lalr1.cc, and lalr1.java.  No longer an expected
failure.
* tests/java.at (AT_JAVA_COMPILE, AT_JAVA_PARSER_CHECK): Move
to...
* tests/local.at: ... here.
(_AT_BISON_OPTION_PUSHDEFS): Push AT_SKEL_JAVA_IF definition.
(AT_BISON_OPTION_POPDEFS): Pop it.
(AT_FULL_COMPILE): Extend to handle Java.
(cherry picked from commit d2060f0634)

Conflicts:

	data/lalr1.cc
	data/lalr1.java
	src/parse-gram.c
	src/parse-gram.h
	tests/java.at
This commit is contained in:
Joel E. Denny
2010-11-07 16:01:56 -05:00
parent 678094a2f5
commit 095a1d11ca
11 changed files with 603 additions and 265 deletions

View File

@@ -2094,11 +2094,7 @@ yyreportSyntaxError (yyGLRStack* yystackp]b4_user_formals[)
#if ! YYERROR_VERBOSE
yyerror (]b4_lyyerror_args[YY_("syntax error"));
#else
int yyn;
yyn = yypact[yystackp->yytops.yystates[0]->yylrState];
if (YYPACT_NINF < yyn && yyn <= YYLAST)
{
yySymbol yytoken = YYTRANSLATE (yychar);
yySymbol yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
size_t yysize0 = yytnamerr (NULL, yytokenName (yytoken));
size_t yysize = yysize0;
size_t yysize1;
@@ -2109,23 +2105,47 @@ if (YYPACT_NINF < yyn && yyn <= YYLAST)
const char *yyformat = 0;
/* Arguments of yyformat. */
char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
/* Number of reported tokens (one for the "unexpected", one per
"expected"). */
int yycount = 0;
/* There are many possibilities here to consider:
- If this state is a consistent state with a default action, then
the only way this function was invoked is if the default action
is an error action. In that case, don't check for expected
tokens because there are none.
- The only way there can be no lookahead present (in yychar) is if
this state is a consistent state with a default action. Thus,
detecting the absence of a lookahead is sufficient to determine
that there is no unexpected or expected token to report. In that
case, just report a simple "syntax error".
- Don't assume there isn't a lookahead just because this state is a
consistent state with a default action. There might have been a
previous inconsistent state, consistent state with a non-default
action, or user semantic action that manipulated yychar.
- Of course, the expected token list depends on states to have
correct lookahead information, and it depends on the parser not
to perform extra reductions after fetching a lookahead from the
scanner and before detecting a syntax error. Thus, state merging
(from LALR or IELR) and default reductions corrupt the expected
token list. However, the list is correct for canonical LR with
one exception: it will still contain any token that will not be
accepted due to an error action in a later state.
*/
if (yytoken != YYEMPTY)
{
int yyn = yypact[yystackp->yytops.yystates[0]->yylrState];
yyarg[yycount++] = yytokenName (yytoken);
if (!yypact_value_is_default (yyn))
{
/* Start YYX at -YYN if negative to avoid negative indexes in
YYCHECK. In other words, skip the first -YYN actions for this
state because they are default actions. */
int yyxbegin = yyn < 0 ? -yyn : 0;
/* Stay within bounds of both yycheck and yytname. */
int yychecklim = YYLAST - yyn + 1;
int yyxend = yychecklim < YYNTOKENS ? yychecklim : YYNTOKENS;
/* Number of reported tokens (one for the "unexpected", one per
"expected"). */
int yycount = 0;
int yyx;
yyarg[yycount++] = yytokenName (yytoken);
for (yyx = yyxbegin; yyx < yyxend; ++yyx)
if (yycheck[yyx + yyn] == yyx && yyx != YYTERROR
&& !yytable_value_is_error (yytable[yyx + yyn]))
@@ -2141,6 +2161,8 @@ if (YYPACT_NINF < yyn && yyn <= YYLAST)
yysize_overflow |= yysize1 < yysize;
yysize = yysize1;
}
}
}
switch (yycount)
{
@@ -2148,6 +2170,7 @@ if (YYPACT_NINF < yyn && yyn <= YYLAST)
case N: \
yyformat = S; \
break
YYCASE_(0, YY_("syntax error"));
YYCASE_(1, YY_("syntax error, unexpected %s"));
YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
@@ -2188,9 +2211,6 @@ if (YYPACT_NINF < yyn && yyn <= YYLAST)
yyerror (]b4_lyyerror_args[YY_("syntax error"));
yyMemoryExhausted (yystackp);
}
}
else
yyerror (]b4_lyyerror_args[YY_("syntax error"));
#endif /* YYERROR_VERBOSE */
yynerrs += 1;
}

View File

@@ -719,6 +719,8 @@ m4_ifdef([b4_lex_param], [, ]b4_lex_param))[;
if (!yyerrstatus_)
{
++yynerrs_;
if (yychar == yyempty_)
yytoken = yyempty_;
error (yylloc, yysyntax_error_ (yystate, yytoken));
}
@@ -848,26 +850,52 @@ b4_error_verbose_if([int yystate, int yytoken],
[int, int])[)
{]b4_error_verbose_if([[
std::string yyres;
int yyn = yypact_[yystate];
if (yypact_ninf_ < yyn && yyn <= yylast_)
// Number of reported tokens (one for the "unexpected", one per
// "expected").
size_t yycount = 0;
// Its maximum.
enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 5 };
// Arguments of yyformat.
char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
/* There are many possibilities here to consider:
- If this state is a consistent state with a default action, then
the only way this function was invoked is if the default action
is an error action. In that case, don't check for expected
tokens because there are none.
- The only way there can be no lookahead present (in yytoken) is
if this state is a consistent state with a default action.
Thus, detecting the absence of a lookahead is sufficient to
determine that there is no unexpected or expected token to
report. In that case, just report a simple "syntax error".
- Don't assume there isn't a lookahead just because this state is
a consistent state with a default action. There might have
been a previous inconsistent state, consistent state with a
non-default action, or user semantic action that manipulated
yychar.
- Of course, the expected token list depends on states to have
correct lookahead information, and it depends on the parser not
to perform extra reductions after fetching a lookahead from the
scanner and before detecting a syntax error. Thus, state
merging (from LALR or IELR) and default reductions corrupt the
expected token list. However, the list is correct for
canonical LR with one exception: it will still contain any
token that will not be accepted due to an error action in a
later state.
*/
if (yytoken != yyempty_)
{
yyarg[yycount++] = yytname_[yytoken];
int yyn = yypact_[yystate];
if (!yy_pact_value_is_default_ (yyn))
{
/* Start YYX at -YYN if negative to avoid negative indexes in
YYCHECK. In other words, skip the first -YYN actions for
this state because they are default actions. */
int yyxbegin = yyn < 0 ? -yyn : 0;
/* Stay within bounds of both yycheck and yytname. */
int yychecklim = yylast_ - yyn + 1;
int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;
// Number of reported tokens (one for the "unexpected", one per
// "expected").
size_t yycount = 0;
// Its maximum.
enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 5 };
// Arguments of yyformat.
char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
yyarg[yycount++] = yytname_[yytoken];
for (int yyx = yyxbegin; yyx < yyxend; ++yyx)
if (yycheck_[yyx + yyn] == yyx && yyx != yyterror_
&& !yy_table_value_is_error_ (yytable_[yyx + yyn]))
@@ -880,6 +908,8 @@ b4_error_verbose_if([int yystate, int yytoken],
else
yyarg[yycount++] = yytname_[yyx];
}
}
}
char const* yyformat = 0;
switch (yycount)
@@ -888,6 +918,7 @@ b4_error_verbose_if([int yystate, int yytoken],
case N: \
yyformat = S; \
break
YYCASE_(0, YY_("syntax error"));
YYCASE_(1, YY_("syntax error, unexpected %s"));
YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
@@ -895,6 +926,7 @@ b4_error_verbose_if([int yystate, int yytoken],
YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
#undef YYCASE_
}
// Argument number.
size_t yyi = 0;
for (char const* yyp = yyformat; *yyp; ++yyp)
@@ -905,9 +937,6 @@ b4_error_verbose_if([int yystate, int yytoken],
}
else
yyres += *yyp;
}
else
yyres = YY_("syntax error");
return yyres;]], [[
return YY_("syntax error");]])[
}

View File

@@ -582,8 +582,10 @@ m4_popdef([b4_at_dollar])])dnl
/* If not already recovering from an error, report this error. */
if (yyerrstatus_ == 0)
{
++yynerrs_;
yyerror (]b4_locations_if([yylloc, ])[yysyntax_error (yystate, yytoken));
++yynerrs_;
if (yychar == yyempty_)
yytoken = yyempty_;
yyerror (]b4_locations_if([yylloc, ])[yysyntax_error (yystate, yytoken));
}
]b4_locations_if([yyerrloc = yylloc;])[
@@ -683,17 +685,52 @@ m4_popdef([b4_at_dollar])])dnl
{
if (errorVerbose)
{
int yyn = yypact_[yystate];
if (yypact_ninf_ < yyn && yyn <= yylast_)
/* There are many possibilities here to consider:
- Assume YYFAIL is not used. It's too flawed to consider.
See
<http://lists.gnu.org/archive/html/bison-patches/2009-12/msg00024.html>
for details. YYERROR is fine as it does not invoke this
function.
- If this state is a consistent state with a default action,
then the only way this function was invoked is if the
default action is an error action. In that case, don't
check for expected tokens because there are none.
- The only way there can be no lookahead present (in tok) is
if this state is a consistent state with a default action.
Thus, detecting the absence of a lookahead is sufficient to
determine that there is no unexpected or expected token to
report. In that case, just report a simple "syntax error".
- Don't assume there isn't a lookahead just because this
state is a consistent state with a default action. There
might have been a previous inconsistent state, consistent
state with a non-default action, or user semantic action
that manipulated yychar. (However, yychar is currently out
of scope during semantic actions.)
- Of course, the expected token list depends on states to
have correct lookahead information, and it depends on the
parser not to perform extra reductions after fetching a
lookahead from the scanner and before detecting a syntax
error. Thus, state merging (from LALR or IELR) and default
reductions corrupt the expected token list. However, the
list is correct for canonical LR with one exception: it
will still contain any token that will not be accepted due
to an error action in a later state.
*/
if (tok != yyempty_)
{
StringBuffer res;
// FIXME: This method of building the message is not compatible
// with internationalization.
StringBuffer res =
new StringBuffer ("syntax error, unexpected ");
res.append (yytnamerr_ (yytname_[tok]));
int yyn = yypact_[yystate];
if (!yy_pact_value_is_default_ (yyn))
{
/* Start YYX at -YYN if negative to avoid negative
indexes in YYCHECK. In other words, skip the first
-YYN actions for this state because they are default
actions. */
int yyxbegin = yyn < 0 ? -yyn : 0;
/* Stay within bounds of both yycheck and yytname. */
int yychecklim = yylast_ - yyn + 1;
int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;
@@ -702,11 +739,6 @@ m4_popdef([b4_at_dollar])])dnl
if (yycheck_[x + yyn] == x && x != yyterror_
&& !yy_table_value_is_error_ (yytable_[x + yyn]))
++count;
// FIXME: This method of building the message is not compatible
// with internationalization.
res = new StringBuffer ("syntax error, unexpected ");
res.append (yytnamerr_ (yytname_[tok]));
if (count < 5)
{
count = 0;
@@ -718,6 +750,7 @@ m4_popdef([b4_at_dollar])])dnl
res.append (yytnamerr_ (yytname_[x]));
}
}
}
return res.toString ();
}
}

View File

@@ -972,20 +972,13 @@ yytnamerr (char *yyres, const char *yystr)
/* Copy into *YYMSG, which is of size *YYMSG_ALLOC, an error message
about the unexpected token YYTOKEN while in state YYSTATE.
Return 0 if *YYMSG was successfully written. Return 1 if an ordinary
"syntax error" message will suffice instead. Return 2 if *YYMSG is
not large enough to hold the message. In the last case, also set
*YYMSG_ALLOC to either (a) the required number of bytes or (b) zero
if the required number of bytes is too large to store. */
Return 0 if *YYMSG was successfully written. Return 1 if *YYMSG is
not large enough to hold the message. In that case, also set
*YYMSG_ALLOC to the required number of bytes. Return 2 if the
required number of bytes is too large to store. */
static int
yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
int yystate, int yytoken)
{
int yyn = yypact[yystate];
if (! (YYPACT_NINF < yyn && yyn <= YYLAST))
return 1;
else
{
YYSIZE_T yysize0 = yytnamerr (0, yytname[yytoken]);
YYSIZE_T yysize = yysize0;
@@ -995,22 +988,51 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
const char *yyformat = 0;
/* Arguments of yyformat. */
char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
/* Number of reported tokens (one for the "unexpected", one per
"expected"). */
int yycount = 0;
/* There are many possibilities here to consider:
- Assume YYFAIL is not used. It's too flawed to consider. See
<http://lists.gnu.org/archive/html/bison-patches/2009-12/msg00024.html>
for details. YYERROR is fine as it does not invoke this
function.
- If this state is a consistent state with a default action, then
the only way this function was invoked is if the default action
is an error action. In that case, don't check for expected
tokens because there are none.
- The only way there can be no lookahead present (in yychar) is if
this state is a consistent state with a default action. Thus,
detecting the absence of a lookahead is sufficient to determine
that there is no unexpected or expected token to report. In that
case, just report a simple "syntax error".
- Don't assume there isn't a lookahead just because this state is a
consistent state with a default action. There might have been a
previous inconsistent state, consistent state with a non-default
action, or user semantic action that manipulated yychar.
- Of course, the expected token list depends on states to have
correct lookahead information, and it depends on the parser not
to perform extra reductions after fetching a lookahead from the
scanner and before detecting a syntax error. Thus, state merging
(from LALR or IELR) and default reductions corrupt the expected
token list. However, the list is correct for canonical LR with
one exception: it will still contain any token that will not be
accepted due to an error action in a later state.
*/
if (yytoken != YYEMPTY)
{
int yyn = yypact[yystate];
yyarg[yycount++] = yytname[yytoken];
if (!yypact_value_is_default (yyn))
{
/* Start YYX at -YYN if negative to avoid negative indexes in
YYCHECK. In other words, skip the first -YYN actions for
this state because they are default actions. */
int yyxbegin = yyn < 0 ? -yyn : 0;
/* Stay within bounds of both yycheck and yytname. */
int yychecklim = YYLAST - yyn + 1;
int yyxend = yychecklim < YYNTOKENS ? yychecklim : YYNTOKENS;
/* Number of reported tokens (one for the "unexpected", one per
"expected"). */
int yycount = 0;
int yyx;
yyarg[yycount++] = yytname[yytoken];
for (yyx = yyxbegin; yyx < yyxend; ++yyx)
if (yycheck[yyx + yyn] == yyx && yyx != YYTERROR
&& !yytable_value_is_error (yytable[yyx + yyn]))
@@ -1023,14 +1045,13 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
}
yyarg[yycount++] = yytname[yyx];
yysize1 = yysize + yytnamerr (0, yytname[yyx]);
if (! (yysize <= yysize1 && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
{
/* Overflow. */
*yymsg_alloc = 0;
return 2;
}
if (! (yysize <= yysize1
&& yysize1 <= YYSTACK_ALLOC_MAXIMUM))
return 2;
yysize = yysize1;
}
}
}
switch (yycount)
{
@@ -1038,6 +1059,7 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
case N: \
yyformat = S; \
break
YYCASE_(0, YY_("syntax error"));
YYCASE_(1, YY_("syntax error, unexpected %s"));
YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
@@ -1048,11 +1070,7 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
yysize1 = yysize + yystrlen (yyformat);
if (! (yysize <= yysize1 && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
{
/* Overflow. */
*yymsg_alloc = 0;
return 2;
}
return 2;
yysize = yysize1;
if (*yymsg_alloc < yysize)
@@ -1061,7 +1079,7 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
if (! (yysize <= *yymsg_alloc
&& *yymsg_alloc <= YYSTACK_ALLOC_MAXIMUM))
*yymsg_alloc = YYSTACK_ALLOC_MAXIMUM;
return 2;
return 1;
}
/* Avoid sprintf, as that infringes on the user's name space.
@@ -1084,7 +1102,6 @@ yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
}
return 0;
}
}
#endif /* YYERROR_VERBOSE */
@@ -1533,7 +1550,7 @@ yyreduce:
yyerrlab:
/* Make sure we have latest lookahead translation. See comments at
user semantic actions for why this is necessary. */
yytoken = YYTRANSLATE (yychar);
yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
/* If not already recovering from an error, report this error. */
if (!yyerrstatus)
@@ -1549,7 +1566,7 @@ yyerrlab:
int yysyntax_error_status = YYSYNTAX_ERROR;
if (yysyntax_error_status == 0)
yymsgp = yymsg;
else if (yysyntax_error_status == 2 && 0 < yymsg_alloc)
else if (yysyntax_error_status == 1)
{
if (yymsg != yymsgbuf)
YYSTACK_FREE (yymsg);
@@ -1558,6 +1575,7 @@ yyerrlab:
{
yymsg = yymsgbuf;
yymsg_alloc = sizeof yymsgbuf;
yysyntax_error_status = 2;
}
else
{