Do not allow identifiers that start with a dash.

This cleans up our previous fixes for a bug whereby Bison
discarded `.field' in `$-1.field'.  The previous fixes were less
restrictive about where a dash could appear in an identifier, but
the restrictions were hard to explain.  That bug was reported and
this final fix was originally suggested by Paul Hilfinger.  This
also fixes a remaining bug reported by Paul Eggert whereby Bison
parses `%token ID -123' as `%token ID - 123' and handles `-' as an
identifier.  Now, `-' cannot be an identifier.  Discussed in
threads beginning at
<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00000.html>,
<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00004.html>.
* NEWS (2.5): Update entry describing the dash extension to
grammar symbol names.  Also, move that entry before the named
references entry because the latter mentions the former.
* doc/bison.texinfo (Symbol): Update documentation for symbol
names.  As suggested by Paul Eggert, mention the effect of periods
and dashes on named references.
(Decl Summary): Update documentation for unquoted %define values,
which, as a side effect, can no longer start with dashes either.
* src/scan-code.l (id): Implement.
* src/scan-gram.l (id): Implement.
* tests/actions.at (Exotic Dollars): Extend test group to exercise
bug reported by Paul Hilfinger.
* tests/input.at (Symbols): Update test group, and extend to
exercise bug reported by Paul Eggert.
* tests/named-refs.at (Stray symbols in brackets): Update test
group.
($ or @ followed by . or -): Likewise.
* tests/regression.at (Invalid inputs): Likewise.
(cherry picked from commit 82f3355eaf)
Joel E. Denny
2011-01-29 12:54:28 -05:00
parent 448dc38bc4
commit eb8c66bbda
9 changed files with 123 additions and 35 deletions

ChangeLog

@@ -1,3 +1,36 @@
+2011-01-29 Joel E. Denny <joeldenny@joeldenny.org>
+Do not allow identifiers that start with a dash.
+This cleans up our previous fixes for a bug whereby Bison
+discarded `.field' in `$-1.field'. The previous fixes were less
+restrictive about where a dash could appear in an identifier, but
+the restrictions were hard to explain. That bug was reported and
+this final fix was originally suggested by Paul Hilfinger. This
+also fixes a remaining bug reported by Paul Eggert whereby Bison
+parses `%token ID -123' as `%token ID - 123' and handles `-' as an
+identifier. Now, `-' cannot be an identifier. Discussed in
+threads beginning at
+<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00000.html>,
+<http://lists.gnu.org/archive/html/bug-bison/2011-01/msg00004.html>.
+* NEWS (2.5): Update entry describing the dash extension to
+grammar symbol names. Also, move that entry before the named
+references entry because the latter mentions the former.
+* doc/bison.texinfo (Symbol): Update documentation for symbol
+names. As suggested by Paul Eggert, mention the effect of periods
+and dashes on named references.
+(Decl Summary): Update documentation for unquoted %define values,
+which, as a side effect, can no longer start with dashes either.
+* src/scan-code.l (id): Implement.
+* src/scan-gram.l (id): Implement.
+* tests/actions.at (Exotic Dollars): Extend test group to exercise
+bug reported by Paul Hilfinger.
+* tests/input.at (Symbols): Update test group, and extend to
+exercise bug reported by Paul Eggert.
+* tests/named-refs.at (Stray symbols in brackets): Update test
+group.
+($ or @ followed by . or -): Likewise.
+* tests/regression.at (Invalid inputs): Likewise.
2011-01-24 Joel E. Denny <joeldenny@joeldenny.org>
* data/yacc.c: Fix last apostrophe warning from xgettext.

NEWS

@@ -3,6 +3,14 @@ Bison News
* Changes in version 2.5 (????-??-??):
+** Grammar symbol names can now contain non-initial dashes:
+Consistently with directives (such as %error-verbose) and with
+%define variables (e.g. push-pull), grammar symbol names may contain
+dashes in any position except the beginning. This is a GNU
+extension over POSIX Yacc. Thus, use of this extension is reported
+by -Wyacc and rejected in Yacc mode (--yacc).
** Named references:
Historically, Yacc and Bison have supported positional references
@@ -98,14 +106,6 @@ Bison News
LAC is an experimental feature. More user feedback will help to
stabilize it.
-** Grammar symbol names can now contain dashes:
-Consistently with directives (such as %error-verbose) and variables
-(e.g. push-pull), grammar symbol names may include dashes in any
-position, similarly to periods and underscores. This is GNU
-extension over POSIX Yacc whose use is reported by -Wyacc, and
-rejected in Yacc mode (--yacc).
** %define improvements:
*** Can now be invoked via the command line:

doc/bison.texinfo

@@ -3049,12 +3049,13 @@ A @dfn{nonterminal symbol} stands for a class of syntactically
equivalent groupings. The symbol name is used in writing grammar rules.
By convention, it should be all lower case.
-Symbol names can contain letters, underscores, periods, dashes, and (not
-at the beginning) digits. Dashes in symbol names are a GNU
-extension, incompatible with POSIX Yacc. Terminal symbols
-that contain periods or dashes make little sense: since they are not
-valid symbols (in most programming languages) they are not exported as
-token names.
+Symbol names can contain letters, underscores, periods, and non-initial
+digits and dashes. Dashes in symbol names are a GNU extension, incompatible
+with POSIX Yacc. Periods and dashes make symbol names less convenient to
+use with named references, which require brackets around such names
+(@pxref{Named References}). Terminal symbols that contain periods or dashes
+make little sense: since they are not valid symbols (in most programming
+languages) they are not exported as token names.
There are three ways of writing terminal symbols in the grammar:
@@ -4959,9 +4960,8 @@ Define a variable to adjust Bison's behavior.
It is an error if a @var{variable} is defined by @code{%define} multiple
times, but see @ref{Bison Options,,-D @var{name}[=@var{value}]}.
-@var{value} must be placed in quotation marks if it contains any
-character other than a letter, underscore, period, dash, or non-initial
-digit.
+@var{value} must be placed in quotation marks if it contains any character
+other than a letter, underscore, period, or non-initial dash or digit.
Omitting @code{"@var{value}"} entirely is always equivalent to specifying
@code{""}.

src/scan-code.l

@@ -85,7 +85,7 @@ splice (\\[ \f\t\v]*\n)*
named symbol references. Shall be kept synchronized with
scan-gram.l "letter" and "id". */
letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
-id -*(-|{letter}({letter}|[-0-9])*)
+id {letter}({letter}|[-0-9])*
ref -?[0-9]+|{id}|"["{id}"]"|"$"
%%

src/scan-gram.l

@@ -104,7 +104,7 @@ static void unexpected_newline (boundary, char const *);
%x SC_BRACKETED_ID SC_RETURN_BRACKETED_ID
letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
-id -*(-|{letter}({letter}|[-0-9])*)
+id {letter}({letter}|[-0-9])*
directive %{id}
int [0-9]+

tests/actions.at

@@ -158,6 +158,52 @@ AT_PARSER_CHECK([./input], 0,
[[15
]])
+# Make sure that fields after $n or $-n are parsed correctly. At one
+# point while implementing dashes in symbol names, we were dropping
+# fields after $-n.
+AT_DATA_GRAMMAR([[input.y]],
+[[
+%{
+# include <stdio.h>
+static int yylex (void);
+static void yyerror (char const *msg);
+typedef struct { int val; } stype;
+# define YYSTYPE stype
+%}
+%%
+start: one two { $$.val = $1.val + $2.val; } sum ;
+one: { $$.val = 1; } ;
+two: { $$.val = 2; } ;
+sum: { printf ("%d\n", $0.val + $-1.val + $-2.val); } ;
+%%
+static int
+yylex (void)
+{
+return 0;
+}
+static void
+yyerror (char const *msg)
+{
+fprintf (stderr, "%s\n", msg);
+}
+int
+main (void)
+{
+return yyparse ();
+}
+]])
+AT_BISON_CHECK([[-o input.c input.y]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[0]],
+[[6
+]])
AT_CLEANUP

tests/input.at

@@ -658,17 +658,20 @@ AT_BISON_CHECK([-o input.c input.y])
AT_COMPILE([input.o], [-c input.c])
-# Periods and dashes are genuine letters, they can start identifiers.
-# Digits cannot.
+# Periods are genuine letters, they can start identifiers.
+# Digits and dashes cannot.
 AT_DATA_GRAMMAR([input.y],
 [[%token .GOOD
          -GOOD
          1NV4L1D
+         -123
 %%
-start: .GOOD -GOOD
+start: .GOOD GOOD
 ]])
 AT_BISON_CHECK([-o input.c input.y], [1], [],
-[[input.y:11.10-16: invalid identifier: `1NV4L1D'
+[[input.y:10.10: invalid character: `-'
+input.y:11.10-16: invalid identifier: `1NV4L1D'
+input.y:12.10: invalid character: `-'
]])
AT_CLEANUP

tests/named-refs.at

@@ -446,13 +446,14 @@ AT_SETUP([Stray symbols in brackets])
AT_DATA_GRAMMAR([test.y],
[[
%%
-start: foo[ /* aaa */ *&-+ ] bar
+start: foo[ /* aaa */ *&-.+ ] bar
{ s = $foo; }
]])
AT_BISON_CHECK([-o test.c test.y], 1, [],
[[test.y:11.23: invalid character in bracketed name: `*'
test.y:11.24: invalid character in bracketed name: `&'
-test.y:11.26: invalid character in bracketed name: `+'
+test.y:11.25: invalid character in bracketed name: `-'
+test.y:11.27: invalid character in bracketed name: `+'
]])
AT_CLEANUP
@@ -570,23 +571,27 @@ AT_DATA([[test.y]],
 %%
 start:
 .field { $.field; }
-| -field { @-field; }
 | 'a' { @.field; }
-| 'a' { $-field; }
 ;
 .field: ;
--field: ;
 ]])
 AT_BISON_CHECK([[test.y]], [[1]], [],
 [[test.y:4.12-18: invalid reference: `$.field'
 test.y:4.13: syntax error after `$', expecting integer, letter, `_', `@<:@', or `$'
 test.y:4.3-8: possibly meant: $[.field] at $1
-test.y:5.12-18: invalid reference: `@-field'
+test.y:5.12-18: invalid reference: `@.field'
 test.y:5.13: syntax error after `@', expecting integer, letter, `_', `@<:@', or `$'
-test.y:5.3-8: possibly meant: @[-field] at $1
-test.y:6.12-18: invalid reference: `@.field'
-test.y:6.13: syntax error after `@', expecting integer, letter, `_', `@<:@', or `$'
-test.y:7.12-18: invalid reference: `$-field'
-test.y:7.13: syntax error after `$', expecting integer, letter, `_', `@<:@', or `$'
 ]])
+AT_DATA([[test.y]],
+[[
+%%
+start:
+'a' { $-field; }
+| 'b' { @-field; }
+;
+]])
+AT_BISON_CHECK([[test.y]], [[0]], [],
+[[test.y:4.9: warning: stray `$'
+test.y:5.9: warning: stray `@'
+]])
 AT_CLEANUP

tests/regression.at

@@ -392,7 +392,8 @@ input.y:3.14: invalid character: `}'
input.y:4.1: invalid character: `%'
input.y:4.2: invalid character: `&'
input.y:5.1-17: invalid directive: `%a-does-not-exist'
-input.y:6.1-2: invalid directive: `%-'
+input.y:6.1: invalid character: `%'
+input.y:6.2: invalid character: `-'
input.y:7.1-8.0: missing `%}' at end of file
input.y:7.1-8.0: syntax error, unexpected %{...%}
]])