Extract the parsing of user actions from the grammar scanner.

As a consequence, the relation between the grammar scanner and
parser is much simpler.  We can also split "composite tokens" back
into simple tokens.
* src/gram.h (ITEM_NUMBER_MAX, RULE_NUMBER_MAX): New.
* src/scan-gram.l (add_column_width, adjust_location): Move to and
rename as...
* src/location.h, src/location.c (add_column_width)
(location_compute): these.
Fix the column count: the initial column is 0.
(location_print): Be robust to ending column being 0.
* src/location.h (boundary_set): New.
* src/main.c: Adjust to scanner_free being renamed as
gram_scanner_free.
* src/output.c: Include scan-code.h.
* src/parse-gram.y: Include scan-gram.h and scan-code.h.
Use boundary_set.
(PERCENT_DESTRUCTOR, PERCENT_PRINTER, PERCENT_INITIAL_ACTION)
(PERCENT_LEX_PARAM, PERCENT_PARSE_PARAM): Remove the {...} part,
which is now, again, a separate token.
Adjust all dependencies.
Wherever actions with $ and @ are used, use translate_code.
(action): Remove this nonterminal which is now useless.
* src/reader.c: Include assert.h, scan-gram.h and scan-code.h.
(grammar_current_rule_action_append): Use translate_code.
(packgram): Bound check ruleno, itemno, and rule_length.
* src/reader.h (gram_in, gram__flex_debug, scanner_cursor)
(last_string, last_braced_code_loc, max_left_semantic_context)
(scanner_initialize, scanner_free, scanner_last_string_free)
(gram_out, gram_lineno, YY_DECL_): Move to...
* src/scan-gram.h: this new file.
(YY_DECL): Rename as...
(GRAM_DECL): this.
* src/scan-code.h, src/scan-code.l, src/scan-code-c.c: New.
* src/scan-gram.l (gram_get_lineno, gram_get_in, gram_get_out):
(gram_get_leng, gram_get_text, gram_set_lineno, gram_set_in):
(gram_set_out, gram_get_debug, gram_set_debug, gram_lex_destroy):
Move these declarations, and...
(obstack_for_string, STRING_GROW, STRING_FINISH, STRING_FREE):
these to...
* src/flex-scanner.h: this new file.
* src/scan-gram.l (rule_length, rule_length_overflow)
(increment_rule_length): Remove.
(last_braced_code_loc): Rename as...
(gram_last_braced_code_loc): this.
Adjust to the changes of the parser.
Move all the handling of $ and @ into...
* src/scan-code.l: here.
* src/scan-gram.l (handle_dollar, handle_at): Remove.
(handle_action_dollar, handle_action_at): Move to...
* src/scan-code.l: here.
* src/Makefile.am (bison_SOURCES): Add flex-scanner.h,
scan-code.h, scan-code-c.c, scan-gram.h.
(EXTRA_bison_SOURCES): Add scan-code.l.
(BUILT_SOURCES): Add scan-code.c.
(yacc): Be robust to white spaces.
* tests/conflicts.at, tests/input.at, tests/reduce.at,
* tests/regression.at: Adjust the column numbers.
* tests/regression.at: Adjust the error message.
Akim Demaille
2006-06-06 16:40:06 +00:00
parent 184e42f065
commit e9071366c3
21 changed files with 1857 additions and 776 deletions

@@ -1,4 +1,65 @@
$Id$ 2006-06-06 Akim Demaille <akim@epita.fr>
Extract the parsing of user actions from the grammar scanner.
As a consequence, the relation between the grammar scanner and
parser is much simpler. We can also split "composite tokens" back
into simple tokens.
* src/gram.h (ITEM_NUMBER_MAX, RULE_NUMBER_MAX): New.
* src/scan-gram.l (add_column_width, adjust_location): Move to and
rename as...
* src/location.h, src/location.c (add_column_width)
(location_compute): these.
Fix the column count: the initial column is 0.
(location_print): Be robust to ending column being 0.
* src/location.h (boundary_set): New.
* src/main.c: Adjust to scanner_free being renamed as
gram_scanner_free.
* src/output.c: Include scan-code.h.
* src/parse-gram.y: Include scan-gram.h and scan-code.h.
Use boundary_set.
(PERCENT_DESTRUCTOR, PERCENT_PRINTER, PERCENT_INITIAL_ACTION)
(PERCENT_LEX_PARAM, PERCENT_PARSE_PARAM): Remove the {...} part,
which is now, again, a separate token.
Adjust all dependencies.
Whereever actions with $ and @ are used, use translate_code.
(action): Remove this nonterminal which is now useless.
* src/reader.c: Include assert.h, scan-gram.h and scan-code.h.
(grammar_current_rule_action_append): Use translate_code.
(packgram): Bound check ruleno, itemno, and rule_length.
* src/reader.h (gram_in, gram__flex_debug, scanner_cursor)
(last_string, last_braced_code_loc, max_left_semantic_context)
(scanner_initialize, scanner_free, scanner_last_string_free)
(gram_out, gram_lineno, YY_DECL_): Move to...
* src/scan-gram.h: this new file.
(YY_DECL): Rename as...
(GRAM_DECL): this.
* src/scan-code.h, src/scan-code.l, src/scan-code-c.c: New.
* src/scan-gram.l (gram_get_lineno, gram_get_in, gram_get_out):
(gram_get_leng, gram_get_text, gram_set_lineno, gram_set_in):
(gram_set_out, gram_get_debug, gram_set_debug, gram_lex_destroy):
Move these declarations, and...
(obstack_for_string, STRING_GROW, STRING_FINISH, STRING_FREE):
these to...
* src/flex-scanner.h: this new file.
* src/scan-gram.l (rule_length, rule_length_overflow)
(increment_rule_length): Remove.
(last_braced_code_loc): Rename as...
(gram_last_braced_code_loc): this.
Adjust to the changes of the parser.
Move all the handling of $ and @ into...
* src/scan-code.l: here.
* src/scan-gram.l (handle_dollar, handle_at): Remove.
(handle_action_dollar, handle_action_at): Move to...
* src/scan-code.l: here.
* src/Makefile.am (bison_SOURCES): Add flex-scanner.h,
scan-code.h, scan-code-c.c, scan-gram.h.
(EXTRA_bison_SOURCES): Add scan-code.l.
(BUILT_SOURCES): Add scan-code.c.
(yacc): Be robust to white spaces.
* tests/conflicts.at, tests/input.at, tests/reduce.at,
* tests/regression.at: Adjust the column numbers.
* tests/regression.at: Adjust the error message.
2006-06-06 Joel E. Denny <jdenny@ces.clemson.edu> 2006-06-06 Joel E. Denny <jdenny@ces.clemson.edu>
@@ -16057,3 +16118,5 @@ $Id$
Copying and distribution of this file, with or without Copying and distribution of this file, with or without
modification, are permitted provided the copyright notice and this modification, are permitted provided the copyright notice and this
notice are preserved. notice are preserved.
$Id$

@@ -1,4 +1,4 @@
## Copyright (C) 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc. ## Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
## This program is free software; you can redistribute it and/or modify ## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by ## it under the terms of the GNU General Public License as published by
@@ -39,6 +39,7 @@ bison_SOURCES = \
conflicts.c conflicts.h \ conflicts.c conflicts.h \
derives.c derives.h \ derives.c derives.h \
files.c files.h \ files.c files.h \
flex-scanner.h \
getargs.c getargs.h \ getargs.c getargs.h \
gram.c gram.h \ gram.c gram.h \
lalr.h lalr.c \ lalr.h lalr.c \
@@ -54,8 +55,9 @@ bison_SOURCES = \
reduce.c reduce.h \ reduce.c reduce.h \
revision.c revision.h \ revision.c revision.h \
relation.c relation.h \ relation.c relation.h \
scan-gram-c.c \ scan-code.h scan-code-c.c \
scan-skel-c.c scan-skel.h \ scan-gram.h scan-gram-c.c \
scan-skel.h scan-skel-c.c \
state.c state.h \ state.c state.h \
symlist.c symlist.h \ symlist.c symlist.h \
symtab.c symtab.h \ symtab.c symtab.h \
@@ -65,15 +67,20 @@ bison_SOURCES = \
vcg.c vcg.h \ vcg.c vcg.h \
vcg_defaults.h vcg_defaults.h
EXTRA_bison_SOURCES = scan-skel.l scan-gram.l EXTRA_bison_SOURCES = scan-code.l scan-skel.l scan-gram.l
BUILT_SOURCES = revision.c scan-skel.c scan-gram.c parse-gram.c parse-gram.h BUILT_SOURCES = \
parse-gram.c parse-gram.h \
revision.c \
scan-code.c \
scan-skel.c \
scan-gram.c \
MOSTLYCLEANFILES = yacc MOSTLYCLEANFILES = yacc
yacc: yacc:
echo '#! /bin/sh' >$@ echo '#! /bin/sh' >$@
echo 'exec $(bindir)/bison -y "$$@"' >>$@ echo "exec '$(bindir)/bison' -y \"$$@\"" >>$@
chmod a+x $@ chmod a+x $@
echo: echo:

@@ -1,6 +1,6 @@
/* Data definitions for internal representation of Bison's input. /* Data definitions for internal representation of Bison's input.
Copyright (C) 1984, 1986, 1989, 1992, 2001, 2002, 2003, 2004, 2005 Copyright (C) 1984, 1986, 1989, 1992, 2001, 2002, 2003, 2004, 2005, 2006
Free Software Foundation, Inc. Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler. This file is part of Bison, the GNU Compiler Compiler.
@@ -115,6 +115,7 @@ extern int ntokens;
extern int nvars; extern int nvars;
typedef int item_number; typedef int item_number;
#define ITEM_NUMBER_MAX INT_MAX
extern item_number *ritem; extern item_number *ritem;
extern unsigned int nritems; extern unsigned int nritems;
@@ -146,6 +147,7 @@ item_number_is_symbol_number (item_number i)
/* Rule numbers. */ /* Rule numbers. */
typedef int rule_number; typedef int rule_number;
#define RULE_NUMBER_MAX INT_MAX
extern rule_number nrules; extern rule_number nrules;
static inline item_number static inline item_number

@@ -1,6 +1,5 @@
/* Locations for Bison /* Locations for Bison
Copyright (C) 2002, 2005, 2006 Free Software Foundation, Inc.
Copyright (C) 2002, 2005 Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler. This file is part of Bison, the GNU Compiler Compiler.
@@ -28,11 +27,80 @@
location const empty_location; location const empty_location;
/* If BUF is null, add BUFSIZE (which in this case must be less than
INT_MAX) to COLUMN; otherwise, add mbsnwidth (BUF, BUFSIZE, 0) to
COLUMN. If an overflow occurs, or might occur but is undetectable,
return INT_MAX. Assume COLUMN is nonnegative. */
static inline int
add_column_width (int column, char const *buf, size_t bufsize)
{
size_t width;
unsigned int remaining_columns = INT_MAX - column;
if (buf)
{
if (INT_MAX / 2 <= bufsize)
return INT_MAX;
width = mbsnwidth (buf, bufsize, 0);
}
else
width = bufsize;
return width <= remaining_columns ? column + width : INT_MAX;
}
/* Set *LOC and adjust scanner cursor to account for token TOKEN of
size SIZE. */
void
location_compute (location *loc, boundary *cur, char const *token, size_t size)
{
int line = cur->line;
int column = cur->column;
char const *p0 = token;
char const *p = token;
char const *lim = token + size;
loc->start = *cur;
for (p = token; p < lim; p++)
switch (*p)
{
case '\n':
line += line < INT_MAX;
column = 1;
p0 = p + 1;
break;
case '\t':
column = add_column_width (column, p0, p - p0);
column = add_column_width (column, NULL, 8 - ((column - 1) & 7));
p0 = p + 1;
break;
default:
break;
}
cur->line = line;
cur->column = column = add_column_width (column, p0, p - p0);
loc->end = *cur;
if (line == INT_MAX && loc->start.line != INT_MAX)
warn_at (*loc, _("line number overflow"));
if (column == INT_MAX && loc->start.column != INT_MAX)
warn_at (*loc, _("column number overflow"));
}
/* Output to OUT the location LOC. /* Output to OUT the location LOC.
Warning: it uses quotearg's slot 3. */ Warning: it uses quotearg's slot 3. */
void void
location_print (FILE *out, location loc) location_print (FILE *out, location loc)
{ {
int end_col = 0 < loc.end.column ? loc.end.column - 1 : 0;
fprintf (out, "%s:%d.%d", fprintf (out, "%s:%d.%d",
quotearg_n_style (3, escape_quoting_style, loc.start.file), quotearg_n_style (3, escape_quoting_style, loc.start.file),
loc.start.line, loc.start.column); loc.start.line, loc.start.column);
@@ -40,9 +108,9 @@ location_print (FILE *out, location loc)
if (loc.start.file != loc.end.file) if (loc.start.file != loc.end.file)
fprintf (out, "-%s:%d.%d", fprintf (out, "-%s:%d.%d",
quotearg_n_style (3, escape_quoting_style, loc.end.file), quotearg_n_style (3, escape_quoting_style, loc.end.file),
loc.end.line, loc.end.column - 1); loc.end.line, end_col);
else if (loc.start.line < loc.end.line) else if (loc.start.line < loc.end.line)
fprintf (out, "-%d.%d", loc.end.line, loc.end.column - 1); fprintf (out, "-%d.%d", loc.end.line, end_col);
else if (loc.start.column < loc.end.column - 1) else if (loc.start.column < end_col)
fprintf (out, "-%d", loc.end.column - 1); fprintf (out, "-%d", end_col);
} }
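
The overflow handling in add_column_width and the end-column clamp in location_print can be tried in isolation. The sketch below is a simplification, not the code from this commit: it assumes every byte is one column wide (the real function calls mbsnwidth), and the names saturating_add_columns and printable_end_column are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified variant of add_column_width: add WIDTH
   columns to COLUMN, saturating at INT_MAX instead of overflowing.
   COLUMN is assumed to be nonnegative.  */
static int
saturating_add_columns (int column, size_t width)
{
  unsigned int remaining_columns = INT_MAX - column;
  return width <= remaining_columns ? column + (int) width : INT_MAX;
}

/* Hypothetical helper mirroring the end_col computation in
   location_print: the stored end column is one past the token, so
   subtract one, but never go below 0.  */
static int
printable_end_column (int end_column)
{
  return 0 < end_column ? end_column - 1 : 0;
}
```

With these definitions, adding 5 columns to INT_MAX - 1 yields INT_MAX rather than wrapping, and an ending column of 0 prints as 0 instead of -1.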

@@ -40,6 +40,15 @@ typedef struct
} boundary; } boundary;
/* Set the position of \a a. */
static inline void
boundary_set (boundary *b, const char *f, int l, int c)
{
b->file = f;
b->line = l;
b->column = c;
}
/* Return nonzero if A and B are equal boundaries. */ /* Return nonzero if A and B are equal boundaries. */
static inline bool static inline bool
equal_boundaries (boundary a, boundary b) equal_boundaries (boundary a, boundary b)
@@ -64,6 +73,11 @@ typedef struct
extern location const empty_location; extern location const empty_location;
/* Set *LOC and adjust scanner cursor to account for token TOKEN of
size SIZE. */
void location_compute (location *loc,
boundary *cur, char const *token, size_t size);
void location_print (FILE *out, location loc); void location_print (FILE *out, location loc);
#endif /* ! defined LOCATION_H_ */ #endif /* ! defined LOCATION_H_ */
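
As a rough illustration of how boundary_set replaces the three chained assignments in parse-gram.y, here is a standalone sketch. The struct layouts are assumptions inferred from the fields used in the diff (file, line, column, start, end); the helper initial_location is hypothetical and does not exist in Bison.

```c
#include <string.h>

/* Assumed layout; only the three fields touched by boundary_set
   are modeled here.  */
typedef struct
{
  const char *file;
  int line;
  int column;
} boundary;

/* Assumed layout of a location: a start and an end boundary.  */
typedef struct
{
  boundary start;
  boundary end;
} location;

/* Same body as the boundary_set added to location.h.  */
static inline void
boundary_set (boundary *b, const char *f, int l, int c)
{
  b->file = f;
  b->line = l;
  b->column = c;
}

/* Hypothetical helper: build the default location used by the
   parser's %initial-action, i.e. line 1, column 0 of FILE.  */
static location
initial_location (const char *file)
{
  location loc;
  boundary_set (&loc.start, file, 1, 0);
  boundary_set (&loc.end, file, 1, 0);
  return loc;
}
```

Calling initial_location ("input.y") gives a location whose start and end both sit at line 1, column 0, matching the "initial column is 0" fix described in the commit message.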

@@ -1,6 +1,7 @@
/* Top level entry point of Bison. /* Top level entry point of Bison.
Copyright (C) 1984, 1986, 1989, 1992, 1995, 2000, 2001, 2002, 2004, 2005 Copyright (C) 1984, 1986, 1989, 1992, 1995, 2000, 2001, 2002, 2004, 2005,
2006
Free Software Foundation, Inc. Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler. This file is part of Bison, the GNU Compiler Compiler.
@@ -169,7 +170,7 @@ main (int argc, char *argv[])
/* The scanner memory cannot be released right after parsing, as it /* The scanner memory cannot be released right after parsing, as it
contains things such as user actions, prologue, epilogue etc. */ contains things such as user actions, prologue, epilogue etc. */
scanner_free (); gram_scanner_free ();
muscle_free (); muscle_free ();
uniqstrs_free (); uniqstrs_free ();
timevar_pop (TV_FREE); timevar_pop (TV_FREE);

@@ -36,6 +36,7 @@
#include "muscle_tab.h" #include "muscle_tab.h"
#include "output.h" #include "output.h"
#include "reader.h" #include "reader.h"
#include "scan-code.h" /* max_left_semantic_context */
#include "scan-skel.h" #include "scan-skel.h"
#include "symtab.h" #include "symtab.h"
#include "tables.h" #include "tables.h"

File diff suppressed because it is too large.

@@ -1,4 +1,4 @@
/* A Bison parser, made by GNU Bison 2.2a. */ /* A Bison parser, made by GNU Bison 2.1b. */
/* Skeleton interface for Bison's Yacc-like parsers in C /* Skeleton interface for Bison's Yacc-like parsers in C
@@ -148,7 +148,7 @@
#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED #if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef union YYSTYPE typedef union YYSTYPE
#line 94 "parse-gram.y" #line 95 "../../src/parse-gram.y"
{ {
symbol *symbol; symbol *symbol;
symbol_list *list; symbol_list *list;
@@ -158,7 +158,7 @@ typedef union YYSTYPE
uniqstr uniqstr; uniqstr uniqstr;
} }
/* Line 1529 of yacc.c. */ /* Line 1529 of yacc.c. */
#line 162 "parse-gram.h" #line 162 "../../src/parse-gram.h"
YYSTYPE; YYSTYPE;
# define yystype YYSTYPE /* obsolescent; will be withdrawn */ # define yystype YYSTYPE /* obsolescent; will be withdrawn */
# define YYSTYPE_IS_DECLARED 1 # define YYSTYPE_IS_DECLARED 1

@@ -32,6 +32,8 @@
#include "quotearg.h" #include "quotearg.h"
#include "reader.h" #include "reader.h"
#include "symlist.h" #include "symlist.h"
#include "scan-gram.h"
#include "scan-code.h"
#include "strverscmp.h" #include "strverscmp.h"
#define YYLLOC_DEFAULT(Current, Rhs, N) (Current) = lloc_default (Rhs, N) #define YYLLOC_DEFAULT(Current, Rhs, N) (Current) = lloc_default (Rhs, N)
@@ -84,9 +86,8 @@ static int current_prec = 0;
{ {
/* Bison's grammar can initial empty locations, hence a default /* Bison's grammar can initial empty locations, hence a default
location is needed. */ location is needed. */
@$.start.file = @$.end.file = current_file; boundary_set (&@$.start, current_file, 1, 0);
@$.start.line = @$.end.line = 1; boundary_set (&@$.end, current_file, 1, 0);
@$.start.column = @$.end.column = 0;
} }
/* Only NUMBERS have a value. */ /* Only NUMBERS have a value. */
@@ -109,8 +110,8 @@ static int current_prec = 0;
%token PERCENT_NTERM "%nterm" %token PERCENT_NTERM "%nterm"
%token PERCENT_TYPE "%type" %token PERCENT_TYPE "%type"
%token PERCENT_DESTRUCTOR "%destructor {...}" %token PERCENT_DESTRUCTOR "%destructor"
%token PERCENT_PRINTER "%printer {...}" %token PERCENT_PRINTER "%printer"
%token PERCENT_UNION "%union {...}" %token PERCENT_UNION "%union {...}"
@@ -137,8 +138,8 @@ static int current_prec = 0;
PERCENT_EXPECT_RR "%expect-rr" PERCENT_EXPECT_RR "%expect-rr"
PERCENT_FILE_PREFIX "%file-prefix" PERCENT_FILE_PREFIX "%file-prefix"
PERCENT_GLR_PARSER "%glr-parser" PERCENT_GLR_PARSER "%glr-parser"
PERCENT_INITIAL_ACTION "%initial-action {...}" PERCENT_INITIAL_ACTION "%initial-action"
PERCENT_LEX_PARAM "%lex-param {...}" PERCENT_LEX_PARAM "%lex-param"
PERCENT_LOCATIONS "%locations" PERCENT_LOCATIONS "%locations"
PERCENT_NAME_PREFIX "%name-prefix" PERCENT_NAME_PREFIX "%name-prefix"
PERCENT_NO_DEFAULT_PREC "%no-default-prec" PERCENT_NO_DEFAULT_PREC "%no-default-prec"
@@ -146,7 +147,7 @@ static int current_prec = 0;
PERCENT_NONDETERMINISTIC_PARSER PERCENT_NONDETERMINISTIC_PARSER
"%nondeterministic-parser" "%nondeterministic-parser"
PERCENT_OUTPUT "%output" PERCENT_OUTPUT "%output"
PERCENT_PARSE_PARAM "%parse-param {...}" PERCENT_PARSE_PARAM "%parse-param"
PERCENT_PURE_PARSER "%pure-parser" PERCENT_PURE_PARSER "%pure-parser"
PERCENT_REQUIRE "%require" PERCENT_REQUIRE "%require"
PERCENT_SKELETON "%skeleton" PERCENT_SKELETON "%skeleton"
@@ -167,23 +168,14 @@ static int current_prec = 0;
%token EPILOGUE "epilogue" %token EPILOGUE "epilogue"
%token BRACED_CODE "{...}" %token BRACED_CODE "{...}"
%type <chars> STRING string_content %type <chars> STRING string_content
"%destructor {...}" "{...}"
"%initial-action {...}"
"%lex-param {...}"
"%parse-param {...}"
"%printer {...}"
"%union {...}" "%union {...}"
PROLOGUE EPILOGUE PROLOGUE EPILOGUE
%printer { fprintf (stderr, "\"%s\"", $$); } %printer { fprintf (stderr, "\"%s\"", $$); }
STRING string_content STRING string_content
%printer { fprintf (stderr, "{\n%s\n}", $$); } %printer { fprintf (stderr, "{\n%s\n}", $$); }
"%destructor {...}" "{...}"
"%initial-action {...}"
"%lex-param {...}"
"%parse-param {...}"
"%printer {...}"
"%union {...}" "%union {...}"
PROLOGUE EPILOGUE PROLOGUE EPILOGUE
%type <uniqstr> TYPE %type <uniqstr> TYPE
@@ -214,7 +206,8 @@ declarations:
declaration: declaration:
grammar_declaration grammar_declaration
| PROLOGUE { prologue_augment ($1, @1); } | PROLOGUE { prologue_augment (translate_code ($1, @1),
@1); }
| "%debug" { debug_flag = true; } | "%debug" { debug_flag = true; }
| "%define" string_content | "%define" string_content
{ {
@@ -232,17 +225,17 @@ declaration:
nondeterministic_parser = true; nondeterministic_parser = true;
glr_parser = true; glr_parser = true;
} }
| "%initial-action {...}" | "%initial-action" "{...}"
{ {
muscle_code_grow ("initial_action", $1, @1); muscle_code_grow ("initial_action", translate_symbol_action ($2, @2), @2);
} }
| "%lex-param {...}" { add_param ("lex_param", $1, @1); } | "%lex-param" "{...}" { add_param ("lex_param", $2, @2); }
| "%locations" { locations_flag = true; } | "%locations" { locations_flag = true; }
| "%name-prefix" "=" string_content { spec_name_prefix = $3; } | "%name-prefix" "=" string_content { spec_name_prefix = $3; }
| "%no-lines" { no_lines_flag = true; } | "%no-lines" { no_lines_flag = true; }
| "%nondeterministic-parser" { nondeterministic_parser = true; } | "%nondeterministic-parser" { nondeterministic_parser = true; }
| "%output" "=" string_content { spec_outfile = $3; } | "%output" "=" string_content { spec_outfile = $3; }
| "%parse-param {...}" { add_param ("parse_param", $1, @1); } | "%parse-param" "{...}" { add_param ("parse_param", $2, @2); }
| "%pure-parser" { pure_parser = true; } | "%pure-parser" { pure_parser = true; }
| "%require" string_content { version_check (&@2, $2); } | "%require" string_content { version_check (&@2, $2); }
| "%skeleton" string_content { skeleton = $2; } | "%skeleton" string_content { skeleton = $2; }
@@ -275,19 +268,21 @@ grammar_declaration:
typed = true; typed = true;
muscle_code_grow ("stype", body, @1); muscle_code_grow ("stype", body, @1);
} }
| "%destructor {...}" symbols.1 | "%destructor" "{...}" symbols.1
{ {
symbol_list *list; symbol_list *list;
for (list = $2; list; list = list->next) const char *action = translate_symbol_action ($2, @2);
symbol_destructor_set (list->sym, $1, @1); for (list = $3; list; list = list->next)
symbol_list_free ($2); symbol_destructor_set (list->sym, action, @2);
symbol_list_free ($3);
} }
| "%printer {...}" symbols.1 | "%printer" "{...}" symbols.1
{ {
symbol_list *list; symbol_list *list;
for (list = $2; list; list = list->next) const char *action = translate_symbol_action ($2, @2);
symbol_printer_set (list->sym, $1, @1); for (list = $3; list; list = list->next)
symbol_list_free ($2); symbol_printer_set (list->sym, action, @2);
symbol_list_free ($3);
} }
| "%default-prec" | "%default-prec"
{ {
@@ -346,7 +341,6 @@ type.opt:
; ;
/* One or more nonterminals to be %typed. */ /* One or more nonterminals to be %typed. */
symbols.1: symbols.1:
symbol { $$ = symbol_list_new ($1, @1); } symbol { $$ = symbol_list_new ($1, @1); }
| symbols.1 symbol { $$ = symbol_list_prepend ($1, $2, @2); } | symbols.1 symbol { $$ = symbol_list_prepend ($1, $2, @2); }
@@ -426,7 +420,9 @@ rhs:
{ grammar_current_rule_begin (current_lhs, current_lhs_location); } { grammar_current_rule_begin (current_lhs, current_lhs_location); }
| rhs symbol | rhs symbol
{ grammar_current_rule_symbol_append ($2, @2); } { grammar_current_rule_symbol_append ($2, @2); }
| rhs action | rhs "{...}"
{ grammar_current_rule_action_append (gram_last_string,
gram_last_braced_code_loc); }
| rhs "%prec" symbol | rhs "%prec" symbol
{ grammar_current_rule_prec_set ($3, @3); } { grammar_current_rule_prec_set ($3, @3); }
| rhs "%dprec" INT | rhs "%dprec" INT
@@ -440,23 +436,6 @@ symbol:
| string_as_id { $$ = $1; } | string_as_id { $$ = $1; }
; ;
/* Handle the semantics of an action specially, with a mid-rule
action, so that grammar_current_rule_action_append is invoked
immediately after the braced code is read by the scanner.
This implementation relies on the LALR(1) parsing algorithm.
If grammar_current_rule_action_append were executed in a normal
action for this rule, then when the input grammar contains two
successive actions, the scanner would have to read both actions
before reducing this rule. That wouldn't work, since the scanner
relies on all preceding input actions being processed by
grammar_current_rule_action_append before it scans the next
action. */
action:
{ grammar_current_rule_action_append (last_string, last_braced_code_loc); }
BRACED_CODE
;
/* A string used as an ID: quote it. */ /* A string used as an ID: quote it. */
string_as_id: string_as_id:
STRING STRING
@@ -477,8 +456,8 @@ epilogue.opt:
/* Nothing. */ /* Nothing. */
| "%%" EPILOGUE | "%%" EPILOGUE
{ {
muscle_code_grow ("epilogue", $2, @2); muscle_code_grow ("epilogue", translate_code ($2, @2), @2);
scanner_last_string_free (); gram_scanner_last_string_free ();
} }
; ;
@@ -563,7 +542,7 @@ add_param (char const *type, char *decl, location loc)
free (name); free (name);
} }
scanner_last_string_free (); gram_scanner_last_string_free ();
} }
static void static void

@@ -22,6 +22,7 @@
#include <config.h> #include <config.h>
#include "system.h" #include "system.h"
#include <assert.h>
#include <quotearg.h> #include <quotearg.h>
@@ -34,6 +35,8 @@
#include "reader.h" #include "reader.h"
#include "symlist.h" #include "symlist.h"
#include "symtab.h" #include "symtab.h"
#include "scan-gram.h"
#include "scan-code.h"
static void check_and_convert_grammar (void); static void check_and_convert_grammar (void);
@@ -77,6 +80,8 @@ prologue_augment (const char *prologue, location loc)
!typed ? &pre_prologue_obstack : &post_prologue_obstack; !typed ? &pre_prologue_obstack : &post_prologue_obstack;
obstack_fgrow1 (oout, "]b4_syncline(%d, [[", loc.start.line); obstack_fgrow1 (oout, "]b4_syncline(%d, [[", loc.start.line);
/* FIXME: Protection of M4 characters missing here. See
output.c:escaped_output. */
MUSCLE_OBSTACK_SGROW (oout, MUSCLE_OBSTACK_SGROW (oout,
quotearg_style (c_quoting_style, loc.start.file)); quotearg_style (c_quoting_style, loc.start.file));
obstack_sgrow (oout, "]])[\n"); obstack_sgrow (oout, "]])[\n");
@@ -398,9 +403,7 @@ grammar_current_rule_symbol_append (symbol *sym, location loc)
void void
grammar_current_rule_action_append (const char *action, location loc) grammar_current_rule_action_append (const char *action, location loc)
{ {
/* There's no need to invoke grammar_midrule_action here, since the current_rule->action = translate_rule_action (current_rule, action, loc);
scanner already did it if necessary. */
current_rule->action = action;
current_rule->action_location = loc; current_rule->action_location = loc;
} }
@@ -426,6 +429,7 @@ packgram (void)
while (p) while (p)
{ {
int rule_length = 0;
symbol *ruleprec = p->ruleprec; symbol *ruleprec = p->ruleprec;
rules[ruleno].user_number = ruleno; rules[ruleno].user_number = ruleno;
rules[ruleno].number = ruleno; rules[ruleno].number = ruleno;
@@ -440,18 +444,22 @@ packgram (void)
rules[ruleno].action = p->action; rules[ruleno].action = p->action;
rules[ruleno].action_location = p->action_location; rules[ruleno].action_location = p->action_location;
p = p->next; for (p = p->next; p && p->sym; p = p->next)
while (p && p->sym)
{ {
++rule_length;
/* Don't allow rule_length == INT_MAX, since that might
cause confusion with strtol if INT_MAX == LONG_MAX. */
if (rule_length == INT_MAX)
fatal_at (rules[ruleno].location, _("rule is too long"));
/* item_number = symbol_number. /* item_number = symbol_number.
But the former needs to contain more: negative rule numbers. */ But the former needs to contain more: negative rule numbers. */
ritem[itemno++] = symbol_number_as_item_number (p->sym->number); ritem[itemno++] = symbol_number_as_item_number (p->sym->number);
/* A rule gets by default the precedence and associativity /* A rule gets by default the precedence and associativity
of the last token in it. */ of its last token. */
if (p->sym->class == token_sym && default_prec) if (p->sym->class == token_sym && default_prec)
rules[ruleno].prec = p->sym; rules[ruleno].prec = p->sym;
if (p)
p = p->next;
} }
/* If this rule has a %prec, /* If this rule has a %prec,
@@ -461,8 +469,11 @@ packgram (void)
rules[ruleno].precsym = ruleprec; rules[ruleno].precsym = ruleprec;
rules[ruleno].prec = ruleprec; rules[ruleno].prec = ruleprec;
} }
/* An item ends by the rule number (negated). */
ritem[itemno++] = rule_number_as_item_number (ruleno); ritem[itemno++] = rule_number_as_item_number (ruleno);
assert (itemno < ITEM_NUMBER_MAX);
++ruleno; ++ruleno;
assert (ruleno < RULE_NUMBER_MAX);
if (p) if (p)
p = p->next; p = p->next;
@@ -511,7 +522,7 @@ reader (void)
gram__flex_debug = trace_flag & trace_scan; gram__flex_debug = trace_flag & trace_scan;
gram_debug = trace_flag & trace_parse; gram_debug = trace_flag & trace_parse;
scanner_initialize (); gram_scanner_initialize ();
gram_parse (); gram_parse ();
if (! complaint_issued) if (! complaint_issued)
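
The rule-length bound that packgram now enforces while walking a rule's right-hand side can be sketched on its own. This is an illustration under assumptions, not Bison's code: struct node is a hypothetical stand-in for symbol_list (a null sym ends a rule), and the sketch returns -1 where packgram calls fatal_at with "rule is too long". The limit is a parameter here so the check can be exercised without a list of INT_MAX nodes; packgram compares against INT_MAX.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for symbol_list: a node whose SYM is NULL
   terminates the current rule.  */
struct node
{
  const void *sym;
  struct node *next;
};

/* Count the right-hand-side symbols starting at P.  Return -1 as soon
   as the count reaches LIMIT, mirroring the point where packgram
   reports a fatal error.  */
static int
bounded_rule_length (const struct node *p, int limit)
{
  int rule_length = 0;
  for (; p && p->sym; p = p->next)
    {
      ++rule_length;
      if (rule_length == limit)
        return -1;
    }
  return rule_length;
}
```

Folding the walk into a single for loop, as the commit does, keeps the termination test (p && p->sym) in one place, so the length check cannot be skipped by an early advance of p.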

@@ -35,26 +35,6 @@ typedef struct merger_list
uniqstr type; uniqstr type;
} merger_list; } merger_list;
/* From the scanner. */
extern FILE *gram_in;
extern int gram__flex_debug;
extern boundary scanner_cursor;
extern char *last_string;
extern location last_braced_code_loc;
extern int max_left_semantic_context;
void scanner_initialize (void);
void scanner_free (void);
void scanner_last_string_free (void);
/* These are declared by the scanner, but not used. We put them here
to pacify "make syntax-check". */
extern FILE *gram_out;
extern int gram_lineno;
# define YY_DECL int gram_lex (YYSTYPE *val, location *loc)
YY_DECL;
/* From the parser. */ /* From the parser. */
extern int gram_debug; extern int gram_debug;
int gram_parse (void); int gram_parse (void);

src/scan-action.l (new file, 866 lines)

@@ -0,0 +1,866 @@
/* Bison Grammar Scanner -*- C -*-
Copyright (C) 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA
*/
%option debug nodefault nounput noyywrap never-interactive
%option prefix="gram_" outfile="lex.yy.c"
%{
#include "system.h"
#include <mbswidth.h>
#include <get-errno.h>
#include <quote.h>
#include "complain.h"
#include "files.h"
#include "getargs.h"
#include "gram.h"
#include "quotearg.h"
#include "reader.h"
#include "uniqstr.h"
#define YY_USER_INIT \
do \
{ \
scanner_cursor.file = current_file; \
scanner_cursor.line = 1; \
scanner_cursor.column = 1; \
code_start = scanner_cursor; \
} \
while (0)
/* Location of scanner cursor. */
boundary scanner_cursor;
static void adjust_location (location *, char const *, size_t);
#define YY_USER_ACTION adjust_location (loc, yytext, yyleng);
static size_t no_cr_read (FILE *, char *, size_t);
#define YY_INPUT(buf, result, size) ((result) = no_cr_read (yyin, buf, size))
/* Within well-formed rules, RULE_LENGTH is the number of values in
the current rule so far, which says where to find `$0' with respect
to the top of the stack. It is not the same as the rule->length in
the case of mid rule actions.
Outside of well-formed rules, RULE_LENGTH has an undefined value. */
int rule_length;
static void handle_dollar (int token_type, char *cp, location loc);
static void handle_at (int token_type, char *cp, location loc);
static void handle_syncline (char *args);
static unsigned long int scan_integer (char const *p, int base, location loc);
static int convert_ucn_to_byte (char const *hex_text);
static void unexpected_eof (boundary, char const *);
static void unexpected_newline (boundary, char const *);
%}
%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT
%x SC_STRING SC_CHARACTER
%x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
%x SC_PRE_CODE SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE
letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
id {letter}({letter}|[0-9])*
directive %{letter}({letter}|[0-9]|-)*
int [0-9]+
/* POSIX says that a tag must be both an id and a C union member, but
historically almost any character is allowed in a tag. We disallow
NUL and newline, as this simplifies our implementation. */
tag [^\0\n>]+
/* Zero or more instances of backslash-newline. Following GCC, allow
white space between the backslash and the newline. */
splice (\\[ \f\t\v]*\n)*
%%
%{
/* Nesting level of the current code in braces. */
int braces_level IF_LINT (= 0);
/* Parent context state, when applicable. */
int context_state IF_LINT (= 0);
/* Token type to return, when applicable. */
int token_type IF_LINT (= 0);
/* Where containing code started, when applicable. Its initial
value is relevant only when yylex is invoked in the SC_EPILOGUE
start condition. */
boundary code_start = scanner_cursor;
/* Where containing comment or string or character literal started,
when applicable. */
boundary token_start IF_LINT (= scanner_cursor);
%}
/*-----------------------.
| Scanning white space. |
`-----------------------*/
<INITIAL>
{
/* Comments and white space. */
"," warn_at (*loc, _("stray `,' treated as white space"));
[ \f\n\t\v] |
"//".* ;
"/*" {
token_start = loc->start;
context_state = YY_START;
BEGIN SC_YACC_COMMENT;
}
/* #line directives are not documented, and may be withdrawn or
modified in future versions of Bison. */
^"#line "{int}" \"".*"\"\n" {
handle_syncline (yytext + sizeof "#line " - 1);
}
}
/*----------------------------.
| Scanning Bison directives. |
`----------------------------*/
<INITIAL>
{
/* Code in between braces. */
"{" {
STRING_GROW;
token_type = BRACED_CODE;
braces_level = 0;
code_start = loc->start;
BEGIN SC_BRACED_CODE;
}
}
/*------------------------------------------------------------.
| Scanning a C comment. The initial `/ *' is already eaten. |
`------------------------------------------------------------*/
<SC_COMMENT>
{
"*"{splice}"/" STRING_GROW; BEGIN context_state;
<<EOF>> unexpected_eof (token_start, "*/"); BEGIN context_state;
}
/*--------------------------------------------------------------.
| Scanning a line comment. The initial `//' is already eaten. |
`--------------------------------------------------------------*/
<SC_LINE_COMMENT>
{
"\n" STRING_GROW; BEGIN context_state;
{splice} STRING_GROW;
<<EOF>> BEGIN context_state;
}
/*------------------------------------------------.
| Scanning a Bison string, including its escapes. |
| The initial quote is already eaten. |
`------------------------------------------------*/
<SC_ESCAPED_STRING>
{
"\"" {
STRING_FINISH;
loc->start = token_start;
val->chars = last_string;
rule_length++;
BEGIN INITIAL;
return STRING;
}
\n unexpected_newline (token_start, "\""); BEGIN INITIAL;
<<EOF>> unexpected_eof (token_start, "\""); BEGIN INITIAL;
}
/*----------------------------------------------------------.
| Scanning a Bison character literal, decoding its escapes. |
| The initial quote is already eaten. |
`----------------------------------------------------------*/
<SC_ESCAPED_CHARACTER>
{
"'" {
unsigned char last_string_1;
STRING_GROW;
STRING_FINISH;
loc->start = token_start;
val->symbol = symbol_get (quotearg_style (escape_quoting_style,
last_string),
*loc);
symbol_class_set (val->symbol, token_sym, *loc);
last_string_1 = last_string[1];
symbol_user_token_number_set (val->symbol, last_string_1, *loc);
STRING_FREE;
rule_length++;
BEGIN INITIAL;
return ID;
}
\n unexpected_newline (token_start, "'"); BEGIN INITIAL;
<<EOF>> unexpected_eof (token_start, "'"); BEGIN INITIAL;
}
<SC_ESCAPED_CHARACTER,SC_ESCAPED_STRING>
{
\0 complain_at (*loc, _("invalid null character"));
}
/*----------------------------.
| Decode escaped characters. |
`----------------------------*/
<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>
{
\\[0-7]{1,3} {
unsigned long int c = strtoul (yytext + 1, 0, 8);
if (UCHAR_MAX < c)
complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
else if (! c)
complain_at (*loc, _("invalid null character: %s"), quote (yytext));
else
obstack_1grow (&obstack_for_string, c);
}
\\x[0-9abcdefABCDEF]+ {
unsigned long int c;
set_errno (0);
c = strtoul (yytext + 2, 0, 16);
if (UCHAR_MAX < c || get_errno ())
complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
else if (! c)
complain_at (*loc, _("invalid null character: %s"), quote (yytext));
else
obstack_1grow (&obstack_for_string, c);
}
\\a obstack_1grow (&obstack_for_string, '\a');
\\b obstack_1grow (&obstack_for_string, '\b');
\\f obstack_1grow (&obstack_for_string, '\f');
\\n obstack_1grow (&obstack_for_string, '\n');
\\r obstack_1grow (&obstack_for_string, '\r');
\\t obstack_1grow (&obstack_for_string, '\t');
\\v obstack_1grow (&obstack_for_string, '\v');
/* \\[\"\'?\\] would be shorter, but it confuses xgettext. */
\\("\""|"'"|"?"|"\\") obstack_1grow (&obstack_for_string, yytext[1]);
\\(u|U[0-9abcdefABCDEF]{4})[0-9abcdefABCDEF]{4} {
int c = convert_ucn_to_byte (yytext);
if (c < 0)
complain_at (*loc, _("invalid escape sequence: %s"), quote (yytext));
else if (! c)
complain_at (*loc, _("invalid null character: %s"), quote (yytext));
else
obstack_1grow (&obstack_for_string, c);
}
\\(.|\n) {
complain_at (*loc, _("unrecognized escape sequence: %s"), quote (yytext));
STRING_GROW;
}
}
/*--------------------------------------------.
| Scanning user-code characters and strings. |
`--------------------------------------------*/
<SC_CHARACTER,SC_STRING>
{
{splice}|\\{splice}[^\n$@\[\]] STRING_GROW;
}
<SC_CHARACTER>
{
"'" STRING_GROW; BEGIN context_state;
\n unexpected_newline (token_start, "'"); BEGIN context_state;
<<EOF>> unexpected_eof (token_start, "'"); BEGIN context_state;
}
<SC_STRING>
{
"\"" STRING_GROW; BEGIN context_state;
\n unexpected_newline (token_start, "\""); BEGIN context_state;
<<EOF>> unexpected_eof (token_start, "\""); BEGIN context_state;
}
/*---------------------------------------------------.
| Strings, comments etc. can be found in user code. |
`---------------------------------------------------*/
<INITIAL>
{
"'" {
STRING_GROW;
context_state = YY_START;
token_start = loc->start;
BEGIN SC_CHARACTER;
}
"\"" {
STRING_GROW;
context_state = YY_START;
token_start = loc->start;
BEGIN SC_STRING;
}
"/"{splice}"*" {
STRING_GROW;
context_state = YY_START;
token_start = loc->start;
BEGIN SC_COMMENT;
}
"/"{splice}"/" {
STRING_GROW;
context_state = YY_START;
BEGIN SC_LINE_COMMENT;
}
}
/*---------------------------------------------------------------.
| Scanning some code in braces (%union and actions). The initial |
| "{" is already eaten. |
`---------------------------------------------------------------*/
<INITIAL>
{
"{"|"<"{splice}"%" STRING_GROW; braces_level++;
"%"{splice}">" STRING_GROW; braces_level--;
"}" {
bool outer_brace = --braces_level < 0;
/* As an undocumented Bison extension, append `;' before the last
brace in braced code, so that the user code can omit trailing
`;'. But do not append `;' if emulating Yacc, since Yacc does
not append one.
FIXME: Bison should warn if a semicolon seems to be necessary
here, and should omit the semicolon if it seems unnecessary
(e.g., after ';', '{', or '}', each followed by comments or
white space). Such a warning shouldn't depend on --yacc; it
should depend on a new --pedantic option, which would cause
Bison to warn if it detects an extension to POSIX. --pedantic
should also diagnose other Bison extensions like %yacc.
Perhaps there should also be a GCC-style --pedantic-errors
option, so that such warnings are diagnosed as errors. */
if (outer_brace && token_type == BRACED_CODE && ! yacc_flag)
obstack_1grow (&obstack_for_string, ';');
obstack_1grow (&obstack_for_string, '}');
if (outer_brace)
{
STRING_FINISH;
rule_length++;
loc->start = code_start;
val->chars = last_string;
BEGIN INITIAL;
return token_type;
}
}
/* Tokenize `<<%' correctly (as `<<' `%') rather than incorrectly
(as `<' `<%'). */
"<"{splice}"<" STRING_GROW;
"$"("<"{tag}">")?(-?[0-9]+|"$") handle_dollar (token_type, yytext, *loc);
"@"(-?[0-9]+|"$") handle_at (token_type, yytext, *loc);
<<EOF>> unexpected_eof (code_start, "}"); BEGIN INITIAL;
}
/*--------------------------------------------------------------.
| Scanning some prologue: from "%{" (already scanned) to "%}". |
`--------------------------------------------------------------*/
<SC_PROLOGUE>
{
"%}" {
STRING_FINISH;
loc->start = code_start;
val->chars = last_string;
BEGIN INITIAL;
return PROLOGUE;
}
<<EOF>> unexpected_eof (code_start, "%}"); BEGIN INITIAL;
}
/*---------------------------------------------------------------.
| Scanning the epilogue (everything after the second "%%", which |
| has already been eaten). |
`---------------------------------------------------------------*/
<SC_EPILOGUE>
{
<<EOF>> {
STRING_FINISH;
loc->start = code_start;
val->chars = last_string;
BEGIN INITIAL;
return EPILOGUE;
}
}
/*-----------------------------------------.
| Escape M4 quoting characters in C code. |
`-----------------------------------------*/
<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
{
\$ obstack_sgrow (&obstack_for_string, "$][");
\@ obstack_sgrow (&obstack_for_string, "@@");
\[ obstack_sgrow (&obstack_for_string, "@{");
\] obstack_sgrow (&obstack_for_string, "@}");
}
/*-----------------------------------------------------.
| By default, grow the string obstack with the input. |
`-----------------------------------------------------*/
<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER,SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>. |
<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>\n STRING_GROW;
%%
/* Keeps track of the maximum number of semantic values to the left of
a handle (those referenced by $0, $-1, etc.) that are required by the
semantic actions of this grammar. */
int max_left_semantic_context = 0;
/* Set *LOC and adjust scanner cursor to account for token TOKEN of
size SIZE. */
static void
adjust_location (location *loc, char const *token, size_t size)
{
int line = scanner_cursor.line;
int column = scanner_cursor.column;
char const *p0 = token;
char const *p = token;
char const *lim = token + size;
loc->start = scanner_cursor;
for (p = token; p < lim; p++)
switch (*p)
{
case '\n':
line++;
column = 1;
p0 = p + 1;
break;
case '\t':
column += mbsnwidth (p0, p - p0, 0);
column += 8 - ((column - 1) & 7);
p0 = p + 1;
break;
}
scanner_cursor.line = line;
scanner_cursor.column = column + mbsnwidth (p0, p - p0, 0);
loc->end = scanner_cursor;
}
/* Read bytes from FP into buffer BUF of size SIZE. Return the
number of bytes read. Remove '\r' from input, treating \r\n
and isolated \r as \n. */
static size_t
no_cr_read (FILE *fp, char *buf, size_t size)
{
size_t bytes_read = fread (buf, 1, size, fp);
if (bytes_read)
{
char *w = memchr (buf, '\r', bytes_read);
if (w)
{
char const *r = ++w;
char const *lim = buf + bytes_read;
for (;;)
{
/* Found an '\r'. Treat it like '\n', but ignore any
'\n' that immediately follows. */
w[-1] = '\n';
if (r == lim)
{
int ch = getc (fp);
if (ch != '\n' && ungetc (ch, fp) != ch)
break;
}
else if (*r == '\n')
r++;
/* Copy until the next '\r'. */
do
{
if (r == lim)
return w - buf;
}
while ((*w++ = *r++) != '\r');
}
return w - buf;
}
}
return bytes_read;
}
/*------------------------------------------------------------------.
| TEXT is pointing to a wannabe semantic value (i.e., a `$'). |
| |
| Possible inputs: $[<TYPENAME>]($|integer) |
| |
| Output to OBSTACK_FOR_STRING a reference to this semantic value. |
`------------------------------------------------------------------*/
static inline bool
handle_action_dollar (char *text, location loc)
{
const char *type_name = NULL;
char *cp = text + 1;
if (! current_rule)
return false;
/* Get the type name if explicit. */
if (*cp == '<')
{
type_name = ++cp;
while (*cp != '>')
++cp;
*cp = '\0';
++cp;
}
if (*cp == '$')
{
if (!type_name)
type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
if (!type_name && typed)
complain_at (loc, _("$$ of `%s' has no declared type"),
current_rule->sym->tag);
if (!type_name)
type_name = "";
obstack_fgrow1 (&obstack_for_string,
"]b4_lhs_value([%s])[", type_name);
}
else
{
long int num;
set_errno (0);
num = strtol (cp, 0, 10);
if (INT_MIN <= num && num <= rule_length && ! get_errno ())
{
int n = num;
if (1-n > max_left_semantic_context)
max_left_semantic_context = 1-n;
if (!type_name && n > 0)
type_name = symbol_list_n_type_name_get (current_rule, loc, n);
if (!type_name && typed)
complain_at (loc, _("$%d of `%s' has no declared type"),
n, current_rule->sym->tag);
if (!type_name)
type_name = "";
obstack_fgrow3 (&obstack_for_string,
"]b4_rhs_value(%d, %d, [%s])[",
rule_length, n, type_name);
}
else
complain_at (loc, _("integer out of range: %s"), quote (text));
}
return true;
}
/*----------------------------------------------------------------.
| Map `$?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
| (are we in an action?). |
`----------------------------------------------------------------*/
static void
handle_dollar (int token_type, char *text, location loc)
{
switch (token_type)
{
case BRACED_CODE:
if (handle_action_dollar (text, loc))
return;
break;
case PERCENT_DESTRUCTOR:
case PERCENT_INITIAL_ACTION:
case PERCENT_PRINTER:
if (text[1] == '$')
{
obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
return;
}
break;
default:
break;
}
complain_at (loc, _("invalid value: %s"), quote (text));
}
/*------------------------------------------------------.
| TEXT is a location token (i.e., a `@...'). Output to |
| OBSTACK_FOR_STRING a reference to this location. |
`------------------------------------------------------*/
static inline bool
handle_action_at (char *text, location loc)
{
char *cp = text + 1;
locations_flag = true;
if (! current_rule)
return false;
if (*cp == '$')
obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
else
{
long int num;
set_errno (0);
num = strtol (cp, 0, 10);
if (INT_MIN <= num && num <= rule_length && ! get_errno ())
{
int n = num;
obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
rule_length, n);
}
else
complain_at (loc, _("integer out of range: %s"), quote (text));
}
return true;
}
/*----------------------------------------------------------------.
| Map `@?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
| (are we in an action?). |
`----------------------------------------------------------------*/
static void
handle_at (int token_type, char *text, location loc)
{
switch (token_type)
{
case BRACED_CODE:
handle_action_at (text, loc);
return;
case PERCENT_INITIAL_ACTION:
case PERCENT_DESTRUCTOR:
case PERCENT_PRINTER:
if (text[1] == '$')
{
obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
return;
}
break;
default:
break;
}
complain_at (loc, _("invalid value: %s"), quote (text));
}
/*------------------------------------------------------.
| Scan NUMBER for a base-BASE integer at location LOC. |
`------------------------------------------------------*/
static unsigned long int
scan_integer (char const *number, int base, location loc)
{
unsigned long int num;
set_errno (0);
num = strtoul (number, 0, base);
if (INT_MAX < num || get_errno ())
{
complain_at (loc, _("integer out of range: %s"), quote (number));
num = INT_MAX;
}
return num;
}
/*------------------------------------------------------------------.
| Convert universal character name UCN to a single-byte character, |
| and return that character. Return -1 if UCN does not correspond |
| to a single-byte character. |
`------------------------------------------------------------------*/
static int
convert_ucn_to_byte (char const *ucn)
{
unsigned long int code = strtoul (ucn + 2, 0, 16);
/* FIXME: Currently we assume Unicode-compatible unibyte characters
on ASCII hosts (i.e., Latin-1 on hosts with 8-bit bytes). On
non-ASCII hosts we support only the portable C character set.
These limitations should be removed once we add support for
multibyte characters. */
if (UCHAR_MAX < code)
return -1;
#if ! ('$' == 0x24 && '@' == 0x40 && '`' == 0x60 && '~' == 0x7e)
{
/* A non-ASCII host. Use CODE to index into a table of the C
basic execution character set, which is guaranteed to exist on
all Standard C platforms. This table also includes '$', '@',
and '`', which are not in the basic execution character set but
which are unibyte characters on all the platforms that we know
about. */
static signed char const table[] =
{
'\0', -1, -1, -1, -1, -1, -1, '\a',
'\b', '\t', '\n', '\v', '\f', '\r', -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1,
' ', '!', '"', '#', '$', '%', '&', '\'',
'(', ')', '*', '+', ',', '-', '.', '/',
'0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', ':', ';', '<', '=', '>', '?',
'@', 'A', 'B', 'C', 'D', 'E', 'F', 'G',
'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
'X', 'Y', 'Z', '[', '\\', ']', '^', '_',
'`', 'a', 'b', 'c', 'd', 'e', 'f', 'g',
'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
'x', 'y', 'z', '{', '|', '}', '~'
};
code = code < sizeof table ? table[code] : -1;
}
#endif
return code;
}
/*----------------------------------------------------------------.
| Handle `#line INT "FILE"'. ARGS has already skipped `#line '. |
`----------------------------------------------------------------*/
static void
handle_syncline (char *args)
{
int lineno = strtol (args, &args, 10);
const char *file = NULL;
file = strchr (args, '"') + 1;
*strchr (file, '"') = 0;
scanner_cursor.file = current_file = uniqstr_new (file);
scanner_cursor.line = lineno;
scanner_cursor.column = 1;
}
/*----------------------------------------------------------------.
| For a token or comment starting at START, report message MSGID, |
| which should say that an end marker was found before |
| the expected TOKEN_END. |
`----------------------------------------------------------------*/
static void
unexpected_end (boundary start, char const *msgid, char const *token_end)
{
location loc;
loc.start = start;
loc.end = scanner_cursor;
complain_at (loc, _(msgid), token_end);
}
/*------------------------------------------------------------------------.
| Report an unexpected EOF in a token or comment starting at START. |
| An end of file was encountered and the expected TOKEN_END was missing. |
`------------------------------------------------------------------------*/
static void
unexpected_eof (boundary start, char const *token_end)
{
unexpected_end (start, N_("missing `%s' at end of file"), token_end);
}
/*----------------------------------------.
| Likewise, but for unexpected newlines. |
`----------------------------------------*/
static void
unexpected_newline (boundary start, char const *token_end)
{
unexpected_end (start, N_("missing `%s' at end of line"), token_end);
}
/*-------------------------.
| Initialize the scanner. |
`-------------------------*/
void
scanner_initialize (void)
{
obstack_init (&obstack_for_string);
}
/*-----------------------------------------------.
| Free all the memory allocated to the scanner. |
`-----------------------------------------------*/
void
scanner_free (void)
{
obstack_free (&obstack_for_string, 0);
/* Reclaim Flex's buffers. */
yy_delete_buffer (YY_CURRENT_BUFFER);
}
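
Note on no_cr_read above: the function rewrites its buffer in place so that both "\r\n" and an isolated '\r' reach the scanner as a single '\n'. The following is a minimal in-memory sketch of that rewrite only. The name normalize_newlines is hypothetical and not part of Bison, and the sketch leaves out the part of no_cr_read that peeks at the stream with getc/ungetc when a '\r' falls on the last byte of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Bison: rewrite BUF (LEN bytes) in
   place so that "\r\n" and an isolated '\r' both become '\n'.
   Return the new length, which is never larger than LEN.  */
static size_t
normalize_newlines (char *buf, size_t len)
{
  size_t r = 0;   /* read index */
  size_t w = 0;   /* write index, never ahead of R */

  assert (buf != NULL || len == 0);
  while (r < len)
    {
      char c = buf[r++];
      if (c == '\r')
        {
          /* Swallow the '\n' of a "\r\n" pair.  */
          if (r < len && buf[r] == '\n')
            r++;
          c = '\n';
        }
      buf[w++] = c;
    }
  return w;
}
```

Because the write index never overtakes the read index, no second buffer is needed, which is the same property the original relies on.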

2
src/scan-code-c.c Normal file

@@ -0,0 +1,2 @@
#include <config.h>
#include "scan-code.c"

47
src/scan-code.h Normal file

@@ -0,0 +1,47 @@
/* Bison Action Scanner
Copyright (C) 2006 Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA
*/
#ifndef SCAN_CODE_H_
# define SCAN_CODE_H_
# include "location.h"
# include "symlist.h"
/* Keeps track of the maximum number of semantic values to the left of
a handle (those referenced by $0, $-1, etc.) that are required by the
semantic actions of this grammar. */
extern int max_left_semantic_context;
void code_scanner_free (void);
/* The action A contains $$, $1 etc. referring to the values
of the rule R. */
const char *translate_rule_action (symbol_list *r, const char *a, location l);
/* The action A refers to $$ and @$ only, referring to a symbol. */
const char *translate_symbol_action (const char *a, location l);
/* The action contains no special escapes, just protect M4 special
symbols. */
const char *translate_code (const char *a, location l);
#endif /* !SCAN_CODE_H_ */
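
Note on the `$N` and `@N` references handled by translate_rule_action: in this commit the scanner accepts an index only when strtol reports no error and the value lies between INT_MIN and the rule length. The following is a minimal standalone sketch of that check. The name parse_ref_index is hypothetical, and plain errno is used here in place of the set_errno/get_errno wrappers that appear in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Bison: parse the decimal index in
   DIGITS (the text after `$' or `@').  On success store it in *N and
   return true.  Fail when strtol overflows or when the value is
   outside INT_MIN .. RULE_LENGTH.  */
static bool
parse_ref_index (char const *digits, int rule_length, int *n)
{
  long int num;

  assert (digits != NULL && n != NULL);
  errno = 0;
  num = strtol (digits, NULL, 10);
  if (errno != 0 || num < INT_MIN || rule_length < num)
    return false;
  *n = (int) num;
  return true;
}
```

Zero and negative indexes are accepted on purpose: `$0`, `$-1` and so on refer to values to the left of the handle, which is what max_left_semantic_context records as 1-n.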

358
src/scan-code.l Normal file

@@ -0,0 +1,358 @@
/* Bison Action Scanner -*- C -*-
Copyright (C) 2006 Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA
*/
%option debug nodefault nounput noyywrap never-interactive
%option prefix="code_" outfile="lex.yy.c"
%{
/* Work around a bug in flex 2.5.31. See Debian bug 333231
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>. */
#undef code_wrap
#define code_wrap() 1
#define FLEX_PREFIX(Id) code_ ## Id
#include "flex-scanner.h"
#include "reader.h"
#include "getargs.h"
#include <assert.h>
#include <get-errno.h>
#include <quote.h>
#include "scan-code.h"
/* The current calling start condition: SC_RULE_ACTION or
SC_SYMBOL_ACTION. */
# define YY_DECL const char *code_lex (int sc_context)
YY_DECL;
#define YY_USER_ACTION location_compute (loc, &loc->end, yytext, yyleng);
static void handle_action_dollar (char *cp, location loc);
static void handle_action_at (char *cp, location loc);
static location the_location;
static location *loc = &the_location;
/* The rule being processed. */
symbol_list *current_rule;
%}
/* C and C++ comments in code. */
%x SC_COMMENT SC_LINE_COMMENT
/* Strings and characters in code. */
%x SC_STRING SC_CHARACTER
/* Whether in a rule or symbol action. Specifies the translation
of $ and @. */
%x SC_RULE_ACTION SC_SYMBOL_ACTION
/* POSIX says that a tag must be both an id and a C union member, but
historically almost any character is allowed in a tag. We disallow
NUL and newline, as this simplifies our implementation. */
tag [^\0\n>]+
/* Zero or more instances of backslash-newline. Following GCC, allow
white space between the backslash and the newline. */
splice (\\[ \f\t\v]*\n)*
%%
%{
/* This scanner is special: it is invoked only once per action, hence
it is expected to return only once. This initialization is
therefore done once per action to translate. */
assert (sc_context == SC_SYMBOL_ACTION
|| sc_context == SC_RULE_ACTION
|| sc_context == INITIAL);
BEGIN sc_context;
%}
/*------------------------------------------------------------.
| Scanning a C comment. The initial `/ *' is already eaten. |
`------------------------------------------------------------*/
<SC_COMMENT>
{
"*"{splice}"/" STRING_GROW; BEGIN sc_context;
}
/*--------------------------------------------------------------.
| Scanning a line comment. The initial `//' is already eaten. |
`--------------------------------------------------------------*/
<SC_LINE_COMMENT>
{
"\n" STRING_GROW; BEGIN sc_context;
{splice} STRING_GROW;
}
/*--------------------------------------------.
| Scanning user-code characters and strings. |
`--------------------------------------------*/
<SC_CHARACTER,SC_STRING>
{
{splice}|\\{splice}. STRING_GROW;
}
<SC_CHARACTER>
{
"'" STRING_GROW; BEGIN sc_context;
}
<SC_STRING>
{
"\"" STRING_GROW; BEGIN sc_context;
}
<SC_RULE_ACTION,SC_SYMBOL_ACTION>{
"'" {
STRING_GROW;
BEGIN SC_CHARACTER;
}
"\"" {
STRING_GROW;
BEGIN SC_STRING;
}
"/"{splice}"*" {
STRING_GROW;
BEGIN SC_COMMENT;
}
"/"{splice}"/" {
STRING_GROW;
BEGIN SC_LINE_COMMENT;
}
}
<SC_RULE_ACTION>
{
"$"("<"{tag}">")?(-?[0-9]+|"$") handle_action_dollar (yytext, *loc);
"@"(-?[0-9]+|"$") handle_action_at (yytext, *loc);
"$" {
warn_at (*loc, _("stray `$'"));
obstack_sgrow (&obstack_for_string, "$][");
}
"@" {
warn_at (*loc, _("stray `@'"));
obstack_sgrow (&obstack_for_string, "@@");
}
}
<SC_SYMBOL_ACTION>
{
"$$" obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
"@$" obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
}
/*-----------------------------------------.
| Escape M4 quoting characters in C code. |
`-----------------------------------------*/
<*>
{
\$ obstack_sgrow (&obstack_for_string, "$][");
\@ obstack_sgrow (&obstack_for_string, "@@");
\[ obstack_sgrow (&obstack_for_string, "@{");
\] obstack_sgrow (&obstack_for_string, "@}");
}
/*-----------------------------------------------------.
| By default, grow the string obstack with the input. |
`-----------------------------------------------------*/
<*>.|\n STRING_GROW;
/* End of processing. */
<*><<EOF>> {
obstack_1grow (&obstack_for_string, '\0');
return obstack_finish (&obstack_for_string);
}
%%
/* Keeps track of the maximum number of semantic values to the left of
a handle (those referenced by $0, $-1, etc.) that are required by the
semantic actions of this grammar. */
int max_left_semantic_context = 0;
/*------------------------------------------------------------------.
| TEXT is pointing to a wannabe semantic value (i.e., a `$'). |
| |
| Possible inputs: $[<TYPENAME>]($|integer) |
| |
| Output to OBSTACK_FOR_STRING a reference to this semantic value. |
`------------------------------------------------------------------*/
static void
handle_action_dollar (char *text, location loc)
{
const char *type_name = NULL;
char *cp = text + 1;
int rule_length = symbol_list_length (current_rule->next);
/* Get the type name if explicit. */
if (*cp == '<')
{
type_name = ++cp;
while (*cp != '>')
++cp;
*cp = '\0';
++cp;
}
if (*cp == '$')
{
if (!type_name)
type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
if (!type_name && typed)
complain_at (loc, _("$$ of `%s' has no declared type"),
current_rule->sym->tag);
if (!type_name)
type_name = "";
obstack_fgrow1 (&obstack_for_string,
"]b4_lhs_value([%s])[", type_name);
current_rule->used = true;
}
else
{
long int num;
set_errno (0);
num = strtol (cp, 0, 10);
if (INT_MIN <= num && num <= rule_length && ! get_errno ())
{
int n = num;
if (1-n > max_left_semantic_context)
max_left_semantic_context = 1-n;
if (!type_name && n > 0)
type_name = symbol_list_n_type_name_get (current_rule, loc, n);
if (!type_name && typed)
complain_at (loc, _("$%d of `%s' has no declared type"),
n, current_rule->sym->tag);
if (!type_name)
type_name = "";
obstack_fgrow3 (&obstack_for_string,
"]b4_rhs_value(%d, %d, [%s])[",
rule_length, n, type_name);
symbol_list_n_used_set (current_rule, n, true);
}
else
complain_at (loc, _("integer out of range: %s"), quote (text));
}
}
/*------------------------------------------------------.
| TEXT is a location token (i.e., a `@...'). Output to |
| OBSTACK_FOR_STRING a reference to this location. |
`------------------------------------------------------*/
static void
handle_action_at (char *text, location loc)
{
char *cp = text + 1;
int rule_length = symbol_list_length (current_rule->next);
locations_flag = true;
if (*cp == '$')
obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
else
{
long int num;
set_errno (0);
num = strtol (cp, 0, 10);
if (INT_MIN <= num && num <= rule_length && ! get_errno ())
{
int n = num;
obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
rule_length, n);
}
else
complain_at (loc, _("integer out of range: %s"), quote (text));
}
}
/*-------------------------.
| Initialize the scanner. |
`-------------------------*/
/* Translate the dollars and ats in \a a, whose location is l.
Depending on the \a sc_context (SC_RULE_ACTION, SC_SYMBOL_ACTION,
INITIAL), the processing is different. */
static const char *
translate_action (int sc_context, const char *a, location l)
{
const char *res;
static bool initialized = false;
if (!initialized)
{
obstack_init (&obstack_for_string);
/* The initial buffer, never used. */
yy_delete_buffer (YY_CURRENT_BUFFER);
yy_flex_debug = 0;
initialized = true;
}
loc->start = loc->end = l.start;
yy_switch_to_buffer (yy_scan_string (a));
res = code_lex (sc_context);
yy_delete_buffer (YY_CURRENT_BUFFER);
return res;
}
const char *
translate_rule_action (symbol_list *r, const char *a, location l)
{
current_rule = r;
return translate_action (SC_RULE_ACTION, a, l);
}
const char *
translate_symbol_action (const char *a, location l)
{
return translate_action (SC_SYMBOL_ACTION, a, l);
}
const char *
translate_code (const char *a, location l)
{
return translate_action (INITIAL, a, l);
}
/*-----------------------------------------------.
| Free all the memory allocated to the scanner. |
`-----------------------------------------------*/
void
code_scanner_free (void)
{
obstack_free (&obstack_for_string, 0);
/* Reclaim Flex's buffers. */
yy_delete_buffer (YY_CURRENT_BUFFER);
}
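
Note on the M4 escaping rules in scan-code.l: in the INITIAL context used by translate_code, no `$` or `@` is interpreted, so the only rewriting is the four `<*>` rules that protect M4 quoting characters. The following is a minimal sketch of that mapping on a plain C string. The name m4_escape and the fixed-capacity output buffer are assumptions made for illustration; the scanner itself grows an obstack instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Bison: copy SRC into DST (capacity
   CAP bytes, including the terminating NUL), applying the same four
   replacements as the `<*>' rules: `$' -> "$][", `@' -> "@@",
   `[' -> "@{", `]' -> "@}".  Return the length written, or
   (size_t) -1 if the result does not fit.  */
static size_t
m4_escape (char const *src, char *dst, size_t cap)
{
  size_t w = 0;

  assert (src != NULL && dst != NULL);
  if (cap == 0)
    return (size_t) -1;
  for (; *src; src++)
    {
      char one[2] = { *src, '\0' };
      char const *rep = one;   /* default: copy the byte unchanged */
      size_t len;

      switch (*src)
        {
        case '$': rep = "$]["; break;
        case '@': rep = "@@"; break;
        case '[': rep = "@{"; break;
        case ']': rep = "@}"; break;
        }
      len = strlen (rep);
      if (cap - w <= len)      /* keep room for the final NUL */
        return (size_t) -1;
      memcpy (dst + w, rep, len);
      w += len;
    }
  dst[w] = '\0';
  return w;
}
```

In the SC_RULE_ACTION and SC_SYMBOL_ACTION contexts the more specific `$...` and `@...` rules match first, so only the references those rules do not recognize fall through to this escaping.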

44
src/scan-gram.h Normal file

@@ -0,0 +1,44 @@
/* Bison Grammar Scanner
Copyright (C) 2006 Free Software Foundation, Inc.
This file is part of Bison, the GNU Compiler Compiler.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA
*/
#ifndef SCAN_GRAM_H_
# define SCAN_GRAM_H_
/* From the scanner. */
extern FILE *gram_in;
extern int gram__flex_debug;
extern boundary gram_scanner_cursor;
extern char *gram_last_string;
extern location gram_last_braced_code_loc;
void gram_scanner_initialize (void);
void gram_scanner_free (void);
void gram_scanner_last_string_free (void);
/* These are declared by the scanner, but not used. We put them here
to pacify "make syntax-check". */
extern FILE *gram_out;
extern int gram_lineno;
# define GRAM_LEX_DECL int gram_lex (YYSTYPE *val, location *loc)
GRAM_LEX_DECL;
#endif /* !SCAN_GRAM_H_ */


@@ -29,112 +29,48 @@
 #undef gram_wrap
 #define gram_wrap() 1
-#include "system.h"
-#include <mbswidth.h>
-#include <quote.h>
+#define FLEX_PREFIX(Id) gram_ ## Id
+#include "flex-scanner.h"
 #include "complain.h"
 #include "files.h"
-#include "getargs.h"
+#include "getargs.h" /* yacc_flag */
 #include "gram.h"
 #include "quotearg.h"
 #include "reader.h"
 #include "uniqstr.h"
-#define YY_USER_INIT \
-do \
-{ \
-scanner_cursor.file = current_file; \
-scanner_cursor.line = 1; \
-scanner_cursor.column = 1; \
-code_start = scanner_cursor; \
-} \
-while (0)
-/* Pacify "gcc -Wmissing-prototypes" when flex 2.5.31 is used. */
-int gram_get_lineno (void);
-FILE *gram_get_in (void);
-FILE *gram_get_out (void);
-int gram_get_leng (void);
-char *gram_get_text (void);
-void gram_set_lineno (int);
-void gram_set_in (FILE *);
-void gram_set_out (FILE *);
-int gram_get_debug (void);
-void gram_set_debug (int);
-int gram_lex_destroy (void);
+#include <mbswidth.h>
+#include <quote.h>
+#include "scan-gram.h"
+#define YY_DECL GRAM_LEX_DECL
+#define YY_USER_INIT \
+code_start = scanner_cursor = loc->start; \
 /* Location of scanner cursor. */
 boundary scanner_cursor;
-static void adjust_location (location *, char const *, size_t);
-#define YY_USER_ACTION adjust_location (loc, yytext, yyleng);
+#define YY_USER_ACTION location_compute (loc, &scanner_cursor, yytext, yyleng);
 static size_t no_cr_read (FILE *, char *, size_t);
 #define YY_INPUT(buf, result, size) ((result) = no_cr_read (yyin, buf, size))
-/* OBSTACK_FOR_STRING -- Used to store all the characters that we need to
-keep (to construct ID, STRINGS etc.). Use the following macros to
-use it.
-Use STRING_GROW to append what has just been matched, and
-STRING_FINISH to end the string (it puts the ending 0).
-STRING_FINISH also stores this string in LAST_STRING, which can be
-used, and which is used by STRING_FREE to free the last string. */
-static struct obstack obstack_for_string;
 /* A string representing the most recently saved token. */
 char *last_string;
-/* The location of the most recently saved token, if it was a
-BRACED_CODE token; otherwise, this has an unspecified value. */
-location last_braced_code_loc;
-#define STRING_GROW \
-obstack_grow (&obstack_for_string, yytext, yyleng)
-#define STRING_FINISH \
-do { \
-obstack_1grow (&obstack_for_string, '\0'); \
-last_string = obstack_finish (&obstack_for_string); \
-} while (0)
-#define STRING_FREE \
-obstack_free (&obstack_for_string, last_string)
 void
-scanner_last_string_free (void)
+gram_scanner_last_string_free (void)
 {
 STRING_FREE;
 }
-/* Within well-formed rules, RULE_LENGTH is the number of values in
-the current rule so far, which says where to find `$0' with respect
-to the top of the stack. It is not the same as the rule->length in
-the case of mid rule actions.
-Outside of well-formed rules, RULE_LENGTH has an undefined value. */
-static int rule_length;
-static void rule_length_overflow (location) __attribute__ ((__noreturn__));
-/* Increment the rule length by one, checking for overflow. */
-static inline void
-increment_rule_length (location loc)
-{
-rule_length++;
-/* Don't allow rule_length == INT_MAX, since that might cause
-confusion with strtol if INT_MAX == LONG_MAX. */
-if (rule_length == INT_MAX)
-rule_length_overflow (loc);
-}
-static void handle_dollar (int token_type, char *cp, location loc);
-static void handle_at (int token_type, char *cp, location loc);
+/* The location of the most recently saved token, if it was a
+BRACED_CODE token; otherwise, this has an unspecified value. */
+location gram_last_braced_code_loc;
 static void handle_syncline (char *, location);
 static unsigned long int scan_integer (char const *p, int base, location loc);
static int convert_ucn_to_byte (char const *hex_text); static int convert_ucn_to_byte (char const *hex_text);
@@ -142,11 +78,26 @@ static void unexpected_eof (boundary, char const *);
static void unexpected_newline (boundary, char const *); static void unexpected_newline (boundary, char const *);
%} %}
%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT /* A C-like comment in directives/rules. */
%x SC_STRING SC_CHARACTER %x SC_YACC_COMMENT
%x SC_AFTER_IDENTIFIER /* Strings and characters in directives/rules. */
%x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER %x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
%x SC_PRE_CODE SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE /* An identifier was just read in directives/rules. Special state
to capture the sequence `identifier :'. */
%x SC_AFTER_IDENTIFIER
/* A keyword that should be followed by some code was read (e.g.
%printer). */
%x SC_PRE_CODE
/* Three types of user code:
- prologue (code between `%{' `%}' in the first section, before %%);
- actions, printers, union, etc. (between braces in the middle section);
- epilogue (everything after the second %%). */
%x SC_PROLOGUE SC_BRACED_CODE SC_EPILOGUE
/* C and C++ comments in code. */
%x SC_COMMENT SC_LINE_COMMENT
/* Strings and characters in code. */
%x SC_STRING SC_CHARACTER
letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_] letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
id {letter}({letter}|[0-9])* id {letter}({letter}|[0-9])*
@@ -221,17 +172,17 @@ splice (\\[ \f\t\v]*\n)*
"%default"[-_]"prec" return PERCENT_DEFAULT_PREC; "%default"[-_]"prec" return PERCENT_DEFAULT_PREC;
"%define" return PERCENT_DEFINE; "%define" return PERCENT_DEFINE;
"%defines" return PERCENT_DEFINES; "%defines" return PERCENT_DEFINES;
"%destructor" token_type = PERCENT_DESTRUCTOR; BEGIN SC_PRE_CODE; "%destructor" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_DESTRUCTOR;
"%dprec" return PERCENT_DPREC; "%dprec" return PERCENT_DPREC;
"%error"[-_]"verbose" return PERCENT_ERROR_VERBOSE; "%error"[-_]"verbose" return PERCENT_ERROR_VERBOSE;
"%expect" return PERCENT_EXPECT; "%expect" return PERCENT_EXPECT;
"%expect"[-_]"rr" return PERCENT_EXPECT_RR; "%expect"[-_]"rr" return PERCENT_EXPECT_RR;
"%file-prefix" return PERCENT_FILE_PREFIX; "%file-prefix" return PERCENT_FILE_PREFIX;
"%fixed"[-_]"output"[-_]"files" return PERCENT_YACC; "%fixed"[-_]"output"[-_]"files" return PERCENT_YACC;
"%initial-action" token_type = PERCENT_INITIAL_ACTION; BEGIN SC_PRE_CODE; "%initial-action" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_INITIAL_ACTION;
"%glr-parser" return PERCENT_GLR_PARSER; "%glr-parser" return PERCENT_GLR_PARSER;
"%left" return PERCENT_LEFT; "%left" return PERCENT_LEFT;
"%lex-param" token_type = PERCENT_LEX_PARAM; BEGIN SC_PRE_CODE; "%lex-param" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_LEX_PARAM;
"%locations" return PERCENT_LOCATIONS; "%locations" return PERCENT_LOCATIONS;
"%merge" return PERCENT_MERGE; "%merge" return PERCENT_MERGE;
"%name"[-_]"prefix" return PERCENT_NAME_PREFIX; "%name"[-_]"prefix" return PERCENT_NAME_PREFIX;
@@ -241,9 +192,9 @@ splice (\\[ \f\t\v]*\n)*
"%nondeterministic-parser" return PERCENT_NONDETERMINISTIC_PARSER; "%nondeterministic-parser" return PERCENT_NONDETERMINISTIC_PARSER;
"%nterm" return PERCENT_NTERM; "%nterm" return PERCENT_NTERM;
"%output" return PERCENT_OUTPUT; "%output" return PERCENT_OUTPUT;
"%parse-param" token_type = PERCENT_PARSE_PARAM; BEGIN SC_PRE_CODE; "%parse-param" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_PARSE_PARAM;
"%prec" rule_length--; return PERCENT_PREC; "%prec" return PERCENT_PREC;
"%printer" token_type = PERCENT_PRINTER; BEGIN SC_PRE_CODE; "%printer" /* FIXME: Remove once %union handled differently. */ token_type = BRACED_CODE; return PERCENT_PRINTER;
"%pure"[-_]"parser" return PERCENT_PURE_PARSER; "%pure"[-_]"parser" return PERCENT_PURE_PARSER;
"%require" return PERCENT_REQUIRE; "%require" return PERCENT_REQUIRE;
"%right" return PERCENT_RIGHT; "%right" return PERCENT_RIGHT;
@@ -262,13 +213,12 @@ splice (\\[ \f\t\v]*\n)*
} }
"=" return EQUAL; "=" return EQUAL;
"|" rule_length = 0; return PIPE; "|" return PIPE;
";" return SEMICOLON; ";" return SEMICOLON;
{id} { {id} {
val->symbol = symbol_get (yytext, *loc); val->symbol = symbol_get (yytext, *loc);
id_loc = *loc; id_loc = *loc;
increment_rule_length (*loc);
BEGIN SC_AFTER_IDENTIFIER; BEGIN SC_AFTER_IDENTIFIER;
} }
@@ -335,7 +285,6 @@ splice (\\[ \f\t\v]*\n)*
<SC_AFTER_IDENTIFIER> <SC_AFTER_IDENTIFIER>
{ {
":" { ":" {
rule_length = 0;
*loc = id_loc; *loc = id_loc;
BEGIN INITIAL; BEGIN INITIAL;
return ID_COLON; return ID_COLON;
@@ -401,7 +350,6 @@ splice (\\[ \f\t\v]*\n)*
STRING_FINISH; STRING_FINISH;
loc->start = token_start; loc->start = token_start;
val->chars = last_string; val->chars = last_string;
increment_rule_length (*loc);
BEGIN INITIAL; BEGIN INITIAL;
return STRING; return STRING;
} }
@@ -428,7 +376,6 @@ splice (\\[ \f\t\v]*\n)*
last_string_1 = last_string[1]; last_string_1 = last_string[1];
symbol_user_token_number_set (val->symbol, last_string_1, *loc); symbol_user_token_number_set (val->symbol, last_string_1, *loc);
STRING_FREE; STRING_FREE;
increment_rule_length (*loc);
BEGIN INITIAL; BEGIN INITIAL;
return ID; return ID;
} }
@@ -501,7 +448,7 @@ splice (\\[ \f\t\v]*\n)*
<SC_CHARACTER,SC_STRING> <SC_CHARACTER,SC_STRING>
{ {
{splice}|\\{splice}[^\n$@\[\]] STRING_GROW; {splice}|\\{splice}[^\n\[\]] STRING_GROW;
} }
<SC_CHARACTER> <SC_CHARACTER>
@@ -622,8 +569,7 @@ splice (\\[ \f\t\v]*\n)*
STRING_FINISH; STRING_FINISH;
loc->start = code_start; loc->start = code_start;
val->chars = last_string; val->chars = last_string;
increment_rule_length (*loc); gram_last_braced_code_loc = *loc;
last_braced_code_loc = *loc;
BEGIN INITIAL; BEGIN INITIAL;
return token_type; return token_type;
} }
@@ -633,18 +579,6 @@ splice (\\[ \f\t\v]*\n)*
(as `<' `<%'). */ (as `<' `<%'). */
"<"{splice}"<" STRING_GROW; "<"{splice}"<" STRING_GROW;
"$"("<"{tag}">")?(-?[0-9]+|"$") handle_dollar (token_type, yytext, *loc);
"@"(-?[0-9]+|"$") handle_at (token_type, yytext, *loc);
"$" {
warn_at (*loc, _("stray `$'"));
obstack_sgrow (&obstack_for_string, "$][");
}
"@" {
warn_at (*loc, _("stray `@'"));
obstack_sgrow (&obstack_for_string, "@@");
}
<<EOF>> unexpected_eof (code_start, "}"); BEGIN INITIAL; <<EOF>> unexpected_eof (code_start, "}"); BEGIN INITIAL;
} }
@@ -684,19 +618,6 @@ splice (\\[ \f\t\v]*\n)*
} }
/*-----------------------------------------.
| Escape M4 quoting characters in C code. |
`-----------------------------------------*/
<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
{
\$ obstack_sgrow (&obstack_for_string, "$][");
\@ obstack_sgrow (&obstack_for_string, "@@");
\[ obstack_sgrow (&obstack_for_string, "@{");
\] obstack_sgrow (&obstack_for_string, "@}");
}
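Note on the removed block above: these four rules rewrite the characters that M4 treats specially while user code is copied to the obstack. Purely as an illustration of that mapping, here is a standalone sketch. The name `m4_escape` and the fixed-buffer interface are hypothetical and not part of Bison; only the four replacement strings are taken from the removed rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from Bison: copy SRC into DST (capacity CAP),
   applying the replacements used by the removed scanner rules:
   '$' -> "$][", '@' -> "@@", '[' -> "@{", ']' -> "@}".
   Return the output length, or (size_t) -1 if DST is too small.  */
static size_t
m4_escape (char const *src, char *dst, size_t cap)
{
  size_t n = 0;
  assert (src && dst);
  if (cap == 0)
    return (size_t) -1;
  for (; *src; src++)
    {
      char plain[2] = { *src, '\0' };
      char const *rep;
      size_t len;
      switch (*src)
        {
        case '$': rep = "$]["; break;
        case '@': rep = "@@"; break;
        case '[': rep = "@{"; break;
        case ']': rep = "@}"; break;
        default:  rep = plain; break;
        }
      len = strlen (rep);
      if (cap - n <= len)       /* Keep room for the trailing NUL.  */
        return (size_t) -1;
      memcpy (dst + n, rep, len);
      n += len;
    }
  dst[n] = '\0';
  return n;
}
```

Under these assumptions, `$1 = a[0];` becomes `$][1 = a@{0@};`.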
/*-----------------------------------------------------. /*-----------------------------------------------------.
| By default, grow the string obstack with the input. | | By default, grow the string obstack with the input. |
`-----------------------------------------------------*/ `-----------------------------------------------------*/
@@ -706,79 +627,6 @@ splice (\\[ \f\t\v]*\n)*
%% %%
/* Keeps track of the maximum number of semantic values to the left of
a handle (those referenced by $0, $-1, etc.) are required by the
semantic actions of this grammar. */
int max_left_semantic_context = 0;
/* If BUF is null, add BUFSIZE (which in this case must be less than
INT_MAX) to COLUMN; otherwise, add mbsnwidth (BUF, BUFSIZE, 0) to
COLUMN. If an overflow occurs, or might occur but is undetectable,
return INT_MAX. Assume COLUMN is nonnegative. */
static inline int
add_column_width (int column, char const *buf, size_t bufsize)
{
size_t width;
unsigned int remaining_columns = INT_MAX - column;
if (buf)
{
if (INT_MAX / 2 <= bufsize)
return INT_MAX;
width = mbsnwidth (buf, bufsize, 0);
}
else
width = bufsize;
return width <= remaining_columns ? column + width : INT_MAX;
}
/* Set *LOC and adjust scanner cursor to account for token TOKEN of
size SIZE. */
static void
adjust_location (location *loc, char const *token, size_t size)
{
int line = scanner_cursor.line;
int column = scanner_cursor.column;
char const *p0 = token;
char const *p = token;
char const *lim = token + size;
loc->start = scanner_cursor;
for (p = token; p < lim; p++)
switch (*p)
{
case '\n':
line += line < INT_MAX;
column = 1;
p0 = p + 1;
break;
case '\t':
column = add_column_width (column, p0, p - p0);
column = add_column_width (column, NULL, 8 - ((column - 1) & 7));
p0 = p + 1;
break;
default:
break;
}
scanner_cursor.line = line;
scanner_cursor.column = column = add_column_width (column, p0, p - p0);
loc->end = scanner_cursor;
if (line == INT_MAX && loc->start.line != INT_MAX)
warn_at (*loc, _("line number overflow"));
if (column == INT_MAX && loc->start.column != INT_MAX)
warn_at (*loc, _("column number overflow"));
}
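Note on the removed adjust_location above: the tab case advances a 1-based column to the next tab stop, and a newline resets the column to 1. The following sketch isolates that arithmetic. It is a simplification under stated assumptions: every non-tab, non-newline byte is taken to be one column wide (the removed code used mbsnwidth), overflow clamping to INT_MAX is left out, and the names `cursor_sketch` and `advance_cursor` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the scanner cursor (hypothetical type).  */
struct cursor_sketch
{
  int line;
  int column;   /* 1-based, as in the removed code.  */
};

/* Advance C over the SIZE bytes of TOKEN.  Tab stops are assumed to sit
   every 8 columns (1, 9, 17, ...).  No overflow handling.  */
static struct cursor_sketch
advance_cursor (struct cursor_sketch c, char const *token, size_t size)
{
  size_t i;
  assert (token || size == 0);
  for (i = 0; i < size; i++)
    switch (token[i])
      {
      case '\n':
        c.line++;
        c.column = 1;
        break;
      case '\t':
        /* Same expression as the removed code: distance to next stop.  */
        c.column += 8 - ((c.column - 1) & 7);
        break;
      default:
        c.column++;
        break;
      }
  return c;
}
```

Starting at line 1, column 1, the text `ab<TAB>c` ends at column 10: two letters reach column 3, the tab jumps to 9, and `c` moves to 10.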
/* Read bytes from FP into buffer BUF of size SIZE. Return the /* Read bytes from FP into buffer BUF of size SIZE. Return the
number of bytes read. Remove '\r' from input, treating \r\n number of bytes read. Remove '\r' from input, treating \r\n
and isolated \r as \n. */ and isolated \r as \n. */
@@ -826,173 +674,6 @@ no_cr_read (FILE *fp, char *buf, size_t size)
} }
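Note on no_cr_read: its body is not shown in this hunk, only the contract in the comment ("\r\n" and isolated "\r" are treated as "\n"). The sketch below illustrates that contract on an in-memory buffer. It is not the Bison implementation: the name `normalize_cr` is hypothetical, and it ignores the case a stream reader has to handle, where a '\r' is the last byte of one read and the matching '\n' arrives in the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite BUF (SIZE bytes) in place so that "\r\n"
   and isolated '\r' both become a single '\n'.  Return the new length.  */
static size_t
normalize_cr (char *buf, size_t size)
{
  size_t in;
  size_t out = 0;
  assert (buf || size == 0);
  for (in = 0; in < size; in++)
    {
      char c = buf[in];
      if (c == '\r')
        {
          /* Swallow the '\n' of a "\r\n" pair, if present.  */
          if (in + 1 < size && buf[in + 1] == '\n')
            in++;
          c = '\n';
        }
      buf[out++] = c;
    }
  return out;
}
```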
/*------------------------------------------------------------------.
| TEXT is pointing to a wannabee semantic value (i.e., a `$'). |
| |
| Possible inputs: $[<TYPENAME>]($|integer) |
| |
| Output to OBSTACK_FOR_STRING a reference to this semantic value. |
`------------------------------------------------------------------*/
static inline bool
handle_action_dollar (char *text, location loc)
{
const char *type_name = NULL;
char *cp = text + 1;
if (! current_rule)
return false;
/* Get the type name if explicit. */
if (*cp == '<')
{
type_name = ++cp;
while (*cp != '>')
++cp;
*cp = '\0';
++cp;
}
if (*cp == '$')
{
if (!type_name)
type_name = symbol_list_n_type_name_get (current_rule, loc, 0);
if (!type_name && typed)
complain_at (loc, _("$$ of `%s' has no declared type"),
current_rule->sym->tag);
if (!type_name)
type_name = "";
obstack_fgrow1 (&obstack_for_string,
"]b4_lhs_value([%s])[", type_name);
current_rule->used = true;
}
else
{
long int num = strtol (cp, NULL, 10);
if (1 - INT_MAX + rule_length <= num && num <= rule_length)
{
int n = num;
if (max_left_semantic_context < 1 - n)
max_left_semantic_context = 1 - n;
if (!type_name && 0 < n)
type_name = symbol_list_n_type_name_get (current_rule, loc, n);
if (!type_name && typed)
complain_at (loc, _("$%d of `%s' has no declared type"),
n, current_rule->sym->tag);
if (!type_name)
type_name = "";
obstack_fgrow3 (&obstack_for_string,
"]b4_rhs_value(%d, %d, [%s])[",
rule_length, n, type_name);
symbol_list_n_used_set (current_rule, n, true);
}
else
complain_at (loc, _("integer out of range: %s"), quote (text));
}
return true;
}
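Note on the range check in the removed handle_action_dollar above: `$n` is accepted only when n lies between `1 - INT_MAX + rule_length` and `rule_length`. The sketch below isolates that test; the name `parse_dollar_index` is hypothetical, and it does none of the type lookup or obstack output of the removed function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse DIGITS as the N of `$N' and apply the same
   bounds as the removed code.  On success store N in *N_OUT and return
   true; otherwise return false (the removed code reported
   "integer out of range").  */
static bool
parse_dollar_index (char const *digits, int rule_length, int *n_out)
{
  long int num;
  assert (digits && n_out);
  assert (0 <= rule_length && rule_length < INT_MAX);
  num = strtol (digits, NULL, 10);
  if (1 - INT_MAX + rule_length <= num && num <= rule_length)
    {
      *n_out = (int) num;
      return true;
    }
  return false;
}
```

With `rule_length` equal to 0, as in an empty right-hand side, `$1` is rejected, which matches the "integer out of range" expectation in the test suite changes further down.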
/*----------------------------------------------------------------.
| Map `$?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
| (are we in an action?). |
`----------------------------------------------------------------*/
static void
handle_dollar (int token_type, char *text, location loc)
{
switch (token_type)
{
case BRACED_CODE:
if (handle_action_dollar (text, loc))
return;
break;
case PERCENT_DESTRUCTOR:
case PERCENT_INITIAL_ACTION:
case PERCENT_PRINTER:
if (text[1] == '$')
{
obstack_sgrow (&obstack_for_string, "]b4_dollar_dollar[");
return;
}
break;
default:
break;
}
complain_at (loc, _("invalid value: %s"), quote (text));
}
/*------------------------------------------------------.
| TEXT is a location token (i.e., a `@...'). Output to |
| OBSTACK_FOR_STRING a reference to this location. |
`------------------------------------------------------*/
static inline bool
handle_action_at (char *text, location loc)
{
char *cp = text + 1;
locations_flag = true;
if (! current_rule)
return false;
if (*cp == '$')
obstack_sgrow (&obstack_for_string, "]b4_lhs_location[");
else
{
long int num = strtol (cp, NULL, 10);
if (1 - INT_MAX + rule_length <= num && num <= rule_length)
{
int n = num;
obstack_fgrow2 (&obstack_for_string, "]b4_rhs_location(%d, %d)[",
rule_length, n);
}
else
complain_at (loc, _("integer out of range: %s"), quote (text));
}
return true;
}
/*----------------------------------------------------------------.
| Map `@?' onto the proper M4 symbol, depending on its TOKEN_TYPE |
| (are we in an action?). |
`----------------------------------------------------------------*/
static void
handle_at (int token_type, char *text, location loc)
{
switch (token_type)
{
case BRACED_CODE:
handle_action_at (text, loc);
return;
case PERCENT_INITIAL_ACTION:
case PERCENT_DESTRUCTOR:
case PERCENT_PRINTER:
if (text[1] == '$')
{
obstack_sgrow (&obstack_for_string, "]b4_at_dollar[");
return;
}
break;
default:
break;
}
complain_at (loc, _("invalid value: %s"), quote (text));
}
/*------------------------------------------------------. /*------------------------------------------------------.
| Scan NUMBER for a base-BASE integer at location LOC. | | Scan NUMBER for a base-BASE integer at location LOC. |
@@ -1087,20 +768,8 @@ handle_syncline (char *args, location loc)
warn_at (loc, _("line number overflow")); warn_at (loc, _("line number overflow"));
lineno = INT_MAX; lineno = INT_MAX;
} }
scanner_cursor.file = current_file = uniqstr_new (file); current_file = uniqstr_new (file);
scanner_cursor.line = lineno; boundary_set (&scanner_cursor, current_file, lineno, 1);
scanner_cursor.column = 1;
}
/*---------------------------------.
| Report a rule that is too long. |
`---------------------------------*/
static void
rule_length_overflow (location loc)
{
fatal_at (loc, _("rule is too long"));
} }
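Note on boundary_set: the commit message lists it as new in src/location.h, but its definition is not part of this hunk. Judging only from the call site in handle_syncline, it presumably assigns the three fields the old code set one by one. A sketch under that assumption, using a simplified stand-in type and hypothetical names rather than the real `boundary`:

```c
#include <assert.h>

/* Simplified stand-in for Bison's boundary type (hypothetical).  */
typedef struct
{
  char const *file;
  int line;
  int column;
} boundary_sketch;

/* Assumed behavior: set all three fields of *B in one call.  */
static void
boundary_set_sketch (boundary_sketch *b, char const *file,
                     int line, int column)
{
  assert (b);
  b->file = file;
  b->line = line;
  b->column = column;
}
```

Grouping the three assignments in one helper keeps every caller from having to remember the full set of fields.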
@@ -1148,7 +817,7 @@ unexpected_newline (boundary start, char const *token_end)
`-------------------------*/ `-------------------------*/
void void
scanner_initialize (void) gram_scanner_initialize (void)
{ {
obstack_init (&obstack_for_string); obstack_init (&obstack_for_string);
} }
@@ -1159,7 +828,7 @@ scanner_initialize (void)
`-----------------------------------------------*/ `-----------------------------------------------*/
void void
scanner_free (void) gram_scanner_free (void)
{ {
obstack_free (&obstack_for_string, 0); obstack_free (&obstack_for_string, 0);
/* Reclaim Flex's buffers. */ /* Reclaim Flex's buffers. */


@@ -113,6 +113,8 @@ char *base_name (char const *name);
# define ATTRIBUTE_UNUSED __attribute__ ((__unused__)) # define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
#endif #endif
#define FUNCTION_PRINT() fprintf (stderr, "%s: ", __func__)
/*------. /*------.
| NLS. | | NLS. |
`------*/ `------*/


@@ -25,33 +25,17 @@ AT_BANNER([[Input Processing.]])
## Invalid $n. ## ## Invalid $n. ##
## ------------ ## ## ------------ ##
AT_SETUP([Invalid dollar-n]) AT_SETUP([Invalid \$n and @n])
AT_DATA([input.y], AT_DATA([input.y],
[[%% [[%%
exp: { $$ = $1 ; }; exp: { $$ = $1 ; };
]])
AT_CHECK([bison input.y], [1], [],
[[input.y:2.13-14: integer out of range: `$1'
]])
AT_CLEANUP
## ------------ ##
## Invalid @n. ##
## ------------ ##
AT_SETUP([Invalid @n])
AT_DATA([input.y],
[[%%
exp: { @$ = @1 ; }; exp: { @$ = @1 ; };
]]) ]])
AT_CHECK([bison input.y], [1], [], AT_CHECK([bison input.y], [1], [],
[[input.y:2.13-14: integer out of range: `@1' [[input.y:2.13-14: integer out of range: `$1'
input.y:3.13-14: integer out of range: `@1'
]]) ]])
AT_CLEANUP AT_CLEANUP
@@ -200,11 +184,11 @@ AT_SETUP([Torturing the Scanner])
AT_DATA([input.y], []) AT_DATA([input.y], [])
AT_CHECK([bison input.y], [1], [], AT_CHECK([bison input.y], [1], [],
[[input.y:1.1: syntax error, unexpected end of file [[input.y:1.0: syntax error, unexpected end of file
]]) ]])
AT_DATA([input.y], AT_DATA([input.y],
[{} [{}
]) ])
AT_CHECK([bison input.y], [1], [], AT_CHECK([bison input.y], [1], [],


@@ -346,9 +346,7 @@ AT_DATA([input.y],
]]) ]])
AT_CHECK([bison input.y], [1], [], AT_CHECK([bison input.y], [1], [],
[[input.y:3.1: missing `{' in "%destructor {...}" [[input.y:3.1-15: syntax error, unexpected %initial-action, expecting {...}
input.y:4.1: missing `{' in "%initial-action {...}"
input.y:4.1: syntax error, unexpected %initial-action {...}, expecting string or identifier
]]) ]])
AT_CLEANUP AT_CLEANUP