Commit Graph

70 Commits

Author SHA1 Message Date
Paul Eggert
dca81a78f8 uniqstr wasn't being used for handle_syncline like it should. 2004-08-08 04:57:06 +00:00
Paul Eggert
4febdd9667 Reject unescaped newlines in strings. 2004-05-03 07:42:52 +00:00
Paul Hilfinger
d63282419d * src/parse-gram.y: Define PERCENT_EXPECT_RR.
(declaration): Replace expected_conflicts with expected_sr_conflicts.
Add %expect-rr rule.

* src/scan-gram.l: Recognize %expect-rr.

* src/conflicts.h (expected_sr_conflicts): Rename from
expected_conflicts.
(expected_rr_conflicts): Declare.

* src/conflicts.c (expected_sr_conflicts): Rename from
expected_conflicts.
(expected_rr_conflicts): Define.
(conflicts_print): Check r/r conflicts against expected_rr_conflicts
for GLR parsers.
Use expected_sr_conflicts in place of expected_conflicts.
Warn if expected_rr_conflicts used in non-GLR parser.

* doc/bison.texinfo: Add documentation for %expect-rr.
2004-03-26 22:41:16 +00:00
Paul Eggert
1452af69b4 Add support for hex token numbers. 2004-03-08 20:49:34 +00:00
Paul Eggert
92ac370570 Do not allow NUL bytes in string literals or character constants. 2003-10-07 07:32:57 +00:00
Paul Eggert
22fccf958f Use "%no-default-prec" instead of "%default-prec 0". 2003-10-01 21:33:24 +00:00
Paul Eggert
39a06c251a Add %default-prec. 2003-09-30 20:11:29 +00:00
Akim Demaille
cd3684cfa8 When reducing initial empty rules, Bison parser read an initial
location that is not defined.  This results in garbage, and that
affects Bison's own parser.  Therefore we need (i) to extend Bison
to support a means to initialize this location, and (ii) to use
this CVS Bison to fix CVS Bison's parser.
* src/reader.h, reader.c (epilogue_augment): Remove, replace
with...
* src/muscle_tab.h, src/muscle_tab.c (muscle_code_grow): this.
* src/parse-gram.y: Adjust.
(%initial-action): New.
(%error-verbose): Since we require CVS Bison, there is no reason
not to use it.
* src/scan-gram.l: Adjust.
* src/Makefile.am (YACC): New, to make sure we use our own parser.
* data/yacc.c (yyparse): Use b4_initial_action.
2003-08-25 15:16:25 +00:00
Paul Hilfinger
25005f6ab0 * data/glr.c (YYERROR): Update definition.
(yyrecoverSyntaxError): Correct yyerrState logic. Correct comment.
Allow states with only a default reduction.

Fixes to avoid problem that $-N rules in GLR parsers can cause
buffer overruns, corrupting state.

* src/output.c (prepare_rules): Output max_left_semantic_context.
* src/reader.h (max_left_semantic_context): New
* src/scan-gram.l (max_left_semantic_context): Define.
(handle_action_dollar): Update max_left_semantic_context.
* data/glr.c (YYMAXLEFT): New.
(yydoAction): Increase yyrhsVals size.
(yyresolveAction): Ditto.

Fixes to problems with location handling in GLR parsers reported by
Frank Heckenbach (2003/06/05).

* data/glr.c (YYLTYPE): Make trivial if locations not used.
(YYRHSLOC): Add parentheses, make depend on whether locations used.
(YYLLOC_DEFAULT): Ditto.
(yyuserAction): Use YYLLOC_DEFAULT.
(yydoAction): Remove redundant code.

* tests/cxx-type.at: Exercise location information.
(yylex): Track locations.
(stmtMerge): Return value rather than printing.
2003-06-10 02:44:58 +00:00
Paul Eggert
d08290769c Switch from 'int' to 'bool' where that makes sense. 2003-05-24 19:16:02 +00:00
Akim Demaille
916708d59e * src/gram.h, src/gram.c (pure_parser, glr_parser): Move to...
* src/getargs.c, src/getargs.h: here, as bool, not int.
(nondeterministic_parser): New.
* src/parse-gram.y, src/scan-gram.l: Support
%nondeterministic-parser.
* src/output.c (prepare): Use nondeterministic_parser instead
of glr_parser where appropriate.
* src/tables.c (conflict_row, action_row, save_row)
(token_actions, token_actions, pack_vector): Ditto.
2003-04-29 12:57:36 +00:00
Paul Eggert
aa4180418f Add %option nounput, since we no longer use unput.
(unexpected_eof): Renamed from unexpected_end_of_file, for brevity.
Do not insert the expected token via unput, as this runs afoul
of a POSIX-compatibility bug in flex 2.5.31.
All uses changed to BEGIN the parent state,
since we no longer insert the expected token via unput.
2003-04-18 07:26:19 +00:00
Paul Eggert
379f0ac840 (YY_USER_INIT): Initialize code_start, too.
(<INITIAL><<EOF>>, <SC_PRE_CODE><<EOF>>): Set *loc to the scanner
cursor, instead of leaving it undefined.  This fixes a bug
reported by Tim Van Holder in
<http://mail.gnu.org/archive/html/bug-bison/2003-03/msg00023.html>.
2003-03-13 07:07:17 +00:00
Paul Eggert
a2bc9dbc7b (code_start): Initialize it to scanner_cursor,
not loc->end, since loc->end might contain garbage and this leads
to undefined behavior on some platforms.
(id_loc, token_start): Use (IF_LINTed) initial values that do not
depend on *loc, so that the reader doesn't give the false
impression that *loc is initialized.
(<INITIAL>"%%"): Do not bother setting code_start, since its value
does not survive the return.
2003-03-02 06:55:15 +00:00
Akim Demaille
0433ba88f9 * src/scan-gram.l (code_start): Always initialize it when entering
into yylex, as SC_EPILOGUE is activated *before* the corresponding
yylex invocation.  An alternative would be making it static, but
then it starts with the second %%'s beginning, instead of its end.
2003-03-01 10:55:31 +00:00
Paul Eggert
a737b2163c Use more-consistent naming conventions for local vars. 2003-02-03 15:35:57 +00:00
Paul Eggert
1deb9bdcad src/scan-gram.l (<SC_BRACED_CODE>"}"): Append ";" only in braced code,
not in unions etc.
2002-12-31 02:26:51 +00:00
Paul Eggert
83adb046bf (<INITIAL,SC_AFTER_IDENTIFIER,SC_PRE_CODE>","):
Moved here from...
(<INITIAL>","): Here.  This causes stray "," to be treated
more uniformly.
2002-12-30 23:38:20 +00:00
Paul Eggert
255227393f (<SC_BRACED_CODE>"}"): Append ";" before the last brace in braced code
when not in Yacc mode, for compatibility with Bison 1.35.  This
resurrects the 2001-12-15 patch to src/reader.c.
2002-12-30 22:40:52 +00:00
Paul Eggert
624a35e20b (handle_dollar, handle_at): Now takes int
token_type, not braced_code code_kind.  All uses changed.
(SC_PRE_CODE): New state, for scanning after a keyword that
has (or usually has) an immediately-following braced code.
(token_type): New local var, to keep track of which token type
to return when scanning braced code.
(<INITIAL>"%destructor", <INITIAL>"%lex-param",
<INITIAL>"%parse-param", <INITIAL>"%printer",
<INITIAL>"%union"): Set token type and BEGIN SC_PRE_CODE
instead of returning a token type immediately.
(<INITIAL>"{"): Set token type.
(<SC_BRACED_CODE>"}"): Use it.
(handle_action_dollar, handle_action_at): Now returns bool
indicating success.  Fail if ! current_rule; this prevents a core dump.
(handle_symbol_code_dollar, handle_symbol_code_at):
Remove; merge body into caller.
(handle_dollar, handle_at): Complain in invalid contexts.
2002-12-24 07:46:49 +00:00
Paul Eggert
3b1e470c6d (<SC_ESCAPED_CHARACTER>"'"): Use unsigned char
local var instead of casting to unsigned char, to avoid casts.
2002-12-13 08:35:16 +00:00
Paul Eggert
223ff46e4c (<INITIAL>{int}): Use set_errno and get_errno instead of errno.
(<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>\\x[0-9abcdefABCDEF]+): Likewise.
(handle_action_dollar, handle_action_at): Likewise.
(obstack_for_string): Renamed from string_obstack.
2002-12-11 06:48:18 +00:00
Paul Eggert
3f2d73f157 Include "files.h".
(YY_USER_INIT): Initialize scanner_cursor instead
of *loc.
(STEP): Remove.  No longer needed, now that adjust_location does
the work.  All uses removed.
(scanner_cursor): New var.
(adjust_location): Renamed from extend_location.  It now sets
*loc and adjusts the scanner cursor.  All uses changed.
Don't bother testing for CR.
(handle_syncline): Remove location arg; now updates scanner cursor.
All callers changed.
(unexpected_end_of_file): Now accepts start boundary of token or
comment, not location.  All callers changed.  Update scanner cursor,
not the location.
(SC_AFTER_IDENTIFIER): New state.
(context_state): Renamed from c_context.  All uses changed.
(id_loc, code_start, token_start): New local vars.
(<INITIAL,SC_AFTER_IDENTIFIER>): New initial context.  Move all
processing of Yacc white space and equivalents here.
(<INITIAL>{id}): Save id_loc.  Begin state SC_AFTER_IDENTIFIER
instead of returning ID immediately, since we need to search for
a subsequent colon.
(<INITIAL>"'", "\""): Save token_start.
(<INITIAL>"%{", "{", "%%"): Save code_start.
(<SC_AFTER_IDENTIFIER>): New state, looking for a colon.
(<SC_YACC_COMMENT>, <SC_COMMENT>, <SC_LINE_COMMENT>):
BEGIN context_state at end, not INITIAL.
(<SC_ESCAPED_STRING>"\"", <SC_ESCAPED_CHARACTER>"'",
<SC_BRACED_CODE>"}", <SC_PROLOGUE>"%}", <SC_EPILOGUE><<EOF>>):
Return correct token start.
(<SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>): Save start boundary when
the start of a character, string or multiline comment is found.
2002-12-07 06:14:27 +00:00
Paul Eggert
6c30d6413e (no_cr_read, extend_location): Move to epilogue,
and put only a forward declaration in the prologue.  This is for
consistency with the other scanner helper functions.
2002-12-01 02:37:56 +00:00
Paul Eggert
6b0d38ab2c [a-f] -> [abcdef], so that we don't assume the C locale. 2002-11-29 09:03:16 +00:00
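The reasoning behind this change can be sketched in C. In a non-C locale, a range such as [a-f] may follow the locale's collation order and match characters other than the six intended letters, whereas an explicit enumeration cannot. The helper name `is_hex_digit` is hypothetical and does not come from the Bison sources; this is only an illustration of the same idea outside of Lex.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: test for a hexadecimal digit by enumerating
   the accepted characters instead of relying on character ranges,
   so the result does not depend on the locale's collation order.  */
static bool
is_hex_digit (char c)
{
  /* strchr would also match the terminating NUL, so exclude it.  */
  return c != '\0' && strchr ("0123456789abcdefABCDEF", c) != NULL;
}
```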
Paul Eggert
763ed7a687 "," now elicits a warning, rather than being
a token; this is more compatible with byacc.
2002-11-29 08:44:40 +00:00
Paul Eggert
41141c568e (STEP): Renamed from YY_STEP. All uses changed.
(STRING_GROW): Renamed from YY_OBS_GROW.  All uses changed.
(STRING_FINISH): Renamed from YY_OBS_FINISH.  All uses changed.
(STRING_FREE): Renamed from YY_OBS_FREE.  All uses changed.
2002-11-27 18:34:14 +00:00
Paul Eggert
412f8a5975 Revamp regular expressions so that " and '
do not confuse xgettext.
2002-11-13 06:40:35 +00:00
Akim Demaille
7ec2d4cd39 * src/scan-gram.l, src/reader.h (scanner_last_string_free):
Restore.
* src/scan-gram.l (last_string): Is global to the file, not to
yylex.
* src/parse-gram.y (input): Don't append the epilogue here,
(epilogue.opt): do it here, and free the scanner's obstack.
* src/reader.c (epilogue_set): Rename as...
(epilogue_augment): this.
* data/c.m4 (b4_epilogue): Defaults to empty.
2002-11-12 08:26:38 +00:00
Akim Demaille
95612cfa60 * src/struniq.h, src/struniq.c (struniq_t): Is const.
(STRUNIQ_EQ, struniq_assert, struniq_assert_p): New.
Use struniq for symbols.
* src/symtab.h (symbol_t): The tag member is a struniq.
(symbol_type_set): Adjust.
* src/symtab.c (symbol_new): Takes a struniq.
(symbol_free): Don't free the tag member.
(hash_compare_symbol_t, hash_symbol_t): Rename as...
(hash_compare_symbol, hash_symbol): these.
Use the fact that tags are struniqs.
(symbol_get): Use struniq_new.
* src/symlist.h, src/symlist.c (symbol_list_n_type_name_get):
Returns a struniq.
* src/reader.h (merger_list, grammar_currentmerge_set): The name
and type members are struniqs.
* src/reader.c (get_merge_function)
(grammar_current_rule_merge_set): Adjust.
(TYPE, current_type): Are struniq.
Use struniq for file names.
* src/files.h, src/files.c (infile): Split into...
(grammar_file, current_file): these.
* src/scan-gram.c (YY_USER_INIT, handle_syncline): Adjust.
* src/reduce.c (reduce_print): Likewise.
* src/getargs.c (getargs): Likewise.
* src/complain.h, src/complain.c: Likewise.
* src/main.c (main): Call struniqs_new early enough to use it for
file names.
Don't free the input file name.
2002-11-12 08:05:59 +00:00
Akim Demaille
3e6656f9ab * src/symtab.c (symbol_free): Remove dead deactivated code:
type_name are properly removed.
Don't use XFREE to free items that cannot be NULL.
* src/struniq.h, src/struniq.c: New.
* src/main.c (main): Initialize/free struniqs.
* src/parse-gram.y (%union): Add a struniq member.
(yyprint): Adjust.
* src/scan-gram.l (<{tag}>): Return a struniq.
Free the obstack bit that used to store it.
* src/symtab.h (symbol_t): The 'type_name' member is a struniq.
2002-11-12 07:55:55 +00:00
Paul Eggert
ac060e78a3 (<SC_CHARACTER>): Don't worry about any backslash
escapes other than \\ and \'; this simplifies the code.
(<SC_STRING>): Likewise, for \\ and \".
(<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,
SC_PROLOGUE,SC_EPILOGUE>): Escape $ and @, too.
Use new escapes @{ and @} for [ and ].
2002-11-12 07:27:04 +00:00
Paul Eggert
345532d70b (unexpected_end_of_file): Fix bug: columns were counted in the token
inserted at end of file.  Now takes location_t *, not location_t, so
that the location can be adjusted.  All uses changed.
2002-11-10 05:17:56 +00:00
Paul Eggert
a706a1cc03 Remove stack option. We no longer use the stack, since the stack was
never deeper than 1; instead, use the new auto var c_context to record
the stacked value.

Remove nounput option.  At an unexpected end of file, we now unput
the minimal input necessary to end cleanly; this simplifies the
code.

Avoid unbounded token sizes where this is easy.

(unexpected_end_of_file): New function.
Use it to systematize the error message on unexpected EOF.
(last_string): Now auto, not static.
(YY_OBS_FREE): Remove unnecessary do while (0) wrapper.
(scanner_last_string_free): Remove; not used.
(percent_percent_count): Move decl to just before use.
(SC_ESCAPED_CHARACTER): Return ID at unexpected end of file,
not the (never otherwise-used) CHARACTER.
2002-11-08 05:20:20 +00:00
Paul Eggert
8e6ef48342 (unexpected_end_of_file): New function.
Use it to systematize the error message on unexpected EOF.
2002-11-07 08:15:11 +00:00
Akim Demaille
900c5db537 * src/main.c (main): Free `infile'.
* src/scan-gram.l (handle_syncline): New.
Recognize `#line'.
* src/output.c (user_actions_output, symbol_destructors_output)
(symbol_printers_output): Use the location's file name, not
infile.
* src/reader.c (prologue_augment, epilogue_set): Likewise.
2002-11-06 08:08:46 +00:00
Paul Eggert
98f2caaa5f Use more accurate diagnostics, e.g.
"integer out of range" rather than "invalid value".
2002-11-06 07:01:06 +00:00
Paul Eggert
1a9e39f116 (braces_level): Now auto, not static.
Initialize to zero if the compiler is being picky.
(INITIAL): Clear braces_level instead of incrementing it.
(SC_BRACED_CODE): Treat <% and %> as { and } when inside C code,
as POSIX 1003.1-2001 requires.
2002-11-05 23:50:11 +00:00
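A minimal sketch of the digraph handling described in the commit above, assuming a plain character scan in place of the real Lex rules. The function name `net_brace_level` is hypothetical. Unlike the actual scanner, it does not skip strings, character constants, or comments.

```c
/* Hypothetical helper: return the net brace nesting of CODE,
   treating the digraphs <% and %> like { and }.
   Strings and comments are NOT skipped; this is a sketch only.  */
static int
net_brace_level (const char *code)
{
  int level = 0;
  for (const char *p = code; *p; p++)
    {
      if (*p == '{')
        level++;
      else if (*p == '}')
        level--;
      else if (p[0] == '<' && p[1] == '%')
        {
          level++;
          p++;                  /* Consume the second digraph char.  */
        }
      else if (p[0] == '%' && p[1] == '>')
        {
          level--;
          p++;                  /* Consume the second digraph char.  */
        }
    }
  return level;
}
```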
Akim Demaille
29c017256a * src/scan-gram.l: When it starts with `%', complain about the
whole directive, not just that `invalid character: %'.
2002-11-05 21:20:14 +00:00
Akim Demaille
c4d720cdbb * src/location.h (LOCATION_PRINT): Use quotearg slot 3 to avoid
clashes.
* src/scan-gram.l: Use [\'] instead of ['] to pacify
font-lock-mode.
Use complain_at.
Use quote, not quote_n since LOCATION_PRINT no longer uses the
slot 0.
2002-11-04 08:28:01 +00:00
Paul Eggert
d8d3f94a99 Revamp to fix POSIX incompatibilities, to count columns correctly, and
to check for invalid inputs.

Use mbsnwidth to count columns correctly.  Account for tabs, too.
Include mbswidth.h.
(YY_USER_ACTION): Invoke extend_location rather than LOCATION_COLUMNS.
(extend_location): New function.
(YY_LINES): Remove.

Handle CRLF in C code rather than in Lex code.
(YY_INPUT): New macro.
(no_cr_read): New function.

Scan UCNs, even though we don't fully handle them yet.
(convert_ucn_to_byte): New function.

Handle backslash-newline correctly in C code.
(SC_LINE_COMMENT, SC_YACC_COMMENT): New states.
(eols, blanks): Remove.  YY_USER_ACTION now counts newlines etc.;
all uses changed.
(tag, splice): New EREs.  Do not allow NUL or newline in tags.
Use {splice} wherever C allows backslash-newline.
YY_STEP after space, newline, vertical-tab.
("/*"): BEGIN SC_YACC_COMMENT, not yy_push_state (SC_COMMENT).

(letter, id): Don't assume ASCII; e.g., spell out a-z.

({int}, handle_action_dollar, handle_action_at): Check for integer
overflow.

(YY_STEP): Omit trailing semicolon, so that it's more like C.

(<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>): Allow \0 and \00
as well as \000.  Check for UCHAR_MAX, not 255.
Allow \x with an arbitrary positive number of digits, as in C.
Check for overflow here.
Allow \? and UCNs, for compatibility with C.

(handle_symbol_code_dollar): Use quote_n slot 1 to avoid collision
with quote slot used by complain_at.
2002-11-03 08:42:32 +00:00
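The \x handling described in the commit above (any positive number of hex digits, checked against UCHAR_MAX and for overflow) can be sketched as follows. The function name `hex_escape_to_byte` is hypothetical and the use of strtoul is an assumption made for illustration; it is not taken from the scanner itself.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: convert DIGITS, the hex digits that follow
   "\x", into a byte stored in *OUT.  The caller is assumed to pass
   only hex digits, as a scanner pattern would guarantee.
   Return false if there are no digits, if trailing junk remains,
   or if the value overflows or exceeds UCHAR_MAX.  */
static bool
hex_escape_to_byte (const char *digits, unsigned char *out)
{
  char *end;
  unsigned long value;

  errno = 0;
  value = strtoul (digits, &end, 16);
  if (end == digits || *end != '\0')
    return false;               /* No digits, or trailing junk.  */
  if (errno == ERANGE || value > UCHAR_MAX)
    return false;               /* Overflow or out of byte range.  */
  *out = (unsigned char) value;
  return true;
}
```

Comparing against UCHAR_MAX rather than a literal 255 keeps the check correct on platforms where a byte is wider than 8 bits.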
Paul Eggert
d33cb3ae09 Remove all uses of PARAMS, since we now assume C89 or better. 2002-10-21 05:30:50 +00:00
Akim Demaille
ae7453f2ba Prototype support of %lex-param and %parse-param.
* src/parse-gram.y: Add the definition of the %lex-param and
%parse-param tokens, plus their rules.
Drop the `_' version of %glr-parser.
Add the "," token.
* src/scan-gram.l (INITIAL): Scan them.
* src/muscle_tab.c: Comment changes.
(muscle_insert, muscle_find): Rename `pair' as `probe'.
* src/muscle_tab.h (MUSCLE_INSERT_PREFIX): Remove unused.
(muscle_entry_s): The `value' member is no longer const.
Adjust all dependencies.
* src/muscle_tab.c (muscle_init): Adjust: use
MUSCLE_INSERT_STRING.
Initialize the obstack earlier.
* src/muscle_tab.h, src/muscle_tab.c (muscle_grow)
(muscle_pair_list_grow): New.
* data/c.m4 (b4_c_function_call, b4_c_args): New.
* data/yacc.c (YYLEX): Use b4_c_function_call to honor %lex-param.
* tests/calc.at: Use %locations, not --locations.
(AT_CHECK_CALC_GLR): Use %glr-parser, not %glr_parser.
2002-10-19 14:38:06 +00:00
Akim Demaille
473d0a7567 * src/getargs.h (trace_e): Add trace_scan, and trace_parse.
* src/getargs.c (trace_types, trace_args): Adjust.
* src/reader.c (grammar_current_rule_prec_set)
(grammar_current_rule_dprec_set, grammar_current_rule_merge_set):
Standardize error messages.
And s/@prec/%prec/!
(reader): Use trace_flag to enable scanner/parser debugging,
instead of an adhoc scheme.
* src/scan-gram.l: Remove trailing debugging code.
2002-10-17 17:47:33 +00:00
Paul Eggert
efcb44dd47 (rule_length): New static var.
Use it to keep track of the rule length in the scanner, since
we can't expect the parser to be in lock-step sync with the scanner.
(handle_action_dollar, handle_action_at): Use this var.
2002-10-13 08:38:39 +00:00
Akim Demaille
eb71459201 * tests/regression.at Characters Escapes): New.
* src/scan-gram.l (SC_ESCAPED_CHARACTER): Accept ' in strings and
characters.
Reported by Jan Nieuwenhuizen.
2002-10-11 11:23:19 +00:00
Paul Eggert
db2cc12fd0 Wrap strings in _() if they need translation.
Use strings rather than escapes when possible,
to minimize the number of warnings from xgettext.

(handle_action_dollar, handle_action_at): Don't use isdigit,
as it mishandles negative chars and it may not work as expected
outside the C locale.
2002-08-12 14:52:47 +00:00
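The isdigit concern in the commit above can be illustrated with a small sketch. Passing a negative char value to isdigit is undefined behavior unless the value equals EOF, and the result may vary by locale. A direct comparison avoids both problems, because C guarantees that '0' through '9' are contiguous. The helper names `is_ascii_digit` and `count_leading_digits` are hypothetical and not taken from the Bison sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: locale-independent test for a decimal digit.
   Well defined for every char value, including negative ones.  */
static bool
is_ascii_digit (char c)
{
  return '0' <= c && c <= '9';
}

/* Hypothetical helper: count the decimal digits at the start of S,
   for example the "12" in a "$12" reference after the '$'.  */
static size_t
count_leading_digits (const char *s)
{
  size_t n = 0;
  while (is_ascii_digit (s[n]))
    n++;
  return n;
}
```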
Akim Demaille
5dde258a9e * src/scan-gram.l (id): Can start with an underscore. 2002-07-19 08:31:32 +00:00
Akim Demaille
1a715ef2fc * lib/quotearg.h: Protect against multiple inclusions.
* src/location.h (location_t): Add a `file' member.
(LOCATION_RESET, LOCATION_PRINT): Adjust.
* src/complain.c (warn_at, complain_at, fatal_at): Drop
`error_one_per_line' support.
2002-07-09 16:24:57 +00:00
Akim Demaille
a5d5099417 * src/complain.h, src/complain.c (warn, complain): Remove, unused.
* src/reader.c (lineno): Remove.
Adjust all dependencies.
(get_merge_function): Take a location and use complain_at.
* src/symtab.h, src/symtab.c (symbol_make_alias): Likewise.
* tests/regression.at (Invalid inputs, Mixing %token styles):
Adjust.
2002-07-09 15:54:39 +00:00