mirror of
https://git.savannah.gnu.org/git/bison.git
synced 2026-03-09 20:33:03 +00:00
-*- outline -*-

* URGENT: Prologue
The %union is declared after the user C declarations. It can be
a problem if YYSTYPE is declared after the user part.

Actually, the real problem seems to be that the %union ought to be
output where it was defined. For instance, in gettext/intl/plural.y,
we have:

        %{
        ...
        #include "gettextP.h"
        ...
        %}

        %union {
          unsigned long int num;
          enum operator op;
          struct expression *exp;
        }

        %{
        ...
        static int yylex PARAMS ((YYSTYPE *lval, const char **pexp));
        ...
        %}

Where the first part defines struct expression, the second uses it to
define YYSTYPE, and the last uses YYSTYPE. Only this order is valid.

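The ordering constraint can be seen in plain C. The following is only a
sketch of what the generated file has to look like: the definitions of
`enum operator' and `struct expression' are stand-ins invented here (in
plural.y they come from gettextP.h), and the toy yylex is hypothetical.

```c
#include <stddef.h>

/* Pre-prologue: user declarations that the %union depends on.
   Stand-in definitions, assumed for this sketch only.  */
enum operator { op_var, op_num };
struct expression
{
  enum operator operation;
  unsigned long int num;
};

/* Generated from %union: `enum operator' and `struct expression'
   must already be visible at this point.  */
typedef union
{
  unsigned long int num;
  enum operator op;
  struct expression *exp;
} YYSTYPE;

/* Post-prologue: user declarations that depend on YYSTYPE.  */
static int yylex (YYSTYPE *lval, const char **pexp);

/* Toy scanner (hypothetical): consume one decimal digit, store it
   in LVAL and return 1; return 0 on anything else.  */
static int
yylex (YYSTYPE *lval, const char **pexp)
{
  const char *p = *pexp;
  if (*p < '0' || *p > '9')
    return 0;
  lval->num = (unsigned long int) (*p - '0');
  *pexp = p + 1;
  return 1;
}
```

Moving the union above the first block, or the yylex prototype above the
union, makes the compiler reject the file, which is the failure
described above.
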
Note that we have the same problem with GCC.

I suggest splitting the prologue into pre-prologue and post-prologue.
The reasons are:

1. we keep language independence, as it is the skeleton that joins the
two prologues (there is no need for the engine to encode union yystype
and to output it inside the prologue, which breaks the language
independence of the generator)

2. that makes it possible to have several %union in input. I think
this is a pleasant (but currently useless) feature, but in the future,
I want a means to %include other bits of grammars, and _then_ it will
be important for the various bits to define their needs in %union.

When implementing multiple-%union support, bear the following in mind:

- with --yacc, this must be flagged as an error. Don't make it fatal
  though.

- The #line must now appear *inside* the definition of yystype.
  Something like

        {
        #line 12 "foo.y"
          int ival;
        #line 23 "foo.y"
          char *sval;
        }

* Language independent actions

Currently bison, the generator, transforms $1, $$ and so forth into
direct C code, manipulating the stacks. This is problematic, because
(i) it means that if we want more languages, we need to update the
generator, and (ii), it forces names everywhere (e.g., the C++
skeleton would be happy to use other naming schemes, and actually,
even other accessing schemes).

Therefore we want

1. the generator to replace $1, etc. by M4 macro invocations
   (b4_dollar(1), b4_at(3), b4_dollar_dollar) etc.

2. the skeletons to define these macros.

But currently the actions are double-quoted, to protect them from M4
evaluation. So we need to:

3. stop quoting them

4. change the [ and ] in the actions into @<:@ and @:>@

5. extend the postprocessor to map these back onto [ and ].

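Steps 1 and 4 can be sketched as a single pass over the action text.
This is an illustration only, not the generator's code: the function
names translate_action and append are made up here, and the sketch
ignores typed references such as $<type>1, as well as strings and
comments inside actions. The macro names are the ones proposed above.

```c
#include <stdio.h>
#include <string.h>

/* Append SRC to DST (capacity CAP, current length *LEN).
   Return 0 on success, -1 if it does not fit.  */
static int
append (char *dst, size_t cap, size_t *len, const char *src)
{
  size_t n = strlen (src);
  if (*len + n + 1 > cap)
    return -1;
  memcpy (dst + *len, src, n + 1);
  *len += n;
  return 0;
}

/* Rewrite ACTION into OUT (capacity CAP):
     $$ -> b4_dollar_dollar    $N -> b4_dollar(N)    @N -> b4_at(N)
     [  -> @<:@                ]  -> @:>@
   Return 0 on success, -1 if OUT is too small.  */
static int
translate_action (const char *action, char *out, size_t cap)
{
  size_t len = 0;
  const char *p = action;

  if (cap == 0)
    return -1;
  out[0] = '\0';

  while (*p)
    {
      char buf[64];
      size_t digits;

      if (p[0] == '$' && p[1] == '$')
        {
          strcpy (buf, "b4_dollar_dollar");
          p += 2;
        }
      else if ((p[0] == '$' || p[0] == '@')
               && (digits = strspn (p + 1, "0123456789")) > 0)
        {
          /* Copy the digits as text, so no integer can overflow.  */
          snprintf (buf, sizeof buf, "%s(%.*s)",
                    p[0] == '$' ? "b4_dollar" : "b4_at",
                    (int) digits, p + 1);
          p += 1 + digits;
        }
      else if (*p == '[')
        {
          strcpy (buf, "@<:@");
          ++p;
        }
      else if (*p == ']')
        {
          strcpy (buf, "@:>@");
          ++p;
        }
      else
        {
          buf[0] = *p++;
          buf[1] = '\0';
        }

      if (append (out, cap, &len, buf) < 0)
        return -1;
    }
  return 0;
}
```

Given the action text "$$ = $1 + a[@3];", the function produces
"b4_dollar_dollar = b4_dollar(1) + a@<:@b4_at(3)@:>@;". Each skeleton is
then free to expand the macros as it sees fit.
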
* Coding system independence
Paul notes:

        Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
        255). It also assumes that the 8-bit character encoding is
        the same for the invocation of 'bison' as it is for the
        invocation of 'cc', but this is not necessarily true when
        people run bison on an ASCII host and then use cc on an EBCDIC
        host. I don't think these topics are worth our time
        addressing (unless we find a gung-ho volunteer for EBCDIC or
        PDP-10 ports :-) but they should probably be documented
        somewhere.

* Using enums instead of int for tokens.
Paul suggests:

        #ifndef YYTOKENTYPE
        # if defined (__STDC__) || defined (__cplusplus)
           /* Put the tokens into the symbol table, so that GDB and other debuggers
              know about them. */
           enum yytokentype {
             FOO = 256,
             BAR,
             ...
           };
           /* POSIX requires `int' for tokens in interfaces. */
        # define YYTOKENTYPE int
        # endif
        #endif
        #define FOO 256
        #define BAR 257
        ...

* Output directory
Akim:

| I consider this to be a bug in bison:
|
| /tmp % mkdir src
| /tmp % cp ~/src/bison/tests/calc.y src
| /tmp % mkdir build && cd build
| /tmp/build % bison ../src/calc.y
| /tmp/build % cd ..
| /tmp % ls -l build src
| build:
| total 0
|
| src:
| total 32
| -rw-r--r-- 1 akim lrde 27553 oct 2 16:31 calc.tab.c
| -rw-r--r-- 1 akim lrde 3335 oct 2 16:31 calc.y
|
|
| Would it be safe to change this behavior to something more reasonable?
| Do you think some people depend upon this?

Jim:

Is that behavior documented?
If so, then it's probably not reasonable to change it.
I've Cc'd the automake list, because some of automake's
rules use bison through $(YACC) -- though I'll bet they
all use it in yacc-compatible mode.

Pavel:

Hello, Jim and others!

> Is that behavior documented?
> If so, then it's probably not reasonable to change it.
> I've Cc'd the automake list, because some of automake's
> rules use bison through $(YACC) -- though I'll bet they
> all use it in yacc-compatible mode.

Yes, Automake currently uses bison in Automake-compatible mode, but it
would be fair for Automake to switch to the native mode as long as the
processed files are distributed and "missing" emulates bison.

In any case, the makefiles should specify the output file explicitly
instead of relying on weird defaults.

> | src:
> | total 32
> | -rw-r--r-- 1 akim lrde 27553 oct 2 16:31 calc.tab.c
> | -rw-r--r-- 1 akim lrde 3335 oct 2 16:31 calc.y

This is not _that_ ugly as it seems - with Automake you want to put
sources where they belong - to the source directory.

> | This is not _that_ ugly as it seems - with Automake you want to put
> | sources where they belong - to the source directory.
>
> The difference source/build you are referring to is based on Automake
> concepts. They make no sense at all for tools such as bison or gcc
> etc. They have input and output. I do not want them to try to grasp
> source/build. I want them to behave uniformly: output *here*.

I realize that.

It's unfortunate that the native mode of Bison behaves in a less uniform
way than the yacc mode. I agree with your point. Bison maintainers may
want to fix it along with the documentation.

* Unit rules
Maybe we could expand unit rules, i.e., transform

        exp: arith | bool;
        arith: exp '+' exp;
        bool: exp '&' exp;

into

        exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars.

* Stupid error messages
An example shows it easily:

src/bison/tests % ./testsuite -k calc,location,error-verbose -l
GNU Bison 1.49a test suite test groups:

 NUM: FILENAME:LINE      TEST-GROUP-NAME
      KEYWORDS

  51: calc.at:440        Calculator --locations --yyerror-verbose
  52: calc.at:442        Calculator --defines --locations --name-prefix=calc --verbose --yacc --yyerror-verbose
  54: calc.at:445        Calculator --debug --defines --locations --name-prefix=calc --verbose --yacc --yyerror-verbose
src/bison/tests % ./testsuite 51 -d
## --------------------------- ##
## GNU Bison 1.49a test suite. ##
## --------------------------- ##
 51: calc.at:440       ok
## ---------------------------- ##
## All 1 tests were successful. ##
## ---------------------------- ##
src/bison/tests % cd ./testsuite.dir/51
tests/testsuite.dir/51 % echo "()" | ./calc
1.2-1.3: parse error, unexpected ')', expecting error or "number" or '-' or '('

* yyerror, yyprint interface
It should be improved, in particular when using Bison features such as
locations, and YYPARSE_PARAM. For the time being, it is recommended
to #define yyerror and yyprint to steal internal variables...

* read_pipe.c
This is not portable to DOS for instance. Implement a more portable
scheme. Sources of inspiration include GNU diff, and Free Recode.

* Memory leaks in the generator
A round of memory leak clean ups would be most welcome. Dmalloc,
Checker GCC, Electric Fence, or Valgrind: you choose your tool.

* Memory leaks in the parser
The same applies to the generated parsers. In particular, this is
critical for user data: when aborting a parse, when handling the
error token etc., we often throw away yylval without giving the user
a chance to clean it up.

* --graph
Show reductions. []

* Broken options?
** %no-lines [ok]
** %no-parser []
** %pure-parser []
** %semantic-parser []
** %token-table []
** Options which could use parse_dquoted_param ().
Maybe transferred to lex.c.
*** %skeleton [ok]
*** %output []
*** %file-prefix []
*** %name-prefix []

** Skeleton strategy. []
Must we keep %no-parser?
%token-table?
*** New skeletons. []

* src/print_graph.c
Find the best graph parameters. []

* doc/bison.texinfo
** Update
information about ERROR_VERBOSE. []
** Add explanations about
skeleton muscles. []
%skeleton. []

* testsuite
** tests/pure-parser.at []
New tests.

* Debugging parsers

From Greg McGary:

akim demaille <akim.demaille@epita.fr> writes:

> With great pleasure! Nonetheless, things which are debatable
> (or not, but just `big') should be discussed in `public': something
> like help- or bug-bison@gnu.org is just fine. Jesse and I are there,
> but there is also Jim and some other people.

I have no idea whether it qualifies as big or controversial, so I'll
just summarize for you. I proposed this change years ago and was
surprised that it was met with utter indifference!

This debug feature is for the programs/grammars one develops with
bison, not for debugging bison itself. I find that the YYDEBUG
output comes in a very inconvenient format for my purposes.
When debugging gcc, for instance, what I want is to see a trace of
the sequence of reductions and the line#s for the semantic actions
so I can follow what's happening. Single-step in gdb doesn't cut it
because to move from one semantic action to the next takes you through
lots of internal machinery of the parser, which is uninteresting.

The change I made was to the format of the debug output, so that it
comes out in the format of C error messages, digestible by emacs
compile mode, like so:

        grammar.y:1234: foo: bar(0x123456) baz(0x345678)

where "foo: bar baz" is the reduction rule, whose semantic action
appears on line 1234 of the bison grammar file grammar.y. The hex
numbers on the rhs tokens are the parse-stack values associated with
those tokens. Of course, yytype might be something totally
incompatible with that representation, but for the most part, yytype
values are single words (scalars or pointers). In the case of gcc,
they're most often pointers to tree nodes. Come to think of it, the
right thing to do is to make the printing of stack values be
user-definable. It would also be useful to include the filename &
line# of the file being parsed, but the main filename & line# should
continue to be that of grammar.y

Anyway, this feature has saved my life on numerous occasions. The way
I customarily use it is to first run bison with the traces on, isolate
the sequence of reductions that interests me, put those traces in a
buffer and force it into compile-mode, then visit each of those lines
in the grammar and set breakpoints with C-x SPACE. Then, I can run
again under the control of gdb and stop at each semantic action.
With the hex addresses of tree nodes, I can inspect the values
associated with any rhs token.

You like?

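The trace line Greg describes is easy to produce. What follows is a
sketch only, not his patch: struct rhs_item and format_reduction are
names invented here, and the stack values are assumed to fit in a
uintptr_t (the "single word" case mentioned in the message).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One right-hand-side symbol and the stack value printed next to it.
   Hypothetical type, not part of Bison.  */
struct rhs_item
{
  const char *name;
  uintptr_t value;
};

/* Format "FILE:LINE: LHS: SYM(0xVALUE) ..." into BUF (capacity CAP).
   Return the length written, or -1 if BUF is too small.  */
static int
format_reduction (char *buf, size_t cap, const char *file, int line,
                  const char *lhs, const struct rhs_item *rhs, size_t nrhs)
{
  size_t len;
  size_t i;
  int n = snprintf (buf, cap, "%s:%d: %s:", file, line, lhs);

  if (n < 0 || (size_t) n >= cap)
    return -1;
  len = (size_t) n;

  for (i = 0; i < nrhs; ++i)
    {
      n = snprintf (buf + len, cap - len, " %s(0x%" PRIxPTR ")",
                    rhs[i].name, rhs[i].value);
      if (n < 0 || (size_t) n >= cap - len)
        return -1;
      len += (size_t) n;
    }
  return (int) len;
}
```

Because the line starts with FILE:LINE:, Emacs compile mode can jump
straight to the semantic action. A user-definable printer, as suggested
in the message, would replace the fixed 0x... formatting of each value.
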
* input synclines
Some users create their foo.y files, and equip them with #line. Bison
should recognize these, and preserve them.

* BTYacc
See if we can integrate backtracking in Bison. Contact the BTYacc
maintainers.

* Automaton report
Display more clearly the lookaheads for each item.

* RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.

* Precedence
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders.

* Parsing grammars
Rewrite the reader in Bison.

* Problems with aliases
From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
Subject: Token Alias Bug
To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>

I've noticed a bug in bison. Sadly, our eternally wise sysadmins won't let
us use CVS, so I can't find out if it's been fixed already...

Basically, I made a program (in flex) that went through a .y file looking
for "..."-tokens, and then output a %token
line for it. For single-character ""-tokens, I reasoned, I could just use
[%token 'A' "A"]. However, this causes Bison to output a [#define 'A' 65],
which cpp chokes on, not unreasonably. (And even if cpp didn't choke, I
obviously wouldn't want (char)'A' to be replaced with (int)65 throughout my
code.)

Bison normally forgoes outputting a #define for a character token. However,
it always outputs an aliased token -- even if the token is an alias for a
character token. We don't want that. The problem is in /output.c/, as I
recall. When it outputs the token definitions, it checks for a character
token, and then checks for an alias token. If the character token check is
placed after the alias check, then it works correctly.

Alias tokens seem to be something of a kludge. What about an [%alias "..."]
command...

        %alias T_IF "IF"

Hmm. I can't help thinking... What about a --generate-lex option that
creates an .l file for the alias tokens used... (Or an option to make a
gperf file, etc...)

* Presentation of the report file
From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
Subject: Token Alias Bug
To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>

I've also noticed something, that whilst not *wrong*, is inconvenient: I
use the verbose mode to help find the causes of unresolved shift/reduce
conflicts. However, this mode insists on starting the .output file with a
list of *resolved* conflicts, something I find quite useless. Might it be
possible to define a -v mode, and a -vv mode -- Where the -vv mode shows
everything, but the -v mode only tells you what you need for examining
conflicts? (Or, perhaps, a "*** This state has N conflicts ***" marker above
each state with conflicts.)

* $undefined
From Hans:
- If the Bison generated parser encounters an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.

* Default Action
From Hans:
- For use with my C++ parser, I transported the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(same typed rules can of course be grouped together).

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

        #if YYLSP_NEEDED
            YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
        #else
            YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
        #endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.

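To make the proposal concrete, here is a sketch of how such hooks could
sit around an action, for the case without locations. It is not taken
from Florian's patch: the no-op defaults, the reduce_add function and the
last_value/last_len variables are assumptions made for the illustration,
and YYSTYPE is reduced to int.

```c
typedef int YYSTYPE;   /* stand-in semantic type for the sketch */

/* Hypothetical user hook: remember the value computed by the action
   and the length of the rule.  A real user would attach the value to
   a tree node here instead.  */
static YYSTYPE last_value;
static int last_len;
#define YYACT_EPILOGUE(Val, Rhs, Len) \
  (last_value = (Val), last_len = (Len), (void) (Rhs))

/* What the skeleton could provide: hooks default to no-ops.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Simplified reduction of "exp: exp '+' exp" (yylen == 3).
   YYVSP points at the top of the value stack, as in bison.simple.  */
static YYSTYPE
reduce_add (YYSTYPE *yyvsp)
{
  int yylen = 3;
  YYSTYPE yyval = yyvsp[1 - yylen];      /* default action: $$ = $1 */

  YYACT_PROLOGUE (yyval, (yyvsp - yylen), yylen);
  yyval = yyvsp[-2] + yyvsp[0];          /* user action: $$ = $1 + $3 */
  YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  return yyval;
}
```

The epilogue runs after the action, so it sees the final value of $$,
which is exactly what YYLLOC_DEFAULT cannot provide.
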
-----

Copyright (C) 2001, 2002 Free Software Foundation, Inc.

This file is part of GNU Autoconf.

GNU Autoconf is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Autoconf is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with autoconf; see the file COPYING. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.