For each start symbol, generate a parsing function whose return value
is richer than the usual int returned by yyparse. Reserve a place in it
for the returned semantic value, so that callers do not have to pass a
pointer argument to "return" that value. This also makes the call to the
parsing function independent of whether a given start symbol is typed.
For instance, if the grammar file contains:

    %type <int> expression
    %start input expression

(so "input" is valueless) we get

    typedef struct
    {
      int yystatus;
    } yyparse_input_t;

    yyparse_input_t yyparse_input (void);

    typedef struct
    {
      int yyvalue;
      int yystatus;
    } yyparse_expression_t;

    yyparse_expression_t yyparse_expression (void);
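
A caller might use these functions as sketched below. The two struct
types are copied from the declarations above; the function bodies are
hypothetical stand-ins for the generated parsers (a real program would
link the Bison output instead), the value 42 and the helper
evaluate_or_default are made up for illustration, and a yystatus of 0
is assumed to mean success, as with yyparse.

```c
/* Return types, as declared in this commit message.  */
typedef struct
{
  int yystatus;
} yyparse_input_t;

typedef struct
{
  int yyvalue;
  int yystatus;
} yyparse_expression_t;

/* Hypothetical stand-ins for the generated parsers: a real program
   would link the parser produced by Bison instead.  */
yyparse_input_t
yyparse_input (void)
{
  yyparse_input_t res = { 0 };  /* Assumption: 0 means success.  */
  return res;
}

yyparse_expression_t
yyparse_expression (void)
{
  yyparse_expression_t res = { 42, 0 };  /* Made-up value.  */
  return res;
}

/* Hypothetical helper: parse an expression and return its value, or
   DEFAULT_VALUE if parsing failed.  No pointer is needed to fetch
   the semantic value: it travels in the returned struct.  */
int
evaluate_or_default (int default_value)
{
  yyparse_expression_t res = yyparse_expression ();
  return res.yystatus == 0 ? res.yyvalue : default_value;
}
```

The call sites for the valueless "input" and the typed "expression"
have the same shape; only the presence of the yyvalue member differs.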
This commit also changes the implementation of the parser termination:
when there are multiple start symbols, it is the initial rules that
explicitly YYACCEPT. They do that after having exported the
start-symbol's value (if it is typed):
    switch (yyn)
      {
      case 1: /* $accept: YY_EXPRESSION expression $end */
        { ((*yyvalue).TOK_expression) = (yyvsp[-1].TOK_expression); YYACCEPT; }
        break;

      case 2: /* $accept: YY_INPUT input $end */
        { YYACCEPT; }
        break;
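
The idea can be modeled outside of the generated parser as follows.
This is only a toy sketch, not the generated code: the names value_t,
run_action, ACCEPTED and CONTINUE are hypothetical, and YYACCEPT is
modeled by a plain return.

```c
/* Toy semantic value type with a single member.  */
typedef union
{
  int TOK_expression;
} value_t;

/* Hypothetical outcomes of running an action.  */
enum { CONTINUE = -1, ACCEPTED = 0 };

/* Run the action of rule YYN.  YYVSP points to the top of a toy value
   stack, YYVALUE to the place reserved for the start symbol's value.
   Start rules export the value (if any) and then accept; any other
   rule lets parsing continue.  */
int
run_action (int yyn, const value_t *yyvsp, value_t *yyvalue)
{
  switch (yyn)
    {
    case 1: /* $accept: YY_EXPRESSION expression $end */
      yyvalue->TOK_expression = yyvsp[-1].TOK_expression;
      return ACCEPTED;

    case 2: /* $accept: YY_INPUT input $end */
      return ACCEPTED;

    default:
      return CONTINUE;
    }
}
```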
I have tried several ways to deal with termination, and this one
appears to me to be the best. It is also the most natural.
* src/scan-code.h, src/scan-code.l (obstack_for_actions): New.
* src/reader.c (grammar_rule_check_and_complete): Generate the actions
of the rules for each start symbol.
* data/skeletons/bison.m4 (b4_symbol_slot): New, with safer semantics
than type and type_tag.
* data/skeletons/yacc.c (b4_accept): New.
Generates the body of the action of the start rules.
(_b4_declare_sub_yyparse): For each start symbol define a dedicated
return type for its parsing function.
Adjust the declaration of its parsing function.
(_b4_define_sub_yyparse): Adjust the definition of the function.
* examples/c/lexcalc/parse.y: Check the case of valueless symbols.
* examples/c/lexcalc/lexcalc.test: Check start symbols.
This directory contains data needed by Bison.
Directory Content
Skeletons
Bison skeletons: the general shapes of the different parser kinds, that are specialized for specific grammars by the bison program.
Currently, the supported skeletons are:
- yacc.c: It used to be named bison.simple; it corresponds to C Yacc compatible LALR(1) parsers.
- lalr1.cc: Produces a C++ parser class.
- lalr1.java: Produces a Java parser class.
- glr.c: A Generalized LR C parser based on Bison's LALR(1) tables.
- glr.cc: A Generalized LR C++ parser. Actually a C++ wrapper around glr.c.
These skeletons are the only ones supported by the Bison team. Because the interface between skeletons and the bison program is not finished, we are not bound to it. In particular, Bison is not mature enough for us to consider that "foreign skeletons" are supported.
m4sugar
This directory contains M4sugar, sort of an extended library for M4, which is used by Bison to instantiate the skeletons.
xslt
This directory contains XSLT programs that transform Bison's XML output into various formats.
- bison.xsl: A library of routines used by the other XSLT programs.
- xml2dot.xsl: Conversion into GraphViz's dot format.
- xml2text.xsl: Conversion into text.
- xml2xhtml.xsl: Conversion into XHTML.
Implementation Notes About the Skeletons
"Skeleton" in Bison parlance means "backend": a skeleton is fed by the bison executable with LR tables, facts about the symbols, etc., and generates the output (say parser.cc, parser.hh, location.hh, etc.). Skeletons are only in charge of generating the parser and its auxiliary files; they do not generate the XML output, the parser.output reports, or the graphical rendering.
The bits of information passed from bison to the backend are named
"muscles". Muscles are passed to M4 via its standard input, as a set of
m4 definitions. To see them, use --trace=muscles.
Except for muscles, whose names are generated by bison, the skeletons have no constraint at all on the macro names. There is no technical or theoretical limitation: as long as you generate the output, you can do what you want. However, it would be a bad idea if, say, the C and C++ skeletons used different approaches and had completely different implementations: that would be a maintenance nightmare.
Below, we document some of the macros that we use in several of the skeletons. If you are to write a new skeleton, please, implement them for your language. Overall, be sure to follow the same patterns as the existing skeletons.
Symbols
b4_symbol(NUM, FIELD)
In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), bison.exe defines one macro per (token,
field), where field can be has_id, id, etc.: see
prepare_symbols_definitions() in src/output.c.
The macro b4_symbol(NUM, FIELD) gives access to the following FIELDS:
- has_id: 0 or 1
  Whether the symbol has an id.
- id: string (e.g., exp, NUM, or TOK_NUM with api.token.prefix)
  If has_id, the name of the token kind (prefixed by api.token.prefix if defined), otherwise empty. Guaranteed to be usable as a C identifier. This is used to define the token kind (i.e., the enum used by the return value of yylex). Should be named token_kind.
- tag: string
  A human readable representation of the symbol. Can be 'foo', 'foo.id', '"foo"', etc.
- code: integer
  The token code associated to the token kind id. The external number as used by yylex. Can be the ASCII code when a character, some number chosen by bison, or some user number in the case of %token FOO <NUM>. Corresponds to yychar in yacc.c.
- is_token: 0 or 1
  Whether this is a terminal symbol.
- kind_base: string (e.g., YYSYMBOL_exp, YYSYMBOL_NUM)
  The base of the symbol kind, i.e., the enumerator of this symbol (token or nonterminal) which is mapped to its number.
- kind: string
  Same as kind_base, but possibly with a prefix in some languages. E.g., EOF's kind_base and kind are YYSYMBOL_YYEOF in C, but are S_YYEOF and symbol_kind::S_YYEOF in C++.
- number: integer
  The code associated to the kind. The internal number (computed from the external number by yytranslate). Corresponds to yytoken in yacc.c. This is the same number that serves as key in b4_symbol(NUM, FIELD).

  In bison, symbols are first assigned increasing numbers in order of appearance (but tokens first, then nterms). After grammar reduction, unused nterms are then renumbered to appear last (i.e., first tokens, then used nterms and finally unused nterms). This final number NUM is the one contained in this field, and it is the one used as key in b4_symbol(NUM, FIELD).

  The code of the rule actions, however, is emitted before we know what symbols are unused, so they use the original numbers. To avoid confusion, they actually use "orig NUM" instead of just "NUM". bison also emits definitions for b4_symbol(orig NUM, number) that map from original numbers to the new ones. b4_symbol actually resolves orig NUM in the other case, i.e., b4_symbol(orig 42, tag) would return the tag of the symbol whose original number was 42.
- has_type: 0 or 1
  Whether the symbol has a semantic value.
- type_tag: string
  When api.value.type=union, the generated name for the union member: yytype_INT etc. for symbols that have an id (has_id), otherwise yytype_1 etc.
- type: string
  If it has a semantic value, its type tag, or, if variants are used, its type. In the case of api.value.type=union, type is the real type (e.g., int).
- slot: string
  If it has a semantic value, the name of the union member (i.e., bounces to either type_tag or type). It would be better to fix our mess and always use type for the true type of the member, and type_tag for the name of the union member.
- has_printer: 0 or 1
- printer: string
- printer_file: string
- printer_line: integer
- printer_loc: location
  If the symbol has a printer, everything about it.
- has_destructor, destructor, destructor_file, destructor_line, destructor_loc
  Likewise.
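
The renumbering described for the number field (tokens first, then used nterms, then unused nterms) can be illustrated with a toy model. This is a sketch under assumptions, not Bison's implementation: the type toy_symbol, its fields, and the function renumber_symbols are hypothetical names. The array it fills plays the role of the mapping exposed as b4_symbol(orig NUM, number).

```c
#include <stddef.h>

/* Toy description of a symbol; the field names are hypothetical.  */
typedef struct
{
  int is_token;  /* 1 for terminals, 0 for nonterminals.  */
  int is_used;   /* 0 for nonterminals removed by grammar reduction.  */
} toy_symbol;

/* Fill NEW_NUMBER[ORIG] with the final number of the symbol whose
   original number is ORIG: first the tokens, then the used nterms,
   and finally the unused nterms, preserving the relative order
   within each group.  */
void
renumber_symbols (const toy_symbol *syms, size_t nsyms, int *new_number)
{
  int next = 0;
  /* Pass 0: tokens; pass 1: used nterms; pass 2: unused nterms.  */
  for (int pass = 0; pass < 3; ++pass)
    for (size_t i = 0; i < nsyms; ++i)
      {
        int group = syms[i].is_token ? 0 : syms[i].is_used ? 1 : 2;
        if (group == pass)
          new_number[i] = next++;
      }
}
```

With two tokens followed by three nterms of which the second is unused, the original numbers 0, 1, 2, 3, 4 map to 0, 1, 2, 4, 3: only the unused nterm and the used one after it change places.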
b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])
Expansion of $$, $1, $3, etc.
The semantic value from a given VAL.
- VAL: some semantic value storage (typically a union), e.g., yylval.
- SYMBOL-NUM: the symbol number from which we extract the type tag.
- TYPE-TAG: the type tag forced by the user with <TYPE-TAG>.
The result can be used safely: it is put in parens to avoid nasty precedence issues.
b4_lhs_value(SYMBOL-NUM, [TYPE])
Expansion of $$ or $<TYPE>$, for symbol SYMBOL-NUM.
b4_rhs_data(RULE-LENGTH, POS)
The data corresponding to the symbol #POS, where the current rule has
RULE-LENGTH symbols on RHS.
b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])
Expansion of $<TYPE>POS, where the current rule has RULE-LENGTH symbols
on RHS.
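
Both RULE-LENGTH and POS are needed because, in a stack-based skeleton, right-hand side values are addressed relative to the top of the value stack. The sketch below assumes the layout suggested by the start-rule action quoted in the commit message above, where "expression" is symbol 2 of a 3-symbol rule and is read from yyvsp[-1]. The function names rhs_offset and rhs_int_value are hypothetical, and the stack holds plain ints for simplicity.

```c
/* Offset, from the top of the value stack, of the symbol at position
   POS (counted from 1) in a rule whose RHS has RULE_LENGTH symbols.
   The last symbol is at offset 0, the one before it at -1, etc.  */
int
rhs_offset (int rule_length, int pos)
{
  return pos - rule_length;
}

/* Toy accessor: YYVSP points to the top of a stack of int values.  */
int
rhs_int_value (const int *yyvsp, int rule_length, int pos)
{
  return yyvsp[rhs_offset (rule_length, pos)];
}
```

For a rule with three symbols on its RHS, position 2 yields the offset -1, which matches the yyvsp[-1] seen in the generated action.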