* data/c++.m4 (b4_public_types_declare): Now define
symbol_type_base and symbol_type.
(b4_public_types_define): New.
In both cases, the definitions are taken verbatim from lalr1.cc.
* data/lalr1.cc: Adjust.
* data/c++.m4 (b4_semantic_type_declare): New.
Factors and generalizes what was in glr.cc and lalr1.cc.
* data/variant.hh (b4_semantic_type_declare): Redefine it for
variants.
* data/lalr1.cc, data/glr.cc: Use it.
* data/lalr1.cc: here.
There is no good reason to keep it private (and it is convenient
to use it from the scanner for instance). It is already public in
glr.cc.
This is a temporary band-aid until Bison gets proper alignment handling.
We need it on ARM.
* data/lalr1.cc (variant): Declare the buffer as a union to force
the same alignment as "long double".
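The band-aid can be pictured as follows. This is a minimal sketch, not
the code from lalr1.cc: the names variant_sketch, align_me, raw and
usage_alignment are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a variant buffer holding S bytes.
template <std::size_t S>
struct variant_sketch
{
  // Sharing a union with a "long double" member makes the raw bytes at
  // least as strictly aligned as "long double".
  union
  {
    long double align_me;  // never read; only there for its alignment
    char raw[S];           // the actual storage
  } buffer;
};

static_assert (alignof (variant_sketch<1>) >= alignof (long double),
               "buffer is aligned at least like long double");

// Usage: the storage keeps its requested size, only alignment changes.
inline void usage_alignment ()
{
  variant_sketch<32> v;
  assert (sizeof v.buffer.raw == 32);
}
```

A plain "char raw[S]" member alone would only be guaranteed the
alignment of char, which is presumably what caused trouble on ARM.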
* data/lalr1.cc: Comment changes.
* data/yacc.c (yysyntax_error): Rewrite, using a switch as in
lalr1.cc instead of dynamically building the format string.
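The switch-based approach can be sketched as below. The function name
syntax_error_format and the exact message strings are assumptions chosen
for illustration, not the text generated by the skeleton.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: select a complete format string from the number
// of arguments (the unexpected token plus the expected ones), instead
// of concatenating pieces at run time.
inline std::string syntax_error_format (int count)
{
  switch (count)
    {
    case 1:
      return "syntax error, unexpected %s";
    case 2:
      return "syntax error, unexpected %s, expecting %s";
    case 3:
      return "syntax error, unexpected %s, expecting %s or %s";
    default:
      return "syntax error";
    }
}

// Usage: too many (or zero) arguments fall back to the short message.
inline void usage_syntax_error ()
{
  assert (syntax_error_format (0) == "syntax error");
  assert (syntax_error_format (1) == "syntax error, unexpected %s");
}
```

Each case is a complete string literal, which should make every message
translatable as a whole.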
Instead of defining complex lists of tuples to define various properties of
the symbols, we now prefer to define symbols as "structs" in m4: given the
symbol key (its number) and the property name, b4_symbol gives its value.
Use this to handle destructors and printers.
* src/output.c (CODE_PROP): New.
(prepare_symbol_definitions): Use it to define the printer and
destructor related attributes of the symbols.
* data/lalr1.cc (b4_symbol_actions): Rename as...
(b4_symbol_action): this.
Use b4_symbol instead of 6 arguments.
(b4_symbol_printer, b4_symbol_destructor): New.
Use them instead of b4_symbol_actions.
* data/lalr1.cc (b4_tables_map): Move to...
* data/bison.m4: here.
Update the comment for yytable while at it.
(b4_tables_declare, b4_tables_define): New.
* data/lalr1.cc: Use them.
* data/c.m4 (b4_table_define): New.
* data/yacc.c: Use b4_tables_define instead of outputting the tables
by hand.
* tests/regression.at (Web2c Actions): Adjust the expected output,
the order of the tables changed.
The point is to factor the generation of the tables across skeletons.
This is language dependent.
* data/c.m4 (b4_comment_): New.
Should be usable to define how to generate tables independently of
the language.
(b4_c_comment): New.
(b4_comment): Bounce to b4_c_comment.
Now support $2 = [PREFIX] for indentation.
* data/lalr1.cc (b4_table_declare): Don't output a comment if
there is no comment.
Indent it properly when there is one.
Output the ending semicolon.
(b4_table_define): Space changes.
Output the ending semicolon.
(b4_tables_map): New.
Use it twice instead of declaring and defining the (integral)
tables by hand.
* data/lalr1.cc (b4_table_declare): New.
Use it to declare the tables defined with b4_table_define.
(b4_table_define): Declare a third arg to match b4_table_declare
signature.
Move all the comments around invocations of b4_table_define into
the invocations themselves.
Move things around to have the same order for declarations and
definitions.
* data/lalr1.cc (b4_subtract): Move to...
* data/bison.m4: here.
* data/glr.c (b4_rhs_data): Use it.
* data/yacc.c (b4_rhs_value, b4_rhs_location): Use it.
There are two issues to handle: first scanning nested angle bracket pairs
to support types such as std::pair< std::string, std::list<std::string> >.
Another issue is to address idiosyncrasies of C++: do not glue two closing
angle brackets together (otherwise it's operator>>), and avoid blindly
sticking a TYPE to the opening <, as it can result in '<:' which is a
digraph for '['.
* src/scan-gram.l (brace_level): Rename as...
(nesting): this.
(SC_TAG): New.
Implement support for complex tags.
(tag): Accept \n, but not <.
* data/lalr1.cc (b4_symbol_value, b4_symbol_value_template)
(b4_symbol_variant): Leave space around types as parameters.
* examples/variant.yy: Use nested template types and leading ::.
* src/parse-gram.y (TYPE, TYPE_TAG_ANY, TYPE_TAG_NONE, type.opt):
Rename as...
(TAG, TAG_ANY, TAG_NONE, tag.opt): these.
* tests/c++.at: Test parametric types.
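The two issues of the complex tags entry above (nested angle brackets,
and spacing around types) can be sketched as follows. The helpers
scan_tag and template_args are hypothetical names, not functions from
scan-gram.l or lalr1.cc.

```cpp
#include <cassert>
#include <string>

// Return the text between the '<' at s[0] and its matching '>', or an
// empty string if the brackets are unbalanced. Requires s[0] == '<'.
inline std::string scan_tag (const std::string& s)
{
  assert (!s.empty () && s[0] == '<');
  int nesting = 0;
  for (std::string::size_type i = 0; i < s.size (); ++i)
    {
      if (s[i] == '<')
        ++nesting;
      else if (s[i] == '>' && --nesting == 0)
        return s.substr (1, i - 1);
    }
  return "";
}

// Wrap a type as a template argument, leaving a space on each side so
// that neither "<:" nor ">>" can be formed by accident.
inline std::string template_args (const std::string& type)
{
  return "< " + type + " >";
}

// Usage: a leading "::" no longer meets the opening '<'.
inline void usage_tags ()
{
  assert (template_args ("::std::string") == "< ::std::string >");
}
```

Counting the nesting level, rather than stopping at the first '>', is
what lets a tag contain template types.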
Using a template buys us nothing, and makes it uselessly complex to
construct a symbol. Besides, it could not be generalized to other
languages, while make_FOO would work in C/Java etc.
* data/lalr1.cc (b4_symbol_): New.
(b4_symbol): Use it.
(b4_symbol_constructor_declaration_)
(b4_symbol_constructor_definition_): Instead of generating
specializations of an overloaded template function, just generate
several functions whose names are forged from the token names
without the token.prefix.
(b4_symbol_constructor_declarations): Generate them for all the
symbols, not just by class of symbol type, now that instead of
specializing a function template by the token, we generate a
function named after the token.
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations): Remove.
* etc/bench.pl.in: Adjust to this new API.
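The make_FOO scheme introduced by the preceding entry can be sketched
as follows. The namespace sketch_make, the simplified symbol_type, and
the tokens NUMBER and PLUS are assumptions for illustration; the
generated code is more involved.

```cpp
#include <cassert>

namespace sketch_make
{
  // Hypothetical, much simplified symbol: a kind plus an int value.
  struct symbol_type
  {
    enum kind_type { NUMBER, PLUS };
    kind_type kind;
    int value;
  };

  // One plain function per token, named after the token.
  inline symbol_type make_NUMBER (int v)
  {
    symbol_type res = { symbol_type::NUMBER, v };
    return res;
  }

  // A value-less token takes no argument at all.
  inline symbol_type make_PLUS ()
  {
    symbol_type res = { symbol_type::PLUS, 0 };
    return res;
  }

  // Usage, as a scanner might do it.
  inline void usage_make ()
  {
    assert (make_NUMBER (42).value == 42);
  }
}
```

The same naming scheme needs no template machinery, which is why it
could carry over to C or Java.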
Provide a means to add a prefix to the name of the tokens as output in the
generated files. Because of name clashes, it is good to have a
prefix such as TOK_ that protects from names such as EOF, FILE etc.
But it clutters the grammar itself.
* data/bison.m4 (token.prefix): Empty by default.
* data/c.m4 (b4_token_enum, b4_token_define): Use it.
* data/lalr1.cc (b4_symbol): Ditto.
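The effect of token.prefix on the generated names can be pictured with
this sketch. It assumes a prefix of TOK_; the namespace sketch_prefix
and the token numbers are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstdio>  // defines the EOF macro and FILE, the clashing names

namespace sketch_prefix
{
  // The grammar can keep saying EOF and FILE, while the names output in
  // the generated files carry the prefix and no longer clash.
  struct token
  {
    enum yytokentype
    {
      TOK_EOF = 0,
      TOK_FILE = 258
    };
  };

  // Usage: the prefixed names coexist with <cstdio>.
  inline void usage_prefix ()
  {
    FILE* unused = 0;  // FILE is still the type from <cstdio>
    (void) unused;
    assert (token::TOK_EOF == 0);
  }
}
```

Without the prefix, an enumerator named EOF would be rewritten by the
preprocessor as soon as <cstdio> is included.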
This allows the user to get the type of a token returned by
yylex.
* data/lalr1.cc (symbol::token): New.
(yytoknum_): Define when %define lex_symbol, independently of
%debug.
(yytoken_number_): Move into...
(symbol::token): here, since that's the only use.
The other one is YYPRINT which was not officially supported
by lalr1.cc, and anyway it did not work since YYPRINT uses this
array under a different name (yytoknum).
To reach good performance, these functions should be inlined (though this
remains to be measured precisely). To this end they must be available to
the caller.
* data/lalr1.cc (b4_symbol_constructor_definition_): Qualify
location_type with the class name.
Since it will now be output in the header, declare it "inline".
No longer use b4_symbol_constructor_specializations, but
b4_symbol_constructor_definitions in the header.
Don't call it in the *.cc file.
The constructors are called by the make_symbol functions, which a
forthcoming patch will move elsewhere. Hence the interest of putting them
together.
The stack_symbol_type does not need to be moved, it is used only by the
parser.
* data/lalr1.cc: Move symbol_type and symbol_base_type
constructors into...
(b4_symbol_constructor_definitions): here.
Adjust.
Forthcoming changes will make it possible to use yytranslate_
from outside the parser implementation file.
* data/lalr1.cc (b4_yytranslate_definition): New.
Use it.
* data/lalr1.cc (b4_symbol_constructor_specialization_): No need
to refer to the class name to use a type defined by the class for
arguments of member functions.
This patch is debatable: the tradition expects yylex to return an int
which happens to correspond to token_number (which is an enum). This
allows, for instance, returning characters (such as '*' etc.). But this
goes against the stronger typing I am trying to have with the new
lex interface which returns a symbol_type. So in this case, feed
yytranslate_ with a token_type.
* data/lalr1.cc (yytranslate_): When in %define lex-symbol,
expect a token_type.
The union from which the size of the variant is computed used to iterate
over the type of every symbol, with a lot of redundancy. Now iterate over
the lists of symbols having the same type-name.
* data/lalr1.cc (b4_char_sizeof_): New.
(b4_char_sizeof): Use it.
Adjust to be called with a list of numbers instead of a single
number.
Adjust its caller for new-line issues.
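The size computation can be sketched as below, with one member per
distinct type-name instead of one per symbol. The union name
sketch_union, the member names, and the three types are assumptions for
illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: one char array per distinct type-name. The union
// is as large as its largest member, hence large enough for any value.
union sketch_union
{
  char dummy1[sizeof (int)];
  char dummy2[sizeof (double)];
  char dummy3[sizeof (std::string)];
};

// The size to reserve for the variant buffer.
enum { sketch_variant_size = sizeof (sketch_union) };

// Usage: every semantic type fits in the computed size.
inline void usage_sizeof ()
{
  assert (sketch_variant_size >= sizeof (std::string));
}
```

Using char arrays rather than the types themselves avoids placing
classes with constructors directly in the union.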
Symbols may have several string representations, for instance if they
have an alias. What I call its "id" is a string that can be used as
an identifier. May not exist.
Currently the symbols which have the "tag_is_id" flag set are those that
don't have an alias. Look harder for the id.
* src/output.c (is_identifier): Move to...
* src/symtab.c (is_identifier): here.
* src/symtab.h, src/symtab.c (symbol_id_get): New.
* src/output.c (symbol_definitions_output): Use it to define "id"
and "has_id".
Remove the definition of "tag_is_id".
* data/lalr1.cc: Use the "id" and "has_id" wherever "tag" and
"tag_is_id" were used to produce code.
We still use "tag" for documentation.
* data/lalr1.cc (_b4_args, b4_args): New.
Adjust all uses of locations to make them optional.
* tests/c++.at (AT_CHECK_VARIANTS): No longer use the locations.
(AT_CHECK_NAMESPACE): Check the use of locations.
* tests/calc.at (_AT_DATA_CALC_Y): Adjust to be usable with or
without locations with lalr1.cc.
Test these cases.
* tests/output.at: Check lalr1.cc with and without location
support.
* tests/regression.at (_AT_DATA_EXPECT2_Y, _AT_DATA_DANCER_Y):
Don't use locations.
* TODO (lalr1.cc/I18n): Remove.
* data/lalr1.cc (yysyntax_error_): Support the translation of the
error messages, as done in yacc.c.
Stay within the yy* pseudo namespace.
* data/lalr1.cc (b4_lex_symbol_if): New.
(parse): When lex_symbol is defined, expect yylex to return the
complete lookahead.
* etc/bench.pl.in (generate_grammar_list): Extend to support this
yylex interface.
(bench_variant_parser): Exercise it.
* data/lalr1.cc (yytranslate_): Handle the EOF case.
Adjust callers.
No longer expect yychar to be equal to yyeof_, rather, test the
lookahead's (translated) kind.
make_symbol provides a means to construct a full symbol (kind, value,
location) in a single shot. It is meant to be a Symbol constructor,
parameterized by the symbol kind so that overloading would prevent
incorrect kind/value pairs. Unfortunately parameterized constructors do
not work well in C++ (unless the parameter also appears as an argument,
which is not acceptable), hence the use of a function instead of a
constructor.
* data/lalr1.cc (b4_symbol_constructor_declaration_)
(b4_symbol_constructor_declarations)
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations)
(b4_symbol_constructor_definition_)
(b4_symbol_constructor_definitions): New.
Use them where appropriate to generate declaration, declaration of
the specializations, and implementations of the templated
overloaded function "make_symbol".
(variant::variant): Always define a default ctor.
Also provide a copy ctor.
(symbol_base_type, symbol_type): New ctor overloads for value-less
symbols.
(symbol_type): Now public, so that functions such as yylex can use
it.
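The make_symbol design described in the preceding entry can be sketched
as follows. The namespace sketch_tpl, the simplified symbol_type, and
the tokens NUMBER and PLUS are assumptions for illustration; only the
overall shape (a function template specialized per token) follows the
entry.

```cpp
#include <cassert>

namespace sketch_tpl
{
  enum token_type { NUMBER, PLUS };

  // Hypothetical, much simplified symbol: a kind plus an int value.
  struct symbol_type
  {
    token_type kind;
    int value;
  };

  // Explicit template arguments cannot be passed to a constructor, so
  // a function template parameterized by the token is used instead.
  template <token_type K> symbol_type make_symbol ();
  template <token_type K> symbol_type make_symbol (int v);

  // Only the valid kind/value pairs get a specialization.
  template <> inline symbol_type make_symbol<PLUS> ()
  {
    symbol_type res = { PLUS, 0 };
    return res;
  }

  template <> inline symbol_type make_symbol<NUMBER> (int v)
  {
    symbol_type res = { NUMBER, v };
    return res;
  }

  // Usage: the token is a template argument, not a function argument.
  inline void usage_tpl ()
  {
    assert (make_symbol<NUMBER> (42).value == 42);
  }
}
```

In this sketch an incorrect pair such as make_symbol<PLUS> (42) has no
definition, so it would be rejected when linking.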
* src/output.c (type_names_output): Document all the symbols,
including those that don't have a type-name.
(symbol_definitions_output): Define "is_token" and
"has_type_name".
* data/lalr1.cc (b4_type_action_): Skip symbols that have an empty
type-name, now that they are defined too in b4_type_names.