Using template buys us nothing, and makes it uselessly complex to
construct a symbol. Besides, it could not be generalized to other
languages, while make_FOO would work in C/Java etc.
* data/lalr1.cc (b4_symbol_): New.
(b4_symbol): Use it.
(b4_symbol_constructor_declaration_)
(b4_symbol_constructor_definition_): Instead of generating
specializations of an overloaded template function, just generate
several functions whose names are forged from the token names
without the token.prefix.
(b4_symbol_constructor_declarations): Generate them for all the
symbols, not just by class of symbol type, now that instead of
specializing a function template by the token, we generate a
function named after the token.
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations): Remove.
* etc/bench.pl.in: Adjust to this new API.
Provide a means to add a prefix to the name of the tokens as output in the
generated files. Because of name clashes, it is good to have such a
prefix such as TOK_ that protects from names such as EOF, FILE etc.
But it clutters the grammar itself.
* data/bison.m4 (token.prefix): Empty by default.
* data/c.m4 (b4_token_enum, b4_token_define): Use it.
* data/lalr1.cc (b4_symbol): Ditto.
This is allows the user to get the type of a token return by
yylex.
* data/lalr1.cc (symbol::token): New.
(yytoknum_): Define when %define lex_symbol, independently of
%debug.
(yytoken_number_): Move into...
(symbol::token): here, since that's the only use.
The other one is YYPRINT which was not officially supported
by lalr1.cc, and anyway it did not work since YYPRINT uses this
array under a different name (yytoknum).
To reach good performances these functions should be inlined (yet this is
to measure precisely). To this end they must be available to the caller.
* data/lalr1.cc (b4_symbol_constructor_definition_): Qualify
location_type with the class name.
Since will now be output in the header, declare "inline".
No longer use b4_symbol_constructor_specializations, but
b4_symbol_constructor_definitions in the header.
Don't call it in the *.cc file.
The constructors are called by the make_symbol functions, which a
forthcoming patch will move elsewhere. Hence the interest of putting them
together.
The stack_symbol_type does not need to be moved, it is used only by the
parser.
* data/lalr1.cc: Move symbol_type and symbol_base_type
constructors into...
(b4_symbol_constructor_definitions): here.
Adjust.
Forthcoming changes will make it possible to use yytranslate_
from outside the parser implementation file.
* data/lalr1.cc (b4_yytranslate_definition): New.
Use it.
* data/lalr1.cc (b4_symbol_constructor_specialization_): No need
to refer to the class name to use a type defined by the class for
arguments of member functions.
This patch is debatable: the tradition expects yylex to return an int
which happens to correspond to token_number (which is an enum). This
allows for instance to return characters (such as '*' etc.). But this
goes against the stronger typing I am trying to have with the new
lex interface which return a symbol_type. So in this case, feed
yytranslate_ with a token_type.
* data/lalr1.cc (yytranslate_): When in %define lex-symbol,
expect a token_type.
The union used to compute the size of the variant used to iterate over the
type of all the symbols, with a lot of redundancy. Now iterate over the
lists of symbols having the same type-name.
* data/lalr1.cc (b4_char_sizeof_): New.
(b4_char_sizeof): Use it.
Adjust to be called with a list of numbers instead of a single
number.
Adjust its caller for new-line issues.
Symbols may have several string representations, for instance if they
have an alias. What I call its "id" is a string that can be used as
an identifier. May not exist.
Currently the symbols which have the "tag_is_id" flag set are those that
don't have an alias. Look harder for the id.
* src/output.c (is_identifier): Move to...
* src/symtab.c (is_identifier): here.
* src/symtab.h, src/symtab.c (symbol_id_get): New.
* src/output.c (symbol_definitions_output): Use it to define "id"
and "has_id".
Remove the definition of "tag_is_id".
* data/lalr1.cc: Use the "id" and "has_id" whereever "tag" and
"tag_is_id" were used to produce code.
We still use "tag" for documentation.
* data/lalr1.cc (_b4_args, b4_args): New.
Adjust all uses of locations to make them optional.
* tests/c++.at (AT_CHECK_VARIANTS): No longer use the locations.
(AT_CHECK_NAMESPACE): Check the use of locations.
* tests/calc.at (_AT_DATA_CALC_Y): Adjust to be usable with or
without locations with lalr1.cc.
Test these cases.
* tests/output.at: Check lalr1.cc with and without location
support.
* tests/regression.at (_AT_DATA_EXPECT2_Y, _AT_DATA_DANCER_Y):
Don't use locations.
* TODO (lalr1.cc/I18n): Remove.
* data/lalr1.cc (yysyntax_error_): Support the translation of the
error messages, as done in yacc.c.
Stay within the yy* pseudo namespace.
* data/lalr1.cc (b4_lex_symbol_if): New.
(parse): When lex_symbol is defined, expected yylex to return the
complete lookahead.
* etc/bench.pl.in (generate_grammar_list): Extend to support this
yylex interface.
(bench_variant_parser): Exercise it.
* data/lalr1.cc (yytranslate_): Handle the EOF case.
Adjust callers.
No longer expect yychar to be equal to yyeof_, rather, test the
lookahead's (translated) kind.
make_symbol provides a means to construct a full symbol (kind, value,
location) in a single shot. It is meant to be a Symbol constructor,
parameterized by the symbol kind so that overloading would prevent
incorrect kind/value pairs. Unfortunately parameterized constructors do
not work well in C++ (unless the parameter also appears as an argument,
which is not acceptable), hence the use of a function instead of a
constructor.
* data/lalr1.cc (b4_symbol_constructor_declaration_)
(b4_symbol_constructor_declarations)
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations)
(b4_symbol_constructor_definition_)
(b4_symbol_constructor_definitions): New.
Use them where appropriate to generate declaration, declaration of
the specializations, and implementations of the templated
overloaded function "make_symbol".
(variant::variant): Always define a default ctor.
Also provide a copy ctor.
(symbol_base_type, symbol_type): New ctor overloads for value-less
symbols.
(symbol_type): Now public, so that functions such as yylex can use
it.
* src/output.c (type_names_output): Document all the symbols,
including those that don't have a type-name.
(symbol_definitions_output): Define "is_token" and
"has_type_name".
* data/lalr1.cc (b4_type_action_): Skip symbols that have an empty
type-name, now that they are defined too in b4_type_names.
Small speedup (1%) on the list grammar. And makes yytranslate_ available
in non member functions.
* data/lalr1.cc (yytranslate_): Does not need to be a instance
function.
* data/c.m4: b4_comment(TEXT): Don't indent empty lines.
* data/lalr1.cc: Don't indent before rule and symbol actions, as
they can be empty, and anyway this incorrectly indents the first
action.
This is just nicer to read, I observed no speedup.
* data/lalr1.cc (yyeof_, yylast_, yynnts_, yyempty_, yyfinal_)
(yterror_, yyerrcode_, yyntokens_): Define as members of an enum.
(yyuser_token_number_max_, yyundef_token_): Move into...
(yytranslate_): here.
Before we were using tables which lines were the symbols and which
columns were things like number, tag, type-name etc. It is was
difficult to extend: each time a column was added, all the numbers had
to be updated (you asked for colon $2, not for "tag"). Also, it was
hard to filter these tables when only a subset of the symbols (say the
tokens, or the nterms, or the tokens that have and external number
*and* a type-name) was of interest.
Now instead of monolithic tables, we define one macro per cell. For
instance "b4_symbol(0, tag)" is a macro name which contents is
self-decriptive. The macro "b4_symbol" provides easier access to
these cells.
* src/output.c (type_names_output): Remove.
(symbol_numbers_output, symbol_definitions_output): New.
(muscles_output): Call them.
(prepare_symbols): Define b4_symbols_number.
This improves the "list" bench by 2%.
* data/lalr1.cc (variant::build): Add an overloaded version with
an argument.
* tests/c++.at (AT_CHECK_VARIANT): Check it.
* data/lalr1.cc (symbol_base_type, symbol_type): New.
(data_type): Rename as...
(stack_symbol_type): this.
Derive from symbol_base_type.
(yy_symbol_value_print_): Merge into...
(yy_symbol_print_): this.
Rename as...
(yy_print_): this.
(yydestruct_): Rename as...
(yy_destroy_): this.
(b4_symbols_actions, YY_SYMBOL_PRINT): Adjust.
(parser::parse): yyla is now of symbol_type.
Use its type member instead of yytoken.
* data/lalr1.cc (b4_symbol_actions): Bounce $$ and @$ to
yydata.value and yydata.location.
(yy_symbol_value_print_, yy_symbol_print_, yydestruct_)
(YY_SYMBOL_PRINT): Now take semantic value and location as a
single arg.
Adjust all callers.
(yydestruct_): New overload for a stack symbol.
To display rhs symbols before a reduction, we used information about the rule
reduced, which required the tables yyrhs and yyprhs. Now use rely only on the
state stack to get the same information.
* data/lalr1.cc (b4_rhs_data, b4_rhs_state): New.
Use them.
(parser::yyrhs_, parser::yyprhs_): Remove.
(parser::yy_reduce_print_): Use the state stack.
* data/lalr1.cc (b4_lhs_value, b4_lhs_location): Adjust to using
yylhs.
(parse): Replace yyval and yyloc with yylhs.value and
yylhs.location.
After a user action, compute yylhs.state earlier.
(yyerrlab1): Do not play tricks with yylhs.location, rather, use a
fresh error_token.
* data/bison.m4 (b4_copyright): Fix the indentation of the
copyright year paragraph.
Use b4_copyright_years when no years are given.
* data/lalr1.cc, data/lalr1-fusion.cc, data/location.cc
(b4_copyright_years): New.
Use it.
This is needed to prepare a forthcoming patch that fuses the three
stacks into one.
* data/lalr1.cc (parser::yypush_): New.
(parser::yynewstate): Change the semantics: instead of arriving to
this label when value and location have been pushed, but yystate
is to be pushed on the state stack, now the three of them must
have been pushed before. yystate still must be the new state.
This allows to use yypush_ everywhere instead of individual
handling of the stacks.
* data/lalr1.cc (b4_symbol_actions): New, overrides the default C
definition to use references instead of pointers.
(yy_symbol_value_print_, yy_symbol_print_, yydestruct_):
Take the value and location as references.
Adjust callers.
This patch was inspired by work by Michiel De Wilde. But he used Boost
variants which (i) requires Boost on the user side, (ii) is slow, and
(iii) has useless overhead (the parser knows the type of the semantic value
there is no reason to duplicate this information as Boost.Variants do).
This implementation reserves a buffer large enough to store the largest
objects. yy::variant implements this buffer. It was implemented with
Quentin Hocquet.
* src/output.c (type_names_output): New.
(output_skeleton): Invoke it.
* data/c++.m4 (b4_variant_if): New.
(b4_symbol_value): If needed, provide a definition for variants.
* data/lalr1.cc (b4_symbol_value, b4_symbol_action_)
(b4_symbol_variant, _b4_char_sizeof_counter, _b4_char_sizeof_dummy)
(b4_char_sizeof, yy::variant): New.
(parser::parse): If variants are requested, define
parser::union_type, parser::variant, change the definition of
semantic_type, construct $$ before running the user action instead
of performing a default $$ = $1.
* examples/variant.yy: New.
Based on an example by Michiel De Wilde.