Using template buys us nothing, and makes it uselessly complex to
construct a symbol. Besides, it could not be generalized to other
languages, while make_FOO would work in C/Java etc.
* data/lalr1.cc (b4_symbol_): New.
(b4_symbol): Use it.
(b4_symbol_constructor_declaration_)
(b4_symbol_constructor_definition_): Instead of generating
specializations of an overloaded template function, just generate
several functions whose names are forged from the token names
without the token.prefix.
(b4_symbol_constructor_declarations): Generate them for all the
symbols, not just by class of symbol type, now that instead of
specializing a function template by the token, we generate a
function named after the token.
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations): Remove.
* etc/bench.pl.in: Adjust to this new API.
Provide a means to add a prefix to the name of the tokens as output in the
generated files. Because of name clashes, it is good to have such a
prefix such as TOK_ that protects from names such as EOF, FILE etc.
But it clutters the grammar itself.
* data/bison.m4 (token.prefix): Empty by default.
* data/c.m4 (b4_token_enum, b4_token_define): Use it.
* data/lalr1.cc (b4_symbol): Ditto.
This is allows the user to get the type of a token return by
yylex.
* data/lalr1.cc (symbol::token): New.
(yytoknum_): Define when %define lex_symbol, independently of
%debug.
(yytoken_number_): Move into...
(symbol::token): here, since that's the only use.
The other one is YYPRINT which was not officially supported
by lalr1.cc, and anyway it did not work since YYPRINT uses this
array under a different name (yytoknum).
To reach good performances these functions should be inlined (yet this is
to measure precisely). To this end they must be available to the caller.
* data/lalr1.cc (b4_symbol_constructor_definition_): Qualify
location_type with the class name.
Since will now be output in the header, declare "inline".
No longer use b4_symbol_constructor_specializations, but
b4_symbol_constructor_definitions in the header.
Don't call it in the *.cc file.
The constructors are called by the make_symbol functions, which a
forthcoming patch will move elsewhere. Hence the interest of putting them
together.
The stack_symbol_type does not need to be moved, it is used only by the
parser.
* data/lalr1.cc: Move symbol_type and symbol_base_type
constructors into...
(b4_symbol_constructor_definitions): here.
Adjust.
Forthcoming changes will make it possible to use yytranslate_
from outside the parser implementation file.
* data/lalr1.cc (b4_yytranslate_definition): New.
Use it.
* data/lalr1.cc (b4_symbol_constructor_specialization_): No need
to refer to the class name to use a type defined by the class for
arguments of member functions.
This patch is debatable: the tradition expects yylex to return an int
which happens to correspond to token_number (which is an enum). This
allows for instance to return characters (such as '*' etc.). But this
goes against the stronger typing I am trying to have with the new
lex interface which return a symbol_type. So in this case, feed
yytranslate_ with a token_type.
* data/lalr1.cc (yytranslate_): When in %define lex-symbol,
expect a token_type.
The union used to compute the size of the variant used to iterate over the
type of all the symbols, with a lot of redundancy. Now iterate over the
lists of symbols having the same type-name.
* data/lalr1.cc (b4_char_sizeof_): New.
(b4_char_sizeof): Use it.
Adjust to be called with a list of numbers instead of a single
number.
Adjust its caller for new-line issues.
Symbols may have several string representations, for instance if they
have an alias. What I call its "id" is a string that can be used as
an identifier. May not exist.
Currently the symbols which have the "tag_is_id" flag set are those that
don't have an alias. Look harder for the id.
* src/output.c (is_identifier): Move to...
* src/symtab.c (is_identifier): here.
* src/symtab.h, src/symtab.c (symbol_id_get): New.
* src/output.c (symbol_definitions_output): Use it to define "id"
and "has_id".
Remove the definition of "tag_is_id".
* data/lalr1.cc: Use the "id" and "has_id" whereever "tag" and
"tag_is_id" were used to produce code.
We still use "tag" for documentation.
* data/lalr1.cc (_b4_args, b4_args): New.
Adjust all uses of locations to make them optional.
* tests/c++.at (AT_CHECK_VARIANTS): No longer use the locations.
(AT_CHECK_NAMESPACE): Check the use of locations.
* tests/calc.at (_AT_DATA_CALC_Y): Adjust to be usable with or
without locations with lalr1.cc.
Test these cases.
* tests/output.at: Check lalr1.cc with and without location
support.
* tests/regression.at (_AT_DATA_EXPECT2_Y, _AT_DATA_DANCER_Y):
Don't use locations.
* TODO (lalr1.cc/I18n): Remove.
* data/lalr1.cc (yysyntax_error_): Support the translation of the
error messages, as done in yacc.c.
Stay within the yy* pseudo namespace.
* data/lalr1.cc (b4_lex_symbol_if): New.
(parse): When lex_symbol is defined, expected yylex to return the
complete lookahead.
* etc/bench.pl.in (generate_grammar_list): Extend to support this
yylex interface.
(bench_variant_parser): Exercise it.
* data/lalr1.cc (yytranslate_): Handle the EOF case.
Adjust callers.
No longer expect yychar to be equal to yyeof_, rather, test the
lookahead's (translated) kind.
make_symbol provides a means to construct a full symbol (kind, value,
location) in a single shot. It is meant to be a Symbol constructor,
parameterized by the symbol kind so that overloading would prevent
incorrect kind/value pairs. Unfortunately parameterized constructors do
not work well in C++ (unless the parameter also appears as an argument,
which is not acceptable), hence the use of a function instead of a
constructor.
* data/lalr1.cc (b4_symbol_constructor_declaration_)
(b4_symbol_constructor_declarations)
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations)
(b4_symbol_constructor_definition_)
(b4_symbol_constructor_definitions): New.
Use them where appropriate to generate declaration, declaration of
the specializations, and implementations of the templated
overloaded function "make_symbol".
(variant::variant): Always define a default ctor.
Also provide a copy ctor.
(symbol_base_type, symbol_type): New ctor overloads for value-less
symbols.
(symbol_type): Now public, so that functions such as yylex can use
it.
Test 214 was failing: it greps with a pattern containing [ ]* which
obviously meant to catch spaces and tabs, but contained only tabs.
Tabulations in sources are a nuisance, so to simplify the matter, get rid
of all the tabulations in the Java sources. The other skeletons will be
treated equally later.
* data/java.m4, data/lalr1.java: Untabify.
* tests/java.at: Simplify AT_CHECK_JAVA_GREP invocations:
tabulations are no longer generated.
* NEWS: Document them.
General Java skeleton improvements.
* configure.ac (gt_JAVACOMP): Request target of 1.4, which allows
using gcj < 4.3 in the testsuite, according to comments in
gnulib/m4/javacomp.m4.
* data/java.m4 (stype, parser_class_name, lex_throws, throws,
location_type, position_type): Remove extraneous brackets from
b4_percent_define_default.
(b4_lex_param, b4_parse_param): Remove extraneous brackets from
m4_define and m4_define_default.
* data/lalr1.java (b4_pre_prologue): Change to b4_user_post_prologue,
which marks the end of user code with appropriate syncline, like all
the other skeletons.
(b4_user_post_prologue): Add. Don't silently drop.
(yylex): Remove.
(parse): Inline yylex.
* doc/bison.texinfo (bisonVersion, bisonSkeleton): Document.
(%{...%}): Fix typo of %code imports.
* tests/java.at (AT_JAVA_COMPILE): Add "java" keyword.
Support annotations on parser class with %define annotations.
* data/lalr1.java (annotations): Add to parser class modifier.
* doc/bison.texinfo (Java Parser Interface): Document
%define annotations.
(Java Declarations Summary): Document %define annotations.
* tests/java.at (Java parser class modifiers): Test annotations.
Do not generate code for %error-verbose unless requested.
* data/lalr1.java (errorVerbose): Rename to yyErrorVerbose.
Make private. Make conditional on %error-verbose.
(getErrorVerbose, setErrorVerbose): New.
(yytnamerr_): Make conditional on %error-verbose.
(yysyntax_error): Make some code conditional on %error-verbose.
* doc/bison.texinfo (Java Bison Interface): Remove the parts
about %error-verbose having no effect.
(getErrorVerbose, setErrorVerbose): Document.
Move constants for token names to Lexer interface.
* data/lalr1.java (Lexer): Move EOF, b4_token_enums(b4_tokens) here.
* data/java.m4 (b4_token_enum): Indent for move to Lexer interface.
(parse): Qualify EOF to Lexer.EOF.
* doc/bison.texinfo (Java Parser Interface): Move documentation of
EOF and token names to Java Lexer Interface.
* tests/java.at (_AT_DATA_JAVA_CALC_Y): Remove Calc qualifier.
Make yyerror public.
* data/lalr1.java (Lexer.yyerror): Use longer parameter name.
(yyerror): Change to public. Add Javadoc comments. Use longer
parameter names. Make the body rather than the declarator
conditional on %locations.
* doc/bison.texinfo (yyerror): Document. Don't mark as protected.
Allow user to add code to the constructor with %code init.
* data/java.m4 (b4_init_throws): New, for %define init_throws.
* data/lalr1.java (YYParser.YYParser): Add b4_init_throws.
Add %code init to the front of the constructor body.
* doc/bison.texinfo (YYParser.YYParser): Document %code init
and %define init_throws.
(Java Declarations Summary): Document %code init and
%define init_throws.
* tests/java.at (Java %parse-param and %lex-param): Adjust grep.
(Java constructor init and init_throws): Add tests.
* src/output.c (type_names_output): Document all the symbols,
including those that don't have a type-name.
(symbol_definitions_output): Define "is_token" and
"has_type_name".
* data/lalr1.cc (b4_type_action_): Skip symbols that have an empty
type-name, now that they are defined too in b4_type_names.
Small speedup (1%) on the list grammar. And makes yytranslate_ available
in non member functions.
* data/lalr1.cc (yytranslate_): Does not need to be a instance
function.
* data/c.m4: b4_comment(TEXT): Don't indent empty lines.
* data/lalr1.cc: Don't indent before rule and symbol actions, as
they can be empty, and anyway this incorrectly indents the first
action.
This is just nicer to read, I observed no speedup.
* data/lalr1.cc (yyeof_, yylast_, yynnts_, yyempty_, yyfinal_)
(yterror_, yyerrcode_, yyntokens_): Define as members of an enum.
(yyuser_token_number_max_, yyundef_token_): Move into...
(yytranslate_): here.
Before we were using tables which lines were the symbols and which
columns were things like number, tag, type-name etc. It is was
difficult to extend: each time a column was added, all the numbers had
to be updated (you asked for colon $2, not for "tag"). Also, it was
hard to filter these tables when only a subset of the symbols (say the
tokens, or the nterms, or the tokens that have and external number
*and* a type-name) was of interest.
Now instead of monolithic tables, we define one macro per cell. For
instance "b4_symbol(0, tag)" is a macro name which contents is
self-decriptive. The macro "b4_symbol" provides easier access to
these cells.
* src/output.c (type_names_output): Remove.
(symbol_numbers_output, symbol_definitions_output): New.
(muscles_output): Call them.
(prepare_symbols): Define b4_symbols_number.
This improves the "list" bench by 2%.
* data/lalr1.cc (variant::build): Add an overloaded version with
an argument.
* tests/c++.at (AT_CHECK_VARIANT): Check it.
* data/lalr1.cc (symbol_base_type, symbol_type): New.
(data_type): Rename as...
(stack_symbol_type): this.
Derive from symbol_base_type.
(yy_symbol_value_print_): Merge into...
(yy_symbol_print_): this.
Rename as...
(yy_print_): this.
(yydestruct_): Rename as...
(yy_destroy_): this.
(b4_symbols_actions, YY_SYMBOL_PRINT): Adjust.
(parser::parse): yyla is now of symbol_type.
Use its type member instead of yytoken.
* data/lalr1.cc (b4_symbol_actions): Bounce $$ and @$ to
yydata.value and yydata.location.
(yy_symbol_value_print_, yy_symbol_print_, yydestruct_)
(YY_SYMBOL_PRINT): Now take semantic value and location as a
single arg.
Adjust all callers.
(yydestruct_): New overload for a stack symbol.
To display rhs symbols before a reduction, we used information about the rule
reduced, which required the tables yyrhs and yyprhs. Now use rely only on the
state stack to get the same information.
* data/lalr1.cc (b4_rhs_data, b4_rhs_state): New.
Use them.
(parser::yyrhs_, parser::yyprhs_): Remove.
(parser::yy_reduce_print_): Use the state stack.
* data/lalr1.cc (b4_lhs_value, b4_lhs_location): Adjust to using
yylhs.
(parse): Replace yyval and yyloc with yylhs.value and
yylhs.location.
After a user action, compute yylhs.state earlier.
(yyerrlab1): Do not play tricks with yylhs.location, rather, use a
fresh error_token.
* data/lalr1-fusion.cc (yydestruct_): Invoke the variant's
destructor.
Display the value only if yymsg is nonnull.
(yyreduce): Invoke yydestruct_ when popping lhs symbols.
This is used to help the user catch cases where some value gets
ovewritten by a new one. This should not happen, as this will
probably leak.
Unfortunately this uncovered a bug in the C++ parser itself: the
lookahead value was not destroyed between two calls to yylex. For
instance if the previous lookahead was a std::string, and then an int,
then the value of the std::string was correctly taken (i.e., the
lookahead was now an empty string), but std::string structure itself
was not reclaimed.
This is now done in variant::build(other&) (which is used to take the
value of the lookahead): other is not only stolen from its value, it
is also destroyed. This incurs a new performance penalty of a few
percent, and union becomes faster again.
* data/lalr1-fusion.cc (variant::build(other&)): Destroy other.
(b4_variant_if): New.
(variant::built): New.
Use it whereever the status of the variant changes.
* etc/bench.pl.in: Check the penalty of %define assert.