diff --git a/NEWS b/NEWS
index 8e5b2da2..83163f56 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,20 @@ GNU Bison NEWS
   The C++ deterministic skeleton (lalr1.cc) now supports LAC, via the
   %define variable parse.lac.
 
+*** Variable api.token.raw: Optimized token numbers (all skeletons)
+
+  In the generated parsers, tokens have two numbers: the "external" token
+  number as returned by yylex (which starts at 257), and the "internal"
+  symbol number (which starts at 3). Each time yylex is called, a table
+  lookup maps the external token number to the internal symbol number.
+
+  When the %define variable api.token.raw is set, tokens are assigned their
+  internal number, which saves one table lookup per token, and also saves
+  the generation of the mapping table.
+
+  The gain is typically moderate, but in extreme cases (very simple user
+  actions), a 10% improvement can be observed.
+
 *** Debug traces in Java
 
   The Java backend no longer emits code and data for parser tracing if the
diff --git a/TODO b/TODO
index 0ddd6729..f0ec27da 100644
--- a/TODO
+++ b/TODO
@@ -73,7 +73,11 @@ syntax error, unexpected $end, expecting ↦ or 🎅🐃 or '\n'
 
 
 While at it, we should stop using "$end" by default, in favor of "end of
-file", or "end of input", whatever.
+file", or "end of input", whatever. See how lalr1.java does that.
+
+** api.token.raw
+Maybe we should exhibit the YYUNDEFTOK token. It could also be assigned a
+semantic value so that yyerror could be used to report invalid lexemes.
 
 * Bison 3.6
 ** Unit rules
diff --git a/doc/bison.texi b/doc/bison.texi
index 9b6981d3..5a171639 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -6212,6 +6212,42 @@ introduced in Bison 3.0
 @c api.token.prefix
 
 
+@c ================================================== api.token.raw
+@deffn Directive {%define api.token.raw}
+
+@itemize @bullet
+@item Language(s):
+all
+
+@item Purpose:
+The output files normally define the tokens with Yacc-compatible token
+numbers: sequential numbers starting at 257 except for single character
+tokens which stand for themselves (e.g., in ASCII, @samp{'a'} is numbered
+97). The parser however uses symbol numbers assigned sequentially starting
+at 3. Therefore each time the scanner returns an (external) token number,
+it must be mapped to the (internal) symbol number.
+
+When @code{api.token.raw} is set, tokens are assigned their internal number,
+which saves one table lookup per token to map them from the external to the
+internal number, and also saves the generation of the mapping table. The
+gain is typically moderate, but in extreme cases (very simple user actions),
+a 10% improvement can be observed.
+
+When @code{api.token.raw} is set, the grammar cannot use character literals
+(such as @samp{'a'}).
+
+@item Accepted Values: Boolean.
+
+@item Default Value:
+@code{false}
+@item History:
+introduced in Bison 3.5. Was initially introduced in Bison 1.25 as
+@samp{%raw}, but never worked and was removed in Bison 1.29.
+@end itemize
+@end deffn
+@c api.token.raw
+
+
 @c ================================================== api.value.automove
 @deffn Directive {%define api.value.automove}
 
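
Reviewer note: the external/internal distinction that the NEWS and manual entries describe can be sketched in a few lines of C. This is a simplified model and not the generated parser code: the token names, the two-entry table, and the functions translate and translate_raw are all hypothetical, and it assumes symbol numbers 0 to 2 are reserved for the end, error, and undefined symbols, which is why user tokens start at 3.

```c
#include <assert.h>

/* Hypothetical internal symbol numbers.  0-2 are assumed reserved,
   so the first user token gets 3, as stated in the documentation.  */
enum symbol { SYM_EOF = 0, SYM_ERROR = 1, SYM_UNDEF = 2, SYM_NUM = 3, SYM_ID = 4 };

/* Hypothetical Yacc-compatible external token numbers, starting at 257.  */
enum external_token { TOK_NUM = 257, TOK_ID = 258 };

/* Default mode: every token returned by the scanner costs one table
   lookup to turn the external number into the internal one.  */
int translate(int external)
{
  static const int table[] = { SYM_NUM, SYM_ID };
  if (external == 0)
    return SYM_EOF;
  if (TOK_NUM <= external && external <= TOK_ID)
    return table[external - TOK_NUM];
  return SYM_UNDEF;   /* anything unknown maps to the undefined symbol */
}

/* Raw mode: the scanner already returns the internal number, so no
   table is needed and the mapping is the identity.  */
int translate_raw(int token)
{
  return token;
}

/* Self-check: both modes must agree on the resulting symbol numbers.  */
void check_modes_agree(void)
{
  assert(translate(TOK_NUM) == translate_raw(SYM_NUM));
  assert(translate(TOK_ID) == translate_raw(SYM_ID));
}
```

In this model the saving from api.token.raw is the removed table and the removed lookup in translate. It also suggests why character literals are excluded: a literal such as 'a' stands for its own character code as an external number, which cannot double as a small sequential symbol number.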