* doc/bison.texinfo (Location Tracking Calc): New node.

2026-03-18 00:33:03 +00:00 · 2001-08-29 12:16:04 +00:00
parent 870f12c270
commit db433e9db8
8 changed files with 765 additions and 355 deletions
--- a/doc/bison.info-2
+++ b/doc/bison.info-2
@@ -494,7 +494,7 @@ Precedence.
     9


-File: bison.info,  Node: Simple Error Recovery,  Next: Multi-function Calc,  Prev: Infix Calc,  Up: Examples
+File: bison.info,  Node: Simple Error Recovery,  Next: Location Tracking Calc,  Prev: Infix Calc,  Up: Examples

 Simple Error Recovery
 =====================
@@ -533,7 +533,192 @@ the current line of input.  We won't discuss this issue further because
 it is not specific to Bison programs.


-File: bison.info,  Node: Multi-function Calc,  Next: Exercises,  Prev: Simple Error Recovery,  Up: Examples
+File: bison.info,  Node: Location Tracking Calc,  Next: Multi-function Calc,  Prev: Simple Error Recovery,  Up: Examples
+
+Location Tracking Calculator: `ltcalc'
+======================================
+
+   This example extends the infix notation calculator with location
+tracking.  This feature will be used to improve error reporting, and
+provide better error messages.
+
+   For the sake of clarity, we will switch for this example to an
+integer calculator, since most of the work needed to use locations will
+be done in the lexical analyser.
+
+* Menu:
+
+* Decls: Ltcalc Decls.  Bison and C declarations for ltcalc.
+* Rules: Ltcalc Rules.  Grammar rules for ltcalc, with explanations.
+* Lexer: Ltcalc Lexer.  The lexical analyzer.
+
+
+File: bison.info,  Node: Ltcalc Decls,  Next: Ltcalc Rules,  Up: Location Tracking Calc
+
+Declarations for `ltcalc'
+-------------------------
+
+   The C and Bison declarations for the location tracking calculator
+are the same as the declarations for the infix notation calculator.
+
+     /* Location tracking calculator.  */
+     
+     %{
+     #define YYSTYPE int
+     #include <math.h>
+     %}
+     
+     /* Bison declarations.  */
+     %token NUM
+     
+     %left '-' '+'
+     %left '*' '/'
+     %left NEG
+     %right '^'
+     
+     %% /* Grammar follows */
+
+   In the code above, there are no declarations specific to locations.
+Defining a data type for storing locations is not needed: we will use
+the type provided by default (*note Data Types of Locations: Location
+Type.), which is a four member structure with the following integer
+fields: `first_line', `first_column', `last_line' and `last_column'.
+
+
+File: bison.info,  Node: Ltcalc Rules,  Next: Ltcalc Lexer,  Prev: Ltcalc Decls,  Up: Location Tracking Calc
+
+Grammar Rules for `ltcalc'
+--------------------------
+
+   Whether you choose to handle locations or not has no effect on the
+syntax of your language.  Therefore, grammar rules for this example
+will be very close to those of the previous example: we will only
+modify them to benefit from the new informations we will have.
+
+   Here, we will use locations to report divisions by zero, and locate
+the wrong expressions or subexpressions.
+
+     input   : /* empty */
+             | input line
+     ;
+     
+     line    : '\n'
+             | exp '\n' { printf ("%d\n", $1); }
+     ;
+     
+     exp     : NUM           { $$ = $1; }
+             | exp '+' exp   { $$ = $1 + $3; }
+             | exp '-' exp   { $$ = $1 - $3; }
+             | exp '*' exp   { $$ = $1 * $3; }
+             | exp '/' exp
+                 {
+                   if ($3)
+                     $$ = $1 / $3;
+                   else
+                     {
+                       $$ = 1;
+                       printf("Division by zero, l%d,c%d-l%d,c%d",
+                              @3.first_line, @3.first_column,
+                              @3.last_line, @3.last_column);
+                     }
+                 }
+             | '-' exp %preg NEG     { $$ = -$2; }
+             | exp '^' exp           { $$ = pow ($1, $3); }
+             | '(' exp ')'           { $$ = $2; }
+
+   This code shows how to reach locations inside of semantic actions, by
+using the pseudo-variables `@N' for rule components, and the
+pseudo-variable `@$' for groupings.
+
+   In this example, we never assign a value to `@$', because the output
+parser can do this automatically.  By default, before executing the C
+code of each action, `@$' is set to range from the beginning of `@1' to
+the end of `@N', for a rule with N components.
+
+   Of course, this behavior can be redefined (*note Default Action for
+Locations: Location Default Action.), and for very specific rules, `@$'
+can be computed by hand.
+
+
+File: bison.info,  Node: Ltcalc Lexer,  Prev: Ltcalc Rules,  Up: Location Tracking Calc
+
+The `ltcalc' Lexical Analyzer.
+------------------------------
+
+   Until now, we relied on Bison's defaults to enable location
+tracking. The next step is to rewrite the lexical analyser, and make it
+able to feed the parser with locations of tokens, as he already does
+for semantic values.
+
+   To do so, we must take into account every single character of the
+input text, to avoid the computed locations of being fuzzy or wrong:
+
+     int
+     yylex (void)
+     {
+       int c;
+     
+       /* skip white space */
+       while ((c = getchar ()) == ' ' || c == '\t')
+         ++yylloc.last_column;
+     
+       /* step */
+       yylloc.first_line = yylloc.last_line;
+       yylloc.first_column = yylloc.last_column;
+     
+       /* process numbers */
+       if (isdigit (c))
+         {
+           yylval = c - '0';
+           ++yylloc.last_column;
+           while (isdigit (c = getchar ()))
+             {
+               ++yylloc.last_column;
+               yylval = yylval * 10 + c - '0';
+             }
+           ungetc (c, stdin);
+           return NUM;
+         }
+     
+       /* return end-of-file */
+       if (c == EOF)
+         return 0;
+     
+       /* return single chars and update location */
+       if (c == '\n')
+         {
+           ++yylloc.last_line;
+           yylloc.last_column = 0;
+         }
+       else
+         ++yylloc.last_column;
+       return c;
+     }
+
+   Basically, the lexical analyzer does the same processing as before:
+it skips blanks and tabs, and reads numbers or single-character tokens.
+In addition to this, it updates the `yylloc' global variable (of type
+`YYLTYPE'), where the location of tokens is stored.
+
+   Now, each time this function returns a token, the parser has it's
+number as well as it's semantic value, and it's position in the text.
+The last needed change is to initialize `yylloc', for example in the
+controlling function:
+
+     int
+     main (void)
+     {
+       yylloc.first_line = yylloc.last_line = 1;
+       yylloc.first_column = yylloc.last_column = 0;
+       return yyparse ();
+     }
+
+   Remember that computing locations is not a matter of syntax.  Every
+character must be associated to a location update, whether it is in
+valid input, in comments, in literal strings, and so on...
+
+
+File: bison.info,  Node: Multi-function Calc,  Next: Exercises,  Prev: Location Tracking Calc,  Up: Examples

 Multi-Function Calculator: `mfcalc'
 ===================================
@@ -1240,147 +1425,3 @@ associated with X and Y.
                      This says when, why and how to use the exceptional
                        action in the middle of a rule.

-
-File: bison.info,  Node: Value Type,  Next: Multiple Types,  Up: Semantics
-
-Data Types of Semantic Values
-----------------------------
-
-   In a simple program it may be sufficient to use the same data type
-for the semantic values of all language constructs.  This was true in
-the RPN and infix calculator examples (*note Reverse Polish Notation
-Calculator: RPN Calc.).
-
-   Bison's default is to use type `int' for all semantic values.  To
-specify some other type, define `YYSTYPE' as a macro, like this:
-
-     #define YYSTYPE double
-
-This macro definition must go in the C declarations section of the
-grammar file (*note Outline of a Bison Grammar: Grammar Outline.).
-
-
-File: bison.info,  Node: Multiple Types,  Next: Actions,  Prev: Value Type,  Up: Semantics
-
-More Than One Value Type
------------------------
-
-   In most programs, you will need different data types for different
-kinds of tokens and groupings.  For example, a numeric constant may
-need type `int' or `long', while a string constant needs type `char *',
-and an identifier might need a pointer to an entry in the symbol table.
-
-   To use more than one data type for semantic values in one parser,
-Bison requires you to do two things:
-
-   * Specify the entire collection of possible data types, with the
-     `%union' Bison declaration (*note The Collection of Value Types:
-     Union Decl.).
-
-   * Choose one of those types for each symbol (terminal or
-     nonterminal) for which semantic values are used.  This is done for
-     tokens with the `%token' Bison declaration (*note Token Type
-     Names: Token Decl.)  and for groupings with the `%type' Bison
-     declaration (*note Nonterminal Symbols: Type Decl.).
-
-
-File: bison.info,  Node: Actions,  Next: Action Types,  Prev: Multiple Types,  Up: Semantics
-
-Actions
-------
-
-   An action accompanies a syntactic rule and contains C code to be
-executed each time an instance of that rule is recognized.  The task of
-most actions is to compute a semantic value for the grouping built by
-the rule from the semantic values associated with tokens or smaller
-groupings.
-
-   An action consists of C statements surrounded by braces, much like a
-compound statement in C.  It can be placed at any position in the rule;
-it is executed at that position.  Most rules have just one action at
-the end of the rule, following all the components.  Actions in the
-middle of a rule are tricky and used only for special purposes (*note
-Actions in Mid-Rule: Mid-Rule Actions.).
-
-   The C code in an action can refer to the semantic values of the
-components matched by the rule with the construct `$N', which stands for
-the value of the Nth component.  The semantic value for the grouping
-being constructed is `$$'.  (Bison translates both of these constructs
-into array element references when it copies the actions into the parser
-file.)
-
-   Here is a typical example:
-
-     exp:    ...
-             | exp '+' exp
-                 { $$ = $1 + $3; }
-
-This rule constructs an `exp' from two smaller `exp' groupings
-connected by a plus-sign token.  In the action, `$1' and `$3' refer to
-the semantic values of the two component `exp' groupings, which are the
-first and third symbols on the right hand side of the rule.  The sum is
-stored into `$$' so that it becomes the semantic value of the
-addition-expression just recognized by the rule.  If there were a
-useful semantic value associated with the `+' token, it could be
-referred to as `$2'.
-
-   If you don't specify an action for a rule, Bison supplies a default:
-`$$ = $1'.  Thus, the value of the first symbol in the rule becomes the
-value of the whole rule.  Of course, the default rule is valid only if
-the two data types match.  There is no meaningful default action for an
-empty rule; every empty rule must have an explicit action unless the
-rule's value does not matter.
-
-   `$N' with N zero or negative is allowed for reference to tokens and
-groupings on the stack _before_ those that match the current rule.
-This is a very risky practice, and to use it reliably you must be
-certain of the context in which the rule is applied.  Here is a case in
-which you can use this reliably:
-
-     foo:      expr bar '+' expr  { ... }
-             | expr bar '-' expr  { ... }
-             ;
-     
-     bar:      /* empty */
-             { previous_expr = $0; }
-             ;
-
-   As long as `bar' is used only in the fashion shown here, `$0' always
-refers to the `expr' which precedes `bar' in the definition of `foo'.
-
-
-File: bison.info,  Node: Action Types,  Next: Mid-Rule Actions,  Prev: Actions,  Up: Semantics
-
-Data Types of Values in Actions
-------------------------------
-
-   If you have chosen a single data type for semantic values, the `$$'
-and `$N' constructs always have that data type.
-
-   If you have used `%union' to specify a variety of data types, then
-you must declare a choice among these types for each terminal or
-nonterminal symbol that can have a semantic value.  Then each time you
-use `$$' or `$N', its data type is determined by which symbol it refers
-to in the rule.  In this example,
-
-     exp:    ...
-             | exp '+' exp
-                 { $$ = $1 + $3; }
-
-`$1' and `$3' refer to instances of `exp', so they all have the data
-type declared for the nonterminal symbol `exp'.  If `$2' were used, it
-would have the data type declared for the terminal symbol `'+'',
-whatever that might be.
-
-   Alternatively, you can specify the data type when you refer to the
-value, by inserting `<TYPE>' after the `$' at the beginning of the
-reference.  For example, if you have defined types as shown here:
-
-     %union {
-       int itype;
-       double dtype;
-     }
-
-then you can write `$<itype>1' to refer to the first subunit of the
-rule as an integer, or `$<dtype>1' to refer to it as a double.
-