parser: keep string aliases as the user wrote them

Currently our scanner decodes all the escapes in the strings, and we
later reescape the strings when we emit them.
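
As a rough illustration of that round trip, here is a minimal Python
sketch (not Bison's actual C code).  The helper name `reescape` is
hypothetical, and its quoting rule (octal for every byte outside
printable ASCII) is an assumption chosen to match the reports quoted in
this message.

```python
def reescape(data: bytes) -> str:
    """Quote bytes as a C string literal.

    Hypothetical stand-in for the emit step: every byte outside
    printable ASCII is written as a three-digit octal escape.
    """
    simple = {ord('"'): '\\"', ord("\\"): "\\\\"}
    out = []
    for byte in data:
        if byte in simple:
            out.append(simple[byte])
        elif 0x20 <= byte < 0x7F:
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'


# A UTF-8 alias contains no escapes, so decoding leaves its bytes
# untouched; the damage happens when those bytes are quoted again.
print(reescape("⊕".encode("utf-8")))       # "\342\212\225"
print(reescape("number".encode("utf-8")))  # "number"
```

ASCII-only aliases survive the round trip unchanged, which is why the
problem only shows up with non-ASCII input.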

This is troublesome, as it does not respect the user's input.  For
instance, when the user writes in UTF-8, her string comes back as a
sequence of octal escapes.  And this shows everywhere: the reports
display the escaped string instead of the actual alias:

    0 $accept: . exp $end
    1 exp: . exp "\342\212\225" exp
    2    | . exp "+" exp
    3    | . exp "+" exp
    4    | . "number"
    5    | . "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"

    "number"                                                    shift, and go to state 1
    "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"  shift, and go to state 2

This commit preserves the user's exact spelling of the string aliases,
instead of interpreting the escapes and then reescaping.  The report
now shows:

    0 $accept: . exp $end
    1 exp: . exp "⊕" exp
    2    | . exp "+" exp
    3    | . exp "+" exp
    4    | . "number"
    5    | . "Ñùṃéℝô"

    "number"          shift, and go to state 1
    "Ñùṃéℝô"  shift, and go to state 2

Likewise, the XML (and therefore HTML) outputs are fixed.

* src/scan-gram.l (STRING, TSTRING): Do not interpret the escapes in
the resulting string.
* src/parse-gram.y (unquote, parser_init, parser_free, unquote_free)
(handle_defines, handle_language, obstack_for_unquote): New.
Use them to unquote where needed.
* tests/regression.at, tests/report.at: Update.
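
The ChangeLog names an `unquote` helper, but its body is not shown on
this page.  The following Python sketch only illustrates the idea of
keeping the raw spelling and decoding it on demand; the escape set and
the error handling are assumptions, not Bison's implementation.

```python
import re

# Single-character C escapes and the byte each one denotes.
_SIMPLE = {
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D,
    "t": 0x09, "v": 0x0B, "\\": 0x5C, "'": 0x27, '"': 0x22, "?": 0x3F,
}

# One escape: up to three octal digits, \x plus hex digits, or one char.
_ESCAPE = re.compile(r"\\(?:([0-7]{1,3})|x([0-9A-Fa-f]+)|(.?))", re.DOTALL)


def unquote(spelling: str) -> bytes:
    """Decode a double-quoted, C-escaped spelling into raw bytes."""
    if len(spelling) < 2 or spelling[0] != '"' or spelling[-1] != '"':
        raise ValueError("expected a double-quoted string")
    body = spelling[1:-1]
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(body):
        # Copy the literal text that precedes this escape.
        out += body[pos:match.start()].encode("utf-8")
        octal, hexa, simple = match.groups()
        if octal is not None:
            value = int(octal, 8)
        elif hexa is not None:
            value = int(hexa, 16)
        elif simple in _SIMPLE:
            value = _SIMPLE[simple]
        else:
            raise ValueError("invalid escape: " + match.group(0))
        if value > 0xFF:
            raise ValueError("escape out of range: " + match.group(0))
        out.append(value)
        pos = match.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)


print(unquote(r'"\x000081"'))  # b'\x81'
print(unquote('"⊕"'))          # b'\xe2\x8a\x95'
```

With such a helper, two different spellings of the same bytes (for
example `\201` and `\x000081`) still compare equal once decoded, while
diagnostics and reports can print whichever spelling the user chose.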
Akim Demaille
2020-06-13 08:46:58 +02:00
parent 5d5e1df1dc
commit 5855da4722
7 changed files with 266 additions and 129 deletions

@@ -415,7 +415,7 @@ AT_BISON_CHECK([-fcaret -o input.c input.y], [[0]], [[]],
input.y:25.8-14: note: previous declaration
25 | %token SPECIAL "\\\'\?\"\a\b\f\n\r\t\v\001\201\x001\x000081??!"
| ^~~~~~~
input.y:26.16-63: warning: symbol "\\'?\"\a\b\f\n\r\t\v\001\201\001\201??!" used more than once as a literal string [-Wother]
input.y:26.16-63: warning: symbol "\\\'\?\"\a\b\f\n\r\t\v\001\201\x001\x000081??!" used more than once as a literal string [-Wother]
26 | %token SPECIAL "\\\'\?\"\a\b\f\n\r\t\v\001\201\x001\x000081??!"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
]])
@@ -427,7 +427,7 @@ AT_COMPILE([input])
# symbol name reported by the parser is exactly the same as that reported by
# Bison itself.
AT_PARSER_CHECK([input], 1, [],
[[syntax error, unexpected a, expecting ]AT_ERROR_VERBOSE_IF([["\\'?\"\a\b\f\n\r\t\v\001\201\001\201??!"]], [[∃¬∩∪∀]])[
[[syntax error, unexpected a, expecting ]AT_ERROR_VERBOSE_IF([["\\\'\?\"\a\b\f\n\r\t\v\001\201\x001\x000081??!"]], [[∃¬∩∪∀]])[
]])
AT_BISON_OPTION_POPDEFS

@@ -1184,11 +1184,11 @@ Grammar
0 $accept: exp $end
1 exp: exp "\342\212\225" exp
1 exp: exp "⊕" exp
2 | exp "+" exp
3 | exp "+" exp
4 | "number"
5 | "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"
5 | "Ñùṃéℝô"
Terminals, with rules where they appear
@@ -1196,9 +1196,9 @@ Terminals, with rules where they appear
$end (0) 0
error (256)
"+" (258) 2 3
"\342\212\225" (259) 1
"⊕" (259) 1
"number" (260) 4
"\303\221\303\271\341\271\203\303\251\342\204\235\303\264" (261) 5
"Ñùṃéℝô" (261) 5
Nonterminals, with rules where they appear
@@ -1213,14 +1213,14 @@ Nonterminals, with rules where they appear
State 0
0 $accept: . exp $end
1 exp: . exp "\342\212\225" exp
1 exp: . exp "⊕" exp
2 | . exp "+" exp
3 | . exp "+" exp
4 | . "number"
5 | . "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"
5 | . "Ñùṃéℝô"
"number" shift, and go to state 1
"\303\221\303\271\341\271\203\303\251\342\204\235\303\264" shift, and go to state 2
"number" shift, and go to state 1
"Ñùṃéℝô" shift, and go to state 2
exp go to state 3
@@ -1234,7 +1234,7 @@ State 1
State 2
5 exp: "\303\221\303\271\341\271\203\303\251\342\204\235\303\264" .
5 exp: "Ñùṃéℝô" .
$default reduce using rule 5 (exp)
@@ -1242,13 +1242,13 @@ State 2
State 3
0 $accept: exp . $end
1 exp: exp . "\342\212\225" exp
1 exp: exp . "⊕" exp
2 | exp . "+" exp
3 | exp . "+" exp
$end shift, and go to state 4
"+" shift, and go to state 5
"\342\212\225" shift, and go to state 6
$end shift, and go to state 4
"+" shift, and go to state 5
"⊕" shift, and go to state 6
State 4
@@ -1260,69 +1260,69 @@ State 4
State 5
1 exp: . exp "\342\212\225" exp
1 exp: . exp "⊕" exp
2 | . exp "+" exp
2 | exp "+" . exp
3 | . exp "+" exp
3 | exp "+" . exp
4 | . "number"
5 | . "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"
5 | . "Ñùṃéℝô"
"number" shift, and go to state 1
"\303\221\303\271\341\271\203\303\251\342\204\235\303\264" shift, and go to state 2
"number" shift, and go to state 1
"Ñùṃéℝô" shift, and go to state 2
exp go to state 7
State 6
1 exp: . exp "\342\212\225" exp
1 | exp "\342\212\225" . exp
1 exp: . exp "⊕" exp
1 | exp "⊕" . exp
2 | . exp "+" exp
3 | . exp "+" exp
4 | . "number"
5 | . "\303\221\303\271\341\271\203\303\251\342\204\235\303\264"
5 | . "Ñùṃéℝô"
"number" shift, and go to state 1
"\303\221\303\271\341\271\203\303\251\342\204\235\303\264" shift, and go to state 2
"number" shift, and go to state 1
"Ñùṃéℝô" shift, and go to state 2
exp go to state 8
State 7
1 exp: exp . "\342\212\225" exp
1 exp: exp . "⊕" exp
2 | exp . "+" exp
2 | exp "+" exp . [$end, "+", "\342\212\225"]
2 | exp "+" exp . [$end, "+", "⊕"]
3 | exp . "+" exp
3 | exp "+" exp . [$end, "+", "\342\212\225"]
3 | exp "+" exp . [$end, "+", "⊕"]
"\342\212\225" shift, and go to state 6
"⊕" shift, and go to state 6
$end reduce using rule 2 (exp)
$end [reduce using rule 3 (exp)]
"+" reduce using rule 2 (exp)
"+" [reduce using rule 3 (exp)]
"\342\212\225" [reduce using rule 2 (exp)]
"\342\212\225" [reduce using rule 3 (exp)]
$default reduce using rule 2 (exp)
$end reduce using rule 2 (exp)
$end [reduce using rule 3 (exp)]
"+" reduce using rule 2 (exp)
"+" [reduce using rule 3 (exp)]
"⊕" [reduce using rule 2 (exp)]
"⊕" [reduce using rule 3 (exp)]
$default reduce using rule 2 (exp)
Conflict between rule 2 and token "+" resolved as reduce (%left "+").
State 8
1 exp: exp . "\342\212\225" exp
1 | exp "\342\212\225" exp . [$end, "+", "\342\212\225"]
1 exp: exp . "⊕" exp
1 | exp "⊕" exp . [$end, "+", "⊕"]
2 | exp . "+" exp
3 | exp . "+" exp
"+" shift, and go to state 5
"\342\212\225" shift, and go to state 6
"+" shift, and go to state 5
"⊕" shift, and go to state 6
"+" [reduce using rule 1 (exp)]
"\342\212\225" [reduce using rule 1 (exp)]
$default reduce using rule 1 (exp)
"+" [reduce using rule 1 (exp)]
"⊕" [reduce using rule 1 (exp)]
$default reduce using rule 1 (exp)
]])
@@ -1338,43 +1338,43 @@ digraph "input.y"
node [fontname = courier, shape = box, colorscheme = paired6]
edge [fontname = courier]
0 [label="State 0\n\l 0 $accept: . exp $end\l 1 exp: . exp \"\\342\\212\\225\" exp\l 2 | . exp \"+\" exp\l 3 | . exp \"+\" exp\l 4 | . \"number\"\l 5 | . \"\\303\\221\\303\\271\\341\\271\\203\\303\\251\\342\\204\\235\\303\\264\"\l"]
0 [label="State 0\n\l 0 $accept: . exp $end\l 1 exp: . exp \"\342\212\225\" exp\l 2 | . exp \"+\" exp\l 3 | . exp \"+\" exp\l 4 | . \"number\"\l 5 | . \"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\"\l"]
0 -> 1 [style=solid label="\"number\""]
0 -> 2 [style=solid label="\"\\303\\221\\303\\271\\341\\271\\203\\303\\251\\342\\204\\235\\303\\264\""]
0 -> 2 [style=solid label="\"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\""]
0 -> 3 [style=dashed label="exp"]
1 [label="State 1\n\l 4 exp: \"number\" .\l"]
1 -> "1R4" [style=solid]
"1R4" [label="R4", fillcolor=3, shape=diamond, style=filled]
2 [label="State 2\n\l 5 exp: \"\\303\\221\\303\\271\\341\\271\\203\\303\\251\\342\\204\\235\\303\\264\" .\l"]
2 [label="State 2\n\l 5 exp: \"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\" .\l"]
2 -> "2R5" [style=solid]
"2R5" [label="R5", fillcolor=3, shape=diamond, style=filled]
3 [label="State 3\n\l 0 $accept: exp . $end\l 1 exp: exp . \"\\342\\212\\225\" exp\l 2 | exp . \"+\" exp\l 3 | exp . \"+\" exp\l"]
3 [label="State 3\n\l 0 $accept: exp . $end\l 1 exp: exp . \"\342\212\225\" exp\l 2 | exp . \"+\" exp\l 3 | exp . \"+\" exp\l"]
3 -> 4 [style=solid label="$end"]
3 -> 5 [style=solid label="\"+\""]
3 -> 6 [style=solid label="\"\\342\\212\\225\""]
3 -> 6 [style=solid label="\"\342\212\225\""]
4 [label="State 4\n\l 0 $accept: exp $end .\l"]
4 -> "4R0" [style=solid]
"4R0" [label="Acc", fillcolor=1, shape=diamond, style=filled]
5 [label="State 5\n\l 1 exp: . exp \"\\342\\212\\225\" exp\l 2 | . exp \"+\" exp\l 2 | exp \"+\" . exp\l 3 | . exp \"+\" exp\l 3 | exp \"+\" . exp\l 4 | . \"number\"\l 5 | . \"\\303\\221\\303\\271\\341\\271\\203\\303\\251\\342\\204\\235\\303\\264\"\l"]
5 [label="State 5\n\l 1 exp: . exp \"\342\212\225\" exp\l 2 | . exp \"+\" exp\l 2 | exp \"+\" . exp\l 3 | . exp \"+\" exp\l 3 | exp \"+\" . exp\l 4 | . \"number\"\l 5 | . \"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\"\l"]
5 -> 1 [style=solid label="\"number\""]
5 -> 2 [style=solid label="\"\\303\\221\\303\\271\\341\\271\\203\\303\\251\\342\\204\\235\\303\\264\""]
5 -> 2 [style=solid label="\"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\""]
5 -> 7 [style=dashed label="exp"]
6 [label="State 6\n\l 1 exp: . exp \"\\342\\212\\225\" exp\l 1 | exp \"\\342\\212\\225\" . exp\l 2 | . exp \"+\" exp\l 3 | . exp \"+\" exp\l 4 | . \"number\"\l 5 | . \"\\303\\221\\303\\271\\341\\271\\203\\303\\251\\342\\204\\235\\303\\264\"\l"]
6 [label="State 6\n\l 1 exp: . exp \"\342\212\225\" exp\l 1 | exp \"\342\212\225\" . exp\l 2 | . exp \"+\" exp\l 3 | . exp \"+\" exp\l 4 | . \"number\"\l 5 | . \"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\"\l"]
6 -> 1 [style=solid label="\"number\""]
6 -> 2 [style=solid label="\"\\303\\221\\303\\271\\341\\271\\203\\303\\251\\342\\204\\235\\303\\264\""]
6 -> 2 [style=solid label="\"\303\221\303\271\341\271\203\303\251\342\204\235\303\264\""]
6 -> 8 [style=dashed label="exp"]
7 [label="State 7\n\l 1 exp: exp . \"\\342\\212\\225\" exp\l 2 | exp . \"+\" exp\l 2 | exp \"+\" exp . [$end, \"+\", \"\\342\\212\\225\"]\l 3 | exp . \"+\" exp\l 3 | exp \"+\" exp . [$end, \"+\", \"\\342\\212\\225\"]\l"]
7 -> 6 [style=solid label="\"\\342\\212\\225\""]
7 -> "7R2d" [label="[\"\\342\\212\\225\"]", style=solid]
7 [label="State 7\n\l 1 exp: exp . \"\342\212\225\" exp\l 2 | exp . \"+\" exp\l 2 | exp \"+\" exp . [$end, \"+\", \"\342\212\225\"]\l 3 | exp . \"+\" exp\l 3 | exp \"+\" exp . [$end, \"+\", \"\342\212\225\"]\l"]
7 -> 6 [style=solid label="\"\342\212\225\""]
7 -> "7R2d" [label="[\"\342\212\225\"]", style=solid]
"7R2d" [label="R2", fillcolor=5, shape=diamond, style=filled]
7 -> "7R2" [style=solid]
"7R2" [label="R2", fillcolor=3, shape=diamond, style=filled]
7 -> "7R3d" [label="[$end, \"+\", \"\\342\\212\\225\"]", style=solid]
7 -> "7R3d" [label="[$end, \"+\", \"\342\212\225\"]", style=solid]
"7R3d" [label="R3", fillcolor=5, shape=diamond, style=filled]
8 [label="State 8\n\l 1 exp: exp . \"\\342\\212\\225\" exp\l 1 | exp \"\\342\\212\\225\" exp . [$end, \"+\", \"\\342\\212\\225\"]\l 2 | exp . \"+\" exp\l 3 | exp . \"+\" exp\l"]
8 [label="State 8\n\l 1 exp: exp . \"\342\212\225\" exp\l 1 | exp \"\342\212\225\" exp . [$end, \"+\", \"\342\212\225\"]\l 2 | exp . \"+\" exp\l 3 | exp . \"+\" exp\l"]
8 -> 5 [style=solid label="\"+\""]
8 -> 6 [style=solid label="\"\\342\\212\\225\""]
8 -> "8R1d" [label="[\"+\", \"\\342\\212\\225\"]", style=solid]
8 -> 6 [style=solid label="\"\342\212\225\""]
8 -> "8R1d" [label="[\"+\", \"\342\212\225\"]", style=solid]
"8R1d" [label="R1", fillcolor=5, shape=diamond, style=filled]
8 -> "8R1" [style=solid]
"8R1" [label="R1", fillcolor=3, shape=diamond, style=filled]
@@ -1402,7 +1402,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<lhs>exp</lhs>
<rhs>
<symbol>exp</symbol>
<symbol>&quot;\342\212\225&quot;</symbol>
<symbol>&quot;⊕&quot;</symbol>
<symbol>exp</symbol>
</rhs>
</rule>
@@ -1431,7 +1431,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<rule number="5" usefulness="useful">
<lhs>exp</lhs>
<rhs>
<symbol>&quot;\303\221\303\271\341\271\203\303\251\342\204\235\303\264&quot;</symbol>
<symbol>&quot;Ñùṃéℝô&quot;</symbol>
</rhs>
</rule>
</rules>
@@ -1439,9 +1439,9 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<terminal symbol-number="0" token-number="0" name="$end" usefulness="useful"/>
<terminal symbol-number="1" token-number="256" name="error" usefulness="useful"/>
<terminal symbol-number="3" token-number="258" name="&quot;+&quot;" usefulness="useful" prec="1" assoc="left"/>
<terminal symbol-number="4" token-number="259" name="&quot;\342\212\225&quot;" usefulness="useful"/>
<terminal symbol-number="4" token-number="259" name="&quot;⊕&quot;" usefulness="useful"/>
<terminal symbol-number="5" token-number="260" name="&quot;number&quot;" usefulness="useful"/>
<terminal symbol-number="6" token-number="261" name="&quot;\303\221\303\271\341\271\203\303\251\342\204\235\303\264&quot;" usefulness="useful"/>
<terminal symbol-number="6" token-number="261" name="&quot;Ñùṃéℝô&quot;" usefulness="useful"/>
</terminals>
<nonterminals>
<nonterminal symbol-number="7" name="$accept" usefulness="useful"/>
@@ -1463,7 +1463,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<actions>
<transitions>
<transition type="shift" symbol="&quot;number&quot;" state="1"/>
<transition type="shift" symbol="&quot;\303\221\303\271\341\271\203\303\251\342\204\235\303\264&quot;" state="2"/>
<transition type="shift" symbol="&quot;Ñùṃéℝô&quot;" state="2"/>
<transition type="goto" symbol="exp" state="3"/>
</transitions>
<errors/>
@@ -1511,7 +1511,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<transitions>
<transition type="shift" symbol="$end" state="4"/>
<transition type="shift" symbol="&quot;+&quot;" state="5"/>
<transition type="shift" symbol="&quot;\342\212\225&quot;" state="6"/>
<transition type="shift" symbol="&quot;⊕&quot;" state="6"/>
</transitions>
<errors/>
<reductions/>
@@ -1546,7 +1546,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<actions>
<transitions>
<transition type="shift" symbol="&quot;number&quot;" state="1"/>
<transition type="shift" symbol="&quot;\303\221\303\271\341\271\203\303\251\342\204\235\303\264&quot;" state="2"/>
<transition type="shift" symbol="&quot;Ñùṃéℝô&quot;" state="2"/>
<transition type="goto" symbol="exp" state="7"/>
</transitions>
<errors/>
@@ -1567,7 +1567,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<actions>
<transitions>
<transition type="shift" symbol="&quot;number&quot;" state="1"/>
<transition type="shift" symbol="&quot;\303\221\303\271\341\271\203\303\251\342\204\235\303\264&quot;" state="2"/>
<transition type="shift" symbol="&quot;Ñùṃéℝô&quot;" state="2"/>
<transition type="goto" symbol="exp" state="8"/>
</transitions>
<errors/>
@@ -1584,7 +1584,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<lookaheads>
<symbol>$end</symbol>
<symbol>&quot;+&quot;</symbol>
<symbol>&quot;\342\212\225&quot;</symbol>
<symbol>&quot;⊕&quot;</symbol>
</lookaheads>
</item>
<item rule-number="3" point="1"/>
@@ -1592,13 +1592,13 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<lookaheads>
<symbol>$end</symbol>
<symbol>&quot;+&quot;</symbol>
<symbol>&quot;\342\212\225&quot;</symbol>
<symbol>&quot;⊕&quot;</symbol>
</lookaheads>
</item>
</itemset>
<actions>
<transitions>
<transition type="shift" symbol="&quot;\342\212\225&quot;" state="6"/>
<transition type="shift" symbol="&quot;⊕&quot;" state="6"/>
</transitions>
<errors/>
<reductions>
@@ -1606,8 +1606,8 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<reduction symbol="$end" rule="3" enabled="false"/>
<reduction symbol="&quot;+&quot;" rule="2" enabled="true"/>
<reduction symbol="&quot;+&quot;" rule="3" enabled="false"/>
<reduction symbol="&quot;\342\212\225&quot;" rule="2" enabled="false"/>
<reduction symbol="&quot;\342\212\225&quot;" rule="3" enabled="false"/>
<reduction symbol="&quot;⊕&quot;" rule="2" enabled="false"/>
<reduction symbol="&quot;⊕&quot;" rule="3" enabled="false"/>
<reduction symbol="$default" rule="2" enabled="true"/>
</reductions>
</actions>
@@ -1623,7 +1623,7 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<lookaheads>
<symbol>$end</symbol>
<symbol>&quot;+&quot;</symbol>
<symbol>&quot;\342\212\225&quot;</symbol>
<symbol>&quot;⊕&quot;</symbol>
</lookaheads>
</item>
<item rule-number="2" point="1"/>
@@ -1632,12 +1632,12 @@ AT_CHECK([[sed -e 's/bison-xml-report version="[^"]*"/bison-xml-report version="
<actions>
<transitions>
<transition type="shift" symbol="&quot;+&quot;" state="5"/>
<transition type="shift" symbol="&quot;\342\212\225&quot;" state="6"/>
<transition type="shift" symbol="&quot;⊕&quot;" state="6"/>
</transitions>
<errors/>
<reductions>
<reduction symbol="&quot;+&quot;" rule="1" enabled="false"/>
<reduction symbol="&quot;\342\212\225&quot;" rule="1" enabled="false"/>
<reduction symbol="&quot;⊕&quot;" rule="1" enabled="false"/>
<reduction symbol="$default" rule="1" enabled="true"/>
</reductions>
</actions>