The bug showed up when a semicolon was located anywhere after \".
These three test cases are syntaxically correct but didn't compile:
1)
SECTION "HOME", HOME
db "\";"
2)
SECTION "HOME", HOME
db "\""
nop
;
3)
SECTION "HOME", HOME
db "\"" ;
The problem was located in yy_create_buffer(). Basicaly, this function loads an
entire source file, uniformizes EOL terminators and filters out comments without
touching literal strings.
However, bounds of literal strings were wrongly guessed because \" was
interpreted as two characters (and so the double quote was not escaped).
In test 1, the string terminates early and so ;" is filtered out as it was a
comment and so the assembler complains of an unterminated string.
In test 2 and 3, the string is in fact interpreted as two strings, the second
one terminates at EOF in these cases and so comments are not filtered out and
that makes the assembler complains.
A special case must be taken into account:
4)
SECTION "HOME", HOME
db "\\" ;
So we need to ignore \\ as well.
Note that there is still a problem left: in yy_create_buffer() a string may
span multiple lines but not in the lexer. However in this case I think the lexer
would quit at the first newline so there should be nothing to worry about.
A reference to an invalid macro argument (\ not followed by a digit
between 1 and 9) will cause an access outside of the bounds of the
currentmacroargs array in sym_FindMacroArg().
Macro arg references are processed in two places:
In CopyMacroArg(): called when scanning tokens between "", {} and
arguments of a macro call. The only problem here is that it accepts \0
as valid and so calls sym_FindMacroArg with a invalid value.
In PutMacroArg(): called by the lexer automata when it encounters a
token matching \\[0-9]? (in other cases than above). So not only it
accepts \0 but also \ alone.
Memo: In setuplex(), a rule is defined with a regex composed of up to
three ranges of chars and takes the form:
[FirstRange]
or [FirstRange][SecondRange]?
or [FirstRange]([SecondRange][Range]*)?
On scanning, when several rules match, the first longuest one is
choosen.
Regression test:
1)
SECTION "HOME", HOME
db "\0"
2)
SECTION "HOME", HOME
db \A
3)
SECTION "HOME", HOME
db \
The problem occurs when \@ is accessed outside the definition of a
macro and is not between "" or {}, or when it is used in an argument of
a macro call, i.e. when the access is processed by PutUniqueArg().
PutUniqueArg(), puts back the string value of \@ to the lexer's buffer
(to substitute \@ by the unique label string).
When \@ is not defined, sym_FindMacroArg(-1) returns NULL, which
yyunputstr() doesn't expect and crashes.
Solution:
Generate an error when \@ is not defined.
Regression test:
SECTION "HOME", HOME
ld a,\@
Will segfault.
On the fixed version, it will return an error as \@ is not available.
- separated the lexer into multiple functions so it is more readable
- fixed issue with long label names in macro arguments
- added error checking code to prevent buffer overflows
Symbols are created when using a label in the wild, even if they aren't defined. Solely using a symbol as an argument to BANK() skips this, so the symbol is never created.
This evaluates the argument instead of trying to find a symbol. This way, symbols that don't exist are created when passed into BANK().