There are two ways in which the assembly process can fail:
1. If there is a really big problem that compromises the whole process,
the assembler has to stop right there and generate an error message.
This happens with unterminated REPT loops, macros, etc.
2. If the problem isn't that big and the process can still continue,
even though the final result is invalid, the assembler can try to
continue and warn the user about all errors it finds in the code.
This patch clarifies the use of each function and replaces the function
used in two places by the correct one.
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
Replace spaces by tabs for consistency. The rest of the codebase uses
tabs, so the linkerscript parser has to change.
Removed trailing tabs in all codebase.
Signed-off-by: AntonioND <antonio_nd@outlook.com>
The bug showed up when a semicolon was located anywhere after \".
These three test cases are syntaxically correct but didn't compile:
1)
SECTION "HOME", HOME
db "\";"
2)
SECTION "HOME", HOME
db "\""
nop
;
3)
SECTION "HOME", HOME
db "\"" ;
The problem was located in yy_create_buffer(). Basicaly, this function loads an
entire source file, uniformizes EOL terminators and filters out comments without
touching literal strings.
However, bounds of literal strings were wrongly guessed because \" was
interpreted as two characters (and so the double quote was not escaped).
In test 1, the string terminates early and so ;" is filtered out as it was a
comment and so the assembler complains of an unterminated string.
In test 2 and 3, the string is in fact interpreted as two strings, the second
one terminates at EOF in these cases and so comments are not filtered out and
that makes the assembler complains.
A special case must be taken into account:
4)
SECTION "HOME", HOME
db "\\" ;
So we need to ignore \\ as well.
Note that there is still a problem left: in yy_create_buffer() a string may
span multiple lines but not in the lexer. However in this case I think the lexer
would quit at the first newline so there should be nothing to worry about.
A reference to an invalid macro argument (\ not followed by a digit
between 1 and 9) will cause an access outside of the bounds of the
currentmacroargs array in sym_FindMacroArg().
Macro arg references are processed in two places:
In CopyMacroArg(): called when scanning tokens between "", {} and
arguments of a macro call. The only problem here is that it accepts \0
as valid and so calls sym_FindMacroArg with a invalid value.
In PutMacroArg(): called by the lexer automata when it encounters a
token matching \\[0-9]? (in other cases than above). So not only it
accepts \0 but also \ alone.
Memo: In setuplex(), a rule is defined with a regex composed of up to
three ranges of chars and takes the form:
[FirstRange]
or [FirstRange][SecondRange]?
or [FirstRange]([SecondRange][Range]*)?
On scanning, when several rules match, the first longuest one is
choosen.
Regression test:
1)
SECTION "HOME", HOME
db "\0"
2)
SECTION "HOME", HOME
db \A
3)
SECTION "HOME", HOME
db \
- separated the lexer into multiple functions so it is more readable
- fixed issue with long label names in macro arguments
- added error checking code to prevent buffer overflows
On Linux, valgrind complains about the overflow like this:
Pass 1...
==20054== Invalid read of size 1
==20054== at 0x406CDA: yylex (lexer.c:396)
==20054== by 0x40207C: yyparse (asmy.c:2921)
==20054== by 0x4086AF: main (main.c:351)
==20054== Address 0x503a102 is 0 bytes after a block of size 23,538 alloc'd
==20054== at 0x402994D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20054== by 0x406411: yy_create_buffer (lexer.c:147)
==20054== by 0x404FE3: fstk_RunInclude (fstack.c:243)
==20054== by 0x4025F5: yyparse (asmy.y:744)
==20054== by 0x4086AF: main (main.c:351)
==20054==
This is a bit of a crude fix which simply exits the hashing loop when
we reach the end of the string. We should probably do some kind of
length calculation on the buffer instead.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Merging lai's source with this one is very irritating because
they have different indentation styles. I couldn't find what profile
vegard used for his version, so I used these flags (which should bring
the source close to KNF):
-bap
-br
-ce
-ci4
-cli0
-d0
-di0
-i8
-ip
-l79
-nbc
-ncdb
-ndj
-ei
-nfc1
-nlp
-npcs
-psl
-sc
-sob