Null characters in the middle of strings interact badly with the RGBDS
codebase, which assumes null-terminated strings. There is no reason to
support null characters in input source code, so the simplest way to deal
with null characters is to reject them early.
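A minimal sketch of rejecting null characters right after a file is read. It assumes the whole file is already in memory. `find_null_char` and `check_source` are hypothetical names, not RGBDS functions.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the offset of the first null character in
 * `buf`, or -1 if there is none. The length is passed explicitly because
 * strlen() would stop at the very character we are looking for.
 */
long find_null_char(const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\0')
			return (long)i;
	}
	return -1;
}

/* Reject the file up front so no later code sees an embedded null. */
int check_source(const char *name, const char *buf, size_t len)
{
	long pos = find_null_char(buf, len);

	if (pos >= 0) {
		fprintf(stderr, "%s: null character at offset %ld\n", name, pos);
		return 0;
	}
	return 1;
}
```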
Unlike macros, REPTs and INCLUDEs, string expansions have their own, independent recursion depth.
This is intentional, because string expansions work very differently.
While it's easy to know when a string expansion begins, checking where it
ends is much more complicated, since the expansion's contents are simply
injected back into the lex buffer. Therefore, the depth has to be checked
after lexing has taken place.
Because of this, the placement of the expansion end check is somewhat
haphazard, but I think it's good. While I can't be certain, every
expansion was properly ended by the end of all tests, and I couldn't find
any pitfalls.
Finally, `pCurrentStringExpansion` has been made global so error printing
can use it to tell the user if an error occurred inside of an expansion.
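Below is a simplified sketch of the bookkeeping. It assumes each expansion records the lex buffer offset where its injected contents end. Apart from `pCurrentStringExpansion`, every name here is hypothetical, and so is the depth limit. The end check runs after each token is lexed. This mirrors why the depth can only be updated once lexing has taken place.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit; the real maximum depth may differ. */
#define MAX_STRING_EXPANSION_DEPTH 64

/* One active string expansion, linked to the one it is nested in. */
struct Expansion {
	const char *name;          /* symbol being expanded */
	size_t end;                /* lex buffer offset where its contents end */
	struct Expansion *parent;
};

/* Global so that error messages can mention the innermost expansion. */
struct Expansion *pCurrentStringExpansion = NULL;
static unsigned int expansionDepth = 0;

/* Called when an expansion begins; returns 0 if the depth limit is hit. */
int beginExpansion(const char *name, size_t end)
{
	struct Expansion *node;

	if (expansionDepth >= MAX_STRING_EXPANSION_DEPTH)
		return 0;
	node = malloc(sizeof(*node));
	if (!node)
		return 0;
	node->name = name;
	node->end = end;
	node->parent = pCurrentStringExpansion;
	pCurrentStringExpansion = node;
	expansionDepth++;
	return 1;
}

/*
 * Called after lexing a token: pop every expansion whose injected contents
 * the lexer has already consumed, innermost first.
 */
void checkExpansionEnds(size_t lexOffset)
{
	while (pCurrentStringExpansion &&
	       lexOffset >= pCurrentStringExpansion->end) {
		struct Expansion *done = pCurrentStringExpansion;

		pCurrentStringExpansion = done->parent;
		expansionDepth--;
		free(done);
	}
}
```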
Should partially cover #178 and close #270.
This allows printing numbers in different bases and without the dollar prefix.
This is especially useful in macros because '$' isn't a valid character in
symbol names, so removing it otherwise requires heavy `STRSUB` usage.
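The following hypothetical helper illustrates the kind of output this enables. It is not the code added by this commit. It formats a number in a given base with no '$' or other prefix, so the result can be pasted into a symbol name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write `value` in `base` (2..16) into `buf` with no prefix. Returns the
 * number of characters written (excluding the terminator), or 0 if the
 * base is unsupported or the buffer is too small.
 */
size_t format_unsigned(uint32_t value, unsigned int base, char *buf, size_t size)
{
	static const char digits[] = "0123456789ABCDEF";
	char tmp[32]; /* enough for 32 bits in base 2 */
	size_t len = 0;

	if (base < 2 || base > 16)
		return 0;
	do {
		tmp[len++] = digits[value % base];
		value /= base;
	} while (value != 0);
	if (len + 1 > size)
		return 0;
	/* Digits were produced least significant first; reverse them. */
	for (size_t i = 0; i < len; i++)
		buf[i] = tmp[len - 1 - i];
	buf[len] = '\0';
	return len;
}
```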
There is a bug in processing the comments in source files. It's
related to #326. The bug appears when a comment started with ';'
contains a quotation mark without its pair.
The latest version of the RGBDS assembler has a step that parses the
given source to normalize its line endings, and it processes quotation
marks even before it processes the comments.
I edited that part of the source, and it works fine now.
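Here is a minimal sketch of the idea, not the actual patch. In a single pass, a ';' outside a string discards the rest of the line before any quote inside it can be seen. `strip_comments` is a hypothetical name, and escaped quotes are ignored for brevity.

```c
#include <string.h>

/* Copy `src` to `dst`, removing ';' comments but keeping their newlines. */
void strip_comments(const char *src, char *dst)
{
	int in_string = 0;

	while (*src != '\0') {
		if (!in_string && *src == ';') {
			/* Drop the comment, including any quotes inside it. */
			const char *nl = strchr(src, '\n');

			src = nl ? nl : src + strlen(src);
			continue;
		}
		if (*src == '"')
			in_string = !in_string;
		*dst++ = *src++;
	}
	*dst = '\0';
}
```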
If the type char is signed, then in the function
yylex_GetFloatMaskAndFloatLen(), *s can have a negative value and be converted
to a negative int32_t which is then used as an array index. It should be
converted to uint8_t instead to ensure that the value is in the bounds of the
tFloatingFirstChar, tFloatingSecondChar, and tFloatingChars arrays.
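A minimal sketch of the fix, with a hypothetical 256-entry table standing in for the real arrays.

```c
#include <stdint.h>

/* Hypothetical stand-in for tFloatingChars and the other lookup tables. */
static uint32_t tFloatingChars[256];

/*
 * Converting through uint8_t maps every char value into 0..255, whether
 * char is signed or not. Indexing with (int32_t)*s instead would give a
 * negative index for bytes >= 0x80 when char is signed.
 */
uint32_t char_mask(const char *s)
{
	return tFloatingChars[(uint8_t)*s];
}
```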
When a macro arg appears in a symbol name, the contents are appended.
However, the contents of the macro arg were not being validated.
Any character, regardless of whether it was allowed in a symbol name,
would be appended. With this change, the contents of the macro arg
are now validated character by character. The symbol name is considered
to end at the last valid character. The remainder of the macro arg is
treated as though it followed the symbol name in the asm source code.
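A sketch of this behavior. It assumes a hypothetical `is_symbol_char()` predicate; the exact character set RGBDS accepts may differ. It also assumes the caller keeps lexing from the returned pointer.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical set of characters allowed inside a symbol name. */
static int is_symbol_char(char c)
{
	return isalnum((unsigned char)c) || c == '_' || c == '@' ||
	       c == '#' || c == '.';
}

/*
 * Append the valid prefix of a macro arg to `name` (capacity `size`,
 * currently `*len` characters). Returns a pointer to the first character
 * that was not appended; the rest is lexed as if it followed the name.
 */
const char *append_macro_arg(char *name, size_t *len, size_t size,
			     const char *arg)
{
	while (*arg != '\0' && is_symbol_char(*arg) && *len + 1 < size)
		name[(*len)++] = *arg++;
	name[*len] = '\0';
	return arg;
}
```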
Standalone bracketed symbols like the following weren't being zero-terminated.
X EQUS {Y}
This doesn't apply to bracketed symbols that aren't standalone, but are
instead found in a string. For example, the following works even without this
fix.
X EQUS "{Y}"
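A simplified sketch of copying a bracketed symbol name, with hypothetical names. The explicit terminator is the part this fix is about. Without it, `dest` could still hold bytes left over from a previous, longer name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the name between '{' and '}' at `src` into `dest`. Returns 0 if
 * the braces are not closed or the name does not fit.
 */
int read_bracketed_symbol(const char *src, char *dest, size_t size)
{
	const char *close;
	size_t len;

	if (*src != '{')
		return 0;
	close = strchr(src + 1, '}');
	if (!close)
		return 0;
	len = (size_t)(close - (src + 1));
	if (len + 1 > size)
		return 0;
	memcpy(dest, src + 1, len);
	dest[len] = '\0'; /* zero-terminate the copied name */
	return 1;
}
```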
Fix a few warnings that appear when building the source with this option.
Add new exception to .checkpatch.conf.
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
For example:
PrintMacro : MACRO
PRINTT \1
ENDM
PrintMacro STRCAT(\"Hello\"\, \
\" world\\n\")
It is possible to have spaces after the '\' and before the newline
character. This is needed because Windows line endings "\r\n" are
converted to " \n" before the lexer has a chance to handle them.
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
Lines can be continued by placing a '\' right before the newline
character ('\n'):
DB 1, 2, 3, 4 \
5, 6, 7, 8
This doesn't work yet in macro argument lists.
It is possible to have spaces after the '\' and before the newline
character. This is needed because Windows line endings "\r\n" are
converted to " \n" before the lexer has a chance to handle them.
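A minimal sketch of the check, with a hypothetical function name. It shows why trailing spaces must be skipped between the '\' and the '\n'.

```c
#include <stddef.h>

/*
 * If `s` points at a '\' followed only by spaces and then '\n', return a
 * pointer just past that newline so lexing continues on the next line.
 * Otherwise return NULL. Spaces are allowed because "\r\n" has already
 * been converted to " \n" at this point.
 */
const char *skip_line_continuation(const char *s)
{
	if (*s != '\\')
		return NULL;
	s++;
	while (*s == ' ')
		s++;
	if (*s != '\n')
		return NULL;
	return s + 1;
}
```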
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
Newlines have to be handled before comments or comments won't be able to
handle line endings that don't include at least one LF character.
Also, document an obscure comment syntax: Anything that follows a '*'
placed at the start of a line is also a comment until the end of the
line.
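A simplified sketch of finding where a comment starts on a line under both rules. It assumes line endings have already been normalized to '\n'. `comment_start` is a hypothetical name, and escaped quotes are not handled.

```c
/*
 * Return the offset in `line` where a comment starts, or -1 if none.
 * A ';' outside a string starts a comment, and so does a '*' in the
 * first column.
 */
long comment_start(const char *line)
{
	int in_string = 0;

	if (line[0] == '*')
		return 0;
	for (long i = 0; line[i] != '\0' && line[i] != '\n'; i++) {
		if (line[i] == '"')
			in_string = !in_string;
		else if (line[i] == ';' && !in_string)
			return i;
	}
	return -1;
}
```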
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
With permission from the main authors [1], most of the code has been
relicensed under the MIT license.
SPDX license identifiers are used so that the license headers in source
code files aren't too large.
Add CONTRIBUTORS.rst file.
[1] https://github.com/rednex/rgbds/issues/128
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
Not all occurrences have been replaced; in some cases they have been
left as they were (for example, in rgbgfx and in the interface of a C
standard library function).
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
There are two ways in which the assembly process can fail:
1. If there is a really big problem that compromises the whole process,
the assembler has to stop right there and generate an error message.
This happens with unterminated REPT loops, macros, etc.
2. If the problem isn't that big and the process can still continue,
even though the final result is invalid, the assembler can try to
continue and warn the user about all errors it finds in the code.
This patch clarifies when each function should be used and, in two
places, replaces the function that was used with the correct one.
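A sketch of the two kinds of reporting functions. The names `report_error` and `report_fatal` and the `nbErrors` counter are placeholders, not necessarily those used in the codebase.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of recoverable errors reported so far. */
static unsigned int nbErrors = 0;

/*
 * Case 2: report the problem and keep assembling to find more errors;
 * the caller must still fail at the end if nbErrors > 0.
 */
void report_error(const char *fmt, ...)
{
	va_list ap;

	fputs("ERROR: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	nbErrors++;
}

/* Case 1: the whole process is compromised, so stop right away. */
void report_fatal(const char *fmt, ...)
{
	va_list ap;

	fputs("FATAL: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	exit(EXIT_FAILURE);
}
```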
Signed-off-by: Antonio Niño Díaz <antonio_nd@outlook.com>
Replace spaces with tabs for consistency. The rest of the codebase uses
tabs, so the linker script parser has been changed to match.
Remove trailing tabs from the whole codebase.
Signed-off-by: AntonioND <antonio_nd@outlook.com>
The bug showed up when a semicolon was located anywhere after \".
These three test cases are syntactically correct but didn't compile:
1)
SECTION "HOME", HOME
db "\";"
2)
SECTION "HOME", HOME
db "\""
nop
;
3)
SECTION "HOME", HOME
db "\"" ;
The problem was located in yy_create_buffer(). Basically, this function loads
an entire source file, normalizes line endings and filters out comments
without touching literal strings.
However, the bounds of literal strings were guessed wrongly because \" was
treated as two separate characters (so the double quote was not escaped).
In test 1, the string terminates early, so ;" is filtered out as if it were
a comment, and the assembler complains about an unterminated string.
In tests 2 and 3, the string is in fact interpreted as two strings; the
second one terminates at EOF, so comments are not filtered out and the
assembler complains.
A special case must be taken into account:
4)
SECTION "HOME", HOME
db "\\" ;
So \\ must be skipped as well.
Note that one problem remains: in yy_create_buffer() a string may span
multiple lines, but in the lexer it may not. However, in this case I think
the lexer would stop at the first newline, so there should be nothing to
worry about.
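A minimal sketch of escape-aware string scanning, with a hypothetical function name. Because '\' always consumes the next character, it handles both \" and the \\ special case from test 4.

```c
#include <stddef.h>

/*
 * Given `s` pointing at an opening '"', return a pointer to the matching
 * closing '"', or NULL if the string is unterminated.
 */
const char *find_string_end(const char *s)
{
	if (*s != '"')
		return NULL;
	for (s++; *s != '\0'; s++) {
		if (*s == '\\') {
			if (s[1] == '\0')
				return NULL;
			s++; /* skip the escaped character, even '"' or '\' */
		} else if (*s == '"') {
			return s;
		}
	}
	return NULL;
}
```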
A reference to an invalid macro argument (\ not followed by a digit
between 1 and 9) will cause an access outside of the bounds of the
currentmacroargs array in sym_FindMacroArg().
Macro arg references are processed in two places:
In CopyMacroArg(): called when scanning tokens between "", {} and the
arguments of a macro call. The only problem here is that it accepts \0
as valid and so calls sym_FindMacroArg() with an invalid value.
In PutMacroArg(): called by the lexer automaton when it encounters a
token matching \\[0-9]? (in cases other than the above). So it accepts
not only \0 but also a lone \.
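A minimal sketch of the missing bounds check, with a hypothetical stand-in for currentmacroargs. Only \1 to \9 reach the array.

```c
#include <stddef.h>

#define MAX_MACRO_ARGS 9 /* \1 through \9 */

/* Hypothetical stand-in for the currentmacroargs array. */
static const char *currentmacroargs[MAX_MACRO_ARGS];

/*
 * Return the macro argument referenced by `ref` (pointing at the '\'),
 * or NULL if the reference is invalid ("\0", a lone "\", "\A", ...).
 * A valid reference may also yield NULL if that argument was not passed.
 */
const char *find_macro_arg(const char *ref)
{
	if (ref[0] != '\\' || ref[1] < '1' || ref[1] > '9')
		return NULL;
	return currentmacroargs[ref[1] - '1'];
}
```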
Memo: In setuplex(), a rule is defined with a regex composed of up to
three ranges of chars and takes the form:
[FirstRange]
or [FirstRange][SecondRange]?
or [FirstRange]([SecondRange][Range]*)?
When scanning, if several rules match, the first of the longest matches
is chosen.
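A sketch of that rule shape and of the "first longest match" selection. It assumes each range is given as a string of allowed characters, which is a hypothetical representation and not the one setuplex() uses. With the rule for \\[0-9]?, both \0 and a lone \ match, which is how they reached PutMacroArg().

```c
#include <stddef.h>
#include <string.h>

/* Form 1 uses an empty `second`, form 2 an empty `range`. */
struct Rule {
	const char *first, *second, *range;
};

/* Length of the match of [first]([second][range]*)? at `s`, or 0. */
size_t rule_match_len(const char *s, const struct Rule *r)
{
	size_t len;

	/* strchr() would also find the terminating '\0', so test it first. */
	if (s[0] == '\0' || !strchr(r->first, s[0]))
		return 0;
	len = 1;
	if (s[1] != '\0' && strchr(r->second, s[1])) {
		len = 2;
		while (s[len] != '\0' && strchr(r->range, s[len]))
			len++;
	}
	return len;
}

/* Index of the first rule with the longest match, or -1 if none match. */
int choose_rule(const char *s, const struct Rule *rules, size_t n)
{
	int best = -1;
	size_t bestLen = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = rule_match_len(s, &rules[i]);

		if (len > bestLen) { /* strictly longer: earlier rules win ties */
			best = (int)i;
			bestLen = len;
		}
	}
	return best;
}
```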
Regression test:
1)
SECTION "HOME", HOME
db "\0"
2)
SECTION "HOME", HOME
db \A
3)
SECTION "HOME", HOME
db \