Charmap's previous structure was using brute-force comparison for
converting the strings in source files. It always compared given
strings to all of the strings in charmap, which was very costly
in huge projects.
For its improvement, I changed its structure into trie, which is
being used in many string-processing areas. It's now much faster
than before.
There is a bug in processing the comments in source files. It's
related to #326. And this bug comes out when you comment something
with the character ';', and include the quotation mark without its
pair in it.
The lastest version of rgbds compiler has a step to parse the given
source to convert its line endings to a unified one, and it
processes quotation marks even before it processes the comments.
I edited a little bit of the source, and it works fine now.
If the type char is signed, then in the function
yylex_GetFloatMaskAndFloatLen(), *s can have a negative value and be converted
to a negative int32_t which is then used as an array index. It should be
converted to uint8_t instead to ensure that the value is in the bounds of the
tFloatingFirstChar, tFloatingSecondChar, and tFloatingChars arrays.
The createpatch() function was using a fixed-size buffer. I've changed it
to be dynamically allocated. I saw that the RPN format used in patches is
slightly different from the one used internally in the assembler, so I
added a new member to the Expression struct to track the patch size.
I've also limited the RPN expression length to 1MB. I realized that the
patch RPN expression could potentially be longer than the internal RPN
expression, so the internal expression would need a limit smaller than
UINT32_MAX. I thought 1MB would be a reasonable limit.
Standalone bracketed symbols like the following weren't being zero-terminated.
X EQUS {Y}
This doesn't apply to bracketed symbols that aren't standalone, but are
instead found in a string. For example, the following works even without this
fix.
X EQUS "{Y}"
When the while loop in `ParseSymbol` stops because of the symbol length,
`copied` will have the value of `MAXSYMLEN`, which is obviously not
greater than `MAXSYMLEN`. Changing the condition to `>=` fixes the
issue.
As a bonus, the correct union field will now be used. It shouldn't
matter, but it's technically UB to use a wrong one.
The linker script now allows you to assign a section with the same
attributes as in the source.
To do this, I've removed a check from AssignSectionAddressAndBankByName
that would never be triggered, due to that condition being checked
before. Shouldn't this and IsSectionSameTypeBankAndAttrs be condensed
into a single function?
Currently, all symbols are assigned a filename and line when they're
first encountered and added to the internal hash table. This is often
not expected and leads to erroneous error messages.