240 Commits

Author SHA1 Message Date
ISSOtm
1a07391a97 Introduce ARRAY_SIZE macro
Checked by `checkpatch`, and you know what? Not a bad thing
See https://github.com/gbdev/rgbds/pull/931#discussion_r738856724
2021-10-31 07:53:33 +01:00
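The log for 1a07391a97 does not show the macro itself. A common way to write such a macro is sketched below; the exact definition used in rgbds may differ, and `modeNames` and `modeName` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; gives a wrong answer if passed a pointer. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical lookup table, only here to exercise the macro */
static char const * const modeNames[] = {"NORMAL", "RAW", "SKIP"};

static char const *modeName(size_t mode)
{
	/* The bound follows the table automatically when entries are added */
	assert(mode < ARRAY_SIZE(modeNames));
	return modeNames[mode];
}
```

The point of the macro is that the bound in `modeName` never has to be updated by hand when the table grows.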
Rangi
4a73eb56ea Make peek() tail recursive instead of using goto
Compilation is identical with `gcc` or `clang`, `-O3` or `-O2`
2021-08-18 01:30:47 +02:00
Rangi
03bb510588 endCapture shouldn't handle lexerState->atLineStart
`startCapture` did not initialize `lexerState->atLineStart`;
its final value is a consequence of the separate but similar
behaviors within `lexer_CaptureRept` and `lexer_CaptureMacroBody`.
2021-07-04 18:31:46 -04:00
Rangi
695dfe9dbd Add missing file line-continuation-string.asm
Also make some minor formatting corrections
2021-07-04 16:12:34 -04:00
Rangi
9782f7d942 Factor out endCapture to go with startCapture (#904)
This also refactors `startCapture` to modify the
capture body as an argument.
2021-07-04 16:08:59 -04:00
Rangi
23721694ea Comment that anonymous labels internally start with '!'
`startsIdentifier` should not accept this character so
anonymous labels won't conflict with nonymous ones.
2021-05-15 12:57:22 -04:00
Eldred Habert
c06985a7ad Fix incorrect lexing of "$ff00+c" (#882)
Fixes #881 by moving the task from the lexer to the parser.
This both alleviates the need for backtracking in the lexer,
removing what is (was) arguably a hack, and causes tokenization
boundaries to be properly respected, fixing the issue mentioned above.

Co-authored-by: Rangi <remy.oukaour+rangi42@gmail.com>
2021-05-05 02:04:19 +02:00
ISSOtm
dcb8c69661 Fix UAF in lexer capture
Fixes #689
2021-05-02 03:24:18 +02:00
Rangi
d37aa93a7d Port some cleanup from the WIP 'strings' branch
This is mostly variable renaming
2021-04-28 11:58:56 -04:00
Rangi
3fdf01c0f5 Resolve some TODO comments
- `out_PushSection` should not set `currentSection` to NULL because
  PUSHS, PUSHC, and PUSHO consistently keep the current section,
  charmap, and options, even though the stack has been pushed.

- `Callback__FILE__` does not need to assert that `fileName` is not
  empty because `__FILE__`'s value is quoted, and can safely be empty.

- `YY_FATAL_ERROR` and `YYLMAX` are not needed since the lexer is
  not generated with flex.
2021-04-26 15:52:30 -04:00
Rangi
e050803ed1 Use size_t for measuring nested depths
Multiple functions involve tracking the current depth
of a nested structure (symbol expansions, interpolations,
REPT/FOR blocks, parentheses).
2021-04-23 14:28:10 +02:00
Rangi
27f38770d4 Parentheses in macro args prevent commas from starting new arguments
This is similar to C's behavior, and convenient for passing
function calls as single values, like `MUL(3.0, 4.0)` or
`STRSUB("str", 2, 1)`.

Fixes #704
2021-04-23 14:28:10 +02:00
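The rule described in 27f38770d4 can be illustrated with a small depth counter. This is a minimal sketch, not the rgbds lexer code: `countMacroArgs` is a hypothetical helper, and it deliberately ignores string literals and escapes, which a real lexer would also have to handle.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Counts the arguments in `args`, treating a comma as a separator only when
 * it is not nested inside parentheses. Hypothetical helper for illustration.
 */
static size_t countMacroArgs(char const *args)
{
	size_t depth = 0; /* current parenthesis nesting depth */
	size_t count = 1; /* N separators make N + 1 arguments */

	assert(args != NULL);
	if (*args == '\0')
		return 0;
	for (char const *p = args; *p; p++) {
		if (*p == '(') {
			depth++;
		} else if (*p == ')') {
			if (depth > 0) /* ignore an unbalanced ')' */
				depth--;
		} else if (*p == ',' && depth == 0) {
			count++;
		}
	}
	return count;
}
```

With this rule, `MUL(3.0, 4.0), 5` counts as two arguments rather than three. The depth is a `size_t`, in line with e050803ed1.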
Rangi
e596dbfc80 Make failed macro arg expansions non-fatal
Expanding empty strings is valid but pointless;
macro args already skipped doing so, now other
`beginExpansion` calls do too.

This also fixes failed interpolations (which were
already non-fatal) to continue reading characters,
not evaluate to their initial '{' character.
2021-04-22 09:59:02 +02:00
Rangi
c3e27217dd More specific "Symbol name too long" error messages
Identifiers, {interpolations} and \<macroArgs> are distinct
2021-04-20 17:14:21 +02:00
Rangi
fe3521c7a4 Switch from parentheses to angle brackets
`\(` is more likely to be a valid escape sequence in the
future (as is `\[`) and `\{` is already taken.
2021-04-20 17:14:21 +02:00
Rangi
7a314e7aff Support numeric symbol names in \(parentheses)
For example, \(_NARG) will get the last argument
2021-04-20 17:14:21 +02:00
Rangi
637bbbdf43 Support multi-digit macro arguments in parentheses
This allows access to arguments past \9 without using 'shift'
2021-04-20 17:14:21 +02:00
Rangi
8230e8165c Eliminate isAtEOF by changing yylex control flow
`yylex` calls `yywrap` at the beginning of the next call, after it
has set `lexerState->lastToken` to `T_EOB`.
2021-04-20 17:10:08 +02:00
Rangi
a727a0f81f Capture termination status is equivalent to not having reached EOF
This avoids the need for a separate `terminated` flag
2021-04-20 17:10:08 +02:00
Rangi
7a587eb7d6 Use midrule action values for captures' terminated status
Bison 3.1 introduces "typed midrule values", which would write
`<captureTerminated>{ ... }` and `$$` instead of `{ ... }` and
`$<captureTerminated>[1-9]`, but rgbds supports 3.0 or even lower.
2021-04-20 17:10:08 +02:00
Rangi
7ac8bd6e24 Return a marker token at the end of any buffer
Removes the lexer hack mentioned in #778
2021-04-20 17:10:08 +02:00
Rangi
be2572edca Track nested interpolation depth even outside string literals
Fixes #837
2021-04-20 09:37:29 -04:00
ISSOtm
6d0a3c75e9 Get rid of Hungarian notation for good
Bye bye it was not nice knowing ya
2021-04-19 22:12:10 +02:00
Rangi
52797b6f68 Implement SIZEOF("Section") and STARTOF("Section") (#766)
Updates the object file revision to 8

Fixes #765
2021-04-17 18:36:26 -04:00
Rangi
2005ed1df9 Implement CHARLEN and CHARSUB
Fixes #786
2021-04-17 18:18:34 -04:00
Rangi
9923fa3eee Fix expansions that start from the end of another expansion (#839)
Do not free an expansion until its offset is *past* its size.
This means potentially freeing a nested stack of expansions
all at once.

Fixes #696
2021-04-17 13:14:40 -04:00
Rangi
c755fa3469 readIdentifier does not process characters that get truncated
Previously a '.' could be past the truncation limit but still
cause the identifier to be marked as local, violating an
assertion in `sym_AddLocalLabel`.

Fixes #832
2021-04-16 21:15:01 -04:00
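The fix in c755fa3469 can be pictured as follows. This is a sketch under assumptions: `readIdent` and `isIdentChar` are hypothetical stand-ins, and the set of identifier characters is simplified compared with the real lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified identifier character set (an assumption for this sketch) */
static bool isIdentChar(char c)
{
	return isalnum((unsigned char)c) || c == '_' || c == '.';
}

/*
 * Copies the identifier at the start of `src` into `buf` (capacity `size`),
 * truncating it if needed. Returns whether the *stored* name contains a '.'.
 */
static bool readIdent(char const *src, char *buf, size_t size)
{
	bool isLocal = false;
	size_t i = 0;

	assert(size > 0);
	for (; isIdentChar(*src); src++) {
		if (i < size - 1) {
			if (*src == '.')
				isLocal = true;
			buf[i++] = *src;
		}
		/* Characters past the limit are consumed but not inspected */
	}
	buf[i] = '\0';
	return isLocal;
}
```

Because the '.' check sits inside the bounds check, a dot that falls past the truncation limit can no longer mark the stored name as local.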
Rangi
e78a1d5bfd readInterpolation is limited by nMaxRecursionDepth
Fixes #837
2021-04-16 16:10:46 -04:00
Rangi
5c852c7651 Store the nested expansions starting from the deepest one (#829)
This shortens the lexer by 100 lines and simplifies
access to expansion contents, since it usually needs the
deepest one, not the top-level one.

Fixes #813
2021-04-16 09:54:13 -04:00
Rangi
6be3584467 LexerState's 'size' and 'offset' for mmapped files are unsigned
These were using signed 'off_t' because that is the type of
'st_size' from 'stat()', but neither one can be negative.
2021-04-16 10:23:37 +02:00
Rangi
8c90d9d2d7 Get rid of skip in struct Expansion
This was only used to skip the two macro arg characters,
but shiftChar() can skip them before the expansion.
2021-04-16 10:23:37 +02:00
Rangi
f69e666b00 expansionOfs cannot be negative
lexerState->expansionOfs is always either set to 0, or updated by
adding a positive quantity:

    if (distance > lexerState->expansions->distance) {
        lexerState->expansionOfs += distance - lexerState->expansions->distance;
        ...
    }

so it will always be positive or zero.
2021-04-16 10:23:37 +02:00
Rangi
eba06404f0 peek(0) => peek()
This does not completely refactor `peek` as #708 suggested,
to make it shift and cache a character itself. However it
does simplify the lexer code.
2021-04-16 10:23:37 +02:00
Rangi
9558ccea1b shiftChars(1) => shiftChar()
Only two sites were for distances greater than 1:
a `shiftChars(2)`, trivial to just do two `shiftChar()`s;
and `shiftChars(size)` in `reportGarbageChar`, which
can be a `for` loop, and should be fixed anyway to
"avoid having to peek further than 0".
2021-04-16 10:23:37 +02:00
Rangi
260d372acd Lex $ff00+c without needing large peek lookahead
This also allows arbitrary amounts of whitespace in `$ff00 + c`,
instead of needing to fit in the 42-byte LEXER_BUF_SIZE
2021-04-16 10:23:37 +02:00
Rangi
b3312886fb Use a lookupExpansion, but not as an X macro
Instead of defining `LOOKUP_PRE_NEST` and `LOOKUP_POST_NEST`,
pass a variable name and a block to `lookupExpansion`; it
will assign successive looked-up expansions to the variable
and use them in the block.

The technique of using `__VA_ARGS__` to allow commas within a
block passed to a macro is not original, and should be stable.
2021-04-13 17:58:46 +02:00
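The `__VA_ARGS__` technique mentioned in b3312886fb can be sketched as below. The real `lookupExpansion` is not reproduced in this log, so `forEachAncestor`, `struct Node`, and `sumAndCount` are hypothetical names used only to show the pattern of passing a variable name and a block to a macro.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
	int value;
	struct Node *parent;
};

/*
 * Walks from `start` up the `parent` chain, binding each node to `var` and
 * running the block. The block is received through __VA_ARGS__, so commas
 * inside it do not split it into separate macro arguments.
 */
#define forEachAncestor(var, start, ...) do { \
	for (struct Node *var = (start); var; var = var->parent) \
		__VA_ARGS__ \
} while (0)

static int sumAndCount(struct Node *leaf, size_t *count)
{
	int sum = 0;

	assert(count != NULL);
	*count = 0;
	forEachAncestor(node, leaf, {
		sum += node->value, (*count)++; /* top-level comma in the block */
	});
	return sum;
}
```

Without the variadic parameter, the comma inside the block would be parsed as an argument separator and the macro call would fail to compile.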
Rangi
7fc8a65d0a Refactor the lexer to not use the lookupExpansion X macro
This macro was only used twice, in `beginExpansion` and
`lexer_DumpStringExpansions`, with `getExpansionAtDistance`
already containing an inlined and slightly modified version
of `lookupExpansion` (retaining the `LOOKUP_PRE_NEST` and
`LOOKUP_POST_NEST` macros, but with both of them doing nothing).

Not using an X macro here makes the actual control flow in both
places more obvious, and I think the repeated code is acceptable
for the same reasons as the similar-but-distinct implementations
of `readString`, `appendStringLiteral`, `yylex_NORMAL`, and
`yylex_RAW`.
2021-04-13 17:58:46 +02:00
Rangi
a2f52867ad Rename print to printChar
This clarifies its usage, for printing a single character
in error messages.
2021-04-13 17:41:12 +02:00
Rangi
ab79e6bede Change how print(c) formats reported characters
Printable ASCII becomes single-quoted, using backslash
escapes if necessary. Unprintable characters use 0xNN
formatting, without quotes.
2021-04-13 17:41:12 +02:00
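The formatting rule from ab79e6bede might look roughly like this. It is a sketch, not the actual function: `formatChar` is a hypothetical name, and both the set of escaped characters (only `'` and `\` here) and the uppercase hex digits are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Formats `c` for an error message into `buf`, which holds `size` bytes. */
static void formatChar(int c, char *buf, size_t size)
{
	assert(size >= 5); /* the longest outputs are 4 characters plus NUL */
	if (c == '\'' || c == '\\')
		snprintf(buf, size, "'\\%c'", c); /* escaped and quoted */
	else if (c >= 0x20 && c <= 0x7E)
		snprintf(buf, size, "'%c'", c); /* printable ASCII, quoted */
	else
		snprintf(buf, size, "0x%02X", (unsigned)c & 0xFF); /* unquoted */
}
```

Under these assumptions, `'a'` is reported as `'a'`, a backslash as `'\\'`, and the byte 7 as `0x07`.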
Rangi
850c78aaf4 Report garbage chars as their bytes; don't try decoding them as UTF-8
This decoding required high lookahead, and was not even
consistently useful (the `garbage_char` test case was not
valid UTF-8 and so did not benefit from `reportGarbageChar`).

This limits UTF-8 handling to the `STRLEN` and `STRSUB`
built-in functions, and to charmap conversion.
2021-04-13 17:41:12 +02:00
Rangi
c08cf783c8 Remove 'inline' from functions not in headers
2021-04-13 10:27:08 -04:00
ISSOtm
de7d1facf3 Add assertion that an expansion's total len doesn't overflow
Typically not needed because the recursion depth limit should prevent it,
but it might help debug weird lexer issues.
2021-04-03 18:31:30 +02:00
Rangi
596e17ee61 Factor out a common strlen into beginExpansion
This avoids the possibility of `size` not matching `str`
2021-03-31 14:41:38 -04:00
Rangi
c7ed9a275e Do not expand empty strings
Fixes #813
2021-03-31 10:21:04 -04:00
Jakub Kądziołka
d08bcc455d Handle errors when opening source file
Before this commit, opening a file for which the user didn't have
permission resulted in a "Bad file descriptor" error.
2021-03-30 23:35:50 +02:00
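A "Bad file descriptor" message is what typically results when the failed return value of `open` is passed on to a later call instead of being checked. The check described in d08bcc455d could be sketched as follows, assuming a POSIX system; `openSource` and the message wording are hypothetical, not taken from rgbds.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Opens `path` read-only. On failure, reports the reason given by `open`
 * itself (e.g. "Permission denied") and returns -1.
 */
static int openSource(char const *path)
{
	assert(path != NULL);

	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		fprintf(stderr, "Failed to open file \"%s\": %s\n", path, strerror(errno));
		return -1;
	}
	return fd;
}
```

Reporting `errno` immediately after the failed `open` preserves the real cause, rather than the misleading error produced by a later call on an invalid descriptor.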
Rangi
17752d7094 Backslash in normal lexer mode must be a line continuation
Macro args were already handled by `peek`, and character escapes
do not exist outside of string literals.

This only affects the error message printed when a non-whitespace
character comes after the backslash. Instead of "Illegal character
escape '%s'", it will print "Begun line continuation, but
encountered character '%s'".
2021-03-26 13:23:05 -04:00
Rangi
aa99ed056c Do not evaluate an untaken ELIF's condition
Fixes #764
2021-03-23 16:20:24 +01:00
Rangi
b8093847dc New definition syntax with leading DEF keyword
This will enable fixing #457 later once the old
definition syntax is removed.
2021-03-19 01:48:36 +01:00
ISSOtm
3ca58e13dc Fix verbose messages claiming non-existent errors
They were confusing when trying to debug other things
2021-03-14 18:52:16 +01:00
ISSOtm
60019cf476 Fix a bunch of Clang warnings
As reported by #789
Should avoid relying on 32-bit int (for implicit conversions)
and account for more extreme uses of RGBDS.
2021-03-10 10:56:57 +01:00