Commit Graph

216 Commits

Author SHA1 Message Date
Rangi
2005ed1df9 Implement CHARLEN and CHARSUB
Fixes #786
2021-04-17 18:18:34 -04:00
Rangi
9923fa3eee Fix expansions that start from the end of another expansion (#839)
Do not free an expansion until its offset is *past* its size.
This means potentially freeing a nested stack of expansions
all at once.

Fixes #696
2021-04-17 13:14:40 -04:00
Rangi
c755fa3469 readIdentifier does not process characters that get truncated
Previously a '.' could be past the truncation limit but still
cause the identifier to be marked as local, violating an
assertion in `sym_AddLocalLabel`.

Fixes #832
2021-04-16 21:15:01 -04:00
Rangi
e78a1d5bfd readInterpolation is limited by nMaxRecursionDepth
Fixes #837
2021-04-16 16:10:46 -04:00
Rangi
5c852c7651 Store the nested expansions starting from the deepest one (#829)
This shortens the lexer by 100 lines and simplifies
access to expansion contents, since it usually needs the
deepest one, not the top-level one.

Fixes #813
2021-04-16 09:54:13 -04:00
Rangi
6be3584467 LexerState's 'size' and 'offset' for mmapped files are unsigned
These were using signed 'off_t' because that is the type of
'st_size' from 'stat()', but neither one can be negative.
2021-04-16 10:23:37 +02:00
Rangi
8c90d9d2d7 Get rid of skip in struct Expansion
This was only used to skip the two macro arg characters,
but shiftChar() can skip them before the expansion.
2021-04-16 10:23:37 +02:00
Rangi
f69e666b00 expansionOfs cannot be negative
lexerState->expansionOfs is always either set to 0, or updated by
adding a positive quantity:

    if (distance > lexerState->expansions->distance) {
        lexerState->expansionOfs += distance - lexerState->expansions->distance;
        ...
    }

so it will always be positive or zero.
2021-04-16 10:23:37 +02:00
Rangi
eba06404f0 peek(0) => peek()
This does not completely refactor `peek` as #708 suggested,
to make it shift and cache a character itself. However it
does simplify the lexer code.
2021-04-16 10:23:37 +02:00
Rangi
9558ccea1b shiftChars(1) => shiftChar()
Only two sites were for distances greater than 1:
a `shiftChars(2)`, trivial to just do two `shiftChar()`s;
and `shiftChars(size)` in `reportGarbageChar`, which
can be a `for` loop, and should be fixed anyway to
"avoid having to peek further than 0".
2021-04-16 10:23:37 +02:00
Rangi
260d372acd Lex $ff00+c without needing large peek lookahead
This also allows arbitrary amounts of whitespace in `$ff00 + c`,
instead of needing to fit in the 42-byte LEXER_BUF_SIZE
2021-04-16 10:23:37 +02:00
Rangi
b3312886fb Use a lookupExpansion, but not as an X macro
Instead of defining `LOOKUP_PRE_NEST` and `LOOKUP_POST_NEST`,
pass a variable name and a block to `lookupExpansion`; it
will assign successive looked-up expansions to the variable
and use them in the block.

The technique of using `__VA_ARGS__` to allow commas within a
block passed to a macro is not original, and should be stable.
2021-04-13 17:58:46 +02:00
Rangi
7fc8a65d0a Refactor the lexer to not use the lookupExpansion X macro
This macro was only used twice, in `beginExpansion` and
`lexer_DumpStringExpansions`, with `getExpansionAtDistance`
already containing an inlined and slightly modified version
of `lookupExpansion` (retaining the `LOOKUP_PRE_NEST` and
`LOOKUP_POST_NEST` macros, but with both of them doing nothing).

Not using an X macro here makes the actual control flow in both
places more obvious, and I think the repeated code is acceptable
for the same reasons as the similar-but-distinct implementations
of `readString`, `appendStringLiteral`, `yylex_NORMAL`, and
`yylex_RAW`.
2021-04-13 17:58:46 +02:00
Rangi
a2f52867ad Rename print to printChar
This clarifies its usage, for printing a single character
in error messages.
2021-04-13 17:41:12 +02:00
Rangi
ab79e6bede Change how print(c) formats reported characters
Printable ASCII becomes single-quoted, using backslash
escapes if necessary. Unprintable characters use 0xNN
formatting, without quotes.
2021-04-13 17:41:12 +02:00
Rangi
850c78aaf4 Report garbage chars as their bytes; don't try decoding them as UTF-8
This decoding required high lookahead, and was not even
consistently useful (the `garbage_char` test case was not
valid UTF-8 and so did not benefit from `reportGarbageChar`).

This limits UTF-8 handling to the `STRLEN` and `STRSUB`
built-in functions, and to charmap conversion.
2021-04-13 17:41:12 +02:00
Rangi
c08cf783c8 Remove 'inline' from functions not in headers 2021-04-13 10:27:08 -04:00
ISSOtm
de7d1facf3 Add assertion that an expansion's total len doesn't overflow
Typically not needed because the recursion depth limit should prevent it,
but it might help debug weird lexer issues.
2021-04-03 18:31:30 +02:00
Rangi
596e17ee61 Factor out a common strlen into beginExpansion
This avoids the possibility of `size` not matching `str`
2021-03-31 14:41:38 -04:00
Rangi
c7ed9a275e Do not expand empty strings
Fixes #813
2021-03-31 10:21:04 -04:00
Jakub Kądziołka
d08bcc455d Handle errors when opening source file
Before this commit, opening a file for which the user didn't have
permission resulted in a "Bad file descriptor" error.
2021-03-30 23:35:50 +02:00
Rangi
17752d7094 Backslash in normal lexer mode must be a line continuation
Macro args were already handled by `peek`, and character escapes
do not exist outside of string literals.

This only affects the error message printed when a non-whitespace
character comes after the backslash. Instead of "Illegal character
escape '%s'", it will print "Begun line continuation, but
encountered character '%s'".
2021-03-26 13:23:05 -04:00
Rangi
aa99ed056c Do not evaluate an untaken ELIF's condition
Fixes #764
2021-03-23 16:20:24 +01:00
Rangi
b8093847dc New definition syntax with leading DEF keyword
This will enable fixing #457 later once the old
definition syntax is removed.
2021-03-19 01:48:36 +01:00
ISSOtm
3ca58e13dc Fix verbose messages claiming non-existent errors
They were confusing when trying to debug other things
2021-03-14 18:52:16 +01:00
ISSOtm
60019cf476 Fix a bunch of Clang warnings
As reported by #789
Should avoid relying on 32-bit int (for implicit conversions)
and account for more extreme uses of RGBDS.
2021-03-10 10:56:57 +01:00
Rangi
365484ef18 Factor out handleCRLF()
This logic repeats ten times
2021-03-03 10:03:52 +01:00
Rangi
6655e04ef0 Restore the "EOF-newline" lexer hack
This was removed in b3c0db218d
(along with two unrelated changes).

Removing this hack introduced issue #742, whereby INCLUDing
a file without a trailing newline can cause a syntax error.

A more proper fix would involve Bison's tracking locations,
but for now the EOF-newline hack fixes the issue while only
affecting some reported errors (expecting "newline"
instead of "end of file").

Fixes #742
2021-03-02 16:53:56 -08:00
daid
5d6e0677d9 Fix error-related issues (#773)
* Mark `error` as a `format` function, to properly scan its format

* Fix the call to error() from parser.y:
  - Use '%s' to avoid passing an arbitrary format
  - Simplify yyerror overall

* Fix size parameter of %.*s format being an int... bonkers standard.

* Report the number of arguments required and provided on a STRFMT mismatch

* Add an assert to check for a very unlikely bug
2021-03-02 16:34:19 +01:00
Rangi
56071599e7 Allow trailing commas in bare lists
This applies to macro arguments, DB, DW, DL, DS,
PRINT, PRINTLN, EXPORT, PURGE, and OPT.

It also removes support for empty entries in DB/DW/DL.
(Deprecating it would require keeping parser support,
which is ambiguous with trailing commas.)

Fixes #753
2021-03-02 11:48:20 +01:00
Rangi
1dafc1c762 Allow empty macro arguments, with a warning
Fixes #739
2021-02-28 10:38:49 -08:00
Rangi
63d15ac8c9 append_yylval_tzString should always evaluate its argument
Fixes #762
2021-02-28 10:38:49 -08:00
Rangi
1f579deaff Trim right whitespace from macro args before warning about length 2021-02-28 10:38:49 -08:00
Rangi
fad48326f8 Support /* block comments */ in macro arguments
Fixes #746
2021-02-28 10:38:49 -08:00
Rangi
3c5e1caa7c Disallow "." as a local label
Fixes #760
2021-02-25 04:40:42 +01:00
Rangi
d4028fff10 Prevent ELIF or ELSE after an ELSE
Fixes #749
2021-02-25 04:39:49 +01:00
Eldred Habert
dafef5a953 Remove column 1 restriction for labels with colons (#635)
Partial fix for #457
2021-02-23 14:42:24 -05:00
ISSOtm
cd072d5e6a Add assertion check for potential UAF trigger
It actually currently triggers if an INCLUDE directive is given at EOF,
but #742 will fix that.
2021-02-19 16:53:30 +01:00
ISSOtm
c29b616f93 Fix forgotten initialization of lexerState->isReferenced 2021-02-18 16:33:06 +01:00
Rangi
39c179aa32 Fix missing newline in a dbgPrint 2021-02-17 13:14:02 -05:00
Rangi
efbfeca292 Avoid two peek(1) calls when lexing """multi-line strings""" 2021-02-17 10:36:36 -05:00
Rangi
d049ffc0f0 Handle string literals within macro arguments (#685)
Fixes #683 and #691

The lexer's raw mode for reading macro args already attempted
to handle semicolons inside string literals, versus outside ones
which start comments. This change reuses the same function for
reading string literals in normal and raw modes, also handling:

- Commas in strings versus between macro args
- Character escapes
- {Interpolations} and \1-\9 args inside vs. outside strings
- Multi-line string literals

Macro args now allow escaping '\', '"', and '\\'.

A consistent model for expanding macro args and interpolations,
within macro args, string literals, and normal context:

- "{S}" should always equal the contents of S
- "\1" should always act like quoting the value of \1
2021-02-16 22:44:25 -05:00
Rangi
96bce05be2 Newlines in multi-line strings update the line number
This affects error and warning messages, and dbgPrint
2021-02-14 16:16:52 +01:00
Rangi
8415ce3ed0 Correct the keywordDict size
The stale keywords XDEF and GLOBAL were removed,
so there can be fewer keyword nodes.
2021-02-14 16:16:52 +01:00
Rangi
ebb5aab6db Correct some comments
nPCOffset no longer exists, and the lexer only
returns T_NEWLINE for EOF in raw mode.
2021-02-14 16:16:52 +01:00
ISSOtm
88e1cc7302 Make EOF token name consistent across Bison versions
The "end of file" name apparently only became a default recently
2021-02-11 13:04:43 +01:00
ISSOtm
b3c0db218d Remove "EOF-newline" lexer hack
In preparation for an upcoming change
Makes for nicer error messages, complaining about EOF instead of newlines
The hack had to be kept for the lexer raw mode to avoid a bug;
see the relevant code comment for more info.
2021-02-11 12:48:37 +01:00
ISSOtm
70bbb098d3 Remove stale keywords
They were removed from the grammar, but not the lexer
2021-01-23 00:05:56 +01:00
ISSOtm
a679e02246 Get rid of asm.h and localasm.h
No need to make them global
2021-01-22 13:34:16 +01:00
ISSOtm
fa0fa4d5ac Significantly overhaul OPT code
Simplify the mess that was option setting (2 redundant variables !?)
Move options to a separate file
Have "modules" own their options, and OPT only access them (less redundancy)
Simplify code, respect naming conventions better
2021-01-22 10:41:09 +01:00