Improve object file format documentation (#1010)

Replacing the big pre-formatted text block with a list brings:
- Better accessibility, obviously
- Responsiveness
- Better formatting (bold, etc.)
- Sub-sections that can now be linked to
- Hyperlink cross-refs to other pages

The slight disadvantage is that `ENDC` etc. are now individual
list items, whereas they'd be better as part of the same item.
No big deal though, it was much worse before.

Some descriptions have been overhauled for clarity, and some
outright corrected (such as Assertions' "Offset" field).

Co-authored-by: Antonio Vivace <avivace4@gmail.com>
Eldred Habert
2022-07-29 22:48:55 +02:00
committed by GitHub
parent 9ec8186ac6
commit f3f2c2ca16


@@ -16,251 +16,381 @@ This is the description of the object files used by
.Xr rgbasm 1 .Xr rgbasm 1
and and
.Xr rgblink 1 . .Xr rgblink 1 .
.Em Please note that the specifications may change . .Em Please note that the specification is not stable yet.
This toolchain is in development and new features may require adding more information to the current format, or modifying some fields, which would break compatibility with older versions. RGBDS is still in active development, and some new features require adding more information to the object file, or modifying some fields, both of which break compatibility with older versions.
.Sh FILE STRUCTURE .Sh FILE STRUCTURE
The following types are used: The following types are used:
.Pp .Pp
.Ar LONG .Cm LONG
is a 32-bit integer stored in little-endian format. is a 32-bit integer stored in little-endian format.
.Ar BYTE .Cm BYTE
is an 8-bit integer. is an 8-bit integer.
.Ar STRING .Cm STRING
is a 0-terminated string of is a 0-terminated string of
.Ar BYTE . .Cm BYTE .
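As a reading aid, the three primitive types above could be decoded roughly as follows. This is a minimal sketch, not RGBDS code: the helper names (`read_byte`, `read_long`, `read_string`) are hypothetical, and treating LONG as signed is an assumption based on the format's use of -1 as a sentinel value.

```python
import struct

# Each helper takes a bytes buffer and a position, and returns
# (decoded value, position just past the value).

def read_byte(buf, pos):
    """BYTE: a single 8-bit integer."""
    return buf[pos], pos + 1

def read_long(buf, pos):
    """LONG: 32-bit little-endian integer (read as signed; an assumption)."""
    (value,) = struct.unpack_from("<i", buf, pos)
    return value, pos + 4

def read_string(buf, pos):
    """STRING: BYTEs up to a 0 terminator; the terminator is consumed."""
    end = buf.index(0, pos)
    return buf[pos:end], end + 1
```

The string is returned as raw bytes because the text above does not state an encoding.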
.Bd -literal Brackets after a type
; Header .Pq e.g. Cm LONG Ns Bq Ar n
indicate
BYTE ID[4] ; "RGB9" .Ar n
LONG RevisionNumber ; The format's revision number this file uses. consecutive elements
LONG NumberOfSymbols ; The number of symbols used in this file. .Pq here, Cm LONG Ns s .
LONG NumberOfSections ; The number of sections used in this file. All items are contiguous, with no padding anywhere\(emthis also means that they may not be aligned in the file!
.Pp
; File info .Cm REPT Ar n
indicates that the fields between the
LONG NumberOfNodes ; The number of nodes contained in this file. .Cm REPT
and corresponding
REPT NumberOfNodes ; IMPORTANT NOTE: the nodes are actually written in .Cm ENDR
; **reverse** order, meaning the node with ID 0 is are repeated
; the last one in the file! .Ar n
times.
LONG ParentID ; ID of the parent node, -1 means this is the root. .Pp
All IDs refer to objects within the file; for example, symbol ID $0001 refers to the second symbol defined in
LONG ParentLineNo ; Line at which the parent context was exited. .Em this
; Meaningless on the root node. object file's
.Sx Symbols
BYTE Type ; 0 = REPT node array.
; 1 = File node The only exception is the
; 2 = Macro node .Sx Source file info
nodes, whose IDs are backwards, i.e. source node ID $0000 refers to the
IF Type != 0 ; If the node is not a REPT... .Em last
node in the array, not the first one.
STRING Name ; The node's name: either a file name, or macro name References to other object files are made by imports (symbols), by name (sections), etc.\(embut never by ID.
; prefixed by its definition file name. .Ss Header
.Bl -tag -width Ds -compact
ELSE ; If the node is a REPT, it also contains the iter .It Cm BYTE Ar Magic[4]
; counts of all the parent REPTs. "RGB9"
.It Cm LONG Ar RevisionNumber
LONG Depth ; Size of the array below. The format's revision number this file uses.
.Pq This is always in the same place in all revisions.
LONG Iter[Depth] ; The number of REPT iterations by increasing depth. .It Cm LONG Ar NumberOfSymbols
How many symbols are defined in this object file.
ENDC .It Cm LONG Ar NumberOfSections
How many sections are defined in this object file.
ENDR .El
.Ss Source file info
; Symbols .Bl -tag -width Ds -compact
.It Cm LONG Ar NumberOfNodes
REPT NumberOfSymbols ; Number of symbols defined in this object file. The number of source context nodes contained in this file.
.It Cm REPT Ar NumberOfNodes
STRING Name ; The name of this symbol. Local symbols are stored .Bl -tag -width Ds -compact
; as "Scope.Symbol". .It Cm LONG Ar ParentID
ID of the parent node, -1 meaning that this is the root node.
BYTE Type ; 0 = LOCAL symbol only used in this file. .Pp
; 1 = IMPORT this symbol from elsewhere .Sy Important :
; 2 = EXPORT this symbol to other objects. the nodes are actually written in
.Sy reverse
IF (Type & 0x7F) != 1 ; If symbol is defined in this object file. order, meaning the node with ID 0 is the last one in the list!
.It Cm LONG Ar ParentLineNo
LONG SourceFile ; File where the symbol is defined. Line at which the parent node's context was exited; meaningless for the root node.
.It Cm BYTE Ar Type
LONG LineNum ; Line number in the file where the symbol is defined. .Bl -column "Value" -compact
.It Sy Value Ta Sy Meaning
LONG SectionID ; The section number (of this object file) in which .It 0 Ta REPT node
; this symbol is defined. If it doesn't belong to any .It 1 Ta File node
; specific section (like a constant), this field has .It 2 Ta Macro node
; the value -1. .El
.It Cm IF Ar Type No \(!= 0
LONG Value ; The symbols value. It's the offset into that If the node is not a REPT node...
; symbol's section. .Pp
.Bl -tag -width Ds -compact
ENDC .It Cm STRING Ar Name
The node's name: either a file name, or the macro's name prefixed by its definition's file name
ENDR .Pq e.g. Ql src/includes/defines.asm::error .
.El
; Sections .It Cm ELSE
If the node is a REPT, it also contains the iteration counters of all parent REPTs.
REPT NumberOfSections .Pp
STRING Name ; Name of the section .Bl -tag -width Ds -compact
.It Cm LONG Ar Depth
LONG Size ; Size in bytes of this section .It Cm LONG Ar Iter Ns Bq Ar Depth
The number of REPT iterations, by increasing depth.
BYTE Type ; 0 = WRAM0 .El
; 1 = VRAM .It Cm ENDC
; 2 = ROMX .El
; 3 = ROM0 .It Cm ENDR
; 4 = HRAM .El
; 5 = WRAMX .Ss Symbols
; 6 = SRAM .Bl -tag -width Ds -compact
; 7 = OAM .It Cm REPT Ar NumberOfSymbols
; Bits 7 and 6 are independent from the above value: .Bl -tag -width Ds -compact
; Bit 7 encodes whether the section is unionized .It Cm STRING Ar Name
; Bit 6 encodes whether the section is a fragment This symbol's name.
; Bits 6 and 7 may not be both set at the same time! Local symbols are stored as their full name
.Pq Ql Scope.symbol .
LONG Org ; Address to fix this section at. -1 if the linker should .It Cm BYTE Ar Type
; decide (floating address). .Bl -column "Value" -compact
.It Sy Value Ta Sy Meaning
LONG Bank ; Bank to load this section into. -1 if the linker should .It 0 Ta Sy Local No symbol only used in this file.
; decide (floating bank). This field is only valid for ROMX, .It 1 Ta Sy Import No of an exported symbol (by name) from another object file.
; VRAM, WRAMX and SRAM sections. .It 2 Ta Sy Exported No symbol visible from other object files.
.El
BYTE Align ; Alignment of this section, as N bits. 0 when not specified. .It Cm IF Ar Type No \(!= 1
If the symbol is defined in this object file...
LONG Ofs ; Offset relative to the alignment specified above. .Pp
; Must be below 1 << Align. .Bl -tag -width Ds -compact
.It Cm LONG Ar NodeID
IF (Type == ROMX) || (Type == ROM0) ; Sections that can contain data. Context in which the symbol was defined.
.It Cm LONG Ar LineNo
BYTE Data[Size] ; Raw data of the section. Line number in the context at which the symbol was defined.
.It Cm LONG Ar SectionID
LONG NumberOfPatches ; Number of patches to apply. The ID of the section in which the symbol is defined.
If the symbol doesn't belong to any specific section (i.e. it's a constant), this field contains -1.
REPT NumberOfPatches .It Cm LONG Ar Value
The symbol's value.
LONG SourceFile ; ID of the source file node (for printing If the symbol belongs to a section, this is the offset within that symbol's section.
; error messages). .El
.It Cm ENDC
LONG LineNo ; Line at which the patch was created. .El
.It Cm ENDR
LONG Offset ; Offset into the section where patch should .El
; be applied (in bytes). .Ss Sections
.Bl -tag -width Ds -compact
LONG PCSectionID ; Index within the file of the section in which .It Cm REPT Ar NumberOfSections
; PC is located. .Bl -tag -width Ds -compact
; This is usually the same section that the .It Cm STRING Ar Name
; patch should be applied into, except e.g. The section's name.
; with LOAD blocks. .It Cm LONG Ar Size
The section's size, in bytes.
LONG PCOffset ; PC's offset into the above section. .It Cm BYTE Ar Type
; Used because the section may be floating, so Bits 0\(en2 indicate the section's type:
; PC's value is not known to RGBASM. .Bl -column "Value" -compact
.It Sy Value Ta Sy Meaning
BYTE Type ; 0 = BYTE patch. .It 0 Ta WRAM0
; 1 = little endian WORD patch. .It 1 Ta VRAM
; 2 = little endian LONG patch. .It 2 Ta ROMX
; 3 = JR offset value BYTE patch. .It 3 Ta ROM0
.It 4 Ta HRAM
LONG RPNSize ; Size of the buffer with the RPN. .It 5 Ta WRAMX
; expression. .It 6 Ta SRAM
.It 7 Ta OAM
BYTE RPN[RPNSize] ; RPN expression. Definition below. .El
.Pp
ENDR Bit\ 7 being set means that the section is a "union"
.Pq see Do Unionized sections Dc in Xr rgbasm 5 .
ENDC Bit\ 6 being set means that the section is a "fragment"
.Pq see Do Section fragments Dc in Xr rgbasm 5 .
ENDR These two bits are mutually exclusive.
.It Cm LONG Ar Address
; Assertions Address this section must be placed at.
This must either be valid for the section's
LONG NumberOfAssertions .Ar Type
(as affected by flags like
REPT NumberOfAssertions .Fl t
or
LONG SourceFile ; ID of the source file node (for printing the failure). .Fl d
in
LONG LineNo ; Line at which the assertion was created. .Xr rgblink 1 ) ,
or -1 to indicate that the linker should automatically decide
LONG Offset ; Offset into the section where the assertion is located. .Pq the section is Dq floating .
.It Cm LONG Ar Bank
LONG SectionID ; Index within the file of the section in which PC is ID of the bank this section must be placed in.
; located, or -1 if defined outside a section. This must either be valid for the section's
.Ar Type
LONG PCOffset ; PC's offset into the above section. (with the same caveats as for the
; Used because the section may be floating, so PC's value .Ar Address ) ,
; is not known to RGBASM. or -1 to indicate that the linker should automatically decide.
.It Cm BYTE Ar Alignment
BYTE Type ; 0 = Prints the message but allows linking to continue How many bits of the section's address should be equal to
; 1 = Prints the message and evaluates other assertions, .Ar AlignOfs ,
; but linking fails afterwards starting from the least-significant bit.
; 2 = Prints the message and immediately fails linking .It Cm LONG Ar AlignOfs
Alignment offset.
LONG RPNSize ; Size of the RPN expression's buffer. Must be strictly less than
.Ql 1 << Ar Alignment .
BYTE RPN[RPNSize] ; RPN expression, same as patches. Assert fails if == 0. .It Cm IF Ar Type No \(eq 2 || Ar Type No \(eq 3
If the section has ROM type, it contains data.
STRING Message ; A message displayed when the assert fails. If set to .Pp
; the empty string, a generic message is printed instead. .Bl -tag -width Ds -compact
.It Cm BYTE Ar Data Ns Bq Size
ENDR The section's raw data.
.Ed Bytes that will be patched over must be present, even though their contents will be overwritten.
.Ss RPN DATA .It Cm LONG Ar NumberOfPatches
Expressions in the object file are stored as RPN. How many patches must be applied to this section's
This is an expression of the form .Ar Data .
.Dq 2 5 + . .It Cm REPT Ar NumberOfPatches
This will first push the value .Bl -tag -width Ds -compact
.Do 2 Dc to the stack, then .It Cm LONG Ar NodeID
Context in which the patch was defined.
.It Cm LONG Ar LineNo
Line number in the context at which the patch was defined.
.It Cm LONG Ar Offset
Offset within the section's
.Ar Data
at which the patch should be applied.
Must not be greater than the section's
.Ar Size
minus the patch's size
.Pq see Ar Type No below .
.It Cm LONG Ar PCSectionID
ID of the section in which PC is located.
(This is usually the same section within which the patch is applied, except for e.g.\&
.Ql LOAD
blocks, see
.Do RAM code Dc in Xr rgbasm 5 . )
.It Cm LONG Ar PCOffset
Offset of the PC symbol within the section designated by
.Ar PCSectionID .
It is expected that PC points to the instruction's first byte for instruction operands (i.e.\&
.Ql jp @
must be an infinite loop), and to the patch's first byte otherwise
.Ql ( db ,
.Ql dw ,
.Ql dl ) .
.It Cm BYTE Ar Type
.Bl -column "Value" -compact
.It Sy Value Ta Sy Meaning
.It 0 Ta Single-byte patch
.It 1 Ta Little-endian two-byte patch
.It 2 Ta Little-endian four-byte patch
.It 3 Ta Single-byte Ql jr
patch; PC + 2 will be subtracted from the patch's value (i.e.\&
.Ql jr @
must be the infinite loop
.Ql 18 FE ) .
.El
.It Cm LONG Ar RPNSize
Size of the
.Ar RPNExpr
below.
.It Cm BYTE Ar RPNExpr Ns Bq RPNSize
The patch's value, encoded as an RPN expression
.Pq see Sx RPN EXPRESSIONS .
.El
.It Cm ENDR
.El
.It Cm ENDC
.El
.El
.Ss Assertions
.Bl -tag -width Ds -compact
.It Cm LONG Ar NumberOfAssertions
How many assertions this object file contains.
.It Cm REPT Ar NumberOfAssertions
Assertions are essentially patches with a message.
.Pp
.Bl -tag -width Ds -compact
.It Cm LONG Ar NodeID
Context in which the assertion was defined.
.It Cm LONG Ar LineNo
Line number in the context at which the assertion was defined.
.It Cm LONG Ar Offset
Unused leftover from the patch structure.
.It Cm LONG Ar PCSectionID
ID of the section in which PC is located.
.It Cm LONG Ar PCOffset
Offset of the PC symbol within the section designated by
.Ar PCSectionID .
.It Cm BYTE Ar Type
Describes what should happen if the expression evaluates to a non-zero value.
.Bl -column "Value" -compact
.It Sy Value Ta Sy Meaning
.It 0 Ta Print a warning message, and continue linking normally.
.It 1 Ta Print an error message, so linking will fail, but allow other assertions to be evaluated.
.It 2 Ta Print a fatal error message, and abort immediately.
.El
.It Cm LONG Ar RPNSize
Size of the
.Ar RPNExpr
below.
.It Cm BYTE Ar RPNExpr Ns Bq RPNSize
The patch's value, encoded as an RPN expression
.Pq see Sx RPN EXPRESSIONS .
.It Cm STRING Ar Message
The message displayed if the expression evaluates to a non-zero value.
If empty, a generic message is displayed instead.
.El
.It Cm ENDR
.El
.Ss RPN EXPRESSIONS
Expressions in the object file are stored as RPN, or
.Dq Reverse Polish Notation ,
which is a notation that allows computing arbitrary expressions with just a simple stack.
For example, the expression
.Ql 2 5 -
will first push the value
.Dq 2
to the stack, then
.Dq 5 . .Dq 5 .
The The
.Do + Dc operator pops two arguments from the stack, adds them, and then pushes the result on the stack, effectively replacing the two top arguments with their sum. .Ql -
In the RGB format, RPN expressions are stored as operator pops two arguments from the stack, subtracts them, and then pushes back the result
.Ar BYTE Ns s .Pq Dq \-3
with some bytes being special prefixes for integers and symbols. on the stack.
.Bl -column -offset indent "Sy String" "Sy String" A well-formed RPN expression never tries to pop from an empty stack, and leaves exactly one value in it at the end.
.Pp
RGBDS encodes RPN expressions as an array of
.Cm BYTE Ns s .
The first byte encodes either an operator, or a literal, which consumes more
.Cm BYTE Ns s
after it.
.Bl -column -offset Ds "Value"
.It Sy Value Ta Sy Meaning .It Sy Value Ta Sy Meaning
.It Li $00 Ta Li + operator .It Li $00 Ta Addition operator Pq Ql +
.It Li $01 Ta Li - operator .It Li $01 Ta Subtraction operator Pq Ql -
.It Li $02 Ta Li * operator .It Li $02 Ta Multiplication operator Pq Ql *
.It Li $03 Ta Li / operator .It Li $03 Ta Division operator Pq Ql /
.It Li $04 Ta Li % operator .It Li $04 Ta Modulo operator Pq Ql %
.It Li $05 Ta Li unary - .It Li $05 Ta Negation Pq unary Ql -
.It Li $06 Ta Li ** operator .It Li $06 Ta Exponent operator Pq Ql **
.It Li $10 Ta Li \&| operator .It Li $10 Ta Bitwise OR operator Pq Ql \&|
.It Li $11 Ta Li & operator .It Li $11 Ta Bitwise AND operator Pq Ql &
.It Li $12 Ta Li ^ operator .It Li $12 Ta Bitwise XOR operator Pq Ql ^
.It Li $13 Ta Li unary ~ .It Li $13 Ta Bitwise complement operator Pq unary Ql ~
.It Li $21 Ta Li && comparison .It Li $21 Ta Logical AND operator Pq Ql &&
.It Li $22 Ta Li || comparison .It Li $22 Ta Logical OR operator Pq Ql ||
.It Li $23 Ta Li unary \&! .It Li $23 Ta Logical complement operator Pq unary Ql \&!
.It Li $30 Ta Li == comparison .It Li $30 Ta Equality operator Pq Ql ==
.It Li $31 Ta Li != comparison .It Li $31 Ta Non-equality operator Pq Ql !=
.It Li $32 Ta Li > comparison .It Li $32 Ta Greater-than operator Pq Ql >
.It Li $33 Ta Li < comparison .It Li $33 Ta Less-than operator Pq Ql <
.It Li $34 Ta Li >= comparison .It Li $34 Ta Greater-than-or-equal operator Pq Ql >=
.It Li $35 Ta Li <= comparison .It Li $35 Ta Less-than-or-equal operator Pq Ql <=
.It Li $40 Ta Li << operator .It Li $40 Ta Left shift operator Pq Ql <<
.It Li $41 Ta Li >> operator .It Li $41 Ta Arithmetic/signed right shift operator Pq Ql >>
.It Li $42 Ta Li >>> operator .It Li $42 Ta Logical/unsigned right shift operator Pq Ql >>>
.It Li $50 Ta Li BANK(symbol) , .It Li $50 Ta Fn BANK symbol ,
a followed by the
.Ar LONG .Ar symbol Ap s Cm LONG
Symbol ID follows, where -1 means PC ID.
.It Li $51 Ta Li BANK(section_name) , .It Li $51 Ta Fn BANK section ,
a null-terminated string follows. followed by the
.It Li $52 Ta Li Current BANK() .Ar section Ap s Cm STRING
.It Li $53 Ta Li SIZEOF(section_name) , name.
a null-terminated string follows. .It Li $52 Ta PC's Fn BANK Pq i.e. Ql BANK(@) .
.It Li $54 Ta Li STARTOF(section_name) , .It Li $53 Ta Fn SIZEOF section ,
a null-terminated string follows. followed by the
.It Li $60 Ta Li HRAMCheck . .Ar section Ap s Cm STRING
Checks if the value is in HRAM, ANDs it with 0xFF. name.
.It Li $61 Ta Li RSTCheck . .It Li $54 Ta Fn STARTOF section ,
Checks if the value is a RST vector, ORs it with 0xC7. followed by the
.It Li $80 Ta Ar LONG .Ar section Ap s Cm STRING
integer follows. name.
.It Li $81 Ta Ar LONG .It Li $60 Ta Ql ldh
symbol ID follows. check.
Checks if the value is a valid
.Ql ldh
operand
.Pq see Do Load Instructions Dc in Xr gbz80 7 ,
i.e. that it is between either $00 and $FF, or $FF00 and $FFFF, both inclusive.
The value is then ANDed with $00FF
.Pq Ql & $FF .
.It Li $61 Ta Ql rst
check.
Checks if the value is a valid
.Ql rst
.Pq see Do RST vec Dc in Xr gbz80 7
vector, that is one of $00, $08, $10, $18, $20, $28, $30, or $38.
The value is then ORed with $C7
.Pq Ql \&| $C7 .
.It Li $80 Ta Integer literal.
Followed by the
.Cm LONG
integer.
.It Li $81 Ta A symbol's value.
Followed by the symbol's
.Cm LONG
ID.
.El .El
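The stack evaluation described in this section can be sketched as below. This is a hedged illustration covering only a subset of the table (some arithmetic and bitwise operators, the `ldh` and `rst` checks, and the $80/$81 literals), not the linker's actual implementation. The function name `eval_rpn` and the `symbols` mapping (symbol ID to value) are hypothetical, LONG operands are assumed to be signed, and Python's unbounded integers stand in for whatever integer width the real tools use.

```python
import struct

def eval_rpn(expr, symbols=None):
    """Evaluate an RPN byte string; `symbols` is a hypothetical ID -> value map."""
    symbols = symbols or {}
    binary_ops = {
        0x00: lambda a, b: a + b,  # addition
        0x01: lambda a, b: a - b,  # subtraction
        0x02: lambda a, b: a * b,  # multiplication
        0x10: lambda a, b: a | b,  # bitwise OR
        0x11: lambda a, b: a & b,  # bitwise AND
        0x12: lambda a, b: a ^ b,  # bitwise XOR
    }
    stack = []
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op in (0x80, 0x81):
            # Both literals are followed by a little-endian LONG.
            (operand,) = struct.unpack_from("<i", expr, pos)
            pos += 4
            stack.append(operand if op == 0x80 else symbols[operand])
        elif op in binary_ops:
            rhs = stack.pop()  # the top of the stack is the right-hand operand
            lhs = stack.pop()
            stack.append(binary_ops[op](lhs, rhs))
        elif op == 0x05:  # negation
            stack.append(-stack.pop())
        elif op == 0x13:  # bitwise complement
            stack.append(~stack.pop())
        elif op == 0x60:  # ldh check, then AND with $FF
            value = stack.pop()
            if not (0x00 <= value <= 0xFF or 0xFF00 <= value <= 0xFFFF):
                raise ValueError("not a valid ldh operand")
            stack.append(value & 0xFF)
        elif op == 0x61:  # rst check, then OR with $C7
            value = stack.pop()
            if value not in (0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38):
                raise ValueError("not a valid rst vector")
            stack.append(value | 0xC7)
        else:
            raise NotImplementedError(f"operator ${op:02X} is outside this sketch")
    if len(stack) != 1:
        raise ValueError("malformed expression: expected exactly one value left")
    return stack[0]
```

For instance, the bytes `80 02 00 00 00 80 05 00 00 00 01` encode the example expression `2 5 -`. Popping from an empty stack raises `IndexError`, which corresponds to the malformed case mentioned above.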
.Sh SEE ALSO .Sh SEE ALSO
.Xr rgbasm 1 , .Xr rgbasm 1 ,