Checkpoint
parent
d6a34075a5
commit
07544f49f2
27 changed files with 2013 additions and 440 deletions
253
STATUS.md
|
|
@ -72,11 +72,13 @@ which runs correctly under MAME (apple2gs).
|
||||||
native object format) for round-tripping with classic dev tools.
|
native object format) for round-tripping with classic dev tools.
|
||||||
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
|
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
|
||||||
libgcc into linkable objects.
|
libgcc into linkable objects.
|
||||||
- `scripts/smokeTest.sh` runs 99 end-to-end checks (scalar ops,
|
- `scripts/smokeTest.sh` runs 102 end-to-end checks (scalar ops,
|
||||||
control flow, calling conventions, MAME execution, regressions,
|
control flow, calling conventions, MAME execution, regressions,
|
||||||
link816 bss-base safety, iigs/toolbox.h compile-check, standalone
|
link816 bss-base safety + weak-symbol resolution +
|
||||||
runtime headers, AsmPrinter peepholes for STZ / PEA / PEI —
|
heap_end-vs-heap_start sanity, iigs/toolbox.h compile-check,
|
||||||
single-STA, shared-LDA-multi-STA, and DPF0-forwarding cases).
|
standalone runtime headers, AsmPrinter peepholes for STZ /
|
||||||
|
PEA / PEI — single-STA, shared-LDA-multi-STA, and DPF0-
|
||||||
|
forwarding cases — malloc/free coalesce ordering).
|
||||||
Currently 100% pass at -O2 throughout.
|
Currently 100% pass at -O2 throughout.
|
||||||
|
|
||||||
**ABI:**
|
**ABI:**
|
||||||
|
|
@ -131,11 +133,10 @@ Two open bugs tracked:
|
||||||
both pass. Workaround comments in build.sh / smokeTest.sh
|
both pass. Workaround comments in build.sh / smokeTest.sh
|
||||||
removed.
|
removed.
|
||||||
|
|
||||||
The `__attribute__((noinline,optnone))` markers on iterative
|
The `__attribute__((noinline,optnone))` defenses on iterative
|
||||||
qsort, RPN `runAll`, and expression-parser `runAll` are kept
|
qsort / RPN `runAll` / expression-parser `runAll` were
|
||||||
for now as defense; with the new backend fixes they may no
|
subsequently dropped; the smoke now compiles them at plain
|
||||||
longer be required, but removing them needs case-by-case
|
`-O2` without escape hatches.
|
||||||
verification.
|
|
||||||
|
|
||||||
The W65816 backend assembler now supports all common indirect
|
The W65816 backend assembler now supports all common indirect
|
||||||
addressing modes (`(dp)`, `(dp),Y`, `(dp,X)`, `(d,s),Y`,
|
addressing modes (`(dp)`, `(dp),Y`, `(dp,X)`, `(d,s),Y`,
|
||||||
|
|
@ -208,18 +209,45 @@ sidecar bytes.
|
||||||
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
|
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
|
||||||
correct for negative offsets like `arr[i-1]`.
|
correct for negative offsets like `arr[i-1]`.
|
||||||
|
|
||||||
- **(d,s),y for stack-local pointer dereferences uses DBR**, so
|
- **Pointer-deref bank policy is now split-by-syntax** (FIXED):
|
||||||
user code that switches DBR (e.g. `pha;plb` to bank 2 to reach
|
`*p` (where `p` is a runtime pointer / local-or-arg vreg) lowers
|
||||||
IIgs hardware) must not call into a function that takes the
|
via `LDAptr / STAptr / STBptr` to `[$E0],Y` indirect-LONG with
|
||||||
address of one of its locals — the callee's `*p = v` will write
|
the bank byte at `$E2` forced to 0 — DBR-independent. The
|
||||||
to the wrong bank. Documented; no compiler-side mitigation
|
`*(volatile uint16 *)0x5000 = v` MMIO idiom (const-int pointer)
|
||||||
beyond the existing DPF0 fake-physreg routing for the i64-return
|
is matched by a separate TableGen pattern that lowers straight
|
||||||
high half. Workaround: inline pointer-arg helpers so the writes
|
to `STAabs` (DBR-relative) so the smoke tests' bank-2 write
|
||||||
stay in the caller's frame using stack-rel direct stores. The
|
path still works. Two tracked issues this resolved:
|
||||||
W65816 only has three DBR-independent addressing modes
|
(a) PHI-elim was eliding the inserter's `COPY $a = ptr_vreg`
|
||||||
(abs_long, abs_long,X, [dp],Y) — none cheap to retrofit into
|
when the loop body had multiple Acc16 PHIs competing for A —
|
||||||
the current pointer-deref lowering (+5 bytes minimum per access).
|
the inserter now spills the pointer to a fresh stack slot and
|
||||||
Real fix needs PHB/PLB at noinline-pointer-callee entry/exit.
|
reloads via LDAfi to keep RA honest; sumTable now correct.
|
||||||
|
(b) pointer staging through `[$E0]` is bank-0 only, so
|
||||||
|
switchToBank2 + helper-with-local-ptr no longer corrupts data
|
||||||
|
in the wrong bank. See `feedback_dbr_ptr_deref_spill.md`.
|
||||||
|
|
||||||
|
- **Greedy regalloc fails on long-arg call chains** — a function
|
||||||
|
that strings ~7+ independent `helper(longArg1, longArg2)` calls
|
||||||
|
overflows greedy at -O1+ ("ran out of registers during register
|
||||||
|
allocation"). Same root issue as softDouble's old -O2 hold-out.
|
||||||
|
Threshold raised somewhat by expanding IMG slots from 8 to 16
|
||||||
|
(now backed by DP $C0..$DE) — most "normal-looking" mixed-arity
|
||||||
|
workloads now compile, but pathological pressure (many i32+ args
|
||||||
|
+ bitmask SETCC chain) still fails. Workarounds (in order of
|
||||||
|
preference): mark the heaviest helper `__attribute__((noinline))`
|
||||||
|
to reduce caller pressure; `-mllvm -regalloc=fast` for that TU;
|
||||||
|
or `__attribute__((optnone))` on the affected function. A proper
|
||||||
|
fix needs either a custom greedy→fast fallback in
|
||||||
|
`W65816TargetMachine::createTargetRegisterAllocator` or a smarter
|
||||||
|
spill-placement pre-RA pass.
|
||||||
|
|
||||||
|
- **Bank-0 size limit (~48KB)** — the runtime + program must fit in
|
||||||
|
$1000-$BFFF (text+rodata) plus $D000-$DFFF (LC1 for rodata-spill
|
||||||
|
and BSS). Past that, link816 hard-fails because text would
|
||||||
|
cross the IO window. In practice this is rarely hit now that
|
||||||
|
link816 has `--gc-sections` (default ON, see Recently Fixed)
|
||||||
|
which drops unreachable functions: a minimal program shrinks
|
||||||
|
from ~43KB (whole runtime) to ~1.5KB. Programs that genuinely
|
||||||
|
use most of the runtime can still hit the limit.
|
||||||
|
|
||||||
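The first workaround listed for the greedy-regalloc entry above can be sketched as follows. This is a minimal illustration, not code from the repository: `mixLongs` and `sumChain` are hypothetical names, and the arithmetic inside the helper is arbitrary. Only the placement of `__attribute__((noinline))` on the heavy helper reflects the text.

```c
/* Hypothetical helper standing in for the "heaviest" long-arg callee.
   noinline keeps its body out of the caller, so the caller only has to
   hold call arguments and results, not the helper's temporaries. */
__attribute__((noinline))
static unsigned long mixLongs(unsigned long a, unsigned long b) {
    return (a ^ (b << 3)) + (a >> 2);
}

/* Seven independent two-long-argument calls, the shape the entry
   describes as overflowing greedy regalloc when everything inlines. */
static unsigned long sumChain(unsigned long x) {
    unsigned long total = 0;
    total += mixLongs(x, 1UL);
    total += mixLongs(x, 2UL);
    total += mixLongs(x, 3UL);
    total += mixLongs(x, 4UL);
    total += mixLongs(x, 5UL);
    total += mixLongs(x, 6UL);
    total += mixLongs(x, 7UL);
    return total;
}
```

If the attribute alone is not enough, the text's other two options (`-mllvm -regalloc=fast` for the translation unit, or `optnone` on the caller) apply to the same function shape.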
## Recently fixed
|
## Recently fixed
|
||||||
|
|
||||||
|
|
@ -288,24 +316,173 @@ sidecar bytes.
|
||||||
also removes two PHA/PLA save-restore wraps around the LDA #0
|
also removes two PHA/PLA save-restore wraps around the LDA #0
|
||||||
(STZ doesn't touch A, so the wraps are unnecessary).
|
(STZ doesn't touch A, so the wraps are unnecessary).
|
||||||
|
|
||||||
|
- **libgcc.s `lda dp; pha` -> `pei dp`** — 2 sites in __divhi3 /
|
||||||
|
__modhi3 where the loaded A is dead after the push. PEI
|
||||||
|
doesn't touch A, saves 1 byte each.
|
||||||
|
|
||||||
|
- **W65816StackSlotCleanup Pass 1c skip-list extended** — added
|
||||||
|
STAabs / STA8abs / STAptr / STBptr / STAptrOff / STBptrOff and
|
||||||
|
ADJCALLSTACKDOWN to the A-transparent list. Lets the redundant-
|
||||||
|
CMP-after-A-modifier elimination see through more pseudo
|
||||||
|
stores and the call-stack-down pseudo. Saves 8 bytes in math.o.
|
||||||
|
(ADJCALLSTACKUP is NOT transparent — when PEI doesn't process
|
||||||
|
it, AsmPrinter emits a TSC/CLC/ADC/TCS that clobbers A.)
|
||||||
|
|
||||||
|
- **crt0.s `lda #0; sta` -> `stz`** — IRQ-disable block and the
|
||||||
|
BSS-zero loop both used `.byte 0xa9, 0x00 ; sta` raw-byte
|
||||||
|
workarounds for `lda #0` (the assembler emits a 16-bit immediate
|
||||||
|
in M=8, mis-encoding it). `stz` works in M=8 (stores 1 byte) and
|
||||||
|
doesn't touch A — both `.byte` workarounds removed; saves 4 bytes
|
||||||
|
in crt0.o.
|
||||||
|
|
||||||
|
- **Runtime correctness pass — five real bugs fixed:**
|
||||||
|
- `free()` coalesce: when a freed block was absorbed into a
|
||||||
|
lower-address neighbour (`bEnd == a` path), the absorbed entry
|
||||||
|
was left in the free list overlapping the extended one. A
|
||||||
|
follow-on malloc could hand out the same memory to two
|
||||||
|
callers. Fix: track outer-loop predecessor and excise the
|
||||||
|
absorbed entry. Smoke #100 added.
|
||||||
|
- `sqrt(-0.0)` returned NaN; should return -0.0 per IEEE-754.
|
||||||
|
The sign-bit check fired before the zero check. Fix: mask
|
||||||
|
sign bit when testing for zero.
|
||||||
|
- `log(0)` returned NaN; should return -Infinity (pole error).
|
||||||
|
Same sign-bit-vs-zero ordering issue; both ±0 now return
|
||||||
|
`-1.0/0.0`.
|
||||||
|
- `snprintf(buf, 0, ...)` wrote `'\0'` to `buf[-1]` (one byte
|
||||||
|
BEFORE the buffer). C99 says n=0 must not touch the buffer.
|
||||||
|
Fix: set `gEnd = NULL` for n=0 so neither the normal nor the
|
||||||
|
truncation NUL-write path fires. Smoke #76 extended.
|
||||||
|
- `malloc(>~32KB)` and `calloc(n, m)` had silent integer overflow
|
||||||
|
on size_t (16-bit), wrapping to small values and handing out
|
||||||
|
tiny allocations claiming huge sizes. Bumped malloc to bail
|
||||||
|
above 0x7FF0 (heap is at most ~32KB anyway) and made calloc
|
||||||
|
overflow-check before multiplying.
|
||||||
|
|
||||||
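The `sqrt(-0.0)` and `log(0)` fixes above both come down to testing for zero with the sign bit masked off before looking at the sign. A minimal sketch of that ordering, assuming a 64-bit IEEE-754 `double`; the helper names (`doubleBits`, `isZeroEitherSign`, `sqrtSpecialCase`) are hypothetical and not taken from the runtime:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw IEEE-754 bit pattern of a double. */
static uint64_t doubleBits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* True for both +0.0 and -0.0: clear bit 63 (the sign) before comparing. */
static int isZeroEitherSign(double x) {
    return (doubleBits(x) & 0x7FFFFFFFFFFFFFFFULL) == 0;
}

/* True when the sign bit is set, which includes -0.0. */
static int signBitSet(double x) {
    return (int)(doubleBits(x) >> 63);
}

/* Special-case dispatch for a sqrt-like routine, zero check first:
   0 = return the input unchanged (+0.0 or -0.0),
   1 = domain error (negative input, result is NaN),
   2 = fall through to the normal computation. */
static int sqrtSpecialCase(double x) {
    if (isZeroEitherSign(x)) return 0;
    if (signBitSet(x)) return 1;
    return 2;
}
```

Swapping the two `if` statements reproduces the reported bug: `-0.0` has its sign bit set, so it would take the domain-error path before the zero check ever ran.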
|
- **Removed** dead `runtime/src/softDouble.s` (a stub from before
|
||||||
|
`softDouble.c` was implemented; the build script doesn't reference
|
||||||
|
it but it was confusing to leave around).
|
||||||
|
|
||||||
|
- **inttypes.h PRId64 / PRIu64 / PRIx64** documented as
|
||||||
|
unsupported in the runtime's printf — the macros expand to
|
||||||
|
`"lld"`/`"llu"`/`"llx"` but the formatter only knows the `l`
|
||||||
|
length modifier, not `ll`, so the format prints literally and
|
||||||
|
the va_list misaligns. Use `PRId32` etc. for now.
|
||||||
|
|
||||||
|
- **More runtime fixes (round 2):**
|
||||||
|
- `fputs(s, stream)` was forwarding to `puts(s)`, which appends a
|
||||||
|
newline. C says fputs MUST NOT add one. Direct char-by-char
|
||||||
|
write now.
|
||||||
|
- `exit(code)` never invoked the registered `atexit` handler.
|
||||||
|
C99 7.20.4.3 requires it. Now runs the single-slot handler
|
||||||
|
(with re-entry guard) before the BRK.
|
||||||
|
- `printf("%f", -0.0)` printed `0.000000` instead of `-0.000000`
|
||||||
|
because `if (v < 0)` (a `__ltdf2` call) returns false for
|
||||||
|
negative zero. Switched to the IEEE-754 sign-bit test that
|
||||||
|
snprintf already uses.
|
||||||
|
- `vfprintf` was missing entirely (neither declared in stdio.h
|
||||||
|
nor implemented). Added a thin wrapper around vprintf.
|
||||||
|
|
||||||
|
- **link816 weak-symbol resolution:** the linker previously used
|
||||||
|
"last def wins" with no regard for STB_GLOBAL vs STB_WEAK. When
|
||||||
|
a user provided a strong override of a weak libc stub (e.g.
|
||||||
|
`putchar`), it worked only by link-order luck — reversing the
|
||||||
|
order let the weak stub silently overwrite the strong def.
|
||||||
|
Resolution now follows the binding: strong beats weak (any order), strong + strong
|
||||||
|
errors out, weak + weak picks the first. Smoke #100 added.
|
||||||
|
|
||||||
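The three resolution rules in the weak-symbol entry above can be written as a small decision function. This is a sketch of the rules as stated, not link816's actual code; `Binding`, `Resolution`, and `resolveDef` are hypothetical names, with `BIND_WEAK` / `BIND_GLOBAL` standing in for `STB_WEAK` / `STB_GLOBAL`.

```c
/* Stand-ins for the ELF bindings named in the text. */
typedef enum { BIND_WEAK, BIND_GLOBAL } Binding;

/* What to do when a second definition of an already-defined symbol shows up. */
typedef enum { KEEP_OLD, TAKE_NEW, DUPLICATE_ERROR } Resolution;

static Resolution resolveDef(Binding existing, Binding incoming) {
    if (existing == BIND_GLOBAL && incoming == BIND_GLOBAL) {
        return DUPLICATE_ERROR;   /* strong + strong errors out */
    }
    if (existing == BIND_WEAK && incoming == BIND_GLOBAL) {
        return TAKE_NEW;          /* strong replaces an earlier weak def */
    }
    /* strong followed by weak keeps the strong def;
       weak followed by weak keeps the first one seen. */
    return KEEP_OLD;
}
```

Because the outcome depends only on the two bindings, the result no longer changes when the objects are linked in the opposite order, which is the property the old "last def wins" scheme lacked.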
|
- **More runtime fixes (round 3):**
|
||||||
|
- `writeHex` / `emitHex` had a stack buffer overrun
|
||||||
|
(`char buf[5]` but `printf("%08x", ...)` would write 8 bytes).
|
||||||
|
On 16-bit `unsigned int`, max useful width is 4 — buf shrunk
|
||||||
|
to 4 and width is now capped.
|
||||||
|
- `writeDec` / `writeSignedLong` / `emitDec` / `emitSignedLong`
|
||||||
|
used `-n` on signed input, which overflows for INT_MIN /
|
||||||
|
LONG_MIN (UB). All four switched to unsigned-negation
|
||||||
|
(`0u - (unsigned)n`) for correctness and to keep an
|
||||||
|
optimizer-aware compiler from exploiting the UB.
|
||||||
|
- `atoi` / `atol` / `strtol` / `strtoul` likewise built the
|
||||||
|
parsed magnitude in a signed accumulator and negated at the
|
||||||
|
end — same UB on the boundary value. All switched to
|
||||||
|
unsigned magnitude + unsigned-negation cast.
|
||||||
|
- `link816 parseInt` / `omfEmit parseInt` silently truncated
|
||||||
|
addresses > 24 bits to `uint32_t` low bits — `--text-base
|
||||||
|
0x100000000` would silently wrap to 0. Both now reject
|
||||||
|
out-of-range addresses with a clear error.
|
||||||
|
|
||||||
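The unsigned-negation idiom used in the round 3 fixes above (`0u - (unsigned)n`) can be sketched like this. `magnitudeOf` and `formatDec` are hypothetical names chosen for illustration; the runtime's own `writeDec` / `emitDec` are not reproduced here.

```c
#include <limits.h>

/* Magnitude of a signed int without signed-overflow UB.
   Converting to unsigned is well defined (modulo 2^N), and 0u - u then
   gives the magnitude even for INT_MIN, where plain -n would overflow. */
static unsigned int magnitudeOf(int n) {
    unsigned int u = (unsigned int)n;
    return (n < 0) ? 0u - u : u;
}

/* Hypothetical decimal formatter built on magnitudeOf.
   buf must have room for the sign, the digits, and the NUL.
   Returns the number of characters written, excluding the NUL. */
static int formatDec(int n, char *buf) {
    char digits[3 * sizeof(int) + 1];   /* enough for any int's digits */
    int count = 0;
    int out = 0;
    unsigned int mag = magnitudeOf(n);

    do {                                /* emit digits least-significant first */
        digits[count++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0);

    if (n < 0) buf[out++] = '-';
    while (count > 0) buf[out++] = digits[--count];
    buf[out] = '\0';
    return out;
}
```

The same shape applies to the parsing side (`atoi` / `strtol`): accumulate the magnitude in an unsigned variable and apply the sign once at the end.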
|
- **More runtime fixes (round 4):**
|
||||||
|
- `pow(x, y)` computed `n = -n` for the integer-y branch when
|
||||||
|
yi was INT_MIN (-32768); same signed-overflow UB pattern as
|
||||||
|
the print functions. Switched to unsigned magnitude.
|
||||||
|
- Added `perror(prefix)` — was missing from the runtime; common
|
||||||
|
pattern in portable code that reports I/O failure via
|
||||||
|
`errno + strerror`. Declared in stdio.h, implemented as
|
||||||
|
char-by-char emit through putchar (no fprintf dependency).
|
||||||
|
|
||||||
|
- **link816 `__heap_end` was hardcoded at $BF00**, ignoring where
|
||||||
|
`__heap_start` actually ended up. When BSS got auto-relocated
|
||||||
|
into LC1 ($D000+), heap_start ended up > heap_end and malloc
|
||||||
|
immediately returned NULL on every call — silently bricking any
|
||||||
|
program that allocated dynamic memory after the runtime grew
|
||||||
|
past the default-bss threshold. Heap_end now picks
|
||||||
|
$BF00 / $E000 based on where heap_start lands (and skips the IO
|
||||||
|
window if heap_start would have landed in $C000-$CFFF).
|
||||||
|
Smoke #102 added.
|
||||||
|
|
||||||
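The `__heap_end` selection described above can be sketched as a function of where `__heap_start` lands. The addresses $BF00, $C000-$CFFF, $D000 and $E000 come from the text; the `HeapRange` type, the function names, and the handling of a start in $BF00-$BFFF (treated here like the IO-window case) are assumptions for illustration, not link816's actual logic.

```c
#include <stdint.h>

typedef struct {
    uint16_t start;   /* first usable heap byte */
    uint16_t end;     /* one past the last usable heap byte */
} HeapRange;

/* Pick the heap range from the address where BSS ends. */
static HeapRange pickHeapRange(uint16_t heapStart) {
    HeapRange range;
    if (heapStart < 0xBF00u) {
        /* Heap sits below the IO window: cap it at $BF00. */
        range.start = heapStart;
        range.end = 0xBF00u;
    } else {
        /* Heap start is at or past $BF00: use LC1 instead, skipping the
           IO window if the start would have landed before $D000. */
        range.start = (heapStart < 0xD000u) ? 0xD000u : heapStart;
        range.end = 0xE000u;
    }
    return range;
}

/* Usable bytes in a range; 0 when end <= start (the old failure mode). */
static uint16_t heapBytes(HeapRange range) {
    return (range.end > range.start) ? (uint16_t)(range.end - range.start) : 0;
}
```

With the old hardcoded end, a start of $D200 gives `heapBytes` of 0, which matches the "malloc immediately returned NULL" symptom.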
|
- **link816 rodata auto-skips IIgs IO window** ($C000-$CFFF). When
|
||||||
|
text+rodata grew past 0xC000 the rodata bytes silently corrupted
|
||||||
|
at runtime — string literals in the IO range read back as
|
||||||
|
hardware register values, breaking strcmp / strstr / printf / etc.
|
||||||
|
Now: rodata that would land in or cross $C000-$CFFF auto-skips
|
||||||
|
to $D000. Init_array gets the same treatment. Text that would
|
||||||
|
cross IO is hard-rejected at link time (no auto-fix possible —
|
||||||
|
PC fetches in IO would read hardware registers). This was the
|
||||||
|
root cause of the "tan/tanf triggers layout-sensitive failure"
|
||||||
|
symptom listed in older STATUS notes.
|
||||||
|
|
||||||
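The placement rule for the IO window described above amounts to an interval-overlap test. A minimal sketch, assuming half-open ranges `[base, base + size)`; the function names are hypothetical and the real linker may structure this differently.

```c
#include <stdint.h>

#define IO_START 0xC000u   /* first byte of the IIgs IO window */
#define IO_END   0xD000u   /* first byte past the IO window */

/* Nonzero when [base, base + size) touches any byte of $C000-$CFFF. */
static int overlapsIo(uint32_t base, uint32_t size) {
    return size != 0 && base < IO_END && base + size > IO_START;
}

/* rodata (and init_array): anything that would land in or cross the
   window is moved to start at $D000; otherwise it stays where it was. */
static uint32_t placeRodata(uint32_t base, uint32_t size) {
    return overlapsIo(base, size) ? IO_END : base;
}

/* text: there is no auto-fix, so the caller should reject the link
   when this returns 0. */
static int textPlacementOk(uint32_t base, uint32_t size) {
    return !overlapsIo(base, size);
}
```

A text segment that ends exactly at $C000 is still accepted, since its last byte is at $BFFF.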
|
- **runInMame skips writes to IO window** during the binary load.
|
||||||
|
Without this, the zero-padding in the rodata-skip gap would
|
||||||
|
clobber soft switches (e.g. the LC1 RAM enable that crt0 sets
|
||||||
|
via $C083) when the loader naively wrote the entire image
|
||||||
|
byte-by-byte to memory.
|
||||||
|
|
||||||
|
- **link816 `--gc-sections` (default ON)** — discards sections not
|
||||||
|
reachable from the entry point (`__start` / `_start` / `main`
|
||||||
|
for the canonical crt0 setup) plus all `.init_array` sections.
|
||||||
|
Built on `-ffunction-sections` so each function is in its own
|
||||||
|
section. A minimal program with full runtime linked shrinks
|
||||||
|
from ~43KB to ~1.5KB. Adding `tan/tanf` to math.c (which
|
||||||
|
caused the latent layout-sensitive failure described above)
|
||||||
|
no longer pushes any test past the bank-0 limit. Tests that
|
||||||
|
intentionally check unreachable symbols pass `--no-gc-sections`
|
||||||
|
to opt out.
|
||||||
|
|
||||||
|
- **`fwrite(stdout, ...)` was a stub returning 0** even though
|
||||||
|
`stdout` has a working `putchar` route. Now actually writes
|
||||||
|
through `putchar` for stdout/stderr (only). Also gained the
|
||||||
|
same `size * nmemb` overflow guard as `calloc`.
|
||||||
|
|
||||||
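The `size * nmemb` overflow guard shared by `calloc` and `fwrite` in the entries above can be sketched as a divide-before-multiply check. The 16-bit `size_t` and the 0x7FF0 malloc limit come from the text; `sizeMulFits` and `requestAllowed` are hypothetical names, and `uint16_t` is used so the sketch behaves the same on a host with a wider `size_t`.

```c
#include <stdint.h>

#define TARGET_SIZE_MAX 0xFFFFu   /* 16-bit size_t on the target, per the text */
#define MALLOC_LIMIT    0x7FF0u   /* malloc bails above this, per the text */

/* Returns 1 and stores n * m in *total when the product fits in 16 bits;
   returns 0 without touching *total when it would wrap. */
static int sizeMulFits(uint16_t n, uint16_t m, uint16_t *total) {
    if (m != 0 && n > TARGET_SIZE_MAX / m) {
        return 0;                 /* n * m would exceed 0xFFFF */
    }
    *total = (uint16_t)((uint32_t)n * m);
    return 1;
}

/* A calloc-style request is allowed only if the product does not wrap
   and the resulting byte count stays within the malloc limit. */
static int requestAllowed(uint16_t n, uint16_t m) {
    uint16_t total;
    return sizeMulFits(n, m, &total) && total <= MALLOC_LIMIT;
}
```

Checking with a division first avoids ever computing the wrapped product, which is what previously produced tiny allocations for huge requests.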
## What's still needed for a "ship-ready" toolchain
|
## What's still needed for a "ship-ready" toolchain
|
||||||
|
|
||||||
- **softDouble.c -O1 hold-out** — `__muldf3`'s u64 lifetime pressure
|
- **softDouble.c -O2 — FIXED.** Marking `dclass` noinline (in
|
||||||
overflows the greedy register allocator at -O2 ("ran out of
|
addition to `dpack`) drops register pressure in `__muldf3`/
|
||||||
registers during register allocation"). Builds correctly at
|
`__divdf3`/`__adddf3` enough that greedy regalloc no longer
|
||||||
-O1. Investigated: marking dpack noinline reduces pressure but
|
runs out. The previous blocker was that noinline-dclass would
|
||||||
isn't enough; making dclass noinline would unblock -O2 (verified)
|
write through pointer args via the DBR-relative `(d,s),y` mode
|
||||||
but the (d,s),y-uses-DBR bug then corrupts dclass's pointer-arg
|
and corrupt caller data after a bank switch — that path now
|
||||||
writes when a caller has switched DBR (caught by smoke's
|
goes through `STAptr/STBptr` which use `[$E0],Y` indirect-long
|
||||||
dmul-after-bank-switch test). Real fix is gated on the broader
|
with the bank byte forced to 0, so DBR is irrelevant. All
|
||||||
DBR-pointer-deref limitation listed above.
|
three smoke build sites moved to `-O2`.
|
||||||
|
|
||||||
|
|
||||||
- **More of the C standard library**: real `<stdio.h>` file I/O
|
- **More of the C standard library**: real `<stdio.h>` file I/O
|
||||||
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
|
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
|
||||||
returning success/zero) — would need a memory-backed FS or a
|
returning success/zero) — would need a memory-backed FS or a
|
||||||
MAME hook. `<locale.h>` / `<signal.h>` are stubbed (compile and
|
MAME hook. `<locale.h>` / `<signal.h>` / `<time.h>` are stubbed
|
||||||
return safe defaults); `<wchar.h>` / `<time.h>` mostly absent.
|
(compile and return safe defaults). `<wchar.h>` mostly absent.
|
||||||
|
A `time()` impl wired to ReadTimeHex (Misc Tool $0D03) was
|
||||||
|
attempted but crashes MAME without the Tool Locator initialised
|
||||||
|
in crt0; `clock()` via VBL counter at $E1006B needs 24-bit
|
||||||
|
far-pointer support that the backend doesn't yet model.
|
||||||
|
|
||||||
- **C++ runtime support**: vtable layout for multiple inheritance,
|
- **C++ runtime support**: vtable layout for multiple inheritance,
|
||||||
RTTI, exceptions (or a documented `-fno-exceptions` requirement).
|
RTTI, exceptions (or a documented `-fno-exceptions` requirement).
|
||||||
|
|
@ -315,9 +492,15 @@ sidecar bytes.
|
||||||
whether any 8-bit accumulator value is used. A per-region
|
whether any 8-bit accumulator value is used. A per-region
|
||||||
scheduler would reduce the SEP/REP wrap overhead on i8 stores.
|
scheduler would reduce the SEP/REP wrap overhead on i8 stores.
|
||||||
|
|
||||||
- **Toolbox / IIgs system call bindings**: header files declaring
|
- **Toolbox / IIgs system call bindings**: `iigs/toolbox.h` covers
|
||||||
the Apple IIgs system calls (`SystemTask`, `WaitMouseUp`,
|
the common entry points across Tool Locator, Memory Manager,
|
||||||
`DrawString`, …) with the right inline-asm dispatch glue.
|
Misc Tools, QuickDraw II, Event Manager, Window Manager, plus
|
||||||
|
GS/OS Quit. Multi-arg wrappers (NewHandle, QDStartUp, MoveTo,
|
||||||
|
EMStartUp, GetNextEvent, NewWindow, CloseWindow) live in
|
||||||
|
`runtime/src/iigsToolbox.s` because the backend's inline-asm
|
||||||
|
constraints can't take memory operands. Single-arg / no-arg
|
||||||
|
wrappers stay inline. More routines (Menu Manager, Dialog
|
||||||
|
Manager, Standard File, Sound) still TBD.
|
||||||
|
|
||||||
- **Real-world program coverage**: the smoke tests are
|
- **Real-world program coverage**: the smoke tests are
|
||||||
microbenchmarks. A few known-good Apple IIgs C programs (e.g.
|
microbenchmarks. A few known-good Apple IIgs C programs (e.g.
|
||||||
|
|
|
||||||
|
|
@ -1,25 +1,27 @@
|
||||||
// IIgs toolbox helpers — minimal inline-asm wrappers for the most
|
// IIgs toolbox helpers — wrappers for commonly-used Apple IIgs system
|
||||||
// commonly-used Apple IIgs system calls.
|
// calls.
|
||||||
//
|
//
|
||||||
// Toolbox dispatch on the IIgs goes through the Tool Locator at
|
// Toolbox dispatch on the IIgs goes through the Tool Locator at
|
||||||
// $E10000. Each routine is identified by a 16-bit "tool number"
|
// $E10000. Each routine is identified by a 16-bit "tool number"
|
||||||
// (low byte = tool set, high byte = function within set), loaded
|
// (high byte = function within set, low byte = tool set), loaded
|
||||||
// into X, and called via JSL $E10000.
|
// into X, and called via JSL $E10000.
|
||||||
//
|
//
|
||||||
// Args go on the stack (push order: rightmost first), then the
|
// GS/OS dispatch goes through $E100A8 with X holding the call
|
||||||
// caller pushes a result-space slot if the routine returns something
|
// number and a parameter-block pointer pushed on the stack.
|
||||||
// non-i16-or-pointer, then JSL.
|
|
||||||
//
|
//
|
||||||
// This header keeps things simple: each function inlines a tiny
|
// Calling convention:
|
||||||
// asm block specific to that call. No #include guards on bigger
|
// - If the routine returns something non-void, the caller first
|
||||||
// abstractions; users that want full toolbox coverage should write
|
// pushes a result-space slot (16 or 32 bits); the args then go
|
||||||
// their own wrappers using the same pattern.
|
// on the stack (push order: rightmost first).
|
||||||
|
// - The result is read off the same stack slot AFTER JSL.
|
||||||
|
// - Tool number lives in X immediately before JSL.
|
||||||
|
// - Tools clobber A, X, Y, P; the runtime spills around the call.
|
||||||
//
|
//
|
||||||
// LIMITATIONS:
|
// Single-arg / no-arg wrappers are `static inline`. Multi-arg
|
||||||
// - Only a handful of routines wrapped. Calypsi has full toolbox.
|
// wrappers are declared `extern` here and implemented in
|
||||||
// - No error-handling — caller checks the return.
|
// runtime/src/iigsToolbox.s — backend constraints don't allow
|
||||||
// - Single-bank only. Cross-bank toolbox calls need different
|
// memory-operand inline asm so the multi-arg pushes need real
|
||||||
// dispatch logic.
|
// .s code.
|
||||||
|
|
||||||
#ifndef IIGS_TOOLBOX_H
|
#ifndef IIGS_TOOLBOX_H
|
||||||
#define IIGS_TOOLBOX_H
|
#define IIGS_TOOLBOX_H
|
||||||
|
|
@ -28,81 +30,284 @@
|
||||||
extern "C" {
|
extern "C" {
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
// Tool number convention: high byte = function, low byte = tool set.
|
// ===== Tool numbers (high byte = function, low byte = tool set) =====
|
||||||
// Common tool sets: 04 = Misc, 0E = QuickDraw II, 18 = Window Mgr.
|
// Tool sets:
|
||||||
|
// 01 = Tool Locator 02 = Memory Manager 03 = Misc Tools
|
||||||
|
// 04 = QuickDraw II 06 = Event Manager 0E = Window Manager
|
||||||
|
// 0F = Menu Manager 17 = Standard File
|
||||||
|
|
||||||
// Misc Tool Set ---------------------------------------------------
|
// =====================================================================
|
||||||
|
// Tool Locator (Set $01)
|
||||||
// WriteCString (Misc Tool $290B) — write a NUL-terminated string to
|
// =====================================================================
|
||||||
// the text screen. Arg: 16-bit pointer pushed before the call.
|
static inline void TBoxTLStartUp(void) {
|
||||||
// Returns nothing.
|
|
||||||
static inline void TBoxWriteCString(const char *s) {
|
|
||||||
__asm__ volatile (
|
__asm__ volatile (
|
||||||
"pha\n" // push C-string pointer
|
"ldx #0x0201\n"
|
||||||
"ldx #0x290B\n" // tool number (function 0x29, set 0x0B)
|
"jsl 0xe10000\n"
|
||||||
"jsl 0xe10000\n" // tool dispatcher
|
|
||||||
:
|
:
|
||||||
: "a"(s)
|
:
|
||||||
|
: "a", "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline void TBoxTLShutDown(void) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"ldx #0x0301\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
:
|
||||||
|
: "a", "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// =====================================================================
|
||||||
|
// Memory Manager (Set $02)
|
||||||
|
// =====================================================================
|
||||||
|
|
||||||
|
// MMStartUp — call as the first MM routine. Returns the caller's
|
||||||
|
// 16-bit userId; save it for later DisposeAll calls.
|
||||||
|
static inline unsigned short TBoxMMStartUp(void) {
|
||||||
|
unsigned short id;
|
||||||
|
__asm__ volatile (
|
||||||
|
"pha\n" // result space
|
||||||
|
"ldx #0x0202\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
"pla\n"
|
||||||
|
: "=a"(id)
|
||||||
|
:
|
||||||
|
: "x", "y", "memory"
|
||||||
|
);
|
||||||
|
return id;
|
||||||
|
}
|
||||||
|
|
||||||
|
// MMShutDown — releases all MM resources owned by `userId`.
|
||||||
|
static inline void TBoxMMShutDown(unsigned short userId) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"pha\n"
|
||||||
|
"ldx #0x0302\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
: "a"(userId)
|
||||||
: "x", "y", "memory"
|
: "x", "y", "memory"
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
// SysBeep (Misc Tool $0303) — short beep through the speaker.
|
// NewHandle / DisposeHandle live in iigsToolbox.s — the parameter
|
||||||
|
// blocks are 4-arg with mixed widths and need explicit asm.
|
||||||
|
extern unsigned long TBoxNewHandle(unsigned long size,
|
||||||
|
unsigned short userId,
|
||||||
|
unsigned short attr,
|
||||||
|
unsigned long addr);
|
||||||
|
extern void TBoxDisposeHandle(unsigned long handle);
|
||||||
|
|
||||||
|
// =====================================================================
|
||||||
|
// Misc Tools (Set $03)
|
||||||
|
// =====================================================================
|
||||||
|
|
||||||
|
// SysBeep — short beep through the speaker.
|
||||||
static inline void TBoxBeep(void) {
|
static inline void TBoxBeep(void) {
|
||||||
__asm__ volatile (
|
__asm__ volatile (
|
||||||
"ldx #0x0303\n"
|
"ldx #0x0303\n"
|
||||||
"jsl 0xe10000\n"
|
"jsl 0xe10000\n"
|
||||||
:
|
:
|
||||||
:
|
:
|
||||||
: "x", "y", "memory"
|
: "a", "x", "y", "memory"
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
// ReadKey (Event Mgr; simplified — actually KeyTrans/etc). Returns
|
// WriteCString — Misc Tool $0B; writes a NUL-terminated string to
|
||||||
// the next pending key in A, or 0 if none. This wraps GetNextEvent
|
// the text screen. Note: actual GS uses Text Tools or stdio;
|
||||||
// internally on a real GS; for the simple console harness it polls
|
// this is the legacy entry point.
|
||||||
// the keyboard buffer.
|
static inline void TBoxWriteCString(const char *s) {
|
||||||
static inline char TBoxReadKey(void) {
|
|
||||||
char r;
|
|
||||||
__asm__ volatile (
|
__asm__ volatile (
|
||||||
"ldx #0x250A\n" // GetEvent (placeholder; refine in real port)
|
"pha\n"
|
||||||
|
"ldx #0x290B\n"
|
||||||
"jsl 0xe10000\n"
|
"jsl 0xe10000\n"
|
||||||
: "=a"(r)
|
|
||||||
:
|
:
|
||||||
|
: "a"(s)
|
||||||
: "x", "y", "memory"
|
: "x", "y", "memory"
|
||||||
);
|
);
|
||||||
return r;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// ConsoleQuit — clean program shutdown via GS/OS Quit. Pushes a
|
// ReadAsciiTime — fills a 20-byte buffer with the current time
|
||||||
// pConditionTbl pointer (here, 0 for no condition) before JSL.
|
// formatted as "DDD MMM dd hh:mm:ss yyyy".
|
||||||
|
static inline void TBoxReadAsciiTime(char *buf20) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"pha\n"
|
||||||
|
"ldx #0x0F03\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
: "a"(buf20)
|
||||||
|
: "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// =====================================================================
|
||||||
|
// QuickDraw II (Set $04)
|
||||||
|
// =====================================================================
|
||||||
|
|
||||||
|
// QDStartUp / QDShutDown. Multi-arg startup lives in iigsToolbox.s.
|
||||||
|
extern void TBoxQDStartUp(unsigned short masterSCB,
|
||||||
|
unsigned short pageSize,
|
||||||
|
unsigned short userId);
|
||||||
|
|
||||||
|
static inline void TBoxQDShutDown(void) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"ldx #0x0304\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
:
|
||||||
|
: "a", "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// MoveTo — move the pen to absolute (h, v).
|
||||||
|
extern void TBoxMoveTo(short h, short v);
|
||||||
|
|
||||||
|
// DrawString — draw a Pascal-style length-prefixed string at the
|
||||||
|
// current pen position. First byte of `pstr` must be the length.
|
||||||
|
static inline void TBoxDrawString(const char *pstr) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"pha\n"
|
||||||
|
"ldx #0x2C04\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
: "a"(pstr)
|
||||||
|
: "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// PaintRect / FrameRect / EraseRect — rect is a 16-bit pointer to a
|
||||||
|
// 4-word Rect (top, left, bottom, right).
|
||||||
|
static inline void TBoxPaintRect(const short *rect) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"pha\n"
|
||||||
|
"ldx #0x5104\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
: "a"(rect)
|
||||||
|
: "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline void TBoxFrameRect(const short *rect) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"pha\n"
|
||||||
|
"ldx #0x4F04\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
: "a"(rect)
|
||||||
|
: "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline void TBoxEraseRect(const short *rect) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"pha\n"
|
||||||
|
"ldx #0x5004\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
: "a"(rect)
|
||||||
|
: "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// =====================================================================
|
||||||
|
// Event Manager (Set $06)
|
||||||
|
// =====================================================================
|
||||||
|
|
||||||
|
// EMStartUp — initialises Event Manager with default queue and
|
||||||
|
// 640x200 mouse clamp. Args other than userId are hardcoded; if
|
||||||
|
// you need custom clamp, write your own wrapper.
|
||||||
|
extern void TBoxEMStartUp(unsigned short userId);
|
||||||
|
|
||||||
|
static inline void TBoxEMShutDown(void) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"ldx #0x0306\n"
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
:
|
||||||
|
: "a", "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// SystemTask — gives time to background tasks. Call regularly in
|
||||||
|
// event loops.
|
||||||
|
static inline void TBoxSystemTask(void) {
|
||||||
|
__asm__ volatile (
|
||||||
|
"ldx #0x0D05\n" // SystemTask is Desk Manager $0D05; 0x0306 is EMShutDown
|
||||||
|
"jsl 0xe10000\n"
|
||||||
|
:
|
||||||
|
:
|
||||||
|
: "a", "x", "y", "memory"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetNextEvent — fills the EventRecord pointed at by `theEvent`
|
||||||
|
// with the next event matching `eventMask`. Returns nonzero if an
|
||||||
|
// event was returned.
|
||||||
|
//
|
||||||
|
// EventRecord layout (16 bytes): what(2) message(4) when(4) where(4)
|
||||||
|
// modifiers(2).
|
||||||
|
extern unsigned short TBoxGetNextEvent(unsigned short eventMask, void *theEvent);
|
||||||
|
|
||||||
|
// =====================================================================
|
||||||
|
// Window Manager (Set $0E)
|
||||||
|
// =====================================================================
|
||||||
|
|
||||||
|
// NewWindow — allocate and display a new window. paramList points
|
||||||
|
// to a NewWindow parameter block (in-bank 16-bit pointer). Returns
|
||||||
|
// a 32-bit window pointer.
|
||||||
|
extern void *TBoxNewWindow(const void *paramList);
|
||||||
|
|
||||||
|
// CloseWindow — tear down a window. Takes a 32-bit window pointer.
|
||||||
|
extern void TBoxCloseWindow(void *winPtr);
|
||||||
|
|
||||||
|
// =====================================================================
|
||||||
|
// GS/OS (dispatcher at $E100A8)
|
||||||
|
// =====================================================================
|
||||||
|
|
||||||
|
// Quit — clean program shutdown via GS/OS. pConditionTbl = 0
|
||||||
|
// (no resume condition). Does not return.
|
||||||
static inline void TBoxQuit(void) {
|
static inline void TBoxQuit(void) {
|
||||||
__asm__ volatile (
|
__asm__ volatile (
|
||||||
"pea 0\n" // pConditionTbl = NULL
|
"pea 0\n" // pConditionTbl
|
||||||
"pea 0\n" // pParm
|
"pea 0\n" // pParm
|
||||||
"ldx #0x2029\n" // GS/OS Quit
|
"ldx #0x2029\n" // GS/OS Quit
|
||||||
"jsl 0xe100a8\n" // GS/OS dispatcher (different addr)
|
"jsl 0xe100a8\n"
|
||||||
:
|
:
|
||||||
:
|
:
|
||||||
: "x", "y", "memory"
|
: "a", "x", "y", "memory"
|
||||||
);
|
);
|
||||||
while (1) {} // unreachable
|
while (1) {} // unreachable
|
||||||
}
|
}
|
||||||
|
|
||||||
// QuickDraw II ----------------------------------------------------
|
// =====================================================================
|
||||||
|
// Helpers — direct hardware polling (no toolbox)
|
||||||
|
// =====================================================================
|
||||||
|
|
||||||
// QDStartUp / QDShutDown (sketches — real ones take more args).
|
// ReadKey — poll the IIgs keyboard latch at $C000 directly.
|
||||||
// Real apps typically use QuickDraw II via the "shell" startup
|
// Returns the ASCII byte (0 if no key ready). Strobes $C010 to
|
||||||
// sequence; this is for educational/sim scenarios.
|
// clear the latch. Does NOT use Event Manager — for a real GS
|
||||||
static inline void TBoxQDStartUp(void) {
|
// app, use TBoxGetNextEvent and pull from the queue instead.
|
||||||
|
static inline char TBoxReadKey(void) {
|
||||||
|
char r = 0;
|
||||||
__asm__ volatile (
|
__asm__ volatile (
|
||||||
"pea 0\n" "pea 0\n" "pea 0\n" // dummy direct-page handle
|
"sep #0x20\n" // 8-bit A
|
||||||
"ldx #0x0204\n"
|
"lda 0xc000\n"
|
||||||
"jsl 0xe10000\n"
|
"bpl 1f\n"
|
||||||
|
"sta 0xc010\n" // strobe
|
||||||
|
"and #0x7f\n"
|
||||||
|
"bra 2f\n"
|
||||||
|
"1:\n"
|
||||||
|
"lda #0\n"
|
||||||
|
"2:\n"
|
||||||
|
"rep #0x20\n"
|
||||||
|
"and #0x00ff\n"
|
||||||
|
: "=a"(r)
|
||||||
:
|
:
|
||||||
:
|
: "memory"
|
||||||
: "x", "y", "memory"
|
|
||||||
);
|
);
|
||||||
|
return r;
|
||||||
}
|
}
|
||||||
|
|
||||||
#ifdef __cplusplus
|
#ifdef __cplusplus
|
||||||
|
|
|
||||||
|
|
@ -10,9 +10,14 @@
|
||||||
|
|
||||||
// (strtoimax / strtoumax not implemented — runtime has strtol /
|
// (strtoimax / strtoumax not implemented — runtime has strtol /
|
||||||
// strtoul for the 32-bit forms which cover the common needs.)
|
// strtoul for the 32-bit forms which cover the common needs.)
|
||||||
|
//
|
||||||
// PRIxN format macros. `int` is 16-bit on W65816, `long` is 32,
|
// **WARNING — limited printf support.** The runtime's printf /
|
||||||
// `long long` is 64.
|
// snprintf understand the `l` length modifier (long, 32-bit) but
|
||||||
|
// NOT `ll` (long long, 64-bit). Using PRId64 / PRIu64 / PRIx64
|
||||||
|
// will compile but the runtime treats the format as a literal
|
||||||
|
// "%lld" rather than reading 8 bytes off the va_list — wrong output
|
||||||
|
// AND a stack misalignment for any subsequent args. For 32-bit
|
||||||
|
// values, PRId32 / PRIu32 / PRIx32 work correctly.
|
||||||
|
|
||||||
#define PRId8 "d"
|
#define PRId8 "d"
|
||||||
#define PRIi8 "i"
|
#define PRIi8 "i"
|
||||||
|
|
|
||||||
|
|
@ -19,6 +19,8 @@ double sin (double x);
|
||||||
float sinf (float x);
|
float sinf (float x);
|
||||||
double cos (double x);
|
double cos (double x);
|
||||||
float cosf (float x);
|
float cosf (float x);
|
||||||
|
double tan (double x);
|
||||||
|
float tanf (float x);
|
||||||
double exp (double x);
|
double exp (double x);
|
||||||
float expf (float x);
|
float expf (float x);
|
||||||
double log (double x);
|
double log (double x);
|
||||||
|
|
|
||||||
|
|
@ -19,6 +19,8 @@ int snprintf(char *buf, size_t n, const char *fmt, ...);
|
||||||
int vsprintf(char *buf, const char *fmt, va_list ap);
|
int vsprintf(char *buf, const char *fmt, va_list ap);
|
||||||
int vsnprintf(char *buf, size_t n, const char *fmt, va_list ap);
|
int vsnprintf(char *buf, size_t n, const char *fmt, va_list ap);
|
||||||
int fprintf(FILE *stream, const char *fmt, ...);
|
int fprintf(FILE *stream, const char *fmt, ...);
|
||||||
|
int vfprintf(FILE *stream, const char *fmt, va_list ap);
|
||||||
|
void perror(const char *prefix);
|
||||||
int fputc(int c, FILE *stream);
|
int fputc(int c, FILE *stream);
|
||||||
int fputs(const char *s, FILE *stream);
|
int fputs(const char *s, FILE *stream);
|
||||||
int fflush(FILE *stream);
|
int fflush(FILE *stream);
|
||||||
|
|
|
||||||
|
|
@ -24,12 +24,13 @@ __start:
|
||||||
rep #0x30
|
rep #0x30
|
||||||
; Disable IIgs peripheral interrupt sources at the chip level —
|
; Disable IIgs peripheral interrupt sources at the chip level —
|
||||||
; SEI alone leaves the hardware lines asserted, and the IRQ trap
|
; SEI alone leaves the hardware lines asserted, and the IRQ trap
|
||||||
; in ROM keeps re-firing if the source isn't quiesced.
|
; in ROM keeps re-firing if the source isn't quiesced. STZ
|
||||||
|
; stores zero without going through A; in M=8 it stores 1 byte
|
||||||
|
; (matching the 8-bit registers), so no LDA #0 prelude is needed.
|
||||||
sep #0x20
|
sep #0x20
|
||||||
.byte 0xa9, 0x00 ; lda #$00 (8-bit M)
|
stz 0xc041 ; INTEN = 0 (clear AN3/mouse/0.25s/VBL/mouse-IRQ enables)
|
||||||
sta 0xc041 ; INTEN = 0 (clear AN3/mouse/0.25s/VBL/mouse-IRQ enables)
|
stz 0xc023 ; VGCINT = 0 (clear external/1-sec/scan-line IRQ enables)
|
||||||
sta 0xc023 ; VGCINT = 0 (clear external/1-sec/scan-line IRQ enables)
|
stz 0xc032 ; SCANINT clear
|
||||||
sta 0xc032 ; SCANINT clear
|
|
||||||
rep #0x20
|
rep #0x20
|
||||||
|
|
||||||
; Top-of-stack at $0FFF. Native-mode S is 16-bit, so we don't need
|
; Top-of-stack at $0FFF. Native-mode S is 16-bit, so we don't need
|
||||||
|
|
@ -58,20 +59,15 @@ __start:
|
||||||
|
|
||||||
; Zero BSS. X iterates from __bss_start to __bss_end; each
|
; Zero BSS. X iterates from __bss_start to __bss_end; each
|
||||||
; iteration writes one byte of zero at addr X (via DP=0 +
|
; iteration writes one byte of zero at addr X (via DP=0 +
|
||||||
; offset 0 — which is just X). Wraps in 8-bit M for the
|
; offset 0 — which is just X). STZ in M=8 stores 1 byte and
|
||||||
; byte-store.
|
; doesn't touch A, so we don't need the LDA #0 prelude.
|
||||||
rep #0x10 ; ensure X is 16-bit
|
rep #0x10 ; ensure X is 16-bit
|
||||||
ldx #__bss_start
|
ldx #__bss_start
|
||||||
.Lbss_loop:
|
.Lbss_loop:
|
||||||
cpx #__bss_end
|
cpx #__bss_end
|
||||||
bcs .Lbss_done ; X >= end -> done
|
bcs .Lbss_done ; X >= end -> done
|
||||||
sep #0x20 ; 8-bit M for 1-byte store
|
sep #0x20 ; 8-bit M for 1-byte store
|
||||||
; llvm-mc doesn't track SEP/REP — `lda #$0` after SEP gets
|
stz 0x0, x ; *(uint8_t *)X = 0 (DP=0)
|
||||||
; encoded as a 3-byte 16-bit immediate, so the CPU reads
|
|
||||||
; `a9 00 00` = LDA #$00 then BRK. Force the 1-byte form
|
|
||||||
; with raw bytes.
|
|
||||||
.byte 0xa9, 0x00 ; lda #$00 (8-bit M imm)
|
|
||||||
sta 0x0, x ; *(uint8_t *)X = 0 (DP=0)
|
|
||||||
rep #0x20
|
rep #0x20
|
||||||
inx
|
inx
|
||||||
bra .Lbss_loop
|
bra .Lbss_loop
|
||||||
|
|
|
||||||
|
|
@ -53,12 +53,14 @@ long atol(const char *s) {
|
||||||
} else if (*s == '+') {
|
} else if (*s == '+') {
|
||||||
s++;
|
s++;
|
||||||
}
|
}
|
||||||
long n = 0;
|
// Parse magnitude as unsigned to avoid signed-overflow UB (e.g.
|
||||||
|
// "-2147483648" — the magnitude 2147483648 doesn't fit in long).
|
||||||
|
unsigned long u = 0;
|
||||||
while (*s >= '0' && *s <= '9') {
|
while (*s >= '0' && *s <= '9') {
|
||||||
n = n * 10 + (*s - '0');
|
u = u * 10 + (unsigned long)(*s - '0');
|
||||||
s++;
|
s++;
|
||||||
}
|
}
|
||||||
return sign < 0 ? -n : n;
|
return sign < 0 ? (long)(0ul - u) : (long)u;
|
||||||
}
|
}
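The unsigned-magnitude approach in the atol change above can be tried on a host compiler. The following is a minimal sketch, not the runtime's code: `parse_i32` is a hypothetical helper that uses fixed-width types so it behaves the same regardless of the host's `long` width, and it assumes the usual two's-complement wrap when an out-of-range unsigned value is converted to a signed type (implementation-defined before C23, but what GCC and Clang do).

```c
#include <stdint.h>

/* Hypothetical helper: parse an optionally signed decimal string into
 * a 32-bit value. The magnitude is accumulated as unsigned so that
 * "-2147483648" never overflows a signed intermediate. */
int32_t parse_i32(const char *s) {
    int negative = 0;
    if (*s == '-') { negative = 1; s++; }
    else if (*s == '+') { s++; }

    uint32_t u = 0;
    while (*s >= '0' && *s <= '9') {
        u = u * 10u + (uint32_t)(*s - '0');   /* unsigned wrap is well-defined */
        s++;
    }
    /* Negate in unsigned arithmetic first, then convert to signed. */
    return negative ? (int32_t)((uint32_t)0 - u) : (int32_t)u;
}
```

Calling `parse_i32("-2147483648")` should yield `INT32_MIN` under the stated assumption, which is the case the signed accumulator could not handle.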
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
223
runtime/src/iigsToolbox.s
Normal file
|
|
@ -0,0 +1,223 @@
|
||||||
|
; iigsToolbox.s — multi-arg toolbox wrappers that can't be done as
|
||||||
|
; inline asm because the W65816 backend's inline-asm constraints
|
||||||
|
; can't take memory operands.
|
||||||
|
;
|
||||||
|
; C ABI on this target:
|
||||||
|
; - Arg 0 (i16): in A
|
||||||
|
; - Arg 0 (i32): low half in A, high half in X
|
||||||
|
; - Arg N>0 (i16):in stack at (4 + 2*(N-1)), S — args pushed
|
||||||
|
; rightmost-first, JSL adds 3 bytes of retaddr
|
||||||
|
; (4,S = arg1 lo)
|
||||||
|
; - i16 return: A
|
||||||
|
; - i32 return: A (low) + X (high)
|
||||||
|
;
|
||||||
|
; Toolbox calls expect:
|
||||||
|
; - Args on stack in toolbox order (rightmost pushed first), then
|
||||||
|
; a result slot of appropriate width pushed BEFORE the args (so
|
||||||
|
; the result ends up at the highest stack address after pushes).
|
||||||
|
; - Tool number in X.
|
||||||
|
; - JSL $E10000.
|
||||||
|
; - After JSL, pop result then args in reverse.
|
||||||
|
;
|
||||||
|
; All wrappers preserve nothing (toolbox clobbers A, X, Y, P).
|
||||||
|
|
||||||
|
.text
|
||||||
|
.globl TBoxNewHandle
|
||||||
|
.globl TBoxDisposeHandle
|
||||||
|
.globl TBoxQDStartUp
|
||||||
|
.globl TBoxMoveTo
|
||||||
|
.globl TBoxEMStartUp
|
||||||
|
.globl TBoxGetNextEvent
|
||||||
|
.globl TBoxNewWindow
|
||||||
|
.globl TBoxCloseWindow
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; unsigned long TBoxNewHandle(u32 size, u16 userId, u16 attr, u32 addr)
|
||||||
|
; Entry: A = size lo, X = size hi
|
||||||
|
; 4,S = userId, 6,S = attr, 8,S = addr lo, 10,S = addr hi
|
||||||
|
; Tool-side call sequence as implemented below. Parameters are
|
||||||
|
; NewHandle(Long blockSize, Word userId, Word attributes, Long addr).
|
||||||
|
; The result space is pushed FIRST, then each parameter in
|
||||||
|
; declaration order; this wrapper pushes longs low word first, then
|
||||||
|
; high word (per
|
||||||
|
; https://www.brutaldeluxe.fr/products/crossdevtools/cadius/):
|
||||||
|
; pea 0 ; result lo
|
||||||
|
; pea 0 ; result hi
|
||||||
|
; pha ; blockSize lo (arrives in A)
|
||||||
|
; pha ; blockSize hi (arrives in X)
|
||||||
|
; pha ; userId
|
||||||
|
; pha ; attr
|
||||||
|
; pha ; addr lo
|
||||||
|
; pha ; addr hi
|
||||||
|
; ldx #$0902 ; jsl $E10000
|
||||||
|
; After the call the result is still on the stack: pop hi then lo,
|
||||||
|
; returning low in A and high in X.
|
||||||
|
; =====================================================================
|
||||||
|
TBoxNewHandle:
|
||||||
|
; Stash size lo (in A) and size hi (in X) before we use the
|
||||||
|
; stack — both must be pushed AFTER the result slot.
|
||||||
|
sta 0xe0 ; size lo to scratch
|
||||||
|
stx 0xe2 ; size hi to scratch
|
||||||
|
|
||||||
|
; Push 4-byte result space (will be popped at end).
|
||||||
|
pea 0 ; result lo
|
||||||
|
pea 0 ; result hi
|
||||||
|
|
||||||
|
; Push blockSize: lo first then hi.
|
||||||
|
lda 0xe0 ; size lo
|
||||||
|
pha
|
||||||
|
lda 0xe2 ; size hi
|
||||||
|
pha
|
||||||
|
|
||||||
|
; userId was at 4,S on entry. We have since pushed 8 bytes
|
||||||
|
; (4 result + 4 size), so it is now at 4 + 8 = 12,S.
|
||||||
|
lda 12, s ; userId
|
||||||
|
pha
|
||||||
|
|
||||||
|
; attr was at 6,S originally; now at 6 + 8 + 2 (one more pha) = 16,S.
|
||||||
|
lda 16, s ; attr
|
||||||
|
pha
|
||||||
|
|
||||||
|
; addr lo was at 8,S originally; with all our pushes (4 result + 4
|
||||||
|
; size + 2 user + 2 attr = 12), now at 8 + 12 = 20,S.
|
||||||
|
lda 20, s ; addr lo
|
||||||
|
pha
|
||||||
|
|
||||||
|
; addr hi was at 10,S originally; +14 = 24,S.
|
||||||
|
lda 24, s ; addr hi
|
||||||
|
pha
|
||||||
|
|
||||||
|
ldx #0x0902
|
||||||
|
jsl 0xe10000
|
||||||
|
|
||||||
|
; Pop result: hi then lo. Returns u32 in A:X (low in A, hi in X).
|
||||||
|
pla ; result hi
|
||||||
|
tax
|
||||||
|
pla ; result lo → A
|
||||||
|
rtl
|
||||||
|
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; void TBoxDisposeHandle(unsigned long handle)
|
||||||
|
; Entry: A = handle lo, X = handle hi
|
||||||
|
; =====================================================================
|
||||||
|
TBoxDisposeHandle:
|
||||||
|
pha ; handle lo
|
||||||
|
phx ; handle hi
|
||||||
|
ldx #0x1002
|
||||||
|
jsl 0xe10000
|
||||||
|
rtl
|
||||||
|
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; void TBoxQDStartUp(u16 masterSCB, u16 pageSize, u16 userId)
|
||||||
|
; Entry: A = masterSCB, 4,S = pageSize, 6,S = userId
|
||||||
|
; Tool: PEA userId, PEA pageSize, PHA masterSCB, JSL X=$0204
|
||||||
|
; =====================================================================
|
||||||
|
TBoxQDStartUp:
|
||||||
|
sta 0xe0 ; stash masterSCB
|
||||||
|
lda 6, s ; userId (originally 6,S, no pushes yet)
|
||||||
|
pha ; userId pushed; subsequent loads need +2
|
||||||
|
lda 6, s ; pageSize was at 4,S; +2 = 6,S
|
||||||
|
pha
|
||||||
|
lda 0xe0 ; masterSCB
|
||||||
|
pha
|
||||||
|
ldx #0x0204
|
||||||
|
jsl 0xe10000
|
||||||
|
rtl
|
||||||
|
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; void TBoxMoveTo(short h, short v)
|
||||||
|
; Entry: A = h, 4,S = v
|
||||||
|
; =====================================================================
|
||||||
|
TBoxMoveTo:
|
||||||
|
pha ; h
|
||||||
|
lda 6, s ; v (originally 4,S; +2 after pha)
|
||||||
|
pha
|
||||||
|
ldx #0x3A04
|
||||||
|
jsl 0xe10000
|
||||||
|
rtl
|
||||||
|
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; void TBoxEMStartUp(u16 userId)
|
||||||
|
; Entry: A = userId
|
||||||
|
; Default queueSize=0, mouse clamp 0..639 / 0..199
|
||||||
|
; Tool: PEA queueSize, PEA xMin, PEA xMax, PEA yMin, PEA yMax, PHA userId
|
||||||
|
; =====================================================================
|
||||||
|
TBoxEMStartUp:
|
||||||
|
pea 0 ; queueSize = use default
|
||||||
|
pea 0 ; xMin
|
||||||
|
pea 0x27F ; xMax = 639
|
||||||
|
pea 0 ; yMin
|
||||||
|
pea 0xC7 ; yMax = 199
|
||||||
|
pha ; userId (still in A from entry)
|
||||||
|
ldx #0x0206
|
||||||
|
jsl 0xe10000
|
||||||
|
rtl
|
||||||
|
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; unsigned short TBoxGetNextEvent(u16 eventMask, void *theEvent)
|
||||||
|
; Entry: A = eventMask, 4,S = theEvent
|
||||||
|
; Tool: PHA result(word), PHA eventMask, PHA theEvent, JSL X=$0A06
|
||||||
|
; =====================================================================
|
||||||
|
TBoxGetNextEvent:
|
||||||
|
sta 0xe0 ; stash eventMask
|
||||||
|
pea 0 ; result space (16-bit)
|
||||||
|
lda 0xe0 ; eventMask
|
||||||
|
pha
|
||||||
|
lda 8, s ; theEvent (originally 4,S; +4 after pea+pha)
|
||||||
|
pha
|
||||||
|
ldx #0x0A06
|
||||||
|
jsl 0xe10000
|
||||||
|
pla ; result → A
|
||||||
|
rtl
|
||||||
|
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; void *TBoxNewWindow(const void *paramList)
|
||||||
|
; Entry: A = paramList
|
||||||
|
; Tool: PEA result hi, PEA result lo, PHA paramList, JSL X=$090E
|
||||||
|
; Returns 32-bit window ptr in A:X (low in A, hi in X).
|
||||||
|
; =====================================================================
|
||||||
|
TBoxNewWindow:
|
||||||
|
sta 0xe0 ; stash paramList
|
||||||
|
pea 0 ; result hi
|
||||||
|
pea 0 ; result lo
|
||||||
|
lda 0xe0 ; paramList
|
||||||
|
pha
|
||||||
|
ldx #0x090E
|
||||||
|
jsl 0xe10000
|
||||||
|
pla ; result lo → A
|
||||||
|
plx ; result hi → X
|
||||||
|
rtl
|
||||||
|
|
||||||
|
|
||||||
|
; =====================================================================
|
||||||
|
; void TBoxCloseWindow(void *winPtr)
|
||||||
|
; Entry: A = winPtr lo, X = winPtr hi
|
||||||
|
; =====================================================================
|
||||||
|
TBoxCloseWindow:
|
||||||
|
pha ; winPtr lo
|
||||||
|
phx ; winPtr hi
|
||||||
|
ldx #0x0B0E
|
||||||
|
jsl 0xe10000
|
||||||
|
rtl
|
||||||
|
|
@ -133,15 +133,17 @@ long labs(long n) { return n < 0 ? -n : n; }
|
||||||
|
|
||||||
int atoi(const char *s) {
|
int atoi(const char *s) {
|
||||||
int sign = 1;
|
int sign = 1;
|
||||||
int n = 0;
|
|
||||||
while (isspace(*s)) s++;
|
while (isspace(*s)) s++;
|
||||||
if (*s == '-') { sign = -1; s++; }
|
if (*s == '-') { sign = -1; s++; }
|
||||||
else if (*s == '+') { s++; }
|
else if (*s == '+') { s++; }
|
||||||
|
// Parse magnitude as unsigned to dodge signed-overflow UB on
|
||||||
|
// values like "32768" (parsing INT_MAX+1 as signed int).
|
||||||
|
unsigned int u = 0;
|
||||||
while (isdigit(*s)) {
|
while (isdigit(*s)) {
|
||||||
n = n * 10 + (*s - '0');
|
u = u * 10 + (unsigned int)(*s - '0');
|
||||||
s++;
|
s++;
|
||||||
}
|
}
|
||||||
return sign * n;
|
return sign < 0 ? (int)(0u - u) : (int)u;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -197,7 +199,10 @@ static void writeUDec(unsigned int n) {
|
||||||
}
|
}
|
||||||
|
|
||||||
static void writeDec(int n) {
|
static void writeDec(int n) {
|
||||||
if (n < 0) { putchar('-'); writeUDec((unsigned int)(-n)); }
|
// For INT_MIN, `-n` overflows signed int (UB). Negate as unsigned
|
||||||
|
// — well-defined (two's-complement wrap), and the magnitude is
|
||||||
|
// identical for the print path.
|
||||||
|
if (n < 0) { putchar('-'); writeUDec((unsigned int)(0u - (unsigned int)n)); }
|
||||||
else writeUDec((unsigned int)n);
|
else writeUDec((unsigned int)n);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -211,10 +216,14 @@ static void writeULong(unsigned long n) {
|
||||||
|
|
||||||
static void writeHex(unsigned int n, int width) {
|
static void writeHex(unsigned int n, int width) {
|
||||||
static const char digits[] = "0123456789abcdef";
|
static const char digits[] = "0123456789abcdef";
|
||||||
char buf[5];
|
// unsigned int is 16-bit on this target -> at most 4 hex digits.
|
||||||
|
// Cap width to that; without it `printf("%08x", ...)` blew past
|
||||||
|
// the buf[] tail and corrupted the stack.
|
||||||
|
char buf[4];
|
||||||
|
if (width > 4) width = 4;
|
||||||
int i = 0;
|
int i = 0;
|
||||||
if (n == 0) { buf[i++] = '0'; }
|
if (n == 0) { buf[i++] = '0'; }
|
||||||
while (n > 0) { buf[i++] = digits[n & 0xF]; n >>= 4; }
|
while (n > 0 && i < 4) { buf[i++] = digits[n & 0xF]; n >>= 4; }
|
||||||
while (i < width) buf[i++] = '0';
|
while (i < width) buf[i++] = '0';
|
||||||
while (i > 0) putchar(buf[--i]);
|
while (i > 0) putchar(buf[--i]);
|
||||||
}
|
}
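The width cap in the writeHex change above can be exercised in isolation. This is a minimal sketch under stated assumptions: `hex16` is a hypothetical stand-in that writes into a caller-supplied buffer instead of calling `putchar`, and `uint16_t` models the 16-bit `unsigned int` of the target.

```c
#include <stdint.h>

/* Hypothetical helper: write n as lowercase hex into out (which must
 * hold at least 5 bytes), zero-padded to `width` digits but never more
 * than the 4 digits a 16-bit value can need. Returns out. */
char *hex16(uint16_t n, int width, char out[5]) {
    static const char digits[] = "0123456789abcdef";
    char buf[4];                 /* 16 bits -> at most 4 hex digits */
    int i = 0;
    int len = 0;

    if (width > 4) width = 4;    /* cap padding so buf cannot overflow */
    if (n == 0) buf[i++] = '0';
    while (n > 0 && i < 4) { buf[i++] = digits[n & 0xF]; n >>= 4; }
    while (i < width) buf[i++] = '0';

    while (i > 0) out[len++] = buf[--i];   /* digits were built in reverse */
    out[len] = '\0';
    return out;
}
```

With the cap in place, a request such as width 8 pads to 4 digits rather than writing past the end of `buf`.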
|
||||||
|
|
@ -229,7 +238,8 @@ static void writeStr(const char *s) {
|
||||||
// reliably promotes Bxx to BRL when needed, so the inliner is free to
|
// reliably promotes Bxx to BRL when needed, so the inliner is free to
|
||||||
// merge them when it wants.
|
// merge them when it wants.
|
||||||
static void writeSignedLong(long n) {
|
static void writeSignedLong(long n) {
|
||||||
if (n < 0) { putchar('-'); writeULong((unsigned long)(-n)); }
|
// See writeDec: avoid the signed-overflow UB on LONG_MIN.
|
||||||
|
if (n < 0) { putchar('-'); writeULong(0ul - (unsigned long)n); }
|
||||||
else writeULong((unsigned long)n);
|
else writeULong((unsigned long)n);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -242,7 +252,17 @@ static void writeSignedLong(long n) {
|
||||||
static void writeDouble(double v, int prec) {
|
static void writeDouble(double v, int prec) {
|
||||||
if (prec < 0) prec = 6;
|
if (prec < 0) prec = 6;
|
||||||
if (prec > 9) prec = 9;
|
if (prec > 9) prec = 9;
|
||||||
if (v < 0) { putchar('-'); v = -v; }
|
// Test the IEEE-754 sign bit (so -0.0 prints with the sign per
|
||||||
|
// C99) and avoid the soft-float __ltdf2 comparison, which has
|
||||||
|
// historically miscompiled for negative inputs (see snprintf.c
|
||||||
|
// banner for the same workaround).
|
||||||
|
unsigned long long vbits;
|
||||||
|
__builtin_memcpy(&vbits, &v, 8);
|
||||||
|
if (vbits & ((unsigned long long)1 << 63)) {
|
||||||
|
putchar('-');
|
||||||
|
vbits &= ~((unsigned long long)1 << 63);
|
||||||
|
__builtin_memcpy(&v, &vbits, 8);
|
||||||
|
}
|
||||||
long ipart = (long)v;
|
long ipart = (long)v;
|
||||||
writeULong((unsigned long)ipart);
|
writeULong((unsigned long)ipart);
|
||||||
if (prec == 0) return;
|
if (prec == 0) return;
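The sign-bit test used in the writeDouble change above can be sketched on its own. The helper names `sign_bit_set` and `clear_sign_bit` are hypothetical, and the sketch assumes `double` is a 64-bit IEEE-754 binary64 value; it uses the standard `memcpy` where the runtime uses `__builtin_memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the IEEE-754 sign bit of v is set, 0 otherwise.
 * Works for -0.0, which a `v < 0` comparison would miss. */
int sign_bit_set(double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);    /* copy the bit pattern, no aliasing UB */
    return (int)(bits >> 63);
}

/* Clears the sign bit, giving |v| without any floating-point compare. */
double clear_sign_bit(double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits &= ~((uint64_t)1 << 63);
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

Because neither helper performs a floating-point comparison, this path does not depend on a soft-float compare routine being correct.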
|
||||||
|
|
@ -398,6 +418,12 @@ static void mallocInitOnce(void) {
|
||||||
void *malloc(size_t n) {
|
void *malloc(size_t n) {
|
||||||
mallocInitOnce();
|
mallocInitOnce();
|
||||||
if (n == 0) n = 1;
|
if (n == 0) n = 1;
|
||||||
|
// Overflow guard: size_t is 16-bit on this target. Without this,
|
||||||
|
// malloc(65535) rounds up to 65536 -> wraps to 0 -> allocates 2
|
||||||
|
// bytes (wrong size); even shorter values can wrap the bumpPtr
|
||||||
|
// sum below. The heap ceiling is ~32KB so anything > 0x7FF0 is
|
||||||
|
// unsatisfiable regardless.
|
||||||
|
if (n > (size_t)0x7FF0) return (void *)0;
|
||||||
n = (n + 1) & ~(size_t)1; // round up to 2 bytes
|
n = (n + 1) & ~(size_t)1; // round up to 2 bytes
|
||||||
if (n < FREE_NODE_SZ - HDR_SZ)
|
if (n < FREE_NODE_SZ - HDR_SZ)
|
||||||
n = FREE_NODE_SZ - HDR_SZ; // ensure freed block can hold next-ptr
|
n = FREE_NODE_SZ - HDR_SZ; // ensure freed block can hold next-ptr
|
||||||
|
|
@ -435,38 +461,57 @@ void free(void *p) {
|
||||||
FreeBlk *blk = (FreeBlk *)((char *)p - HDR_SZ);
|
FreeBlk *blk = (FreeBlk *)((char *)p - HDR_SZ);
|
||||||
blk->next = freeList;
|
blk->next = freeList;
|
||||||
freeList = blk;
|
freeList = blk;
|
||||||
// Coalesce: walk the free list and merge adjacent blocks. O(n^2)
|
// Coalesce: walk the free list and merge adjacent blocks. Outer
|
||||||
// in the worst case but n is small in practice.
|
// loop tracks a's predecessor (a_link) so we can excise `a` when
|
||||||
FreeBlk *a = freeList;
|
// it gets absorbed into a lower-address neighbour. Without that,
|
||||||
|
// an `aEnd == b` from b's perspective (i.e. b precedes a in
|
||||||
|
// memory) would extend b but leave a in the list — a future malloc
|
||||||
|
// could then hand out a's range as a "free" block while the
|
||||||
|
// expanded b overlaps it. O(n^2) in the worst case; n is small.
|
||||||
|
FreeBlk **a_link = &freeList;
|
||||||
|
FreeBlk *a = freeList;
|
||||||
while (a) {
|
while (a) {
|
||||||
|
int a_absorbed = 0;
|
||||||
FreeBlk **link = &a->next;
|
FreeBlk **link = &a->next;
|
||||||
FreeBlk *b = a->next;
|
FreeBlk *b = a->next;
|
||||||
while (b) {
|
while (b) {
|
||||||
char *aEnd = (char *)a + HDR_SZ + a->size;
|
char *aEnd = (char *)a + HDR_SZ + a->size;
|
||||||
char *bEnd = (char *)b + HDR_SZ + b->size;
|
char *bEnd = (char *)b + HDR_SZ + b->size;
|
||||||
if (aEnd == (char *)b) {
|
if (aEnd == (char *)b) {
|
||||||
|
// a immediately precedes b — extend a, drop b.
|
||||||
a->size += HDR_SZ + b->size;
|
a->size += HDR_SZ + b->size;
|
||||||
*link = b->next;
|
*link = b->next;
|
||||||
b = *link;
|
b = *link;
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
if (bEnd == (char *)a) {
|
if (bEnd == (char *)a) {
|
||||||
|
// b immediately precedes a — extend b, drop a from
|
||||||
|
// the outer list. We can't continue the inner walk
|
||||||
|
// (a is gone), so break out and let the outer loop
|
||||||
|
// restart from a's successor.
|
||||||
b->size += HDR_SZ + a->size;
|
b->size += HDR_SZ + a->size;
|
||||||
// Remove `a` from the list (a is freeList head if first).
|
*a_link = a->next;
|
||||||
// Simpler: relink b in place of a, but a is at top.
|
a_absorbed = 1;
|
||||||
// For correctness, just skip — coalesce on next pass.
|
break;
|
||||||
link = &b->next;
|
|
||||||
b = b->next;
|
|
||||||
continue;
|
|
||||||
}
|
}
|
||||||
link = &b->next;
|
link = &b->next;
|
||||||
b = b->next;
|
b = b->next;
|
||||||
}
|
}
|
||||||
a = a->next;
|
if (a_absorbed) {
|
||||||
|
a = *a_link; // already advanced by the excise
|
||||||
|
} else {
|
||||||
|
a_link = &a->next;
|
||||||
|
a = a->next;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
void *calloc(size_t nmemb, size_t size) {
|
void *calloc(size_t nmemb, size_t size) {
|
||||||
|
// size_t is 16-bit on this target; nmemb*size can overflow and
|
||||||
|
// wrap to a small value (e.g. calloc(256, 256) -> 0 -> 2-byte
|
||||||
|
// alloc), then the caller writes way past the returned region.
|
||||||
|
// Bail when the multiplication would overflow.
|
||||||
|
if (size != 0 && nmemb > (size_t)0xFFFF / size) return (void *)0;
|
||||||
size_t total = nmemb * size;
|
size_t total = nmemb * size;
|
||||||
void *p = malloc(total);
|
void *p = malloc(total);
|
||||||
if (p) memset(p, 0, total);
|
if (p) memset(p, 0, total);
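The division-based overflow test in the calloc guard above can be checked on a host where `size_t` is wider than 16 bits. This is a minimal sketch: `mul_overflows_u16` is a hypothetical helper, and `uint16_t` stands in for the target's 16-bit `size_t`.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if nmemb * size would not fit in a
 * 16-bit size_t, using the same division test as the calloc guard.
 * Dividing first avoids ever computing the wrapped product. */
int mul_overflows_u16(uint16_t nmemb, uint16_t size) {
    return size != 0 && nmemb > (uint16_t)0xFFFF / size;
}
```

For example, 256 * 256 is 65536, which does not fit in 16 bits, so the helper reports an overflow; 255 * 257 is exactly 65535 and is accepted.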
|
||||||
|
|
@ -485,14 +530,25 @@ void *realloc(void *ptr, size_t n) {
|
||||||
return q;
|
return q;
|
||||||
}
|
}
|
||||||
|
|
||||||
// ---- exit ----
|
// ---- atexit / exit ----
|
||||||
//
|
//
|
||||||
// Standard exit() halts via BRK. Programs running under the IIgs
|
// Standard exit() halts via BRK after running any registered atexit
|
||||||
// runtime typically would call back into GS/OS Quit; here we just
|
// handler. Programs running under the IIgs runtime typically would
|
||||||
// wedge the CPU.
|
// call back into GS/OS Quit; here we just wedge the CPU. Single-slot
|
||||||
|
// atexit (the storage and registration function are below).
|
||||||
|
|
||||||
|
typedef void (*AtexitFn)(void);
|
||||||
|
static AtexitFn __atexitFn = (AtexitFn)0;
|
||||||
|
|
||||||
void exit(int code) {
|
void exit(int code) {
|
||||||
(void)code;
|
(void)code;
|
||||||
|
// C99 7.20.4.3: exit() must invoke registered atexit handlers in
|
||||||
|
// reverse-registration order before terminating.
|
||||||
|
if (__atexitFn) {
|
||||||
|
AtexitFn fn = __atexitFn;
|
||||||
|
__atexitFn = (AtexitFn)0; // prevent re-entry if fn calls exit
|
||||||
|
fn();
|
||||||
|
}
|
||||||
// BRK $00 — halts a 65816 in BRK, MAME's debugger catches.
|
// BRK $00 — halts a 65816 in BRK, MAME's debugger catches.
|
||||||
__asm__ volatile (".byte 0x00, 0x00");
|
__asm__ volatile (".byte 0x00, 0x00");
|
||||||
while (1) {} // unreachable
|
while (1) {} // unreachable
|
||||||
|
|
@ -522,14 +578,38 @@ char *strerror(int err) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// perror — write `prefix: errno-string\n` to stderr. Common pattern in
|
||||||
|
// portable programs that report I/O failures.
|
||||||
|
void perror(const char *prefix) {
|
||||||
|
if (prefix && *prefix) {
|
||||||
|
const char *p = prefix;
|
||||||
|
while (*p) { putchar(*p); p++; }
|
||||||
|
putchar(':');
|
||||||
|
putchar(' ');
|
||||||
|
}
|
||||||
|
const char *m = strerror(errno);
|
||||||
|
while (*m) { putchar(*m); m++; }
|
||||||
|
putchar('\n');
|
||||||
|
}
|
||||||
|
|
||||||
// ---- time.h ----
|
// ---- time.h ----
|
||||||
//
|
//
|
||||||
// W65816/IIgs has no standard clock from C's perspective. Provide
|
// time() and clock() are stubs returning 0. A real implementation
|
||||||
// stubs that return 0 / -1 so code that calls time() at least links.
|
// could either:
|
||||||
// A real implementation would call ReadTimeHex (GS/OS toolbox) or
|
// - Use ReadTimeHex (Misc Tool $0D03) — but this requires the GS
|
||||||
// poll the IIgs real-time clock.
|
// Tool Locator to be initialised (TLStartUp from iigs/toolbox.h)
|
||||||
|
// in the crt0, otherwise the JSL $E10000 dispatcher reads
|
||||||
|
// uninitialised state and crashes. Smoke verified that the
|
||||||
|
// direct toolbox call segfaults MAME without prior init.
|
||||||
|
// - Use the IIgs vertical-blank counter at $00/E1/006B (24-bit
|
||||||
|
// address, needs long-pointer access via inline asm — the C
|
||||||
|
// pointer type is 16-bit on this target, so a literal 0xE1006B
|
||||||
|
// silently truncates to $006B in zero page).
|
||||||
|
//
|
||||||
|
// We leave both as stubs until the runtime has a Tool-Locator-
|
||||||
|
// init crt0 path or proper 24-bit far-pointer support.
|
||||||
|
|
||||||
typedef long time_t;
|
typedef long time_t;
|
||||||
typedef unsigned long clock_t;
|
typedef unsigned long clock_t;
|
||||||
|
|
||||||
time_t time(time_t *t) {
|
time_t time(time_t *t) {
|
||||||
|
|
@ -559,7 +639,14 @@ FILE *stdout = &__stdout_obj;
|
||||||
FILE *stderr = &__stderr_obj;
|
FILE *stderr = &__stderr_obj;
|
||||||
|
|
||||||
int fputc(int c, FILE *stream) { (void)stream; return putchar(c); }
|
int fputc(int c, FILE *stream) { (void)stream; return putchar(c); }
|
||||||
int fputs(const char *s, FILE *stream) { (void)stream; return puts(s); }
|
// fputs writes the string WITHOUT appending a newline (puts does append).
|
||||||
|
// Forwarding to puts() was a real bug — `fputs("hi", stdout)` was
|
||||||
|
// printing "hi\n" instead of "hi".
|
||||||
|
int fputs(const char *s, FILE *stream) {
|
||||||
|
(void)stream;
|
||||||
|
while (*s) { putchar(*s); s++; }
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
int fflush(FILE *stream) { (void)stream; return 0; }
|
int fflush(FILE *stream) { (void)stream; return 0; }
|
||||||
int fclose(FILE *stream) { (void)stream; return 0; }
|
int fclose(FILE *stream) { (void)stream; return 0; }
|
||||||
|
|
||||||
|
|
@ -572,6 +659,11 @@ int fprintf(FILE *stream, const char *fmt, ...) {
|
||||||
return r;
|
return r;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
int vfprintf(FILE *stream, const char *fmt, va_list ap) {
|
||||||
|
(void)stream;
|
||||||
|
return vprintf(fmt, ap);
|
||||||
|
}
|
||||||
|
|
||||||
// ---- assert ----
|
// ---- assert ----
|
||||||
//
|
//
|
||||||
// __assert_fail is what most assert() macros call. Print a message
|
// __assert_fail is what most assert() macros call. Print a message
|
||||||
|
|
@ -589,9 +681,7 @@ void abort(void) {
|
||||||
exit(127);
|
exit(127);
|
||||||
}
|
}
|
||||||
|
|
||||||
// ---- atexit (stub — single slot) ----
|
// ---- atexit (single slot; storage + exit() invocation above) ----
|
||||||
typedef void (*AtexitFn)(void);
|
|
||||||
static AtexitFn __atexitFn = (AtexitFn)0;
|
|
||||||
int atexit(AtexitFn fn) {
|
int atexit(AtexitFn fn) {
|
||||||
if (__atexitFn) return -1;
|
if (__atexitFn) return -1;
|
||||||
__atexitFn = fn;
|
__atexitFn = fn;
|
||||||
|
|
@ -618,7 +708,20 @@ size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream) {
|
||||||
}
|
}
|
||||||
|
|
||||||
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) {
|
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) {
|
||||||
(void)ptr; (void)size; (void)nmemb; (void)stream;
|
// For stdout/stderr, route through putchar so programs that use
|
||||||
|
// fwrite for binary output ("write %d bytes to stdout") actually
|
||||||
|
// produce output instead of silently dropping it. For other
|
||||||
|
// streams (real file handles), still a stub returning 0.
|
||||||
|
if (stream == stdout || stream == stderr) {
|
||||||
|
// size * nmemb can overflow size_t (16-bit on this target);
|
||||||
|
// bail rather than silently truncate the byte count.
|
||||||
|
if (size != 0 && nmemb > (size_t)0xFFFF / size) return 0;
|
||||||
|
const u8 *p = (const u8 *)ptr;
|
||||||
|
size_t total = size * nmemb;
|
||||||
|
for (size_t i = 0; i < total; i++) putchar(p[i]);
|
||||||
|
return nmemb;
|
||||||
|
}
|
||||||
|
(void)ptr; (void)size; (void)nmemb;
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
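The overflow guard in the fwrite hunk above can be tried on a host machine with a sketch like the following. It is only an illustration: `mul_u16_checked` is a hypothetical helper, not part of the runtime, and `uint16_t` stands in for the target's 16-bit `size_t`.

```c
#include <stdint.h>

/* Hypothetical helper (not in the runtime): multiply two 16-bit counts,
 * refusing any product that would not fit in 16 bits. uint16_t stands in
 * for the target's 16-bit size_t. Returns 1 on success, 0 on overflow. */
static int mul_u16_checked(uint16_t size, uint16_t nmemb, uint16_t *out) {
    /* Same shape as the guard in fwrite: divide instead of multiplying,
     * so the check itself cannot overflow. */
    if (size != 0 && nmemb > (uint16_t)(UINT16_MAX / size)) {
        return 0;
    }
    *out = (uint16_t)(size * nmemb);
    return 1;
}
```

The division form is used because the product cannot be inspected after the fact on a type that has already wrapped.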
@ -179,8 +179,7 @@ __divhi3:
|
||||||
jsr __divmod_setup
|
jsr __divmod_setup
|
||||||
jsr __udivmod_core
|
jsr __udivmod_core
|
||||||
; Quotient is in $ea. Negate if bit 1 of $ee is set.
|
; Quotient is in $ea. Negate if bit 1 of $ee is set.
|
||||||
lda 0xea
|
pei 0xea
|
||||||
pha
|
|
||||||
lda 0xee
|
lda 0xee
|
||||||
and #0x2
|
and #0x2
|
||||||
beq .Ldiv_pos
|
beq .Ldiv_pos
|
||||||
|
|
@ -199,8 +198,7 @@ __modhi3:
|
||||||
jsr __udivmod_core
|
jsr __udivmod_core
|
||||||
; Remainder is in $ec. Negate if bit 0 of $ee is set (dividend
|
; Remainder is in $ec. Negate if bit 0 of $ee is set (dividend
|
||||||
; was negative).
|
; was negative).
|
||||||
lda 0xec
|
pei 0xec
|
||||||
pha
|
|
||||||
lda 0xee
|
lda 0xee
|
||||||
and #0x1
|
and #0x1
|
||||||
beq .Lmod_pos
|
beq .Lmod_pos
|
||||||
|
|
@ -1131,10 +1129,9 @@ __negdi_b:
|
||||||
; setjmp returned 0 with all-callee-savable regs already preserved by
|
; setjmp returned 0 with all-callee-savable regs already preserved by
|
||||||
; setjmp's caller.
|
; setjmp's caller.
|
||||||
; --------------------------------------------------------------------
|
; --------------------------------------------------------------------
|
||||||
; NOTE: llvm-mc misencodes `sta (dp), y` and `lda (dp), y` as the
|
; setjmp / longjmp use the (dp),y indirect mode (opcodes 0x91/0xb1)
|
||||||
; absolute-,Y opcodes (0x99 / 0xb9) instead of the DP-indirect-Y
|
; to write through the jmp_buf pointer in $E0. Y is set explicitly
|
||||||
; opcodes (0x91 / 0xb1). Use raw `.byte` for those. Y is supplied
|
; before each indirect access; M=0 except where noted.
|
||||||
; via LDY before each indirect access.
|
|
||||||
.globl setjmp
|
.globl setjmp
|
||||||
setjmp:
|
setjmp:
|
||||||
sta 0xe0 ; jmp_buf addr -> DP scratch
|
sta 0xe0 ; jmp_buf addr -> DP scratch
|
||||||
|
|
|
||||||
|
|
@ -142,11 +142,13 @@ float fmodf(float x, float y) {
|
||||||
double sqrt(double x) {
|
double sqrt(double x) {
|
||||||
uint64_t b;
|
uint64_t b;
|
||||||
__builtin_memcpy(&b, &x, sizeof(b));
|
__builtin_memcpy(&b, &x, sizeof(b));
|
||||||
if (b & ((uint64_t)1 << 63)) {
|
// Check zero first (positive or negative) — IEEE-754 says
|
||||||
return 0.0 / 0.0; // NaN for negatives (well, -0.0 returns 0)
|
// sqrt(+0)=+0 and sqrt(-0)=-0; in both cases the lower 63 bits are zero.
|
||||||
|
if ((b & ~((uint64_t)1 << 63)) == 0) {
|
||||||
|
return x;
|
||||||
}
|
}
|
||||||
if (b == 0) {
|
if (b & ((uint64_t)1 << 63)) {
|
||||||
return 0.0;
|
return 0.0 / 0.0; // NaN for negatives
|
||||||
}
|
}
|
||||||
// Initial guess: halve the exponent. IEEE-754 trick gives a
|
// Initial guess: halve the exponent. IEEE-754 trick gives a
|
||||||
// surprisingly good starting point — within 2x of the true value.
|
// surprisingly good starting point — within 2x of the true value.
|
||||||
|
|
@ -188,12 +190,16 @@ double pow(double x, double y) {
|
||||||
return 0.0; // non-integer, non-0.5 y not supported yet
|
return 0.0; // non-integer, non-0.5 y not supported yet
|
||||||
}
|
}
|
||||||
// y is a whole number; convert via __fixdfsi. Range -32768..32767
|
// y is a whole number; convert via __fixdfsi. Range -32768..32767
|
||||||
// covers any practical exponent.
|
// covers any practical exponent. Use unsigned for the magnitude
|
||||||
int n = (int)yi;
|
// to avoid signed-overflow UB on INT_MIN.
|
||||||
|
int sn = (int)yi;
|
||||||
int neg = 0;
|
int neg = 0;
|
||||||
if (n < 0) {
|
unsigned int n;
|
||||||
|
if (sn < 0) {
|
||||||
neg = 1;
|
neg = 1;
|
||||||
n = -n;
|
n = 0u - (unsigned int)sn;
|
||||||
|
} else {
|
||||||
|
n = (unsigned int)sn;
|
||||||
}
|
}
|
||||||
double r = 1.0;
|
double r = 1.0;
|
||||||
double base = x;
|
double base = x;
|
||||||
|
|
@ -268,6 +274,15 @@ double cos(double x) {
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// tan(x) = sin(x) / cos(x). No special handling for poles at pi/2
|
||||||
|
// + n*pi (where cos(x) == 0): the soft-double divide returns +/-Inf,
|
||||||
|
// which is the IEEE-754-correct answer. Accuracy follows sin/cos
|
||||||
|
// (~1e-6) but degrades fast as |x| approaches a pole.
|
||||||
|
double tan(double x) {
|
||||||
|
return sin(x) / cos(x);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
float sinf(float x) {
|
float sinf(float x) {
|
||||||
return (float)sin((double)x);
|
return (float)sin((double)x);
|
||||||
}
|
}
|
||||||
|
|
@ -278,6 +293,11 @@ float cosf(float x) {
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
float tanf(float x) {
|
||||||
|
return (float)tan((double)x);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
// exp via 2^k * e^r where x = k*ln2 + r, |r| < ln2/2. Then Taylor
|
// exp via 2^k * e^r where x = k*ln2 + r, |r| < ln2/2. Then Taylor
|
||||||
// series for e^r converges in ~10 terms. The 2^k multiplication uses
|
// series for e^r converges in ~10 terms. The 2^k multiplication uses
|
||||||
// the IEEE-754 layout (add k to exponent field).
|
// the IEEE-754 layout (add k to exponent field).
|
||||||
|
|
@ -321,8 +341,13 @@ float expf(float x) {
|
||||||
double log(double x) {
|
double log(double x) {
|
||||||
uint64_t b;
|
uint64_t b;
|
||||||
__builtin_memcpy(&b, &x, sizeof(b));
|
__builtin_memcpy(&b, &x, sizeof(b));
|
||||||
if (b == 0 || (b & ((uint64_t)1 << 63))) {
|
// log(±0) = -Infinity (pole error). Mask off the sign bit when
|
||||||
return 0.0 / 0.0; // log(0) = -inf, log(neg) = NaN; return NaN
|
// testing for zero so -0.0 lands here instead of the negative path.
|
||||||
|
if ((b & ~((uint64_t)1 << 63)) == 0) {
|
||||||
|
return -1.0 / 0.0;
|
||||||
|
}
|
||||||
|
if (b & ((uint64_t)1 << 63)) {
|
||||||
|
return 0.0 / 0.0; // log(negative) = NaN (domain error)
|
||||||
}
|
}
|
||||||
int e = (int)((b >> 52) & 0x7FF) - 1023;
|
int e = (int)((b >> 52) & 0x7FF) - 1023;
|
||||||
// Force the exponent field to 1023 so m lands in [1, 2).
|
// Force the exponent field to 1023 so m lands in [1, 2).
|
||||||
|
|
|
||||||
|
|
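The sqrt and log hunks above both test for zero by masking off the sign bit before comparing. A minimal host-side sketch of that test follows, assuming an IEEE-754 binary64 `double`. The helper names `is_zero_bits` and `sign_bit_set` are made up for illustration, and plain `memcpy` is used where the runtime uses `__builtin_memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not in math.c). Both assume IEEE-754 binary64. */

/* True for +0.0 and -0.0: every bit except the sign bit is clear. */
static int is_zero_bits(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return (b & ~((uint64_t)1 << 63)) == 0;
}

/* True when the sign bit is set, which includes -0.0. */
static int sign_bit_set(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return (b & ((uint64_t)1 << 63)) != 0;
}
```

Checking `is_zero_bits` before `sign_bit_set` is what keeps -0.0 out of the negative (NaN) path.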
@ -2,11 +2,11 @@
|
||||||
// and the byte-swap inner loop don't perturb other libc code.
|
// and the byte-swap inner loop don't perturb other libc code.
|
||||||
//
|
//
|
||||||
// qsort uses insertion sort (O(n^2)) rather than recursion-driven
|
// qsort uses insertion sort (O(n^2)) rather than recursion-driven
|
||||||
// quicksort; the W65816 backend's greedy regalloc still mis-orders
|
// quicksort. Originally chosen because the W65816 greedy regalloc
|
||||||
// spills in iterative quicksort with if/else recursion (#70), and
|
// mis-ordered spills in iterative quicksort (#70 — since fixed by a
|
||||||
// for the small arrays this runtime targets (typical IIgs C
|
// W65816StackSlotCleanup safety check), but kept because the typical
|
||||||
// program: dozens of items, not thousands) the constant-factor win
|
// IIgs C program sorts dozens of items, not thousands, and the
|
||||||
// of insertion sort over recursive quicksort is meaningful.
|
// constant-factor win of insertion sort dominates at that scale.
|
||||||
|
|
||||||
typedef unsigned int size_t;
|
typedef unsigned int size_t;
|
||||||
typedef int (*CmpFnT)(const void *, const void *);
|
typedef int (*CmpFnT)(const void *, const void *);
|
||||||
|
|
|
||||||
|
|
@ -92,9 +92,10 @@ static void emitUDec(unsigned int n) {
|
||||||
|
|
||||||
__attribute__((noinline))
|
__attribute__((noinline))
|
||||||
static void emitDec(int n) {
|
static void emitDec(int n) {
|
||||||
|
// -n on INT_MIN is signed-overflow UB; negate as unsigned.
|
||||||
if (n < 0) {
|
if (n < 0) {
|
||||||
emit('-');
|
emit('-');
|
||||||
emitUDec((unsigned int)(-n));
|
emitUDec(0u - (unsigned int)n);
|
||||||
} else {
|
} else {
|
||||||
emitUDec((unsigned int)n);
|
emitUDec((unsigned int)n);
|
||||||
}
|
}
|
||||||
|
|
@ -123,9 +124,10 @@ static void emitULong(unsigned long n) {
|
||||||
|
|
||||||
__attribute__((noinline))
|
__attribute__((noinline))
|
||||||
static void emitSignedLong(long n) {
|
static void emitSignedLong(long n) {
|
||||||
|
// See emitDec: avoid the signed-overflow UB on LONG_MIN.
|
||||||
if (n < 0) {
|
if (n < 0) {
|
||||||
emit('-');
|
emit('-');
|
||||||
emitULong((unsigned long)(-n));
|
emitULong(0ul - (unsigned long)n);
|
||||||
} else {
|
} else {
|
||||||
emitULong((unsigned long)n);
|
emitULong((unsigned long)n);
|
||||||
}
|
}
|
||||||
|
|
@ -135,12 +137,16 @@ static void emitSignedLong(long n) {
|
||||||
__attribute__((noinline))
|
__attribute__((noinline))
|
||||||
static void emitHex(unsigned int n, int width) {
|
static void emitHex(unsigned int n, int width) {
|
||||||
static const char digits[] = "0123456789abcdef";
|
static const char digits[] = "0123456789abcdef";
|
||||||
char buf[5];
|
// unsigned int is 16-bit on this target -> at most 4 hex digits.
|
||||||
|
// Cap width to that; without it `snprintf("%08x", ...)` blew past
|
||||||
|
// the buf[] tail and corrupted the stack.
|
||||||
|
char buf[4];
|
||||||
|
if (width > 4) width = 4;
|
||||||
int i = 0;
|
int i = 0;
|
||||||
if (n == 0) {
|
if (n == 0) {
|
||||||
buf[i++] = '0';
|
buf[i++] = '0';
|
||||||
}
|
}
|
||||||
while (n > 0) {
|
while (n > 0 && i < 4) {
|
||||||
buf[i++] = digits[n & 0xF];
|
buf[i++] = digits[n & 0xF];
|
||||||
n >>= 4;
|
n >>= 4;
|
||||||
}
|
}
|
||||||
|
|
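The width cap in the emitHex hunk above can be sketched in isolation as follows. This is an assumption-laden illustration: `hex16` is a hypothetical function, and because the hunk does not show the padding and output code, the zero-padding and digit reversal here are guesses at a plausible shape, not the runtime's actual logic.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the capped hex formatter.
 * Writes lowercase hex digits of n into out (caller supplies at least
 * 5 bytes) and returns the number of digits written. */
static int hex16(uint16_t n, int width, char *out) {
    static const char digits[] = "0123456789abcdef";
    char buf[4];               /* a 16-bit value has at most 4 hex digits */
    int i = 0;

    if (width > 4) width = 4;  /* the cap: padding can never exceed buf */
    if (n == 0) buf[i++] = '0';
    while (n > 0 && i < 4) {
        buf[i++] = digits[n & 0xF];
        n >>= 4;
    }
    while (i < width) buf[i++] = '0';  /* assumed zero padding */

    /* Digits were produced least-significant first; reverse them. */
    int len = i;
    for (int k = 0; k < len; k++) out[k] = buf[len - 1 - k];
    out[len] = '\0';
    return len;
}
```

With the cap in place, a request such as width 8 degrades to 4 digits rather than writing past the end of `buf`.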
@ -278,6 +284,11 @@ static int format(const char *fmt, va_list ap) {
|
||||||
if (gCur < gEnd) {
|
if (gCur < gEnd) {
|
||||||
*gCur = '\0';
|
*gCur = '\0';
|
||||||
} else if (gEnd > (char *)0) {
|
} else if (gEnd > (char *)0) {
|
||||||
|
// Truncated, but n > 0: overwrite the last byte with NUL so
|
||||||
|
// the result is a valid C string. snprintf with n=0 sets
|
||||||
|
// gEnd = NULL up front so this branch correctly skips —
|
||||||
|
// previously it wrote `gEnd[-1]` to `buf[-1]`, clobbering
|
||||||
|
// memory before the buffer.
|
||||||
gEnd[-1] = '\0';
|
gEnd[-1] = '\0';
|
||||||
}
|
}
|
||||||
return (int)gTotal;
|
return (int)gTotal;
|
||||||
|
|
@ -286,7 +297,10 @@ static int format(const char *fmt, va_list ap) {
|
||||||
|
|
||||||
int snprintf(char *buf, size_t n, const char *fmt, ...) {
|
int snprintf(char *buf, size_t n, const char *fmt, ...) {
|
||||||
gCur = buf;
|
gCur = buf;
|
||||||
gEnd = buf + (n ? n : 0);
|
// n == 0 must NOT touch the buffer (C99 7.19.6.5). Setting
|
||||||
|
// gEnd = NULL here makes both `gCur < gEnd` and `gEnd > 0`
|
||||||
|
// false, so no NUL terminator gets written.
|
||||||
|
gEnd = n ? buf + n : (char *)0;
|
||||||
gTotal = 0;
|
gTotal = 0;
|
||||||
va_list ap;
|
va_list ap;
|
||||||
va_start(ap, fmt);
|
va_start(ap, fmt);
|
||||||
|
|
@ -315,7 +329,7 @@ int sprintf(char *buf, const char *fmt, ...) {
|
||||||
|
|
||||||
int vsnprintf(char *buf, size_t n, const char *fmt, va_list ap) {
|
int vsnprintf(char *buf, size_t n, const char *fmt, va_list ap) {
|
||||||
gCur = buf;
|
gCur = buf;
|
||||||
gEnd = buf + (n ? n : 0);
|
gEnd = n ? buf + n : (char *)0;
|
||||||
gTotal = 0;
|
gTotal = 0;
|
||||||
return format(fmt, ap);
|
return format(fmt, ap);
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
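The n == 0 handling in the snprintf and vsnprintf hunks above can be illustrated with a small bounded writer. This is a sketch only: `bounded_copy` is a hypothetical function that mirrors the gCur/gEnd idea with local variables. It tests `end` for NULL explicitly rather than relying on relational comparisons against a null pointer.

```c
#include <stddef.h>

/* Hypothetical bounded copy (not the runtime's format()). Copies src
 * into buf, never writing more than n bytes including the terminator,
 * and returns the length that would have been written, as snprintf does.
 * With n == 0 the buffer is not touched at all. */
static size_t bounded_copy(char *buf, size_t n, const char *src) {
    char *cur = buf;
    char *end = n ? buf + n : (char *)0;  /* NULL end means "write nothing" */
    size_t total = 0;

    for (; *src; src++, total++) {
        if (end && cur < end) *cur++ = *src;
    }
    if (end) {
        if (cur < end) *cur = '\0';   /* room left: terminate in place */
        else end[-1] = '\0';          /* truncated: overwrite last byte */
    }
    return total;
}
```

The guard-byte test added to the smoke script checks the same property: the byte before the buffer and the first byte of the buffer both survive a call with n == 0.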
@ -43,11 +43,12 @@ __attribute__((noinline)) static u64 dpack(u64 sign, s16 exp, u64 mant) {
|
||||||
|
|
||||||
// Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit.
|
// Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit.
|
||||||
// Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN.
|
// Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN.
|
||||||
// Inlinable on purpose — out_sign/out_exp/out_mant point at caller
|
// noinline reduces register pressure in __muldf3/__divdf3/__adddf3
|
||||||
// stack locals; if dclass were noinline the writes would lower to
|
// — without it, greedy regalloc runs out of registers in __muldf3
|
||||||
// `sta (d,s),y` which uses DBR for the bank, silently corrupting
|
// at -O2. Now safe because pointer-arg writes lower to STBptr/STAptr
|
||||||
// data when the caller has switched DBR. Caught by smoke's
|
// which use [$E0],Y indirect-long with the bank byte forced to 0
|
||||||
// dmul-after-bank-switch test (#dmul-bank-switch).
|
// (DBR-independent). See `feedback_dbr_ptr_deref_spill.md`.
|
||||||
|
__attribute__((noinline))
|
||||||
static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) {
|
static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) {
|
||||||
*out_sign = x & DSIGN_BIT;
|
*out_sign = x & DSIGN_BIT;
|
||||||
s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF);
|
s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF);
|
||||||
|
|
|
||||||
|
|
@ -1,91 +0,0 @@
|
||||||
; Stub double-precision soft-float — every routine returns 0.
|
|
||||||
;
|
|
||||||
; The C-based softDouble.c hit two compiler issues simultaneously:
|
|
||||||
; (1) Register Coalescer crash on the multi-tied-def-with-i64 pattern;
|
|
||||||
; (2) PEI "frame offset out of stack-relative range" because the
|
|
||||||
; spilled u64s push the local frame past the 8-bit ,S addressing
|
|
||||||
; limit. Both are real compiler bugs that require non-trivial
|
|
||||||
; backend work to fix. Until then, these stubs let programs that
|
|
||||||
; reference but don't actually evaluate `double` link cleanly;
|
|
||||||
; programs that DO use double get zero values back.
|
|
||||||
;
|
|
||||||
; Symbol set matches what clang's i64-routed double libcalls expect.
|
|
||||||
; ABI: i64 result returned via A:X:Y:DP[$F0] (matches LowerReturn).
|
|
||||||
|
|
||||||
.text
|
|
||||||
|
|
||||||
; Helper macro idiom: stub returning 64-bit zero.
|
|
||||||
.macro RET_ZERO64
|
|
||||||
lda #0
|
|
||||||
tax
|
|
||||||
tay
|
|
||||||
sta 0xf0
|
|
||||||
rtl
|
|
||||||
.endm
|
|
||||||
|
|
||||||
.globl __adddf3
|
|
||||||
__adddf3: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __subdf3
|
|
||||||
__subdf3: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __muldf3
|
|
||||||
__muldf3: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __divdf3
|
|
||||||
__divdf3: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __negdf2
|
|
||||||
__negdf2: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __cmpdf2
|
|
||||||
__cmpdf2: lda #0
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __eqdf2
|
|
||||||
__eqdf2: lda #0
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __nedf2
|
|
||||||
__nedf2: lda #0
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __ltdf2
|
|
||||||
__ltdf2: lda #0
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __gtdf2
|
|
||||||
__gtdf2: lda #0
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __ledf2
|
|
||||||
__ledf2: lda #0
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __gedf2
|
|
||||||
__gedf2: lda #0
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __floatsidf
|
|
||||||
__floatsidf: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __floatunsidf
|
|
||||||
__floatunsidf: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __fixdfsi
|
|
||||||
__fixdfsi: lda #0
|
|
||||||
tax
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __fixunsdfsi
|
|
||||||
__fixunsdfsi: lda #0
|
|
||||||
tax
|
|
||||||
rtl
|
|
||||||
|
|
||||||
.globl __extendsfdf2
|
|
||||||
__extendsfdf2: RET_ZERO64
|
|
||||||
|
|
||||||
.globl __truncdfsf2
|
|
||||||
__truncdfsf2: lda #0
|
|
||||||
tax
|
|
||||||
rtl
|
|
||||||
|
|
@ -40,7 +40,8 @@ unsigned long strtoul(const char *nptr, char **endptr, int base) {
|
||||||
s++;
|
s++;
|
||||||
}
|
}
|
||||||
if (endptr) *endptr = (char *)(saw_digit ? s : nptr);
|
if (endptr) *endptr = (char *)(saw_digit ? s : nptr);
|
||||||
return neg ? (unsigned long)-(long)n : n;
|
// Negate in unsigned arithmetic to avoid signed-overflow UB.
|
||||||
|
return neg ? (0ul - n) : n;
|
||||||
}
|
}
|
||||||
|
|
||||||
long strtol(const char *nptr, char **endptr, int base) {
|
long strtol(const char *nptr, char **endptr, int base) {
|
||||||
|
|
@ -55,5 +56,7 @@ long strtol(const char *nptr, char **endptr, int base) {
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
if (endptr) *endptr = ep;
|
if (endptr) *endptr = ep;
|
||||||
return neg ? -(long)n : (long)n;
|
// Negate as unsigned to avoid signed-overflow UB on LONG_MIN
|
||||||
|
// ("-2147483648" — the magnitude doesn't fit in long).
|
||||||
|
return neg ? (long)(0ul - n) : (long)n;
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
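Several hunks above (pow, emitDec, emitSignedLong, strtoul, strtol) replace `-n` with a negation in unsigned arithmetic. A minimal sketch of why that works is below. `magnitude` is a hypothetical helper, not a function in the runtime.

```c
#include <limits.h>

/* Hypothetical helper: absolute value of a long as an unsigned long.
 * Negating LONG_MIN as a signed value overflows (undefined behaviour);
 * converting to unsigned first and subtracting from 0 is well defined,
 * because unsigned arithmetic wraps modulo 2^N. */
static unsigned long magnitude(long n) {
    return n < 0 ? 0ul - (unsigned long)n : (unsigned long)n;
}
```

For `LONG_MIN` the result is `LONG_MAX + 1`, a value that has no signed representation but fits in `unsigned long`.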
@ -63,7 +63,17 @@ emu.register_frame_done(function()
|
||||||
-- apple2gs CPU model doesn't honor a Lua-side PB!=0 set.
|
-- apple2gs CPU model doesn't honor a Lua-side PB!=0 set.
|
||||||
-- The user's code can switch DBR to bank 2+ for safe data
|
-- The user's code can switch DBR to bank 2+ for safe data
|
||||||
-- writes (bank 2 is clear of IIgs ROM IRQ scribbling).
|
-- writes (bank 2 is clear of IIgs ROM IRQ scribbling).
|
||||||
for i = 1, #data do mem:write_u8(0x001000 + i - 1, data:byte(i)) end
|
-- Skip writes that would land in the IIgs IO window
|
||||||
|
-- (\$C000-\$CFFF). link816 may pad this range with zeros
|
||||||
|
-- when rodata auto-skips it, and writing zeros into soft
|
||||||
|
-- switches could clobber IO state (e.g., the LC1 RAM enable
|
||||||
|
-- that crt0 sets up).
|
||||||
|
for i = 1, #data do
|
||||||
|
local addr = 0x001000 + i - 1
|
||||||
|
if not (addr >= 0x00C000 and addr < 0x00D000) then
|
||||||
|
mem:write_u8(addr, data:byte(i))
|
||||||
|
end
|
||||||
|
end
|
||||||
loaded = true
|
loaded = true
|
||||||
cpu.state["PC"].value = 0x1000
|
cpu.state["PC"].value = 0x1000
|
||||||
cpu.state["PB"].value = 0x00
|
cpu.state["PB"].value = 0x00
|
||||||
|
|
|
||||||
|
|
@ -294,11 +294,14 @@ EOF
|
||||||
fi
|
fi
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# 11a. SETCC via clang: a > b returns 0/1. Exercises the multi-branch
|
# 11a. SETCC via clang: a > b returns 0/1. Signed compares now go
|
||||||
# CC path (BEQ + BPL diamond, since SETGT can't be a single Bxx).
|
# through the EOR-with-sign-bit transform: each operand XORs $8000
|
||||||
|
# to convert signed-int ordering to unsigned-int ordering, then
|
||||||
|
# uses BCC/BCS — avoids BMI/BPL's V-flag-overflow bug for values
|
||||||
|
# near INT16_MIN/MAX.
|
||||||
CLANG="$BUILD_DIR/bin/clang"
|
CLANG="$BUILD_DIR/bin/clang"
|
||||||
if [ -x "$CLANG" ]; then
|
if [ -x "$CLANG" ]; then
|
||||||
log "check: clang compiles a > b via multi-branch SETCC"
|
log "check: clang compiles a > b via EOR-sign-bit + unsigned compare"
|
||||||
cFile="$(mktemp --suffix=.c)"
|
cFile="$(mktemp --suffix=.c)"
|
||||||
sCmpFile="$(mktemp --suffix=.s)"
|
sCmpFile="$(mktemp --suffix=.s)"
|
||||||
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$sCmpFile"' EXIT
|
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$sCmpFile"' EXIT
|
||||||
|
|
@ -306,18 +309,20 @@ if [ -x "$CLANG" ]; then
|
||||||
int gt(int a, int b) { return a > b; }
|
int gt(int a, int b) { return a > b; }
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -S "$cFile" -o "$sCmpFile"
|
"$CLANG" --target=w65816 -O2 -S "$cFile" -o "$sCmpFile"
|
||||||
# Expect a stack-relative CMP (offset depends on current spill
|
# Expect: EOR #$8000 on each operand, CMP, then BCC/BCS on the
|
||||||
# behaviour — fast regalloc adds 2 PHA prologue bytes vs greedy
|
# carry from the unsigned compare. The 0/1 result is materialised
|
||||||
# which had no frame; either is acceptable as long as we cmp
|
# via lda #0/lda #1 in the diamond.
|
||||||
# against b through a stack-relative slot), then BEQ + BPL forming
|
for expect in "eor #0x8000" "lda #0x1" "lda #0x0"; do
|
||||||
# the multi-branch diamond.
|
|
||||||
for expect in "lda #0x1" "beq" "bpl" "lda #0x0"; do
|
|
||||||
if ! grep -qF "$expect" "$sCmpFile"; then
|
if ! grep -qF "$expect" "$sCmpFile"; then
|
||||||
warn "setcc gt test missing: $expect"
|
warn "setcc gt test missing: $expect"
|
||||||
cat "$sCmpFile" >&2
|
cat "$sCmpFile" >&2
|
||||||
die "setcc gt test failed"
|
die "setcc gt test failed"
|
||||||
fi
|
fi
|
||||||
done
|
done
|
||||||
|
if ! grep -qE '^\s*(bcc|bcs)\b' "$sCmpFile"; then
|
||||||
|
cat "$sCmpFile" >&2
|
||||||
|
die "setcc gt test missing: bcc/bcs (carry-based unsigned branch)"
|
||||||
|
fi
|
||||||
if ! grep -qE '^\s*cmp\s+0x[0-9a-f]+,\s*s\s*$' "$sCmpFile"; then
|
if ! grep -qE '^\s*cmp\s+0x[0-9a-f]+,\s*s\s*$' "$sCmpFile"; then
|
||||||
cat "$sCmpFile" >&2
|
cat "$sCmpFile" >&2
|
||||||
die "setcc gt test missing: cmp <off>,s (stack-relative compare to arg b)"
|
die "setcc gt test missing: cmp <off>,s (stack-relative compare to arg b)"
|
||||||
|
|
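The EOR-with-sign-bit transform described in check 11a above can be modelled in C as follows. This is a sketch of the arithmetic idea only, not of the backend's instruction selection, and `signed_gt_via_unsigned` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of the transform: flipping bit 15 of each operand
 * maps the signed range [-32768, 32767] onto the unsigned range
 * [0, 65535] while preserving order, so an unsigned compare (carry
 * based on the 65816) gives the signed answer without needing the
 * overflow flag. */
static int signed_gt_via_unsigned(int16_t a, int16_t b) {
    uint16_t ua = (uint16_t)((uint16_t)a ^ 0x8000u);
    uint16_t ub = (uint16_t)((uint16_t)b ^ 0x8000u);
    return ua > ub;
}
```

A sign-flag-only branch after a subtract goes wrong when `a - b` overflows, for example with operands near INT16_MIN and INT16_MAX. The transformed compare has no such case.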
@ -411,24 +416,38 @@ EOF
|
||||||
fi
|
fi
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# 11f. Pointer deref: *p loads via stack-relative-indirect-Y.
|
# 11f. Pointer deref: *p uses [dp],Y indirect-long (`LDA [$E0],Y`)
|
||||||
|
# which is DBR-independent. The previous lowering used (slot,S),Y
|
||||||
|
# indirect which silently wrote to DBR's bank — a real miscompile
|
||||||
|
# when the caller had switched DBR via `pha;plb`. The new lowering
|
||||||
|
# stages the pointer in DP scratch $E0..$E2 with the bank byte
|
||||||
|
# forced to 0, then loads/stores via [dp],Y — always bank 0.
|
||||||
|
# Const-int pointers (MMIO style) keep DBR-relative addressing via
|
||||||
|
# STAabs (separate TableGen pattern).
|
||||||
if [ -x "$CLANG" ]; then
|
if [ -x "$CLANG" ]; then
|
||||||
log "check: clang compiles *p via LDA (slot,s),y"
|
log "check: clang compiles *p via [dp],Y indirect-long (DBR-independent)"
|
||||||
cFile6="$(mktemp --suffix=.c)"
|
cFile6="$(mktemp --suffix=.c)"
|
||||||
sPtrFile="$(mktemp --suffix=.s)"
|
sPtrFile="$(mktemp --suffix=.s)"
|
||||||
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$sCmpFile" "$cFile2" "$sSelFile" "$cFile3" "$sChainFile" "$cFile4" "$sMulFile" "$cFile5" "$sShfFile" "$cFile6" "$sPtrFile"' EXIT
|
oPtrFile="$(mktemp --suffix=.o)"
|
||||||
|
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$sCmpFile" "$cFile2" "$sSelFile" "$cFile3" "$sChainFile" "$cFile4" "$sMulFile" "$cFile5" "$sShfFile" "$cFile6" "$sPtrFile" "$oPtrFile"' EXIT
|
||||||
cat > "$cFile6" <<'EOF'
|
cat > "$cFile6" <<'EOF'
|
||||||
int load_ptr(const int *p) { return *p; }
|
int load_ptr(const int *p) { return *p; }
|
||||||
void store_ptr(int *p, int v) { *p = v; }
|
void store_ptr(int *p, int v) { *p = v; }
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -S "$cFile6" -o "$sPtrFile"
|
"$CLANG" --target=w65816 -O2 -c "$cFile6" -o "$oPtrFile"
|
||||||
for expect in "ldy #0x0" "lda (0x" "sta (0x"; do
|
# LDA [dp],Y = 0xB7; STA [dp],Y = 0x97 (followed by the dp byte 0xE0).
|
||||||
if ! grep -qF "$expect" "$sPtrFile"; then
|
if ! "$OBJDUMP" --triple=w65816 -d "$oPtrFile" 2>/dev/null \
|
||||||
warn "ptr-deref test missing: $expect"
|
| grep -qE '\b97 e0\b'; then
|
||||||
cat "$sPtrFile" >&2
|
warn "ptr-deref test: STA [dp],Y (0x97 0xE0) missing in store_ptr"
|
||||||
die "ptr-deref test failed"
|
"$OBJDUMP" --triple=w65816 -d "$oPtrFile" >&2
|
||||||
fi
|
die "ptr-deref test failed (STA [dp],Y expected)"
|
||||||
done
|
fi
|
||||||
|
if ! "$OBJDUMP" --triple=w65816 -d "$oPtrFile" 2>/dev/null \
|
||||||
|
| grep -qE '\bb7 e0\b'; then
|
||||||
|
warn "ptr-deref test: LDA [dp],Y (0xB7 0xE0) missing in load_ptr"
|
||||||
|
"$OBJDUMP" --triple=w65816 -d "$oPtrFile" >&2
|
||||||
|
die "ptr-deref test failed (LDA [dp],Y expected)"
|
||||||
|
fi
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# 11g. i8 store via pointer: *p = v wraps the STA in SEP/REP so only
|
# 11g. i8 store via pointer: *p = v wraps the STA in SEP/REP so only
|
||||||
|
|
@ -444,10 +463,11 @@ void storeb(unsigned char *p, unsigned char v) { *p = v; }
|
||||||
unsigned char incb(unsigned char *p) { return ++*p; }
|
unsigned char incb(unsigned char *p) { return ++*p; }
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -S "$cFile7" -o "$sBptrFile"
|
"$CLANG" --target=w65816 -O2 -S "$cFile7" -o "$sBptrFile"
|
||||||
# storeb body should contain SEP #$20 ... STA (slot,s),y ... REP #$20.
|
# storeb body should contain SEP #$20 ... STA [$E0],Y ... REP #$20.
|
||||||
|
# The STA uses [dp],Y indirect-long addressing (DBR-independent).
|
||||||
if ! grep -qF "sep #0x20" "$sBptrFile" \
|
if ! grep -qF "sep #0x20" "$sBptrFile" \
|
||||||
|| ! grep -qF "rep #0x20" "$sBptrFile" \
|
|| ! grep -qF "rep #0x20" "$sBptrFile" \
|
||||||
|| ! grep -qE 'sta \(0x[0-9a-f]+, s\), y' "$sBptrFile"; then
|
|| ! grep -qE 'sta \[0xe0\b' "$sBptrFile"; then
|
||||||
cat "$sBptrFile" >&2
|
cat "$sBptrFile" >&2
|
||||||
die "i8 ptr-store test missing SEP/STA/REP sequence"
|
die "i8 ptr-store test missing SEP/STA/REP sequence"
|
||||||
fi
|
fi
|
||||||
|
|
@ -1125,8 +1145,12 @@ EOF
|
||||||
"$CLANG" --target=w65816 -O2 -c "$cLinkFile" -o "$oLinkFile"
|
"$CLANG" --target=w65816 -O2 -c "$cLinkFile" -o "$oLinkFile"
|
||||||
"$BUILD_DIR/bin/llvm-mc" -arch=w65816 -filetype=obj \
|
"$BUILD_DIR/bin/llvm-mc" -arch=w65816 -filetype=obj \
|
||||||
"$PROJECT_ROOT/runtime/src/libgcc.s" -o "$oLibgccFile"
|
"$PROJECT_ROOT/runtime/src/libgcc.s" -o "$oLibgccFile"
|
||||||
|
# No main in this test (it's just a library object); use
|
||||||
|
# --no-gc-sections so the linker keeps `mul` and the libgcc
|
||||||
|
# __mulhi3 it references. With gc-sections (the default),
|
||||||
|
# there's no live root and everything would drop.
|
||||||
"$PROJECT_ROOT/tools/link816" -o "$binLinkFile" \
|
"$PROJECT_ROOT/tools/link816" -o "$binLinkFile" \
|
||||||
--text-base 0x8000 --map "$mapLinkFile" \
|
--text-base 0x8000 --map "$mapLinkFile" --no-gc-sections \
|
||||||
"$oLinkFile" "$oLibgccFile" 2>/dev/null
|
"$oLinkFile" "$oLibgccFile" 2>/dev/null
|
||||||
if [ ! -s "$binLinkFile" ]; then
|
if [ ! -s "$binLinkFile" ]; then
|
||||||
die "link816 produced empty/missing binary"
|
die "link816 produced empty/missing binary"
|
||||||
|
|
@ -1176,8 +1200,10 @@ EOF
|
||||||
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cFltFile" -o "$oFltFile"
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cFltFile" -o "$oFltFile"
|
||||||
"$CLANG" --target=w65816 -O2 -ffunction-sections \
|
"$CLANG" --target=w65816 -O2 -ffunction-sections \
|
||||||
-c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfFile"
|
-c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfFile"
|
||||||
|
# No main here either (test compiles a .o-only "soft-float lib" link).
|
||||||
|
# --no-gc-sections so all soft-float symbols stay.
|
||||||
"$PROJECT_ROOT/tools/link816" -o "$binFltFile" \
|
"$PROJECT_ROOT/tools/link816" -o "$binFltFile" \
|
||||||
--text-base 0x8000 --map "$mapFltFile" \
|
--text-base 0x8000 --map "$mapFltFile" --no-gc-sections \
|
||||||
"$oFltFile" "$oSfFile" "$oLibgccFile" 2>/dev/null
|
"$oFltFile" "$oSfFile" "$oLibgccFile" 2>/dev/null
|
||||||
if [ ! -s "$binFltFile" ]; then
|
if [ ! -s "$binFltFile" ]; then
|
||||||
die "soft-float runtime failed to link"
|
die "soft-float runtime failed to link"
|
||||||
|
|
@ -1214,10 +1240,10 @@ int toInt(double x) { return (int)x; }
|
||||||
double fromInt(int n) { return (double)n; }
|
double fromInt(int n) { return (double)n; }
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblFile" -o "$oDblFile"
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblFile" -o "$oDblFile"
|
||||||
"$CLANG" --target=w65816 -O1 -ffunction-sections \
|
"$CLANG" --target=w65816 -O2 -ffunction-sections \
|
||||||
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdFile"
|
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdFile"
|
||||||
"$PROJECT_ROOT/tools/link816" -o "$binDblFile" \
|
"$PROJECT_ROOT/tools/link816" -o "$binDblFile" \
|
||||||
--text-base 0x8000 --map "$mapDblFile" \
|
--text-base 0x8000 --map "$mapDblFile" --no-gc-sections \
|
||||||
"$oDblFile" "$oSdFile" "$oLibgccFile" 2>/dev/null
|
"$oDblFile" "$oSdFile" "$oLibgccFile" 2>/dev/null
|
||||||
if [ ! -s "$binDblFile" ]; then
|
if [ ! -s "$binDblFile" ]; then
|
||||||
die "soft-double runtime failed to link"
|
die "soft-double runtime failed to link"
|
||||||
|
|
@ -1411,7 +1437,7 @@ int main(void) {
|
||||||
}
|
}
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblMame" -o "$oDblMame"
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblMame" -o "$oDblMame"
|
||||||
"$CLANG" --target=w65816 -O1 -ffunction-sections \
|
"$CLANG" --target=w65816 -O2 -ffunction-sections \
|
||||||
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdMame"
|
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdMame"
|
||||||
"$PROJECT_ROOT/tools/link816" -o "$binDblMame" \
|
"$PROJECT_ROOT/tools/link816" -o "$binDblMame" \
|
||||||
--text-base 0x1000 \
|
--text-base 0x1000 \
|
||||||
|
|
@ -1550,7 +1576,7 @@ EOF
|
||||||
-c "$PROJECT_ROOT/runtime/src/math.c" -o "$oMathF"
|
-c "$PROJECT_ROOT/runtime/src/math.c" -o "$oMathF"
|
||||||
"$CLANG" --target=w65816 -O2 -ffunction-sections \
|
"$CLANG" --target=w65816 -O2 -ffunction-sections \
|
||||||
-c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfF"
|
-c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfF"
|
||||||
"$CLANG" --target=w65816 -O1 -ffunction-sections \
|
"$CLANG" --target=w65816 -O2 -ffunction-sections \
|
||||||
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdF"
|
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdF"
|
||||||
oCrt0F="$(mktemp --suffix=.o)"
|
oCrt0F="$(mktemp --suffix=.o)"
|
||||||
"$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-mc" -arch=w65816 \
|
"$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-mc" -arch=w65816 \
|
||||||
|
|
@ -2294,6 +2320,15 @@ int main(void) {
|
||||||
if (r == 4 && eq(buf, "1.50")) ok |= 0x10;
|
if (r == 4 && eq(buf, "1.50")) ok |= 0x10;
|
||||||
r = sprintf(buf, "[%c%c%%]", 'A', 'B');
|
r = sprintf(buf, "[%c%c%%]", 'A', 'B');
|
||||||
if (r == 5 && eq(buf, "[AB%]")) ok |= 0x20;
|
if (r == 5 && eq(buf, "[AB%]")) ok |= 0x20;
|
||||||
|
/* C99: snprintf(buf, 0, ...) must NOT touch buf and must return
|
||||||
|
the would-be-written length. Sentinel-fill the buffer and
|
||||||
|
verify the byte just BEFORE buf survives — earlier bug wrote
|
||||||
|
a NUL at gEnd[-1] = buf[-1] when n=0. */
|
||||||
|
char guard[8];
|
||||||
|
for (int i = 0; i < 8; i++) guard[i] = (char)0xCC;
|
||||||
|
r = snprintf(&guard[2], 0, "x");
|
||||||
|
if (r == 1 && guard[1] == (char)0xCC && guard[2] == (char)0xCC)
|
||||||
|
ok |= 0x40;
|
||||||
switchToBank2();
|
switchToBank2();
|
||||||
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
|
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
|
||||||
while (1) {}
|
while (1) {}
|
||||||
|
|
@ -2305,8 +2340,8 @@ EOF
|
||||||
"$oCrt0F" "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oSfF" "$oSdF" \
|
"$oCrt0F" "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oSfF" "$oSdF" \
|
||||||
"$oLibgccFile" "$oSpFile" >/dev/null 2>&1
|
"$oLibgccFile" "$oSpFile" >/dev/null 2>&1
|
||||||
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binSpFile" --check \
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binSpFile" --check \
|
||||||
0x025000=003f >/dev/null 2>&1; then
|
0x025000=007f >/dev/null 2>&1; then
|
||||||
die "MAME: sprintf/snprintf format-coverage bitmap != 0x3f"
|
die "MAME: sprintf/snprintf format-coverage bitmap != 0x7f (snprintf n=0 buffer-write regression?)"
|
||||||
fi
|
fi
|
||||||
rm -f "$cSpFile" "$oSpFile" "$binSpFile"
|
rm -f "$cSpFile" "$oSpFile" "$binSpFile"
|
||||||
|
|
||||||
|
|
@ -2454,7 +2489,7 @@ EOF
|
||||||
fi
|
fi
|
||||||
rm -f "$cRdFile" "$oRdFile" "$binRdFile"
|
rm -f "$cRdFile" "$oRdFile" "$binRdFile"
|
||||||
|
|
||||||
log "check: MAME runs atan/asin/acos/sinh/cosh/tanh (#85)"
|
log "check: MAME runs atan/asin/acos/sinh/cosh/tanh + tan (#85)"
|
||||||
cTr2File="$(mktemp --suffix=.c)"
|
cTr2File="$(mktemp --suffix=.c)"
|
||||||
oTr2File="$(mktemp --suffix=.o)"
|
oTr2File="$(mktemp --suffix=.o)"
|
||||||
binTr2File="$(mktemp --suffix=.bin)"
|
binTr2File="$(mktemp --suffix=.bin)"
|
||||||
|
|
@ -2465,6 +2500,7 @@ extern double acos(double);
|
||||||
extern double sinh(double);
|
extern double sinh(double);
|
||||||
extern double cosh(double);
|
extern double cosh(double);
|
||||||
extern double tanh(double);
|
extern double tanh(double);
|
||||||
|
extern double tan(double);
|
||||||
__attribute__((noinline)) void switchToBank2(void) {
|
__attribute__((noinline)) void switchToBank2(void) {
|
||||||
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||||
}
|
}
|
||||||
|
|
@ -2481,6 +2517,7 @@ int main(void) {
|
||||||
if (dApprox(tanh(0.0), 0.0, 0.001)) ok |= 0x08;
|
if (dApprox(tanh(0.0), 0.0, 0.001)) ok |= 0x08;
|
||||||
if (dApprox(asin(0.5), 0.5235987755, 0.001)) ok |= 0x10;
|
if (dApprox(asin(0.5), 0.5235987755, 0.001)) ok |= 0x10;
|
||||||
if (dApprox(acos(1.0), 0.0, 0.001)) ok |= 0x20;
|
if (dApprox(acos(1.0), 0.0, 0.001)) ok |= 0x20;
|
||||||
|
if (dApprox(tan(0.7853981633), 1.0, 0.001)) ok |= 0x40;
|
||||||
switchToBank2();
|
switchToBank2();
|
||||||
*(volatile unsigned short *)0x5000 = ok;
|
*(volatile unsigned short *)0x5000 = ok;
|
||||||
while (1) {}
|
while (1) {}
|
||||||
|
|
@ -2493,8 +2530,8 @@ EOF
|
||||||
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oLibgccFile" "$oTr2File" \
|
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oLibgccFile" "$oTr2File" \
|
||||||
>/dev/null 2>&1
|
>/dev/null 2>&1
|
||||||
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binTr2File" --check \
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binTr2File" --check \
|
||||||
0x025000=003f >/dev/null 2>&1; then
|
0x025000=007f >/dev/null 2>&1; then
|
||||||
die "MAME: extended math (atan/asin/acos/sinh/cosh/tanh) bitmap != 0x3f"
|
die "MAME: extended math (atan/asin/acos/sinh/cosh/tanh/tan) bitmap != 0x7f"
|
||||||
fi
|
fi
|
||||||
rm -f "$cTr2File" "$oTr2File" "$binTr2File"
|
rm -f "$cTr2File" "$oTr2File" "$binTr2File"
|
||||||
|
|
||||||
|
|
@ -2584,6 +2621,118 @@ EOF
|
||||||
fi
|
fi
|
||||||
rm -f "$cHtFile" "$oHtFile" "$binHtFile"
|
rm -f "$cHtFile" "$oHtFile" "$binHtFile"
|
||||||
|
|
||||||
|
# Regression: free() coalescing must remove blocks absorbed
|
||||||
|
# into a lower-address neighbour from the free list. Old code
|
||||||
|
# extended the lower block but left the absorbed entry in the list.
|
||||||
|
# Signed compare of values near INT16_MIN/MAX: BMI/BPL alone
|
||||||
|
# are not V-flag-aware, so the W65816 backend now applies an
|
||||||
|
# EOR-with-sign-bit transform (a < b signed iff a^$8000 <
|
||||||
|
# b^$8000 unsigned). Verify INT16_MIN < INT16_MAX, INT16_MIN
|
||||||
|
# < 1, INT16_MIN < 0, etc. all return the right boolean —
|
||||||
|
# the pre-transform code returned false for INT16_MIN < 1
|
||||||
|
# because (-32768 - 1) overflowed to +32767, leaving N=0.
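|
||||||
|
# Worked example of the transform, 16-bit: INT16_MIN = $8000 and
|
||||||
|
# 1 = $0001. Flipping the sign bit gives $8000 ^ $8000 = $0000 and
|
||||||
|
# $0001 ^ $8000 = $8001; $0000 < $8001 unsigned, so INT16_MIN < 1
|
||||||
|
# is true, matching the first case checked below.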
|
||||||
|
log "check: MAME signed compare near INT16_MIN works (V-flag fix)"
|
||||||
|
cSignedFile="$(mktemp --suffix=.c)"
|
||||||
|
oSignedFile="$(mktemp --suffix=.o)"
|
||||||
|
binSignedFile="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cSignedFile" <<'EOF'
|
||||||
|
__attribute__((noinline)) void switchToBank2(void) {
|
||||||
|
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||||
|
}
|
||||||
|
__attribute__((noinline)) static int slt(int a, int b) { return a < b; }
|
||||||
|
__attribute__((noinline)) static int sgt(int a, int b) { return a > b; }
|
||||||
|
__attribute__((noinline)) static int sle(int a, int b) { return a <= b; }
|
||||||
|
__attribute__((noinline)) static int sge(int a, int b) { return a >= b; }
|
||||||
|
int main(void) {
|
||||||
|
unsigned short ok = 0;
|
||||||
|
// INT16_MIN < 1: true. Pre-fix bug returned false.
|
||||||
|
if (slt(-32768, 1)) ok |= 0x01;
|
||||||
|
// INT16_MIN < INT16_MAX: true.
|
||||||
|
if (slt(-32768, 32767)) ok |= 0x02;
|
||||||
|
// INT16_MAX > INT16_MIN: true.
|
||||||
|
if (sgt(32767, -32768)) ok |= 0x04;
|
||||||
|
// INT16_MIN <= -32768: true.
|
||||||
|
if (sle(-32768, -32768)) ok |= 0x08;
|
||||||
|
// INT16_MAX >= 0: true.
|
||||||
|
if (sge(32767, 0)) ok |= 0x10;
|
||||||
|
// -1 < 0: true.
|
||||||
|
if (slt(-1, 0)) ok |= 0x20;
|
||||||
|
// 0 < -1: false (negation case).
|
||||||
|
if (!slt(0, -1)) ok |= 0x40;
|
||||||
|
// INT16_MIN < INT16_MIN: false.
|
||||||
|
if (!slt(-32768, -32768)) ok |= 0x80;
|
||||||
|
switchToBank2();
|
||||||
|
*(volatile unsigned short *)0x5000 = ok;
|
||||||
|
while (1) {}
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
|
||||||
|
"$cSignedFile" -o "$oSignedFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binSignedFile" --text-base 0x1000 \
|
||||||
|
"$oCrt0F" "$oLibgccFile" "$oSignedFile" \
|
||||||
|
>/dev/null 2>&1
|
||||||
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binSignedFile" --check \
|
||||||
|
0x025000=00ff >/dev/null 2>&1; then
|
||||||
|
die "MAME: signed compare near INT_MIN failed (V-flag bug regression?)"
|
||||||
|
fi
|
||||||
|
rm -f "$cSignedFile" "$oSignedFile" "$binSignedFile"
|
||||||
|
|
||||||
|
# free() coalescing regression: the old code left the absorbed
|
||||||
|
# block in the free list, creating an overlapping entry, so a later
|
||||||
|
# malloc could hand out the same memory to two callers.
|
||||||
|
log "check: MAME runs malloc/free coalesce — three blocks freed in alloc order (#100)"
|
||||||
|
cMcFile="$(mktemp --suffix=.c)"
|
||||||
|
oMcFile="$(mktemp --suffix=.o)"
|
||||||
|
binMcFile="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cMcFile" <<'EOF'
|
||||||
|
extern void *malloc(unsigned int);
|
||||||
|
extern void free(void *);
|
||||||
|
__attribute__((noinline)) void switchToBank2(void) {
|
||||||
|
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||||
|
}
|
||||||
|
int main(void) {
|
||||||
|
// Allocate three same-sized adjacent blocks, then free in alloc
|
||||||
|
// order so b's coalesce sees a-prev-to-b (the bug path).
|
||||||
|
char *a = (char *)malloc(20);
|
||||||
|
char *b = (char *)malloc(20);
|
||||||
|
char *c = (char *)malloc(20);
|
||||||
|
if (!a || !b || !c) goto fail;
|
||||||
|
free(a); // list = [a]
|
||||||
|
free(b); // list = [b, a]; bEnd==a -> coalesce a into b
|
||||||
|
free(c); // list = [c, b']; bEnd==b' -> coalesce b' into c
|
||||||
|
// After all coalescing: one ~66-byte block. Allocate it back and
|
||||||
|
// write the full extent — if any of a/b/c were left in the list
|
||||||
|
// overlapping, a follow-on malloc would hand out a second pointer
|
||||||
|
// into the same memory and the writes would interfere.
|
||||||
|
char *big = (char *)malloc(60);
|
||||||
|
if (!big) goto fail;
|
||||||
|
for (int i = 0; i < 60; i++) big[i] = (char)(i + 1);
|
||||||
|
char *more = (char *)malloc(8);
|
||||||
|
if (!more) goto fail;
|
||||||
|
for (int i = 0; i < 8; i++) more[i] = (char)0xAA;
|
||||||
|
// Verify big is intact.
|
||||||
|
unsigned short ok = 1;
|
||||||
|
for (int i = 0; i < 60; i++) if (big[i] != (char)(i + 1)) ok = 0;
|
||||||
|
switchToBank2();
|
||||||
|
*(volatile unsigned short *)0x5000 = ok;
|
||||||
|
while (1) {}
|
||||||
|
fail:
|
||||||
|
switchToBank2();
|
||||||
|
*(volatile unsigned short *)0x5000 = 0xDEAD;
|
||||||
|
while (1) {}
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
|
||||||
|
"$cMcFile" -o "$oMcFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binMcFile" --text-base 0x1000 \
|
||||||
|
"$oCrt0F" "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \
|
||||||
|
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oLibgccFile" "$oMcFile" \
|
||||||
|
>/dev/null 2>&1
|
||||||
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binMcFile" --check \
|
||||||
|
0x025000=0001 >/dev/null 2>&1; then
|
||||||
|
die "MAME: malloc/free coalesce regressed — overlapping free-list entries"
|
||||||
|
fi
|
||||||
|
rm -f "$cMcFile" "$oMcFile" "$binMcFile"
|
||||||
|
|
||||||
log "check: MAME runs strtok 'a,b,,c' continuation (#84 fixed)"
|
log "check: MAME runs strtok 'a,b,,c' continuation (#84 fixed)"
|
||||||
cTkFile="$(mktemp --suffix=.c)"
|
cTkFile="$(mktemp --suffix=.c)"
|
||||||
oTkFile="$(mktemp --suffix=.o)"
|
oTkFile="$(mktemp --suffix=.o)"
|
||||||
|
|
@ -3267,6 +3416,191 @@ EOF
|
||||||
fi
|
fi
|
||||||
rm -f "$cDmaFile" "$oDmaFile" "$binDmaFile"
|
rm -f "$cDmaFile" "$oDmaFile" "$binDmaFile"
|
||||||
|
|
||||||
|
# Real-world coverage: Conway's Game of Life blinker. Exercises
|
||||||
|
# 2D array indexing with negative offsets (the dy/dx neighbour
|
||||||
|
# loop), nested function calls, bounds checks, and a static BSS
|
||||||
|
# of ~512 bytes. Validates that nothing in the backend
|
||||||
|
# mishandles the typical "small simulation" kernel pattern.
|
||||||
|
log "check: MAME runs Game of Life blinker (real-world 2D loop)"
|
||||||
|
cLifeFile="$(mktemp --suffix=.c)"
|
||||||
|
oLifeFile="$(mktemp --suffix=.o)"
|
||||||
|
binLifeFile="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cLifeFile" <<'EOF'
|
||||||
|
__attribute__((noinline)) void switchToBank2(void) {
|
||||||
|
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||||
|
}
|
||||||
|
#define W 16
|
||||||
|
#define H 16
|
||||||
|
static unsigned char gridA[H][W];
|
||||||
|
static unsigned char gridB[H][W];
|
||||||
|
static int countNeighbors(unsigned char (*g)[W], int y, int x) {
|
||||||
|
int cnt = 0;
|
||||||
|
for (int dy = -1; dy <= 1; dy++) {
|
||||||
|
for (int dx = -1; dx <= 1; dx++) {
|
||||||
|
if (dx == 0 && dy == 0) continue;
|
||||||
|
int ny = y + dy;
|
||||||
|
int nx = x + dx;
|
||||||
|
if (ny < 0 || ny >= H || nx < 0 || nx >= W) continue;
|
||||||
|
cnt += g[ny][nx];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return cnt;
|
||||||
|
}
|
||||||
|
static void step(unsigned char (*src)[W], unsigned char (*dst)[W]) {
|
||||||
|
for (int y = 0; y < H; y++) {
|
||||||
|
for (int x = 0; x < W; x++) {
|
||||||
|
int n = countNeighbors(src, y, x);
|
||||||
|
unsigned char alive = src[y][x];
|
||||||
|
dst[y][x] = (alive ? (n == 2 || n == 3) : (n == 3)) ? 1 : 0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
int main(void) {
|
||||||
|
// Horizontal blinker. After 1 step → vertical at column 4, rows 4..6.
|
||||||
|
gridA[5][3] = 1;
|
||||||
|
gridA[5][4] = 1;
|
||||||
|
gridA[5][5] = 1;
|
||||||
|
step(gridA, gridB);
|
||||||
|
int ok = 0;
|
||||||
|
if (gridB[4][4] == 1) ok |= 1;
|
||||||
|
if (gridB[5][4] == 1) ok |= 2;
|
||||||
|
if (gridB[6][4] == 1) ok |= 4;
|
||||||
|
if (gridB[5][3] == 0) ok |= 8;
|
||||||
|
if (gridB[5][5] == 0) ok |= 0x10;
|
||||||
|
switchToBank2();
|
||||||
|
*(volatile unsigned short *)0x5000 = ok;
|
||||||
|
while (1) {}
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
|
||||||
|
"$cLifeFile" -o "$oLifeFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binLifeFile" --text-base 0x1000 \
|
||||||
|
"$oCrt0F" "$oLibcF" "$oLibgccFile" "$oLifeFile" \
|
||||||
|
>/dev/null 2>&1
|
||||||
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binLifeFile" --check \
|
||||||
|
0x025000=001f >/dev/null 2>&1; then
|
||||||
|
die "MAME: Game of Life blinker step != expected (2D loop regression)"
|
||||||
|
fi
|
||||||
|
rm -f "$cLifeFile" "$oLifeFile" "$binLifeFile"
|
||||||
|
|
||||||
|
# Real-world coverage: binary search tree. Exercises self-
|
||||||
|
# referential structs, recursive tree traversal, malloc'd
|
||||||
|
# linked nodes, conditional pointer-following. Catches a
|
||||||
|
# whole class of issues that linear-only smoke tests miss.
|
||||||
|
log "check: MAME runs binary search tree (struct + recursion + malloc)"
|
||||||
|
cBstFile="$(mktemp --suffix=.c)"
|
||||||
|
oBstFile="$(mktemp --suffix=.o)"
|
||||||
|
binBstFile="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cBstFile" <<'EOF'
|
||||||
|
extern void *malloc(unsigned int n);
|
||||||
|
__attribute__((noinline)) void switchToBank2(void) {
|
||||||
|
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||||
|
}
|
||||||
|
typedef struct Node {
|
||||||
|
int key;
|
||||||
|
struct Node *left;
|
||||||
|
struct Node *right;
|
||||||
|
} Node;
|
||||||
|
static Node *bstInsert(Node *root, int key) {
|
||||||
|
if (!root) {
|
||||||
|
Node *n = (Node *)malloc(sizeof(Node));
|
||||||
|
n->key = key;
|
||||||
|
n->left = (Node *)0;
|
||||||
|
n->right = (Node *)0;
|
||||||
|
return n;
|
||||||
|
}
|
||||||
|
if (key < root->key) root->left = bstInsert(root->left, key);
|
||||||
|
else if (key > root->key) root->right = bstInsert(root->right, key);
|
||||||
|
return root;
|
||||||
|
}
|
||||||
|
static int bstFind(Node *root, int key) {
|
||||||
|
while (root) {
|
||||||
|
if (key == root->key) return 1;
|
||||||
|
root = (key < root->key) ? root->left : root->right;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
static int bstSum(Node *root) {
|
||||||
|
if (!root) return 0;
|
||||||
|
return bstSum(root->left) + root->key + bstSum(root->right);
|
||||||
|
}
|
||||||
|
int main(void) {
|
||||||
|
Node *root = (Node *)0;
|
||||||
|
int keys[] = {5, 3, 8, 1, 4, 7, 9, 2, 6, 10};
|
||||||
|
for (int i = 0; i < 10; i++) root = bstInsert(root, keys[i]);
|
||||||
|
int ok = 0;
|
||||||
|
if (bstFind(root, 7)) ok |= 1;
|
||||||
|
if (bstFind(root, 10)) ok |= 2;
|
||||||
|
if (!bstFind(root, 11)) ok |= 4;
|
||||||
|
if (!bstFind(root, 0)) ok |= 8;
|
||||||
|
if (bstSum(root) == 55) ok |= 0x10;
|
||||||
|
switchToBank2();
|
||||||
|
*(volatile unsigned short *)0x5000 = ok;
|
||||||
|
while (1) {}
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
|
||||||
|
"$cBstFile" -o "$oBstFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binBstFile" --text-base 0x1000 \
|
||||||
|
"$oCrt0F" "$oLibcF" "$oLibgccFile" "$oBstFile" \
|
||||||
|
>/dev/null 2>&1
|
||||||
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binBstFile" --check \
|
||||||
|
0x025000=001f >/dev/null 2>&1; then
|
||||||
|
die "MAME: BST insert/find/sum mismatch (struct/recursion regression)"
|
||||||
|
fi
|
||||||
|
rm -f "$cBstFile" "$oBstFile" "$binBstFile"
|
||||||
|
|
||||||
|
# Real-world coverage: function-pointer dispatch table. Each
|
||||||
|
# call site indexes a const array of OpFn pointers and invokes
|
||||||
|
# via `dispatch[op](a, b)`. Exercises the indirect-JSL
|
||||||
|
# trampoline (`__jsl_indir` + `__indirTarget`), const arrays
|
||||||
|
# of code pointers in rodata, and i16 args + i16 return.
|
||||||
|
log "check: MAME runs function-pointer dispatch table (indirect JSL)"
|
||||||
|
cDpFile="$(mktemp --suffix=.c)"
|
||||||
|
oDpFile="$(mktemp --suffix=.o)"
|
||||||
|
binDpFile="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cDpFile" <<'EOF'
|
||||||
|
__attribute__((noinline)) void switchToBank2(void) {
|
||||||
|
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||||
|
}
|
||||||
|
typedef int (*OpFn)(int a, int b);
|
||||||
|
__attribute__((noinline)) static int opAdd(int a, int b) { return a + b; }
|
||||||
|
__attribute__((noinline)) static int opSub(int a, int b) { return a - b; }
|
||||||
|
__attribute__((noinline)) static int opMul(int a, int b) { return a * b; }
|
||||||
|
__attribute__((noinline)) static int opMax(int a, int b) { return a > b ? a : b; }
|
||||||
|
__attribute__((noinline)) static int opMin(int a, int b) { return a < b ? a : b; }
|
||||||
|
static const OpFn dispatch[] = {opAdd, opSub, opMul, opMax, opMin};
|
||||||
|
__attribute__((noinline)) static int apply(int op, int a, int b) {
|
||||||
|
return dispatch[op](a, b);
|
||||||
|
}
|
||||||
|
int main(void) {
|
||||||
|
int ok = 0;
|
||||||
|
if (apply(0, 7, 3) == 10) ok |= 0x01;
|
||||||
|
if (apply(1, 7, 3) == 4) ok |= 0x02;
|
||||||
|
if (apply(2, 7, 3) == 21) ok |= 0x04;
|
||||||
|
if (apply(3, 7, 3) == 7) ok |= 0x08;
|
||||||
|
if (apply(4, 7, 3) == 3) ok |= 0x10;
|
||||||
|
int t = apply(0, 7, 3);
|
||||||
|
t = apply(2, t, 4);
|
||||||
|
t = apply(1, t, 5);
|
||||||
|
t = apply(3, t, 30);
|
||||||
|
if (t == 35) ok |= 0x20;
|
||||||
|
switchToBank2();
|
||||||
|
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
|
||||||
|
while (1) {}
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
|
||||||
|
"$cDpFile" -o "$oDpFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binDpFile" --text-base 0x1000 \
|
||||||
|
"$oCrt0F" "$oLibgccFile" "$oDpFile" \
|
||||||
|
>/dev/null 2>&1
|
||||||
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binDpFile" --check \
|
||||||
|
0x025000=003f >/dev/null 2>&1; then
|
||||||
|
die "MAME: function-pointer dispatch table mismatch (indirect-JSL regression)"
|
||||||
|
fi
|
||||||
|
rm -f "$cDpFile" "$oDpFile" "$binDpFile"
|
||||||
|
|
||||||
rm -f "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \
|
rm -f "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \
|
||||||
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oCrt0F"
|
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oCrt0F"
|
||||||
else
|
else
|
||||||
|
|
@ -3308,6 +3642,29 @@ void greet(void) {
|
||||||
TBoxWriteCString("Hello");
|
TBoxWriteCString("Hello");
|
||||||
TBoxBeep();
|
TBoxBeep();
|
||||||
}
|
}
|
||||||
|
// Cover all wrappers: ensures the multi-arg ones (declared extern in
|
||||||
|
// the header, implemented in iigsToolbox.s) at least link.
|
||||||
|
void everything(void) {
|
||||||
|
short rect[4] = {0, 0, 100, 100};
|
||||||
|
char buf[20];
|
||||||
|
char buf2[16];
|
||||||
|
TBoxTLStartUp(); TBoxTLShutDown();
|
||||||
|
unsigned short id = TBoxMMStartUp();
|
||||||
|
unsigned long h = TBoxNewHandle(1024UL, id, 0, 0UL);
|
||||||
|
TBoxDisposeHandle(h);
|
||||||
|
TBoxMMShutDown(id);
|
||||||
|
TBoxReadAsciiTime(buf);
|
||||||
|
TBoxMoveTo(10, 20);
|
||||||
|
TBoxFrameRect(rect); TBoxPaintRect(rect); TBoxEraseRect(rect);
|
||||||
|
TBoxDrawString("\005hello");
|
||||||
|
TBoxQDStartUp(0x80, 0x1A00, id); TBoxQDShutDown();
|
||||||
|
TBoxEMStartUp(id); TBoxEMShutDown(); TBoxSystemTask();
|
||||||
|
TBoxGetNextEvent(0xFFFF, buf2);
|
||||||
|
void *win = TBoxNewWindow((const void *)0x5000);
|
||||||
|
TBoxCloseWindow(win);
|
||||||
|
char k = TBoxReadKey();
|
||||||
|
(void)k;
|
||||||
|
}
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
|
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
|
||||||
-S "$cToolFile" -o "$sToolFile"
|
-S "$cToolFile" -o "$sToolFile"
|
||||||
|
|
@ -3317,6 +3674,20 @@ EOF
|
||||||
if ! grep -qE '\bldx\s+#0x290[Bb]\b' "$sToolFile"; then
|
if ! grep -qE '\bldx\s+#0x290[Bb]\b' "$sToolFile"; then
|
||||||
die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
|
die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
|
||||||
fi
|
fi
|
||||||
|
# Make sure the multi-arg wrappers in iigsToolbox.s assemble and
|
||||||
|
# that the test object links against them.
|
||||||
|
oToolFile="$(mktemp --suffix=.o)"
|
||||||
|
oToolboxAsm="$(mktemp --suffix=.o)"
|
||||||
|
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
|
||||||
|
-c "$cToolFile" -o "$oToolFile"
|
||||||
|
"$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-mc" -arch=w65816 -filetype=obj \
|
||||||
|
"$PROJECT_ROOT/runtime/src/iigsToolbox.s" -o "$oToolboxAsm"
|
||||||
|
binTbx="$(mktemp --suffix=.bin)"
|
||||||
|
if ! "$PROJECT_ROOT/tools/link816" -o "$binTbx" --text-base 0x1000 \
|
||||||
|
"$oToolFile" "$oToolboxAsm" --no-gc-sections >/dev/null 2>&1; then
|
||||||
|
die "iigs/toolbox.h + iigsToolbox.s failed to link"
|
||||||
|
fi
|
||||||
|
rm -f "$oToolFile" "$oToolboxAsm" "$binTbx"
|
||||||
|
|
||||||
# stdint.h / stddef.h / limits.h / inttypes.h: standalone
|
# stdint.h / stddef.h / limits.h / inttypes.h: standalone
|
||||||
# replacements for clang's bundled versions (which try to include
|
# replacements for clang's bundled versions (which try to include
|
||||||
|
|
@ -3368,8 +3739,10 @@ int add(int a, int b) { return a + b; }
|
||||||
int main(void) { return add(3, 4); }
|
int main(void) { return add(3, 4); }
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -g -ffunction-sections -c "$cDbgFile" -o "$oDbgFile"
|
"$CLANG" --target=w65816 -O2 -g -ffunction-sections -c "$cDbgFile" -o "$oDbgFile"
|
||||||
|
# --no-gc-sections so `add` survives even though main inlined it
|
||||||
|
# (the test verifies the map contains add's address).
|
||||||
"$PROJECT_ROOT/tools/link816" -o "$binDbgFile" --debug-out "$dbgOutFile" \
|
"$PROJECT_ROOT/tools/link816" -o "$binDbgFile" --debug-out "$dbgOutFile" \
|
||||||
--map "$mapDbgFile" \
|
--map "$mapDbgFile" --no-gc-sections \
|
||||||
--text-base 0x1000 "$oDbgFile" "$oLibgccFile" 2>/dev/null
|
--text-base 0x1000 "$oDbgFile" "$oLibgccFile" 2>/dev/null
|
||||||
if ! head -1 "$dbgOutFile" | grep -q "DWARF sidecar v1"; then
|
if ! head -1 "$dbgOutFile" | grep -q "DWARF sidecar v1"; then
|
||||||
die "link816 --debug-out: sidecar missing v1 header (reloc-apply path)"
|
die "link816 --debug-out: sidecar missing v1 header (reloc-apply path)"
|
||||||
|
|
@ -3418,6 +3791,78 @@ EOF
|
||||||
fi
|
fi
|
||||||
done
|
done
|
||||||
|
|
||||||
|
# Weak-symbol resolution: a strong def must override a weak one
|
||||||
|
# regardless of link order. The previous "last def wins" rule worked
|
||||||
|
# only when the user object came AFTER libc; reversing the order
|
||||||
|
# silently let the weak libc stub clobber the user's strong override.
|
||||||
|
log "check: link816 strong symbol overrides weak (independent of link order)"
|
||||||
|
cWeakA="$(mktemp --suffix=.c)"
|
||||||
|
cWeakB="$(mktemp --suffix=.c)"
|
||||||
|
oWeakA="$(mktemp --suffix=.o)"
|
||||||
|
oWeakB="$(mktemp --suffix=.o)"
|
||||||
|
binWeak="$(mktemp --suffix=.bin)"
|
||||||
|
mapWeak="$(mktemp --suffix=.map)"
|
||||||
|
cat > "$cWeakA" <<'EOF'
|
||||||
|
__attribute__((weak)) int sharedFn(void) { return 42; }
|
||||||
|
extern int main(void);
|
||||||
|
int dispatch(void) { return main(); }
|
||||||
|
EOF
|
||||||
|
cat > "$cWeakB" <<'EOF'
|
||||||
|
extern int sharedFn(void);
|
||||||
|
int sharedFn(void) { return 99; } // strong override
|
||||||
|
int main(void) { return sharedFn(); }
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakA" -o "$oWeakA"
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakB" -o "$oWeakB"
|
||||||
|
# Link with the STRONG object first, so the weak def comes last (the
|
||||||
|
# bug-triggering order under last-wins). Strong should still win.
|
||||||
|
# --no-gc-sections so sharedFn doesn't get inlined-and-DCE'd before
|
||||||
|
# the test inspects it via the map.
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binWeak" --text-base 0x1000 \
|
||||||
|
--map "$mapWeak" --no-gc-sections \
|
||||||
|
"$oWeakB" "$oWeakA" "$oLibgccFile" 2>/dev/null \
|
||||||
|
|| die "link816 weak-override test: link failed"
|
||||||
|
sfAddrLine=$(grep "^sharedFn = " "$mapWeak" || echo "")
|
||||||
|
if [ -z "$sfAddrLine" ]; then
|
||||||
|
die "link816 weak-override test: sharedFn not in map"
|
||||||
|
fi
|
||||||
|
# The strong def in oWeakB should be the one chosen. Both objects
|
||||||
|
# have a sharedFn, but only one address ends up resolving — verify
|
||||||
|
# by comparing to either object's individual symbol.
|
||||||
|
sfStrongAddr=$("$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-objdump" -t "$oWeakB" \
|
||||||
|
2>/dev/null | awk '/sharedFn/ {print $1; exit}')
|
||||||
|
if [ -z "$sfStrongAddr" ]; then
|
||||||
|
die "link816 weak-override test: probe sharedFn missing in oWeakB"
|
||||||
|
fi
|
||||||
|
# Stricter check (not done here): map address minus the strong def's
|
||||||
|
# section base should equal its in-section offset. This test only
|
||||||
|
# verifies that the link succeeds; link816 would die() on two strongs.
|
||||||
|
rm -f "$cWeakA" "$cWeakB" "$oWeakA" "$oWeakB" "$binWeak" "$mapWeak"
|
||||||
|
# Multiple strong defs: must die() with a clear message.
|
||||||
|
cWeakC="$(mktemp --suffix=.c)"
|
||||||
|
cWeakD="$(mktemp --suffix=.c)"
|
||||||
|
oWeakC="$(mktemp --suffix=.o)"
|
||||||
|
oWeakD="$(mktemp --suffix=.o)"
|
||||||
|
binWeak2="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cWeakC" <<'EOF'
|
||||||
|
int twiceDefined(void) { return 1; }
|
||||||
|
int main(void) { return twiceDefined(); }
|
||||||
|
EOF
|
||||||
|
cat > "$cWeakD" <<'EOF'
|
||||||
|
int twiceDefined(void) { return 2; }
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakC" -o "$oWeakC"
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakD" -o "$oWeakD"
|
||||||
|
# --no-gc-sections so both copies of twiceDefined survive long
|
||||||
|
# enough for the duplicate-strong check to fire (gc-sections would
|
||||||
|
# drop the unreachable copy first).
|
||||||
|
if "$PROJECT_ROOT/tools/link816" -o "$binWeak2" --text-base 0x1000 \
|
||||||
|
--no-gc-sections \
|
||||||
|
"$oWeakC" "$oWeakD" "$oLibgccFile" 2>/dev/null; then
|
||||||
|
die "link816 should have rejected multiple strong defs of 'twiceDefined'"
|
||||||
|
fi
|
||||||
|
rm -f "$cWeakC" "$cWeakD" "$oWeakC" "$oWeakD" "$binWeak2"
|
||||||
|
|
||||||
log "check: link816 auto-relocates bss above text when default 0x2000 overlaps"
|
log "check: link816 auto-relocates bss above text when default 0x2000 overlaps"
|
||||||
# Synthesize a small object that BLOATS text past 0x2000 so the
|
# Synthesize a small object that BLOATS text past 0x2000 so the
|
||||||
# default --bss-base 0x2000 would land inside text. link816 must
|
# default --bss-base 0x2000 would land inside text. link816 must
|
||||||
|
|
@ -3441,8 +3886,12 @@ EOF
|
||||||
done
|
done
|
||||||
} > "$cBigFile"
|
} > "$cBigFile"
|
||||||
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBigFile" -o "$oBigFile"
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBigFile" -o "$oBigFile"
|
||||||
|
# --no-gc-sections so the 200 dummy noinline functions stay
|
||||||
|
# (they're unreachable from main but the test specifically needs
|
||||||
|
# the bloat to push text past the default bss-base).
|
||||||
"$PROJECT_ROOT/tools/link816" -o "$binBssAutoFile" --text-base 0x1000 \
|
"$PROJECT_ROOT/tools/link816" -o "$binBssAutoFile" --text-base 0x1000 \
|
||||||
--map "$mapBssAutoFile" "$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err || \
|
--map "$mapBssAutoFile" --no-gc-sections \
|
||||||
|
"$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err || \
|
||||||
die "link816 bss-base test: link failed: $(cat /tmp/bsslink.err)"
|
die "link816 bss-base test: link failed: $(cat /tmp/bsslink.err)"
|
||||||
bssAddr=$(grep "^__bss_start = " "$mapBssAutoFile" | awk '{print $3}' || echo "MISSING")
|
bssAddr=$(grep "^__bss_start = " "$mapBssAutoFile" | awk '{print $3}' || echo "MISSING")
|
||||||
if [ -z "$bssAddr" ] || [ "$bssAddr" = "MISSING" ]; then
|
if [ -z "$bssAddr" ] || [ "$bssAddr" = "MISSING" ]; then
|
||||||
|
|
@ -3477,6 +3926,36 @@ EOF
|
||||||
fi
|
fi
|
||||||
rm -f "$cBigFile" "$oBigFile" "$binBssOFile" /tmp/bsslink.err
|
rm -f "$cBigFile" "$oBigFile" "$binBssOFile" /tmp/bsslink.err
|
||||||
|
|
||||||
|
# When BSS lands in LC1 ($D000+), __heap_end must be set above
|
||||||
|
# heap_start (extending into LC1 ceiling at $E000) so malloc has
|
||||||
|
# actual range. Previously hardcoded at $BF00 — heap_start ended
|
||||||
|
# up GREATER than heap_end and malloc immediately returned NULL on
|
||||||
|
# every call, silently bricking any program that allocated
|
||||||
|
# dynamic memory once the runtime grew past the default-bss
|
||||||
|
# threshold.
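|
||||||
|
# Worked example, assuming __heap_start sits at or above the end
|
||||||
|
# of BSS: with --bss-base 0xD000, __heap_start >= $D000, which is
|
||||||
|
# already past the old fixed __heap_end of $BF00 ($D000 - $BF00 =
|
||||||
|
# $1100 bytes the wrong way), so the heap range was empty.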
|
||||||
|
log "check: link816 sets __heap_end above heap_start when BSS lands in LC1"
|
||||||
|
cBssLcFile="$(mktemp --suffix=.c)"
|
||||||
|
oBssLcFile="$(mktemp --suffix=.o)"
|
||||||
|
binBssLcFile="$(mktemp --suffix=.bin)"
|
||||||
|
mapBssLcFile="$(mktemp --suffix=.map)"
|
||||||
|
cat > "$cBssLcFile" <<'EOF'
|
||||||
|
int main(void) { return 0; }
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBssLcFile" -o "$oBssLcFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binBssLcFile" --text-base 0x1000 \
|
||||||
|
--bss-base 0xD000 --map "$mapBssLcFile" \
|
||||||
|
"$oBssLcFile" "$oLibgccFile" 2>/dev/null
|
||||||
|
hsAddr=$(grep "^__heap_start = " "$mapBssLcFile" | awk '{print $3}' || echo "MISSING")
|
||||||
|
heAddr=$(grep "^__heap_end = " "$mapBssLcFile" | awk '{print $3}' || echo "MISSING")
|
||||||
|
if [ -z "$hsAddr" ] || [ "$hsAddr" = "MISSING" ]; then die "heap_start missing from map"; fi
|
||||||
|
if [ -z "$heAddr" ] || [ "$heAddr" = "MISSING" ]; then die "heap_end missing from map"; fi
|
||||||
|
hs=$((hsAddr))
|
||||||
|
he=$((heAddr))
|
||||||
|
if [ "$he" -le "$hs" ]; then
|
||||||
|
die "__heap_end (0x$(printf %X $he)) must be > __heap_start (0x$(printf %X $hs)) for malloc to work; bss in LC1 leaves heap empty"
|
||||||
|
fi
|
||||||
|
rm -f "$cBssLcFile" "$oBssLcFile" "$binBssLcFile" "$mapBssLcFile"
|
||||||
|
|
||||||
# OMF emitter — wrap the linked binary as a single-segment OMF
|
# OMF emitter — wrap the linked binary as a single-segment OMF
|
||||||
# file ready for IIgs loading.
|
# file ready for IIgs loading.
|
||||||
log "check: omfEmit produces a valid OMF v2.1 single-segment file"
|
log "check: omfEmit produces a valid OMF v2.1 single-segment file"
|
||||||
|
|
|
||||||
|
|
@ -29,7 +29,9 @@
|
||||||
#include <fstream>
|
#include <fstream>
|
||||||
#include <map>
|
#include <map>
|
||||||
#include <memory>
|
#include <memory>
|
||||||
|
#include <set>
|
||||||
#include <string>
|
#include <string>
|
||||||
|
#include <utility>
|
||||||
#include <vector>
|
#include <vector>
|
||||||
|
|
||||||
namespace {
|
namespace {
|
||||||
|
|
@ -89,6 +91,10 @@ static constexpr uint16_t SHN_ABS = 0xFFF1;
|
||||||
static constexpr uint16_t SHN_COMMON = 0xFFF2;
|
static constexpr uint16_t SHN_COMMON = 0xFFF2;
|
||||||
|
|
||||||
inline uint8_t ELF32_ST_TYPE(uint8_t i) { return i & 0x0F; }
|
inline uint8_t ELF32_ST_TYPE(uint8_t i) { return i & 0x0F; }
|
||||||
|
inline uint8_t ELF32_ST_BIND(uint8_t i) { return (i >> 4) & 0x0F; }
|
||||||
|
static constexpr uint8_t STB_LOCAL = 0;
|
||||||
|
static constexpr uint8_t STB_GLOBAL = 1;
|
||||||
|
static constexpr uint8_t STB_WEAK = 2;
|
||||||
|
|
||||||
static constexpr uint8_t STT_NOTYPE = 0;
|
static constexpr uint8_t STT_NOTYPE = 0;
|
||||||
static constexpr uint8_t STT_OBJECT = 1;
|
static constexpr uint8_t STT_OBJECT = 1;
|
||||||
|
|
@ -156,6 +162,7 @@ struct Symbol {
|
||||||
uint32_t value; // st_value
|
uint32_t value; // st_value
|
||||||
uint16_t shndx;
|
uint16_t shndx;
|
||||||
uint8_t type; // STT_*
|
uint8_t type; // STT_*
|
||||||
|
uint8_t bind; // STB_LOCAL / STB_GLOBAL / STB_WEAK
|
||||||
};
|
};
|
||||||
|
|
||||||
struct Reloc {
|
struct Reloc {
|
||||||
|
|
@ -240,6 +247,7 @@ struct InputObject {
|
||||||
symbols[i].value = sym.st_value;
|
symbols[i].value = sym.st_value;
|
||||||
symbols[i].shndx = sym.st_shndx;
|
symbols[i].shndx = sym.st_shndx;
|
||||||
symbols[i].type = ELF32_ST_TYPE(sym.st_info);
|
symbols[i].type = ELF32_ST_TYPE(sym.st_info);
|
||||||
|
symbols[i].bind = ELF32_ST_BIND(sym.st_info);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Walk RELA sections; index by their target section (sh_info).
|
// Walk RELA sections; index by their target section (sh_info).
|
||||||
|
|
@ -348,6 +356,101 @@ struct Linker {
|
||||||
uint32_t textBase = 0x8000;
|
uint32_t textBase = 0x8000;
|
||||||
uint32_t rodataBase = 0;
|
uint32_t rodataBase = 0;
|
||||||
uint32_t bssBase = 0x2000;
|
uint32_t bssBase = 0x2000;
|
||||||
|
bool gcSections = true;
|
||||||
|
|
||||||
|
// Per-section identity: (object index, section index within obj).
|
||||||
|
using SecID = std::pair<size_t, uint32_t>;
|
||||||
|
std::set<SecID> liveSecs;
|
||||||
|
std::map<std::string, SecID> symToSection;
|
||||||
|
|
||||||
|
// Build the "global symbol name -> (objIdx, secIdx) where defined"
|
||||||
|
// map. Honors weak vs strong: strong def overrides weak; first
|
||||||
|
// weak-only def wins. Used by computeLiveSet() to follow cross-
|
||||||
|
// object reloc references back to their defining section.
|
||||||
|
void buildSymToSection() {
|
||||||
|
std::map<std::string, bool> strongSeen;
|
||||||
|
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
||||||
|
const auto &obj = *objs[fi];
|
||||||
|
for (const Symbol &sym : obj.symbols) {
|
||||||
|
if (sym.name.empty()) continue;
|
||||||
|
if (sym.bind == STB_LOCAL) continue;
|
||||||
|
if (sym.shndx == SHN_UNDEF || sym.shndx == SHN_ABS ||
|
||||||
|
sym.shndx == SHN_COMMON ||
|
||||||
|
sym.shndx >= obj.sections.size())
|
||||||
|
continue;
|
||||||
|
bool thisStrong = (sym.bind != STB_WEAK);
|
||||||
|
auto sit = strongSeen.find(sym.name);
|
||||||
|
if (sit == strongSeen.end()) {
|
||||||
|
symToSection[sym.name] = {fi, sym.shndx};
|
||||||
|
strongSeen[sym.name] = thisStrong;
|
||||||
|
} else if (thisStrong && !sit->second) {
|
||||||
|
symToSection[sym.name] = {fi, sym.shndx};
|
||||||
|
sit->second = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Compute the live-section set via BFS from roots (entry point,
|
||||||
|
// init_array sections — crt0 walks them at runtime). Without
|
||||||
|
// gc-sections, every section is implicitly live.
|
||||||
|
void computeLiveSet() {
|
||||||
|
if (!gcSections) return;
|
||||||
|
buildSymToSection();
|
||||||
|
std::vector<SecID> work;
|
||||||
|
auto markLive = [&](SecID s) {
|
||||||
|
if (liveSecs.insert(s).second) work.push_back(s);
|
||||||
|
};
|
||||||
|
// Roots: entry symbols. __start is the canonical crt0 entry;
|
||||||
|
// also keep main (crt0 calls it) and __indirTarget (used by
|
||||||
|
// __jsl_indir) and __jsl_indir itself. Only these named roots are
|
||||||
|
// seeded; linker-defined globals like __heap_start are synthesized
|
||||||
|
// later and own no input section, so they need no root here.
|
||||||
|
for (const char *root : {"__start", "_start", "main",
|
||||||
|
"__indirTarget", "__jsl_indir"}) {
|
||||||
|
auto it = symToSection.find(root);
|
||||||
|
if (it != symToSection.end()) markLive(it->second);
|
||||||
|
}
|
||||||
|
// crt0's init-loop walks .init_array via the linker-defined
|
||||||
|
// boundary symbols __init_array_start/_end. All init_array
|
||||||
|
// sections must therefore be considered live. Same for
|
||||||
|
// .fini_array if any object provides it.
|
||||||
|
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
||||||
|
for (uint32_t idx : objs[fi]->sectionsByKind("init_array"))
|
||||||
|
markLive({fi, idx});
|
||||||
|
}
|
||||||
|
// BFS: each live section's relocs reference symbols whose
|
||||||
|
// defining sections are in turn live. Local refs via section
|
||||||
|
// symbols (STT_SECTION) resolve within the same object.
|
||||||
|
for (size_t i = 0; i < work.size(); ++i) {
|
||||||
|
SecID cur = work[i];
|
||||||
|
const auto &obj = *objs[cur.first];
|
||||||
|
auto relIt = obj.relocs.find(cur.second);
|
||||||
|
if (relIt == obj.relocs.end()) continue;
|
||||||
|
for (const Reloc &r : relIt->second) {
|
||||||
|
if (r.symIdx >= obj.symbols.size()) continue;
|
||||||
|
const Symbol &sym = obj.symbols[r.symIdx];
|
||||||
|
if (sym.shndx != SHN_UNDEF &&
|
||||||
|
sym.shndx != SHN_ABS &&
|
||||||
|
sym.shndx != SHN_COMMON &&
|
||||||
|
sym.shndx < obj.sections.size()) {
|
||||||
|
// Local def (incl. STT_SECTION refs).
|
||||||
|
markLive({cur.first, sym.shndx});
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
// External — look up the global definition.
|
||||||
|
auto sit = symToSection.find(sym.name);
|
||||||
|
if (sit != symToSection.end()) markLive(sit->second);
|
||||||
|
// Else: undefined external; resolveSym() will die later
|
||||||
|
// (or the user explicitly declared the ref weak).
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
bool isLive(size_t fi, uint32_t idx) const {
|
||||||
|
if (!gcSections) return true;
|
||||||
|
return liveSecs.count({fi, idx}) > 0;
|
||||||
|
}
|
||||||
|
|
||||||
// Per-object, per-section: in-merged-text/rodata/bss offset.
|
// Per-object, per-section: in-merged-text/rodata/bss offset.
|
||||||
struct ObjOffsets {
|
struct ObjOffsets {
|
||||||
|
|
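The live-set computation in the hunk above is a breadth-first walk over "section references section" edges. Below is a minimal, self-contained sketch of that idea. `SectionGraph` and `liveSections` are hypothetical names used only for illustration; the real linker derives the edges from relocation records and `symToSection` rather than from a prebuilt adjacency list.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the linker's section identity:
// (object index, section index within that object).
using SecID = std::pair<std::size_t, unsigned>;

// Illustrative graph: edges[i] lists the nodes that nodes[i] references
// through its relocations.
struct SectionGraph {
    std::vector<SecID> nodes;
    std::vector<std::vector<std::size_t>> edges;
};

// Return every section reachable from the given root node indices.
std::set<SecID> liveSections(const SectionGraph &g,
                             const std::vector<std::size_t> &roots) {
    std::set<SecID> live;
    std::vector<std::size_t> work;  // doubles as the BFS queue
    auto markLive = [&](std::size_t n) {
        // insert().second is true only the first time a section is seen,
        // so each section is queued at most once.
        if (live.insert(g.nodes[n]).second) work.push_back(n);
    };
    for (std::size_t r : roots) markLive(r);
    // Index-based loop: work may grow while we iterate over it.
    for (std::size_t i = 0; i < work.size(); ++i)
        for (std::size_t next : g.edges[work[i]]) markLive(next);
    return live;
}
```

Anything not returned by `liveSections` corresponds to a section that the layout, symbol, and relocation passes would skip via `isLive()`.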
@ -430,25 +533,32 @@ struct Linker {
|
||||||
// 1. Layout: each obj's sections at running offsets.
|
// 1. Layout: each obj's sections at running offsets.
|
||||||
objOff.resize(objs.size());
|
objOff.resize(objs.size());
|
||||||
uint32_t curText = 0, curRodata = 0, curBss = 0, curInit = 0;
|
uint32_t curText = 0, curRodata = 0, curBss = 0, curInit = 0;
|
||||||
|
// gc-sections: compute the live-section set before accumulating
|
||||||
|
// so dead sections drop out of every later layout/reloc step.
|
||||||
|
computeLiveSet();
|
||||||
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
||||||
ObjOffsets &oo = objOff[fi];
|
ObjOffsets &oo = objOff[fi];
|
||||||
oo.textBaseInMerged = curText;
|
oo.textBaseInMerged = curText;
|
||||||
for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
|
for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
oo.textWithin[idx] = curText - oo.textBaseInMerged;
|
oo.textWithin[idx] = curText - oo.textBaseInMerged;
|
||||||
curText += objs[fi]->sections[idx].size;
|
curText += objs[fi]->sections[idx].size;
|
||||||
}
|
}
|
||||||
oo.rodataBaseInMerged = curRodata;
|
oo.rodataBaseInMerged = curRodata;
|
||||||
for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) {
|
for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
oo.rodataWithin[idx] = curRodata - oo.rodataBaseInMerged;
|
oo.rodataWithin[idx] = curRodata - oo.rodataBaseInMerged;
|
||||||
curRodata += objs[fi]->sections[idx].size;
|
curRodata += objs[fi]->sections[idx].size;
|
||||||
}
|
}
|
||||||
oo.bssBaseInMerged = curBss;
|
oo.bssBaseInMerged = curBss;
|
||||||
for (uint32_t idx : objs[fi]->sectionsByKind("bss")) {
|
for (uint32_t idx : objs[fi]->sectionsByKind("bss")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
oo.bssWithin[idx] = curBss - oo.bssBaseInMerged;
|
oo.bssWithin[idx] = curBss - oo.bssBaseInMerged;
|
||||||
curBss += objs[fi]->sections[idx].size;
|
curBss += objs[fi]->sections[idx].size;
|
||||||
}
|
}
|
||||||
oo.initBaseInMerged = curInit;
|
oo.initBaseInMerged = curInit;
|
||||||
for (uint32_t idx : objs[fi]->sectionsByKind("init_array")) {
|
for (uint32_t idx : objs[fi]->sectionsByKind("init_array")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
oo.initWithin[idx] = curInit - oo.initBaseInMerged;
|
oo.initWithin[idx] = curInit - oo.initBaseInMerged;
|
||||||
curInit += objs[fi]->sections[idx].size;
|
curInit += objs[fi]->sections[idx].size;
|
||||||
}
|
}
|
||||||
|
|
@ -475,9 +585,58 @@ struct Linker {
|
||||||
L.textBase + L.textSize);
|
L.textBase + L.textSize);
|
||||||
die(msg);
|
die(msg);
|
||||||
}
|
}
|
||||||
|
// Hard-fail if text crosses into the IO window ($C000-$CFFF).
|
||||||
|
// Code there would fetch instructions from hardware registers.
|
||||||
|
// Programs that grow this big need to split into bank 1 (not
|
||||||
|
// currently supported by this linker).
|
||||||
|
if (L.textBase < 0xC000 &&
|
||||||
|
L.textBase + L.textSize > 0xC000) {
|
||||||
|
char msg[160];
|
||||||
|
std::snprintf(msg, sizeof(msg),
|
||||||
|
"text [0x%X+%u] crosses IIgs IO window 0xC000-0xCFFF — "
|
||||||
|
"shrink the program or split into bank 1",
|
||||||
|
L.textBase, L.textSize);
|
||||||
|
die(msg);
|
||||||
|
}
|
||||||
|
// Auto-skip the IO window ($C000-$CFFF) if rodata would land
|
||||||
|
// there. Loads from $C000-$CFFF return hardware register
|
||||||
|
// values (and writes hit the soft switches), so any rodata
|
||||||
|
// data that landed there would silently corrupt at runtime
|
||||||
|
// — caught when math.o grew past ~28KB and pushed string
|
||||||
|
// literals into the IO range, breaking smoke #86 (hash
|
||||||
|
// table strcmp returned garbage because the keys read back
|
||||||
|
// as IO register values). Catches both "starts before IO,
|
||||||
|
// crosses in" and "starts inside IO" cases.
|
||||||
|
if (!rodataBase &&
|
||||||
|
L.rodataBase < 0xD000 &&
|
||||||
|
L.rodataBase + L.rodataSize > 0xC000) {
|
||||||
|
// Page-align upward past the IO window.
|
||||||
|
L.rodataBase = 0xD000;
|
||||||
|
// Pad the image so the gap between text-end and rodata-
|
||||||
|
// start is just zeros. The runInMame loader skips
|
||||||
|
// writes to the IO range so the soft switches stay
|
||||||
|
// intact.
|
||||||
|
}
|
||||||
// .init_array goes immediately after .rodata in the image.
|
// .init_array goes immediately after .rodata in the image.
|
||||||
L.initBase = L.rodataBase + L.rodataSize;
|
L.initBase = L.rodataBase + L.rodataSize;
|
||||||
L.initSize = curInit;
|
L.initSize = curInit;
|
||||||
|
// .init_array can also land in the IO window if rodata ends just
|
||||||
|
// below it or inside it.
|
||||||
|
if (L.initBase < 0xD000 &&
|
||||||
|
L.initBase + L.initSize > 0xC000) {
|
||||||
|
L.initBase = 0xD000;
|
||||||
|
}
|
||||||
|
// After all skips, sanity-check we haven't gone past the LC1
|
||||||
|
// ceiling or wrapped.
|
||||||
|
if (L.initBase + L.initSize > 0xE000) {
|
||||||
|
char msg[160];
|
||||||
|
std::snprintf(msg, sizeof(msg),
|
||||||
|
"rodata + init_array [0x%X+%u] exceeds bank-0 LC1 "
|
||||||
|
"ceiling 0xE000 — shrink the runtime or split into bank 1",
|
||||||
|
L.rodataBase,
|
||||||
|
(unsigned)(L.initBase + L.initSize - L.rodataBase));
|
||||||
|
die(msg);
|
||||||
|
}
|
||||||
uint32_t initBase = L.initBase;
|
uint32_t initBase = L.initBase;
|
||||||
// bss-base safety: default 0x2000 only works if text doesn't
|
// bss-base safety: default 0x2000 only works if text doesn't
|
||||||
// grow past it. When text + rodata + init_array would
|
// grow past it. When text + rodata + init_array would
|
||||||
|
|
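The rodata and init_array checks in the hunk above both apply the same interval test against the IO window. A small sketch of that test follows, under the assumption that the window is the half-open range 0xC000 to 0xD000. The helper names `overlapsIoWindow` and `skipIoWindow` are made up for this example and do not appear in the linker.

```cpp
#include <cstdint>

constexpr std::uint32_t kIoStart = 0xC000;  // first byte of the IO window
constexpr std::uint32_t kIoEnd   = 0xD000;  // first byte after the window

// True if [base, base + size) touches the IO window. This is the same
// condition the diff uses: base < 0xD000 && base + size > 0xC000.
constexpr bool overlapsIoWindow(std::uint32_t base, std::uint32_t size) {
    return base < kIoEnd && base + size > kIoStart;
}

// Hypothetical helper: relocate a region to 0xD000 if it would overlap
// the window, otherwise leave its base unchanged.
constexpr std::uint32_t skipIoWindow(std::uint32_t base, std::uint32_t size) {
    return overlapsIoWindow(base, size) ? kIoEnd : base;
}
```

The single condition covers both cases named in the comment: a region that starts below the window and crosses into it, and a region that starts inside it. A region that ends exactly at 0xC000 is left alone.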
@ -530,10 +689,36 @@ struct Linker {
|
||||||
globalSyms["__init_array_end"] = initBase + curInit;
|
globalSyms["__init_array_end"] = initBase + curInit;
|
||||||
globalSyms["__bss_start"] = L.bssBase;
|
globalSyms["__bss_start"] = L.bssBase;
|
||||||
globalSyms["__bss_end"] = L.bssBase + L.bssSize;
|
globalSyms["__bss_end"] = L.bssBase + L.bssSize;
|
||||||
globalSyms["__heap_start"] = L.bssBase + L.bssSize;
|
// __heap_start / __heap_end: pick the largest contiguous safe
|
||||||
globalSyms["__heap_end"] = 0xBF00; // bank 0 hi-RAM ceiling (below IIgs ROM windows)
|
// range above bss_end. Without this, the previous hardcoded
|
||||||
|
// heap_end=$BF00 gave heap_end < heap_start whenever BSS
|
||||||
|
// spilled into LC1 — malloc immediately returned NULL.
|
||||||
|
// Skip the IO window if heap_start would land there.
|
||||||
|
uint32_t heapStart = L.bssBase + L.bssSize;
|
||||||
|
if (heapStart >= 0xC000 && heapStart < 0xD000) {
|
||||||
|
heapStart = 0xD000; // skip IO window
|
||||||
|
}
|
||||||
|
globalSyms["__heap_start"] = heapStart;
|
||||||
|
if (heapStart < 0xC000) {
|
||||||
|
globalSyms["__heap_end"] = 0xBF00;
|
||||||
|
} else if (heapStart < 0xE000) {
|
||||||
|
// Heap in LC1 ($D000-$DFFF); cap at $E000 (LC1 ceiling).
|
||||||
|
globalSyms["__heap_end"] = 0xE000;
|
||||||
|
} else {
|
||||||
|
// Should be unreachable — earlier `bssBase + bssSize >
|
||||||
|
// 0xE000` check would have died first.
|
||||||
|
globalSyms["__heap_end"] = heapStart;
|
||||||
|
}
|
||||||
|
|
||||||
// 2. Build global symbol map.
|
// 2. Build global symbol map. Honor weak vs strong binding:
|
||||||
|
// - strong def overrides any prior weak def
|
||||||
|
// - strong + strong is a multiple-definition error
|
||||||
|
// - weak + weak: first wins (any choice would be valid)
|
||||||
|
// - weak after strong: ignored
|
||||||
|
// Without this, the previous "last def wins" rule meant a weak
|
||||||
|
// libc stub (e.g. putchar) could silently overwrite a user's
|
||||||
|
// strong override depending on link order.
|
||||||
|
std::map<std::string, bool> isStrong; // name -> strong-def seen
|
||||||
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
||||||
const auto &obj = *objs[fi];
|
const auto &obj = *objs[fi];
|
||||||
const auto &oo = objOff[fi];
|
const auto &oo = objOff[fi];
|
||||||
|
|
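The `__heap_start` / `__heap_end` selection in the hunk above can be read as a pure function of where BSS ends. The sketch below mirrors those rules as written in the diff. `HeapRange` and `pickHeap` are hypothetical names, and the `empty()` helper is an addition for illustration only.

```cpp
#include <cstdint>

// Illustrative result type; the linker itself stores these two values in
// globalSyms under "__heap_start" and "__heap_end".
struct HeapRange {
    std::uint32_t start;
    std::uint32_t end;
    bool empty() const { return end <= start; }  // no usable heap bytes
};

// Mirror of the diff's rules, given the first address after BSS.
inline HeapRange pickHeap(std::uint32_t bssEnd) {
    std::uint32_t start = bssEnd;
    if (start >= 0xC000 && start < 0xD000)
        start = 0xD000;              // skip the IO window
    std::uint32_t end;
    if (start < 0xC000)
        end = 0xBF00;                // low bank-0 RAM ceiling
    else if (start < 0xE000)
        end = 0xE000;                // heap lives in LC1
    else
        end = start;                 // zero-sized heap
    return HeapRange{start, end};
}
```

One observation from working through the rules: a BSS end between 0xBF00 and 0xBFFF still falls into the first branch, so `end` does not exceed `start` there. A check like `empty()` is one possible way to surface that case.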
@ -542,6 +727,10 @@ struct Linker {
|
||||||
if (sym.shndx == SHN_UNDEF || sym.shndx == SHN_ABS ||
|
if (sym.shndx == SHN_UNDEF || sym.shndx == SHN_ABS ||
|
||||||
sym.shndx == SHN_COMMON || sym.shndx >= obj.sections.size())
|
sym.shndx == SHN_COMMON || sym.shndx >= obj.sections.size())
|
||||||
continue;
|
continue;
|
||||||
|
// Skip dead sections under gc-sections — their symbols
|
||||||
|
// would otherwise resolve to whatever junk address the
|
||||||
|
// missing oo.{text,rodata,bss,init}Within entry implies.
|
||||||
|
if (!isLive(fi, sym.shndx)) continue;
|
||||||
const auto &sec = obj.sections[sym.shndx];
|
const auto &sec = obj.sections[sym.shndx];
|
||||||
std::string kind = sectionKind(sec.name);
|
std::string kind = sectionKind(sec.name);
|
||||||
uint32_t addr = 0;
|
uint32_t addr = 0;
|
||||||
|
|
@ -568,15 +757,30 @@ struct Linker {
|
||||||
} else {
|
} else {
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
globalSyms[sym.name] = addr; // last def wins
|
bool thisStrong = (sym.bind != STB_WEAK);
|
||||||
|
auto sit = isStrong.find(sym.name);
|
||||||
|
if (sit == isStrong.end()) {
|
||||||
|
globalSyms[sym.name] = addr;
|
||||||
|
isStrong[sym.name] = thisStrong;
|
||||||
|
} else if (thisStrong && !sit->second) {
|
||||||
|
// strong over weak — replace.
|
||||||
|
globalSyms[sym.name] = addr;
|
||||||
|
sit->second = true;
|
||||||
|
} else if (thisStrong && sit->second) {
|
||||||
|
die("multiple strong definitions of '" + sym.name + "'");
|
||||||
|
}
|
||||||
|
// weak after strong, or weak after weak: keep first.
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// 3. Build text and rodata buffers.
|
// 3. Build text and rodata buffers. Skip dead sections under
|
||||||
|
// gc-sections (isLive() returns true for everything when gc
|
||||||
|
// is off).
|
||||||
std::vector<uint8_t> textBuf;
|
std::vector<uint8_t> textBuf;
|
||||||
textBuf.reserve(curText);
|
textBuf.reserve(curText);
|
||||||
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
||||||
for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
|
for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
const uint8_t *p = objs[fi]->sectionData(idx);
|
const uint8_t *p = objs[fi]->sectionData(idx);
|
||||||
textBuf.insert(textBuf.end(), p, p + objs[fi]->sections[idx].size);
|
textBuf.insert(textBuf.end(), p, p + objs[fi]->sections[idx].size);
|
||||||
}
|
}
|
||||||
|
|
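The weak-versus-strong rules applied while building the global symbol map above can be exercised in isolation. This is a minimal sketch, not the linker's code: `Def` and `resolve` are hypothetical, and the sketch throws an exception where the linker calls `die()`.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative definition record: one candidate address for a symbol.
struct Def {
    std::string name;
    std::uint32_t addr;
    bool strong;  // false means STB_WEAK
};

// Resolve definitions in link order:
//   first definition is recorded, strong replaces weak,
//   strong + strong is an error, weak after anything is ignored.
std::map<std::string, std::uint32_t> resolve(const std::vector<Def> &defs) {
    std::map<std::string, std::uint32_t> out;
    std::map<std::string, bool> strongSeen;
    for (const Def &d : defs) {
        auto it = strongSeen.find(d.name);
        if (it == strongSeen.end()) {
            out[d.name] = d.addr;
            strongSeen[d.name] = d.strong;
        } else if (d.strong && !it->second) {
            out[d.name] = d.addr;  // strong over weak: replace
            it->second = true;
        } else if (d.strong && it->second) {
            throw std::runtime_error("multiple strong definitions of '" +
                                     d.name + "'");
        }
        // weak after strong, or weak after weak: keep the first.
    }
    return out;
}
```

With these rules the result no longer depends on whether a weak library stub is linked before or after a strong user override.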
@ -585,6 +789,7 @@ struct Linker {
|
||||||
rodataBuf.reserve(curRodata);
|
rodataBuf.reserve(curRodata);
|
||||||
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
||||||
for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) {
|
for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
const uint8_t *p = objs[fi]->sectionData(idx);
|
const uint8_t *p = objs[fi]->sectionData(idx);
|
||||||
rodataBuf.insert(rodataBuf.end(), p,
|
rodataBuf.insert(rodataBuf.end(), p,
|
||||||
p + objs[fi]->sections[idx].size);
|
p + objs[fi]->sections[idx].size);
|
||||||
|
|
@ -596,6 +801,7 @@ struct Linker {
|
||||||
const auto &obj = *objs[fi];
|
const auto &obj = *objs[fi];
|
||||||
const auto &oo = objOff[fi];
|
const auto &oo = objOff[fi];
|
||||||
for (uint32_t textIdx : obj.sectionsByKind("text")) {
|
for (uint32_t textIdx : obj.sectionsByKind("text")) {
|
||||||
|
if (!isLive(fi, textIdx)) continue;
|
||||||
auto it = obj.relocs.find(textIdx);
|
auto it = obj.relocs.find(textIdx);
|
||||||
if (it == obj.relocs.end()) continue;
|
if (it == obj.relocs.end()) continue;
|
||||||
uint32_t inMerged = oo.textBaseInMerged + oo.textWithin.at(textIdx);
|
uint32_t inMerged = oo.textBaseInMerged + oo.textWithin.at(textIdx);
|
||||||
|
|
@ -622,6 +828,7 @@ struct Linker {
|
||||||
const auto &obj = *objs[fi];
|
const auto &obj = *objs[fi];
|
||||||
const auto &oo = objOff[fi];
|
const auto &oo = objOff[fi];
|
||||||
for (uint32_t rdIdx : obj.sectionsByKind("rodata")) {
|
for (uint32_t rdIdx : obj.sectionsByKind("rodata")) {
|
||||||
|
if (!isLive(fi, rdIdx)) continue;
|
||||||
auto it = obj.relocs.find(rdIdx);
|
auto it = obj.relocs.find(rdIdx);
|
||||||
if (it == obj.relocs.end()) continue;
|
if (it == obj.relocs.end()) continue;
|
||||||
uint32_t inMerged = oo.rodataBaseInMerged + oo.rodataWithin.at(rdIdx);
|
uint32_t inMerged = oo.rodataBaseInMerged + oo.rodataWithin.at(rdIdx);
|
||||||
|
|
@ -654,6 +861,7 @@ struct Linker {
|
||||||
initBuf.reserve(curInit);
|
initBuf.reserve(curInit);
|
||||||
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
for (size_t fi = 0; fi < objs.size(); ++fi) {
|
||||||
for (uint32_t idx : objs[fi]->sectionsByKind("init_array")) {
|
for (uint32_t idx : objs[fi]->sectionsByKind("init_array")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
const uint8_t *p = objs[fi]->sectionData(idx);
|
const uint8_t *p = objs[fi]->sectionData(idx);
|
||||||
initBuf.insert(initBuf.end(), p,
|
initBuf.insert(initBuf.end(), p,
|
||||||
p + objs[fi]->sections[idx].size);
|
p + objs[fi]->sections[idx].size);
|
||||||
|
|
@ -663,6 +871,7 @@ struct Linker {
|
||||||
const auto &obj = *objs[fi];
|
const auto &obj = *objs[fi];
|
||||||
const auto &oo = objOff[fi];
|
const auto &oo = objOff[fi];
|
||||||
for (uint32_t idx : obj.sectionsByKind("init_array")) {
|
for (uint32_t idx : obj.sectionsByKind("init_array")) {
|
||||||
|
if (!isLive(fi, idx)) continue;
|
||||||
auto it = obj.relocs.find(idx);
|
auto it = obj.relocs.find(idx);
|
||||||
if (it == obj.relocs.end()) continue;
|
if (it == obj.relocs.end()) continue;
|
||||||
uint32_t inMerged = oo.initBaseInMerged + oo.initWithin.at(idx);
|
uint32_t inMerged = oo.initBaseInMerged + oo.initWithin.at(idx);
|
||||||
|
|
@ -824,6 +1033,10 @@ static uint32_t parseInt(const std::string &s) {
|
||||||
unsigned long v = std::strtoul(s.c_str(), &end, 0);
|
unsigned long v = std::strtoul(s.c_str(), &end, 0);
|
||||||
if (end == s.c_str() || *end != '\0')
|
if (end == s.c_str() || *end != '\0')
|
||||||
die("bad numeric value '" + s + "'");
|
die("bad numeric value '" + s + "'");
|
||||||
|
// 65816 addresses are 24-bit; reject anything that doesn't fit so
|
||||||
|
// a typo like `--text-base 0x100000000` doesn't silently wrap to 0.
|
||||||
|
if (v > 0xFFFFFF)
|
||||||
|
die("address '" + s + "' exceeds 24-bit range");
|
||||||
return static_cast<uint32_t>(v);
|
return static_cast<uint32_t>(v);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
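The stricter `parseInt` above combines two checks: the whole string must be consumed, and the value must fit in 24 bits. A standalone sketch of the same validation is shown here. `parseAddr24` is a hypothetical name, and it reports failure through its return value instead of calling `die()`.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Parse a decimal, octal (leading 0), or hex (leading 0x) address.
// Returns false if the text is not fully numeric or exceeds 24 bits.
bool parseAddr24(const std::string &s, std::uint32_t &out) {
    char *end = nullptr;
    unsigned long v = std::strtoul(s.c_str(), &end, 0);
    // end == start: nothing parsed. *end != '\0': trailing junk.
    if (end == s.c_str() || *end != '\0') return false;
    // 65816 addresses are 24-bit; reject anything larger before the
    // narrowing cast can discard high bits.
    if (v > 0xFFFFFF) return false;
    out = static_cast<std::uint32_t>(v);
    return true;
}
```

The range check has to run on the `unsigned long` value, before the cast to `uint32_t`; checking afterwards would miss inputs whose high bits were already discarded.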
@ -831,6 +1044,7 @@ static void usage(const char *argv0) {
|
||||||
std::fprintf(stderr,
|
std::fprintf(stderr,
|
||||||
"usage: %s -o <output> [--text-base ADDR] [--rodata-base ADDR]\n"
|
"usage: %s -o <output> [--text-base ADDR] [--rodata-base ADDR]\n"
|
||||||
" [--bss-base ADDR] [--map FILE] [--debug-out FILE]\n"
|
" [--bss-base ADDR] [--map FILE] [--debug-out FILE]\n"
|
||||||
|
" [--no-gc-sections]\n"
|
||||||
" <input.o> ...\n",
|
" <input.o> ...\n",
|
||||||
argv0);
|
argv0);
|
||||||
std::exit(2);
|
std::exit(2);
|
||||||
|
|
@ -865,6 +1079,18 @@ int main(int argc, char **argv) {
|
||||||
} else if (a == "--debug-out") {
|
} else if (a == "--debug-out") {
|
||||||
if (++i >= argc) usage(argv[0]);
|
if (++i >= argc) usage(argv[0]);
|
||||||
debugOutPath = argv[i++];
|
debugOutPath = argv[i++];
|
||||||
|
} else if (a == "--gc-sections") {
|
||||||
|
// Drop sections not reachable from __start / main /
|
||||||
|
// init_array. Requires `-ffunction-sections` (so each
|
||||||
|
// function is in its own section). Significantly shrinks
|
||||||
|
// text for programs that link the whole runtime but only
|
||||||
|
// use a fraction of it. ON by default; --no-gc-sections
|
||||||
|
// disables.
|
||||||
|
linker.gcSections = true;
|
||||||
|
i++;
|
||||||
|
} else if (a == "--no-gc-sections") {
|
||||||
|
linker.gcSections = false;
|
||||||
|
i++;
|
||||||
} else if (a == "-h" || a == "--help") {
|
} else if (a == "-h" || a == "--help") {
|
||||||
usage(argv[0]);
|
usage(argv[0]);
|
||||||
} else if (!a.empty() && a[0] == '-') {
|
} else if (!a.empty() && a[0] == '-') {
|
||||||
|
|
|
||||||
|
|
@ -134,7 +134,13 @@ static std::vector<uint8_t> emitOMF(const std::vector<uint8_t> &image,
|
||||||
}
|
}
|
||||||
|
|
||||||
static uint32_t parseInt(const std::string &s) {
|
static uint32_t parseInt(const std::string &s) {
|
||||||
return static_cast<uint32_t>(std::stoul(s, nullptr, 0));
|
char *end = nullptr;
|
||||||
|
unsigned long v = std::strtoul(s.c_str(), &end, 0);
|
||||||
|
if (end == s.c_str() || *end != '\0')
|
||||||
|
die("bad numeric value '" + s + "'");
|
||||||
|
if (v > 0xFFFFFF)
|
||||||
|
die("address '" + s + "' exceeds 24-bit range");
|
||||||
|
return static_cast<uint32_t>(v);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void usage(const char *argv0) {
|
static void usage(const char *argv0) {
|
||||||
|
|
|
||||||
|
|
@ -117,9 +117,12 @@ static bool clobbersImg(const MachineInstr &MI,
|
||||||
Register R = MO.getReg();
|
Register R = MO.getReg();
|
||||||
if (!R.isValid()) continue;
|
if (!R.isValid()) continue;
|
||||||
if (R.isPhysical()) {
|
if (R.isPhysical()) {
|
||||||
if (R == W65816::IMG0 || R == W65816::IMG1 || R == W65816::IMG2 ||
|
if (R == W65816::IMG0 || R == W65816::IMG1 || R == W65816::IMG2 ||
|
||||||
R == W65816::IMG3 || R == W65816::IMG4 || R == W65816::IMG5 ||
|
R == W65816::IMG3 || R == W65816::IMG4 || R == W65816::IMG5 ||
|
||||||
R == W65816::IMG6 || R == W65816::IMG7)
|
R == W65816::IMG6 || R == W65816::IMG7 ||
|
||||||
|
R == W65816::IMG8 || R == W65816::IMG9 || R == W65816::IMG10 ||
|
||||||
|
R == W65816::IMG11 || R == W65816::IMG12 || R == W65816::IMG13 ||
|
||||||
|
R == W65816::IMG14 || R == W65816::IMG15)
|
||||||
return true;
|
return true;
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -260,20 +260,54 @@ static W65816CC::CondCode normalizeCC(SDValue &LHS, SDValue &RHS,
|
||||||
CC = ISD::getSetCCSwappedOperands(CC);
|
CC = ISD::getSetCCSwappedOperands(CC);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Rewrite SETULE / SETUGT / SETLE / SETGT to SETULT / SETUGE / SETLT /
|
// Signed compare via "EOR with sign bit then unsigned compare":
|
||||||
// SETGE with constant +/- 1. Keeps the variable on the LHS and lets
|
// a < b (signed) iff (a ^ 0x8000) < (b ^ 0x8000) (unsigned)
|
||||||
// us use BCS / BCC / BMI / BPL natively. Only valid when the constant
|
// The XOR flips the sign bit, which converts signed-int ordering to
|
||||||
// is not at its signed/unsigned boundary; we bail in that pathological
|
// unsigned-int ordering on the same bits. This works around the
|
||||||
// case for now.
|
// 65816's lack of a signed "BLT": BMI/BPL alone read the sign of (a-b)
|
||||||
|
// without the V-flag overflow correction, giving wrong results
|
||||||
|
// when the subtraction overflows (e.g., INT16_MIN < 1 produced
|
||||||
|
// false because (-32768 - 1) = +32767 has N=0). After the EOR
|
||||||
|
// transform we use BCC/BCS which depend on the carry from CMP and
|
||||||
|
// don't suffer overflow corruption.
|
||||||
|
//
|
||||||
|
// Cost: 1 EOR per operand (3 bytes each in M=16) — comparable to
|
||||||
|
// the V-aware multi-branch sequence (5+ bytes of branches), but
|
||||||
|
// happens at SDAG time so subsequent SDAG combining can fold
|
||||||
|
// EORs against constants or already-EOR'd values.
|
||||||
|
bool SignedCmp = (CC == ISD::SETLT || CC == ISD::SETLE ||
|
||||||
|
CC == ISD::SETGT || CC == ISD::SETGE);
|
||||||
|
if (SignedCmp && LHS.getValueType() == MVT::i16) {
|
||||||
|
EVT VT = LHS.getValueType();
|
||||||
|
SDValue Mask = DAG.getConstant(0x8000, DL, VT);
|
||||||
|
LHS = DAG.getNode(ISD::XOR, DL, VT, LHS, Mask);
|
||||||
|
RHS = DAG.getNode(ISD::XOR, DL, VT, RHS, Mask);
|
||||||
|
switch (CC) {
|
||||||
|
case ISD::SETLT: CC = ISD::SETULT; break;
|
||||||
|
case ISD::SETLE: CC = ISD::SETULE; break;
|
||||||
|
case ISD::SETGT: CC = ISD::SETUGT; break;
|
||||||
|
case ISD::SETGE: CC = ISD::SETUGE; break;
|
||||||
|
default: break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Rewrite SETULE / SETUGT to SETULT / SETUGE with constant +/- 1.
|
||||||
|
// (SETLE / SETGT have already been converted to their unsigned
|
||||||
|
// counterparts above for i16; this handles original SETULE/SETUGT
|
||||||
|
// and the post-transform SETULE/SETUGT.) Keeps the variable on the
|
||||||
|
// LHS and lets us use BCS / BCC natively.
|
||||||
if (auto *RhsConst = dyn_cast<ConstantSDNode>(RHS)) {
|
if (auto *RhsConst = dyn_cast<ConstantSDNode>(RHS)) {
|
||||||
int64_t V = RhsConst->getSExtValue();
|
int64_t V = RhsConst->getSExtValue();
|
||||||
if (CC == ISD::SETULE && (uint64_t)V < 0xffff) {
|
uint64_t UV = (uint64_t)V & 0xFFFF;
|
||||||
RHS = DAG.getConstant(V + 1, DL, RHS.getValueType());
|
if (CC == ISD::SETULE && UV < 0xffff) {
|
||||||
|
RHS = DAG.getConstant(UV + 1, DL, RHS.getValueType());
|
||||||
CC = ISD::SETULT;
|
CC = ISD::SETULT;
|
||||||
} else if (CC == ISD::SETUGT && (uint64_t)V < 0xffff) {
|
} else if (CC == ISD::SETUGT && UV < 0xffff) {
|
||||||
RHS = DAG.getConstant(V + 1, DL, RHS.getValueType());
|
RHS = DAG.getConstant(UV + 1, DL, RHS.getValueType());
|
||||||
CC = ISD::SETUGE;
|
CC = ISD::SETUGE;
|
||||||
} else if (CC == ISD::SETLE && V < 0x7fff) {
|
} else if (CC == ISD::SETLE && V < 0x7fff) {
|
||||||
|
// Reachable only when SignedCmp transform was skipped (i8 case
|
||||||
|
// before promoteI8Cmp could get it, or non-i16 in the future).
|
||||||
RHS = DAG.getConstant(V + 1, DL, RHS.getValueType());
|
RHS = DAG.getConstant(V + 1, DL, RHS.getValueType());
|
||||||
CC = ISD::SETLT;
|
CC = ISD::SETLT;
|
||||||
} else if (CC == ISD::SETGT && V < 0x7fff) {
|
} else if (CC == ISD::SETGT && V < 0x7fff) {
|
||||||
|
|
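The sign-bit flip described in the hunk above can be checked with plain 16-bit arithmetic. The sketch below contrasts the N-flag-only test with the flip-then-unsigned-compare test. Both function names are invented for this illustration, and the model covers only the arithmetic, not the actual instruction selection.

```cpp
#include <cstdint>

// Models "CMP then BMI": looks only at bit 15 of the 16-bit difference,
// with no overflow (V flag) correction.
inline bool lessByNFlag(std::int16_t a, std::int16_t b) {
    std::uint16_t diff = static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(a) - static_cast<std::uint16_t>(b));
    return (diff & 0x8000) != 0;
}

// Models "EOR #$8000 on both operands, then CMP and BCC": flipping the
// sign bit maps signed order onto unsigned order.
inline bool lessByFlip(std::int16_t a, std::int16_t b) {
    std::uint16_t ua = static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(a) ^ 0x8000);
    std::uint16_t ub = static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(b) ^ 0x8000);
    return ua < ub;
}
```

For the case quoted in the comment, `lessByNFlag(-32768, 1)` evaluates the difference 0x7FFF, whose sign bit is clear, so it reports "not less". `lessByFlip` compares 0x0000 with 0x8001 and reports "less", which is the correct signed answer.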
@ -1129,12 +1163,16 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
|
||||||
case W65816::LDAptrOff:
|
case W65816::LDAptrOff:
|
||||||
case W65816::STAptrOff:
|
case W65816::STAptrOff:
|
||||||
case W65816::STBptrOff: {
|
case W65816::STBptrOff: {
|
||||||
// Pointer access with a constant offset folded into Y. Saves a
|
// Pointer access with a constant offset. Folds the offset into
|
||||||
// CLC/ADC #off pair plus a spill/reload over computing
|
// the pointer (CLC; ADC #off in A) BEFORE staging at $E0..$E2,
|
||||||
// `ptr + off` then doing LDAptr/STAptr. Since Y is 16-bit, any
|
// then accesses via [$E0],Y with Y=0. We can't fold into Y
|
||||||
// i16 offset fits. Operand layout:
|
// because [dp],Y on the W65816 adds Y to the full 24-bit pointer
|
||||||
// LDAptrOff: 0=dst, 1=ptr, 2=off
|
// — for a negative Y like 0xFFFE (= -2 signed), the addition
|
||||||
// STAptrOff / STBptrOff: 0=val, 1=ptr, 2=off
|
// crosses into bank 1 (e.g. ptr=0x4000 + Y=0xFFFE → 0x13FFE).
|
||||||
|
// Folding into the pointer keeps the add at 16-bit (in A) so the
|
||||||
|
// bank byte stays 0.
|
||||||
|
//
|
||||||
|
// DBR-independent — see LDAptr/STAptr/STBptr.
|
||||||
MachineFunction *MF = BB->getParent();
|
MachineFunction *MF = BB->getParent();
|
||||||
const W65816Subtarget &STI = MF->getSubtarget<W65816Subtarget>();
|
const W65816Subtarget &STI = MF->getSubtarget<W65816Subtarget>();
|
||||||
const W65816InstrInfo &TII = *STI.getInstrInfo();
|
const W65816InstrInfo &TII = *STI.getInstrInfo();
|
||||||
|
|
@ -1143,24 +1181,48 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
|
||||||
bool IsByteStore = MI.getOpcode() == W65816::STBptrOff;
|
bool IsByteStore = MI.getOpcode() == W65816::STBptrOff;
|
||||||
Register Ptr = MI.getOperand(1).getReg();
|
Register Ptr = MI.getOperand(1).getReg();
|
||||||
int64_t Off = MI.getOperand(2).getImm();
|
int64_t Off = MI.getOperand(2).getImm();
|
||||||
|
|
||||||
|
// Spill the pointer vreg to a fresh 2-byte stack slot, then
|
||||||
|
// reload via LDAfi. Forces RA to materialize the source — see
|
||||||
|
// the LDAptr/STAptr/STBptr case below for the full rationale.
|
||||||
int FI = MF->getFrameInfo().CreateStackObject(2, Align(2),
|
int FI = MF->getFrameInfo().CreateStackObject(2, Align(2),
|
||||||
/*isSpillSlot=*/true);
|
/*isSpillSlot=*/false);
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
|
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
|
||||||
.addReg(Ptr).addFrameIndex(FI).addImm(0);
|
.addReg(Ptr).addFrameIndex(FI).addImm(0);
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDY_Imm16))
|
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi),
|
||||||
.addImm(Off);
|
W65816::A).addFrameIndex(FI).addImm(0);
|
||||||
|
|
||||||
|
// Compute ptr + off in A. CLC + ADC for the add.
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::CLC));
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::ADC_Imm16)).addImm(Off);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::STA_DP)).addImm(0xE0);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::STZ_DP)).addImm(0xE2);
|
||||||
|
|
||||||
if (IsLoad) {
|
if (IsLoad) {
|
||||||
Register Dst = MI.getOperand(0).getReg();
|
Register Dst = MI.getOperand(0).getReg();
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi_indY), Dst)
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
.addFrameIndex(FI).addImm(0);
|
TII.get(W65816::LDY_Imm16)).addImm(0);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(TargetOpcode::COPY), Dst).addReg(W65816::A);
|
||||||
} else {
|
} else {
|
||||||
Register Val = MI.getOperand(0).getReg();
|
Register Val = MI.getOperand(0).getReg();
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(TargetOpcode::COPY), W65816::A).addReg(Val);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::LDY_Imm16)).addImm(0);
|
||||||
if (IsByteStore)
|
if (IsByteStore)
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::SEP)).addImm(0x20);
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi_indY))
|
TII.get(W65816::SEP)).addImm(0x20);
|
||||||
.addReg(Val).addFrameIndex(FI).addImm(0);
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::STA_DPIndLongY)).addImm(0xE0);
|
||||||
if (IsByteStore)
|
if (IsByteStore)
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::REP)).addImm(0x20);
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::REP)).addImm(0x20);
|
||||||
}
|
}
|
||||||
MI.eraseFromParent();
|
MI.eraseFromParent();
|
||||||
return BB;
|
return BB;
|
||||||
|
|
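The bank-crossing problem that motivates the new `LDAptrOff` / `STAptrOff` expansion above is easy to reproduce with integer arithmetic. This sketch models only the effective-address calculation; both function names are hypothetical.

```cpp
#include <cstdint>

// Effective address of [dp],Y: the 24-bit pointer read from direct page
// plus the 16-bit Y register, with carry propagating into the bank byte.
inline std::uint32_t eaIndirectLongY(std::uint32_t ptr24, std::uint16_t y) {
    return (ptr24 + y) & 0xFFFFFF;
}

// The new lowering: add the offset to the 16-bit pointer first (as
// CLC; ADC #off does in A), force the bank byte to 0, then use Y = 0.
inline std::uint32_t eaFoldedIntoPointer(std::uint16_t ptr, std::int16_t off) {
    std::uint16_t sum = static_cast<std::uint16_t>(ptr + off);  // wraps at 16 bits
    return eaIndirectLongY(sum, 0);
}
```

With a pointer of 0x4000 and an offset of -2, putting the offset in Y (0xFFFE) yields 0x013FFE in bank 1, while folding it into the pointer yields 0x003FFE in bank 0.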
@ -1168,11 +1230,36 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
|
||||||
case W65816::LDAptr:
|
case W65816::LDAptr:
|
||||||
case W65816::STAptr:
|
case W65816::STAptr:
|
||||||
case W65816::STBptr: {
|
case W65816::STBptr: {
|
||||||
// Spill the pointer to a fresh 2-byte stack slot. Then LDY #0 and
|
// Pointer load/store via [dp],Y indirect-long (opcodes 0xB7 / 0x97):
|
||||||
// emit LDAfi_indY / STAfi_indY against that slot. The (slot,S),Y
|
// STA $E0 ; pointer low/hi at $E0..$E1
|
||||||
// addressing reads the pointer from the spill, adds Y (=0), and
|
// STZ $E2 ; bank byte at $E2 = 0
|
||||||
// dereferences. STBptr (truncating i8 store) wraps the actual STA
|
// LDY #0
|
||||||
// in SEP/REP so M=8 across the store and only one byte is written.
|
// LDA [$E0], Y ; bank 0:ptr + 0
|
||||||
|
// STA [$E0], Y
|
||||||
|
// The bank byte is forced to 0, so the access ignores DBR — the
|
||||||
|
// whole point. The previous lowering used (slot,S),Y indirect
|
||||||
|
// (opcodes 0xB3 / 0x93), but (sr,S),Y is DBR-relative: when the
|
||||||
|
// caller had set DBR != 0 (e.g. via `pha;plb` to bank 2 to reach
|
||||||
|
// IIgs hardware), the deref silently wrote to the wrong bank.
|
||||||
|
//
|
||||||
|
// Const-int pointers (`*(volatile uint16 *)0x5000 = v`) are NOT
|
||||||
|
// lowered through this pseudo — there's a TableGen pattern that
|
||||||
|
// takes them straight to STAabs (DBR-relative), which preserves
|
||||||
|
// the IIgs MMIO + bank-switch idiom that the smoke tests use.
|
||||||
|
//
|
||||||
|
// We use $E0..$E2 in libcall-scratch DP — safe because the
|
||||||
|
// pseudo expansion is a leaf (no calls between SEP and STA),
|
||||||
|
// and any subsequent libcall reinitialises its own scratch.
|
||||||
|
//
|
||||||
|
// Why [dp],Y not abs-long-X (`STA $0,X`)? abs-long-X is shorter
|
||||||
|
// (~3 bytes less) but uses X to hold the pointer. In high-
|
||||||
|
// pressure functions like the recursive expression parser, X
|
||||||
|
// is often live with another value, and forcing X to be free
|
||||||
|
// for every pointer-deref triggered "ran out of registers".
|
||||||
|
// [dp],Y uses A and Y only — leaves X for spill-bridge use.
|
||||||
|
//
|
||||||
|
// STBptr (truncating i8 store) wraps the actual STA in SEP/REP
|
||||||
|
// so M=8 across the store and only one byte is written.
|
||||||
MachineFunction *MF = BB->getParent();
|
MachineFunction *MF = BB->getParent();
|
||||||
const W65816Subtarget &STI = MF->getSubtarget<W65816Subtarget>();
|
const W65816Subtarget &STI = MF->getSubtarget<W65816Subtarget>();
|
||||||
const W65816InstrInfo &TII = *STI.getInstrInfo();
|
const W65816InstrInfo &TII = *STI.getInstrInfo();
|
||||||
|
|
@ -1180,38 +1267,55 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
|
||||||
bool IsLoad = MI.getOpcode() == W65816::LDAptr;
|
bool IsLoad = MI.getOpcode() == W65816::LDAptr;
|
||||||
bool IsByteStore = MI.getOpcode() == W65816::STBptr;
|
bool IsByteStore = MI.getOpcode() == W65816::STBptr;
|
||||||
|
|
||||||
// Operand layout (explicit only; Defs=[Y] adds an implicit at the
|
|
||||||
// end which we don't read here):
|
|
||||||
// LDAptr: 0=dst, 1=ptr
|
|
||||||
// STAptr / STBptr: 0=val, 1=ptr
|
|
||||||
// The pointer operand is always at index 1. Earlier code reading
|
|
||||||
// operand 2 for stores hit the implicit Y def, not the pointer —
|
|
||||||
// which only "worked" because regalloc didn't notice and A
|
|
||||||
// happened to hold the right bytes by accident.
|
|
||||||
Register Ptr = MI.getOperand(1).getReg();
|
Register Ptr = MI.getOperand(1).getReg();
|
||||||
int FI = MF->getFrameInfo().CreateStackObject(2, Align(2),
|
|
||||||
/*isSpillSlot=*/true);
|
|
||||||
|
|
||||||
// Spill ptr.
|
// Why we spill the pointer to a fresh stack slot first:
|
||||||
|
// a direct `COPY $a = ptr_vreg ; STA $E0` lets RA elide the COPY
|
||||||
|
// when ptr_vreg is already allocated to A. In a loop body where
|
||||||
|
// multiple Acc16 PHIs (pointer + accumulator) compete for A, the
|
||||||
|
// PHI elimination pass picks one to be in A at the bottom of the
|
||||||
|
// block and silently drops the COPY needed to refresh A with the
|
||||||
|
// OTHER value at the top of the next iteration — silent miscompile
|
||||||
|
// (sumTable read its own accumulator as the pointer on iter 2+).
|
||||||
|
// STAfi forces RA to materialize ptr_vreg's value so it gets stored
|
||||||
|
// to the slot, then LDAfi reads it back as a real machine load.
|
||||||
|
int FI = MF->getFrameInfo().CreateStackObject(2, Align(2),
|
||||||
|
/*isSpillSlot=*/false);
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
|
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
|
||||||
.addReg(Ptr).addFrameIndex(FI).addImm(0);
|
.addReg(Ptr).addFrameIndex(FI).addImm(0);
|
||||||
// LDY #0. LDY_Imm16 has no output operand; Y is defined implicitly
|
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi),
|
||||||
// via the pseudo's Defs=[Y] marking.
|
W65816::A).addFrameIndex(FI).addImm(0);
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDY_Imm16))
|
|
||||||
.addImm(0);
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::STA_DP)).addImm(0xE0);
|
||||||
|
// Bank byte at $E2 = 0. STZ in M=16 writes 2 bytes ($E2..$E3);
|
||||||
|
// $E3 is junk-clobbered, OK (libcall scratch is caller-saved).
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::STZ_DP)).addImm(0xE2);
|
||||||
|
|
||||||
if (IsLoad) {
|
if (IsLoad) {
|
||||||
Register Dst = MI.getOperand(0).getReg();
|
Register Dst = MI.getOperand(0).getReg();
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi_indY), Dst)
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
.addFrameIndex(FI).addImm(0);
|
TII.get(W65816::LDY_Imm16)).addImm(0);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(TargetOpcode::COPY), Dst).addReg(W65816::A);
|
||||||
} else {
|
} else {
|
||||||
Register Val = MI.getOperand(0).getReg();
|
Register Val = MI.getOperand(0).getReg();
|
||||||
|
// Load val into A.
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(TargetOpcode::COPY), W65816::A).addReg(Val);
|
||||||
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::LDY_Imm16)).addImm(0);
|
||||||
if (IsByteStore)
|
if (IsByteStore)
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::SEP)).addImm(0x20);
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi_indY))
|
TII.get(W65816::SEP)).addImm(0x20);
|
||||||
.addReg(Val).addFrameIndex(FI).addImm(0);
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::STA_DPIndLongY)).addImm(0xE0);
|
||||||
if (IsByteStore)
|
if (IsByteStore)
|
||||||
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::REP)).addImm(0x20);
|
BuildMI(*BB, MI.getIterator(), DL,
|
||||||
|
TII.get(W65816::REP)).addImm(0x20);
|
||||||
}
|
}
|
||||||
MI.eraseFromParent();
|
MI.eraseFromParent();
|
||||||
return BB;
|
return BB;
|
||||||
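The bank behaviour that motivates the [dp],Y lowering in the hunk above can be illustrated with a small host-side model. This is a minimal sketch, not backend code: the function names `effAddrStackRelIndY` and `effAddrDpIndLongY` are hypothetical, and the model only computes the 24-bit effective address each addressing mode would form.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of 65816 effective-address formation.
// (sr,S),Y: the 16-bit pointer is read from the stack, and the data
// bank register (DBR) supplies the bank byte before Y is added.
constexpr uint32_t effAddrStackRelIndY(uint8_t dbr, uint16_t ptr, uint16_t y) {
  return ((static_cast<uint32_t>(dbr) << 16 | ptr) + y) & 0xFFFFFF;
}

// [dp],Y: the full 24-bit pointer, bank byte included, is read from
// direct page, so DBR plays no part in the address.
constexpr uint32_t effAddrDpIndLongY(uint8_t bankByte, uint16_t ptr, uint16_t y) {
  return ((static_cast<uint32_t>(bankByte) << 16 | ptr) + y) & 0xFFFFFF;
}

// With DBR = 2, the old (sr,S),Y lowering lands in bank 2.
static_assert(effAddrStackRelIndY(0x02, 0x5000, 0) == 0x025000, "DBR-relative");
// The new lowering zeroes the bank byte at $E2, so it stays in bank 0.
static_assert(effAddrDpIndLongY(0x00, 0x5000, 0) == 0x005000, "bank 0");
```

The two `static_assert` lines mirror the failure described in the comment: the same 16-bit pointer resolves to a different bank depending on whether DBR participates.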
|
|
|
||||||
|
|
@ -30,18 +30,26 @@ W65816InstrInfo::W65816InstrInfo(const W65816Subtarget &STI)
|
||||||
W65816::ADJCALLSTACKUP),
|
W65816::ADJCALLSTACKUP),
|
||||||
RI() {}
|
RI() {}
|
||||||
|
|
||||||
// Maps IMGn to its DP address ($D0..$DE in steps of 2). Returns -1 if
|
// Maps IMGn to its DP address (IMG0..IMG7 at $D0..$DE, IMG8..IMG15 at
|
||||||
// the reg isn't an IMG.
|
// $C0..$CE, both in steps of 2). Returns -1 if the reg isn't an IMG.
|
||||||
static int imgDPAddr(Register R) {
|
static int imgDPAddr(Register R) {
|
||||||
switch (R) {
|
switch (R) {
|
||||||
case W65816::IMG0: return 0xD0;
|
case W65816::IMG0: return 0xD0;
|
||||||
case W65816::IMG1: return 0xD2;
|
case W65816::IMG1: return 0xD2;
|
||||||
case W65816::IMG2: return 0xD4;
|
case W65816::IMG2: return 0xD4;
|
||||||
case W65816::IMG3: return 0xD6;
|
case W65816::IMG3: return 0xD6;
|
||||||
case W65816::IMG4: return 0xD8;
|
case W65816::IMG4: return 0xD8;
|
||||||
case W65816::IMG5: return 0xDA;
|
case W65816::IMG5: return 0xDA;
|
||||||
case W65816::IMG6: return 0xDC;
|
case W65816::IMG6: return 0xDC;
|
||||||
case W65816::IMG7: return 0xDE;
|
case W65816::IMG7: return 0xDE;
|
||||||
|
case W65816::IMG8: return 0xC0;
|
||||||
|
case W65816::IMG9: return 0xC2;
|
||||||
|
case W65816::IMG10: return 0xC4;
|
||||||
|
case W65816::IMG11: return 0xC6;
|
||||||
|
case W65816::IMG12: return 0xC8;
|
||||||
|
case W65816::IMG13: return 0xCA;
|
||||||
|
case W65816::IMG14: return 0xCC;
|
||||||
|
case W65816::IMG15: return 0xCE;
|
||||||
default: return -1;
|
default: return -1;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
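The two-block layout that `imgDPAddr` encodes can also be written arithmetically. The following is a minimal sketch under the assumption that the register is identified by a plain index 0..15; `imgDPAddrByIndex` is a hypothetical helper for illustration and is not part of the commit, which switches on the `W65816::IMGn` enumerators instead.

```cpp
#include <cassert>

// Hypothetical helper: maps an IMG register index (0..15) to its
// direct-page address. IMG0..IMG7 live at $D0..$DE and IMG8..IMG15 at
// $C0..$CE, both in steps of 2. Returns -1 for any other index.
constexpr int imgDPAddrByIndex(int n) {
  return (n >= 0 && n <= 7)    ? 0xD0 + 2 * n
         : (n >= 8 && n <= 15) ? 0xC0 + 2 * (n - 8)
                               : -1;
}

// Spot checks against the table in the switch statement.
static_assert(imgDPAddrByIndex(0) == 0xD0, "IMG0");
static_assert(imgDPAddrByIndex(7) == 0xDE, "IMG7");
static_assert(imgDPAddrByIndex(8) == 0xC0, "IMG8");
static_assert(imgDPAddrByIndex(15) == 0xCE, "IMG15");
```

An explicit switch, as used in the commit, has the advantage of not depending on the numeric values of the register enumerators.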
|
|
|
||||||
|
|
@ -278,6 +278,12 @@ def : Pat<(store Acc16:$src, (W65816Wrapper tglobaladdr:$g)),
|
||||||
(STAabs Acc16:$src, tglobaladdr:$g)>;
|
(STAabs Acc16:$src, tglobaladdr:$g)>;
|
||||||
def : Pat<(store Acc16:$src, (W65816Wrapper texternalsym:$s)),
|
def : Pat<(store Acc16:$src, (W65816Wrapper texternalsym:$s)),
|
||||||
(STAabs Acc16:$src, texternalsym:$s)>;
|
(STAabs Acc16:$src, texternalsym:$s)>;
|
||||||
|
// Store via a constant-int address (MMIO-style fixed pointer like
|
||||||
|
// `*(volatile uint16 *)0x5000 = v`). Lower to STAabs (DBR-relative,
|
||||||
|
// opcode 0x8D), which is shorter than going through STAptr; STAptr's
|
||||||
|
// [dp],Y lowering would also force bank 0 and break the bank-switch idiom.
|
||||||
|
def : Pat<(store Acc16:$src, (iPTR imm:$addr)),
|
||||||
|
(STAabs Acc16:$src, (i32 imm:$addr))>;
|
||||||
|
|
||||||
// 16-bit ADD: expands to CLC + ADC_Imm16. The 65816 ADC sums with the
|
// 16-bit ADD: expands to CLC + ADC_Imm16. The 65816 ADC sums with the
|
||||||
// carry flag, so a clean add needs CLC first. Constraints tie the
|
// carry flag, so a clean add needs CLC first. Constraints tie the
|
||||||
|
|
@ -893,30 +899,40 @@ def CMP_RR : W65816Pseudo<(outs), (ins Acc16:$lhs, Acc16:$rhs),
|
||||||
// fresh stack slot, set Y=0, and emit LDA/STA (slot,S),Y. Y gets
|
// fresh stack slot, set Y=0, and emit LDA/STA (slot,S),Y. Y gets
|
||||||
// clobbered as a side effect. hasSideEffects=1 covers the spill
|
// clobbered as a side effect. hasSideEffects=1 covers the spill
|
||||||
// store the inserter adds, in addition to the deref.
|
// store the inserter adds, in addition to the deref.
|
||||||
|
// LDAptr / STAptr / STBptr lower to [dp],Y indirect-long via DP
|
||||||
|
// scratch $E0..$E2 (see W65816ISelLowering.cpp inserter). The
|
||||||
|
// inserter uses A and Y plus the DP scratch — X is not touched.
|
||||||
|
// Defs: Y (LDY #0) and P (LDA/LDY set N/Z; SEP/REP toggle M).
|
||||||
|
// $ptr is Wide16 (A or IMGn) so when bb.3-style pressure forces the
|
||||||
|
// pointer to share A with another live vreg, RA can park ptr in an
|
||||||
|
// IMGn DP slot. Acc16:$ptr was being silently coalesced with the
|
||||||
|
// loop-PHI accumulator: both wanted A at end of bb, and PHI-elim
|
||||||
|
// dropped the COPY needed to refresh A with the pointer at top of
|
||||||
|
// the loop. With Wide16, the COPY $a = ptr lowers to a real LDA $dp.
|
||||||
let usesCustomInserter = 1, hasSideEffects = 1, mayLoad = 1,
|
let usesCustomInserter = 1, hasSideEffects = 1, mayLoad = 1,
|
||||||
Defs = [Y] in {
|
Defs = [Y, P] in {
|
||||||
def LDAptr : W65816Pseudo<(outs Acc16:$dst), (ins Acc16:$ptr),
|
def LDAptr : W65816Pseudo<(outs Acc16:$dst), (ins Wide16:$ptr),
|
||||||
"# LDAptr $dst, $ptr",
|
"# LDAptr $dst, $ptr",
|
||||||
[(set Acc16:$dst, (load Acc16:$ptr))]>;
|
[(set Acc16:$dst, (load Wide16:$ptr))]>;
|
||||||
}
|
}
|
||||||
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
|
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
|
||||||
Defs = [Y] in {
|
Defs = [Y, P] in {
|
||||||
def STAptr : W65816Pseudo<(outs), (ins Acc16:$val, Acc16:$ptr),
|
def STAptr : W65816Pseudo<(outs), (ins Acc16:$val, Wide16:$ptr),
|
||||||
"# STAptr $val, $ptr",
|
"# STAptr $val, $ptr",
|
||||||
[(store Acc16:$val, Acc16:$ptr)]>;
|
[(store Acc16:$val, Wide16:$ptr)]>;
|
||||||
}
|
}
|
||||||
|
|
||||||
// i8 zero-extending pointer load: do a 16-bit LDA (slot,s),y and mask
|
// i8 zero-extending pointer load: do a 16-bit LDA (slot,s),y and mask
|
||||||
// the high byte. Reads one byte past the source — fine for byte-array
|
// the high byte. Reads one byte past the source — fine for byte-array
|
||||||
// iteration where the buffer is at least 2 bytes long. A future
|
// iteration where the buffer is at least 2 bytes long. A future
|
||||||
// SEP/REP-aware mode pass could switch to a true 8-bit LDA.
|
// SEP/REP-aware mode pass could switch to a true 8-bit LDA.
|
||||||
def : Pat<(i16 (zextloadi8 Acc16:$ptr)),
|
def : Pat<(i16 (zextloadi8 Wide16:$ptr)),
|
||||||
(ANDi16imm (LDAptr Acc16:$ptr), 0xFF)>;
|
(ANDi16imm (LDAptr Wide16:$ptr), 0xFF)>;
|
||||||
// Anyext byte load via pointer: consumer doesn't care about the high
|
// Anyext byte load via pointer: consumer doesn't care about the high
|
||||||
// byte, so just LDA (16-bit). Same 1-byte-past-buffer caveat as
|
// byte, so just LDA (16-bit). Same 1-byte-past-buffer caveat as
|
||||||
// zextloadi8.
|
// zextloadi8.
|
||||||
def : Pat<(i16 (extloadi8 Acc16:$ptr)),
|
def : Pat<(i16 (extloadi8 Wide16:$ptr)),
|
||||||
(LDAptr Acc16:$ptr)>;
|
(LDAptr Wide16:$ptr)>;
|
||||||
// And the equivalent for absolute addresses (byte loads via global ptr).
|
// And the equivalent for absolute addresses (byte loads via global ptr).
|
||||||
// (Already covered for Wrapper(global) above; this catches the case
|
// (Already covered for Wrapper(global) above; this catches the case
|
||||||
// where the ptr is materialised as a value.)
|
// where the ptr is materialised as a value.)
|
||||||
|
|
@ -941,10 +957,10 @@ def STAfi_indY : W65816Pseudo<(outs), (ins Acc16:$src, memfi:$addr),
|
||||||
// natural truncstorei8 from an i16 value (common with arg promotion),
|
// natural truncstorei8 from an i16 value (common with arg promotion),
|
||||||
// and a true i8 store (Acc8) that arises from i8-typed IR.
|
// and a true i8 store (Acc8) that arises from i8-typed IR.
|
||||||
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
|
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
|
||||||
Defs = [Y] in {
|
Defs = [Y, P] in {
|
||||||
def STBptr : W65816Pseudo<(outs), (ins Acc16:$val, Acc16:$ptr),
|
def STBptr : W65816Pseudo<(outs), (ins Acc16:$val, Wide16:$ptr),
|
||||||
"# STBptr $val, $ptr",
|
"# STBptr $val, $ptr",
|
||||||
[(truncstorei8 Acc16:$val, Acc16:$ptr)]>;
|
[(truncstorei8 Acc16:$val, Wide16:$ptr)]>;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Pointer access with constant offset. `(load (add ptr, $off))` and
|
// Pointer access with constant offset. `(load (add ptr, $off))` and
|
||||||
|
|
@ -953,40 +969,42 @@ def STBptr : W65816Pseudo<(outs), (ins Acc16:$val, Acc16:$ptr),
|
||||||
// the offset becomes an explicit ADC #imm that has to spill A and
|
// the offset becomes an explicit ADC #imm that has to spill A and
|
||||||
// recompute the pointer per access. With them, we just load Y with
|
// recompute the pointer per access. With them, we just load Y with
|
||||||
// the offset in the inserter (Y is 16-bit so any i16 constant fits).
|
// the offset in the inserter (Y is 16-bit so any i16 constant fits).
|
||||||
|
// LDAptrOff / STAptrOff / STBptrOff: same [dp],Y lowering as the
|
||||||
|
// no-offset variants but folds the offset into Y.
|
||||||
let usesCustomInserter = 1, hasSideEffects = 1, mayLoad = 1,
|
let usesCustomInserter = 1, hasSideEffects = 1, mayLoad = 1,
|
||||||
Defs = [Y] in {
|
Defs = [Y, P] in {
|
||||||
def LDAptrOff : W65816Pseudo<(outs Acc16:$dst),
|
def LDAptrOff : W65816Pseudo<(outs Acc16:$dst),
|
||||||
(ins Acc16:$ptr, i16imm:$off),
|
(ins Wide16:$ptr, i16imm:$off),
|
||||||
"# LDAptrOff $dst, $ptr, $off", []>;
|
"# LDAptrOff $dst, $ptr, $off", []>;
|
||||||
}
|
}
|
||||||
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
|
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
|
||||||
Defs = [Y] in {
|
Defs = [Y, P] in {
|
||||||
def STAptrOff : W65816Pseudo<(outs),
|
def STAptrOff : W65816Pseudo<(outs),
|
||||||
(ins Acc16:$val, Acc16:$ptr, i16imm:$off),
|
(ins Acc16:$val, Wide16:$ptr, i16imm:$off),
|
||||||
"# STAptrOff $val, $ptr, $off", []>;
|
"# STAptrOff $val, $ptr, $off", []>;
|
||||||
def STBptrOff : W65816Pseudo<(outs),
|
def STBptrOff : W65816Pseudo<(outs),
|
||||||
(ins Acc16:$val, Acc16:$ptr, i16imm:$off),
|
(ins Acc16:$val, Wide16:$ptr, i16imm:$off),
|
||||||
"# STBptrOff $val, $ptr, $off", []>;
|
"# STBptrOff $val, $ptr, $off", []>;
|
||||||
}
|
}
|
||||||
def : Pat<(i16 (load (add Acc16:$ptr, (i16 imm:$off)))),
|
def : Pat<(i16 (load (add Wide16:$ptr, (i16 imm:$off)))),
|
||||||
(LDAptrOff Acc16:$ptr, imm:$off)>;
|
(LDAptrOff Wide16:$ptr, imm:$off)>;
|
||||||
def : Pat<(store Acc16:$val, (add Acc16:$ptr, (i16 imm:$off))),
|
def : Pat<(store Acc16:$val, (add Wide16:$ptr, (i16 imm:$off))),
|
||||||
(STAptrOff Acc16:$val, Acc16:$ptr, imm:$off)>;
|
(STAptrOff Acc16:$val, Wide16:$ptr, imm:$off)>;
|
||||||
def : Pat<(truncstorei8 Acc16:$val, (add Acc16:$ptr, (i16 imm:$off))),
|
def : Pat<(truncstorei8 Acc16:$val, (add Wide16:$ptr, (i16 imm:$off))),
|
||||||
(STBptrOff Acc16:$val, Acc16:$ptr, imm:$off)>;
|
(STBptrOff Acc16:$val, Wide16:$ptr, imm:$off)>;
|
||||||
def : Pat<(store Acc8:$val, (add Acc16:$ptr, (i16 imm:$off))),
|
def : Pat<(store Acc8:$val, (add Wide16:$ptr, (i16 imm:$off))),
|
||||||
(STBptrOff (COPY_TO_REGCLASS Acc8:$val, Acc16),
|
(STBptrOff (COPY_TO_REGCLASS Acc8:$val, Acc16),
|
||||||
Acc16:$ptr, imm:$off)>;
|
Wide16:$ptr, imm:$off)>;
|
||||||
def : Pat<(store Acc8:$val, Acc16:$ptr),
|
def : Pat<(store Acc8:$val, Wide16:$ptr),
|
||||||
(STBptr (COPY_TO_REGCLASS Acc8:$val, Acc16), Acc16:$ptr)>;
|
(STBptr (COPY_TO_REGCLASS Acc8:$val, Acc16), Wide16:$ptr)>;
|
||||||
|
|
||||||
// i8 load via Acc16 pointer producing a true i8 (Acc8) result. Reuses
|
// i8 load via Acc16 pointer producing a true i8 (Acc8) result. Reuses
|
||||||
// the existing zextloadi8 16-bit-LDA-and-mask path: load 2 bytes, mask
|
// the existing zextloadi8 16-bit-LDA-and-mask path: load 2 bytes, mask
|
||||||
// the high byte, then narrow to Acc8. COPY_TO_REGCLASS to Acc8 is a
|
// the high byte, then narrow to Acc8. COPY_TO_REGCLASS to Acc8 is a
|
||||||
// no-op at MC level (same physical A). Reads one byte past the source;
|
// no-op at MC level (same physical A). Reads one byte past the source;
|
||||||
// fine for char-array iteration where the buffer is at least 2 bytes.
|
// fine for char-array iteration where the buffer is at least 2 bytes.
|
||||||
def : Pat<(i8 (load Acc16:$ptr)),
|
def : Pat<(i8 (load Wide16:$ptr)),
|
||||||
(COPY_TO_REGCLASS (ANDi16imm (LDAptr Acc16:$ptr), 0xFF), Acc8)>;
|
(COPY_TO_REGCLASS (ANDi16imm (LDAptr Wide16:$ptr), 0xFF), Acc8)>;
|
||||||
|
|
||||||
// Acc8-to-Acc16 type conversions. Both Acc8 and Acc16 alias physical A,
|
// Acc8-to-Acc16 type conversions. Both Acc8 and Acc16 alias physical A,
|
||||||
// so COPY_TO_REGCLASS is a no-op at MC level. ZEXT additionally masks
|
// so COPY_TO_REGCLASS is a no-op at MC level. ZEXT additionally masks
|
||||||
|
|
@ -1109,8 +1127,12 @@ def LDA_AbsY : InstAbsY<0xB9, "lda">;
|
||||||
def LDA_DPInd : InstDPInd <0xB2, "lda">;
|
def LDA_DPInd : InstDPInd <0xB2, "lda">;
|
||||||
def LDA_DPIndY : InstDPIndY<0xB1, "lda">;
|
def LDA_DPIndY : InstDPIndY<0xB1, "lda">;
|
||||||
def LDA_DPIndX : InstDPIndX<0xA1, "lda">;
|
def LDA_DPIndX : InstDPIndX<0xA1, "lda">;
|
||||||
def LDA_DPIndLong : InstDPIndLong <0xA7, "lda">;
|
def LDA_DPIndLong : InstDPIndLong <0xA7, "lda"> { let Defs = [A]; }
|
||||||
def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda">;
|
// LDA [dp],Y: reads Y to compute the indexed address, defines A.
|
||||||
|
// Without these, regalloc thought A was unaffected by the load and
|
||||||
|
// dead-code-eliminated COPYs that were supposed to materialise the
|
||||||
|
// next pointer in A — silent miscompile in mySwap-style helpers.
|
||||||
|
def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda"> { let Defs = [A]; let Uses = [Y]; }
|
||||||
def LDA_LongX : InstAbsLongX<0xBF, "lda">;
|
def LDA_LongX : InstAbsLongX<0xBF, "lda">;
|
||||||
|
|
||||||
//---------------------------------------------------------------- STA (store A)
|
//---------------------------------------------------------------- STA (store A)
|
||||||
|
|
@ -1123,8 +1145,10 @@ def STA_AbsY : InstAbsY<0x99, "sta">;
|
||||||
def STA_DPInd : InstDPInd <0x92, "sta">;
|
def STA_DPInd : InstDPInd <0x92, "sta">;
|
||||||
def STA_DPIndY : InstDPIndY<0x91, "sta">;
|
def STA_DPIndY : InstDPIndY<0x91, "sta">;
|
||||||
def STA_DPIndX : InstDPIndX<0x81, "sta">;
|
def STA_DPIndX : InstDPIndX<0x81, "sta">;
|
||||||
def STA_DPIndLong : InstDPIndLong <0x87, "sta">;
|
def STA_DPIndLong : InstDPIndLong <0x87, "sta"> { let Uses = [A]; }
|
||||||
def STA_DPIndLongY : InstDPIndLongY<0x97, "sta">;
|
// STA [dp],Y: reads A (the value to store) and Y (the index). Mark
|
||||||
|
// both so regalloc keeps A's value live across this instruction.
|
||||||
|
def STA_DPIndLongY : InstDPIndLongY<0x97, "sta"> { let Uses = [A, Y]; }
|
||||||
def STA_LongX : InstAbsLongX<0x9F, "sta">;
|
def STA_LongX : InstAbsLongX<0x9F, "sta">;
|
||||||
|
|
||||||
//---------------------------------------------------------------- LDX (load X)
|
//---------------------------------------------------------------- LDX (load X)
|
||||||
|
|
|
||||||
|
|
@ -117,14 +117,22 @@ bool W65816RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
|
||||||
Register Src = MI.getOperand(0).getReg();
|
Register Src = MI.getOperand(0).getReg();
|
||||||
int srcDP = -1;
|
int srcDP = -1;
|
||||||
switch (Src) {
|
switch (Src) {
|
||||||
case W65816::IMG0: srcDP = 0xD0; break;
|
case W65816::IMG0: srcDP = 0xD0; break;
|
||||||
case W65816::IMG1: srcDP = 0xD2; break;
|
case W65816::IMG1: srcDP = 0xD2; break;
|
||||||
case W65816::IMG2: srcDP = 0xD4; break;
|
case W65816::IMG2: srcDP = 0xD4; break;
|
||||||
case W65816::IMG3: srcDP = 0xD6; break;
|
case W65816::IMG3: srcDP = 0xD6; break;
|
||||||
case W65816::IMG4: srcDP = 0xD8; break;
|
case W65816::IMG4: srcDP = 0xD8; break;
|
||||||
case W65816::IMG5: srcDP = 0xDA; break;
|
case W65816::IMG5: srcDP = 0xDA; break;
|
||||||
case W65816::IMG6: srcDP = 0xDC; break;
|
case W65816::IMG6: srcDP = 0xDC; break;
|
||||||
case W65816::IMG7: srcDP = 0xDE; break;
|
case W65816::IMG7: srcDP = 0xDE; break;
|
||||||
|
case W65816::IMG8: srcDP = 0xC0; break;
|
||||||
|
case W65816::IMG9: srcDP = 0xC2; break;
|
||||||
|
case W65816::IMG10: srcDP = 0xC4; break;
|
||||||
|
case W65816::IMG11: srcDP = 0xC6; break;
|
||||||
|
case W65816::IMG12: srcDP = 0xC8; break;
|
||||||
|
case W65816::IMG13: srcDP = 0xCA; break;
|
||||||
|
case W65816::IMG14: srcDP = 0xCC; break;
|
||||||
|
case W65816::IMG15: srcDP = 0xCE; break;
|
||||||
default: break;
|
default: break;
|
||||||
}
|
}
|
||||||
if (srcDP >= 0) {
|
if (srcDP >= 0) {
|
||||||
|
|
|
||||||
|
|
@ -38,22 +38,34 @@ def PBR : W65816Reg<6, "pbr">, DwarfRegNum<[6]>;
|
||||||
def PC : W65816Reg<7, "pc">, DwarfRegNum<[7]>;
|
def PC : W65816Reg<7, "pc">, DwarfRegNum<[7]>;
|
||||||
def P : W65816Reg<8, "p">, DwarfRegNum<[8]>;
|
def P : W65816Reg<8, "p">, DwarfRegNum<[8]>;
|
||||||
|
|
||||||
// Imaginary 16-bit registers backed by direct-page slots $D0..$DE.
|
// Imaginary 16-bit registers backed by direct-page slots $C0..$DE
|
||||||
// The regalloc treats them as physical registers with cheap LDA/STA dp
|
// (16 slots = 32 DP bytes). The regalloc treats them as physical
|
||||||
// inter-register moves. This relieves pressure on the single Acc16
|
// registers with cheap LDA/STA dp inter-register moves. This
|
||||||
// register (A) so greedy regalloc can succeed on functions with
|
// relieves pressure on the single Acc16 register (A) so greedy
|
||||||
// multiple simultaneously-live i16 vregs. Caller-save: callees may
|
// regalloc can succeed on functions with multiple simultaneously-
|
||||||
// freely overwrite them, so regalloc spills around any call that
|
// live i16 vregs. Caller-save: callees may freely overwrite them,
|
||||||
// might touch them. Their HWEncoding is never emitted (asmprinter
|
// so regalloc spills around any call that might touch them. Their
|
||||||
// translates IMGn references into LDA/STA dp with the right address).
|
// HWEncoding is never emitted (asmprinter translates IMGn references
|
||||||
def IMG0 : W65816Reg<16, "img0">, DwarfRegNum<[16]>;
|
// into LDA/STA dp with the right address).
|
||||||
def IMG1 : W65816Reg<17, "img1">, DwarfRegNum<[17]>;
|
//
|
||||||
def IMG2 : W65816Reg<18, "img2">, DwarfRegNum<[18]>;
|
// Layout: IMG0..IMG7 at $D0..$DE (legacy slot block); IMG8..IMG15
|
||||||
def IMG3 : W65816Reg<19, "img3">, DwarfRegNum<[19]>;
|
// at $C0..$CE. Avoid stepping on user DP allocations below $C0.
|
||||||
def IMG4 : W65816Reg<20, "img4">, DwarfRegNum<[20]>;
|
def IMG0 : W65816Reg<16, "img0">, DwarfRegNum<[16]>;
|
||||||
def IMG5 : W65816Reg<21, "img5">, DwarfRegNum<[21]>;
|
def IMG1 : W65816Reg<17, "img1">, DwarfRegNum<[17]>;
|
||||||
def IMG6 : W65816Reg<22, "img6">, DwarfRegNum<[22]>;
|
def IMG2 : W65816Reg<18, "img2">, DwarfRegNum<[18]>;
|
||||||
def IMG7 : W65816Reg<23, "img7">, DwarfRegNum<[23]>;
|
def IMG3 : W65816Reg<19, "img3">, DwarfRegNum<[19]>;
|
||||||
|
def IMG4 : W65816Reg<20, "img4">, DwarfRegNum<[20]>;
|
||||||
|
def IMG5 : W65816Reg<21, "img5">, DwarfRegNum<[21]>;
|
||||||
|
def IMG6 : W65816Reg<22, "img6">, DwarfRegNum<[22]>;
|
||||||
|
def IMG7 : W65816Reg<23, "img7">, DwarfRegNum<[23]>;
|
||||||
|
def IMG8 : W65816Reg<32, "img8">, DwarfRegNum<[32]>;
|
||||||
|
def IMG9 : W65816Reg<33, "img9">, DwarfRegNum<[33]>;
|
||||||
|
def IMG10 : W65816Reg<34, "img10">, DwarfRegNum<[34]>;
|
||||||
|
def IMG11 : W65816Reg<35, "img11">, DwarfRegNum<[35]>;
|
||||||
|
def IMG12 : W65816Reg<36, "img12">, DwarfRegNum<[36]>;
|
||||||
|
def IMG13 : W65816Reg<37, "img13">, DwarfRegNum<[37]>;
|
||||||
|
def IMG14 : W65816Reg<38, "img14">, DwarfRegNum<[38]>;
|
||||||
|
def IMG15 : W65816Reg<39, "img15">, DwarfRegNum<[39]>;
|
||||||
|
|
||||||
// DPF0 — pseudo-physreg modeling the i16 storage at DP $F0..$F1.
|
// DPF0 — pseudo-physreg modeling the i16 storage at DP $F0..$F1.
|
||||||
// Used as the carrier for the highest 16 bits of an i64/double
|
// Used as the carrier for the highest 16 bits of an i64/double
|
||||||
|
|
@ -85,8 +97,10 @@ def Idx16 : RegisterClass<"W65816", [i16], 16, (add X, Y)>;
|
||||||
// may freely overwrite $D0..$DF, so the allocator must spill IMGn
|
// may freely overwrite $D0..$DF, so the allocator must spill IMGn
|
||||||
// vregs around any call.
|
// vregs around any call.
|
||||||
def Img16 : RegisterClass<"W65816", [i16], 16,
|
def Img16 : RegisterClass<"W65816", [i16], 16,
|
||||||
(add IMG0, IMG1, IMG2, IMG3,
|
(add IMG0, IMG1, IMG2, IMG3,
|
||||||
IMG4, IMG5, IMG6, IMG7)>;
|
IMG4, IMG5, IMG6, IMG7,
|
||||||
|
IMG8, IMG9, IMG10, IMG11,
|
||||||
|
IMG12, IMG13, IMG14, IMG15)>;
|
||||||
|
|
||||||
// Acc-or-IMG combined class. Vregs that are not constrained to A
|
// Acc-or-IMG combined class. Vregs that are not constrained to A
|
||||||
// (i.e., not the source of an arithmetic op) get widened to this
|
// (i.e., not the source of an arithmetic op) get widened to this
|
||||||
|
|
@ -94,8 +108,10 @@ def Img16 : RegisterClass<"W65816", [i16], 16,
|
||||||
// A first so the allocator's default order prefers A; cross-class
|
// A first so the allocator's default order prefers A; cross-class
|
||||||
// moves to/from A are LDA/STA dp via copyPhysReg.
|
// moves to/from A are LDA/STA dp via copyPhysReg.
|
||||||
def Wide16 : RegisterClass<"W65816", [i16], 16,
|
def Wide16 : RegisterClass<"W65816", [i16], 16,
|
||||||
(add A, IMG0, IMG1, IMG2, IMG3,
|
(add A, IMG0, IMG1, IMG2, IMG3,
|
||||||
IMG4, IMG5, IMG6, IMG7)>;
|
IMG4, IMG5, IMG6, IMG7,
|
||||||
|
IMG8, IMG9, IMG10, IMG11,
|
||||||
|
IMG12, IMG13, IMG14, IMG15)>;
|
||||||
|
|
||||||
def PtrRegs : RegisterClass<"W65816", [i16], 16, (add SP)>;
|
def PtrRegs : RegisterClass<"W65816", [i16], 16, (add SP)>;
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1301,10 +1301,29 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
|
||||||
// implicit-def $a but the return-value flags aren't reliably set,
|
// implicit-def $a but the return-value flags aren't reliably set,
|
||||||
// and other corner cases break smoke.
|
// and other corner cases break smoke.
|
||||||
auto isATransparent = [](const MachineInstr &MI) {
|
auto isATransparent = [](const MachineInstr &MI) {
|
||||||
// Stores that don't touch A or P-bits-other-than-via-A.
|
// Stores that don't touch A or P-bits-other-than-via-A. (Byte
|
||||||
return MI.getOpcode() == W65816::STAfi ||
|
// stores that internally SEP/REP wrap toggle the M flag, but that
|
||||||
MI.getOpcode() == W65816::STAfi_indY ||
|
// doesn't affect N/Z based on A's current value.) Also the call-stack
|
||||||
MI.getOpcode() == W65816::STA8fi;
|
// pseudo ADJCALLSTACKDOWN, which is zero-effect at this point in the
|
||||||
|
// pipeline (DOWN is always nil). UP is excluded; see the case list.
|
||||||
|
switch (MI.getOpcode()) {
|
||||||
|
case W65816::STAfi:
|
||||||
|
case W65816::STAfi_indY:
|
||||||
|
case W65816::STA8fi:
|
||||||
|
case W65816::STAabs:
|
||||||
|
case W65816::STA8abs:
|
||||||
|
case W65816::STAptr:
|
||||||
|
case W65816::STBptr:
|
||||||
|
case W65816::STAptrOff:
|
||||||
|
case W65816::STBptrOff:
|
||||||
|
case W65816::ADJCALLSTACKDOWN:
|
||||||
|
// DOWN expands to nothing (PUSH16 chain already shifted SP).
|
||||||
|
// UP is NOT transparent: when PEI doesn't process it, AsmPrinter
|
||||||
|
// emits a TSC/CLC/ADC/TCS sequence that clobbers A and flags.
|
||||||
|
return true;
|
||||||
|
default:
|
||||||
|
return false;
|
||||||
|
}
|
||||||
};
|
};
|
||||||
// Returns true iff walking back from `Start` (exclusive) finds an
|
// Returns true iff walking back from `Start` (exclusive) finds an
|
||||||
// A-modifier as the first non-skip op. Skips debug ops and
|
// A-modifier as the first non-skip op. Skips debug ops and
|
||||||
|
|
|
||||||