Checkpoint

Scott Duensing 2026-05-02 16:48:56 -05:00
parent d6a34075a5
commit 07544f49f2
27 changed files with 2013 additions and 440 deletions

253
STATUS.md

@ -72,11 +72,13 @@ which runs correctly under MAME (apple2gs).
native object format) for round-tripping with classic dev tools.
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
libgcc into linkable objects.
- `scripts/smokeTest.sh` runs 99 end-to-end checks (scalar ops,
- `scripts/smokeTest.sh` runs 102 end-to-end checks (scalar ops,
control flow, calling conventions, MAME execution, regressions,
link816 bss-base safety, iigs/toolbox.h compile-check, standalone
runtime headers, AsmPrinter peepholes for STZ / PEA / PEI —
single-STA, shared-LDA-multi-STA, and DPF0-forwarding cases).
link816 bss-base safety + weak-symbol resolution +
heap_end-vs-heap_start sanity, iigs/toolbox.h compile-check,
standalone runtime headers, AsmPrinter peepholes for STZ /
PEA / PEI — single-STA, shared-LDA-multi-STA, and DPF0-
forwarding cases — malloc/free coalesce ordering).
Currently 100% pass at -O2 throughout.
**ABI:**
@ -131,11 +133,10 @@ Two open bugs tracked:
both pass. Workaround comments in build.sh / smokeTest.sh
removed.
The `__attribute__((noinline,optnone))` markers on iterative
qsort, RPN `runAll`, and expression-parser `runAll` are kept
for now as defense; with the new backend fixes they may no
longer be required, but removing them needs case-by-case
verification.
The `__attribute__((noinline,optnone))` defenses on iterative
qsort / RPN `runAll` / expression-parser `runAll` were
subsequently dropped; the smoke now compiles them at plain
`-O2` without escape hatches.
The W65816 backend assembler now supports all common indirect
addressing modes (`(dp)`, `(dp),Y`, `(dp,X)`, `(d,s),Y`,
@ -208,18 +209,45 @@ sidecar bytes.
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
correct for negative offsets like `arr[i-1]`.
- **(d,s),y for stack-local pointer dereferences uses DBR**, so
user code that switches DBR (e.g. `pha;plb` to bank 2 to reach
IIgs hardware) must not call into a function that takes the
address of one of its locals — the callee's `*p = v` will write
to the wrong bank. Documented; no compiler-side mitigation
beyond the existing DPF0 fake-physreg routing for the i64-return
high half. Workaround: inline pointer-arg helpers so the writes
stay in the caller's frame using stack-rel direct stores. The
W65816 only has three DBR-independent addressing modes
(abs_long, abs_long,X, [dp],Y) — none cheap to retrofit into
the current pointer-deref lowering (+5 bytes minimum per access).
Real fix needs PHB/PLB at noinline-pointer-callee entry/exit.
- **Pointer-deref bank policy is now split-by-syntax** (FIXED):
`*p` (where `p` is a runtime pointer / local-or-arg vreg) lowers
via `LDAptr / STAptr / STBptr` to `[$E0],Y` indirect-LONG with
the bank byte at `$E2` forced to 0 — DBR-independent. The
`*(volatile uint16 *)0x5000 = v` MMIO idiom (const-int pointer)
is matched by a separate TableGen pattern that lowers straight
to `STAabs` (DBR-relative) so the smoke tests' bank-2 write
path still works. Two tracked issues this resolved:
(a) PHI-elim was eliding the inserter's `COPY $a = ptr_vreg`
when the loop body had multiple Acc16 PHIs competing for A —
the inserter now spills the pointer to a fresh stack slot and
reloads via LDAfi to keep RA honest; sumTable now correct.
(b) pointer staging through `[$E0]` is bank-0 only, so
switchToBank2 + helper-with-local-ptr no longer corrupts data
in the wrong bank. See `feedback_dbr_ptr_deref_spill.md`.
- **Greedy regalloc fails on long-arg call chains** — a function
that strings ~7+ independent `helper(longArg1, longArg2)` calls
overflows greedy at -O1+ ("ran out of registers during register
allocation"). Same root issue as softDouble's old -O2 hold-out.
Threshold raised somewhat by expanding IMG slots from 8 to 16
(now backed by DP $C0..$DE) — most "normal-looking" mixed-arity
workloads now compile, but pathological pressure (many i32+ args
+ bitmask SETCC chain) still fails. Workarounds (in order of
preference): mark the heaviest helper `__attribute__((noinline))`
to reduce caller pressure; `-mllvm -regalloc=fast` for that TU;
or `__attribute__((optnone))` on the affected function. A proper
fix needs either a custom greedy→fast fallback in
`W65816TargetMachine::createTargetRegisterAllocator` or a smarter
spill-placement pre-RA pass.
- **Bank-0 size limit (~48KB)** — the runtime + program must fit in
$1000-$BFFF (text+rodata) plus $D000-$DFFF (LC1 for rodata-spill
and BSS). Past that, link816 hard-fails because text would
cross the IO window. In practice this is rarely hit now that
link816 has `--gc-sections` (default ON, see Recently Fixed)
which drops unreachable functions: a minimal program shrinks
from ~43KB (whole runtime) to ~1.5KB. Programs that genuinely
use most of the runtime can still hit the limit.
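
A quick host-side check of the ~48KB figure and the IO-window rule, as a sketch only. The helper names are illustrative and are not link816's; the address ranges are the ones quoted in the bullet above.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive bank-0 ranges quoted in the size-limit note. */
#define MAIN_START 0x1000u  /* text + rodata */
#define MAIN_END   0xBFFFu
#define LC1_START  0xD000u  /* rodata spill + BSS */
#define LC1_END    0xDFFFu
#define IO_START   0xC000u  /* IIgs IO window */
#define IO_END     0xCFFFu

/* Size in bytes of an inclusive address range. */
static uint32_t rangeBytes(uint32_t start, uint32_t end) {
    assert(end >= start);
    return end - start + 1u;
}

/* Total bank-0 budget: main region plus LC1. */
static uint32_t bank0BudgetBytes(void) {
    return rangeBytes(MAIN_START, MAIN_END) + rangeBytes(LC1_START, LC1_END);
}

/* Hypothetical helper: nonzero if [base, base + size) overlaps the IO window. */
static int touchesIoWindow(uint32_t base, uint32_t size) {
    if (size == 0) {
        return 0;
    }
    uint32_t last = base + size - 1u;
    return base <= IO_END && last >= IO_START;
}
```

`bank0BudgetBytes()` comes to 49152 bytes (44KB main plus 4KB LC1), which is where the ~48KB limit comes from.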
## Recently fixed
@ -288,24 +316,173 @@ sidecar bytes.
also removes two PHA/PLA save-restore wraps around the LDA #0
(STZ doesn't touch A, so the wraps are unnecessary).
- **libgcc.s `lda dp; pha` -> `pei dp`** — 2 sites in __divhi3 /
__modhi3 where the loaded A is dead after the push. PEI
doesn't touch A, saves 1 byte each.
- **W65816StackSlotCleanup Pass 1c skip-list extended** — added
STAabs / STA8abs / STAptr / STBptr / STAptrOff / STBptrOff and
ADJCALLSTACKDOWN to the A-transparent list. Lets the redundant-
CMP-after-A-modifier elimination see through more pseudo
stores and the call-stack-down pseudo. Saves 8 bytes in math.o.
(ADJCALLSTACKUP is NOT transparent — when PEI doesn't process
it, AsmPrinter emits a TSC/CLC/ADC/TCS that clobbers A.)
- **crt0.s `lda #0; sta` -> `stz`** — IRQ-disable block and the
BSS-zero loop both used `.byte 0xa9, 0x00 ; sta` raw-byte
workarounds for `lda #0` (the assembler emits a 16-bit immediate
in M=8, mis-encoding it). `stz` works in M=8 (stores 1 byte) and
doesn't touch A — both `.byte` workarounds removed; saves 4 bytes
in crt0.o.
- **Runtime correctness pass — five real bugs fixed:**
- `free()` coalesce: when a freed block was absorbed into a
lower-address neighbour (`bEnd == a` path), the absorbed entry
was left in the free list overlapping the extended one. A
follow-on malloc could hand out the same memory to two
callers. Fix: track outer-loop predecessor and excise the
absorbed entry. Smoke #100 added.
- `sqrt(-0.0)` returned NaN; should return -0.0 per IEEE-754.
The sign-bit check fired before the zero check. Fix: mask
sign bit when testing for zero.
- `log(0)` returned NaN; should return -Infinity (pole error).
Same sign-bit-vs-zero ordering issue; both ±0 now return
`-1.0/0.0`.
- `snprintf(buf, 0, ...)` wrote `'\0'` to `buf[-1]` (one byte
BEFORE the buffer). C99 says n=0 must not touch the buffer.
Fix: set `gEnd = NULL` for n=0 so neither the normal nor the
truncation NUL-write path fires. Smoke #76 extended.
- `malloc(>~32KB)` and `calloc(n, m)` had silent integer overflow
on size_t (16-bit), wrapping to small values and handing out
tiny allocations claiming huge sizes. Bumped malloc to bail
above 0x7FF0 (heap is at most ~32KB anyway) and made calloc
overflow-check before multiplying.
- **Removed** dead `runtime/src/softDouble.s` (a stub from before
`softDouble.c` was implemented; the build script doesn't reference
it but it was confusing to leave around).
- **inttypes.h PRId64 / PRIu64 / PRIx64** documented as
unsupported in the runtime's printf — the macros expand to
`"lld"`/`"llu"`/`"llx"` but the formatter only knows the `l`
length modifier, not `ll`, so the format prints literally and
the va_list misaligns. Use `PRId32` etc. for now.
- **More runtime fixes (round 2):**
- `fputs(s, stream)` was forwarding to `puts(s)`, which appends a
newline. C says fputs MUST NOT add one. Direct char-by-char
write now.
- `exit(code)` never invoked the registered `atexit` handler.
C99 7.20.4.3 requires it. Now runs the single-slot handler
(with re-entry guard) before the BRK.
- `printf("%f", -0.0)` printed `0.000000` instead of `-0.000000`
because `if (v < 0)` (a `__ltdf2` call) returns false for
negative zero. Switched to the IEEE-754 sign-bit test that
snprintf already uses.
- `vfprintf` was missing entirely (neither declared in stdio.h
nor implemented). Added a thin wrapper around vprintf.
- **link816 weak-symbol resolution:** the linker previously used
"last def wins" with no regard for STB_GLOBAL vs STB_WEAK. When
a user provided a strong override of a weak libc stub (e.g.
`putchar`), it worked only by link-order luck — reversing the
order let the weak stub silently overwrite the strong def.
Now resolved properly: a strong def beats a weak one (in any
order), strong + strong errors out, and weak + weak keeps the
first. Smoke #100 added.
- **More runtime fixes (round 3):**
- `writeHex` / `emitHex` had a stack buffer overrun
(`char buf[5]` but `printf("%08x", ...)` would write 8 bytes).
On 16-bit `unsigned int`, max useful width is 4 — buf shrunk
to 4 and width is now capped.
- `writeDec` / `writeSignedLong` / `emitDec` / `emitSignedLong`
used `-n` on signed input, which overflows for INT_MIN /
LONG_MIN (UB). All four switched to unsigned-negation
(`0u - (unsigned)n`) for correctness and to keep an
optimizer-aware compiler from exploiting the UB.
- `atoi` / `atol` / `strtol` / `strtoul` likewise built the
parsed magnitude in a signed accumulator and negated at the
end — same UB on the boundary value. All switched to
unsigned magnitude + unsigned-negation cast.
- `link816 parseInt` / `omfEmit parseInt` silently truncated
addresses > 24 bits to `uint32_t` low bits — `--text-base
0x100000000` would silently wrap to 0. Both now reject
out-of-range addresses with a clear error.
- **More runtime fixes (round 4):**
- `pow(x, y)` computed `n = -n` for the integer-y branch when
yi was INT_MIN (-32768); same signed-overflow UB pattern as
the print functions. Switched to unsigned magnitude.
- Added `perror(prefix)` — was missing from the runtime; common
pattern in portable code that reports I/O failure via
`errno + strerror`. Declared in stdio.h, implemented as
char-by-char emit through putchar (no fprintf dependency).
- **link816 `__heap_end` was hardcoded at $BF00**, ignoring where
`__heap_start` actually ended up. When BSS got auto-relocated
into LC1 ($D000+), heap_start ended up > heap_end and malloc
immediately returned NULL on every call — silently bricking any
program that allocated dynamic memory after the runtime grew
past the default-bss threshold. Heap_end now picks
$BF00 / $E000 based on where heap_start lands (and skips the IO
window if heap_start would have landed in $C000-$CFFF).
Smoke #102 added.
- **link816 rodata auto-skips IIgs IO window** ($C000-$CFFF). When
text+rodata grew past 0xC000 the rodata bytes silently corrupted
at runtime — string literals in the IO range read back as
hardware register values, breaking strcmp / strstr / printf / etc.
Now: rodata that would land in or cross $C000-$CFFF auto-skips
to $D000. Init_array gets the same treatment. Text that would
cross IO is hard-rejected at link time (no auto-fix possible —
PC fetches in IO would read hardware registers). This was the
root cause of the "tan/tanf triggers layout-sensitive failure"
symptom listed in older STATUS notes.
- **runInMame skips writes to IO window** during the binary load.
Without this, the zero-padding in the rodata-skip gap would
clobber soft switches (e.g. the LC1 RAM enable that crt0 sets
via $C083) when the loader naively wrote the entire image
byte-by-byte to memory.
- **link816 `--gc-sections` (default ON)** — discards sections not
reachable from the entry point (`__start` / `_start` / `main`
for the canonical crt0 setup) plus all `.init_array` sections.
Built on `-ffunction-sections` so each function is in its own
section. A minimal program with full runtime linked shrinks
from ~43KB to ~1.5KB. Adding `tan/tanf` to math.c (which
caused the latent layout-sensitive failure described above)
no longer pushes any test past the bank-0 limit. Tests that
intentionally check unreachable symbols pass `--no-gc-sections`
to opt out.
- **`fwrite(stdout, ...)` was a stub returning 0** even though
`stdout` has a working `putchar` route. Now actually writes
through `putchar` for stdout/stderr (only). Also gained the
same `size * nmemb` overflow guard as `calloc`.
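
The `size * nmemb` guard shared by `calloc` and `fwrite` can be sketched as below. This is an illustration, not the runtime's code: `mulFits16` is a hypothetical name, and `uint16_t` stands in for the target's 16-bit `size_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stores size * nmemb in *out and returns true when the product fits in
 * 16 bits; returns false (leaving *out untouched) when it would wrap. */
static bool mulFits16(uint16_t size, uint16_t nmemb, uint16_t *out) {
    assert(out != NULL);
    /* Divide instead of multiplying so the check itself cannot overflow. */
    if (size != 0 && nmemb > UINT16_MAX / size) {
        return false;
    }
    *out = (uint16_t)((uint32_t)size * nmemb);
    return true;
}
```

A caller would bail out (return NULL from `calloc`, 0 from `fwrite`) whenever the check fails, instead of continuing with a wrapped byte count.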
## What's still needed for a "ship-ready" toolchain
- **softDouble.c -O1 hold-out**: `__muldf3`'s u64 lifetime pressure
overflows the greedy register allocator at -O2 ("ran out of
registers during register allocation"). Builds correctly at
-O1. Investigated: marking dpack noinline reduces pressure but
isn't enough; making dclass noinline would unblock -O2 (verified)
but the (d,s),y-uses-DBR bug then corrupts dclass's pointer-arg
writes when a caller has switched DBR (caught by smoke's
dmul-after-bank-switch test). Real fix is gated on the broader
DBR-pointer-deref limitation listed above.
- **softDouble.c -O2 — FIXED.** Marking `dclass` noinline (in
addition to `dpack`) drops register pressure in `__muldf3`/
`__divdf3`/`__adddf3` enough that greedy regalloc no longer
runs out. The previous blocker was that noinline-dclass would
write through pointer args via the DBR-relative `(d,s),y` mode
and corrupt caller data after a bank switch — that path now
goes through `STAptr/STBptr` which use `[$E0],Y` indirect-long
with the bank byte forced to 0, so DBR is irrelevant. All
three smoke build sites moved to `-O2`.
- **More of the C standard library**: real `<stdio.h>` file I/O
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
returning success/zero) — would need a memory-backed FS or a
MAME hook. `<locale.h>` / `<signal.h>` are stubbed (compile and
return safe defaults); `<wchar.h>` / `<time.h>` mostly absent.
MAME hook. `<locale.h>` / `<signal.h>` / `<time.h>` are stubbed
(compile and return safe defaults). `<wchar.h>` mostly absent.
A `time()` impl wired to ReadTimeHex (Misc Tool $0D03) was
attempted but crashes MAME without the Tool Locator initialised
in crt0; `clock()` via VBL counter at $E1006B needs 24-bit
far-pointer support that the backend doesn't yet model.
- **C++ runtime support**: vtable layout for multiple inheritance,
RTTI, exceptions (or a documented `-fno-exceptions` requirement).
@ -315,9 +492,15 @@ sidecar bytes.
whether any 8-bit accumulator value is used. A per-region
scheduler would reduce the SEP/REP wrap overhead on i8 stores.
- **Toolbox / IIgs system call bindings**: header files declaring
the Apple IIgs system calls (`SystemTask`, `WaitMouseUp`,
`DrawString`, …) with the right inline-asm dispatch glue.
- **Toolbox / IIgs system call bindings**: `iigs/toolbox.h` covers
the common entry points across Tool Locator, Memory Manager,
Misc Tools, QuickDraw II, Event Manager, Window Manager, plus
GS/OS Quit. Multi-arg wrappers (NewHandle, QDStartUp, MoveTo,
EMStartUp, GetNextEvent, NewWindow, CloseWindow) live in
`runtime/src/iigsToolbox.s` because the backend's inline-asm
constraints can't take memory operands. Single-arg / no-arg
wrappers stay inline. More routines (Menu Manager, Dialog
Manager, Standard File, Sound) still TBD.
- **Real-world program coverage**: the smoke tests are
microbenchmarks. A few known-good Apple IIgs C programs (e.g.


@ -1,25 +1,27 @@
// IIgs toolbox helpers — minimal inline-asm wrappers for the most
// commonly-used Apple IIgs system calls.
// IIgs toolbox helpers — wrappers for commonly-used Apple IIgs system
// calls.
//
// Toolbox dispatch on the IIgs goes through the Tool Locator at
// $E10000. Each routine is identified by a 16-bit "tool number"
// (low byte = tool set, high byte = function within set), loaded
// (high byte = function within set, low byte = tool set), loaded
// into X, and called via JSL $E10000.
//
// Args go on the stack (push order: rightmost first), then the
// caller pushes a result-space slot if the routine returns something
// non-i16-or-pointer, then JSL.
// GS/OS dispatch goes through $E100A8 with X holding the call
// number and a parameter-block pointer pushed on the stack.
//
// This header keeps things simple: each function inlines a tiny
// asm block specific to that call. No #include guards on bigger
// abstractions; users that want full toolbox coverage should write
// their own wrappers using the same pattern.
// Calling convention:
// - The caller first pushes a result-space slot (16 or 32 bits) if
// the routine returns something non-void, then pushes the args
// (push order: rightmost first).
// - The result is read off the same stack slot AFTER JSL.
// - Tool number lives in X immediately before JSL.
// - Tools clobber A, X, Y, P; the runtime spills around the call.
//
// LIMITATIONS:
// - Only a handful of routines wrapped. Calypsi has full toolbox.
// - No error-handling — caller checks the return.
// - Single-bank only. Cross-bank toolbox calls need different
// dispatch logic.
// Single-arg / no-arg wrappers are `static inline`. Multi-arg
// wrappers are declared `extern` here and implemented in
// runtime/src/iigsToolbox.s — backend constraints don't allow
// memory-operand inline asm so the multi-arg pushes need real
// .s code.
#ifndef IIGS_TOOLBOX_H
#define IIGS_TOOLBOX_H
@ -28,81 +30,284 @@
extern "C" {
#endif
// Tool number convention: high byte = function, low byte = tool set.
// Common tool sets: 04 = Misc, 0E = QuickDraw II, 18 = Window Mgr.
// ===== Tool numbers (high byte = function, low byte = tool set) =====
// Tool sets:
// 01 = Tool Locator 02 = Memory Manager 03 = Misc Tools
// 04 = QuickDraw II 06 = Event Manager 0E = Window Manager
// 1B = Menu Manager 29 = Standard File
// Misc Tool Set ---------------------------------------------------
// WriteCString (Misc Tool $290B) — write a NUL-terminated string to
// the text screen. Arg: 16-bit pointer pushed before the call.
// Returns nothing.
static inline void TBoxWriteCString(const char *s) {
// =====================================================================
// Tool Locator (Set $01)
// =====================================================================
static inline void TBoxTLStartUp(void) {
__asm__ volatile (
"pha\n" // push C-string pointer
"ldx #0x290B\n" // tool number (function 0x29, set 0x0B)
"jsl 0xe10000\n" // tool dispatcher
"ldx #0x0201\n"
"jsl 0xe10000\n"
:
: "a"(s)
:
: "a", "x", "y", "memory"
);
}
static inline void TBoxTLShutDown(void) {
__asm__ volatile (
"ldx #0x0301\n"
"jsl 0xe10000\n"
:
:
: "a", "x", "y", "memory"
);
}
// =====================================================================
// Memory Manager (Set $02)
// =====================================================================
// MMStartUp — call as the first MM routine. Returns the caller's
// 16-bit userId; save it for later DisposeAll calls.
static inline unsigned short TBoxMMStartUp(void) {
unsigned short id;
__asm__ volatile (
"pha\n" // result space
"ldx #0x0202\n"
"jsl 0xe10000\n"
"pla\n"
: "=a"(id)
:
: "x", "y", "memory"
);
return id;
}
// MMShutDown — releases all MM resources owned by `userId`.
static inline void TBoxMMShutDown(unsigned short userId) {
__asm__ volatile (
"pha\n"
"ldx #0x0302\n"
"jsl 0xe10000\n"
:
: "a"(userId)
: "x", "y", "memory"
);
}
// SysBeep (Misc Tool $0303) — short beep through the speaker.
// NewHandle / DisposeHandle live in iigsToolbox.s — the parameter
// blocks are 4-arg with mixed widths and need explicit asm.
extern unsigned long TBoxNewHandle(unsigned long size,
unsigned short userId,
unsigned short attr,
unsigned long addr);
extern void TBoxDisposeHandle(unsigned long handle);
// =====================================================================
// Misc Tools (Set $03)
// =====================================================================
// SysBeep — short beep through the speaker.
static inline void TBoxBeep(void) {
__asm__ volatile (
"ldx #0x0303\n"
"jsl 0xe10000\n"
:
:
: "x", "y", "memory"
: "a", "x", "y", "memory"
);
}
// ReadKey (Event Mgr; simplified — actually KeyTrans/etc). Returns
// the next pending key in A, or 0 if none. This wraps GetNextEvent
// internally on a real GS; for the simple console harness it polls
// the keyboard buffer.
static inline char TBoxReadKey(void) {
char r;
// WriteCString — Misc Tool $0B; writes a NUL-terminated string to
// the text screen. Note: actual GS uses Text Tools or stdio;
// this is the legacy entry point.
static inline void TBoxWriteCString(const char *s) {
__asm__ volatile (
"ldx #0x250A\n" // GetEvent (placeholder; refine in real port)
"pha\n"
"ldx #0x290B\n"
"jsl 0xe10000\n"
: "=a"(r)
:
: "a"(s)
: "x", "y", "memory"
);
return r;
}
// ConsoleQuit — clean program shutdown via GS/OS Quit. Pushes a
// pConditionTbl pointer (here, 0 for no condition) before JSL.
// ReadAsciiTime — fills a 20-byte buffer with the current time
// formatted as "DDD MMM dd hh:mm:ss yyyy".
static inline void TBoxReadAsciiTime(char *buf20) {
__asm__ volatile (
"pha\n"
"ldx #0x0F03\n"
"jsl 0xe10000\n"
:
: "a"(buf20)
: "x", "y", "memory"
);
}
// =====================================================================
// QuickDraw II (Set $04)
// =====================================================================
// QDStartUp / QDShutDown. Multi-arg startup lives in iigsToolbox.s.
extern void TBoxQDStartUp(unsigned short masterSCB,
unsigned short pageSize,
unsigned short userId);
static inline void TBoxQDShutDown(void) {
__asm__ volatile (
"ldx #0x0304\n"
"jsl 0xe10000\n"
:
:
: "a", "x", "y", "memory"
);
}
// MoveTo — move the pen to absolute (h, v).
extern void TBoxMoveTo(short h, short v);
// DrawString — draw a Pascal-style length-prefixed string at the
// current pen position. First byte of `pstr` must be the length.
static inline void TBoxDrawString(const char *pstr) {
__asm__ volatile (
"pha\n"
"ldx #0x2C04\n"
"jsl 0xe10000\n"
:
: "a"(pstr)
: "x", "y", "memory"
);
}
// PaintRect / FrameRect / EraseRect — rect is a 16-bit pointer to a
// 4-word Rect (top, left, bottom, right).
static inline void TBoxPaintRect(const short *rect) {
__asm__ volatile (
"pha\n"
"ldx #0x5104\n"
"jsl 0xe10000\n"
:
: "a"(rect)
: "x", "y", "memory"
);
}
static inline void TBoxFrameRect(const short *rect) {
__asm__ volatile (
"pha\n"
"ldx #0x4F04\n"
"jsl 0xe10000\n"
:
: "a"(rect)
: "x", "y", "memory"
);
}
static inline void TBoxEraseRect(const short *rect) {
__asm__ volatile (
"pha\n"
"ldx #0x5004\n"
"jsl 0xe10000\n"
:
: "a"(rect)
: "x", "y", "memory"
);
}
// =====================================================================
// Event Manager (Set $06)
// =====================================================================
// EMStartUp — initialises Event Manager with default queue and
// 640x200 mouse clamp. Args other than userId are hardcoded; if
// you need custom clamp, write your own wrapper.
extern void TBoxEMStartUp(unsigned short userId);
static inline void TBoxEMShutDown(void) {
__asm__ volatile (
"ldx #0x0306\n"
"jsl 0xe10000\n"
:
:
: "a", "x", "y", "memory"
);
}
// SystemTask — gives time to background tasks. Call regularly in
// event loops.
// NOTE: the tool number below (0x0306) is the same one
// TBoxEMShutDown uses, so as written this dispatches to EMShutDown.
// The correct SystemTask number still needs to be looked up.
static inline void TBoxSystemTask(void) {
__asm__ volatile (
"ldx #0x0306\n"
"jsl 0xe10000\n"
:
:
: "a", "x", "y", "memory"
);
}
// GetNextEvent — fills the EventRecord pointed at by `theEvent`
// with the next event matching `eventMask`. Returns nonzero if an
// event was returned.
//
// EventRecord layout (16 bytes): what(2) message(4) when(4) where(4)
// modifiers(2).
extern unsigned short TBoxGetNextEvent(unsigned short eventMask, void *theEvent);
// =====================================================================
// Window Manager (Set $0E)
// =====================================================================
// NewWindow — allocate and display a new window. paramList points
// to a NewWindow parameter block (in-bank 16-bit pointer). Returns
// a 32-bit window pointer.
extern void *TBoxNewWindow(const void *paramList);
// CloseWindow — tear down a window. Takes a 32-bit window pointer.
extern void TBoxCloseWindow(void *winPtr);
// =====================================================================
// GS/OS (dispatcher at $E100A8)
// =====================================================================
// Quit — clean program shutdown via GS/OS. pConditionTbl = 0
// (no resume condition). Does not return.
static inline void TBoxQuit(void) {
__asm__ volatile (
"pea 0\n" // pConditionTbl = NULL
"pea 0\n" // pParm
"ldx #0x2029\n" // GS/OS Quit
"jsl 0xe100a8\n" // GS/OS dispatcher (different addr)
"pea 0\n" // pConditionTbl
"pea 0\n" // pParm
"ldx #0x2029\n" // GS/OS Quit
"jsl 0xe100a8\n"
:
:
: "x", "y", "memory"
: "a", "x", "y", "memory"
);
while (1) {} // unreachable
while (1) {} // unreachable
}
// QuickDraw II ----------------------------------------------------
// =====================================================================
// Helpers — direct hardware polling (no toolbox)
// =====================================================================
// QDStartUp / QDShutDown (sketches — real ones take more args).
// Real apps typically use QuickDraw II via the "shell" startup
// sequence; this is for educational/sim scenarios.
static inline void TBoxQDStartUp(void) {
// ReadKey — poll the IIgs keyboard latch at $C000 directly.
// Returns the ASCII byte (0 if no key ready). Strobes $C010 to
// clear the latch. Does NOT use Event Manager — for a real GS
// app, use TBoxGetNextEvent and pull from the queue instead.
// NOTE: `and #0x7f` and `lda #0` below run with M=8. If the
// assembler encodes immediates as 16-bit after SEP (the issue the
// old crt0 `.byte 0xa9, 0x00` workaround addressed), these two
// instructions need the same check.
static inline char TBoxReadKey(void) {
char r = 0;
__asm__ volatile (
"pea 0\n" "pea 0\n" "pea 0\n" // dummy direct-page handle
"ldx #0x0204\n"
"jsl 0xe10000\n"
"sep #0x20\n" // 8-bit A
"lda 0xc000\n"
"bpl 1f\n"
"sta 0xc010\n" // strobe
"and #0x7f\n"
"bra 2f\n"
"1:\n"
"lda #0\n"
"2:\n"
"rep #0x20\n"
"and #0x00ff\n"
: "=a"(r)
:
:
: "x", "y", "memory"
: "memory"
);
return r;
}
#ifdef __cplusplus

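The tool-number convention stated in this header (high byte = function within set, low byte = tool set) can be written out as a small host-side sketch. The helper names are hypothetical and are not part of `iigs/toolbox.h`; the numeric values are the ones the wrappers above load into X.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a tool number: high byte = function within set, low byte = tool set. */
static uint16_t toolNumber(uint8_t function, uint8_t toolSet) {
    return (uint16_t)(((uint16_t)function << 8) | toolSet);
}

/* Unpack the two halves again. */
static uint8_t toolSetOf(uint16_t n)      { return (uint8_t)(n & 0xFFu); }
static uint8_t toolFunctionOf(uint16_t n) { return (uint8_t)(n >> 8); }

/* Self-checking examples using numbers that appear in the wrappers. */
static void toolNumberExamples(void) {
    assert(toolNumber(0x02, 0x01) == 0x0201);  /* TBoxTLStartUp */
    assert(toolNumber(0x09, 0x02) == 0x0902);  /* TBoxNewHandle */
    assert(toolSetOf(0x0F03) == 0x03);         /* TBoxReadAsciiTime, Misc Tools */
    assert(toolFunctionOf(0x0F03) == 0x0F);
}
```

Decoding a constant this way is a cheap way to spot a wrapper whose number points at a different tool set than the section it is filed under.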

@ -10,9 +10,14 @@
// (strtoimax / strtoumax not implemented — runtime has strtol /
// strtoul for the 32-bit forms which cover the common needs.)
// PRIxN format macros. `int` is 16-bit on W65816, `long` is 32,
// `long long` is 64.
//
// **WARNING — limited printf support.** The runtime's printf /
// snprintf understand the `l` length modifier (long, 32-bit) but
// NOT `ll` (long long, 64-bit). Using PRId64 / PRIu64 / PRIx64
// will compile but the runtime treats the format as a literal
// "%lld" rather than reading 8 bytes off the va_list — wrong output
// AND a stack misalignment for any subsequent args. For 32-bit
// values, PRId32 / PRIu32 / PRIx32 work correctly.
#define PRId8 "d"
#define PRIi8 "i"

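Until `ll` is supported, one possible workaround for hex output is to print a 64-bit value as two 32-bit halves with `PRIx32`. This is a sketch under assumptions: `formatHex64` is a hypothetical helper, and it assumes the runtime formatter honours a zero-padded width of 8 with the `l` modifier, which the notes above do not confirm. It does not help with decimal output.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats v as 16 lowercase hex digits using only 32-bit conversions.
 * Returns what snprintf returns (16 when the buffer is large enough). */
static int formatHex64(char *buf, size_t n, uint64_t v) {
    assert(buf != NULL);
    uint32_t hi = (uint32_t)(v >> 32);          /* upper 32 bits */
    uint32_t lo = (uint32_t)(v & 0xFFFFFFFFu);  /* lower 32 bits */
    return snprintf(buf, n, "%08" PRIx32 "%08" PRIx32, hi, lo);
}
```

Each argument is a plain 32-bit value, so the va_list stays aligned even though the formatter never sees an `ll` conversion.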

@ -19,6 +19,8 @@ double sin (double x);
float sinf (float x);
double cos (double x);
float cosf (float x);
double tan (double x);
float tanf (float x);
double exp (double x);
float expf (float x);
double log (double x);


@ -19,6 +19,8 @@ int snprintf(char *buf, size_t n, const char *fmt, ...);
int vsprintf(char *buf, const char *fmt, va_list ap);
int vsnprintf(char *buf, size_t n, const char *fmt, va_list ap);
int fprintf(FILE *stream, const char *fmt, ...);
int vfprintf(FILE *stream, const char *fmt, va_list ap);
void perror(const char *prefix);
int fputc(int c, FILE *stream);
int fputs(const char *s, FILE *stream);
int fflush(FILE *stream);


@ -24,12 +24,13 @@ __start:
rep #0x30
; Disable IIgs peripheral interrupt sources at the chip level —
; SEI alone leaves the hardware lines asserted, and the IRQ trap
; in ROM keeps re-firing if the source isn't quiesced.
; in ROM keeps re-firing if the source isn't quiesced. STZ
; stores zero without going through A; in M=8 it stores 1 byte
; (matching the 8-bit registers), so no LDA #0 prelude is needed.
sep #0x20
.byte 0xa9, 0x00 ; lda #$00 (8-bit M)
sta 0xc041 ; INTEN = 0 (clear AN3/mouse/0.25s/VBL/mouse-IRQ enables)
sta 0xc023 ; VGCINT = 0 (clear external/1-sec/scan-line IRQ enables)
sta 0xc032 ; SCANINT clear
stz 0xc041 ; INTEN = 0 (clear AN3/mouse/0.25s/VBL/mouse-IRQ enables)
stz 0xc023 ; VGCINT = 0 (clear external/1-sec/scan-line IRQ enables)
stz 0xc032 ; SCANINT clear
rep #0x20
; Top-of-stack at $0FFF. Native-mode S is 16-bit, so we don't need
@ -58,20 +59,15 @@ __start:
; Zero BSS. X iterates from __bss_start to __bss_end; each
; iteration writes one byte of zero at addr X (via DP=0 +
; offset 0 — which is just X). Wraps in 8-bit M for the
; byte-store.
; offset 0 — which is just X). STZ in M=8 stores 1 byte and
; doesn't touch A, so we don't need the LDA #0 prelude.
rep #0x10 ; ensure X is 16-bit
ldx #__bss_start
.Lbss_loop:
cpx #__bss_end
bcs .Lbss_done ; X >= end -> done
sep #0x20 ; 8-bit M for 1-byte store
; llvm-mc doesn't track SEP/REP — `lda #$0` after SEP gets
; encoded as a 3-byte 16-bit immediate, so the CPU reads
; `a9 00 00` = LDA #$00 then BRK. Force the 1-byte form
; with raw bytes.
.byte 0xa9, 0x00 ; lda #$00 (8-bit M imm)
sta 0x0, x ; *(uint8_t *)X = 0 (DP=0)
stz 0x0, x ; *(uint8_t *)X = 0 (DP=0)
rep #0x20
inx
bra .Lbss_loop


@ -53,12 +53,14 @@ long atol(const char *s) {
} else if (*s == '+') {
s++;
}
long n = 0;
// Parse magnitude as unsigned to avoid signed-overflow UB (e.g.
// "-2147483648" — the magnitude 2147483648 doesn't fit in long).
unsigned long u = 0;
while (*s >= '0' && *s <= '9') {
n = n * 10 + (*s - '0');
u = u * 10 + (unsigned long)(*s - '0');
s++;
}
return sign < 0 ? -n : n;
return sign < 0 ? (long)(0ul - u) : (long)u;
}
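
The unsigned-magnitude pattern used in the `atol` change can be exercised on a host as follows. This is a sketch, not the runtime source: `parseLongSketch` is a hypothetical name, `int32_t` / `uint32_t` stand in for the target's 32-bit `long`, and leading-whitespace handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parses an optional sign and decimal digits. The magnitude is built in an
 * unsigned accumulator so "-2147483648" never overflows a signed type. */
static int32_t parseLongSketch(const char *s) {
    assert(s != NULL);
    int sign = 1;
    if (*s == '-') {
        sign = -1;
        s++;
    } else if (*s == '+') {
        s++;
    }
    uint32_t u = 0;
    while (*s >= '0' && *s <= '9') {
        u = u * 10u + (uint32_t)(*s - '0');  /* unsigned math wraps, no UB */
        s++;
    }
    /* Negate in unsigned arithmetic, then convert back to signed. */
    return sign < 0 ? (int32_t)((uint32_t)0 - u) : (int32_t)u;
}
```

The final conversion from unsigned to signed is implementation-defined rather than undefined for out-of-range values; on two's-complement targets it yields the expected bit pattern.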

223
runtime/src/iigsToolbox.s Normal file

@ -0,0 +1,223 @@
; iigsToolbox.s — multi-arg toolbox wrappers that can't be done as
; inline asm because the W65816 backend's inline-asm constraints
; can't take memory operands.
;
; C ABI on this target:
; - Arg 0 (i16): in A
; - Arg 0 (i32): low half in A, high half in X
; - Arg N>0 (i16): on the stack at (4 + 2*(N-1)),S. Args are pushed
; rightmost-first and JSL adds 3 bytes of return address, so
; 4,S = arg1 lo.
; - i16 return: A
; - i32 return: A (low) + X (high)
;
; Toolbox calls expect:
; - Args on stack in toolbox order (rightmost pushed first), then
; a result slot of appropriate width pushed BEFORE the args (so
; the result ends up at the highest stack address after pushes).
; - Tool number in X.
; - JSL $E10000.
; - After JSL, pop result then args in reverse.
;
; All wrappers preserve nothing (toolbox clobbers A, X, Y, P).
.text
.globl TBoxNewHandle
.globl TBoxDisposeHandle
.globl TBoxQDStartUp
.globl TBoxMoveTo
.globl TBoxEMStartUp
.globl TBoxGetNextEvent
.globl TBoxNewWindow
.globl TBoxCloseWindow
; =====================================================================
; unsigned long TBoxNewHandle(u32 size, u16 userId, u16 attr, u32 addr)
; Entry: A = size lo, X = size hi
; 4,S = userId, 6,S = attr, 8,S = addr lo, 10,S = addr hi
; Tool call: NewHandle(Long blockSize, Word userId, Word attributes,
; Long memLoc), tool number $0902.
;
; Push sequence used by this wrapper (result space first, then the
; parameters in declaration order, each long as lo word then hi word):
;     pea 0        ; result lo
;     pea 0        ; result hi
;     pha          ; blockSize lo (entry A, staged through $E0)
;     pha          ; blockSize hi (entry X, staged through $E2)
;     pha          ; userId
;     pha          ; attr
;     pha          ; addr lo
;     pha          ; addr hi
;     ldx #$0902
;     jsl $E10000
; After the JSL the wrapper pops the 4-byte result: the first PLA is
; moved to X (hi), the second stays in A (lo).
;
; Reference consulted for the push order:
; https://www.brutaldeluxe.fr/products/crossdevtools/cadius/
; TODO: the lo-then-hi word order for longs was not settled when this
; was written; confirm it against the Apple IIgs Toolbox Reference.
; =====================================================================
TBoxNewHandle:
; Stash size lo (in A) and size hi (in X) before we use the
; stack — both must be pushed AFTER the result slot.
sta 0xe0 ; size lo to scratch
stx 0xe2 ; size hi to scratch
; Push 4-byte result space (will be popped at end).
pea 0 ; result lo
pea 0 ; result hi
; Push blockSize: lo first then hi.
lda 0xe0 ; size lo
pha
lda 0xe2 ; size hi
pha
; Push userId. It was at 4,S on entry; we have pushed 8 bytes since
; then (4 result + 4 size), so it is now at 4 + 8 = 12,S.
lda 12, s ; userId
pha
; attr was at 6,S originally; now at 6 + 8 + 2 (one more pha) = 16,S.
lda 16, s ; attr
pha
; addr lo was at 8,S originally; with all our pushes (4 result + 4
; size + 2 user + 2 attr = 12), now at 8 + 12 = 20,S.
lda 20, s ; addr lo
pha
; addr hi was at 10,S originally; +14 = 24,S.
lda 24, s ; addr hi
pha
ldx #0x0902
jsl 0xe10000
; Pop result: hi then lo. Returns u32 in A:X (low in A, hi in X).
pla ; result hi
tax
pla ; result lo → A
rtl
; =====================================================================
; void TBoxDisposeHandle(unsigned long handle)
; Entry: A = handle lo, X = handle hi
; =====================================================================
TBoxDisposeHandle:
pha ; handle lo
phx ; handle hi
ldx #0x1002
jsl 0xe10000
rtl
; =====================================================================
; void TBoxQDStartUp(u16 masterSCB, u16 pageSize, u16 userId)
; Entry: A = masterSCB, 4,S = pageSize, 6,S = userId
; Tool: PEA userId, PEA pageSize, PHA masterSCB, JSL X=$0204
; =====================================================================
TBoxQDStartUp:
sta 0xe0 ; stash masterSCB
lda 6, s ; userId (originally 6,S, no pushes yet)
pha ; userId pushed; subsequent loads need +2
lda 6, s ; pageSize was at 4,S; +2 = 6,S
pha
lda 0xe0 ; masterSCB
pha
ldx #0x0204
jsl 0xe10000
rtl
; =====================================================================
; void TBoxMoveTo(short h, short v)
; Entry: A = h, 4,S = v
; =====================================================================
TBoxMoveTo:
pha ; h
lda 6, s ; v (originally 4,S; +2 after pha)
pha
ldx #0x3A04
jsl 0xe10000
rtl
; =====================================================================
; void TBoxEMStartUp(u16 userId)
; Entry: A = userId
; Default queueSize=0, mouse clamp 0..639 / 0..199
; Tool: PEA queueSize, PEA xMin, PEA xMax, PEA yMin, PEA yMax, PHA userId
; =====================================================================
TBoxEMStartUp:
pea 0 ; queueSize = use default
pea 0 ; xMin
pea 0x27F ; xMax = 639
pea 0 ; yMin
pea 0xC7 ; yMax = 199
pha ; userId (still in A from entry)
ldx #0x0206
jsl 0xe10000
rtl
; =====================================================================
; unsigned short TBoxGetNextEvent(u16 eventMask, void *theEvent)
; Entry: A = eventMask, 4,S = theEvent
; Tool: PHA result(word), PHA eventMask, PHA theEvent, JSL X=$0A06
; =====================================================================
TBoxGetNextEvent:
sta 0xe0 ; stash eventMask
pea 0 ; result space (16-bit)
lda 0xe0 ; eventMask
pha
lda 8, s ; theEvent (originally 4,S; +4 after pea+pha)
pha
ldx #0x0A06
jsl 0xe10000
pla ; result → A
rtl
; =====================================================================
; void *TBoxNewWindow(const void *paramList)
; Entry: A = paramList
; Tool: PEA result hi, PEA result lo, PHA paramList, JSL X=$090E
; Returns 32-bit window ptr in A:X (low in A, hi in X).
; =====================================================================
TBoxNewWindow:
sta 0xe0 ; stash paramList
pea 0 ; result hi
pea 0 ; result lo
lda 0xe0 ; paramList
pha
ldx #0x090E
jsl 0xe10000
pla ; result lo → A
plx ; result hi → X
rtl
; =====================================================================
; void TBoxCloseWindow(void *winPtr)
; Entry: A = winPtr lo, X = winPtr hi
; =====================================================================
TBoxCloseWindow:
pha ; winPtr lo
phx ; winPtr hi
ldx #0x0B0E
jsl 0xe10000
rtl
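
The stack-relative offsets in the wrappers above all follow one rule: an argument that sat at `n,S` on entry sits at `n + pushed,S` once `pushed` more bytes are on the stack. A minimal C sketch of that bookkeeping, assuming every reloaded argument is pushed back as one 2-byte word (the helper name `reloadOffsets` is hypothetical and not part of the runtime):

```c
#include <stddef.h>

/* For each incoming stack argument (given by its entry offset, in the
 * order the wrapper reloads it), compute the offset to use in
 * `lda off,s`. `alreadyPushed` is the byte count pushed before the
 * first reload; every reload is followed by a 2-byte PHA. */
static void reloadOffsets(const unsigned *entry, size_t count,
                          unsigned alreadyPushed, unsigned *out) {
    unsigned pushed = alreadyPushed;
    for (size_t i = 0; i < count; i++) {
        out[i] = entry[i] + pushed;  /* entry offset shifted by pushes so far */
        pushed += 2;                 /* the PHA that follows this load */
    }
}
```

For TBoxNewHandle, entry offsets 4, 6, 8, 10 with 8 bytes already pushed come out as 12, 16, 20, 24, matching the `lda` operands in the wrapper.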

View file

@ -133,15 +133,17 @@ long labs(long n) { return n < 0 ? -n : n; }
int atoi(const char *s) {
int sign = 1;
int n = 0;
while (isspace(*s)) s++;
if (*s == '-') { sign = -1; s++; }
else if (*s == '+') { s++; }
// Parse magnitude as unsigned to dodge signed-overflow UB on
// values like "32768" (parsing INT_MAX+1 as signed int).
unsigned int u = 0;
while (isdigit(*s)) {
n = n * 10 + (*s - '0');
u = u * 10 + (unsigned int)(*s - '0');
s++;
}
return sign * n;
return sign < 0 ? (int)(0u - u) : (int)u;
}
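
The atoi change above accumulates the magnitude in unsigned arithmetic and negates it as unsigned. A standalone sketch of the same pattern, written for a host compiler (the function name `parseIntWrapping` is illustrative only):

```c
#include <ctype.h>

/* atoi-style parse that accumulates the magnitude in unsigned
 * arithmetic, so digit overflow wraps instead of being undefined. */
static int parseIntWrapping(const char *s) {
    int negative = 0;
    unsigned int u = 0;
    while (isspace((unsigned char)*s)) s++;
    if (*s == '-') { negative = 1; s++; }
    else if (*s == '+') { s++; }
    while (isdigit((unsigned char)*s)) {
        u = u * 10u + (unsigned int)(*s - '0');
        s++;
    }
    /* 0u - u is well-defined; the final conversion to int is
     * implementation-defined (not undefined) when out of range. */
    return negative ? (int)(0u - u) : (int)u;
}
```

The casts to `unsigned char` before `isspace`/`isdigit` are an addition in this sketch; they keep the calls defined for bytes above 0x7F on hosts where `char` is signed.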
@ -197,7 +199,10 @@ static void writeUDec(unsigned int n) {
}
static void writeDec(int n) {
if (n < 0) { putchar('-'); writeUDec((unsigned int)(-n)); }
// For INT_MIN, `-n` overflows signed int (UB). Negate as unsigned
// — well-defined (two's-complement wrap), and the magnitude is
// identical for the print path.
if (n < 0) { putchar('-'); writeUDec((unsigned int)(0u - (unsigned int)n)); }
else writeUDec((unsigned int)n);
}
@ -211,10 +216,14 @@ static void writeULong(unsigned long n) {
static void writeHex(unsigned int n, int width) {
static const char digits[] = "0123456789abcdef";
char buf[5];
// unsigned int is 16-bit on this target -> at most 4 hex digits.
// Cap width to that; without it `printf("%08x", ...)` blew past
// the buf[] tail and corrupted the stack.
char buf[4];
if (width > 4) width = 4;
int i = 0;
if (n == 0) { buf[i++] = '0'; }
while (n > 0) { buf[i++] = digits[n & 0xF]; n >>= 4; }
while (n > 0 && i < 4) { buf[i++] = digits[n & 0xF]; n >>= 4; }
while (i < width) buf[i++] = '0';
while (i > 0) putchar(buf[--i]);
}
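
The writeHex fix above combines a 4-byte buffer with a width cap and a bounded digit loop. The sketch below reproduces that logic for a host build, writing into a caller buffer instead of calling putchar; `formatHex16` is a hypothetical name and `uint16_t` stands in for the target's 16-bit `unsigned int`:

```c
#include <stdint.h>

/* Format a 16-bit value as lowercase hex, zero-padded to `width`
 * (capped at 4). `out` must hold at least 5 bytes. Returns the
 * number of characters written, not counting the terminator. */
static int formatHex16(uint16_t n, int width, char *out) {
    static const char digits[] = "0123456789abcdef";
    char buf[4];
    int i = 0, len = 0;
    if (width > 4) width = 4;            /* never pad past the buffer */
    if (n == 0) buf[i++] = '0';
    while (n > 0 && i < 4) { buf[i++] = digits[n & 0xF]; n >>= 4; }
    while (i < width) buf[i++] = '0';    /* zero padding, still <= 4 */
    while (i > 0) out[len++] = buf[--i]; /* digits were stored reversed */
    out[len] = '\0';
    return len;
}
```

With the cap in place, a request such as width 8 yields at most four characters rather than writing past `buf`.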
@ -229,7 +238,8 @@ static void writeStr(const char *s) {
// reliably promotes Bxx to BRL when needed, so the inliner is free to
// merge them when it wants.
static void writeSignedLong(long n) {
if (n < 0) { putchar('-'); writeULong((unsigned long)(-n)); }
// See writeDec: avoid the signed-overflow UB on LONG_MIN.
if (n < 0) { putchar('-'); writeULong(0ul - (unsigned long)n); }
else writeULong((unsigned long)n);
}
@ -242,7 +252,17 @@ static void writeSignedLong(long n) {
static void writeDouble(double v, int prec) {
if (prec < 0) prec = 6;
if (prec > 9) prec = 9;
if (v < 0) { putchar('-'); v = -v; }
// Test the IEEE-754 sign bit (so -0.0 prints with the sign per
// C99) and avoid the soft-float __ltdf2 comparison, which has
// historically miscompiled for negative inputs (see snprintf.c
// banner for the same workaround).
unsigned long long vbits;
__builtin_memcpy(&vbits, &v, 8);
if (vbits & ((unsigned long long)1 << 63)) {
putchar('-');
vbits &= ~((unsigned long long)1 << 63);
__builtin_memcpy(&v, &vbits, 8);
}
long ipart = (long)v;
writeULong((unsigned long)ipart);
if (prec == 0) return;
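
The sign handling above reads the IEEE-754 sign bit directly instead of comparing against zero. A small host-side sketch of that idea, assuming `double` is IEEE-754 binary64 (the helper names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the IEEE-754 sign bit of v is set (true for -0.0 too). */
static int doubleSignBit(double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    return (int)(bits >> 63);
}

/* Return v with its sign bit cleared, using no FP compare or negate. */
static double doubleClearSign(double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits &= ~((uint64_t)1 << 63);
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

A plain `v < 0` test is false for -0.0, which is why the bit test is needed to print the sign in that case.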
@ -398,6 +418,12 @@ static void mallocInitOnce(void) {
void *malloc(size_t n) {
mallocInitOnce();
if (n == 0) n = 1;
// Overflow guard: size_t is 16-bit on this target. Without this,
// malloc(65535) rounds up to 65536 -> wraps to 0 -> allocates 2
// bytes (wrong size); even smaller values can wrap the bumpPtr
// sum below. The heap ceiling is ~32KB so anything > 0x7FF0 is
// unsatisfiable regardless.
if (n > (size_t)0x7FF0) return (void *)0;
n = (n + 1) & ~(size_t)1; // round up to 2 bytes
if (n < FREE_NODE_SZ - HDR_SZ)
n = FREE_NODE_SZ - HDR_SZ; // ensure freed block can hold next-ptr
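
The wrap that the malloc guard above protects against can be reproduced on a host by modelling the 16-bit `size_t` with `uint16_t`. This is a sketch under that assumption; `size16` and both function names are hypothetical, and only the 0x7FF0 limit comes from the source:

```c
#include <stdint.h>

typedef uint16_t size16;  /* stand-in for the target's 16-bit size_t */

/* Unchecked round-up-to-even, as malloc does it; wraps to 0 for 65535. */
static size16 roundUpEven(size16 n) {
    return (size16)((n + 1u) & ~1u);
}

/* Checked variant: returns 0 and leaves *out alone if n exceeds the
 * heap limit, otherwise stores the rounded size and returns 1. */
static int roundUpEvenChecked(size16 n, size16 *out) {
    if (n > 0x7FF0u) return 0;
    *out = roundUpEven(n);
    return 1;
}
```

Rejecting the request before rounding means the wrapped value never reaches the allocator.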
@ -435,38 +461,57 @@ void free(void *p) {
FreeBlk *blk = (FreeBlk *)((char *)p - HDR_SZ);
blk->next = freeList;
freeList = blk;
// Coalesce: walk the free list and merge adjacent blocks. O(n^2)
// in the worst case but n is small in practice.
FreeBlk *a = freeList;
// Coalesce: walk the free list and merge adjacent blocks. The outer
// loop tracks a's predecessor link (a_link) so we can excise `a` when
// it gets absorbed into a lower-address neighbour. Without that, a
// `bEnd == a` match (i.e. b precedes a in memory) would extend b but
// leave a in the list; a future malloc could then hand out a's range
// as a "free" block while the expanded b overlaps it. O(n^2) in the
// worst case; n is small.
FreeBlk **a_link = &freeList;
FreeBlk *a = freeList;
while (a) {
int a_absorbed = 0;
FreeBlk **link = &a->next;
FreeBlk *b = a->next;
while (b) {
char *aEnd = (char *)a + HDR_SZ + a->size;
char *bEnd = (char *)b + HDR_SZ + b->size;
if (aEnd == (char *)b) {
// a immediately precedes b — extend a, drop b.
a->size += HDR_SZ + b->size;
*link = b->next;
b = *link;
continue;
}
if (bEnd == (char *)a) {
// b immediately precedes a — extend b, drop a from
// the outer list. We can't continue the inner walk
// (a is gone), so break out and let the outer loop
// restart from a's successor.
b->size += HDR_SZ + a->size;
// Remove `a` from the list (a is freeList head if first).
// Simpler: relink b in place of a, but a is at top.
// For correctness, just skip — coalesce on next pass.
link = &b->next;
b = b->next;
continue;
*a_link = a->next;
a_absorbed = 1;
break;
}
link = &b->next;
b = b->next;
}
a = a->next;
if (a_absorbed) {
a = *a_link; // already advanced by the excise
} else {
a_link = &a->next;
a = a->next;
}
}
}
void *calloc(size_t nmemb, size_t size) {
// size_t is 16-bit on this target; nmemb*size can overflow and
// wrap to a small value (e.g. calloc(256, 256) -> 65536 -> 0 ->
// 2-byte alloc), then the caller writes way past the returned
// region. Bail when the multiplication would overflow.
if (size != 0 && nmemb > (size_t)0xFFFF / size) return (void *)0;
size_t total = nmemb * size;
void *p = malloc(total);
if (p) memset(p, 0, total);
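
The coalescing loop in free() above can be exercised in isolation. The sketch below is a host-side model, not the runtime's code: the `FreeBlk` layout (size header followed by a next pointer) and the `pushFree` helper are assumptions made for illustration, and `HDR_SZ` is derived with `offsetof`.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct FreeBlk { size_t size; struct FreeBlk *next; } FreeBlk;
#define HDR_SZ offsetof(FreeBlk, next)   /* assumed: header is the size field */

static FreeBlk *freeList;

/* Merge physically adjacent blocks. aLink tracks the link that points
 * at `a`, so `a` can be unlinked when a lower-address block absorbs it. */
static void coalesce(void) {
    FreeBlk **aLink = &freeList;
    FreeBlk *a = freeList;
    while (a) {
        int absorbed = 0;
        FreeBlk **link = &a->next;
        FreeBlk *b = a->next;
        while (b) {
            char *aEnd = (char *)a + HDR_SZ + a->size;
            char *bEnd = (char *)b + HDR_SZ + b->size;
            if (aEnd == (char *)b) {          /* a precedes b: grow a, drop b */
                a->size += HDR_SZ + b->size;
                *link = b->next;
                b = *link;
                continue;
            }
            if (bEnd == (char *)a) {          /* b precedes a: grow b, drop a */
                b->size += HDR_SZ + a->size;
                *aLink = a->next;
                absorbed = 1;
                break;
            }
            link = &b->next;
            b = b->next;
        }
        if (absorbed) {
            a = *aLink;                       /* successor of the removed a */
        } else {
            aLink = &a->next;
            a = a->next;
        }
    }
}

/* Put the block at base+off (payload bytes = `payload`) on the list
 * head, then coalesce, mirroring the order of operations in free(). */
static void pushFree(void *base, size_t off, size_t payload) {
    FreeBlk *blk = (FreeBlk *)((char *)base + off);
    blk->size = payload;
    blk->next = freeList;
    freeList = blk;
    coalesce();
}
```

Freeing three adjacent 32-byte blocks in ascending address order hits the "b precedes a" branch each time, and should leave a single free block spanning all 96 bytes.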
@ -485,14 +530,25 @@ void *realloc(void *ptr, size_t n) {
return q;
}
// ---- exit ----
// ---- atexit / exit ----
//
// Standard exit() halts via BRK. Programs running under the IIgs
// runtime typically would call back into GS/OS Quit; here we just
// wedge the CPU.
// Standard exit() halts via BRK after running any registered atexit
// handler. Programs running under the IIgs runtime typically would
// call back into GS/OS Quit; here we just wedge the CPU. Single-slot
// atexit (the storage and registration function are below).
typedef void (*AtexitFn)(void);
static AtexitFn __atexitFn = (AtexitFn)0;
void exit(int code) {
(void)code;
// C99 7.20.4.3: exit() must invoke registered atexit handlers in
// reverse-registration order before terminating.
if (__atexitFn) {
AtexitFn fn = __atexitFn;
__atexitFn = (AtexitFn)0; // prevent re-entry if fn calls exit
fn();
}
// BRK $00: halts the 65816 at a BRK, which MAME's debugger catches.
__asm__ volatile (".byte 0x00, 0x00");
while (1) {} // unreachable
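
The handler invocation in exit() above clears the slot before calling the handler, so a handler that itself calls exit() cannot recurse. A minimal host-side sketch of the single-slot scheme, with hypothetical names (`slot`, `registerExitHandler`, `runExitHandler`) and a demo handler that deliberately re-enters:

```c
typedef void (*AtexitFn)(void);
static AtexitFn slot = (AtexitFn)0;

/* Register a handler; fails with -1 if the single slot is taken. */
static int registerExitHandler(AtexitFn fn) {
    if (slot) return -1;
    slot = fn;
    return 0;
}

/* Run the handler at most once, clearing the slot first so a handler
 * that re-enters this function does not recurse forever. */
static void runExitHandler(void) {
    if (slot) {
        AtexitFn fn = slot;
        slot = (AtexitFn)0;
        fn();
    }
}

/* Demo handler: counts its calls and re-enters the exit path. */
static int calls = 0;
static void countingHandler(void) {
    calls++;
    runExitHandler();
}
```

Because the slot is empty by the time the handler runs, the nested call is a no-op and the handler executes exactly once.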
@ -522,14 +578,38 @@ char *strerror(int err) {
}
}
// perror — write `prefix: errno-string\n` to stderr. Common pattern in
// portable programs that report I/O failures.
void perror(const char *prefix) {
if (prefix && *prefix) {
const char *p = prefix;
while (*p) { putchar(*p); p++; }
putchar(':');
putchar(' ');
}
const char *m = strerror(errno);
while (*m) { putchar(*m); m++; }
putchar('\n');
}
// ---- time.h ----
//
// W65816/IIgs has no standard clock from C's perspective. Provide
// stubs that return 0 / -1 so code that calls time() at least links.
// A real implementation would call ReadTimeHex (GS/OS toolbox) or
// poll the IIgs real-time clock.
// time() and clock() are stubs returning 0. A real implementation
// could either:
// - Use ReadTimeHex (Misc Tool $0D03) — but this requires the GS
// Tool Locator to be initialised (TLStartUp from iigs/toolbox.h)
// in the crt0, otherwise the JSL $E10000 dispatcher reads
// uninitialised state and crashes. Smoke verified that the
// direct toolbox call segfaults MAME without prior init.
// - Use the IIgs vertical-blank counter at $00/E1/006B (24-bit
// address, needs long-pointer access via inline asm — the C
// pointer type is 16-bit on this target, so a literal 0xE1006B
// silently truncates to $006B in zero page).
//
// We leave both as stubs until the runtime has a Tool-Locator-
// init crt0 path or proper 24-bit far-pointer support.
typedef long time_t;
typedef long time_t;
typedef unsigned long clock_t;
time_t time(time_t *t) {
@ -559,7 +639,14 @@ FILE *stdout = &__stdout_obj;
FILE *stderr = &__stderr_obj;
int fputc(int c, FILE *stream) { (void)stream; return putchar(c); }
int fputs(const char *s, FILE *stream) { (void)stream; return puts(s); }
// fputs writes the string WITHOUT appending a newline (puts does append).
// Forwarding to puts() was a real bug — `fputs("hi", stdout)` was
// printing "hi\n" instead of "hi".
int fputs(const char *s, FILE *stream) {
(void)stream;
while (*s) { putchar(*s); s++; }
return 0;
}
int fflush(FILE *stream) { (void)stream; return 0; }
int fclose(FILE *stream) { (void)stream; return 0; }
@ -572,6 +659,11 @@ int fprintf(FILE *stream, const char *fmt, ...) {
return r;
}
int vfprintf(FILE *stream, const char *fmt, va_list ap) {
(void)stream;
return vprintf(fmt, ap);
}
// ---- assert ----
//
// __assert_fail is what most assert() macros call. Print a message
@ -589,9 +681,7 @@ void abort(void) {
exit(127);
}
// ---- atexit (stub — single slot) ----
typedef void (*AtexitFn)(void);
static AtexitFn __atexitFn = (AtexitFn)0;
// ---- atexit (single slot; storage + exit() invocation above) ----
int atexit(AtexitFn fn) {
if (__atexitFn) return -1;
__atexitFn = fn;
@ -618,7 +708,20 @@ size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream) {
}
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) {
(void)ptr; (void)size; (void)nmemb; (void)stream;
// For stdout/stderr, route through putchar so programs that use
// fwrite for binary output ("write %d bytes to stdout") actually
// produce output instead of silently dropping it. For other
// streams (real file handles), still a stub returning 0.
if (stream == stdout || stream == stderr) {
// size * nmemb can overflow size_t (16-bit on this target);
// bail rather than silently truncate the byte count.
if (size != 0 && nmemb > (size_t)0xFFFF / size) return 0;
const u8 *p = (const u8 *)ptr;
size_t total = size * nmemb;
for (size_t i = 0; i < total; i++) putchar(p[i]);
return nmemb;
}
(void)ptr; (void)size; (void)nmemb;
return 0;
}
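
calloc and fwrite above share the same guard against `nmemb * size` overflowing a 16-bit `size_t`. The sketch below isolates that check for a host build, with `uint16_t` standing in for the target's `size_t` (`size16` and `mulFits16` are hypothetical names):

```c
#include <stdint.h>

typedef uint16_t size16;  /* stand-in for the target's 16-bit size_t */

/* Store nmemb*size in *total and return 1, or return 0 (leaving
 * *total untouched) if the product does not fit in 16 bits. */
static int mulFits16(size16 nmemb, size16 size, size16 *total) {
    if (size != 0 && nmemb > (size16)(0xFFFFu / size)) return 0;
    *total = (size16)((uint32_t)nmemb * size);
    return 1;
}
```

The division-based test avoids computing the product until it is known to fit, so no wrapped value is ever produced.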

View file

@ -179,8 +179,7 @@ __divhi3:
jsr __divmod_setup
jsr __udivmod_core
; Quotient is in $ea. Negate if bit 1 of $ee is set.
lda 0xea
pha
pei 0xea
lda 0xee
and #0x2
beq .Ldiv_pos
@ -199,8 +198,7 @@ __modhi3:
jsr __udivmod_core
; Remainder is in $ec. Negate if bit 0 of $ee is set (dividend
; was negative).
lda 0xec
pha
pei 0xec
lda 0xee
and #0x1
beq .Lmod_pos
@ -1131,10 +1129,9 @@ __negdi_b:
; setjmp returned 0 with all-callee-savable regs already preserved by
; setjmp's caller.
; --------------------------------------------------------------------
; NOTE: llvm-mc misencodes `sta (dp), y` and `lda (dp), y` as the
; absolute-,Y opcodes (0x99 / 0xb9) instead of the DP-indirect-Y
; opcodes (0x91 / 0xb1). Use raw `.byte` for those. Y is supplied
; via LDY before each indirect access.
; setjmp / longjmp use the (dp),y indirect mode (opcodes 0x91/0xb1)
; to write through the jmp_buf pointer in $E0. Y is set explicitly
; before each indirect access; M=0 except where noted.
.globl setjmp
setjmp:
sta 0xe0 ; jmp_buf addr -> DP scratch

View file

@ -142,11 +142,13 @@ float fmodf(float x, float y) {
double sqrt(double x) {
uint64_t b;
__builtin_memcpy(&b, &x, sizeof(b));
if (b & ((uint64_t)1 << 63)) {
return 0.0 / 0.0; // NaN for negatives (well, -0.0 returns 0)
// Check zero first (positive or negative) — IEEE-754 says
// sqrt(+0)=+0 and sqrt(-0)=-0; both lower 63 bits are zero.
if ((b & ~((uint64_t)1 << 63)) == 0) {
return x;
}
if (b == 0) {
return 0.0;
if (b & ((uint64_t)1 << 63)) {
return 0.0 / 0.0; // NaN for negatives
}
// Initial guess: halve the exponent. IEEE-754 trick gives a
// surprisingly good starting point — within 2x of the true value.
@ -188,12 +190,16 @@ double pow(double x, double y) {
return 0.0; // non-integer, non-0.5 y not supported yet
}
// y is a whole number; convert via __fixdfsi. Range -32768..32767
// covers any practical exponent.
int n = (int)yi;
// covers any practical exponent. Use unsigned for the magnitude
// to avoid signed-overflow UB on INT_MIN.
int sn = (int)yi;
int neg = 0;
if (n < 0) {
unsigned int n;
if (sn < 0) {
neg = 1;
n = -n;
n = 0u - (unsigned int)sn;
} else {
n = (unsigned int)sn;
}
double r = 1.0;
double base = x;
@ -268,6 +274,15 @@ double cos(double x) {
}
// tan(x) = sin(x) / cos(x). No special handling for poles at pi/2
// + n*pi (where cos(x) == 0): the soft-double divide returns +/-Inf,
// which is the IEEE-754-correct answer. Accuracy follows sin/cos
// (~1e-6) but degrades fast as |x| approaches a pole.
double tan(double x) {
return sin(x) / cos(x);
}
float sinf(float x) {
return (float)sin((double)x);
}
@ -278,6 +293,11 @@ float cosf(float x) {
}
float tanf(float x) {
return (float)tan((double)x);
}
// exp via 2^k * e^r where x = k*ln2 + r, |r| < ln2/2. Then Taylor
// series for e^r converges in ~10 terms. The multiplication by 2^k
// uses the IEEE-754 layout (add k to the exponent field).
@ -321,8 +341,13 @@ float expf(float x) {
double log(double x) {
uint64_t b;
__builtin_memcpy(&b, &x, sizeof(b));
if (b == 0 || (b & ((uint64_t)1 << 63))) {
return 0.0 / 0.0; // log(0) = -inf, log(neg) = NaN; return NaN
// log(±0) = -Infinity (pole error). Mask off the sign bit when
// testing for zero so -0.0 lands here instead of the negative path.
if ((b & ~((uint64_t)1 << 63)) == 0) {
return -1.0 / 0.0;
}
if (b & ((uint64_t)1 << 63)) {
return 0.0 / 0.0; // log(negative) = NaN (domain error)
}
int e = (int)((b >> 52) & 0x7FF) - 1023;
// Force the exponent field to 1023 so m lands in [1, 2).
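
Both sqrt and log above now test for zero with the sign bit masked off before testing the sign bit itself. The ordering matters, as this host-side sketch tries to show; it assumes IEEE-754 binary64 and uses a hypothetical classifier `logInputClass` rather than the runtime's functions:

```c
#include <stdint.h>
#include <string.h>

/* Classify an input the way log() does above:
 * 0 = ordinary positive input, 1 = +/-0 (result is -Inf),
 * 2 = negative input (result is NaN). */
static int logInputClass(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    const uint64_t signBit = (uint64_t)1 << 63;
    if ((b & ~signBit) == 0) return 1;   /* zero of either sign */
    if (b & signBit) return 2;           /* strictly negative */
    return 0;
}
```

If the sign test ran first, -0.0 would be treated as a negative number and yield NaN instead of the pole result.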

View file

@ -2,11 +2,11 @@
// and the byte-swap inner loop don't perturb other libc code.
//
// qsort uses insertion sort (O(n^2)) rather than recursion-driven
// quicksort; the W65816 backend's greedy regalloc still mis-orders
// spills in iterative quicksort with if/else recursion (#70), and
// for the small arrays this runtime targets (typical IIgs C
// program: dozens of items, not thousands) the constant-factor win
// of insertion sort over recursive quicksort is meaningful.
// quicksort. Originally chosen because the W65816 greedy regalloc
// mis-ordered spills in iterative quicksort (#70 — since fixed by a
// W65816StackSlotCleanup safety check), but kept because the typical
// IIgs C program sorts dozens of items, not thousands, and the
// constant-factor win of insertion sort dominates at that scale.
typedef unsigned int size_t;
typedef int (*CmpFnT)(const void *, const void *);

View file

@ -92,9 +92,10 @@ static void emitUDec(unsigned int n) {
__attribute__((noinline))
static void emitDec(int n) {
// -n on INT_MIN is signed-overflow UB; negate as unsigned.
if (n < 0) {
emit('-');
emitUDec((unsigned int)(-n));
emitUDec(0u - (unsigned int)n);
} else {
emitUDec((unsigned int)n);
}
@ -123,9 +124,10 @@ static void emitULong(unsigned long n) {
__attribute__((noinline))
static void emitSignedLong(long n) {
// See emitDec: avoid the signed-overflow UB on LONG_MIN.
if (n < 0) {
emit('-');
emitULong((unsigned long)(-n));
emitULong(0ul - (unsigned long)n);
} else {
emitULong((unsigned long)n);
}
@ -135,12 +137,16 @@ static void emitSignedLong(long n) {
__attribute__((noinline))
static void emitHex(unsigned int n, int width) {
static const char digits[] = "0123456789abcdef";
char buf[5];
// unsigned int is 16-bit on this target -> at most 4 hex digits.
// Cap width to that; without it `snprintf("%08x", ...)` blew past
// the buf[] tail and corrupted the stack.
char buf[4];
if (width > 4) width = 4;
int i = 0;
if (n == 0) {
buf[i++] = '0';
}
while (n > 0) {
while (n > 0 && i < 4) {
buf[i++] = digits[n & 0xF];
n >>= 4;
}
@ -278,6 +284,11 @@ static int format(const char *fmt, va_list ap) {
if (gCur < gEnd) {
*gCur = '\0';
} else if (gEnd > (char *)0) {
// Truncated, but n > 0: overwrite the last byte with NUL so
// the result is a valid C string. snprintf with n=0 sets
// gEnd = NULL up front so this branch correctly skips —
// previously it wrote `gEnd[-1]` to `buf[-1]`, clobbering
// memory before the buffer.
gEnd[-1] = '\0';
}
return (int)gTotal;
@ -286,7 +297,10 @@ static int format(const char *fmt, va_list ap) {
int snprintf(char *buf, size_t n, const char *fmt, ...) {
gCur = buf;
gEnd = buf + (n ? n : 0);
// n == 0 must NOT touch the buffer (C99 7.19.6.5). Setting
// gEnd = NULL here makes both `gCur < gEnd` and `gEnd > 0`
// false, so no NUL terminator gets written.
gEnd = n ? buf + n : (char *)0;
gTotal = 0;
va_list ap;
va_start(ap, fmt);
@ -315,7 +329,7 @@ int sprintf(char *buf, const char *fmt, ...) {
int vsnprintf(char *buf, size_t n, const char *fmt, va_list ap) {
gCur = buf;
gEnd = buf + (n ? n : 0);
gEnd = n ? buf + n : (char *)0;
gTotal = 0;
return format(fmt, ap);
}
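
The n == 0 behaviour that the snprintf fix above restores can be checked against any conforming C library. The sketch below runs on a host and uses the host's own `snprintf`, not the runtime's; the function name `snprintfZeroIsInert` and the 0xCC sentinel are illustrative choices:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if snprintf with n == 0 left every sentinel byte intact
 * and reported the length `text` would have needed. */
static int snprintfZeroIsInert(const char *text) {
    char guard[8];
    memset(guard, 0xCC, sizeof guard);
    int r = snprintf(&guard[2], 0, "%s", text);
    for (size_t i = 0; i < sizeof guard; i++) {
        if ((unsigned char)guard[i] != 0xCC) return 0;  /* buffer touched */
    }
    return r == (int)strlen(text);
}
```

Checking the bytes on both sides of the destination pointer also covers the earlier failure mode, where the terminator landed one byte before the buffer.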

View file

@ -43,11 +43,12 @@ __attribute__((noinline)) static u64 dpack(u64 sign, s16 exp, u64 mant) {
// Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit.
// Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN.
// Inlinable on purpose — out_sign/out_exp/out_mant point at caller
// stack locals; if dclass were noinline the writes would lower to
// `sta (d,s),y` which uses DBR for the bank, silently corrupting
// data when the caller has switched DBR. Caught by smoke's
// dmul-after-bank-switch test (#dmul-bank-switch).
// noinline reduces register pressure in __muldf3/__divdf3/__adddf3
// — without it, greedy regalloc runs out of registers in __muldf3
// at -O2. Now safe because pointer-arg writes lower to STBptr/STAptr
// which use [$E0],Y indirect-long with the bank byte forced to 0
// (DBR-independent). See `feedback_dbr_ptr_deref_spill.md`.
__attribute__((noinline))
static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) {
*out_sign = x & DSIGN_BIT;
s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF);

View file

@ -1,91 +0,0 @@
; Stub double-precision soft-float — every routine returns 0.
;
; The C-based softDouble.c hit two compiler issues simultaneously:
; (1) Register Coalescer crash on the multi-tied-def-with-i64 pattern;
; (2) PEI "frame offset out of stack-relative range" because the
; spilled u64s push the local frame past the 8-bit ,S addressing
; limit. Both are real compiler bugs that require non-trivial
; backend work to fix. Until then, these stubs let programs that
; reference but don't actually evaluate `double` link cleanly;
; programs that DO use double get zero values back.
;
; Symbol set matches what clang's i64-routed double libcalls expect.
; ABI: i64 result returned via A:X:Y:DP[$F0] (matches LowerReturn).
.text
; Helper macro idiom: stub returning 64-bit zero.
.macro RET_ZERO64
lda #0
tax
tay
sta 0xf0
rtl
.endm
.globl __adddf3
__adddf3: RET_ZERO64
.globl __subdf3
__subdf3: RET_ZERO64
.globl __muldf3
__muldf3: RET_ZERO64
.globl __divdf3
__divdf3: RET_ZERO64
.globl __negdf2
__negdf2: RET_ZERO64
.globl __cmpdf2
__cmpdf2: lda #0
rtl
.globl __eqdf2
__eqdf2: lda #0
rtl
.globl __nedf2
__nedf2: lda #0
rtl
.globl __ltdf2
__ltdf2: lda #0
rtl
.globl __gtdf2
__gtdf2: lda #0
rtl
.globl __ledf2
__ledf2: lda #0
rtl
.globl __gedf2
__gedf2: lda #0
rtl
.globl __floatsidf
__floatsidf: RET_ZERO64
.globl __floatunsidf
__floatunsidf: RET_ZERO64
.globl __fixdfsi
__fixdfsi: lda #0
tax
rtl
.globl __fixunsdfsi
__fixunsdfsi: lda #0
tax
rtl
.globl __extendsfdf2
__extendsfdf2: RET_ZERO64
.globl __truncdfsf2
__truncdfsf2: lda #0
tax
rtl

View file

@ -40,7 +40,8 @@ unsigned long strtoul(const char *nptr, char **endptr, int base) {
s++;
}
if (endptr) *endptr = (char *)(saw_digit ? s : nptr);
return neg ? (unsigned long)-(long)n : n;
// Negate in unsigned arithmetic to avoid signed-overflow UB.
return neg ? (0ul - n) : n;
}
long strtol(const char *nptr, char **endptr, int base) {
@ -55,5 +56,7 @@ long strtol(const char *nptr, char **endptr, int base) {
return 0;
}
if (endptr) *endptr = ep;
return neg ? -(long)n : (long)n;
// Negate as unsigned to avoid signed-overflow UB on LONG_MIN
// ("-2147483648" — the magnitude doesn't fit in long).
return neg ? (long)(0ul - n) : (long)n;
}

View file

@ -63,7 +63,17 @@ emu.register_frame_done(function()
-- apple2gs CPU model doesn't honor a Lua-side PB!=0 set.
-- The user's code can switch DBR to bank 2+ for safe data
-- writes (bank 2 is clear of IIgs ROM IRQ scribbling).
for i = 1, #data do mem:write_u8(0x001000 + i - 1, data:byte(i)) end
-- Skip writes that would land in the IIgs IO window
-- (\$C000-\$CFFF). link816 may pad this range with zeros
-- when rodata auto-skips it, and writing zeros into soft
-- switches could clobber IO state (e.g., the LC1 RAM enable
-- that crt0 sets up).
for i = 1, #data do
local addr = 0x001000 + i - 1
if not (addr >= 0x00C000 and addr < 0x00D000) then
mem:write_u8(addr, data:byte(i))
end
end
loaded = true
cpu.state["PC"].value = 0x1000
cpu.state["PB"].value = 0x00
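
The loader change above skips destination addresses in the $C000-$CFFF I/O window. The same copy-with-hole logic, rendered as a host-side C sketch rather than MAME Lua (the function `loadSkippingIo` and its bounds check are assumptions added for the example):

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `len` bytes of `data` into `mem` starting at `base`, skipping
 * any destination address in the $C000-$CFFF I/O window. Stops early
 * if an address falls outside `mem`. Returns the bytes written. */
static size_t loadSkippingIo(uint8_t *mem, size_t memSize, uint32_t base,
                             const uint8_t *data, size_t len) {
    size_t written = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t addr = base + (uint32_t)i;
        if (addr >= 0xC000u && addr < 0xD000u) continue;  /* soft switches */
        if (addr >= memSize) break;
        mem[addr] = data[i];
        written++;
    }
    return written;
}
```

Note that the source index keeps advancing across the hole, so bytes after the window still land at their linked addresses.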

View file

@ -294,11 +294,14 @@ EOF
fi
fi
# 11a. SETCC via clang: a > b returns 0/1. Exercises the multi-branch
# CC path (BEQ + BPL diamond, since SETGT can't be a single Bxx).
# 11a. SETCC via clang: a > b returns 0/1. Signed compares now go
# through the EOR-with-sign-bit transform: each operand XORs $8000
# to convert signed-int ordering to unsigned-int ordering, then
# uses BCC/BCS — avoids BMI/BPL's V-flag-overflow bug for values
# near INT16_MIN/MAX.
CLANG="$BUILD_DIR/bin/clang"
if [ -x "$CLANG" ]; then
log "check: clang compiles a > b via multi-branch SETCC"
log "check: clang compiles a > b via EOR-sign-bit + unsigned compare"
cFile="$(mktemp --suffix=.c)"
sCmpFile="$(mktemp --suffix=.s)"
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$sCmpFile"' EXIT
@ -306,18 +309,20 @@ if [ -x "$CLANG" ]; then
int gt(int a, int b) { return a > b; }
EOF
"$CLANG" --target=w65816 -O2 -S "$cFile" -o "$sCmpFile"
# Expect a stack-relative CMP (offset depends on current spill
# behaviour — fast regalloc adds 2 PHA prologue bytes vs greedy
# which had no frame; either is acceptable as long as we cmp
# against b through a stack-relative slot), then BEQ + BPL forming
# the multi-branch diamond.
for expect in "lda #0x1" "beq" "bpl" "lda #0x0"; do
# Expect: EOR #$8000 on each operand, CMP, then BCC/BCS on the
# carry from the unsigned compare. The 0/1 result is materialised
# via lda #0/lda #1 in the diamond.
for expect in "eor #0x8000" "lda #0x1" "lda #0x0"; do
if ! grep -qF "$expect" "$sCmpFile"; then
warn "setcc gt test missing: $expect"
cat "$sCmpFile" >&2
die "setcc gt test failed"
fi
done
if ! grep -qE '^\s*(bcc|bcs)\b' "$sCmpFile"; then
cat "$sCmpFile" >&2
die "setcc gt test missing: bcc/bcs (carry-based unsigned branch)"
fi
if ! grep -qE '^\s*cmp\s+0x[0-9a-f]+,\s*s\s*$' "$sCmpFile"; then
cat "$sCmpFile" >&2
die "setcc gt test missing: cmp <off>,s (stack-relative compare to arg b)"
@ -411,24 +416,38 @@ EOF
fi
fi
# 11f. Pointer deref: *p loads via stack-relative-indirect-Y.
# 11f. Pointer deref: *p uses [dp],Y indirect-long (`LDA [$E0],Y`)
# which is DBR-independent. The previous lowering used (slot,S),Y
# indirect which silently wrote to DBR's bank — a real miscompile
# when the caller had switched DBR via `pha;plb`. The new lowering
# stages the pointer in DP scratch $E0..$E2 with the bank byte
# forced to 0, then loads/stores via [dp],Y — always bank 0.
# Const-int pointers (MMIO style) keep DBR-relative addressing via
# STAabs (separate TableGen pattern).
if [ -x "$CLANG" ]; then
log "check: clang compiles *p via LDA (slot,s),y"
log "check: clang compiles *p via [dp],Y indirect-long (DBR-independent)"
cFile6="$(mktemp --suffix=.c)"
sPtrFile="$(mktemp --suffix=.s)"
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$sCmpFile" "$cFile2" "$sSelFile" "$cFile3" "$sChainFile" "$cFile4" "$sMulFile" "$cFile5" "$sShfFile" "$cFile6" "$sPtrFile"' EXIT
oPtrFile="$(mktemp --suffix=.o)"
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$sCmpFile" "$cFile2" "$sSelFile" "$cFile3" "$sChainFile" "$cFile4" "$sMulFile" "$cFile5" "$sShfFile" "$cFile6" "$sPtrFile" "$oPtrFile"' EXIT
cat > "$cFile6" <<'EOF'
int load_ptr(const int *p) { return *p; }
void store_ptr(int *p, int v) { *p = v; }
EOF
"$CLANG" --target=w65816 -O2 -S "$cFile6" -o "$sPtrFile"
for expect in "ldy #0x0" "lda (0x" "sta (0x"; do
if ! grep -qF "$expect" "$sPtrFile"; then
warn "ptr-deref test missing: $expect"
cat "$sPtrFile" >&2
die "ptr-deref test failed"
fi
done
"$CLANG" --target=w65816 -O2 -c "$cFile6" -o "$oPtrFile"
# LDA [dp],Y = 0xB7; STA [dp],Y = 0x97 (followed by the dp byte 0xE0).
if ! "$OBJDUMP" --triple=w65816 -d "$oPtrFile" 2>/dev/null \
| grep -qE '\b97 e0\b'; then
warn "ptr-deref test: STA [dp],Y (0x97 0xE0) missing in store_ptr"
"$OBJDUMP" --triple=w65816 -d "$oPtrFile" >&2
die "ptr-deref test failed (STA [dp],Y expected)"
fi
if ! "$OBJDUMP" --triple=w65816 -d "$oPtrFile" 2>/dev/null \
| grep -qE '\bb7 e0\b'; then
warn "ptr-deref test: LDA [dp],Y (0xB7 0xE0) missing in load_ptr"
"$OBJDUMP" --triple=w65816 -d "$oPtrFile" >&2
die "ptr-deref test failed (LDA [dp],Y expected)"
fi
fi
# 11g. i8 store via pointer: *p = v wraps the STA in SEP/REP so only
@ -444,10 +463,11 @@ void storeb(unsigned char *p, unsigned char v) { *p = v; }
unsigned char incb(unsigned char *p) { return ++*p; }
EOF
"$CLANG" --target=w65816 -O2 -S "$cFile7" -o "$sBptrFile"
# storeb body should contain SEP #$20 ... STA (slot,s),y ... REP #$20.
# storeb body should contain SEP #$20 ... STA [$E0],Y ... REP #$20.
# The STA uses [dp],Y indirect-long addressing (DBR-independent).
if ! grep -qF "sep #0x20" "$sBptrFile" \
|| ! grep -qF "rep #0x20" "$sBptrFile" \
|| ! grep -qE 'sta \(0x[0-9a-f]+, s\), y' "$sBptrFile"; then
|| ! grep -qE 'sta \[0xe0\b' "$sBptrFile"; then
cat "$sBptrFile" >&2
die "i8 ptr-store test missing SEP/STA/REP sequence"
fi
@ -1125,8 +1145,12 @@ EOF
"$CLANG" --target=w65816 -O2 -c "$cLinkFile" -o "$oLinkFile"
"$BUILD_DIR/bin/llvm-mc" -arch=w65816 -filetype=obj \
"$PROJECT_ROOT/runtime/src/libgcc.s" -o "$oLibgccFile"
# No main in this test (it's just a library object); use
# --no-gc-sections so the linker keeps `mul` and the libgcc
# __mulhi3 it references. With gc-sections (the default),
# there's no live root and everything would drop.
"$PROJECT_ROOT/tools/link816" -o "$binLinkFile" \
--text-base 0x8000 --map "$mapLinkFile" \
--text-base 0x8000 --map "$mapLinkFile" --no-gc-sections \
"$oLinkFile" "$oLibgccFile" 2>/dev/null
if [ ! -s "$binLinkFile" ]; then
die "link816 produced empty/missing binary"
@ -1176,8 +1200,10 @@ EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cFltFile" -o "$oFltFile"
"$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfFile"
# No main here either (test compiles a .o-only "soft-float lib" link).
# --no-gc-sections so all soft-float symbols stay.
"$PROJECT_ROOT/tools/link816" -o "$binFltFile" \
--text-base 0x8000 --map "$mapFltFile" \
--text-base 0x8000 --map "$mapFltFile" --no-gc-sections \
"$oFltFile" "$oSfFile" "$oLibgccFile" 2>/dev/null
if [ ! -s "$binFltFile" ]; then
die "soft-float runtime failed to link"
@ -1214,10 +1240,10 @@ int toInt(double x) { return (int)x; }
double fromInt(int n) { return (double)n; }
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblFile" -o "$oDblFile"
"$CLANG" --target=w65816 -O1 -ffunction-sections \
"$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdFile"
"$PROJECT_ROOT/tools/link816" -o "$binDblFile" \
--text-base 0x8000 --map "$mapDblFile" \
--text-base 0x8000 --map "$mapDblFile" --no-gc-sections \
"$oDblFile" "$oSdFile" "$oLibgccFile" 2>/dev/null
if [ ! -s "$binDblFile" ]; then
die "soft-double runtime failed to link"
@ -1411,7 +1437,7 @@ int main(void) {
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblMame" -o "$oDblMame"
"$CLANG" --target=w65816 -O1 -ffunction-sections \
"$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdMame"
"$PROJECT_ROOT/tools/link816" -o "$binDblMame" \
--text-base 0x1000 \
@ -1550,7 +1576,7 @@ EOF
-c "$PROJECT_ROOT/runtime/src/math.c" -o "$oMathF"
"$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfF"
"$CLANG" --target=w65816 -O1 -ffunction-sections \
"$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdF"
oCrt0F="$(mktemp --suffix=.o)"
"$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-mc" -arch=w65816 \
@ -2294,6 +2320,15 @@ int main(void) {
if (r == 4 && eq(buf, "1.50")) ok |= 0x10;
r = sprintf(buf, "[%c%c%%]", 'A', 'B');
if (r == 5 && eq(buf, "[AB%]")) ok |= 0x20;
/* C99: snprintf(buf, 0, ...) must NOT touch buf and must return
the would-be-written length. Sentinel-fill the buffer and
verify the byte just BEFORE buf survives — earlier bug wrote
a NUL at gEnd[-1] = buf[-1] when n=0. */
char guard[8];
for (int i = 0; i < 8; i++) guard[i] = (char)0xCC;
r = snprintf(&guard[2], 0, "x");
if (r == 1 && guard[1] == (char)0xCC && guard[2] == (char)0xCC)
ok |= 0x40;
switchToBank2();
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
while (1) {}
@ -2305,8 +2340,8 @@ EOF
"$oCrt0F" "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oSfF" "$oSdF" \
"$oLibgccFile" "$oSpFile" >/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binSpFile" --check \
0x025000=003f >/dev/null 2>&1; then
die "MAME: sprintf/snprintf format-coverage bitmap != 0x3f"
0x025000=007f >/dev/null 2>&1; then
die "MAME: sprintf/snprintf format-coverage bitmap != 0x7f (snprintf n=0 buffer-write regression?)"
fi
rm -f "$cSpFile" "$oSpFile" "$binSpFile"
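The `snprintf` n=0 check in this hunk relies on the C99 rule that a size of zero writes nothing and still returns the length that would have been written. Below is a minimal host-side sketch of the same guard-byte technique, assuming a conforming host `std::snprintf`. The helper name `zeroSizeLeavesGuardIntact` is hypothetical and is not part of the runtime.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper mirroring the smoke test's guard-byte idea:
// fill a buffer with a sentinel, call snprintf with size 0 on an
// interior pointer, then check that no byte changed and that the
// return value is the would-be length.
bool zeroSizeLeavesGuardIntact() {
    unsigned char guard[8];
    std::memset(guard, 0xCC, sizeof(guard));
    int r = std::snprintf(reinterpret_cast<char *>(&guard[2]), 0, "x");
    for (unsigned char b : guard) {
        if (b != 0xCC) return false;  // something wrote into the buffer
    }
    return r == 1;                    // "x" needs 1 char, excluding the NUL
}
```

Checking every guard byte, not just the ones on either side of the target pointer, also catches a stray write further into the buffer.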
@ -2454,7 +2489,7 @@ EOF
fi
rm -f "$cRdFile" "$oRdFile" "$binRdFile"
log "check: MAME runs atan/asin/acos/sinh/cosh/tanh (#85)"
log "check: MAME runs atan/asin/acos/sinh/cosh/tanh + tan (#85)"
cTr2File="$(mktemp --suffix=.c)"
oTr2File="$(mktemp --suffix=.o)"
binTr2File="$(mktemp --suffix=.bin)"
@ -2465,6 +2500,7 @@ extern double acos(double);
extern double sinh(double);
extern double cosh(double);
extern double tanh(double);
extern double tan(double);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
@ -2481,6 +2517,7 @@ int main(void) {
if (dApprox(tanh(0.0), 0.0, 0.001)) ok |= 0x08;
if (dApprox(asin(0.5), 0.5235987755, 0.001)) ok |= 0x10;
if (dApprox(acos(1.0), 0.0, 0.001)) ok |= 0x20;
if (dApprox(tan(0.7853981633), 1.0, 0.001)) ok |= 0x40;
switchToBank2();
*(volatile unsigned short *)0x5000 = ok;
while (1) {}
@ -2493,8 +2530,8 @@ EOF
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oLibgccFile" "$oTr2File" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binTr2File" --check \
0x025000=003f >/dev/null 2>&1; then
die "MAME: extended math (atan/asin/acos/sinh/cosh/tanh) bitmap != 0x3f"
0x025000=007f >/dev/null 2>&1; then
die "MAME: extended math (atan/asin/acos/sinh/cosh/tanh/tan) bitmap != 0x7f"
fi
rm -f "$cTr2File" "$oTr2File" "$binTr2File"
@ -2584,6 +2621,118 @@ EOF
fi
rm -f "$cHtFile" "$oHtFile" "$binHtFile"
# Signed compare of values near INT16_MIN/MAX: BMI/BPL alone
# are not V-flag-aware, so the W65816 backend now applies an
# EOR-with-sign-bit transform (a < b signed iff a^$8000 <
# b^$8000 unsigned). Verify INT16_MIN < INT16_MAX, INT16_MIN
# < 1, INT16_MIN < 0, etc. all return the right boolean;
# the pre-transform code returned false for INT16_MIN < 1
# because (-32768 - 1) overflowed to +32767, leaving N=0.
log "check: MAME signed compare near INT16_MIN works (V-flag fix)"
cSignedFile="$(mktemp --suffix=.c)"
oSignedFile="$(mktemp --suffix=.o)"
binSignedFile="$(mktemp --suffix=.bin)"
cat > "$cSignedFile" <<'EOF'
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
__attribute__((noinline)) static int slt(int a, int b) { return a < b; }
__attribute__((noinline)) static int sgt(int a, int b) { return a > b; }
__attribute__((noinline)) static int sle(int a, int b) { return a <= b; }
__attribute__((noinline)) static int sge(int a, int b) { return a >= b; }
int main(void) {
unsigned short ok = 0;
// INT16_MIN < 1: true. Pre-fix bug returned false.
if (slt(-32768, 1)) ok |= 0x01;
// INT16_MIN < INT16_MAX: true.
if (slt(-32768, 32767)) ok |= 0x02;
// INT16_MAX > INT16_MIN: true.
if (sgt(32767, -32768)) ok |= 0x04;
// INT16_MIN <= -32768: true.
if (sle(-32768, -32768)) ok |= 0x08;
// INT16_MAX >= 0: true.
if (sge(32767, 0)) ok |= 0x10;
// -1 < 0: true.
if (slt(-1, 0)) ok |= 0x20;
// 0 < -1: false (negation case).
if (!slt(0, -1)) ok |= 0x40;
// INT16_MIN < INT16_MIN: false.
if (!slt(-32768, -32768)) ok |= 0x80;
switchToBank2();
*(volatile unsigned short *)0x5000 = ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cSignedFile" -o "$oSignedFile"
"$PROJECT_ROOT/tools/link816" -o "$binSignedFile" --text-base 0x1000 \
"$oCrt0F" "$oLibgccFile" "$oSignedFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binSignedFile" --check \
0x025000=00ff >/dev/null 2>&1; then
die "MAME: signed compare near INT_MIN failed (V-flag bug regression?)"
fi
rm -f "$cSignedFile" "$oSignedFile" "$binSignedFile"
# Regression: free() coalescing must remove blocks absorbed
# into a lower-address neighbour from the free list. Old code
# extended the lower block but left the absorbed entry in
# the list, creating an overlapping free entry. A subsequent
# malloc could hand out the same memory to two callers.
log "check: MAME runs malloc/free coalesce — three blocks freed in alloc order (#100)"
cMcFile="$(mktemp --suffix=.c)"
oMcFile="$(mktemp --suffix=.o)"
binMcFile="$(mktemp --suffix=.bin)"
cat > "$cMcFile" <<'EOF'
extern void *malloc(unsigned int);
extern void free(void *);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
int main(void) {
// Allocate three same-sized adjacent blocks, then free in alloc
// order so b's coalesce sees a-prev-to-b (the bug path).
char *a = (char *)malloc(20);
char *b = (char *)malloc(20);
char *c = (char *)malloc(20);
if (!a || !b || !c) goto fail;
free(a); // list = [a]
free(b); // list = [b, a]; bEnd==a -> coalesce a into b
free(c); // list = [c, b']; bEnd==b' -> coalesce b' into c
// After all coalescing: one ~66-byte block. Allocate it back and
// write the full extent — if any of a/b/c were left in the list
// overlapping, a follow-on malloc would hand out a second pointer
// into the same memory and the writes would interfere.
char *big = (char *)malloc(60);
if (!big) goto fail;
for (int i = 0; i < 60; i++) big[i] = (char)(i + 1);
char *more = (char *)malloc(8);
if (!more) goto fail;
for (int i = 0; i < 8; i++) more[i] = (char)0xAA;
// Verify big is intact.
unsigned short ok = 1;
for (int i = 0; i < 60; i++) if (big[i] != (char)(i + 1)) ok = 0;
switchToBank2();
*(volatile unsigned short *)0x5000 = ok;
while (1) {}
fail:
switchToBank2();
*(volatile unsigned short *)0x5000 = 0xDEAD;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cMcFile" -o "$oMcFile"
"$PROJECT_ROOT/tools/link816" -o "$binMcFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oLibgccFile" "$oMcFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binMcFile" --check \
0x025000=0001 >/dev/null 2>&1; then
die "MAME: malloc/free coalesce regressed — overlapping free-list entries"
fi
rm -f "$cMcFile" "$oMcFile" "$binMcFile"
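The signed-compare check in this hunk depends on the sign-bit flip described in its comment: XOR both operands with `$8000`, then compare unsigned. The following host-side sketch shows that transform next to the naive "sign of the difference" test it replaces. Both function names are hypothetical illustrations and say nothing about how the backend actually emits the sequence.

```cpp
#include <cassert>
#include <cstdint>

// Signed 16-bit "a < b" using only an unsigned compare: flipping the
// sign bit maps INT16_MIN..INT16_MAX onto 0..65535 in the same order.
bool signedLessViaSignFlip(int16_t a, int16_t b) {
    uint16_t ua = static_cast<uint16_t>(static_cast<uint16_t>(a) ^ 0x8000u);
    uint16_t ub = static_cast<uint16_t>(static_cast<uint16_t>(b) ^ 0x8000u);
    return ua < ub;
}

// Naive form: subtract in 16 bits and look only at bit 15 of the
// result (what testing N alone amounts to). The subtraction wraps
// for inputs such as INT16_MIN - 1, so the answer can be wrong.
bool signedLessViaSignOfDifference(int16_t a, int16_t b) {
    uint16_t diff = static_cast<uint16_t>(static_cast<uint16_t>(a) -
                                          static_cast<uint16_t>(b));
    return (diff & 0x8000u) != 0;
}
```

For `(-32768, 1)` the naive form computes `$8000 - $0001 = $7FFF`, whose bit 15 is clear, so it reports "not less"; the flipped form compares `$0000` against `$8001` and reports "less".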
log "check: MAME runs strtok 'a,b,,c' continuation (#84 fixed)"
cTkFile="$(mktemp --suffix=.c)"
oTkFile="$(mktemp --suffix=.o)"
@ -3267,6 +3416,191 @@ EOF
fi
rm -f "$cDmaFile" "$oDmaFile" "$binDmaFile"
# Real-world coverage: Conway's Game of Life blinker. Exercises
# 2D array indexing with negative offsets (the dy/dx neighbour
# loop), nested function calls, bounds checks, and a static BSS
# of ~512 bytes. Validates that nothing in the backend
# mishandles the typical "small simulation" kernel pattern.
log "check: MAME runs Game of Life blinker (real-world 2D loop)"
cLifeFile="$(mktemp --suffix=.c)"
oLifeFile="$(mktemp --suffix=.o)"
binLifeFile="$(mktemp --suffix=.bin)"
cat > "$cLifeFile" <<'EOF'
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
#define W 16
#define H 16
static unsigned char gridA[H][W];
static unsigned char gridB[H][W];
static int countNeighbors(unsigned char (*g)[W], int y, int x) {
int cnt = 0;
for (int dy = -1; dy <= 1; dy++) {
for (int dx = -1; dx <= 1; dx++) {
if (dx == 0 && dy == 0) continue;
int ny = y + dy;
int nx = x + dx;
if (ny < 0 || ny >= H || nx < 0 || nx >= W) continue;
cnt += g[ny][nx];
}
}
return cnt;
}
static void step(unsigned char (*src)[W], unsigned char (*dst)[W]) {
for (int y = 0; y < H; y++) {
for (int x = 0; x < W; x++) {
int n = countNeighbors(src, y, x);
unsigned char alive = src[y][x];
dst[y][x] = (alive ? (n == 2 || n == 3) : (n == 3)) ? 1 : 0;
}
}
}
int main(void) {
// Horizontal blinker. After 1 step → vertical at column 4, rows 4..6.
gridA[5][3] = 1;
gridA[5][4] = 1;
gridA[5][5] = 1;
step(gridA, gridB);
int ok = 0;
if (gridB[4][4] == 1) ok |= 1;
if (gridB[5][4] == 1) ok |= 2;
if (gridB[6][4] == 1) ok |= 4;
if (gridB[5][3] == 0) ok |= 8;
if (gridB[5][5] == 0) ok |= 0x10;
switchToBank2();
*(volatile unsigned short *)0x5000 = ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cLifeFile" -o "$oLifeFile"
"$PROJECT_ROOT/tools/link816" -o "$binLifeFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oLibgccFile" "$oLifeFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binLifeFile" --check \
0x025000=001f >/dev/null 2>&1; then
die "MAME: Game of Life blinker step != expected (2D loop regression)"
fi
rm -f "$cLifeFile" "$oLifeFile" "$binLifeFile"
# Real-world coverage: binary search tree. Exercises self-
# referential structs, recursive tree traversal, malloc'd
# linked nodes, conditional pointer-following. Catches a
# whole class of issues that linear-only smoke tests miss.
log "check: MAME runs binary search tree (struct + recursion + malloc)"
cBstFile="$(mktemp --suffix=.c)"
oBstFile="$(mktemp --suffix=.o)"
binBstFile="$(mktemp --suffix=.bin)"
cat > "$cBstFile" <<'EOF'
extern void *malloc(unsigned int n);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
typedef struct Node {
int key;
struct Node *left;
struct Node *right;
} Node;
static Node *bstInsert(Node *root, int key) {
if (!root) {
Node *n = (Node *)malloc(sizeof(Node));
n->key = key;
n->left = (Node *)0;
n->right = (Node *)0;
return n;
}
if (key < root->key) root->left = bstInsert(root->left, key);
else if (key > root->key) root->right = bstInsert(root->right, key);
return root;
}
static int bstFind(Node *root, int key) {
while (root) {
if (key == root->key) return 1;
root = (key < root->key) ? root->left : root->right;
}
return 0;
}
static int bstSum(Node *root) {
if (!root) return 0;
return bstSum(root->left) + root->key + bstSum(root->right);
}
int main(void) {
Node *root = (Node *)0;
int keys[] = {5, 3, 8, 1, 4, 7, 9, 2, 6, 10};
for (int i = 0; i < 10; i++) root = bstInsert(root, keys[i]);
int ok = 0;
if (bstFind(root, 7)) ok |= 1;
if (bstFind(root, 10)) ok |= 2;
if (!bstFind(root, 11)) ok |= 4;
if (!bstFind(root, 0)) ok |= 8;
if (bstSum(root) == 55) ok |= 0x10;
switchToBank2();
*(volatile unsigned short *)0x5000 = ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cBstFile" -o "$oBstFile"
"$PROJECT_ROOT/tools/link816" -o "$binBstFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oLibgccFile" "$oBstFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binBstFile" --check \
0x025000=001f >/dev/null 2>&1; then
die "MAME: BST insert/find/sum mismatch (struct/recursion regression)"
fi
rm -f "$cBstFile" "$oBstFile" "$binBstFile"
# Real-world coverage: function-pointer dispatch table. Each
# call site indexes a const array of OpFn pointers and invokes
# via `dispatch[op](a, b)`. Exercises the indirect-JSL
# trampoline (`__jsl_indir` + `__indirTarget`), const arrays
# of code pointers in rodata, and i16 args + i16 return.
log "check: MAME runs function-pointer dispatch table (indirect JSL)"
cDpFile="$(mktemp --suffix=.c)"
oDpFile="$(mktemp --suffix=.o)"
binDpFile="$(mktemp --suffix=.bin)"
cat > "$cDpFile" <<'EOF'
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
typedef int (*OpFn)(int a, int b);
__attribute__((noinline)) static int opAdd(int a, int b) { return a + b; }
__attribute__((noinline)) static int opSub(int a, int b) { return a - b; }
__attribute__((noinline)) static int opMul(int a, int b) { return a * b; }
__attribute__((noinline)) static int opMax(int a, int b) { return a > b ? a : b; }
__attribute__((noinline)) static int opMin(int a, int b) { return a < b ? a : b; }
static const OpFn dispatch[] = {opAdd, opSub, opMul, opMax, opMin};
__attribute__((noinline)) static int apply(int op, int a, int b) {
return dispatch[op](a, b);
}
int main(void) {
int ok = 0;
if (apply(0, 7, 3) == 10) ok |= 0x01;
if (apply(1, 7, 3) == 4) ok |= 0x02;
if (apply(2, 7, 3) == 21) ok |= 0x04;
if (apply(3, 7, 3) == 7) ok |= 0x08;
if (apply(4, 7, 3) == 3) ok |= 0x10;
int t = apply(0, 7, 3);
t = apply(2, t, 4);
t = apply(1, t, 5);
t = apply(3, t, 30);
if (t == 35) ok |= 0x20;
switchToBank2();
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cDpFile" -o "$oDpFile"
"$PROJECT_ROOT/tools/link816" -o "$binDpFile" --text-base 0x1000 \
"$oCrt0F" "$oLibgccFile" "$oDpFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binDpFile" --check \
0x025000=003f >/dev/null 2>&1; then
die "MAME: function-pointer dispatch table mismatch (indirect-JSL regression)"
fi
rm -f "$cDpFile" "$oDpFile" "$binDpFile"
rm -f "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oCrt0F"
else
@ -3308,6 +3642,29 @@ void greet(void) {
TBoxWriteCString("Hello");
TBoxBeep();
}
// Cover all wrappers: ensures the multi-arg ones (declared extern in
// the header, implemented in iigsToolbox.s) at least link.
void everything(void) {
short rect[4] = {0, 0, 100, 100};
char buf[20];
char buf2[16];
TBoxTLStartUp(); TBoxTLShutDown();
unsigned short id = TBoxMMStartUp();
unsigned long h = TBoxNewHandle(1024UL, id, 0, 0UL);
TBoxDisposeHandle(h);
TBoxMMShutDown(id);
TBoxReadAsciiTime(buf);
TBoxMoveTo(10, 20);
TBoxFrameRect(rect); TBoxPaintRect(rect); TBoxEraseRect(rect);
TBoxDrawString("\005hello");
TBoxQDStartUp(0x80, 0x1A00, id); TBoxQDShutDown();
TBoxEMStartUp(id); TBoxEMShutDown(); TBoxSystemTask();
TBoxGetNextEvent(0xFFFF, buf2);
void *win = TBoxNewWindow((const void *)0x5000);
TBoxCloseWindow(win);
char k = TBoxReadKey();
(void)k;
}
EOF
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
-S "$cToolFile" -o "$sToolFile"
@ -3317,6 +3674,20 @@ EOF
if ! grep -qE '\bldx\s+#0x290[Bb]\b' "$sToolFile"; then
die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
fi
# Make sure the multi-arg wrappers in iigsToolbox.s assemble and
# that the test object links against them.
oToolFile="$(mktemp --suffix=.o)"
oToolboxAsm="$(mktemp --suffix=.o)"
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
-c "$cToolFile" -o "$oToolFile"
"$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-mc" -arch=w65816 -filetype=obj \
"$PROJECT_ROOT/runtime/src/iigsToolbox.s" -o "$oToolboxAsm"
binTbx="$(mktemp --suffix=.bin)"
if ! "$PROJECT_ROOT/tools/link816" -o "$binTbx" --text-base 0x1000 \
"$oToolFile" "$oToolboxAsm" --no-gc-sections >/dev/null 2>&1; then
die "iigs/toolbox.h + iigsToolbox.s failed to link"
fi
rm -f "$oToolFile" "$oToolboxAsm" "$binTbx"
# stdint.h / stddef.h / limits.h / inttypes.h: standalone
# replacements for clang's bundled versions (which try to include
@ -3368,8 +3739,10 @@ int add(int a, int b) { return a + b; }
int main(void) { return add(3, 4); }
EOF
"$CLANG" --target=w65816 -O2 -g -ffunction-sections -c "$cDbgFile" -o "$oDbgFile"
# --no-gc-sections so `add` survives even though main inlined it
# (the test verifies the map contains add's address).
"$PROJECT_ROOT/tools/link816" -o "$binDbgFile" --debug-out "$dbgOutFile" \
--map "$mapDbgFile" \
--map "$mapDbgFile" --no-gc-sections \
--text-base 0x1000 "$oDbgFile" "$oLibgccFile" 2>/dev/null
if ! head -1 "$dbgOutFile" | grep -q "DWARF sidecar v1"; then
die "link816 --debug-out: sidecar missing v1 header (reloc-apply path)"
@ -3418,6 +3791,78 @@ EOF
fi
done
# Weak-symbol resolution: a strong def must override a weak one
# regardless of link order. Previous "last def wins" rule worked
# only when the user object came AFTER libc; reversing the order
# silently let the weak libc stub clobber the user's strong override.
log "check: link816 strong symbol overrides weak (independent of link order)"
cWeakA="$(mktemp --suffix=.c)"
cWeakB="$(mktemp --suffix=.c)"
oWeakA="$(mktemp --suffix=.o)"
oWeakB="$(mktemp --suffix=.o)"
binWeak="$(mktemp --suffix=.bin)"
mapWeak="$(mktemp --suffix=.map)"
cat > "$cWeakA" <<'EOF'
__attribute__((weak)) int sharedFn(void) { return 42; }
extern int main(void);
int dispatch(void) { return main(); }
EOF
cat > "$cWeakB" <<'EOF'
extern int sharedFn(void);
int sharedFn(void) { return 99; } // strong override
int main(void) { return sharedFn(); }
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakA" -o "$oWeakA"
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakB" -o "$oWeakB"
# Link with WEAK object first (the bug-triggering order under
# last-wins) — strong should still win. --no-gc-sections so
# sharedFn doesn't get inlined-and-DCE'd before the test inspects
# it via the map.
"$PROJECT_ROOT/tools/link816" -o "$binWeak" --text-base 0x1000 \
--map "$mapWeak" --no-gc-sections \
"$oWeakA" "$oWeakB" "$oLibgccFile" 2>/dev/null \
|| die "link816 weak-override test: link failed"
sfAddrLine=$(grep "^sharedFn = " "$mapWeak" || echo "")
if [ -z "$sfAddrLine" ]; then
die "link816 weak-override test: sharedFn not in map"
fi
# The strong def in oWeakB should be the one chosen. Both objects
# define sharedFn, but only one address ends up in the map. As a
# sanity probe, confirm oWeakB's symbol table really has sharedFn.
sfStrongAddr=$("$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-objdump" -t "$oWeakB" \
2>/dev/null | awk '/sharedFn/ {print $1; exit}')
if [ -z "$sfStrongAddr" ]; then
die "link816 weak-override test: probe sharedFn missing in oWeakB"
fi
# A stricter check would compare (map address - section base) against
# the strong def's in-section offset. For now this only verifies that
# the link succeeded and sharedFn resolved (the linker would die() if
# it saw two strong definitions).
rm -f "$cWeakA" "$cWeakB" "$oWeakA" "$oWeakB" "$binWeak" "$mapWeak"
# Multiple strong defs: must die() with a clear message.
cWeakC="$(mktemp --suffix=.c)"
cWeakD="$(mktemp --suffix=.c)"
oWeakC="$(mktemp --suffix=.o)"
oWeakD="$(mktemp --suffix=.o)"
binWeak2="$(mktemp --suffix=.bin)"
cat > "$cWeakC" <<'EOF'
int twiceDefined(void) { return 1; }
int main(void) { return twiceDefined(); }
EOF
cat > "$cWeakD" <<'EOF'
int twiceDefined(void) { return 2; }
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakC" -o "$oWeakC"
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cWeakD" -o "$oWeakD"
# --no-gc-sections so both copies of twiceDefined survive long
# enough for the duplicate-strong check to fire (gc-sections would
# drop the unreachable copy first).
if "$PROJECT_ROOT/tools/link816" -o "$binWeak2" --text-base 0x1000 \
--no-gc-sections \
"$oWeakC" "$oWeakD" "$oLibgccFile" 2>/dev/null; then
die "link816 should have rejected multiple strong defs of 'twiceDefined'"
fi
rm -f "$cWeakC" "$cWeakD" "$oWeakC" "$oWeakD" "$binWeak2"
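The two link816 checks above exercise one resolution rule: strong overrides weak regardless of order, the first weak wins among weak definitions, and two strong definitions are an error. Here is a small standalone sketch of that rule. `Def` and `resolveDefs` are hypothetical names for illustration, and the sketch throws an exception where the linker calls `die()`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical input record; not link816's Symbol struct.
struct Def {
    std::string name;
    uint32_t addr;
    bool strong;
};

// Resolve definitions in link order:
//   - the first definition of a name is recorded
//   - a strong definition replaces an earlier weak one
//   - strong + strong throws
//   - a weak definition after anything is ignored
std::map<std::string, uint32_t> resolveDefs(const std::vector<Def> &defs) {
    std::map<std::string, uint32_t> addrOf;
    std::map<std::string, bool> strongSeen;
    for (const Def &d : defs) {
        auto it = strongSeen.find(d.name);
        if (it == strongSeen.end()) {
            addrOf[d.name] = d.addr;
            strongSeen[d.name] = d.strong;
        } else if (d.strong && !it->second) {
            addrOf[d.name] = d.addr;
            it->second = true;
        } else if (d.strong && it->second) {
            throw std::runtime_error("multiple strong definitions of '" +
                                     d.name + "'");
        }
    }
    return addrOf;
}
```

Feeding the weak definition first and the strong one second reproduces the order that broke under the old "last def wins" rule; swapping the two entries should give the same result.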
log "check: link816 auto-relocates bss above text when default 0x2000 overlaps"
# Synthesize a small object that BLOATS text past 0x2000 so the
# default --bss-base 0x2000 would land inside text. link816 must
@ -3441,8 +3886,12 @@ EOF
done
} > "$cBigFile"
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBigFile" -o "$oBigFile"
# --no-gc-sections so the 200 dummy noinline functions stay
# (they're unreachable from main but the test specifically needs
# the bloat to push text past the default bss-base).
"$PROJECT_ROOT/tools/link816" -o "$binBssAutoFile" --text-base 0x1000 \
--map "$mapBssAutoFile" "$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err || \
--map "$mapBssAutoFile" --no-gc-sections \
"$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err || \
die "link816 bss-base test: link failed: $(cat /tmp/bsslink.err)"
bssAddr=$(grep "^__bss_start = " "$mapBssAutoFile" | awk '{print $3}' || echo "MISSING")
if [ -z "$bssAddr" ] || [ "$bssAddr" = "MISSING" ]; then
@ -3477,6 +3926,36 @@ EOF
fi
rm -f "$cBigFile" "$oBigFile" "$binBssOFile" /tmp/bsslink.err
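Several checks in this script pass `--no-gc-sections` because the linker's section garbage collection keeps only what is reachable from its roots. The sketch below models that reachability pass in a simplified, hypothetical form: sections are plain indices and `refs[i]` stands in for the sections that section `i`'s relocations point at. It is not link816's actual data model.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical simplified model of a gc-sections pass.
// refs[i] = indices of the sections referenced by section i.
// Returns every section reachable from the given roots.
std::set<std::size_t> liveSections(
        const std::vector<std::vector<std::size_t>> &refs,
        const std::vector<std::size_t> &roots) {
    std::set<std::size_t> live;
    std::vector<std::size_t> work;
    auto markLive = [&](std::size_t s) {
        // Ignore out-of-range targets; enqueue each section only once.
        if (s < refs.size() && live.insert(s).second) work.push_back(s);
    };
    for (std::size_t r : roots) markLive(r);
    // The worklist grows while it is walked, so index into it
    // instead of holding iterators.
    for (std::size_t i = 0; i < work.size(); ++i) {
        std::size_t cur = work[i];  // copy: markLive may reallocate work
        for (std::size_t target : refs[cur]) markLive(target);
    }
    return live;
}
```

In this model the 200 dummy functions of the bss-base check would be sections that no root reaches, which is why that test has to disable collection to keep its text bloat.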
# When BSS lands in LC1 ($D000+), __heap_end must sit above
# __heap_start (extending up to the LC1 ceiling at $E000) so
# malloc has a usable range. It was previously hardcoded at
# $BF00, so __heap_start ended up GREATER than __heap_end and
# malloc returned NULL on every call, silently bricking any
# program that allocated dynamic memory once the runtime grew
# past the default-bss threshold.
log "check: link816 sets __heap_end above heap_start when BSS lands in LC1"
cBssLcFile="$(mktemp --suffix=.c)"
oBssLcFile="$(mktemp --suffix=.o)"
binBssLcFile="$(mktemp --suffix=.bin)"
mapBssLcFile="$(mktemp --suffix=.map)"
cat > "$cBssLcFile" <<'EOF'
int main(void) { return 0; }
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBssLcFile" -o "$oBssLcFile"
"$PROJECT_ROOT/tools/link816" -o "$binBssLcFile" --text-base 0x1000 \
--bss-base 0xD000 --map "$mapBssLcFile" \
"$oBssLcFile" "$oLibgccFile" 2>/dev/null
hsAddr=$(grep "^__heap_start = " "$mapBssLcFile" | awk '{print $3}' || echo "MISSING")
heAddr=$(grep "^__heap_end = " "$mapBssLcFile" | awk '{print $3}' || echo "MISSING")
if [ -z "$hsAddr" ] || [ "$hsAddr" = "MISSING" ]; then
die "heap_start missing from map"
fi
if [ -z "$heAddr" ] || [ "$heAddr" = "MISSING" ]; then
die "heap_end missing from map"
fi
hs=$((hsAddr))
he=$((heAddr))
if [ "$he" -le "$hs" ]; then
die "__heap_end (0x$(printf %X $he)) must be > __heap_start (0x$(printf %X $hs)) for malloc to work; bss in LC1 leaves heap empty"
fi
rm -f "$cBssLcFile" "$oBssLcFile" "$binBssLcFile" "$mapBssLcFile"
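The heap check above depends on how the linker picks `__heap_start` and `__heap_end` from the end of BSS. This sketch restates the three cases from the linker change as a standalone function; `heapRange` is a hypothetical name, and the constants are taken from the diff as shown.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {heapStart, heapEnd} for a given end-of-BSS address.
std::pair<uint32_t, uint32_t> heapRange(uint32_t bssEnd) {
    uint32_t start = bssEnd;
    // The IO window $C000-$CFFF is not usable RAM; skip past it.
    if (start >= 0xC000 && start < 0xD000) start = 0xD000;
    uint32_t end;
    if (start < 0xC000) {
        end = 0xBF00;   // heap below the IO window
    } else if (start < 0xE000) {
        end = 0xE000;   // heap in LC1, capped at the LC1 ceiling
    } else {
        end = start;    // no room left: empty heap
    }
    return {start, end};
}
```

With `--bss-base 0xD000` the start lands in LC1 and the end becomes `$E000`, which is the case the check asserts. In this sketch an end-of-BSS strictly between `$BF00` and `$C000` would still produce an end below the start, unless an earlier layout check rules that range out.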
# OMF emitter — wrap the linked binary as a single-segment OMF
# file ready for IIgs loading.
log "check: omfEmit produces a valid OMF v2.1 single-segment file"


@ -29,7 +29,9 @@
#include <fstream>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>
namespace {
@ -89,6 +91,10 @@ static constexpr uint16_t SHN_ABS = 0xFFF1;
static constexpr uint16_t SHN_COMMON = 0xFFF2;
inline uint8_t ELF32_ST_TYPE(uint8_t i) { return i & 0x0F; }
inline uint8_t ELF32_ST_BIND(uint8_t i) { return (i >> 4) & 0x0F; }
static constexpr uint8_t STB_LOCAL = 0;
static constexpr uint8_t STB_GLOBAL = 1;
static constexpr uint8_t STB_WEAK = 2;
static constexpr uint8_t STT_NOTYPE = 0;
static constexpr uint8_t STT_OBJECT = 1;
@ -156,6 +162,7 @@ struct Symbol {
uint32_t value; // st_value
uint16_t shndx;
uint8_t type; // STT_*
uint8_t bind; // STB_LOCAL / STB_GLOBAL / STB_WEAK
};
struct Reloc {
@ -240,6 +247,7 @@ struct InputObject {
symbols[i].value = sym.st_value;
symbols[i].shndx = sym.st_shndx;
symbols[i].type = ELF32_ST_TYPE(sym.st_info);
symbols[i].bind = ELF32_ST_BIND(sym.st_info);
}
// Walk RELA sections; index by their target section (sh_info).
@ -348,6 +356,101 @@ struct Linker {
uint32_t textBase = 0x8000;
uint32_t rodataBase = 0;
uint32_t bssBase = 0x2000;
bool gcSections = true;
// Per-section identity: (object index, section index within obj).
using SecID = std::pair<size_t, uint32_t>;
std::set<SecID> liveSecs;
std::map<std::string, SecID> symToSection;
// Build the "global symbol name -> (objIdx, secIdx) where defined"
// map. Honors weak vs strong: strong def overrides weak; first
// weak-only def wins. Used by computeLiveSet() to follow cross-
// object reloc references back to their defining section.
void buildSymToSection() {
std::map<std::string, bool> strongSeen;
for (size_t fi = 0; fi < objs.size(); ++fi) {
const auto &obj = *objs[fi];
for (const Symbol &sym : obj.symbols) {
if (sym.name.empty()) continue;
if (sym.bind == STB_LOCAL) continue;
if (sym.shndx == SHN_UNDEF || sym.shndx == SHN_ABS ||
sym.shndx == SHN_COMMON ||
sym.shndx >= obj.sections.size())
continue;
bool thisStrong = (sym.bind != STB_WEAK);
auto sit = strongSeen.find(sym.name);
if (sit == strongSeen.end()) {
symToSection[sym.name] = {fi, sym.shndx};
strongSeen[sym.name] = thisStrong;
} else if (thisStrong && !sit->second) {
symToSection[sym.name] = {fi, sym.shndx};
sit->second = true;
}
}
}
}
// Compute the live-section set via BFS from roots (entry point,
// init_array sections — crt0 walks them at runtime). Without
// gc-sections, every section is implicitly live.
void computeLiveSet() {
if (!gcSections) return;
buildSymToSection();
std::vector<SecID> work;
auto markLive = [&](SecID s) {
if (liveSecs.insert(s).second) work.push_back(s);
};
// Roots: entry symbols. __start (or _start) is the canonical
// crt0 entry; also keep main (crt0 calls it) and the indirect
// call trampoline pair __jsl_indir / __indirTarget. Linker-
// defined globals like __heap_start are synthesized later and
// own no input section, so they need no root here.
for (const char *root : {"__start", "_start", "main",
"__indirTarget", "__jsl_indir"}) {
auto it = symToSection.find(root);
if (it != symToSection.end()) markLive(it->second);
}
// crt0's init-loop walks .init_array via the linker-defined
// boundary symbols __init_array_start/_end. All init_array
// sections must therefore be considered live. (.fini_array
// would need the same treatment; this loop only handles
// init_array.)
for (size_t fi = 0; fi < objs.size(); ++fi) {
for (uint32_t idx : objs[fi]->sectionsByKind("init_array"))
markLive({fi, idx});
}
// BFS: each live section's relocs reference symbols whose
// defining sections are in turn live. Local refs via section
// symbols (STT_SECTION) resolve within the same object.
for (size_t i = 0; i < work.size(); ++i) {
SecID cur = work[i];
const auto &obj = *objs[cur.first];
auto relIt = obj.relocs.find(cur.second);
if (relIt == obj.relocs.end()) continue;
for (const Reloc &r : relIt->second) {
if (r.symIdx >= obj.symbols.size()) continue;
const Symbol &sym = obj.symbols[r.symIdx];
if (sym.shndx != SHN_UNDEF &&
sym.shndx != SHN_ABS &&
sym.shndx != SHN_COMMON &&
sym.shndx < obj.sections.size()) {
// Local def (incl. STT_SECTION refs).
markLive({cur.first, sym.shndx});
continue;
}
// External — look up the global definition.
auto sit = symToSection.find(sym.name);
if (sit != symToSection.end()) markLive(sit->second);
// Else: undefined external; resolveSym() will die later
// (or the user explicitly declared the ref weak).
}
}
}
bool isLive(size_t fi, uint32_t idx) const {
if (!gcSections) return true;
return liveSecs.count({fi, idx}) > 0;
}
// Per-object, per-section: in-merged-text/rodata/bss offset.
struct ObjOffsets {
@ -430,25 +533,32 @@ struct Linker {
// 1. Layout: each obj's sections at running offsets.
objOff.resize(objs.size());
uint32_t curText = 0, curRodata = 0, curBss = 0, curInit = 0;
// gc-sections: compute the live-section set before accumulating
// so dead sections drop out of every later layout/reloc step.
computeLiveSet();
for (size_t fi = 0; fi < objs.size(); ++fi) {
ObjOffsets &oo = objOff[fi];
oo.textBaseInMerged = curText;
for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
if (!isLive(fi, idx)) continue;
oo.textWithin[idx] = curText - oo.textBaseInMerged;
curText += objs[fi]->sections[idx].size;
}
oo.rodataBaseInMerged = curRodata;
for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) {
if (!isLive(fi, idx)) continue;
oo.rodataWithin[idx] = curRodata - oo.rodataBaseInMerged;
curRodata += objs[fi]->sections[idx].size;
}
oo.bssBaseInMerged = curBss;
for (uint32_t idx : objs[fi]->sectionsByKind("bss")) {
if (!isLive(fi, idx)) continue;
oo.bssWithin[idx] = curBss - oo.bssBaseInMerged;
curBss += objs[fi]->sections[idx].size;
}
oo.initBaseInMerged = curInit;
for (uint32_t idx : objs[fi]->sectionsByKind("init_array")) {
if (!isLive(fi, idx)) continue;
oo.initWithin[idx] = curInit - oo.initBaseInMerged;
curInit += objs[fi]->sections[idx].size;
}
@ -475,9 +585,58 @@ struct Linker {
L.textBase + L.textSize);
die(msg);
}
// Hard-fail if text crosses into the IO window ($C000-$CFFF).
// Code there would fetch instructions from hardware registers.
// Programs that grow this big need to split into bank 1 (not
// currently supported by this linker).
if (L.textBase < 0xC000 &&
L.textBase + L.textSize > 0xC000) {
char msg[160];
std::snprintf(msg, sizeof(msg),
"text [0x%X+%u] crosses IIgs IO window 0xC000-0xCFFF — "
"shrink the program or split into bank 1",
L.textBase, L.textSize);
die(msg);
}
// Auto-skip the IO window ($C000-$CFFF) if rodata would land
// there. Loads from $C000-$CFFF return hardware register
// values (and writes hit the soft switches), so any rodata
// placed there would read back wrong at runtime. This was
// caught when math.o grew past ~28KB and pushed string
// literals into the IO range, breaking smoke #86 (hash
// table strcmp returned garbage because the keys read back
// as IO register values). Catches both "starts before IO,
// crosses in" and "starts inside IO" cases.
if (!rodataBase &&
L.rodataBase < 0xD000 &&
L.rodataBase + L.rodataSize > 0xC000) {
// Page-align upward past the IO window.
L.rodataBase = 0xD000;
// Pad the image so the gap between text-end and rodata-
// start is just zeros. The runInMame loader skips
// writes to the IO range so the soft switches stay
// intact.
}
// .init_array goes immediately after .rodata in the image.
L.initBase = L.rodataBase + L.rodataSize;
L.initSize = curInit;
// Init_array can also land in IO if rodata ends just before
// or starts inside.
if (L.initBase < 0xD000 &&
L.initBase + L.initSize > 0xC000) {
L.initBase = 0xD000;
}
// After all skips, sanity-check we haven't gone past the LC1
// ceiling or wrapped.
if (L.initBase + L.initSize > 0xE000) {
char msg[160];
std::snprintf(msg, sizeof(msg),
"rodata + init_array [0x%X+%u] exceeds bank-0 LC1 "
"ceiling 0xE000 — shrink the runtime or split into bank 1",
L.rodataBase,
(unsigned)(L.initBase + L.initSize - L.rodataBase));
die(msg);
}
uint32_t initBase = L.initBase;
// bss-base safety: default 0x2000 only works if text doesn't
// grow past it. When text + rodata + init_array would
@ -530,10 +689,36 @@ struct Linker {
globalSyms["__init_array_end"] = initBase + curInit;
globalSyms["__bss_start"] = L.bssBase;
globalSyms["__bss_end"] = L.bssBase + L.bssSize;
globalSyms["__heap_start"] = L.bssBase + L.bssSize;
globalSyms["__heap_end"] = 0xBF00; // bank 0 hi-RAM ceiling (below IIgs ROM windows)
// __heap_start / __heap_end: pick the largest contiguous safe
// range above bss_end. Without this, the previous hardcoded
// heap_end=$BF00 gave heap_end < heap_start whenever BSS
// spilled into LC1 — malloc immediately returned NULL.
// Skip the IO window if heap_start would land there.
uint32_t heapStart = L.bssBase + L.bssSize;
if (heapStart >= 0xC000 && heapStart < 0xD000) {
heapStart = 0xD000; // skip IO window
}
globalSyms["__heap_start"] = heapStart;
if (heapStart < 0xC000) {
globalSyms["__heap_end"] = 0xBF00;
} else if (heapStart < 0xE000) {
// Heap in LC1 ($D000-$DFFF); cap at $E000 (LC1 ceiling).
globalSyms["__heap_end"] = 0xE000;
} else {
// Should be unreachable — earlier `bssBase + bssSize >
// 0xE000` check would have died first.
globalSyms["__heap_end"] = heapStart;
}
// 2. Build global symbol map.
// 2. Build global symbol map. Honor weak vs strong binding:
// - strong def overrides any prior weak def
// - strong + strong is a multiple-definition error
// - weak + weak: first wins (any choice would be valid)
// - weak after strong: ignored
// Without this, the previous "last def wins" rule meant a weak
// libc stub (e.g. putchar) could silently overwrite a user's
// strong override depending on link order.
std::map<std::string, bool> isStrong; // name -> strong-def seen
for (size_t fi = 0; fi < objs.size(); ++fi) {
const auto &obj = *objs[fi];
const auto &oo = objOff[fi];
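
The heap-range selection added in this hunk can be read as a pure function of where BSS ends. The following is a minimal sketch of that rule, not the linker's actual code: `HeapRange` and `pickHeapRange` are hypothetical names introduced for illustration, and the constants are copied from the hunk above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the linker's heap-range choice.
struct HeapRange {
    uint32_t start;
    uint32_t end;
};

HeapRange pickHeapRange(uint32_t bssEnd) {
    uint32_t start = bssEnd;
    // $C000-$CFFF is the IO window: never start the heap inside it.
    if (start >= 0xC000 && start < 0xD000)
        start = 0xD000;

    uint32_t end;
    if (start < 0xC000)
        end = 0xBF00;   // heap lives in main bank-0 RAM below the IO window
    else if (start < 0xE000)
        end = 0xE000;   // heap lives in LC1 ($D000-$DFFF)
    else
        end = start;    // empty heap; the real linker dies before this point
    return {start, end};
}
```

For example, `pickHeapRange(0x3000)` gives `$3000..$BF00` and `pickHeapRange(0xC800)` gives `$D000..$E000`. The sketch also makes one edge visible: a `bssEnd` strictly between `$BF00` and `$C000` still yields an `end` below `start` under this rule.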
@ -542,6 +727,10 @@ struct Linker {
if (sym.shndx == SHN_UNDEF || sym.shndx == SHN_ABS ||
sym.shndx == SHN_COMMON || sym.shndx >= obj.sections.size())
continue;
// Skip dead sections under gc-sections — their symbols
// would otherwise resolve to whatever junk address the
// missing oo.{text,rodata,bss,init}Within entry implies.
if (!isLive(fi, sym.shndx)) continue;
const auto &sec = obj.sections[sym.shndx];
std::string kind = sectionKind(sec.name);
uint32_t addr = 0;
@ -568,15 +757,30 @@ struct Linker {
} else {
continue;
}
globalSyms[sym.name] = addr; // last def wins
bool thisStrong = (sym.bind != STB_WEAK);
auto sit = isStrong.find(sym.name);
if (sit == isStrong.end()) {
globalSyms[sym.name] = addr;
isStrong[sym.name] = thisStrong;
} else if (thisStrong && !sit->second) {
// strong over weak — replace.
globalSyms[sym.name] = addr;
sit->second = true;
} else if (thisStrong && sit->second) {
die("multiple strong definitions of '" + sym.name + "'");
}
// weak after strong, or weak after weak: keep first.
}
}
// 3. Build text and rodata buffers.
// 3. Build text and rodata buffers. Skip dead sections under
// gc-sections (isLive() returns true for everything when gc
// is off).
std::vector<uint8_t> textBuf;
textBuf.reserve(curText);
for (size_t fi = 0; fi < objs.size(); ++fi) {
for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
if (!isLive(fi, idx)) continue;
const uint8_t *p = objs[fi]->sectionData(idx);
textBuf.insert(textBuf.end(), p, p + objs[fi]->sections[idx].size);
}
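
The weak-versus-strong binding rule from this hunk can be exercised in isolation. This is a sketch under stated assumptions: `SymTable` and `define` are hypothetical names, and a thrown `std::runtime_error` stands in for the linker's `die()`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the linker's globalSyms / isStrong pair.
struct SymTable {
    std::map<std::string, uint32_t> addr;
    std::map<std::string, bool> isStrong;  // name -> strong def seen

    void define(const std::string &name, uint32_t a, bool strong) {
        auto it = isStrong.find(name);
        if (it == isStrong.end()) {
            // First definition of this name, weak or strong.
            addr[name] = a;
            isStrong[name] = strong;
        } else if (strong && !it->second) {
            // Strong over weak: replace.
            addr[name] = a;
            it->second = true;
        } else if (strong && it->second) {
            // Stand-in for die() in the real linker.
            throw std::runtime_error("multiple strong definitions of '" +
                                     name + "'");
        }
        // Weak after strong, or weak after weak: keep the first.
    }
};
```

With this rule the result no longer depends on link order: a weak libc `putchar` followed by a strong user `putchar` resolves to the user's address, and so does the reverse order.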
@ -585,6 +789,7 @@ struct Linker {
rodataBuf.reserve(curRodata);
for (size_t fi = 0; fi < objs.size(); ++fi) {
for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) {
if (!isLive(fi, idx)) continue;
const uint8_t *p = objs[fi]->sectionData(idx);
rodataBuf.insert(rodataBuf.end(), p,
p + objs[fi]->sections[idx].size);
@ -596,6 +801,7 @@ struct Linker {
const auto &obj = *objs[fi];
const auto &oo = objOff[fi];
for (uint32_t textIdx : obj.sectionsByKind("text")) {
if (!isLive(fi, textIdx)) continue;
auto it = obj.relocs.find(textIdx);
if (it == obj.relocs.end()) continue;
uint32_t inMerged = oo.textBaseInMerged + oo.textWithin.at(textIdx);
@ -622,6 +828,7 @@ struct Linker {
const auto &obj = *objs[fi];
const auto &oo = objOff[fi];
for (uint32_t rdIdx : obj.sectionsByKind("rodata")) {
if (!isLive(fi, rdIdx)) continue;
auto it = obj.relocs.find(rdIdx);
if (it == obj.relocs.end()) continue;
uint32_t inMerged = oo.rodataBaseInMerged + oo.rodataWithin.at(rdIdx);
@ -654,6 +861,7 @@ struct Linker {
initBuf.reserve(curInit);
for (size_t fi = 0; fi < objs.size(); ++fi) {
for (uint32_t idx : objs[fi]->sectionsByKind("init_array")) {
if (!isLive(fi, idx)) continue;
const uint8_t *p = objs[fi]->sectionData(idx);
initBuf.insert(initBuf.end(), p,
p + objs[fi]->sections[idx].size);
@ -663,6 +871,7 @@ struct Linker {
const auto &obj = *objs[fi];
const auto &oo = objOff[fi];
for (uint32_t idx : obj.sectionsByKind("init_array")) {
if (!isLive(fi, idx)) continue;
auto it = obj.relocs.find(idx);
if (it == obj.relocs.end()) continue;
uint32_t inMerged = oo.initBaseInMerged + oo.initWithin.at(idx);
@ -824,6 +1033,10 @@ static uint32_t parseInt(const std::string &s) {
unsigned long v = std::strtoul(s.c_str(), &end, 0);
if (end == s.c_str() || *end != '\0')
die("bad numeric value '" + s + "'");
// 65816 addresses are 24-bit; reject anything that doesn't fit so
// a typo like `--text-base 0x100000000` doesn't silently wrap to 0.
if (v > 0xFFFFFF)
die("address '" + s + "' exceeds 24-bit range");
return static_cast<uint32_t>(v);
}
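
The stricter `parseInt` can be tried outside the linker with a variant that reports failure instead of exiting. A minimal sketch, assuming a hypothetical name `parseAddr24` and `std::optional` (C++17) in place of `die()`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical non-fatal variant of parseInt: returns std::nullopt
// where the linker would call die().
std::optional<uint32_t> parseAddr24(const std::string &s) {
    char *end = nullptr;
    // Base 0 accepts decimal, 0x-prefixed hex, and 0-prefixed octal.
    unsigned long v = std::strtoul(s.c_str(), &end, 0);
    if (end == s.c_str() || *end != '\0')
        return std::nullopt;  // empty input or trailing junk
    if (v > 0xFFFFFF)
        return std::nullopt;  // does not fit a 24-bit 65816 address
    return static_cast<uint32_t>(v);
}
```

The range check is done on the `unsigned long` before narrowing, which is what keeps an oversized value from wrapping silently in the cast.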
@ -831,6 +1044,7 @@ static void usage(const char *argv0) {
std::fprintf(stderr,
"usage: %s -o <output> [--text-base ADDR] [--rodata-base ADDR]\n"
" [--bss-base ADDR] [--map FILE] [--debug-out FILE]\n"
" [--no-gc-sections]\n"
" <input.o> ...\n",
argv0);
std::exit(2);
@ -865,6 +1079,18 @@ int main(int argc, char **argv) {
} else if (a == "--debug-out") {
if (++i >= argc) usage(argv[0]);
debugOutPath = argv[i++];
} else if (a == "--gc-sections") {
// Drop sections not reachable from __start / main /
// init_array. Requires `-ffunction-sections` (so each
// function is in its own section). Significantly shrinks
// text for programs that link the whole runtime but only
// use a fraction of it. ON by default; --no-gc-sections
// disables.
linker.gcSections = true;
i++;
} else if (a == "--no-gc-sections") {
linker.gcSections = false;
i++;
} else if (a == "-h" || a == "--help") {
usage(argv[0]);
} else if (!a.empty() && a[0] == '-') {

@ -134,7 +134,13 @@ static std::vector<uint8_t> emitOMF(const std::vector<uint8_t> &image,
}
static uint32_t parseInt(const std::string &s) {
return static_cast<uint32_t>(std::stoul(s, nullptr, 0));
char *end = nullptr;
unsigned long v = std::strtoul(s.c_str(), &end, 0);
if (end == s.c_str() || *end != '\0')
die("bad numeric value '" + s + "'");
if (v > 0xFFFFFF)
die("address '" + s + "' exceeds 24-bit range");
return static_cast<uint32_t>(v);
}
static void usage(const char *argv0) {

@ -117,9 +117,12 @@ static bool clobbersImg(const MachineInstr &MI,
Register R = MO.getReg();
if (!R.isValid()) continue;
if (R.isPhysical()) {
if (R == W65816::IMG0 || R == W65816::IMG1 || R == W65816::IMG2 ||
R == W65816::IMG3 || R == W65816::IMG4 || R == W65816::IMG5 ||
R == W65816::IMG6 || R == W65816::IMG7)
if (R == W65816::IMG0 || R == W65816::IMG1 || R == W65816::IMG2 ||
R == W65816::IMG3 || R == W65816::IMG4 || R == W65816::IMG5 ||
R == W65816::IMG6 || R == W65816::IMG7 ||
R == W65816::IMG8 || R == W65816::IMG9 || R == W65816::IMG10 ||
R == W65816::IMG11 || R == W65816::IMG12 || R == W65816::IMG13 ||
R == W65816::IMG14 || R == W65816::IMG15)
return true;
continue;
}

@ -260,20 +260,54 @@ static W65816CC::CondCode normalizeCC(SDValue &LHS, SDValue &RHS,
CC = ISD::getSetCCSwappedOperands(CC);
}
// Rewrite SETULE / SETUGT / SETLE / SETGT to SETULT / SETUGE / SETLT /
// SETGE with constant +/- 1. Keeps the variable on the LHS and lets
// us use BCS / BCC / BMI / BPL natively. Only valid when the constant
// is not at its signed/unsigned boundary; we bail in that pathological
// case for now.
// Signed compare via "EOR with sign bit then unsigned compare":
// a < b (signed) iff (a ^ 0x8000) < (b ^ 0x8000) (unsigned)
// The XOR flips the sign bit, which converts signed-int ordering to
// unsigned-int ordering on the same bits. This avoids the WDC's
// missing "BLT signed" — BMI/BPL alone read the sign of (a-b)
// without the V-flag overflow correction, giving wrong results
// when the subtraction overflows (e.g., INT16_MIN < 1 produced
// false because (-32768 - 1) = +32767 has N=0). After the EOR
// transform we use BCC/BCS which depend on the carry from CMP and
// don't suffer overflow corruption.
//
// Cost: 1 EOR per operand (3 bytes each in M=16) — comparable to
// the V-aware multi-branch sequence (5+ bytes of branches), but
// happens at SDAG time so subsequent SDAG combining can fold
// EORs against constants or already-EOR'd values.
bool SignedCmp = (CC == ISD::SETLT || CC == ISD::SETLE ||
CC == ISD::SETGT || CC == ISD::SETGE);
if (SignedCmp && LHS.getValueType() == MVT::i16) {
EVT VT = LHS.getValueType();
SDValue Mask = DAG.getConstant(0x8000, DL, VT);
LHS = DAG.getNode(ISD::XOR, DL, VT, LHS, Mask);
RHS = DAG.getNode(ISD::XOR, DL, VT, RHS, Mask);
switch (CC) {
case ISD::SETLT: CC = ISD::SETULT; break;
case ISD::SETLE: CC = ISD::SETULE; break;
case ISD::SETGT: CC = ISD::SETUGT; break;
case ISD::SETGE: CC = ISD::SETUGE; break;
default: break;
}
}
// Rewrite SETULE / SETUGT to SETULT / SETUGE with constant +/- 1.
// (SETLE / SETGT have already been converted to their unsigned
// counterparts above for i16; this handles original SETULE/SETUGT
// and the post-transform SETULE/SETUGT.) Keeps the variable on the
// LHS and lets us use BCS / BCC natively.
if (auto *RhsConst = dyn_cast<ConstantSDNode>(RHS)) {
int64_t V = RhsConst->getSExtValue();
if (CC == ISD::SETULE && (uint64_t)V < 0xffff) {
RHS = DAG.getConstant(V + 1, DL, RHS.getValueType());
uint64_t UV = (uint64_t)V & 0xFFFF;
if (CC == ISD::SETULE && UV < 0xffff) {
RHS = DAG.getConstant(UV + 1, DL, RHS.getValueType());
CC = ISD::SETULT;
} else if (CC == ISD::SETUGT && (uint64_t)V < 0xffff) {
RHS = DAG.getConstant(V + 1, DL, RHS.getValueType());
} else if (CC == ISD::SETUGT && UV < 0xffff) {
RHS = DAG.getConstant(UV + 1, DL, RHS.getValueType());
CC = ISD::SETUGE;
} else if (CC == ISD::SETLE && V < 0x7fff) {
// Reachable only when SignedCmp transform was skipped (i8 case
// before promoteI8Cmp could get it, or non-i16 in the future).
RHS = DAG.getConstant(V + 1, DL, RHS.getValueType());
CC = ISD::SETLT;
} else if (CC == ISD::SETGT && V < 0x7fff) {
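
The sign-bit XOR transform described in this hunk can be checked on the host with plain integer arithmetic. The sketch below is illustrative only; both function names are hypothetical, and the second function models what a bare BMI/BPL test on `a - b` would decide.

```cpp
#include <cassert>
#include <cstdint>

// Signed 16-bit a < b computed with only an unsigned compare:
// flipping bit 15 maps signed order onto unsigned order.
bool signedLessViaXor(int16_t a, int16_t b) {
    uint16_t ua = static_cast<uint16_t>(static_cast<uint16_t>(a) ^ 0x8000u);
    uint16_t ub = static_cast<uint16_t>(static_cast<uint16_t>(b) ^ 0x8000u);
    return ua < ub;
}

// What the sign of the wrapped 16-bit difference alone would say,
// i.e. the N flag after a subtract with no V-flag correction.
bool naiveSignOfDifference(int16_t a, int16_t b) {
    uint16_t diff = static_cast<uint16_t>(static_cast<uint16_t>(a) -
                                          static_cast<uint16_t>(b));
    return (diff & 0x8000u) != 0;
}
```

For `a = INT16_MIN`, `b = 1` the XOR form returns true, while the naive form returns false because the wrapped difference is `0x7FFF`. That is the overflow case the comment above cites.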
@ -1129,12 +1163,16 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
case W65816::LDAptrOff:
case W65816::STAptrOff:
case W65816::STBptrOff: {
// Pointer access with a constant offset folded into Y. Saves a
// CLC/ADC #off pair plus a spill/reload over computing
// `ptr + off` then doing LDAptr/STAptr. Since Y is 16-bit, any
// i16 offset fits. Operand layout:
// LDAptrOff: 0=dst, 1=ptr, 2=off
// STAptrOff / STBptrOff: 0=val, 1=ptr, 2=off
// Pointer access with a constant offset. Folds the offset into
// the pointer (CLC; ADC #off in A) BEFORE staging at $E0..$E2,
// then accesses via [$E0],Y with Y=0. We can't fold into Y
// because [dp],Y on the W65816 adds Y to the full 24-bit pointer
// — for a negative Y like 0xFFFE (= -2 signed), the addition
// crosses into bank 1 (e.g. ptr=0x4000 + Y=0xFFFE → 0x13FFE).
// Folding into the pointer keeps the add at 16-bit (in A) so the
// bank byte stays 0.
//
// DBR-independent — see LDAptr/STAptr/STBptr.
MachineFunction *MF = BB->getParent();
const W65816Subtarget &STI = MF->getSubtarget<W65816Subtarget>();
const W65816InstrInfo &TII = *STI.getInstrInfo();
@ -1143,24 +1181,48 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
bool IsByteStore = MI.getOpcode() == W65816::STBptrOff;
Register Ptr = MI.getOperand(1).getReg();
int64_t Off = MI.getOperand(2).getImm();
// Spill the pointer vreg to a fresh 2-byte stack slot, then
// reload via LDAfi. Forces RA to materialize the source — see
// the LDAptr/STAptr/STBptr case below for the full rationale.
int FI = MF->getFrameInfo().CreateStackObject(2, Align(2),
/*isSpillSlot=*/true);
/*isSpillSlot=*/false);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
.addReg(Ptr).addFrameIndex(FI).addImm(0);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDY_Imm16))
.addImm(Off);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi),
W65816::A).addFrameIndex(FI).addImm(0);
// Compute ptr + off in A. CLC + ADC for the add.
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::CLC));
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::ADC_Imm16)).addImm(Off);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STA_DP)).addImm(0xE0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STZ_DP)).addImm(0xE2);
if (IsLoad) {
Register Dst = MI.getOperand(0).getReg();
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi_indY), Dst)
.addFrameIndex(FI).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), Dst).addReg(W65816::A);
} else {
Register Val = MI.getOperand(0).getReg();
BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), W65816::A).addReg(Val);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0);
if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::SEP)).addImm(0x20);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi_indY))
.addReg(Val).addFrameIndex(FI).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::SEP)).addImm(0x20);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STA_DPIndLongY)).addImm(0xE0);
if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::REP)).addImm(0x20);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::REP)).addImm(0x20);
}
MI.eraseFromParent();
return BB;
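
The bank-crossing problem that motivated moving the offset out of Y is ordinary address arithmetic. The following sketch models it on the host; `indirectLongY` and `foldedInPointer` are hypothetical names, and the model assumes, as the comment above states, that `[dp],Y` adds Y to the full 24-bit pointer.

```cpp
#include <cassert>
#include <cstdint>

// Effective address of a [dp],Y access: the 24-bit pointer (bank:ptr)
// plus the 16-bit Y, with carry propagating into the bank byte.
uint32_t indirectLongY(uint16_t ptr, uint8_t bank, uint16_t y) {
    uint32_t base = (static_cast<uint32_t>(bank) << 16) | ptr;
    return (base + y) & 0xFFFFFFu;
}

// The new lowering: add the offset in 16-bit A first (wraps within
// 16 bits), stage the result with bank 0, then access with Y = 0.
uint32_t foldedInPointer(uint16_t ptr, int16_t off) {
    uint16_t sum = static_cast<uint16_t>(ptr + static_cast<uint16_t>(off));
    return indirectLongY(sum, 0, 0);
}
```

With `ptr = 0x4000` and an offset of -2, putting `0xFFFE` in Y lands at `0x13FFE` in bank 1, while folding the offset into the pointer lands at `0x3FFE` in bank 0.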
@ -1168,11 +1230,36 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
case W65816::LDAptr:
case W65816::STAptr:
case W65816::STBptr: {
// Spill the pointer to a fresh 2-byte stack slot. Then LDY #0 and
// emit LDAfi_indY / STAfi_indY against that slot. The (slot,S),Y
// addressing reads the pointer from the spill, adds Y (=0), and
// dereferences. STBptr (truncating i8 store) wraps the actual STA
// in SEP/REP so M=8 across the store and only one byte is written.
// Pointer load/store via [dp],Y indirect-long (opcodes 0xB7 / 0x97):
// STA $E0 ; pointer low/hi at $E0..$E1
// STZ $E2 ; bank byte at $E2 = 0
// LDY #0
// LDA [$E0], Y ; bank 0:ptr + 0
// STA [$E0], Y
// The bank byte is forced to 0, so the access ignores DBR — the
// whole point. The previous lowering used (slot,S),Y indirect
// (opcodes 0xB3 / 0x93), but (sr,s),Y is DBR-relative: when the
// caller had set DBR != 0 (e.g. via `pha;plb` to bank 2 to reach
// IIgs hardware), the deref silently wrote to the wrong bank.
//
// Const-int pointers (`*(volatile uint16 *)0x5000 = v`) are NOT
// lowered through this pseudo — there's a TableGen pattern that
// takes them straight to STAabs (DBR-relative), which preserves
// the IIgs MMIO + bank-switch idiom that the smoke tests use.
//
// We use $E0..$E2 in libcall-scratch DP — safe because the
// pseudo expansion is a leaf (no calls between SEP and STA),
// and any subsequent libcall reinitialises its own scratch.
//
// Why [dp],Y not abs-long-X (`STA $0,X`)? abs-long-X is shorter
// (~3 bytes less) but uses X to hold the pointer. In high-
// pressure functions like the recursive expression parser, X
// is often live with another value, and forcing X to be free
// for every pointer-deref triggered "ran out of registers".
// [dp],Y uses A and Y only — leaves X for spill-bridge use.
//
// STBptr (truncating i8 store) wraps the actual STA in SEP/REP
// so M=8 across the store and only one byte is written.
MachineFunction *MF = BB->getParent();
const W65816Subtarget &STI = MF->getSubtarget<W65816Subtarget>();
const W65816InstrInfo &TII = *STI.getInstrInfo();
@ -1180,38 +1267,55 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
bool IsLoad = MI.getOpcode() == W65816::LDAptr;
bool IsByteStore = MI.getOpcode() == W65816::STBptr;
// Operand layout (explicit only; Defs=[Y] adds an implicit at the
// end which we don't read here):
// LDAptr: 0=dst, 1=ptr
// STAptr / STBptr: 0=val, 1=ptr
// The pointer operand is always at index 1. Earlier code reading
// operand 2 for stores hit the implicit Y def, not the pointer —
// which only "worked" because regalloc didn't notice and A
// happened to hold the right bytes by accident.
Register Ptr = MI.getOperand(1).getReg();
int FI = MF->getFrameInfo().CreateStackObject(2, Align(2),
/*isSpillSlot=*/true);
// Spill ptr.
// Why we spill the pointer to a fresh stack slot first:
// a direct `COPY $a = ptr_vreg ; STA $E0` lets RA elide the COPY
// when ptr_vreg is already allocated to A. In a loop body where
// multiple Acc16 PHIs (pointer + accumulator) compete for A, the
// PHI elimination pass picks one to be in A at the bottom of the
// block and silently drops the COPY needed to refresh A with the
// OTHER value at the top of the next iteration — silent miscompile
// (sumTable read its own accumulator as the pointer on iter 2+).
// STAfi forces RA to materialize ptr_vreg's value so it gets stored
// to the slot, then LDAfi reads it back as a real machine load.
int FI = MF->getFrameInfo().CreateStackObject(2, Align(2),
/*isSpillSlot=*/false);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
.addReg(Ptr).addFrameIndex(FI).addImm(0);
// LDY #0. LDY_Imm16 has no output operand; Y is defined implicitly
// via the pseudo's Defs=[Y] marking.
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDY_Imm16))
.addImm(0);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi),
W65816::A).addFrameIndex(FI).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STA_DP)).addImm(0xE0);
// Bank byte at $E2 = 0. STZ in M=16 writes 2 bytes ($E2..$E3);
// $E3 is junk-clobbered, OK (libcall scratch is caller-saved).
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STZ_DP)).addImm(0xE2);
if (IsLoad) {
Register Dst = MI.getOperand(0).getReg();
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi_indY), Dst)
.addFrameIndex(FI).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), Dst).addReg(W65816::A);
} else {
Register Val = MI.getOperand(0).getReg();
// Load val into A.
BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), W65816::A).addReg(Val);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0);
if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::SEP)).addImm(0x20);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi_indY))
.addReg(Val).addFrameIndex(FI).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::SEP)).addImm(0x20);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STA_DPIndLongY)).addImm(0xE0);
if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::REP)).addImm(0x20);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::REP)).addImm(0x20);
}
MI.eraseFromParent();
return BB;

@ -30,18 +30,26 @@ W65816InstrInfo::W65816InstrInfo(const W65816Subtarget &STI)
W65816::ADJCALLSTACKUP),
RI() {}
// Maps IMGn to its DP address ($D0..$DE in steps of 2). Returns -1 if
// the reg isn't an IMG.
// Maps IMGn to its DP address (IMG0..IMG7 at $D0..$DE, IMG8..IMG15 at
// $C0..$CE, both in steps of 2). Returns -1 if the reg isn't an IMG.
static int imgDPAddr(Register R) {
switch (R) {
case W65816::IMG0: return 0xD0;
case W65816::IMG1: return 0xD2;
case W65816::IMG2: return 0xD4;
case W65816::IMG3: return 0xD6;
case W65816::IMG4: return 0xD8;
case W65816::IMG5: return 0xDA;
case W65816::IMG6: return 0xDC;
case W65816::IMG7: return 0xDE;
case W65816::IMG0: return 0xD0;
case W65816::IMG1: return 0xD2;
case W65816::IMG2: return 0xD4;
case W65816::IMG3: return 0xD6;
case W65816::IMG4: return 0xD8;
case W65816::IMG5: return 0xDA;
case W65816::IMG6: return 0xDC;
case W65816::IMG7: return 0xDE;
case W65816::IMG8: return 0xC0;
case W65816::IMG9: return 0xC2;
case W65816::IMG10: return 0xC4;
case W65816::IMG11: return 0xC6;
case W65816::IMG12: return 0xC8;
case W65816::IMG13: return 0xCA;
case W65816::IMG14: return 0xCC;
case W65816::IMG15: return 0xCE;
default: return -1;
}
}
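
The two-block layout in the table above follows a simple formula, which gives a cheap way to cross-check the `switch` arms. A minimal sketch, assuming a hypothetical index-based helper `imgDPAddrForIndex` that takes the IMG number rather than an LLVM `Register`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical index-based equivalent of the IMGn -> DP table:
// IMG0..IMG7 at $D0..$DE, IMG8..IMG15 at $C0..$CE, 2 bytes apart.
constexpr int imgDPAddrForIndex(int n) {
    return (n < 0 || n > 15) ? -1
         : (n < 8)           ? 0xD0 + 2 * n
                             : 0xC0 + 2 * (n - 8);
}

// Spot checks against the switch arms in the diff.
static_assert(imgDPAddrForIndex(0) == 0xD0, "IMG0");
static_assert(imgDPAddrForIndex(7) == 0xDE, "IMG7");
static_assert(imgDPAddrForIndex(8) == 0xC0, "IMG8");
static_assert(imgDPAddrForIndex(15) == 0xCE, "IMG15");
```

The same mapping also appears in `eliminateFrameIndex`, so a shared helper of this shape would keep the two tables from drifting apart; whether that fits the backend's structure is a design choice for the author.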

@ -278,6 +278,12 @@ def : Pat<(store Acc16:$src, (W65816Wrapper tglobaladdr:$g)),
(STAabs Acc16:$src, tglobaladdr:$g)>;
def : Pat<(store Acc16:$src, (W65816Wrapper texternalsym:$s)),
(STAabs Acc16:$src, texternalsym:$s)>;
// Store via a constant-int address (MMIO-style fixed pointer like
// `*(volatile uint16 *)0x5000 = v`). Lowering to STAabs (DBR-relative,
// opcode 0x8D) keeps the access DBR-relative and shorter than going
// through STAptr, whose [dp],Y expansion forces bank 0 and is longer.
def : Pat<(store Acc16:$src, (iPTR imm:$addr)),
(STAabs Acc16:$src, (i32 imm:$addr))>;
// 16-bit ADD: expands to CLC + ADC_Imm16. The 65816 ADC sums with the
// carry flag, so a clean add needs CLC first. Constraints tie the
@ -893,30 +899,40 @@ def CMP_RR : W65816Pseudo<(outs), (ins Acc16:$lhs, Acc16:$rhs),
// fresh stack slot, set Y=0, and emit LDA/STA (slot,S),Y. Y gets
// clobbered as a side effect. hasSideEffects=1 covers the spill
// store the inserter adds, in addition to the deref.
// LDAptr / STAptr / STBptr lower to [dp],Y indirect-long via DP
// scratch $E0..$E2 (see W65816ISelLowering.cpp inserter). The
// inserter uses A and Y plus the DP scratch; X is not touched.
// Defs: Y (LDY #0) and P (STA/LDA set N/Z).
// $ptr is Wide16 (A or IMGn) so when bb.3-style pressure forces the
// pointer to share A with another live vreg, RA can park ptr in an
// IMGn DP slot. Acc16:$ptr was being silently coalesced with the
// loop-PHI accumulator: both wanted A at end of bb, and PHI-elim
// dropped the COPY needed to refresh A with the pointer at top of
// the loop. With Wide16, the COPY $a = ptr lowers to a real LDA $dp.
let usesCustomInserter = 1, hasSideEffects = 1, mayLoad = 1,
Defs = [Y] in {
def LDAptr : W65816Pseudo<(outs Acc16:$dst), (ins Acc16:$ptr),
Defs = [Y, P] in {
def LDAptr : W65816Pseudo<(outs Acc16:$dst), (ins Wide16:$ptr),
"# LDAptr $dst, $ptr",
[(set Acc16:$dst, (load Acc16:$ptr))]>;
[(set Acc16:$dst, (load Wide16:$ptr))]>;
}
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
Defs = [Y] in {
def STAptr : W65816Pseudo<(outs), (ins Acc16:$val, Acc16:$ptr),
Defs = [Y, P] in {
def STAptr : W65816Pseudo<(outs), (ins Acc16:$val, Wide16:$ptr),
"# STAptr $val, $ptr",
[(store Acc16:$val, Acc16:$ptr)]>;
[(store Acc16:$val, Wide16:$ptr)]>;
}
// i8 zero-extending pointer load: do a 16-bit LDA (slot,s),y and mask
// the high byte. Reads one byte past the source; fine for byte-array
// iteration where the buffer is at least 2 bytes long. A future
// SEP/REP-aware mode pass could switch to a true 8-bit LDA.
def : Pat<(i16 (zextloadi8 Acc16:$ptr)),
(ANDi16imm (LDAptr Acc16:$ptr), 0xFF)>;
def : Pat<(i16 (zextloadi8 Wide16:$ptr)),
(ANDi16imm (LDAptr Wide16:$ptr), 0xFF)>;
// Anyext byte load via pointer: consumer doesn't care about the high
// byte, so just LDA (16-bit). Same 1-byte-past-buffer caveat as
// zextloadi8.
def : Pat<(i16 (extloadi8 Acc16:$ptr)),
(LDAptr Acc16:$ptr)>;
def : Pat<(i16 (extloadi8 Wide16:$ptr)),
(LDAptr Wide16:$ptr)>;
// And the equivalent for absolute addresses (byte loads via global ptr).
// (Already covered for Wrapper(global) above; this catches the case
// where the ptr is materialised as a value.)
@ -941,10 +957,10 @@ def STAfi_indY : W65816Pseudo<(outs), (ins Acc16:$src, memfi:$addr),
// natural truncstorei8 from an i16 value (common with arg promotion),
// and a true i8 store (Acc8) that arises from i8-typed IR.
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
Defs = [Y] in {
def STBptr : W65816Pseudo<(outs), (ins Acc16:$val, Acc16:$ptr),
Defs = [Y, P] in {
def STBptr : W65816Pseudo<(outs), (ins Acc16:$val, Wide16:$ptr),
"# STBptr $val, $ptr",
[(truncstorei8 Acc16:$val, Acc16:$ptr)]>;
[(truncstorei8 Acc16:$val, Wide16:$ptr)]>;
}
// Pointer access with constant offset. `(load (add ptr, $off))` and
@ -953,40 +969,42 @@ def STBptr : W65816Pseudo<(outs), (ins Acc16:$val, Acc16:$ptr),
// the offset becomes an explicit ADC #imm that has to spill A and
// recompute the pointer per access. With them, the inserter does the
// add itself while staging the pointer.
// LDAptrOff / STAptrOff / STBptrOff: same [dp],Y lowering as the
// no-offset variants, except that the offset is added to the pointer
// (CLC; ADC #off) before staging and Y stays 0. Folding it into Y
// would let a negative offset carry into the bank byte.
let usesCustomInserter = 1, hasSideEffects = 1, mayLoad = 1,
Defs = [Y] in {
Defs = [Y, P] in {
def LDAptrOff : W65816Pseudo<(outs Acc16:$dst),
(ins Acc16:$ptr, i16imm:$off),
(ins Wide16:$ptr, i16imm:$off),
"# LDAptrOff $dst, $ptr, $off", []>;
}
let usesCustomInserter = 1, hasSideEffects = 1, mayStore = 1,
Defs = [Y] in {
Defs = [Y, P] in {
def STAptrOff : W65816Pseudo<(outs),
(ins Acc16:$val, Acc16:$ptr, i16imm:$off),
(ins Acc16:$val, Wide16:$ptr, i16imm:$off),
"# STAptrOff $val, $ptr, $off", []>;
def STBptrOff : W65816Pseudo<(outs),
(ins Acc16:$val, Acc16:$ptr, i16imm:$off),
(ins Acc16:$val, Wide16:$ptr, i16imm:$off),
"# STBptrOff $val, $ptr, $off", []>;
}
def : Pat<(i16 (load (add Acc16:$ptr, (i16 imm:$off)))),
(LDAptrOff Acc16:$ptr, imm:$off)>;
def : Pat<(store Acc16:$val, (add Acc16:$ptr, (i16 imm:$off))),
(STAptrOff Acc16:$val, Acc16:$ptr, imm:$off)>;
def : Pat<(truncstorei8 Acc16:$val, (add Acc16:$ptr, (i16 imm:$off))),
(STBptrOff Acc16:$val, Acc16:$ptr, imm:$off)>;
def : Pat<(store Acc8:$val, (add Acc16:$ptr, (i16 imm:$off))),
def : Pat<(i16 (load (add Wide16:$ptr, (i16 imm:$off)))),
(LDAptrOff Wide16:$ptr, imm:$off)>;
def : Pat<(store Acc16:$val, (add Wide16:$ptr, (i16 imm:$off))),
(STAptrOff Acc16:$val, Wide16:$ptr, imm:$off)>;
def : Pat<(truncstorei8 Acc16:$val, (add Wide16:$ptr, (i16 imm:$off))),
(STBptrOff Acc16:$val, Wide16:$ptr, imm:$off)>;
def : Pat<(store Acc8:$val, (add Wide16:$ptr, (i16 imm:$off))),
(STBptrOff (COPY_TO_REGCLASS Acc8:$val, Acc16),
Acc16:$ptr, imm:$off)>;
def : Pat<(store Acc8:$val, Acc16:$ptr),
(STBptr (COPY_TO_REGCLASS Acc8:$val, Acc16), Acc16:$ptr)>;
Wide16:$ptr, imm:$off)>;
def : Pat<(store Acc8:$val, Wide16:$ptr),
(STBptr (COPY_TO_REGCLASS Acc8:$val, Acc16), Wide16:$ptr)>;
// i8 load via Acc16 pointer producing a true i8 (Acc8) result. Reuses
// the existing zextloadi8 16-bit-LDA-and-mask path: load 2 bytes, mask
// the high byte, then narrow to Acc8. COPY_TO_REGCLASS to Acc8 is a
// no-op at MC level (same physical A). Reads one byte past the source;
// fine for char-array iteration where the buffer is at least 2 bytes.
def : Pat<(i8 (load Acc16:$ptr)),
(COPY_TO_REGCLASS (ANDi16imm (LDAptr Acc16:$ptr), 0xFF), Acc8)>;
def : Pat<(i8 (load Wide16:$ptr)),
(COPY_TO_REGCLASS (ANDi16imm (LDAptr Wide16:$ptr), 0xFF), Acc8)>;
// Acc8-to-Acc16 type conversions. Both Acc8 and Acc16 alias physical A,
// so COPY_TO_REGCLASS is a no-op at MC level. ZEXT additionally masks
@ -1109,8 +1127,12 @@ def LDA_AbsY : InstAbsY<0xB9, "lda">;
def LDA_DPInd : InstDPInd <0xB2, "lda">;
def LDA_DPIndY : InstDPIndY<0xB1, "lda">;
def LDA_DPIndX : InstDPIndX<0xA1, "lda">;
def LDA_DPIndLong : InstDPIndLong <0xA7, "lda">;
def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda">;
def LDA_DPIndLong : InstDPIndLong <0xA7, "lda"> { let Defs = [A]; }
// LDA [dp],Y: reads Y to compute the indexed address, defines A.
// Without these, regalloc thought A was unaffected by the load and
// dead-code-eliminated COPYs that were supposed to materialise the
// next pointer in A, a silent miscompile in mySwap-style helpers.
def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda"> { let Defs = [A]; let Uses = [Y]; }
def LDA_LongX : InstAbsLongX<0xBF, "lda">;
//---------------------------------------------------------------- STA (store A)
@ -1123,8 +1145,10 @@ def STA_AbsY : InstAbsY<0x99, "sta">;
def STA_DPInd : InstDPInd <0x92, "sta">;
def STA_DPIndY : InstDPIndY<0x91, "sta">;
def STA_DPIndX : InstDPIndX<0x81, "sta">;
def STA_DPIndLong : InstDPIndLong <0x87, "sta">;
def STA_DPIndLongY : InstDPIndLongY<0x97, "sta">;
def STA_DPIndLong : InstDPIndLong <0x87, "sta"> { let Uses = [A]; }
// STA [dp],Y: reads A (the value to store) and Y (the index). Mark
// both so regalloc keeps A's value live across this instruction.
def STA_DPIndLongY : InstDPIndLongY<0x97, "sta"> { let Uses = [A, Y]; }
def STA_LongX : InstAbsLongX<0x9F, "sta">;
//---------------------------------------------------------------- LDX (load X)

@ -117,14 +117,22 @@ bool W65816RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
Register Src = MI.getOperand(0).getReg();
int srcDP = -1;
switch (Src) {
case W65816::IMG0: srcDP = 0xD0; break;
case W65816::IMG1: srcDP = 0xD2; break;
case W65816::IMG2: srcDP = 0xD4; break;
case W65816::IMG3: srcDP = 0xD6; break;
case W65816::IMG4: srcDP = 0xD8; break;
case W65816::IMG5: srcDP = 0xDA; break;
case W65816::IMG6: srcDP = 0xDC; break;
case W65816::IMG7: srcDP = 0xDE; break;
case W65816::IMG0: srcDP = 0xD0; break;
case W65816::IMG1: srcDP = 0xD2; break;
case W65816::IMG2: srcDP = 0xD4; break;
case W65816::IMG3: srcDP = 0xD6; break;
case W65816::IMG4: srcDP = 0xD8; break;
case W65816::IMG5: srcDP = 0xDA; break;
case W65816::IMG6: srcDP = 0xDC; break;
case W65816::IMG7: srcDP = 0xDE; break;
case W65816::IMG8: srcDP = 0xC0; break;
case W65816::IMG9: srcDP = 0xC2; break;
case W65816::IMG10: srcDP = 0xC4; break;
case W65816::IMG11: srcDP = 0xC6; break;
case W65816::IMG12: srcDP = 0xC8; break;
case W65816::IMG13: srcDP = 0xCA; break;
case W65816::IMG14: srcDP = 0xCC; break;
case W65816::IMG15: srcDP = 0xCE; break;
default: break;
}
if (srcDP >= 0) {

@ -38,22 +38,34 @@ def PBR : W65816Reg<6, "pbr">, DwarfRegNum<[6]>;
def PC : W65816Reg<7, "pc">, DwarfRegNum<[7]>;
def P : W65816Reg<8, "p">, DwarfRegNum<[8]>;
// Imaginary 16-bit registers backed by direct-page slots $D0..$DE.
// The regalloc treats them as physical registers with cheap LDA/STA dp
// inter-register moves. This relieves pressure on the single Acc16
// register (A) so greedy regalloc can succeed on functions with
// multiple simultaneously-live i16 vregs. Caller-save: callees may
// freely overwrite them, so regalloc spills around any call that
// might touch them. Their HWEncoding is never emitted (asmprinter
// translates IMGn references into LDA/STA dp with the right address).
def IMG0 : W65816Reg<16, "img0">, DwarfRegNum<[16]>;
def IMG1 : W65816Reg<17, "img1">, DwarfRegNum<[17]>;
def IMG2 : W65816Reg<18, "img2">, DwarfRegNum<[18]>;
def IMG3 : W65816Reg<19, "img3">, DwarfRegNum<[19]>;
def IMG4 : W65816Reg<20, "img4">, DwarfRegNum<[20]>;
def IMG5 : W65816Reg<21, "img5">, DwarfRegNum<[21]>;
def IMG6 : W65816Reg<22, "img6">, DwarfRegNum<[22]>;
def IMG7 : W65816Reg<23, "img7">, DwarfRegNum<[23]>;
// Imaginary 16-bit registers backed by direct-page slots $C0..$DE
// (16 slots = 32 DP bytes). The regalloc treats them as physical
// registers with cheap LDA/STA dp inter-register moves. This
// relieves pressure on the single Acc16 register (A) so greedy
// regalloc can succeed on functions with multiple simultaneously-
// live i16 vregs. Caller-save: callees may freely overwrite them,
// so regalloc spills around any call that might touch them. Their
// HWEncoding is never emitted (asmprinter translates IMGn references
// into LDA/STA dp with the right address).
//
// Layout: IMG0..IMG7 at $D0..$DE (legacy slot block); IMG8..IMG15
// at $C0..$CE. Avoid stepping on user DP allocations below $C0.
def IMG0 : W65816Reg<16, "img0">, DwarfRegNum<[16]>;
def IMG1 : W65816Reg<17, "img1">, DwarfRegNum<[17]>;
def IMG2 : W65816Reg<18, "img2">, DwarfRegNum<[18]>;
def IMG3 : W65816Reg<19, "img3">, DwarfRegNum<[19]>;
def IMG4 : W65816Reg<20, "img4">, DwarfRegNum<[20]>;
def IMG5 : W65816Reg<21, "img5">, DwarfRegNum<[21]>;
def IMG6 : W65816Reg<22, "img6">, DwarfRegNum<[22]>;
def IMG7 : W65816Reg<23, "img7">, DwarfRegNum<[23]>;
def IMG8 : W65816Reg<32, "img8">, DwarfRegNum<[32]>;
def IMG9 : W65816Reg<33, "img9">, DwarfRegNum<[33]>;
def IMG10 : W65816Reg<34, "img10">, DwarfRegNum<[34]>;
def IMG11 : W65816Reg<35, "img11">, DwarfRegNum<[35]>;
def IMG12 : W65816Reg<36, "img12">, DwarfRegNum<[36]>;
def IMG13 : W65816Reg<37, "img13">, DwarfRegNum<[37]>;
def IMG14 : W65816Reg<38, "img14">, DwarfRegNum<[38]>;
def IMG15 : W65816Reg<39, "img15">, DwarfRegNum<[39]>;
// DPF0 pseudo-physreg modeling the i16 storage at DP $F0..$F1.
// Used as the carrier for the highest 16 bits of an i64/double
@ -85,8 +97,10 @@ def Idx16 : RegisterClass<"W65816", [i16], 16, (add X, Y)>;
// may freely overwrite $C0..$DF, so the allocator must spill IMGn
// vregs around any call.
def Img16 : RegisterClass<"W65816", [i16], 16,
(add IMG0, IMG1, IMG2, IMG3,
IMG4, IMG5, IMG6, IMG7)>;
(add IMG0, IMG1, IMG2, IMG3,
IMG4, IMG5, IMG6, IMG7,
IMG8, IMG9, IMG10, IMG11,
IMG12, IMG13, IMG14, IMG15)>;
// Acc-or-IMG combined class. Vregs that are not constrained to A
// (i.e., not the source of an arithmetic op) get widened to this
@ -94,8 +108,10 @@ def Img16 : RegisterClass<"W65816", [i16], 16,
// A first so the allocator's default order prefers A; cross-class
// moves to/from A are LDA/STA dp via copyPhysReg.
def Wide16 : RegisterClass<"W65816", [i16], 16,
(add A, IMG0, IMG1, IMG2, IMG3,
IMG4, IMG5, IMG6, IMG7)>;
(add A, IMG0, IMG1, IMG2, IMG3,
IMG4, IMG5, IMG6, IMG7,
IMG8, IMG9, IMG10, IMG11,
IMG12, IMG13, IMG14, IMG15)>;
def PtrRegs : RegisterClass<"W65816", [i16], 16, (add SP)>;

@ -1301,10 +1301,29 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
// implicit-def $a but the return-value flags aren't reliably set,
// and other corner cases break smoke.
auto isATransparent = [](const MachineInstr &MI) {
// Stores that don't touch A or P-bits-other-than-via-A.
return MI.getOpcode() == W65816::STAfi ||
MI.getOpcode() == W65816::STAfi_indY ||
MI.getOpcode() == W65816::STA8fi;
// Stores that don't touch A or P-bits-other-than-via-A. (Byte
// stores that wrap themselves in SEP/REP toggle the M flag, but that
// doesn't affect N/Z based on A's current value.) Also
// ADJCALLSTACKDOWN, which is zero-effect at this point in the
// pipeline. ADJCALLSTACKUP is deliberately excluded; see the note
// on the ADJCALLSTACKDOWN case below.
switch (MI.getOpcode()) {
case W65816::STAfi:
case W65816::STAfi_indY:
case W65816::STA8fi:
case W65816::STAabs:
case W65816::STA8abs:
case W65816::STAptr:
case W65816::STBptr:
case W65816::STAptrOff:
case W65816::STBptrOff:
case W65816::ADJCALLSTACKDOWN:
// DOWN expands to nothing (PUSH16 chain already shifted SP).
// UP is NOT transparent: when PEI doesn't process it, AsmPrinter
// emits a TSC/CLC/ADC/TCS sequence that clobbers A and flags.
return true;
default:
return false;
}
};
// Returns true iff walking back from `Start` (exclusive) finds an
// A-modifier as the first non-skip op. Skips debug ops and