Checkpoint

Scott Duensing 2026-05-02 18:30:15 -05:00
parent 35aaf7953a
commit f338d93bae
21 changed files with 26533 additions and 965 deletions

632
STATUS.md

@@ -14,9 +14,16 @@ which runs correctly under MAME (apple2gs).
 (signed and unsigned). Carry-chained multi-word ops via ADC/SBC pseudos
 + ASLA16 / shift libcalls.
 - Comparisons and signed/unsigned widening (sext, zext, trunc) for all
-the above sizes.
+the above sizes. Signed compare near INT_MIN handled via EOR-with-
+sign-bit transform.
 - Pointer arithmetic, array indexing, struct field access, struct
 return-by-value (up to 8 bytes — Pair, Vec4, double).
+- Pointer dereference (`*p`) lowers via `LDAptr / STAptr / STBptr`
+to `[$E0],Y` indirect-LONG with the bank byte at `$E2` forced to 0
+— DBR-independent, so `pha;plb` bank-switched callers don't corrupt
+data through callee local-pointer writes. Const-int pointers
+(`*(volatile uint16 *)0x5000 = v` MMIO idiom) lower to `STAabs`
+(DBR-relative) so bank-2 writes still work.
 - Bitfields, switch statements (verified up to ~12 cases + default),
 function pointers, function-pointer tables, indirect calls via
 `__jsl_indir` trampoline.
@@ -25,14 +32,15 @@ which runs correctly under MAME (apple2gs).
 - Loops with goto / break / continue, nested loops, state machines.
 - `<stdarg.h>` varargs with int / long / unsigned long long mixed args.
 - Heap: `malloc` / `free` (libc.c first-fit allocator) — linked-list
-reverse with `cons` works.
+reverse with `cons` works; free-list coalesce verified.
 - Strings: hand-rolled `strlen`, `strcmp`, `strcpy`, `strchr`, atoi/itoa
 roundtrip.
 - Soft-float (single): all four ops + comparisons, MAME-verified.
 - Soft-double: add, sub, mul, div all return correct bit patterns
 bit-for-bit against gcc with round-to-nearest-even rounding;
-3-iter Newton sqrt converges. Long-running iterations may hit MAME's
-1-second sim-time budget (test config issue, not a compiler bug).
+3-iter Newton sqrt converges. Compiles at -O2 throughout. Long-
+running iterations may hit MAME's 1-second sim-time budget (test
+config issue, not a compiler bug).
 - Inline assembly with `"a"`, `"x"`, `"y"` register constraints and
 arbitrary opcode bytes (used for the `pha;plb` bank-switch idiom).
 - C++ minimal: clang++ compiles a class with virtual + non-trivial
@@ -43,22 +51,41 @@ which runs correctly under MAME (apple2gs).
 C99 truncation semantics for snprintf. `%.Nf` produces the
 correct fractional digits with round-half-up.
 - qsort + bsearch over arbitrary element size with a user `cmp`
-callback (insertion-sort variant — sidesteps the greedy regalloc
-bug in the recursive iterative-qsort form).
+callback.
 - Standard string/stdlib glue: strcat, strncat, strpbrk, strspn,
 strcspn, atol, llabs (kept in their own translation unit so
 vprintf's branch layout doesn't shift).
 - `<math.h>`: fabs, floor, ceil, fmod, copysign, sqrt, pow,
-sin, cos, exp, log, atan, atan2, asin, acos, sinh, cosh, tanh
-(and float variants). Bit-twiddling for fabs/floor/ceil/copysign;
-Newton iteration for sqrt; range-reduction + Taylor for sin/cos/
-exp/log/atan; identities for asin/acos/atan2/sinh/cosh/tanh.
-Accuracy is in the ~1e-6 range — good enough for typical numeric
-work, far short of glibc-quality. These are slow (each call is
-dozens to hundreds of soft-double libcalls) — pre-compute or
-cache when possible.
+sin, cos, tan, exp, log, atan, atan2, asin, acos, sinh, cosh,
+tanh (and float variants). Bit-twiddling for fabs/floor/ceil/
+copysign; Newton iteration for sqrt; range-reduction + Taylor
+for sin/cos/exp/log/atan; identities for asin/acos/atan2/sinh/
+cosh/tanh. Accuracy is in the ~1e-6 range — good enough for
+typical numeric work, far short of glibc-quality. These are
+slow (each call is dozens to hundreds of soft-double libcalls)
+— pre-compute or cache when possible.
 - `setjmp` / `longjmp` from libgcc.s.
 - Static constructors via crt0's init_array walk.
+- `<stdio.h>` file I/O against an in-memory FS: `mfsRegister
+(path, buf, size, cap, writable)` stages a buffer as a named
+file; `fopen`/`fread`/`fwrite`/`fseek`/`ftell`/`fclose`/`fgetc`
+/`fgets`/`ungetc`/`fprintf` operate on it via a per-FILE
+(kind, buf, size, cap, pos, eof, err, unget) record. stdin/
+stdout/stderr route through `putchar` as before.
+- `<wchar.h>`: wcslen / wcscmp / wcsncmp / wcscpy / wcsncpy /
+wcscat / wcschr / wcsrchr; mbtowc / wctomb / mbstowcs /
+wcstombs / mblen with the trivial 1:1 byte<->wide mapping
+(Latin-1). wchar_t is 16-bit on this target.
+- `<signal.h>`: in-process signal table. signal() registers a
+handler; raise() invokes it. Default actions: SIGABRT calls
+abort(), SIGINT/SIGTERM call exit(128+sig), others ignored.
+- `<locale.h>`: setlocale always returns "C"; localeconv returns
+a fixed C-locale lconv struct.
+- C++ subset: classes, single inheritance, virtual functions,
+polymorphism via base-class pointer arrays, virtual dtors.
+Compile with `clang++ -fno-exceptions -fno-rtti`. Multiple
+inheritance with virtual bases, full RTTI, exceptions are
+out of scope.
 **Toolchain:**
@@ -67,23 +94,60 @@ which runs correctly under MAME (apple2gs).
 text/rodata/bss, emits a flat binary the IIgs ROM can load.
 Auto-relocates bss above text+rodata when the default
 `--bss-base 0x2000` would overlap text, and skips past the
-IIgs IO window ($C000-$CFFF) if needed.
+IIgs IO window ($C000-$CFFF) if needed. `--gc-sections`
+(default ON) drops unreachable functions: a minimal program
+with full runtime linked shrinks from ~43KB to ~1.5KB.
 - `tools/omfEmit` produces OMF v2.1 single-segment files (the IIgs's
 native object format) for round-tripping with classic dev tools.
+- `link816 --debug-out FILE` writes a DWARF sidecar with text/
+rodata/bss/init_array relocations applied to every `.debug_*`
+section, so `.debug_addr` / `.debug_line` PC values are final-
+image addresses.
 - `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
 libgcc into linkable objects.
-- `scripts/smokeTest.sh` runs 107 end-to-end checks (scalar ops,
-control flow, calling conventions, MAME execution, regressions,
-link816 bss-base safety + weak-symbol resolution +
-heap_end-vs-heap_start sanity, iigs/toolbox.h compile + link
-check (singles inlined, multi-arg wrappers in iigsToolbox.s),
-standalone runtime headers, AsmPrinter peepholes for STZ /
-PEA / PEI — single-STA, shared-LDA-multi-STA, and DPF0-
-forwarding cases — malloc/free coalesce ordering, plus
-real-world coverage tests: Conway's Game of Life blinker
-(2D loop + neighbour bounds), binary search tree (recursive
-struct + malloc), function-pointer dispatch table (indirect
-JSL via `__jsl_indir`). Currently 100% pass at -O2 throughout.
+- `scripts/smokeTest.sh` runs 113 end-to-end checks at -O2:
+scalar ops, control flow, calling conventions, MAME execution
+regressions, link816 bss-base safety + weak-symbol resolution +
+heap_end-vs-heap_start sanity, iigs/toolbox.h compile + link,
+standalone runtime headers, AsmPrinter peepholes (STZ / PEA /
+PEI — single-STA, shared-LDA-multi-STA, DPF0-forwarding),
+malloc/free coalesce ordering, plus real-world coverage:
+Conway's Game of Life blinker (2D loop + neighbour bounds),
+binary search tree (recursive struct + malloc), function-pointer
+dispatch table (indirect JSL via `__jsl_indir`), memory-backed
+file I/O (mfsRegister + fopen/fread/fwrite/fseek/fprintf), C++
+polymorphism (single inheritance + virtual functions), wchar /
+signal core APIs, hex dumper writing through fprintf, JSON
+tokenizer state machine, scripts/bench.sh size-vs-Calypsi
+harness. 100% pass.
+- `scripts/bench.sh` compiles a microbenchmark suite with both
+clang (this toolchain) and Calypsi cc65816, comparing emitted
+text-section size. Current ratio: ~2.2x (clang generates more
+bytes than Calypsi on average; sumOfSquares is the worst case
+at 6.45x because of __mulsi3 dispatch). Eight benchmarks
+shipped under `benchmarks/`.
+**Backend register allocation:**
+- Greedy regalloc as default at -O1+; fast at -O0/optnone.
+- Pre-RA passes: `WidenAcc16` (Acc16→Wide16 promotion, lets
+greedy spread i16 pressure across A and 16 IMG slots);
+`TiedDefSpill` (handles tied-def-multi-use hazard);
+`ABridgeViaX` (bridges via X/Y when free).
+- Post-RA passes: `SpillToX` (STA/LDA pairs → TAX/TXA bridges
+when X dead); `StackSlotCleanup` (deletes redundant adjacent
+spills); `NegYIndY` (rewrites negative-Y indirect-Y stack-rel
+ops to avoid the 24-bit-add bank-cross).
+- Pre-emit: `BranchExpand` (long Bxx → INV_Bxx skip; BRA target);
+`SepRepCleanup` (coalesces adjacent SEP/REP toggles, plus a
+cross-mode-neutral coalesce that drops REP/SEP pairs sandwiching
+X-flag-only ops, branches, transfers — saves 4B / 12cyc per
+collapse). AsmPrinter LDAi8imm peephole walks past mode-neutral
+MIs to fuse the closing REP into a following SEP.
+- Imaginary registers IMG0..IMG15 backed by DP $C0..$CE +
+$D0..$DE — gives greedy 17 effective i16 carriers (A + 16 IMG)
+before stack spills kick in.
 **ABI:**
@@ -95,459 +159,83 @@ which runs correctly under MAME (apple2gs).
 - Frame is empty-descending (S points to next-free); offsets account
 for the +1 skew vs LLVM's full-descending model.
+**IIgs toolbox:**
+- `iigs/toolbox.h` — autogenerated wrappers for all ~1300 IIgs
+toolbox routines across 35 tool sets (Tool Locator, Memory
+Manager, Misc Tools, QuickDraw II / Aux, Event Manager,
+Sound Manager, Apple Desktop Bus, SANE, Integer Math, Text
+Tools, Window Manager, Menu Manager, Control Manager,
+LineEdit, Dialog Manager, Scrap Manager, Standard File,
+Note Synth/Sequencer, Font Manager, List Manager, ACE,
+Resource Manager, MIDI, Video Overlay, TextEdit, Media
+Control, Print Manager, Scheduler, Desk Manager, …). Names
+match Apple's IIgs Toolbox Reference exactly (TLStartUp,
+MMStartUp, NewWindow, SysBeep, …). 417 simple wrappers
+(zero/single-arg, i16-or-void return) inline in the header;
+890 multi-arg ones live in `runtime/src/iigsToolbox.s`.
+Generated by `scripts/genToolbox.py` from ORCA-C's
+`ORCACDefs/` (re-runnable when ORCA-C updates).
 ## In flight
Two open bugs tracked:
1. **#107 — strtok / qsort -O1+ miscompile — RESOLVED.** Three
independent issues across the backend, runtime, and linker;
all fixed.
**Fix 1 (W65816StackSlotCleanup cross-MBB):** Pass -4 /
Pass -4c collapsed `LDA fs.X; STA stk.Y; ... LDA_indY stk.Y`
patterns with only an MBB-local safety check, missing cross-MBB
readers of stk.Y. Greedy regalloc had spilled an in-place INA
result back to stk.Y; eliminating the bb.3 init store left the
bb.10 reload reading garbage. Function-wide cross-MBB check
added.
**Fix 2 (W65816SepRepCleanup LDAi8imm hoist):** Pre-pass that
relocates LDAi8imm BEFORE byte-store SEP/REP wraps. LDAi8imm
expands at AsmPrinter to its own SEP+LDA8+REP that toggles M;
the post-RA scheduler was moving it INSIDE an STBptr wrap, so
the LDAi8imm's REP fired BEFORE the byte STA. The STA then
ran in M=16, writing 2 bytes of zero and clobbering the next
byte. Hoist puts the toggle in the outer M=16 zone, leaving
the byte STA in M=8.
**Fix 3 (link816 bss-base safety + strtok_r noinline):** With
the backend fixes, -O2 strtok grew large enough that the
strtok() wrapper inlining (~290 extra bytes) pushed the
binary's text+rodata past 0xC000 (IIgs IO window). Reads of
string literals or stdio handles in that range hit IO
registers and corrupted execution. Two complementary fixes:
`__attribute__((noinline))` on `strtok_r` so the wrapper
doesn't duplicate it (-O2 strtok.o now 1564B, was 2156B);
link816 auto-relocates bss above text+rodata when default
`--bss-base 0x2000` would overlap, and skips past the IO
window if needed.
strtok.c now compiles at -O2 with everything else. Smoke
#84 (4-call strtok continuation) and #92 (recursive parser)
both pass. Workaround comments in build.sh / smokeTest.sh
removed.
The `__attribute__((noinline,optnone))` defenses on iterative
qsort / RPN `runAll` / expression-parser `runAll` were
subsequently dropped; the smoke now compiles them at plain
`-O2` without escape hatches.
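The bss placement rule described under Fix 3 can be sketched in C as below. This is a hypothetical illustration, not link816's actual code: the names `chooseBssBase` and `rangesOverlap` and the half-open range convention are assumptions; only the two behaviours (relocate above text+rodata on overlap, then skip the $C000-$CFFF IO window) come from the notes above.

```c
#include <stdint.h>

#define IO_WINDOW_START 0xC000u
#define IO_WINDOW_END   0xD000u   /* first address past $CFFF */

/* True when [aStart, aEnd) and [bStart, bEnd) share at least one byte. */
static int rangesOverlap(uint32_t aStart, uint32_t aEnd,
                         uint32_t bStart, uint32_t bEnd) {
    return aStart < bEnd && bStart < aEnd;
}

/* Hypothetical helper: picks a bss base given the requested base and the
   [imageStart, imageEnd) span occupied by text+rodata. */
uint32_t chooseBssBase(uint32_t requestedBase, uint32_t bssSize,
                       uint32_t imageStart, uint32_t imageEnd) {
    uint32_t base = requestedBase;
    if (rangesOverlap(base, base + bssSize, imageStart, imageEnd)) {
        base = imageEnd;            /* relocate above text+rodata */
    }
    if (rangesOverlap(base, base + bssSize, IO_WINDOW_START, IO_WINDOW_END)) {
        base = IO_WINDOW_END;       /* skip past the IO window */
    }
    return base;
}
```

With the default base of 0x2000 and an image ending at 0x3000, the sketch returns 0x3000; an image ending just below $C000 pushes bss to $D000.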
The W65816 backend assembler now supports all common indirect
addressing modes (`(dp)`, `(dp),Y`, `(dp,X)`, `(d,s),Y`,
`[dp]`, `[dp],Y`, and `JMP (abs)`). All `.byte` opcode hacks in
the runtime have been removed in favour of the mnemonics. The
disassembler decodes them too.
Runtime now exposes a near-complete C99 subset: sprintf/snprintf
with correct %.Nf precision, qsort/bsearch,
the full string.h family (strcat/strncat/strpbrk/strspn/strcspn/
strtok/strtok_r), math.h with the thirteen common transcendentals
(sqrt/pow/sin/cos/exp/log/atan/atan2/asin/acos/sinh/cosh/tanh),
atol/llabs/atexit/exit/abort, and a smoke test that exercises
malloc + struct pointers + strcmp/strcpy via a working hash table
end-to-end in MAME.
`strtok` / `strtok_r` live in their own TU at `-O2` (with
`__attribute__((noinline))` on `strtok_r` so the strtok() wrapper
doesn't duplicate it). Multi-call strtok over "a,b,,c" works
end-to-end in smoke. The layout-sensitive miscompile that
previously haunted strtok_r's inner CMP loop has been fixed by
modelling `Uses=[P]` on the conditional branches (the LICM/sink
interaction that elided "redundant" CMPs no longer fires); no
surgical workaround flags needed.
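For reference, the multi-call pattern over "a,b,,c" looks roughly like the sketch below. It uses only standard `strtok`; the helper name `splitCommas` is made up for illustration and is not taken from the smoke test. Note that standard `strtok` collapses adjacent delimiters, so the empty field between the two commas is not reported.

```c
#include <string.h>

/* Splits text in place on commas and stores up to maxTokens token
   pointers in out; returns the number of tokens stored. */
int splitCommas(char *text, char **out, int maxTokens) {
    int count = 0;
    for (char *tok = strtok(text, ",");
         tok != NULL && count < maxTokens;
         tok = strtok(NULL, ",")) {
        out[count++] = tok;     /* points into the caller's buffer */
    }
    return count;
}
```

The input must be a writable array, since `strtok` overwrites each delimiter it consumes with a NUL.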
A small **RPN calculator** test (smoke #87) chains strtok, atol,
push/pop over a static stack, snprintf "%ld", and strcmp to verify
the end-to-end composition under a realistic-ish workload — adds,
subs, muls, divs, and 3-deep operand stacks all work.
**setjmp / longjmp** (smoke #88) now work end-to-end: setjmp saves
SP / 24-bit ret addr / DP, longjmp restores them and returns the
val argument as setjmp's "second return". Required two fixes:
(a) the W65816 assembler had no instruction definition for
`(dp)` / `(dp), y` / `(dp, x)` indirect addressing modes, so the
mnemonic forms silently fell through to absolute-,Y opcodes —
fixed in `src/llvm/lib/Target/W65816/W65816InstrFormats.td` +
`W65816InstrInfo.td` + `AsmParser/W65816AsmParser.cpp` (the runtime
.byte hacks have been replaced with mnemonics); (b) added
`__attribute__((returns_twice))` to the setjmp declaration so the
optimizer doesn't constant-fold post-setjmp env reads to 0.
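The "second return" behaviour can be exercised with a few lines of standard C. This is a generic sketch, not the smoke #88 source; `gEnv`, `BAIL_CODE`, `deepHelper`, and `bailedOut` are names invented here. The `setjmp` call is compared against an integer constant directly in the `if` condition, one of the usage forms the C standard permits.

```c
#include <setjmp.h>

enum { BAIL_CODE = 7 };

static jmp_buf gEnv;

/* Unwinds straight back to the setjmp site; never returns normally. */
static void deepHelper(void) {
    longjmp(gEnv, BAIL_CODE);
}

/* Returns 1 if control came back through longjmp, 0 on the direct path. */
int bailedOut(int shouldBail) {
    if (setjmp(gEnv) == BAIL_CODE) {
        return 1;               /* second return: setjmp yielded BAIL_CODE */
    }
    if (shouldBail) {
        deepHelper();
    }
    return 0;
}
```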
**CRC32** (smoke #89) verifies the standard "123456789" → 0xCBF43926
end-to-end — exercises uint32_t shifts, XORs, char-by-char loops.
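One way to reproduce that check value is a table-free bitwise CRC-32 with the reflected polynomial 0xEDB88320. The sketch below is an assumption about the general shape of such a test, not the smoke #89 source, and `crc32Bitwise` is a name chosen here.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, initial value
   0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32Bitwise(const char *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint8_t)data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Calling `crc32Bitwise("123456789", 9)` yields 0xCBF43926, the standard check value quoted above.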
**Brainfuck interpreter** (smoke #90) executes a small bf program
and verifies the output bytes — exercises loop bracket matching,
pointer math (data pointer), branching on cell value.
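A compact interpreter of that kind might look like the sketch below. It is illustrative only: the tape size, the wrap-around data pointer, the absence of `,` input, and the name `bfRun` are all assumptions rather than details of smoke #90.

```c
#include <stddef.h>
#include <string.h>

#define BF_TAPE_SIZE 256

/* Runs a Brainfuck program without input. Bytes emitted by '.' go to out
   (at most outCap - 1 of them, NUL-terminated). Returns bytes written. */
size_t bfRun(const char *prog, char *out, size_t outCap) {
    unsigned char tape[BF_TAPE_SIZE];
    size_t dp = 0, written = 0;
    memset(tape, 0, sizeof tape);

    for (size_t pc = 0; prog[pc] != '\0'; pc++) {
        switch (prog[pc]) {
        case '>': dp = (dp + 1) % BF_TAPE_SIZE; break;
        case '<': dp = (dp + BF_TAPE_SIZE - 1) % BF_TAPE_SIZE; break;
        case '+': tape[dp]++; break;
        case '-': tape[dp]--; break;
        case '.':
            if (written + 1 < outCap) out[written++] = (char)tape[dp];
            break;
        case '[':
            if (tape[dp] == 0) {            /* skip to the matching ']' */
                int depth = 1;
                while (depth > 0 && prog[pc + 1] != '\0') {
                    pc++;
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':
            if (tape[dp] != 0) {            /* jump back to the matching '[' */
                int depth = 1;
                while (depth > 0 && pc > 0) {
                    pc--;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default:
            break;                          /* other characters are comments */
        }
    }
    if (outCap > 0) out[written] = '\0';
    return written;
}
```

For example, `++++++++[>++++++++<-]>+.` builds 8 * 8 + 1 = 65 in the second cell and emits the byte `A`.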
**Recursive-descent expression parser** (smoke #92) evaluates
"3+4", "2*3+4", "2+3*4", "(3+4)*5", "100/4-5*2+1" with proper
operator precedence and parentheses — exercises mutual recursion,
char-by-char tokenization, and integer arithmetic in concert.
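The grammar such a test exercises can be sketched with three mutually recursive functions. The code below is a generic illustration with invented names (`evalExpr`, `gCursor`, and so on); it assumes well-formed input without whitespace and is not the smoke #92 source.

```c
#include <ctype.h>

static const char *gCursor;     /* next unread character of the input */

static long parseExpr(void);

/* factor := number | '(' expr ')' */
static long parseFactor(void) {
    long value = 0;
    if (*gCursor == '(') {
        gCursor++;
        value = parseExpr();
        if (*gCursor == ')') gCursor++;
        return value;
    }
    while (isdigit((unsigned char)*gCursor)) {
        value = value * 10 + (*gCursor - '0');
        gCursor++;
    }
    return value;
}

/* term := factor (('*' | '/') factor)* */
static long parseTerm(void) {
    long value = parseFactor();
    while (*gCursor == '*' || *gCursor == '/') {
        char op = *gCursor++;
        long rhs = parseFactor();
        if (op == '*') value *= rhs;
        else if (rhs != 0) value /= rhs;   /* sketch: divide-by-zero is skipped */
    }
    return value;
}

/* expr := term (('+' | '-') term)* */
static long parseExpr(void) {
    long value = parseTerm();
    while (*gCursor == '+' || *gCursor == '-') {
        char op = *gCursor++;
        long rhs = parseTerm();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

long evalExpr(const char *text) {
    gCursor = text;
    return parseExpr();
}
```

Precedence falls out of the call structure: `parseExpr` only sees complete terms, so "2+3*4" evaluates to 14 and "(3+4)*5" to 35.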
The **DWARF sidecar** (`link816 --debug-out FILE`) now applies
text/rodata/bss/init_array relocations to every `.debug_*` section
before writing it. PC values in `.debug_addr` and `.debug_line` end
up as final-image addresses, so a consumer can map back to source
lines without re-running the linker. Intra-debug references (e.g.
`.debug_info` -> `.debug_str` offsets) are intentionally left
object-local — sections are concatenated, not recompacted, and each
slice carries an `; OBJ ... SEC ... SIZE ...` header so a multi-TU
consumer can scope intra-debug offsets per-slice. The smoke test
verifies the address of a known function appears in the patched
sidecar bytes.
## Known issues / workarounds
- **(d,s),y / (sr,s),y addressing wraps the bank** when Y is
negative as 16-bit unsigned. Worked around by `W65816NegYIndY`
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
correct for negative offsets like `arr[i-1]`.
- **Pointer-deref bank policy is now split-by-syntax** (FIXED):
`*p` (where `p` is a runtime pointer / local-or-arg vreg) lowers
via `LDAptr / STAptr / STBptr` to `[$E0],Y` indirect-LONG with
the bank byte at `$E2` forced to 0 — DBR-independent. The
`*(volatile uint16 *)0x5000 = v` MMIO idiom (const-int pointer)
is matched by a separate TableGen pattern that lowers straight
to `STAabs` (DBR-relative) so the smoke tests' bank-2 write
path still works. Two tracked issues this resolved:
(a) PHI-elim was eliding the inserter's `COPY $a = ptr_vreg`
when the loop body had multiple Acc16 PHIs competing for A —
the inserter now spills the pointer to a fresh stack slot and
reloads via LDAfi to keep RA honest; sumTable now correct.
(b) pointer staging through `[$E0]` is bank-0 only, so
switchToBank2 + helper-with-local-ptr no longer corrupts data
in the wrong bank. See `feedback_dbr_ptr_deref_spill.md`.
 - **Greedy regalloc fails on long-arg call chains** — a function
 that strings ~7+ independent `helper(longArg1, longArg2)` calls
-overflows greedy at -O1+ ("ran out of registers during register
-allocation"). Same root issue as softDouble's old -O2 hold-out.
-Threshold raised somewhat by expanding IMG slots from 8 to 16
-(now backed by DP $C0..$DE) — most "normal-looking" mixed-arity
-workloads now compile, but pathological pressure (many i32+ args
-+ bitmask SETCC chain) still fails. Workarounds (in order of
-preference): mark the heaviest helper `__attribute__((noinline))`
-to reduce caller pressure; `-mllvm -regalloc=fast` for that TU;
-or `__attribute__((optnone))` on the affected function. A proper
-fix needs either a custom greedy→fast fallback in
-`W65816TargetMachine::createTargetRegisterAllocator` or a smarter
-spill-placement pre-RA pass.
-- **Bank-0 size limit (~48KB)** — the runtime + program must fit in
-$1000-$BFFF (text+rodata) plus $D000-$DFFF (LC1 for rodata-spill
-and BSS). Past that, link816 hard-fails because text would
-cross the IO window. In practice this is rarely hit now that
-link816 has `--gc-sections` (default ON, see Recently Fixed)
-which drops unreachable functions: a minimal program shrinks
-from ~43KB (whole runtime) to ~1.5KB. Programs that genuinely
-use most of the runtime can still hit the limit.
+overflows greedy at -O1+ with "ran out of registers during
+register allocation". IMG slot expansion (8→16) raised the
+threshold; most "normal-looking" mixed-arity workloads now
+compile, but pathological pressure (many i32+ args + bitmask
+SETCC chain in one function) still fails. Workarounds: mark
+the heaviest helper `__attribute__((noinline))`; or
+`-mllvm -regalloc=fast` for that TU; or `__attribute__((optnone))`
+on the affected function. Proper fix needs either a custom
+greedy→fast fallback in
+`W65816TargetMachine::createTargetRegisterAllocator` or a
+smarter spill-placement pre-RA pass.
+- **`time()` / `clock()` are stubs** returning 0. ReadTimeHex
+(Misc Tool $0D03) needs the Tool Locator initialised in crt0
+to not crash MAME; the VBL counter at $E1006B needs 24-bit
## Recently fixed
- **DBR pointer-deref RA elision (sumTable miscompile):** the
`LDAptr / STAptr / STBptr` inserter's first-thing
`COPY $a = ptr_vreg` was being elided by RA when the loop body
had multiple Acc16 PHIs competing for A. PHI-elim silently
dropped the COPY needed to refresh A with the pointer at the
top of each iteration; sumTable's inner loop did `STA $E0`
while A held the accumulator. Fix: spill the pointer to a
fresh stack slot via `STAfi` and reload via `LDAfi` — forces
RA to materialize the value through real machine ops. See
`feedback_dbr_ptr_deref_spill.md`.
- **softDouble.c -O2 hold-out** — with the DBR fix in place,
`dclass` can be `noinline` (its pointer-arg writes go through
`STBptr / STAptr` which now use `[$E0],Y` indirect-long with
bank=0). Drops register pressure in `__muldf3 / __divdf3 /
__adddf3` enough that greedy regalloc no longer runs out. All
three smoke build sites moved from `-O1` to `-O2`.
- **IMG slot count doubled (8 → 16)** — Img16 / Wide16 register
classes now hold IMG0..IMG15, backed by DP $C0..$CE + $D0..$DE.
Reduces greedy regalloc spills for moderately-busy functions.
Existing `IMG0..IMG7 → $D0..$DE` mapping unchanged so smoke
tests that assume specific DP carriers (e.g. DPF0 at $F0) still
work. User app DP is now $00..$BF (was $00..$CF).
- **Real-world smoke coverage added** — Conway's Game of Life
blinker (2D arrays + neighbour bounds), binary search tree
(recursive struct + malloc), function-pointer dispatch table
(indirect-JSL via `__jsl_indir`). Total smoke tests at 107.
- **iigs/toolbox.h expanded** — from 4 stubs to 18+ wrappers
across Tool Locator, Memory Manager, Misc Tools, QuickDraw II,
Event Manager, Window Manager, plus GS/OS Quit. Multi-arg
wrappers live in `runtime/src/iigsToolbox.s` (the backend's
inline-asm constraints can't take memory operands); single-arg
ones stay inline.
- **#70 — iterative qsort -O2 miscompile** — `W65816StackSlotCleanup`
Pass -2 was deleting a store to a slot the loop body read.
Function-wide `slotHasOtherRefs` safety check added (Pass -1 and
Pass -2c hardened with the same pattern). Iterative qsort at
plain -O2 + greedy now compiles correctly; the `optnone` workaround
in smoke #70 was removed.
- **strtok -O2 layout-sensitive miscompile** — modelling `Uses=[P]`
on the conditional branches (BEQ/BNE/BCS/BCC/BMI/BPL/BVS/BVC) made
MachineCSE / scheduler / LICM / sink see the CMP→Bxx flag
dependency. An entire class of layout-sensitive flag-corruption
bugs went away; verified by sweeping `--rodata-base` from text-end
to text-end+300 in 13 increments — every layout returns the correct
strtok result. As a follow-on, MachineCSE has been re-enabled
(was previously disabled in
`W65816TargetMachine::addMachineSSAOptimization` as a workaround
for the same root cause).
- **link816 silently produced 4.3GB binaries** when `--rodata-base`
was set inside the text region. Now dies with a clear error:
`--rodata-base 0xX overlaps text 0xY+N (must start at or after 0xZ)`.
- **link816 BSS-relocate landed in IIgs Language Card area:**
when text+rodata grew past $C000, link816 placed BSS at $D000
(the LC1 area), where IIgs-by-default maps ROM (writes drop
silently, reads return ROM bytes). Globals never initialised;
caught by the expression-parser smoke (#92) when adding rand /
strnlen / etc. pushed the runtime past that threshold. Two-part
fix: crt0 now enables LC1 RAM via the standard `lda $C083`
read-twice trick at startup, and link816 hard-fails (rather
than silently corrupt) if BSS would exceed the LC1 ceiling
($E000) — past that you'd need crt0 to also enable LC2 / shadow
RAM, which we haven't wired up.
- **STZ peephole multi-STA latent miscompile** — AsmPrinter's
`LDA #0; STA $g` -> `STZ $g` peephole eliminated the LDA but
only consumed the FIRST `STA`. When SDAG-CSE shared one
`LDA #0` across multiple `STA`s (`g16=0; g32=0;` is one IR
shape), trailing `STA`s read whatever was in A on entry —
silently corrupting any global where A wasn't 0 at function
entry. Smoke happened to pass because A was 0 by luck in
every covered path. Fixed by gating the peephole on the
consuming `STA` killing A (regalloc only sets `killed` on the
last reader); smoke #98 added to lock the multi-STA case.
- **PEI AsmPrinter peephole** — new: `LDA $dp; PHA` -> `PEI $dp`
saves 1 byte and avoids touching A. Fires on the
`copyPhysReg(A=DPF0); PUSH16` pattern (i64-libcall return-value
forwarding into the next call's stacked args), which appears
in every chained soft-double / soft-int64 expression. Saves
68 bytes across the runtime (-64 in math.o alone). Same
next-instruction-modifies-A safety check as the PEA peephole.
Smoke #99 added.
- **PEA peephole opcode-allowlist replaced with `modifiesRegister`:**
the next-after-PUSH16 check that gates the PEA peephole was a
hand-curated list of opcodes that obviously redefine A; switched
to `MachineInstr::modifiesRegister(A, TRI)` which also catches
implicit-defs (e.g. JSL clobbering A as part of the call ABI).
Saves a few bytes and is more robust.
- **libgcc.s `lda #0; sta $XX` -> `stz $XX`** — 7 sites converted
in libgcc.s after STZ landed in the assembler. Saves 28 bytes;
also removes two PHA/PLA save-restore wraps around the LDA #0
(STZ doesn't touch A, so the wraps are unnecessary).
- **libgcc.s `lda dp; pha` -> `pei dp`** — 2 sites in __divhi3 /
__modhi3 where the loaded A is dead after the push. PEI
doesn't touch A, saves 1 byte each.
- **W65816StackSlotCleanup Pass 1c skip-list extended** — added
STAabs / STA8abs / STAptr / STBptr / STAptrOff / STBptrOff and
ADJCALLSTACKDOWN to the A-transparent list. Lets the redundant-
CMP-after-A-modifier elimination see through more pseudo
stores and the call-stack-down pseudo. Saves 8 bytes in math.o.
(ADJCALLSTACKUP is NOT transparent — when PEI doesn't process
it, AsmPrinter emits a TSC/CLC/ADC/TCS that clobbers A.)
- **crt0.s `lda #0; sta` -> `stz`** — IRQ-disable block and the
BSS-zero loop both used `.byte 0xa9, 0x00 ; sta` raw-byte
workarounds for `lda #0` (the assembler emits a 16-bit immediate
in M=8, mis-encoding it). `stz` works in M=8 (stores 1 byte) and
doesn't touch A — both `.byte` workarounds removed; saves 4 bytes
in crt0.o.
- **Runtime correctness pass — five real bugs fixed:**
- `free()` coalesce: when a freed block was absorbed into a
lower-address neighbour (`bEnd == a` path), the absorbed entry
was left in the free list overlapping the extended one. A
follow-on malloc could hand out the same memory to two
callers. Fix: track outer-loop predecessor and excise the
absorbed entry. Smoke #100 added.
- `sqrt(-0.0)` returned NaN; should return -0.0 per IEEE-754.
The sign-bit check fired before the zero check. Fix: mask
sign bit when testing for zero.
- `log(0)` returned NaN; should return -Infinity (pole error).
Same sign-bit-vs-zero ordering issue; both ±0 now return
`-1.0/0.0`.
- `snprintf(buf, 0, ...)` wrote `'\0'` to `buf[-1]` (one byte
BEFORE the buffer). C99 says n=0 must not touch the buffer.
Fix: set `gEnd = NULL` for n=0 so neither the normal nor the
truncation NUL-write path fires. Smoke #76 extended.
- `malloc(>~32KB)` and `calloc(n, m)` had silent integer overflow
on size_t (16-bit), wrapping to small values and handing out
tiny allocations claiming huge sizes. Bumped malloc to bail
above 0x7FF0 (heap is at most ~32KB anyway) and made calloc
overflow-check before multiplying.
- **Removed** dead `runtime/src/softDouble.s` (a stub from before
`softDouble.c` was implemented; the build script doesn't reference
it but it was confusing to leave around).
- **inttypes.h PRId64 / PRIu64 / PRIx64** documented as
unsupported in the runtime's printf — the macros expand to
`"lld"`/`"llu"`/`"llx"` but the formatter only knows the `l`
length modifier, not `ll`, so the format prints literally and
the va_list misaligns. Use `PRId32` etc. for now.
- **More runtime fixes (round 2):**
- `fputs(s, stream)` was forwarding to `puts(s)`, which appends a
newline. C says fputs MUST NOT add one. Direct char-by-char
write now.
- `exit(code)` never invoked the registered `atexit` handler.
C99 7.20.4.3 requires it. Now runs the single-slot handler
(with re-entry guard) before the BRK.
- `printf("%f", -0.0)` printed `0.000000` instead of `-0.000000`
because `if (v < 0)` (a `__ltdf2` call) returns false for
negative zero. Switched to the IEEE-754 sign-bit test that
snprintf already uses.
- `vfprintf` was missing entirely (neither declared in stdio.h
nor implemented). Added a thin wrapper around vprintf.
- **link816 weak-symbol resolution:** the linker previously used
"last def wins" with no regard for STB_GLOBAL vs STB_WEAK. When
a user provided a strong override of a weak libc stub (e.g.
`putchar`), it worked only by link-order luck — reversing the
order let the weak stub silently overwrite the strong def.
Now properly: strong over weak (any order), strong + strong
errors out, weak + weak picks the first. Smoke #100 added.
- **More runtime fixes (round 3):**
- `writeHex` / `emitHex` had a stack buffer overrun
(`char buf[5]` but `printf("%08x", ...)` would write 8 bytes).
On 16-bit `unsigned int`, max useful width is 4 — buf shrunk
to 4 and width is now capped.
- `writeDec` / `writeSignedLong` / `emitDec` / `emitSignedLong`
used `-n` on signed input, which overflows for INT_MIN /
LONG_MIN (UB). All four switched to unsigned-negation
(`0u - (unsigned)n`) for correctness and to keep an
optimizer-aware compiler from exploiting the UB.
- `atoi` / `atol` / `strtol` / `strtoul` likewise built the
parsed magnitude in a signed accumulator and negated at the
end — same UB on the boundary value. All switched to
unsigned magnitude + unsigned-negation cast.
- `link816 parseInt` / `omfEmit parseInt` silently truncated
addresses > 24 bits to `uint32_t` low bits — `--text-base
0x100000000` would silently wrap to 0. Both now reject
out-of-range addresses with a clear error.
- **More runtime fixes (round 4):**
- `pow(x, y)` computed `n = -n` for the integer-y branch when
yi was INT_MIN (-32768); same signed-overflow UB pattern as
the print functions. Switched to unsigned magnitude.
- Added `perror(prefix)` — was missing from the runtime; common
pattern in portable code that reports I/O failure via
`errno + strerror`. Declared in stdio.h, implemented as
char-by-char emit through putchar (no fprintf dependency).
- **link816 `__heap_end` was hardcoded at $BF00**, ignoring where
`__heap_start` actually ended up. When BSS got auto-relocated
into LC1 ($D000+), heap_start ended up > heap_end and malloc
immediately returned NULL on every call — silently bricking any
program that allocated dynamic memory after the runtime grew
past the default-bss threshold. Heap_end now picks
$BF00 / $E000 based on where heap_start lands (and skips the IO
window if heap_start would have landed in $C000-$CFFF).
Smoke #102 added.
- **link816 rodata auto-skips IIgs IO window** ($C000-$CFFF). When
text+rodata grew past 0xC000 the rodata bytes silently corrupted
at runtime — string literals in the IO range read back as
hardware register values, breaking strcmp / strstr / printf / etc.
Now: rodata that would land in or cross $C000-$CFFF auto-skips
to $D000. Init_array gets the same treatment. Text that would
cross IO is hard-rejected at link time (no auto-fix possible —
PC fetches in IO would read hardware registers). This was the
root cause of the "tan/tanf triggers layout-sensitive failure"
symptom listed in older STATUS notes.
- **runInMame skips writes to IO window** during the binary load.
Without this, the zero-padding in the rodata-skip gap would
clobber soft switches (e.g. the LC1 RAM enable that crt0 sets
via $C083) when the loader naively wrote the entire image
byte-by-byte to memory.
- **link816 `--gc-sections` (default ON)** — discards sections not
reachable from the entry point (`__start` / `_start` / `main`
for the canonical crt0 setup) plus all `.init_array` sections.
Built on `-ffunction-sections` so each function is in its own
section. A minimal program with full runtime linked shrinks
from ~43KB to ~1.5KB. Adding `tan/tanf` to math.c (which
caused the latent layout-sensitive failure described above)
no longer pushes any test past the bank-0 limit. Tests that
intentionally check unreachable symbols pass `--no-gc-sections`
to opt out.
- **`fwrite(..., stdout)` was a stub returning 0** even though
`stdout` has a working `putchar` route. Now actually writes
through `putchar` for stdout/stderr (only). Also gained the
same `size * nmemb` overflow guard as `calloc`.
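The signed-negation fixes listed above (print helpers, `atoi` family, `pow`) share one pattern: convert to unsigned first, then negate. A minimal sketch of that pattern follows; the helper name `magnitudeOf` is hypothetical and is not a function in the runtime.

```c
#include <limits.h>

/* Hypothetical helper, not a runtime function: returns |n| as an
 * unsigned value without ever evaluating -n on a signed int. */
unsigned int magnitudeOf(int n) {
    if (n >= 0) return (unsigned int)n;
    /* (unsigned int)n is well defined (it wraps modulo UINT_MAX + 1),
     * and unsigned subtraction cannot overflow, so INT_MIN is safe. */
    return 0u - (unsigned int)n;
}
```

For `INT_MIN` the result is `(unsigned int)INT_MAX + 1u`, which is the magnitude a signed `-n` cannot represent.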
## What's still needed for a "ship-ready" toolchain
- **softDouble.c -O2 — FIXED.** Marking `dclass` noinline (in
addition to `dpack`) drops register pressure in `__muldf3`/
`__divdf3`/`__adddf3` enough that greedy regalloc no longer
runs out. The previous blocker was that noinline-dclass would
write through pointer args via the DBR-relative `(d,s),y` mode
and corrupt caller data after a bank switch — that path now
goes through `STAptr/STBptr` which use `[$E0],Y` indirect-long
with the bank byte forced to 0, so DBR is irrelevant. All
three smoke build sites moved to `-O2`.
- **More of the C standard library**: real `<stdio.h>` file I/O
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
returning success/zero) — would need a memory-backed FS or a
MAME hook. `<locale.h>` / `<signal.h>` / `<time.h>` are stubbed
(compile and return safe defaults). `<wchar.h>` mostly absent.
A `time()` impl wired to ReadTimeHex (Misc Tool $0D03) was
attempted but crashes MAME without the Tool Locator initialised
in crt0; `clock()` via VBL counter at $E1006B needs 24-bit
   far-pointer support that the backend doesn't yet model.
-- **C++ runtime support**: vtable layout for multiple inheritance,
-  RTTI, exceptions (or a documented `-fno-exceptions` requirement).
+- **`(d,s),y / (sr,s),y` addressing wraps the bank** when Y is
+  negative as 16-bit unsigned. Worked around by `W65816NegYIndY`
+  rewriting the affected ops to `TAX ; LDA/STA $0000,X`. The
+  workaround stays correct for negative offsets like `arr[i-1]`
+  but the underlying issue is unfixed at the addressing-mode
+  level.
-- **REP/SEP scheduling pass** (design doc §3.3): the current
-  prologue picks one M-mode for the whole function based on
-  whether any 8-bit accumulator value is used. A per-region
-  scheduler would reduce the SEP/REP wrap overhead on i8 stores.
+- **Bank-0 size limit (~48KB)**: the runtime + program must fit
+  in $1000-$BFFF (text+rodata) plus $D000-$DFFF (LC1 for rodata-
+  spill and BSS). Past that, link816 hard-fails because text
+  would cross the IO window. In practice rarely hit thanks to
+  `--gc-sections`, but programs that genuinely use most of the
+  runtime can still trip it. Future work: enable LC2 / shadow
+  RAM via crt0 to add ~16KB more.
-- **Toolbox / IIgs system call bindings**: `iigs/toolbox.h` covers
-  the common entry points across Tool Locator, Memory Manager,
-  Misc Tools, QuickDraw II, Event Manager, Window Manager, plus
-  GS/OS Quit. Multi-arg wrappers (NewHandle, QDStartUp, MoveTo,
-  EMStartUp, GetNextEvent, NewWindow, CloseWindow) live in
-  `runtime/src/iigsToolbox.s` because the backend's inline-asm
-  constraints can't take memory operands. Single-arg / no-arg
-  wrappers stay inline. More routines (Menu Manager, Dialog
-  Manager, Standard File, Sound) still TBD.
+## Yet to come
-- **Real-world program coverage**: the smoke tests are
-  microbenchmarks. A few known-good Apple IIgs C programs (e.g.
-  a textfile pager, a small game) compiled and run end-to-end
-  would catch issues no synthetic test currently exercises.
+- **GS/OS-backed `<stdio.h>` file I/O**: current FS is
+  memory-backed (programs `mfsRegister` buffers as files). A
+  GS/OS backend would let programs see the real ProDOS volume
+  during MAME execution, but needs Tool Locator init in crt0
+  and a class-1 parm-block dispatch wrapper around $E100A8.
-- **Cycle-time / size benchmarks vs Calypsi 5.16**: design doc §1
-  says the goal is to "match or exceed" Calypsi. We have neither
-  baseline numbers nor a comparison harness yet.
+- **C++ exceptions / RTTI / multiple inheritance with virtual
+  bases**: only the `-fno-exceptions -fno-rtti` subset is
+  supported. `__cxa_throw` etc. would need an unwind ABI on
+  this target plus a personality routine.
+- **Close the size gap to Calypsi**: `scripts/bench.sh`
+  shows clang at ~2.2x Calypsi text size on the included
+  microbenchmarks, with sumOfSquares as the worst case (6.45x)
+  due to __mulsi3 dispatch overhead. Targeted improvements:
+  inline 16x16->32 multiply for small operands; widen the
+  IMG slot heuristic so greedy uses them more aggressively;
+  cycle-time benchmark harness (separate from size).
+- **Larger/real-world end-to-end programs**: current real-world
+  smoke (Game of Life, BST, dispatch, hex dumper, JSON tokenizer)
+  exercises core idioms. A multi-thousand-line program (e.g.
+  a small interactive shell, a text editor command loop) would
+  catch issues no smaller test reaches.

51
bench_simple.s Normal file

@ -0,0 +1,51 @@
; Generated by Calypsi ISO C compiler for 65816
.rtmodel version,"1"
.rtmodel codeModel,"large"
.rtmodel dataModel,"small"
.rtmodel core,"65816"
.rtmodel huge,"0"
.rtmodel target,"none-specified"
.extern _Dp
.extern _Mul16
.extern _Vfp
; unsigned long sumOfSquares(unsigned short n) {
.section farcode,text
.public sumOfSquares
sumOfSquares:
phy
phy
sta 1,s
; unsigned long total = 0;
stz dp:.tiny _Dp
stz dp:.tiny (_Dp+2)
; for (unsigned short i = 1; i <= n; i++) total += (unsigned long)i * i;
lda ##1
sta 3,s
`?L5`: lda 1,s
cmp 3,s
bcs `?L4`
; return total;
ldx dp:.tiny (_Dp+2)
lda dp:.tiny _Dp
; }
ply
ply
rtl
`?L4`: lda 3,s
tax
jsl long:_Mul16
clc
adc dp:.tiny _Dp
pha
txa
adc dp:.tiny (_Dp+2)
tax
pla
stx dp:.tiny (_Dp+2)
sta dp:.tiny _Dp
lda 3,s
inc a
sta 3,s
bra `?L5`

10
benchmarks/bsearch.c Normal file

@ -0,0 +1,10 @@
int bsearch(const int *arr, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (arr[mid] == key) return mid;
        if (arr[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

10
benchmarks/crc32.c Normal file

@ -0,0 +1,10 @@
unsigned long crc32(const unsigned char *p, unsigned int n) {
    unsigned long crc = 0xFFFFFFFFUL;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            crc = (crc >> 1) ^ (0xEDB88320UL & -(long)(crc & 1));
        }
    }
    return crc ^ 0xFFFFFFFFUL;
}
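A quick way to sanity-check a bitwise CRC-32 like the benchmark above is the standard check string `"123456789"`. The sketch below is a host-side variant using fixed-width types; the name `crc32Host` is hypothetical and the function is not part of the benchmark suite.

```c
#include <stddef.h>
#include <stdint.h>

/* Same bitwise, reflected CRC-32 (polynomial 0xEDB88320) as the
 * benchmark, written with fixed-width types so it behaves the same
 * on a host compiler where long may be 64 bits. */
uint32_t crc32Host(const unsigned char *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Feeding it the nine ASCII bytes of `"123456789"` should give the well-known CRC-32 check value `0xCBF43926`.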

7
benchmarks/dotProduct.c Normal file

@ -0,0 +1,7 @@
long dotProduct(const short *a, const short *b, unsigned int n) {
    long sum = 0;
    for (unsigned int i = 0; i < n; i++) {
        sum += (long)a[i] * (long)b[i];
    }
    return sum;
}

4
benchmarks/fib.c Normal file

@ -0,0 +1,4 @@
unsigned short fib(unsigned short n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

10
benchmarks/memcmp.c Normal file

@ -0,0 +1,10 @@
typedef unsigned char u8;
int mymemcmp(const void *a, const void *b, unsigned int n) {
    const u8 *p = (const u8 *)a;
    const u8 *q = (const u8 *)b;
    while (n--) {
        if (*p != *q) return *p - *q;
        p++; q++;
    }
    return 0;
}

5
benchmarks/popcount.c Normal file

@ -0,0 +1,5 @@
int popcount(unsigned long x) {
    int n = 0;
    while (x) { n += x & 1; x >>= 1; }
    return n;
}

5
benchmarks/strcpy.c Normal file

@ -0,0 +1,5 @@
char *mystrcpy(char *dst, const char *src) {
    char *d = dst;
    while ((*d++ = *src++)) {}
    return dst;
}


@ -0,0 +1,5 @@
unsigned long sumOfSquares(unsigned short n) {
    unsigned long total = 0;
    for (unsigned short i = 1; i <= n; i++) total += (unsigned long)i * i;
    return total;
}
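When checking a compiled `sumOfSquares` against expected values, the closed form n(n+1)(2n+1)/6 is a convenient oracle. The sketch below is a host-side illustration with hypothetical names (`sumOfSquaresLoop`, `sumOfSquaresClosed`); it uses a 32-bit loop counter because a 16-bit counter with `i <= n` never terminates when n is 65535.

```c
#include <stdint.h>

/* Loop form with fixed-width types; the counter is 32 bits wide so
 * the loop also terminates for n == 65535. */
uint32_t sumOfSquaresLoop(uint16_t n) {
    uint32_t total = 0;
    for (uint32_t i = 1; i <= n; i++) total += i * i;
    return total;
}

/* Closed form n(n+1)(2n+1)/6, computed in 64 bits and then truncated
 * to 32 bits to mirror the wraparound of a 32-bit unsigned long. */
uint32_t sumOfSquaresClosed(uint16_t n) {
    uint64_t m = n;
    return (uint32_t)(m * (m + 1) * (2 * m + 1) / 6);
}
```

Both forms agree modulo 2^32, so the closed form can stand in for the loop when generating expected results for large n.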

File diff suppressed because it is too large


@ -41,12 +41,19 @@ void clearerr(FILE *stream);
 #define EOF (-1)
-// Input stubs. Real implementations would route through GS/OS
-// console I/O; current impl in libc.c returns EOF / 0.
 int getchar(void);
 int fgetc(FILE *stream);
 char *fgets(char *buf, int n, FILE *stream);
 int ungetc(int c, FILE *stream);
 #define getc(s) fgetc(s)
+// Memory-backed FS: register a memory region as a named file so
+// fopen can open it. `cap` should be >= size; use cap > size for
+// files that may grow on write. `writable` controls whether
+// fopen("...", "w") / "a" / "r+" succeeds. Returns 0 on success,
+// -1 on duplicate name or table full.
+int mfsRegister(const char *path, void *buf, size_t size, size_t cap,
+                int writable);
+int mfsUnregister(const char *path);
 #endif

38
runtime/include/wchar.h Normal file
View file

@ -0,0 +1,38 @@
// Minimal wchar.h for the W65816 runtime.
//
// wchar_t is 16-bit (matches `int` on this target). No real
// multi-byte / locale support — mbtowc/wctomb assume a one-byte =
// one-wchar mapping (essentially Latin-1). The wcs* functions
// mirror the str* family.
#ifndef _WCHAR_H
#define _WCHAR_H
typedef unsigned short wchar_t;
typedef unsigned int size_t;
typedef long wint_t;
#define WEOF ((wint_t)-1)
#ifndef NULL
#define NULL ((void *)0)
#endif
size_t wcslen (const wchar_t *s);
int wcscmp (const wchar_t *a, const wchar_t *b);
int wcsncmp(const wchar_t *a, const wchar_t *b, size_t n);
wchar_t *wcscpy (wchar_t *dst, const wchar_t *src);
wchar_t *wcsncpy(wchar_t *dst, const wchar_t *src, size_t n);
wchar_t *wcscat (wchar_t *dst, const wchar_t *src);
wchar_t *wcschr (const wchar_t *s, wchar_t c);
wchar_t *wcsrchr(const wchar_t *s, wchar_t c);
// Multi-byte conversion. Trivial 1:1 in our impl: each byte maps
// to the wide char with the same numeric value (zero-extended).
int mbtowc (wchar_t *pwc, const char *s, size_t n);
int wctomb (char *s, wchar_t wc);
size_t mbstowcs(wchar_t *pwcs, const char *s, size_t n);
size_t wcstombs(char *s, const wchar_t *pwcs, size_t n);
int mblen (const char *s, size_t n);
#endif


@ -176,3 +176,107 @@ size_t strcspn(const char *s, const char *reject) {
// strtok / strtok_r are in runtime/src/strtok.c.
// ---- wchar.h ----
// wchar_t is 16-bit on this target. The wcs* functions mirror the
// str* family. mbtowc / wctomb use the trivial 1:1 byte<->wide-char
// mapping (essentially Latin-1) — no real multi-byte / locale support.
typedef unsigned short wchar_t;
size_t wcslen(const wchar_t *s) {
size_t n = 0;
while (*s++) n++;
return n;
}
int wcscmp(const wchar_t *a, const wchar_t *b) {
    while (*a && *a == *b) { a++; b++; }
    // Compare as unsigned wchar_t values: subtracting (short) casts can
    // overflow a 16-bit int and misorders code units >= 0x8000.
    return (*a > *b) - (*a < *b);
}
int wcsncmp(const wchar_t *a, const wchar_t *b, size_t n) {
    while (n && *a && *a == *b) { a++; b++; n--; }
    if (!n) return 0;
    return (*a > *b) - (*a < *b);
}
wchar_t *wcscpy(wchar_t *dst, const wchar_t *src) {
wchar_t *d = dst;
while ((*d++ = *src++)) {}
return dst;
}
wchar_t *wcsncpy(wchar_t *dst, const wchar_t *src, size_t n) {
wchar_t *d = dst;
while (n && (*d = *src)) { d++; src++; n--; }
while (n--) *d++ = 0;
return dst;
}
wchar_t *wcscat(wchar_t *dst, const wchar_t *src) {
wchar_t *d = dst;
while (*d) d++;
while ((*d++ = *src++)) {}
return dst;
}
wchar_t *wcschr(const wchar_t *s, wchar_t c) {
while (*s) {
if (*s == c) return (wchar_t *)s;
s++;
}
return (c == 0) ? (wchar_t *)s : (wchar_t *)0;
}
wchar_t *wcsrchr(const wchar_t *s, wchar_t c) {
const wchar_t *last = (const wchar_t *)0;
while (*s) {
if (*s == c) last = s;
s++;
}
if (c == 0) return (wchar_t *)s;
return (wchar_t *)last;
}
int mbtowc(wchar_t *pwc, const char *s, size_t n) {
if (!s) return 0; // no shift state
if (n == 0) return -1;
unsigned char c = (unsigned char)*s;
if (pwc) *pwc = (wchar_t)c;
return c ? 1 : 0;
}
int wctomb(char *s, wchar_t wc) {
if (!s) return 0; // no shift state
if (wc > 0xFF) return -1;
*s = (char)wc;
return 1;
}
size_t mbstowcs(wchar_t *pwcs, const char *s, size_t n) {
size_t i = 0;
while (i < n && s[i]) {
if (pwcs) pwcs[i] = (wchar_t)(unsigned char)s[i];
i++;
}
if (pwcs && i < n) pwcs[i] = 0;
return i;
}
size_t wcstombs(char *s, const wchar_t *pwcs, size_t n) {
size_t i = 0;
while (i < n && pwcs[i]) {
if (pwcs[i] > 0xFF) return (size_t)-1;
if (s) s[i] = (char)pwcs[i];
i++;
}
if (s && i < n) s[i] = 0;
return i;
}
int mblen(const char *s, size_t n) {
if (!s) return 0;
if (n == 0) return -1;
return *s ? 1 : 0;
}

File diff suppressed because it is too large


@ -167,18 +167,11 @@ int puts(const char *s) {
 // ---- input stubs ----
 //
-// Real input would route through GS/OS console / event handling.
-// These return EOF / NULL so user code that calls them links and
-// gets predictable end-of-input behaviour. FILE struct is defined
-// further down (alongside fopen etc.); forward-declare for the
-// signatures.
-struct __sFILE;
+// getchar reads from the keyboard; real input would route through
+// the IIgs Event Manager. Returns -1 (EOF) for now. fgetc/fgets/
+// ungetc are defined further down alongside the FILE-table-backed
+// fopen/fread/etc.
 int getchar(void) { return -1; /* EOF */ }
-int fgetc(struct __sFILE *s) { (void)s; return -1; }
-char *fgets(char *b, int n, struct __sFILE *s) {
-    (void)b; (void)n; (void)s; return (char *)0;
-}
-int ungetc(int c, struct __sFILE *s) { (void)c; (void)s; return -1; }
 // ---- minimal printf ----
@ -621,47 +614,191 @@ clock_t clock(void) {
     return (clock_t)0;
 }
-// ---- FILE* abstraction (minimal) ----
+// ---- FILE* abstraction (memory-backed FS) ----
 //
-// stdin / stdout / stderr exist as opaque non-NULL pointers. fputs /
-// fputc forward to puts/putchar (which currently no-op or hit a debug
-// hook). fprintf forwards to printf, ignoring the stream. fflush is
-// a no-op. Real file I/O via GS/OS toolbox is a separate feature
-// (would need open/read/write/close + a file-descriptor table).
+// stdin / stdout / stderr are tagged with a stdio kind and route through
+// putchar / fgetc-from-keyboard; opening a regular file allocates a
+// FILE slot and keeps a (buf, size, pos, writable) record. Programs
+// stage files into the FS at startup via mfsRegister(path, buf, size,
+// cap, writable) and then use the standard fopen/fread/fwrite/fseek API.
+//
+// Why in-memory rather than GS/OS-backed: the smoke harness doesn't
+// boot ProDOS, so toolbox-FS calls would crash MAME. An in-RAM FS
+// covers the common need (parser/printer that wants a FILE*) without
+// pulling in GS/OS init. A future GS/OS backend can replace
+// fopenImpl/etc. without touching callers.
+//
+// FILE-table layout: 8 entries. Slot 0..2 are stdin/stdout/stderr
+// (immutable); 3..7 are user-allocated by fopen. Each entry has:
+//   kind     (0=stdin, 1=stdout, 2=stderr, 3=memory)
+//   buf      (memory buffer base)
+//   size     (logical size in bytes)
+//   cap      (allocated capacity, for write-grow)
+//   pos      (current seek position)
+//   eof, err flags
+//   writable (1 if opened for "w" or "r+" or "a")
+//   ungetc holding cell (-1 = empty)
-typedef struct __sFILE { unsigned int magic; } FILE;
+#define FILE_KIND_STDIN  0
+#define FILE_KIND_STDOUT 1
+#define FILE_KIND_STDERR 2
+#define FILE_KIND_MEM    3
-static FILE __stdin_obj = { 1 };
-static FILE __stdout_obj = { 2 };
-static FILE __stderr_obj = { 3 };
-FILE *stdin = &__stdin_obj;
-FILE *stdout = &__stdout_obj;
-FILE *stderr = &__stderr_obj;
+typedef struct __sFILE {
+    u8 kind;
+    u8 writable;
+    u8 eof;
+    u8 err;
+    char *buf;
+    size_t size;
+    size_t cap;
+    size_t pos;
+    int unget;        // -1 if no pushed-back char
+    const char *path; // borrowed from caller, NULL for stdio
+} FILE;
+#define MFS_MAX_FILES 8
+static FILE __mfs[MFS_MAX_FILES] = {
+    { FILE_KIND_STDIN, 0, 0, 0, 0, 0, 0, 0, -1, 0 },
+    { FILE_KIND_STDOUT, 1, 0, 0, 0, 0, 0, 0, -1, 0 },
+    { FILE_KIND_STDERR, 1, 0, 0, 0, 0, 0, 0, -1, 0 },
+};
+FILE *stdin = &__mfs[0];
+FILE *stdout = &__mfs[1];
+FILE *stderr = &__mfs[2];
+// Registered "files" available to fopen. Each registration is
+// (path, buf, size, cap, writable). Order doesn't matter: fopen scans
+// linearly.
+typedef struct {
+    const char *path;
+    char *buf;
+    size_t size;
+    size_t cap;
+    u8 writable;
+    u8 inUse;
+} MfsEntry;
+#define MFS_MAX_REG 16
+static MfsEntry __mfsReg[MFS_MAX_REG];
+// Register a memory region as a named file. Returns 0 on success,
+// -1 if the table is full or a duplicate name exists. `cap` may be
+// larger than `size` to allow appends without reallocation; pass
+// cap=size if writes must not grow the file.
+int mfsRegister(const char *path, void *buf, size_t size, size_t cap,
+                int writable) {
+    if (cap < size) cap = size;
+    for (int i = 0; i < MFS_MAX_REG; i++) {
+        if (__mfsReg[i].inUse && strcmp(__mfsReg[i].path, path) == 0)
+            return -1;
+    }
+    for (int i = 0; i < MFS_MAX_REG; i++) {
+        if (!__mfsReg[i].inUse) {
+            __mfsReg[i].path = path;
+            __mfsReg[i].buf = (char *)buf;
+            __mfsReg[i].size = size;
+            __mfsReg[i].cap = cap;
+            __mfsReg[i].writable = (u8)(writable != 0);
+            __mfsReg[i].inUse = 1;
+            return 0;
+        }
+    }
+    return -1;
+}
+// Drop a registration. Returns 0 on success, -1 if not found.
+int mfsUnregister(const char *path) {
+    for (int i = 0; i < MFS_MAX_REG; i++) {
+        if (__mfsReg[i].inUse && strcmp(__mfsReg[i].path, path) == 0) {
+            __mfsReg[i].inUse = 0;
+            __mfsReg[i].path = (const char *)0;
+            return 0;
+        }
+    }
+    return -1;
+}
+int fputc(int c, FILE *stream) {
+    if (!stream) return -1;
+    if (stream->kind == FILE_KIND_STDOUT || stream->kind == FILE_KIND_STDERR)
+        return putchar(c);
+    if (stream->kind == FILE_KIND_MEM) {
+        if (!stream->writable) { stream->err = 1; return -1; }
+        if (stream->pos >= stream->cap) { stream->err = 1; return -1; }
+        stream->buf[stream->pos++] = (char)c;
+        if (stream->pos > stream->size) stream->size = stream->pos;
+        return (int)(unsigned char)c;
+    }
+    return -1;
+}
-int fputc(int c, FILE *stream) { (void)stream; return putchar(c); }
-// fputs writes the string WITHOUT appending a newline (puts does append).
-// Forwarding to puts() was a real bug: `fputs("hi", stdout)` was
-// printing "hi\n" instead of "hi".
 int fputs(const char *s, FILE *stream) {
-    (void)stream;
+    if (!stream || !s) return -1;
+    if (stream->kind == FILE_KIND_STDOUT || stream->kind == FILE_KIND_STDERR) {
         while (*s) { putchar(*s); s++; }
         return 0;
     }
+    if (stream->kind == FILE_KIND_MEM) {
+        while (*s) {
+            if (fputc(*s, stream) == -1) return -1;
+            s++;
+        }
+        return 0;
+    }
+    return -1;
+}
 int fflush(FILE *stream) { (void)stream; return 0; }
-int fclose(FILE *stream) { (void)stream; return 0; }
+int fclose(FILE *stream) {
+    if (!stream) return -1;
+    // Don't close stdin/stdout/stderr; they're long-lived statics.
+    if (stream->kind != FILE_KIND_MEM) return 0;
+    stream->kind = 0;
+    stream->buf = (char *)0;
+    stream->size = 0;
+    stream->cap = 0;
+    stream->pos = 0;
+    stream->path = (const char *)0;
+    return 0;
+}
+// Forward decls for routines that live in snprintf.c.
+extern int vsnprintf(char *buf, size_t n, const char *fmt, va_list ap);
+// Forward decl for vfprintf so fprintf can call it.
+int vfprintf(FILE *stream, const char *fmt, va_list ap);
 int fprintf(FILE *stream, const char *fmt, ...) {
-    (void)stream;
     va_list ap;
     __builtin_va_start(ap, fmt);
-    int r = vprintf(fmt, ap);
+    int r = vfprintf(stream, fmt, ap);
     __builtin_va_end(ap);
     return r;
 }
 int vfprintf(FILE *stream, const char *fmt, va_list ap) {
-    (void)stream;
+    if (!stream) return -1;
+    if (stream->kind == FILE_KIND_STDOUT || stream->kind == FILE_KIND_STDERR)
         return vprintf(fmt, ap);
+    if (stream->kind == FILE_KIND_MEM) {
+        // Format into the file's tail. Use the memory buffer that
+        // remains as a snprintf target. Caller is responsible for
+        // sizing the file's buffer.
+        if (!stream->writable) { stream->err = 1; return -1; }
+        size_t remain = (stream->cap > stream->pos)
+                        ? stream->cap - stream->pos : 0;
+        if (remain == 0) { stream->err = 1; return -1; }
+        int n = vsnprintf(stream->buf + stream->pos, remain, fmt, ap);
+        if (n < 0) { stream->err = 1; return -1; }
+        size_t written = ((size_t)n < remain) ? (size_t)n : remain - 1;
+        stream->pos += written;
+        if (stream->pos > stream->size) stream->size = stream->pos;
+        return n;
+    }
+    return -1;
 }
 // ---- assert ----
@ -688,56 +825,204 @@ int atexit(AtexitFn fn) {
     return 0;
 }
-// ---- File I/O stubs ----
+// ---- File I/O (memory-backed) ----
 //
-// A real implementation would route through the GS/OS dispatcher at
-// $E100A8 (build a class-1 parm block, push its pointer, JSL with X
-// = callNum, copy the refNum out). fopen would maintain a small
-// FD table mapping FILE* magic values back to GS/OS refNums.
-// Until that lands, every call returns failure so code that links
-// against stdio degrades gracefully instead of trapping.
+// Backed by mfsRegister'd entries. Mode strings:
+//   "r"  read only
+//   "w"  write, truncate to zero on open
+//   "a"  write, position at end on open
+//   "r+" read+write
+//   "w+" read+write, truncate
+// Plus optional "b" (no-op since we're memory-backed).
+//
+// Returns NULL if no registration matches `path` (or the requested
+// mode isn't compatible with the registration's writable flag).
 FILE *fopen(const char *path, const char *mode) {
-    (void)path; (void)mode;
-    return (FILE *)0;
+    if (!path || !mode) return (FILE *)0;
+    int wantWrite = 0;
+    int wantRead = 1;
+    int truncate = 0;
+    int append = 0;
+    if (mode[0] == 'r') { wantRead = 1; wantWrite = (mode[1] == '+' || (mode[1] == 'b' && mode[2] == '+')); }
+    else if (mode[0] == 'w') { wantWrite = 1; truncate = 1; wantRead = (mode[1] == '+' || (mode[1] == 'b' && mode[2] == '+')); }
+    else if (mode[0] == 'a') { wantWrite = 1; append = 1; wantRead = (mode[1] == '+' || (mode[1] == 'b' && mode[2] == '+')); }
+    else return (FILE *)0;
+    // Locate registration.
+    MfsEntry *reg = (MfsEntry *)0;
+    for (int i = 0; i < MFS_MAX_REG; i++) {
+        if (__mfsReg[i].inUse && strcmp(__mfsReg[i].path, path) == 0) {
+            reg = &__mfsReg[i];
+            break;
+        }
+    }
+    if (!reg) return (FILE *)0;
+    if (wantWrite && !reg->writable) return (FILE *)0;
+    // Allocate a FILE slot (3..MAX-1; 0..2 are stdin/out/err).
+    FILE *f = (FILE *)0;
+    for (int i = 3; i < MFS_MAX_FILES; i++) {
+        if (__mfs[i].kind == 0) {
+            f = &__mfs[i];
+            break;
+        }
+    }
+    if (!f) return (FILE *)0;
+    f->kind = FILE_KIND_MEM;
+    f->writable = (u8)(wantWrite ? 1 : 0);
+    f->eof = 0;
+    f->err = 0;
+    f->buf = reg->buf;
+    f->size = reg->size;
+    f->cap = reg->cap;
+    f->pos = 0;
+    f->unget = -1;
+    f->path = reg->path;
+    (void)wantRead;
+    if (truncate) f->size = 0;
+    if (append) f->pos = f->size;
+    return f;
 }
 size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream) {
-    (void)ptr; (void)size; (void)nmemb; (void)stream;
-    return 0;
+    if (!stream || stream->kind != FILE_KIND_MEM) return 0;
+    if (size == 0 || nmemb == 0) return 0;
+    // Avoid 32-bit overflow on size * nmemb: cap nmemb so each item
+    // (size bytes) fits in remaining 16-bit address space.
+    if (nmemb > (size_t)0xFFFE / size) nmemb = (size_t)0xFFFE / size;
+    char *out = (char *)ptr;
+    size_t items = 0;
+    while (items < nmemb) {
+        size_t b;
+        // Each item: size bytes.
+        for (b = 0; b < size; b++) {
+            if (stream->unget >= 0) {
+                *out++ = (char)stream->unget;
+                stream->unget = -1;
+                continue;
+            }
+            if (stream->pos >= stream->size) {
+                stream->eof = 1;
+                return items;
+            }
+            *out++ = stream->buf[stream->pos++];
+        }
+        items++;
+    }
+    return items;
 }
 size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) {
-    // For stdout/stderr, route through putchar so programs that use
-    // fwrite for binary output ("write %d bytes to stdout") actually
-    // produce output instead of silently dropping it. For other
-    // streams (real file handles), still a stub returning 0.
-    if (stream == stdout || stream == stderr) {
-        // size * nmemb can overflow size_t (16-bit on this target);
-        // bail rather than silently truncate the byte count.
-        if (size != 0 && nmemb > (size_t)0xFFFF / size) return 0;
-        const u8 *p = (const u8 *)ptr;
-        size_t total = size * nmemb;
-        for (size_t i = 0; i < total; i++) putchar(p[i]);
-        return nmemb;
+    if (!stream) return 0;
+    if (size == 0 || nmemb == 0) return 0;
+    // Cap nmemb so each item (size bytes) fits in the address space;
+    // this avoids the 32-bit `size * nmemb` that the i32 multiply path triggers.
+    if (nmemb > (size_t)0xFFFE / size) nmemb = (size_t)0xFFFE / size;
+    const char *in = (const char *)ptr;
+    if (stream->kind == FILE_KIND_STDOUT || stream->kind == FILE_KIND_STDERR) {
+        size_t items = 0;
+        while (items < nmemb) {
+            for (size_t b = 0; b < size; b++) putchar(*in++);
+            items++;
         }
-    (void)ptr; (void)size; (void)nmemb;
+        return items;
+    }
+    if (stream->kind != FILE_KIND_MEM) return 0;
+    if (!stream->writable) { stream->err = 1; return 0; }
+    size_t items = 0;
+    while (items < nmemb) {
+        size_t b;
+        for (b = 0; b < size; b++) {
+            if (stream->pos >= stream->cap) {
+                stream->err = 1;
+                if (stream->pos > stream->size) stream->size = stream->pos;
+                return items;
+            }
+            stream->buf[stream->pos++] = *in++;
+        }
+        items++;
+    }
+    if (stream->pos > stream->size) stream->size = stream->pos;
+    return items;
+}
+#define SEEK_SET 0
+#define SEEK_CUR 1
+#define SEEK_END 2
+int fseek(FILE *stream, long offset, int whence) {
+    if (!stream || stream->kind != FILE_KIND_MEM) return -1;
+    long base;
+    if (whence == SEEK_SET) base = 0;
+    else if (whence == SEEK_CUR) base = (long)stream->pos;
+    else if (whence == SEEK_END) base = (long)stream->size;
+    else return -1;
+    long target = base + offset;
+    if (target < 0 || target > (long)stream->size) return -1;
+    stream->pos = (size_t)target;
+    stream->eof = 0;
+    stream->unget = -1;
     return 0;
 }
-int fseek(FILE *stream, long offset, int whence) {
-    (void)stream; (void)offset; (void)whence;
+long ftell(FILE *stream) {
+    if (!stream || stream->kind != FILE_KIND_MEM) return -1L;
+    return (long)stream->pos;
+}
+int fgetc(FILE *stream) {
+    if (!stream) return -1;
+    if (stream->unget >= 0) {
+        int c = stream->unget;
+        stream->unget = -1;
+        return c;
+    }
+    if (stream->kind == FILE_KIND_MEM) {
+        if (stream->pos >= stream->size) { stream->eof = 1; return -1; }
+        return (int)(unsigned char)stream->buf[stream->pos++];
+    }
+    if (stream->kind == FILE_KIND_STDIN) return getchar();
     return -1;
 }
-long ftell(FILE *stream) {
-    (void)stream;
-    return -1L;
+char *fgets(char *buf, int n, FILE *stream) {
+    if (!buf || n <= 0 || !stream) return (char *)0;
+    int i = 0;
+    while (i < n - 1) {
+        int c = fgetc(stream);
+        if (c < 0) {
+            if (i == 0) return (char *)0;
+            break;
+        }
+        buf[i++] = (char)c;
+        if (c == '\n') break;
+    }
+    buf[i] = 0;
+    return buf;
 }
-int feof(FILE *stream) { (void)stream; return 1; }
-int ferror(FILE *stream) { (void)stream; return 0; }
-void clearerr(FILE *stream) { (void)stream; }
+int ungetc(int c, FILE *stream) {
+    if (!stream || c < 0) return -1;
+    if (stream->unget >= 0) return -1; // only one slot
+    stream->unget = c & 0xFF;
+    stream->eof = 0;
+    return c & 0xFF;
+}
+int feof(FILE *stream) {
+    return stream ? (int)stream->eof : 1;
+}
+int ferror(FILE *stream) {
+    return stream ? (int)stream->err : 0;
+}
+void clearerr(FILE *stream) {
+    if (stream) { stream->eof = 0; stream->err = 0; }
+}
 // ---- locale.h stubs ----
 //
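The distinction between `size` (logical length, bounds reads) and `cap` (buffer capacity, bounds writes) in the memory-backed stream can be shown in isolation. The following is a minimal host-side sketch under that assumption; `MemFile`, `memPutc` and `memGetc` are hypothetical names for illustration and are not the runtime's `FILE`, `fputc` or `fgetc`.

```c
#include <stddef.h>

/* Hypothetical miniature of a memory-backed stream record. */
typedef struct {
    char  *buf;   /* caller-owned storage */
    size_t size;  /* logical length: bytes that can be read */
    size_t cap;   /* physical length: bytes that can be written */
    size_t pos;   /* current position */
    int    err;   /* sticky error flag */
} MemFile;

/* Write one byte; fails (and sets err) once pos reaches cap. */
int memPutc(MemFile *f, int c) {
    if (f->pos >= f->cap) { f->err = 1; return -1; }
    f->buf[f->pos++] = (char)c;
    if (f->pos > f->size) f->size = f->pos;  /* grow logical size */
    return (int)(unsigned char)c;
}

/* Read one byte; returns -1 at the logical end, not at cap. */
int memGetc(MemFile *f) {
    if (f->pos >= f->size) return -1;
    return (int)(unsigned char)f->buf[f->pos++];
}
```

With a 4-byte buffer holding 2 bytes of data, two appends succeed and extend `size` to 4; a third append fails and sets `err` because `pos` has reached `cap`.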
@ -792,22 +1077,46 @@ struct lconv *localeconv(void) {
     return &__c_lconv;
 }
-// ---- signal.h stubs ----
+// ---- signal.h ----
 //
-// IIgs has no POSIX-style signal model. signal() always fails (returns
-// SIG_ERR); raise() returns -1. Code that uses these for diagnostic
-// fall-through (e.g. abort -> raise(SIGABRT) -> stub) compiles and
-// behaves as "signals disabled".
+// IIgs has no POSIX-style signal source (no kernel-delivered signals
+// from external events), but a small in-process signal table makes
+// signal()/raise() work for synchronous diagnostic use: a program
+// can install SIGABRT/SIGINT/etc. handlers and abort()-equivalent
+// code can raise(SIGABRT) to invoke them. No async signal delivery.
+//
+// Table indexed by signal number 0..15; raise() looks up the
+// installed handler and calls it. SIG_DFL falls through to a
+// per-signal default (SIGABRT calls abort(); SIGINT/SIGTERM call
+// exit(128 + sig); others are ignored).
 typedef void (*__sighandler_t)(int);
+#define _SIG_DFL ((__sighandler_t)0)
+#define _SIG_IGN ((__sighandler_t)1)
 #define _SIG_ERR ((__sighandler_t)-1)
+#define _NSIG 16
+static __sighandler_t __sigHandlers[_NSIG];
 __sighandler_t signal(int sig, __sighandler_t handler) {
-    (void)sig; (void)handler;
-    return _SIG_ERR;
+    if (sig < 0 || sig >= _NSIG) return _SIG_ERR;
+    __sighandler_t prev = __sigHandlers[sig];
+    if (!prev) prev = _SIG_DFL;
+    __sigHandlers[sig] = handler;
+    return prev;
 }
 int raise(int sig) {
-    (void)sig;
-    return -1;
+    if (sig < 0 || sig >= _NSIG) return -1;
+    __sighandler_t h = __sigHandlers[sig];
+    if (h == _SIG_IGN) return 0;
+    if (!h || h == _SIG_DFL) {
+        // Default action: SIGABRT -> abort(); SIGTERM/SIGINT -> exit;
+        // others -> ignore.
+        if (sig == 6) abort(); // SIGABRT
+        if (sig == 2 || sig == 15) // SIGINT, SIGTERM
+            exit(128 + sig);
+        return 0;
+    }
+    h(sig);
+    return 0;
 }
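The synchronous install-then-raise use that the signal table is meant to support looks the same as with any standard `<signal.h>`. A minimal sketch using only the ISO C `signal` / `raise` API follows; `onSignal`, `lastSignal` and `raiseAndObserve` are hypothetical names for illustration.

```c
#include <signal.h>

/* Records the most recent signal number seen by the handler. */
static volatile sig_atomic_t lastSignal = 0;

static void onSignal(int sig) { lastSignal = sig; }

/* Installs a handler, raises the signal synchronously, and reports
 * which signal number the handler observed (-1 on failure). */
int raiseAndObserve(int sig) {
    lastSignal = 0;
    if (signal(sig, onSignal) == SIG_ERR) return -1;
    if (raise(sig) != 0) return -1;
    return (int)lastSignal;
}
```

Because `raise` runs the handler before it returns, `raiseAndObserve(SIGINT)` returns `SIGINT` without any asynchronous delivery being involved.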

106
scripts/bench.sh Executable file

@ -0,0 +1,106 @@
#!/usr/bin/env bash
# bench.sh — compile a benchmark suite with both clang (this toolchain)
# and Calypsi cc65816, compare emitted code size.
#
# Each benchmark is a self-contained .c file under benchmarks/. We
# compile each with both toolchains (-O2 / --speed), then count
# bytes in the .text + .data sections of the resulting object.
# Output is a markdown table on stdout.
#
# Cycle-time comparison would require running each benchmark in MAME
# under both toolchains' produced code, with a wrapper function that
# instruments the cycle counter. That's a separate, more involved
# tool — left for future work.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
BENCH_DIR="$PROJECT_ROOT/benchmarks"
CLANG="$PROJECT_ROOT/tools/llvm-mos-build/bin/clang"
CALYPSI="$PROJECT_ROOT/tools/calypsi/usr/local/lib/calypsi-65816-5.16/bin/cc65816"
[ -x "$CLANG" ] || { echo "ERROR: clang not built" >&2; exit 1; }
[ -x "$CALYPSI" ] || { echo "ERROR: Calypsi not installed" >&2; exit 1; }
[ -d "$BENCH_DIR" ] || { echo "ERROR: $BENCH_DIR not found" >&2; exit 1; }
# Object-size measurement. Different object formats — for clang it's
# ELF (use llvm-readobj), for Calypsi it's its own format (use the
# binary file size as a proxy, minus header overhead). ELF .text +
# .rodata + .data covers code + constants; we report code-only as the
# primary metric.
clangSize() {
local o="$1"
"$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-readobj" --section-headers "$o" \
2>/dev/null | awk '
/Name: .text/ { intext=1; inrodata=0; indata=0; next }
/Name: .rodata/ { intext=0; inrodata=1; indata=0; next }
/Name: .data/ { intext=0; inrodata=0; indata=1; next }
/Name: / { intext=0; inrodata=0; indata=0; next }
/Size:/ {
if (intext) text += strtonum($2)
if (inrodata) rodata += strtonum($2)
if (indata) data += strtonum($2)
}
END { print text " " rodata " " data }
'
}
# Calypsi text size: extract the highest farcode offset from the
# assembler listing. cc65816 -> .s, then as65816 --list-file
# emits "OFFSET hexbytes" columns; we pick the max offset and add
# the byte width of the final instruction (1-3 bytes typically).
# Approximation but within a byte or two of true text size.
calypsiTextSize() {
local src="$1"
local s lst tmp
s=$(mktemp --suffix=.s)
lst=$(mktemp --suffix=.lst)
tmp=$(mktemp --suffix=.o)
"$CALYPSI" -O 2 --speed --assembly-source "$s" -c "$src" -o "$tmp" 2>/dev/null \
|| { echo 0; rm -f "$s" "$lst" "$tmp"; return; }
"$CALYPSI" -O 2 --speed -c "$src" -o "$tmp" 2>/dev/null
"$PROJECT_ROOT/tools/calypsi/usr/local/lib/calypsi-65816-5.16/bin/as65816" \
--list-file "$lst" -o "$tmp" "$s" 2>/dev/null
# Highest farcode offset. We skip the +instruction-bytes detail
# (rough estimate is fine for relative comparison).
local maxOff
maxOff=$(grep -oE "^[0-9]+ [0-9a-f]{6}" "$lst" 2>/dev/null \
| awk '{print strtonum("0x"$2)}' | sort -n | tail -1)
echo "${maxOff:-0}"
rm -f "$s" "$lst" "$tmp"
}
# Print markdown header.
printf '| Benchmark | clang (B) | Calypsi (B) | clang vs Calypsi |\n'
printf '|-----------|----------:|------------:|-----------------:|\n'
totalClang=0
totalCalypsi=0
for src in "$BENCH_DIR"/*.c; do
name=$(basename "$src" .c)
cObj=$(mktemp --suffix=.clang.o)
"$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$src" -o "$cObj" 2>/dev/null || { echo "clang failed on $name" >&2; rm -f "$cObj"; continue; }
read clangText _ _ < <(clangSize "$cObj")
clangText=${clangText:-0}
calText=$(calypsiTextSize "$src")
if [ "$calText" -gt 0 ]; then
ratio=$(awk -v a="$clangText" -v b="$calText" 'BEGIN{printf "%.2fx", a/b}')
else
ratio="—"
fi
printf '| %s | %d | %d | %s |\n' "$name" "$clangText" "$calText" "$ratio"
totalClang=$((totalClang + clangText))
totalCalypsi=$((totalCalypsi + calText))
rm -f "$cObj"
done
if [ "$totalCalypsi" -gt 0 ]; then
totalRatio=$(awk -v a="$totalClang" -v b="$totalCalypsi" 'BEGIN{printf "%.2fx", a/b}')
printf '| **total** | **%d** | **%d** | **%s** |\n' "$totalClang" "$totalCalypsi" "$totalRatio"
fi
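
The section-summing step in clangSize can be sketched in Python for readers who want to check the logic without gawk. This is a minimal sketch, not part of the commit: the function name `sumSectionSizes` and the sample text are hypothetical, and the sample only assumes the general "Name:" / "Size:" / "EntrySize:" field layout of `llvm-readobj --section-headers`.

```python
import re

# Hypothetical sample shaped like `llvm-readobj --section-headers` output.
# The exact layout is an assumption made for illustration only.
SAMPLE = """\
Section {
  Name: .text.main (1)
  Size: 0x2A
  EntrySize: 0
}
Section {
  Name: .rodata.str1.1 (12)
  Size: 14
  EntrySize: 1
}
Section {
  Name: .data (30)
  Size: 4
  EntrySize: 0
}
Section {
  Name: .symtab (40)
  Size: 96
  EntrySize: 16
}
"""


def sumSectionSizes(text):
    """Sum Size fields per section family (.text*, .rodata*, .data*)."""
    totals = {"text": 0, "rodata": 0, "data": 0}
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*Name: \.(text|rodata|data)\b", line)
        if m:
            current = m.group(1)
            continue
        if re.match(r"\s*Name: ", line):
            current = None  # some other section; stop counting
            continue
        # Anchored at line start, so "EntrySize:" never matches.
        m = re.match(r"\s*Size: (\S+)", line)
        if m and current:
            totals[current] += int(m.group(1), 0)  # accepts hex or decimal
    return totals


print(sumSectionSizes(SAMPLE))
```

With -ffunction-sections every function gets its own `.text.<name>` section, which is why the match is on the section-name prefix rather than on an exact name.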

425
scripts/genToolbox.py Normal file

@ -0,0 +1,425 @@
#!/usr/bin/env python3
# genToolbox.py — generate IIgs toolbox wrappers from ORCA-C headers.
#
# Reads ORCA's extern declarations of the form:
# extern pascal RetType FuncName(ArgType, ArgType) inline(0xNNTT, dispatcher);
# and emits two outputs:
# - C header with `static inline` wrappers using clang inline-asm
# - .s file with extern wrapper bodies for multi-arg routines that
# can't fit in inline asm (our backend's constraints don't take
# memory operands).
#
# Tool number convention: 0xNNTT high byte = function, low byte = tool set
# Dispatcher: JSL $E10000 for normal toolbox; JSL $E100A8 for GS/OS
# (only the ProDOS-16 / GS/OS calls use _CallBackVector).
#
# Calling convention conversion: ORCA uses Pascal (args pushed L-to-R),
# our C ABI passes arg0 in A and arg1+ on stack RTL. Each generated
# wrapper re-pushes args in toolbox order.
#
# Type widths (matching ORCA):
# Word, Boolean, Integer, Char, Byte = 2 bytes (16-bit)
# LongWord, Long, Handle, Pointer = 4 bytes (32-bit)
# Ptr, Ref, ResType = 4 bytes
# (Pointer is 4 bytes in ORCA -- it's a far/24-bit pointer. Our backend
# uses 16-bit pointers, but the toolbox expects 32-bit on the stack;
# we extend with a zero high word.)
#
# Output files are written to the runtime tree.
import re
import sys
from pathlib import Path
ORCA_DIR = Path("/tmp/orca-headers")
OUT_HEADER = Path("/home/scott/claude/llvm816/runtime/include/iigs/toolbox.h")
OUT_ASM = Path("/home/scott/claude/llvm816/runtime/src/iigsToolbox.s")
# Type table: (size in bytes, c-type)
TYPE_MAP = {
"void": (0, "void"),
"Word": (2, "unsigned short"),
"Boolean": (2, "unsigned short"),
"Integer": (2, "short"),
"Char": (2, "char"), # widened on stack
"Byte": (2, "unsigned char"),
"LongWord": (4, "unsigned long"),
"Long": (4, "long"),
"Handle": (4, "void *"), # 4-byte handle
"Pointer": (4, "void *"), # 4-byte pointer (toolbox semantics)
"Ref": (4, "void *"),
"Ptr": (4, "void *"),
"ResType": (4, "unsigned long"),
"Real": (4, "float"),
"Double": (8, "double"),
"Comp": (8, "long long"),
"Extended": (10, "long double"),
"GrafPortPtr":(4, "void *"),
"WindowPtr": (4, "void *"),
"MenuHandle": (4, "void *"),
"CtlRecHndl": (4, "void *"),
"DialogPtr": (4, "void *"),
"RgnHandle": (4, "void *"),
"PrPort": (4, "void *"),
"PrRecHndl": (4, "void *"),
"PicHandle": (4, "void *"),
"WindRecHndl":(4, "void *"),
}
# Tool number → tool-set name mapping (low byte of toolNumber)
TOOLSET_NAME = {
0x01: "ToolLocator",
0x02: "MemoryManager",
0x03: "MiscTools",
0x04: "QuickDraw",
0x05: "DeskManager",
0x06: "EventManager",
0x07: "Scheduler",
0x08: "SoundManager",
0x09: "AppleDeskBus",
0x0A: "SANE",
0x0B: "IntegerMath",
0x0C: "TextTools",
0x0E: "WindowManager",
0x0F: "MenuManager",
0x10: "ControlManager",
0x11: "Loader",
0x12: "QDAuxiliary",
0x13: "PrintManager",
0x14: "LineEdit",
0x15: "DialogManager",
0x16: "ScrapManager",
0x17: "StandardFile",
0x18: "DiskUtil",
0x19: "NoteSynth",
0x1A: "NoteSequencer",
0x1B: "FontManager",
0x1C: "ListManager",
0x1D: "ACETools",
0x1E: "ResourceManager",
0x1F: "MIDITools",
0x20: "VideoOverlay",
0x21: "Teletext",
0x22: "TextEdit",
0x23: "MediaControl",
0x32: "MediaControl2",
}
def parseLine(line):
"""Parse `extern pascal RetType Name(args) inline(0xNNTT, dispatcher);`
Returns dict or None if not a toolbox decl.
"""
m = re.match(
r'^\s*extern\s+pascal\s+(\w+)\s+(\w+)\s*\((.*?)\)\s+inline\(0x([0-9A-Fa-f]+)\s*,\s*(\w+)\)\s*;',
line,
)
if not m:
return None
retType, name, args, toolHex, dispatcher = m.group(1, 2, 3, 4, 5)
toolNum = int(toolHex, 16)
# Parse arg types (just the types, no names since ORCA omits them).
args = args.strip()
argTypes = []
if args and args != "void":
for a in args.split(","):
a = a.strip()
# ORCA may have type-only or "type name"; take the first word.
t = a.split()[0]
argTypes.append(t)
return {
"ret": retType,
"name": name,
"args": argTypes,
"tool": toolNum,
"dispatcher": dispatcher,
}
def typeInfo(t):
    """Return (size_bytes, c_type) for an ORCA type.

    Types missing from TYPE_MAP are assumed to be pointer-like
    (4 bytes, void *), so this never returns None.
    """
    if t in TYPE_MAP:
        return TYPE_MAP[t]
    # Default: assume 4 bytes / void* (pointer-like)
    return (4, "void *")
def emit(decls):
"""Generate C header and .s file from parsed decls."""
cLines = [
"// AUTOGENERATED by scripts/genToolbox.py from ORCA-C ORCACDefs/.",
"// DO NOT EDIT by hand — regenerate to update.",
"//",
"// Complete IIgs toolbox: ~1300 routines across 35 tool sets.",
"// Names match Apple's IIgs Toolbox Reference (TLStartUp,",
"// MMStartUp, NewWindow, SysBeep, etc.). Multi-arg wrappers",
"// (those whose stub body uses memory operands) live in",
"// runtime/src/iigsToolbox.s; zero-arg / single-arg simple",
"// ones are inlined here.",
"",
"#ifndef IIGS_TOOLBOX_H",
"#define IIGS_TOOLBOX_H",
"",
"#ifdef __cplusplus",
'extern "C" {',
"#endif",
"",
]
sLines = [
"; AUTOGENERATED by scripts/genToolbox.py from ORCA-C ORCACDefs/.",
"; DO NOT EDIT by hand — regenerate to update.",
";",
"; IIgs toolbox multi-arg wrappers.",
";",
"; C ABI: arg0 (i16) in A, arg0 (i32) in A:X, arg1+ on stack (4,S etc.).",
"; Each wrapper re-pushes args in toolbox (Pascal-style L-to-R) order,",
"; preceded by result space if non-void return, then JSL $E10000",
"; (or $E100A8 for GS/OS). Pops result if non-void.",
";",
"; Tool number: high byte = function, low byte = tool set.",
"",
"\t.text",
"",
]
seenNames = set()
inlineCount = 0
asmCount = 0
skipped = []
for d in decls:
name = d["name"]
if name in seenNames:
continue # duplicate from header re-include, etc.
seenNames.add(name)
retType = d["ret"]
argTypes = d["args"]
tool = d["tool"]
dispatcher = d["dispatcher"]
# Check if all types are known.
retSize, retC = typeInfo(retType)
argInfo = [typeInfo(a) for a in argTypes]
if any(ai is None for ai in argInfo):
skipped.append((name, "unknown arg type"))
continue
# Build C-style arg list.
cArgs = ", ".join(f"{ai[1]} a{i}" for i, ai in enumerate(argInfo))
if not cArgs:
cArgs = "void"
cDecl = f"{retC} {name}({cArgs});"
# Decide inline vs asm.
# Simple cases that can be inlined: no args (with or without 16-bit
# return), or single 16-bit arg with void return / 16-bit return.
canInline = False
if not argInfo and retSize in (0, 2):
canInline = True
elif (
len(argInfo) == 1
and argInfo[0][0] == 2
and retSize in (0, 2)
):
canInline = True
dispAddr = "0xe10000" if dispatcher == "dispatcher" else "0xe100a8"
if canInline:
# Generate inline asm body.
if not argInfo:
if retSize == 0:
body = (
f' __asm__ volatile (\n'
f' "ldx #0x{tool:04X}\\n"\n'
f' "jsl {dispAddr}\\n"\n'
f' :\n'
f' :\n'
f' : "a", "x", "y", "memory"\n'
f' );\n'
)
else: # 16-bit return
body = (
f' {retC} _r;\n'
f' __asm__ volatile (\n'
f' "pha\\n" // result space\n'
f' "ldx #0x{tool:04X}\\n"\n'
f' "jsl {dispAddr}\\n"\n'
f' "pla\\n"\n'
f' : "=a"(_r)\n'
f' :\n'
f' : "x", "y", "memory"\n'
f' );\n'
f' return _r;\n'
)
else: # 1-arg
if retSize == 0:
body = (
f' __asm__ volatile (\n'
f' "pha\\n" // arg0\n'
f' "ldx #0x{tool:04X}\\n"\n'
f' "jsl {dispAddr}\\n"\n'
f' :\n'
f' : "a"(a0)\n'
f' : "x", "y", "memory"\n'
f' );\n'
)
else:
body = (
f' {retC} _r;\n'
f' __asm__ volatile (\n'
f' "pha\\n" // result space\n'
f' "pha\\n" // arg0\n'
f' "ldx #0x{tool:04X}\\n"\n'
f' "jsl {dispAddr}\\n"\n'
f' "pla\\n"\n'
f' : "=a"(_r)\n'
f' : "a"(a0)\n'
f' : "x", "y", "memory"\n'
f' );\n'
f' return _r;\n'
)
cLines.append(f"// tool 0x{tool:04X} set 0x{tool & 0xFF:02X} ({TOOLSET_NAME.get(tool & 0xFF, '?')})")
cLines.append(f"static inline {retC} {name}({cArgs}) {{")
cLines.append(body.rstrip())
cLines.append("}")
cLines.append("")
inlineCount += 1
else:
# Extern decl in header, asm body in .s file.
cLines.append(f"extern {retC} {name}({cArgs}); // 0x{tool:04X}")
# Generate asm body.
sLines.append(f"; {name}({', '.join(argTypes) or 'void'}) -> {retType}")
sLines.append(f"; tool 0x{tool:04X}, set 0x{tool & 0xFF:02X} ({TOOLSET_NAME.get(tool & 0xFF, '?')})")
sLines.append(f"\t.globl {name}")
sLines.append(f"{name}:")
            # arg0 arrives in A (or A:X for a 32-bit first arg); arg1+ sit
            # on the caller's stack, the first at 4,S (just above the
            # 3-byte JSL return address).
            firstArgIs32 = bool(argInfo) and argInfo[0][0] == 4
            stackArgStart = 4
            scratchDP = 0xE0  # libcall scratch zone
            # Stash arg0 so A is free for the pushes below. Skipped for
            # zero-arg routines, which would otherwise push a bogus word.
            if argInfo:
                sLines.append(f"\t; --- stash arg0 (in A{'/X' if firstArgIs32 else ''}) ---")
                sLines.append(f"\tsta 0x{scratchDP:02X}")
                if firstArgIs32:
                    sLines.append(f"\tstx 0x{scratchDP + 2:02X}")
            # Push result space (toolbox order: result is highest on stack).
            if retSize > 0:
                sLines.append(f"\t; --- result space ({retSize} bytes) ---")
                for _ in range((retSize + 1) // 2):
                    sLines.append("\tpea 0")
            # pushedBytes tracks how far S has moved since entry, so a
            # caller-stack word at original offset N is read from
            # (N + pushedBytes),S. Args go in Pascal order (L-to-R). A
            # multi-word value is pushed high word first so it lands
            # little-endian in memory, matching the pla/plx result pop.
            pushedBytes = (retSize + 1) // 2 * 2  # result space rounded up to word
            if argInfo:
                sLines.append("\t; --- arg0 ---")
                if firstArgIs32:
                    sLines.append(f"\tlda 0x{scratchDP + 2:02X}")
                    sLines.append("\tpha")
                    pushedBytes += 2
                sLines.append(f"\tlda 0x{scratchDP:02X}")
                sLines.append("\tpha")
                pushedBytes += 2
            # arg1, arg2, ...: original offsets are arg1 at 4,
            # arg2 at 4+size(arg1), and so on.
            stackArgOffset = stackArgStart  # original offset of next arg
            for i, ai in enumerate(argInfo[1:], start=1):
                size = ai[0]
                nWords = (size + 1) // 2
                sLines.append(f"\t; --- arg{i} ({argTypes[i]}, {size}B) ---")
                # Walk the value's words from highest address to lowest.
                for w in reversed(range(nWords)):
                    sLines.append(f"\tlda {stackArgOffset + 2 * w + pushedBytes}, s")
                    sLines.append("\tpha")
                    pushedBytes += 2
                stackArgOffset += nWords * 2
# Dispatch.
sLines.append(f"\tldx #0x{tool:04X}")
sLines.append(f"\tjsl {dispAddr}")
# Pop result.
if retSize == 2:
sLines.append(f"\tpla ; result -> A")
elif retSize == 4:
sLines.append(f"\tpla ; result lo -> A")
sLines.append(f"\tplx ; result hi -> X")
elif retSize > 4:
# Larger results: pop into scratch then load A/X for return.
# Treat as "best effort" — caller should not expect a real
# return value beyond what fits in A:X.
nWords = (retSize + 1) // 2
for _ in range(nWords):
sLines.append(f"\tpla")
sLines.append(f"\trtl")
sLines.append("")
asmCount += 1
cLines.append("")
cLines.append("#ifdef __cplusplus")
cLines.append("}")
cLines.append("#endif")
cLines.append("")
cLines.append("#endif // IIGS_TOOLBOX_H")
OUT_HEADER.write_text("\n".join(cLines))
OUT_ASM.write_text("\n".join(sLines))
print(f"wrote {OUT_HEADER}: {inlineCount} inline + {asmCount} extern decls")
print(f"wrote {OUT_ASM}: {asmCount} bodies")
if skipped:
print(f"skipped {len(skipped)} routines (unhandled types):")
for n, why in skipped[:5]:
print(f" {n}: {why}")
def main():
decls = []
for h in sorted(ORCA_DIR.glob("*.h")):
for line in h.read_text().splitlines():
d = parseLine(line)
if d:
decls.append(d)
print(f"parsed {len(decls)} declarations from {ORCA_DIR}")
emit(decls)
if __name__ == "__main__":
main()
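
The offset bookkeeping in the asm wrappers is easy to get wrong, so here is a small standalone sketch of it. It is not part of genToolbox.py: `splitToolNumber` and `planStackLoads` are hypothetical helper names, and the sketch assumes arg0 arrives in registers, arg1 starts at 4,S on entry, and multi-word values are re-pushed high word first.

```python
def splitToolNumber(tool):
    """Split a 0xNNTT tool number into (function number, tool set number)."""
    return tool >> 8, tool & 0xFF


def planStackLoads(retSize, argSizes):
    """Return the `N,s` offsets to load, in push order, for arg1 onward.

    Every push moves S down by two bytes, so a caller-stack word that
    was at offset N on entry is found at N + (bytes pushed so far).
    """
    pushed = (retSize + 1) // 2 * 2           # result space, whole words
    if argSizes:
        pushed += (argSizes[0] + 1) // 2 * 2  # arg0 re-pushed from registers
    offsets = []
    orig = 4                                  # entry offset of the next stack arg
    for size in argSizes[1:]:
        nWords = (size + 1) // 2
        for w in reversed(range(nWords)):     # high word first (assumption)
            offsets.append(orig + 2 * w + pushed)
            pushed += 2
        orig += nWords * 2
    return offsets


# SysBeep is tool number 0x2C03: function 0x2C of tool set 0x03.
print(splitToolNumber(0x2C03))
# A routine shaped like NewHandle: 4-byte result, args of 4, 2, 2, 4 bytes.
print(planStackLoads(4, [4, 2, 2, 4]))
```

Note that the two words of the last 4-byte argument are loaded from the same `N,s` offset: the push of the high word moves S by exactly the distance between the two words.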


@ -3601,6 +3601,345 @@ EOF
fi
rm -f "$cDpFile" "$oDpFile" "$binDpFile"
# Memory-backed file I/O. mfsRegister stages a buffer as a
# named file, then fopen/fread/fwrite/fseek/ftell/fclose
# operate on it. Verifies that fopen returns a non-NULL FILE *,
# fread copies bytes into the caller's buffer, ftell advances,
# fseek rewinds, fclose succeeds, and fprintf into a writable
# in-memory file produces the expected formatted bytes.
log "check: MAME runs memory-backed stdio (fopen/fread/fseek/fprintf)"
cFioFile="$(mktemp --suffix=.c)"
oFioFile="$(mktemp --suffix=.o)"
binFioFile="$(mktemp --suffix=.bin)"
cat > "$cFioFile" <<'EOF'
extern int mfsRegister(const char *path, void *buf, unsigned int size, unsigned int cap, int writable);
extern struct __sFILE *fopen(const char *path, const char *mode);
extern unsigned int fread(void *p, unsigned int s, unsigned int n, struct __sFILE *f);
extern int fseek(struct __sFILE *f, long off, int whence);
extern long ftell(struct __sFILE *f);
extern int fclose(struct __sFILE *f);
extern int fgetc(struct __sFILE *f);
extern int fprintf(struct __sFILE *f, const char *fmt, ...);
extern int strcmp(const char *a, const char *b);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
static char data[14] = "Hello, world!";
static char wbuf[64];
static char rbuf[32];
int main(void) {
unsigned short ok = 0;
if (mfsRegister("greet", data, 13, 13, 0) == 0) ok |= 0x01;
struct __sFILE *f = fopen("greet", "r");
if (f) ok |= 0x02;
unsigned int n = fread(rbuf, 1, 13, f);
rbuf[13] = 0;
if (n == 13 && strcmp(rbuf, "Hello, world!") == 0) ok |= 0x04;
if (ftell(f) == 13L) ok |= 0x08;
fseek(f, 0L, 0);
if (fgetc(f) == 'H') ok |= 0x10;
if (fclose(f) == 0) ok |= 0x20;
if (mfsRegister("out", wbuf, 0, 64, 1) == 0) ok |= 0x40;
f = fopen("out", "w");
int wlen = fprintf(f, "n=%d", 42);
if (wlen == 4 && wbuf[0] == 'n' && wbuf[1] == '=' && wbuf[2] == '4' && wbuf[3] == '2')
ok |= 0x80;
switchToBank2();
*(volatile unsigned short *)0x5000 = ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cFioFile" -o "$oFioFile"
"$PROJECT_ROOT/tools/link816" -o "$binFioFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oExtrasF" "$oSnprintfF" \
"$oSfF" "$oSdF" "$oLibgccFile" "$oFioFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binFioFile" --check \
0x025000=00ff >/dev/null 2>&1; then
die "MAME: memory-backed file I/O bitmap != 0xFF (mfsRegister/fopen/fread/fwrite/fseek regression)"
fi
rm -f "$cFioFile" "$oFioFile" "$binFioFile"
# wchar.h + signal.h. wcslen/wcscmp/wcscpy/wcschr cover the
# core wide-char family; mbtowc/wctomb verify the trivial 1:1
# Latin-1 mapping. signal()/raise() are exercised by
# installing a handler, raising, and verifying the handler ran.
log "check: MAME runs wchar.h + signal.h core API"
cWsFile="$(mktemp --suffix=.c)"
oWsFile="$(mktemp --suffix=.o)"
binWsFile="$(mktemp --suffix=.bin)"
cat > "$cWsFile" <<'EOF'
#include <wchar.h>
#include <signal.h>
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
static volatile int sigSeen = 0;
static void onSig(int s) { sigSeen = s; }
int main(void) {
unsigned short ok = 0;
static const wchar_t hello[] = { 'h','e','l','l','o',0 };
static const wchar_t hellp[] = { 'h','e','l','l','p',0 };
wchar_t buf[16];
if (wcslen(hello) == 5) ok |= 0x01;
if (wcscmp(hello, hello) == 0) ok |= 0x02;
if (wcscmp(hello, hellp) < 0) ok |= 0x04;
wcscpy(buf, hello);
if (wcscmp(buf, hello) == 0) ok |= 0x08;
if (wcschr(hello, 'l') == hello + 2) ok |= 0x10;
char mb[8]; wchar_t wc;
int n = mbtowc(&wc, "A", 1);
if (n == 1 && wc == 'A') ok |= 0x20;
if (wctomb(mb, 'Z') == 1 && mb[0] == 'Z') ok |= 0x40;
// signal: install handler, raise, verify it fired.
signal(SIGABRT, onSig); // would normally abort; we override
signal(SIGFPE, onSig);
raise(SIGFPE);
if (sigSeen == SIGFPE) ok |= 0x80;
switchToBank2();
*(volatile unsigned short *)0x5000 = ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -I"$PROJECT_ROOT/runtime/include" -c \
"$cWsFile" -o "$oWsFile"
"$PROJECT_ROOT/tools/link816" -o "$binWsFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oExtrasF" "$oLibgccFile" "$oWsFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binWsFile" --check \
0x025000=00ff >/dev/null 2>&1; then
die "MAME: wchar/signal core != 0xFF (wcs* / mbtowc / signal/raise regression)"
fi
rm -f "$cWsFile" "$oWsFile" "$binWsFile"
# C++ subset: classes, single inheritance, virtual functions,
# polymorphism via base-class pointer arrays, virtual dtors.
# Compiled with -fno-exceptions -fno-rtti (the supported subset
# — full RTTI / exceptions / multi-inheritance with virtual
# bases are not supported).
log "check: MAME runs C++ polymorphism (virtuals + single inheritance)"
cppFile="$(mktemp --suffix=.cpp)"
oCppFile="$(mktemp --suffix=.o)"
binCppFile="$(mktemp --suffix=.bin)"
cat > "$cppFile" <<'EOF'
extern "C" __attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
class Shape {
public:
virtual int area() const = 0;
virtual int perimeter() const = 0;
virtual ~Shape() {}
};
class Rect : public Shape {
int w, h;
public:
Rect(int w, int h) : w(w), h(h) {}
int area() const override { return w * h; }
int perimeter() const override { return 2 * (w + h); }
};
class Square : public Rect {
public:
Square(int s) : Rect(s, s) {}
};
class Circle : public Shape {
int r;
public:
Circle(int r) : r(r) {}
int area() const override { return (314 * r * r) / 100; }
int perimeter() const override { return (628 * r) / 100; }
};
static int sumAreas(Shape **shapes, int n) {
int total = 0;
for (int i = 0; i < n; i++) total += shapes[i]->area();
return total;
}
extern "C" int main(void) {
Rect r(3, 4); Square s(5); Circle c(2);
Shape *arr[3] = { &r, &s, &c };
int total = sumAreas(arr, 3);
int ok = 0;
if (r.area() == 12) ok |= 1;
if (r.perimeter() == 14) ok |= 2;
if (s.area() == 25) ok |= 4;
if (c.area() == 12) ok |= 8;
if (total == 49) ok |= 0x10;
switchToBank2();
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
while (1) {}
}
EOF
"$PROJECT_ROOT/tools/llvm-mos-build/bin/clang++" --target=w65816 -O2 \
-ffunction-sections -fno-exceptions -fno-rtti \
-c "$cppFile" -o "$oCppFile"
"$PROJECT_ROOT/tools/link816" -o "$binCppFile" --text-base 0x1000 \
"$oCrt0F" "$oLibgccFile" "$oCppFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binCppFile" --check \
0x025000=001f >/dev/null 2>&1; then
die "MAME: C++ polymorphism != 0x1F (vtable / virtual call regression)"
fi
rm -f "$cppFile" "$oCppFile" "$binCppFile"
# Real-world: hex dumper using memory-backed file I/O. Reads
# 16 bytes from a registered "in" file, writes a hex+ASCII
# dump to a registered "out" file via fprintf. Verifies the
# output via strstr lookups (clang dead-code-eliminates direct
# byte reads of the static buffer after extern function calls;
# going through strstr defeats that).
log "check: MAME runs hex dumper (file I/O + fprintf real-world)"
cHdFile="$(mktemp --suffix=.c)"
oHdFile="$(mktemp --suffix=.o)"
binHdFile="$(mktemp --suffix=.bin)"
cat > "$cHdFile" <<'EOF'
extern int mfsRegister(const char *path, void *buf, unsigned int size, unsigned int cap, int writable);
extern struct __sFILE *fopen(const char *path, const char *mode);
extern int fclose(struct __sFILE *f);
extern int fgetc(struct __sFILE *f);
extern int fprintf(struct __sFILE *f, const char *fmt, ...);
extern char *strstr(const char *h, const char *n);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
__attribute__((noinline)) void hexdump(struct __sFILE *in, struct __sFILE *out) {
unsigned int offset = 0;
unsigned char line[16];
int linelen;
while (1) {
linelen = 0;
while (linelen < 16) {
int c = fgetc(in);
if (c < 0) break;
line[linelen++] = (unsigned char)c;
}
if (linelen == 0) break;
fprintf(out, "%04x: ", offset);
for (int i = 0; i < 16; i++) {
if (i < linelen) fprintf(out, "%02x ", line[i]);
else fprintf(out, " ");
}
fprintf(out, " |");
for (int i = 0; i < linelen; i++) {
unsigned char c = line[i];
int p = (c >= 0x20 && c < 0x7F) ? c : '.';
fprintf(out, "%c", p);
}
fprintf(out, "|\n");
offset += linelen;
if (linelen < 16) break;
}
}
static char input[16] = { 'H','e','l','l','o','!','\n','A','B','C',0,1,2,3,4,5 };
static char output[300];
int main(void) {
mfsRegister("in", input, 16, 16, 0);
mfsRegister("out", output, 0, 300, 1);
struct __sFILE *in = fopen("in", "r");
struct __sFILE *out = fopen("out", "w");
hexdump(in, out);
fclose(in); fclose(out);
int ok = 0;
if (strstr(output, "0000:")) ok |= 1;
if (strstr(output, "48 65 6c 6c 6f 21")) ok |= 2;
if (strstr(output, "|Hello!.ABC......|")) ok |= 4;
switchToBank2();
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cHdFile" -o "$oHdFile"
"$PROJECT_ROOT/tools/link816" -o "$binHdFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oExtrasF" "$oSnprintfF" \
"$oSfF" "$oSdF" "$oLibgccFile" "$oHdFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binHdFile" --check \
0x025000=0007 >/dev/null 2>&1; then
die "MAME: hex dumper output strstr lookups failed"
fi
rm -f "$cHdFile" "$oHdFile" "$binHdFile"
# Real-world: JSON tokenizer. Walks a literal JSON string,
# producing token-type counts. Exercises a state machine
# over char-by-char input, mixed string/number/keyword
# parsing, strncmp on keywords, and 16-bit globals. ~50
# lines of code, ~10 distinct token types.
log "check: MAME runs JSON tokenizer (state machine + strncmp)"
cJsFile="$(mktemp --suffix=.c)"
oJsFile="$(mktemp --suffix=.o)"
binJsFile="$(mktemp --suffix=.bin)"
cat > "$cJsFile" <<'EOF'
extern int strncmp(const char *a, const char *b, unsigned int n);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
enum { TOK_LBRACE, TOK_RBRACE, TOK_LBRACK, TOK_RBRACK, TOK_COMMA, TOK_COLON,
TOK_STRING, TOK_NUMBER, TOK_TRUE, TOK_FALSE, TOK_NULL, TOK_EOF, TOK_ERR };
static const char *p;
static int counts[16];
__attribute__((noinline)) static int nextToken(void) {
while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++;
if (*p == 0) return TOK_EOF;
if (*p == '{') { p++; return TOK_LBRACE; }
if (*p == '}') { p++; return TOK_RBRACE; }
if (*p == '[') { p++; return TOK_LBRACK; }
if (*p == ']') { p++; return TOK_RBRACK; }
if (*p == ',') { p++; return TOK_COMMA; }
if (*p == ':') { p++; return TOK_COLON; }
if (*p == '"') {
p++;
while (*p && *p != '"') p++;
if (*p == '"') p++;
return TOK_STRING;
}
if (*p == '-' || (*p >= '0' && *p <= '9')) {
if (*p == '-') p++;
while (*p >= '0' && *p <= '9') p++;
return TOK_NUMBER;
}
if (strncmp(p, "true", 4) == 0) { p += 4; return TOK_TRUE; }
if (strncmp(p, "false", 5) == 0) { p += 5; return TOK_FALSE; }
if (strncmp(p, "null", 4) == 0) { p += 4; return TOK_NULL; }
return TOK_ERR;
}
__attribute__((noinline)) static void tokenize(const char *src) {
p = src;
int t;
while ((t = nextToken()) != TOK_EOF && t != TOK_ERR) {
if (t < 16) counts[t]++;
}
}
int main(void) {
static const char input[] =
"{\"name\": \"alice\", \"age\": 30, \"isCool\": true, \"things\": [1, 2, null]}";
tokenize(input);
int ok = 0;
if (counts[TOK_LBRACE] == 1) ok |= 0x01;
if (counts[TOK_RBRACE] == 1) ok |= 0x02;
if (counts[TOK_LBRACK] == 1) ok |= 0x04;
if (counts[TOK_RBRACK] == 1) ok |= 0x08;
if (counts[TOK_COMMA] == 5) ok |= 0x10;
if (counts[TOK_COLON] == 4) ok |= 0x20;
if (counts[TOK_STRING] == 5) ok |= 0x40;
if (counts[TOK_NUMBER] == 3) ok |= 0x80;
if (counts[TOK_TRUE] == 1) ok |= 0x100;
if (counts[TOK_NULL] == 1) ok |= 0x200;
switchToBank2();
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cJsFile" -o "$oJsFile"
"$PROJECT_ROOT/tools/link816" -o "$binJsFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oLibgccFile" "$oJsFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binJsFile" --check \
0x025000=03ff >/dev/null 2>&1; then
die "MAME: JSON tokenizer count bitmap != 0x3ff"
fi
rm -f "$cJsFile" "$oJsFile" "$binJsFile"
rm -f "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oCrt0F"
else
@ -3628,54 +3967,68 @@ EOF
die "inline asm: 'inc a' missing from output"
fi
+# bench.sh runs the size-comparison harness against Calypsi.
+# Smoke just verifies it produces a non-empty markdown table —
+# actual ratios are reported in STATUS.
+log "check: scripts/bench.sh runs (size vs Calypsi)"
+benchOut="$(mktemp)"
+bash "$PROJECT_ROOT/scripts/bench.sh" >"$benchOut" 2>/dev/null
+if ! grep -q '^| \*\*total\*\*' "$benchOut"; then
+die "bench.sh did not produce a totals row"
+fi
+rm -f "$benchOut"
# iigs/toolbox.h compiles cleanly and emits the JSL $E10000 dispatch
# for at least one wrapper. Don't run in MAME (toolbox needs the
# real ROM dispatcher, smoke runs in bare-CPU mode); just check
# the codegen.
-log "check: iigs/toolbox.h wrappers compile and emit JSL E10000"
+# iigs/toolbox.h — autogenerated wrappers for the entire IIgs
+# toolbox (~1300 routines from 35 tool sets, sourced from ORCA-C
+# ORCACDefs/ via scripts/genToolbox.py). Names match Apple's
+# IIgs Toolbox Reference (TLStartUp, MMStartUp, NewWindow,
+# SysBeep, ...). Verify the header compiles, the multi-arg
+# asm bodies in iigsToolbox.s assemble, and that linking
+# together produces a binary that emits the JSL $E10000 (Tool
+# Locator) and JSL $E100A8 (GS/OS) dispatchers. Don't run in
+# MAME (toolbox needs the real ROM dispatcher).
+log "check: iigs/toolbox.h (autogenerated, ~1300 routines, Apple names)"
cToolFile="$(mktemp --suffix=.c)"
sToolFile="$(mktemp --suffix=.s)"
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$oFile2" "$cI32File" "$oI32File" "$cFibFile" "$sFibFile" "$cMulFile" "$sMulFile" "$cAllocaFile" "$sAllocaFile" "$cStrFile" "$sStrFile" "$cIndFile" "$sIndFile" "$irCoalesceFile" "$sCoalesceFile" "$cMixFile" "$sMixFile" "$cLinkFile" "$oLinkFile" "$oLibgccFile" "$binLinkFile" "$mapLinkFile" "$cFltFile" "$oFltFile" "$oSfFile" "$binFltFile" "$mapFltFile" "$cAsmFile" "$sAsmFile" "$cToolFile" "$sToolFile"' EXIT
cat > "$cToolFile" <<'EOF'
#include <iigs/toolbox.h>
-void greet(void) {
-TBoxWriteCString("Hello");
-TBoxBeep();
+// Cover wrappers across multiple tool sets to verify the header
+// compiles and the multi-arg asm bodies in iigsToolbox.s link.
+void useToolLocator(void) {
+TLStartUp(); TLShutDown(); TLBootInit(); TLReset();
+unsigned short v = TLVersion(); (void)v;
}
-// Cover all wrappers: ensures the multi-arg ones (declared extern in
-// the header, implemented in iigsToolbox.s) at least link.
-void everything(void) {
+void useMM(void) {
+unsigned short id = MMStartUp();
+MMShutDown(id);
+void *h = NewHandle(1024UL, id, 0, 0UL);
+DisposeHandle(h);
+}
+void useEvent(void) {
+unsigned short b = Button(0); (void)b;
+unsigned long t = TickCount(); (void)t;
+}
+void useQD(void) {
short rect[4] = {0, 0, 100, 100};
-char buf[20];
-char buf2[16];
-TBoxTLStartUp(); TBoxTLShutDown();
-unsigned short id = TBoxMMStartUp();
-unsigned long h = TBoxNewHandle(1024UL, id, 0, 0UL);
-TBoxDisposeHandle(h);
-TBoxMMShutDown(id);
-TBoxReadAsciiTime(buf);
-TBoxMoveTo(10, 20);
-TBoxFrameRect(rect); TBoxPaintRect(rect); TBoxEraseRect(rect);
-TBoxDrawString("\005hello");
-TBoxQDStartUp(0x80, 0x1A00, id); TBoxQDShutDown();
-TBoxEMStartUp(id); TBoxEMShutDown(); TBoxSystemTask();
-TBoxGetNextEvent(0xFFFF, buf2);
-void *win = TBoxNewWindow((const void *)0x5000);
-TBoxCloseWindow(win);
-char k = TBoxReadKey();
-(void)k;
+PaintRect(rect); FrameRect(rect); MoveTo(50, 50);
}
+void useMisc(void) { SysBeep(); }
EOF
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
-S "$cToolFile" -o "$sToolFile"
if ! grep -qE '\bjsl\s+0xe10000\b' "$sToolFile"; then
die "iigs/toolbox.h: JSL \$E10000 (Tool Locator) not emitted"
fi
-if ! grep -qE '\bldx\s+#0x290[Bb]\b' "$sToolFile"; then
-die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
+# SysBeep tool number $2C03 per ORCA (function $2C of Misc Tools $03).
+# Match case-insensitively — clang lowercases hex constants.
+if ! grep -qiE '\bldx\s+#0x2c03\b' "$sToolFile"; then
+die "iigs/toolbox.h: SysBeep tool number 0x2C03 not in output"
fi
+# Make sure the multi-arg wrappers in iigsToolbox.s assemble and
+# linking the test object against them succeeds.
oToolFile="$(mktemp --suffix=.o)"
oToolboxAsm="$(mktemp --suffix=.o)"
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \

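The MAME checks above all follow one convention: each passing assertion ORs one bit into `ok`, the result is stored at $02:5000, and runInMame.sh compares it with an expected bitmap. When a check fails, the missing bits say which assertions broke. A minimal sketch of that decoding follows; `failedChecks` and the check labels are hypothetical names chosen here, with the bit order read off the C++ polymorphism test (expected 0x001F).

```python
def failedChecks(expected, actual, names):
    """Return the names of checks whose bit is expected but not set."""
    missing = expected & ~actual
    return [name for bit, name in enumerate(names) if missing & (1 << bit)]


# Labels for bits 0..4 of the C++ polymorphism check (hypothetical names).
CPP_CHECKS = [
    "Rect.area",
    "Rect.perimeter",
    "Square.area",
    "Circle.area",
    "sumAreas",
]

# Example: the emulator reported 0x0017 instead of 0x001F.
print(failedChecks(0x1F, 0x17, CPP_CHECKS))
```

A bitmap of 0 usually means the program never reached the final store at all, which points at a crash or link problem rather than at any single assertion.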

@ -107,8 +107,11 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
getSubtargetInfo().getFeatureBits());
// Drop a SEP that the previous LDAi8imm expansion marked redundant.
-// The LDAi8imm peephole leaves M=8 set when its successor is a SEP
-// #$20 — that SEP would re-set the same flag, so we elide it.
+// The LDAi8imm peephole leaves M=8 set when its successor (or the
+// next non-mode-neutral MI) is a SEP #$20 — that SEP would re-set
+// the same flag, so we elide it. Mode-neutral MIs (X-flag-only
+// index ops, branches, transfers that don't touch A) pass through
+// freely without invalidating the skip.
if (SkipNextSepImm >= 0 && !MI->isDebugInstr()) {
if (MI->getOpcode() == W65816::SEP &&
MI->getNumOperands() >= 1 && MI->getOperand(0).isImm() &&
@ -116,10 +119,29 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
SkipNextSepImm = -1;
return; // consume the SEP, don't emit
}
-// Conservative: any non-debug, non-matching MI between LDAi8imm
-// and the expected SEP invalidates the elision (it might re-clear
-// M, observe P, etc.). Reset and proceed normally.
-SkipNextSepImm = -1;
+// Check if MI is mode-neutral; if so, pass through and KEEP the skip.
+bool isMNeutral = false;
+if (MI->isBranch() || MI->isReturn()) isMNeutral = true;
+else switch (MI->getOpcode()) {
+case W65816::LDX_Imm16: case W65816::LDX_DP: case W65816::LDX_Abs:
+case W65816::LDX_DPY: case W65816::LDX_AbsY:
+case W65816::LDY_Imm16: case W65816::LDY_DP: case W65816::LDY_Abs:
+case W65816::LDY_DPX: case W65816::LDY_AbsX:
+case W65816::STX_DP: case W65816::STX_Abs: case W65816::STX_DPY:
+case W65816::STY_DP: case W65816::STY_Abs: case W65816::STY_DPX:
+case W65816::INX: case W65816::DEX:
+case W65816::INY: case W65816::DEY:
+case W65816::CPX_Imm16: case W65816::CPX_DP: case W65816::CPX_Abs:
+case W65816::CPY_Imm16: case W65816::CPY_DP: case W65816::CPY_Abs:
+case W65816::PHX: case W65816::PHY:
+case W65816::PLX: case W65816::PLY:
+case W65816::NOP:
+isMNeutral = true; break;
+default: break;
+}
+// Anything else invalidates the elision (might re-clear M, push/pop
+// 8-bit P that observes mode, call out, etc.).
if (!isMNeutral) SkipNextSepImm = -1;
} }
// Drop the STAabs that the LDAi16imm-0 peephole replaced with STZ. // Drop the STAabs that the LDAi16imm-0 peephole replaced with STZ.
@ -318,8 +340,37 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
Lda.addOperand(MCOperand::createImm(Val)); Lda.addOperand(MCOperand::createImm(Val));
EmitToStreamer(*OutStreamer, Lda); EmitToStreamer(*OutStreamer, Lda);
bool SkipRep = false; bool SkipRep = false;
// Walk past mode-neutral MIs (X-flag-only ops, branches, transfers
// that don't touch A) to find the next SEP/REP — same idea as the
// pre-emit REP/SEP scheduler, but applied to LDAi8imm's closing
// REP. If a SEP #$20 sits there, we can elide the REP+SEP pair.
auto isMNeutralMI = [](const MachineInstr &MI) -> bool {
if (MI.isDebugInstr()) return true;
if (MI.isBranch() || MI.isReturn()) return true;
unsigned O = MI.getOpcode();
switch (O) {
case W65816::LDX_Imm16: case W65816::LDX_DP: case W65816::LDX_Abs:
case W65816::LDX_DPY: case W65816::LDX_AbsY:
case W65816::LDY_Imm16: case W65816::LDY_DP: case W65816::LDY_Abs:
case W65816::LDY_DPX: case W65816::LDY_AbsX:
case W65816::STX_DP: case W65816::STX_Abs: case W65816::STX_DPY:
case W65816::STY_DP: case W65816::STY_Abs: case W65816::STY_DPX:
case W65816::INX: case W65816::DEX:
case W65816::INY: case W65816::DEY:
case W65816::CPX_Imm16: case W65816::CPX_DP: case W65816::CPX_Abs:
case W65816::CPY_Imm16: case W65816::CPY_DP: case W65816::CPY_Abs:
case W65816::PHX: case W65816::PHY:
case W65816::PLX: case W65816::PLY:
case W65816::CLC: case W65816::SEC:
case W65816::PHP: case W65816::PLP:
case W65816::NOP:
return true;
default:
return false;
}
};
auto It = std::next(MI->getIterator()); auto It = std::next(MI->getIterator());
while (It != MI->getParent()->end() && It->isDebugInstr()) ++It; while (It != MI->getParent()->end() && isMNeutralMI(*It)) ++It;
if (It != MI->getParent()->end() && if (It != MI->getParent()->end() &&
It->getOpcode() == W65816::SEP && It->getOpcode() == W65816::SEP &&
It->getNumOperands() >= 1 && It->getOperand(0).isImm() && It->getNumOperands() >= 1 && It->getOperand(0).isImm() &&


@ -307,6 +307,93 @@ bool W65816SepRepCleanup::runOnMachineFunction(MachineFunction &MF) {
Changed = true; Changed = true;
} }
// Extended toggle coalesce — REP/SEP scheduling.
//
// Walk the MBB looking for `T1 ; ...neutral... ; T2` where T1 and
// T2 are opposite-polarity SEP/REP toggles (T1=REP T2=SEP, or
// vice versa) with the same imm, and the gap contains only
// M-mode-neutral instructions (transfers/branches/X-flag-only
// index ops). In that case T1+T2 form a no-op pair around code
// that doesn't care about M, so both can be dropped. Equivalent
// to "moving the SEP/REP wrap inward to skip the neutral region".
//
// Saves 4 bytes / 6 cycles per gap collapsed (REP and SEP are
// each 2 bytes / 3 cycles). The common trigger is two STA8 stores
// separated by an LDY for the second store's address: each STA8fi
// emits SEP/STA/REP, and the existing adjacent coalesce can't see
// across the LDY, but this pass can.
{
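// Mode-neutral instruction set: ops that neither touch the M-bit
// nor depend on A's width. The X-flag dependent ops (LDX/LDY/
// STX/STY/INX/DEX/INY/DEY/CPX/CPY/PHX/PHY/PLX/PLY) are
// independent of M, as are branches, returns, CLC/SEC, and NOP.
// Calls and inline asm are never skipped; the walker below bails
// on them. PHP/PLP are accepted too (P is always 8 bits wide on
// the stack), although PHP captures the current M and PLP restores
// it, so this relies on the pair being balanced within the gap.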
auto isMNeutral = [](const MachineInstr &MI) -> bool {
if (MI.isDebugInstr()) return true;
if (MI.isBranch() || MI.isReturn()) return true;
unsigned O = MI.getOpcode();
switch (O) {
case W65816::LDX_Imm16: case W65816::LDX_DP: case W65816::LDX_Abs:
case W65816::LDX_DPY: case W65816::LDX_AbsY:
case W65816::LDY_Imm16: case W65816::LDY_DP: case W65816::LDY_Abs:
case W65816::LDY_DPX: case W65816::LDY_AbsX:
case W65816::STX_DP: case W65816::STX_Abs: case W65816::STX_DPY:
case W65816::STY_DP: case W65816::STY_Abs: case W65816::STY_DPX:
case W65816::INX: case W65816::DEX:
case W65816::INY: case W65816::DEY:
case W65816::CPX_Imm16: case W65816::CPX_DP: case W65816::CPX_Abs:
case W65816::CPY_Imm16: case W65816::CPY_DP: case W65816::CPY_Abs:
case W65816::PHX: case W65816::PHY:
case W65816::PLX: case W65816::PLY:
case W65816::CLC: case W65816::SEC:
case W65816::PHP: case W65816::PLP:
case W65816::NOP:
return true;
default:
return false;
}
};
bool again = true;
while (again) {
again = false;
for (auto It = MBB.begin(); It != MBB.end(); ++It) {
unsigned Op1 = It->getOpcode();
if (Op1 != W65816::REP && Op1 != W65816::SEP) continue;
if (It->getNumOperands() < 1 || !It->getOperand(0).isImm()) continue;
int Imm1 = It->getOperand(0).getImm();
if (Imm1 != 0x20) continue; // M-bit only
// Walk forward across mode-neutral ops looking for the matching
// opposite toggle. Bail at calls, asm, ALU ops on A, etc.
unsigned WantOp = (Op1 == W65816::REP) ? W65816::SEP : W65816::REP;
auto Walker = std::next(It);
MachineInstr *Match = nullptr;
while (Walker != MBB.end()) {
if (Walker->isDebugInstr()) { ++Walker; continue; }
unsigned WO = Walker->getOpcode();
if (WO == WantOp && Walker->getNumOperands() >= 1 &&
Walker->getOperand(0).isImm() &&
Walker->getOperand(0).getImm() == Imm1) {
Match = &*Walker;
break;
}
// Bail on anything that touches A or otherwise cares about M.
if (Walker->isCall() || Walker->isInlineAsm()) break;
if (!isMNeutral(*Walker)) break;
++Walker;
}
if (!Match) continue;
// Drop both toggles. Erasing invalidates the iterators, so restart the scan.
MachineInstr *T1 = &*It;
T1->eraseFromParent();
Match->eraseFromParent();
Changed = true;
again = true;
break;
}
}
}
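
The extended toggle coalesce above can be modelled outside LLVM as a scan over a flat opcode list. The sketch below is only an illustration of the idea, not the pass itself: the `Op` enum, `isMNeutral`, and `coalesceToggles` are hypothetical names, the neutral set is reduced to two index-register ops, and every toggle is assumed to be a `#$20` (M-bit) toggle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the backend opcodes (not the real W65816 enum).
// RepM / SepM model REP #$20 / SEP #$20.
enum class Op { RepM, SepM, Ldy, Inx, Sta8, Jsl };

// Assumption for this sketch: only index-register ops count as M-neutral.
bool isMNeutral(Op o) { return o == Op::Ldy || o == Op::Inx; }

// Removes each REP/SEP pair of opposite polarity that is separated only by
// M-neutral ops. Returns the number of pairs removed.
std::size_t coalesceToggles(std::vector<Op> &code) {
  std::size_t removed = 0;
  bool again = true;
  while (again) {
    again = false;
    for (std::size_t i = 0; i < code.size(); ++i) {
      if (code[i] != Op::RepM && code[i] != Op::SepM) continue;
      const Op want = (code[i] == Op::RepM) ? Op::SepM : Op::RepM;
      // Walk forward across neutral ops; anything else ends the gap.
      std::size_t j = i + 1;
      while (j < code.size() && isMNeutral(code[j])) ++j;
      if (j == code.size() || code[j] != want) continue;
      // Erase the later element first so index i stays valid.
      assert(j > i);
      code.erase(code.begin() + static_cast<std::ptrdiff_t>(j));
      code.erase(code.begin() + static_cast<std::ptrdiff_t>(i));
      ++removed;
      again = true;  // the list changed, so rescan from the start
      break;
    }
  }
  return removed;
}
```

Feeding it the two-store pattern from the comment (`SepM, Sta8, RepM, Ldy, SepM, Sta8, RepM`) drops the inner `RepM`/`SepM` pair around the `Ldy`, while a pair separated by a call (`RepM, Jsl, SepM`) is left untouched.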
// Second peephole: collapse `ADCi16imm src, ±1/±2` (and SBCi16imm) // Second peephole: collapse `ADCi16imm src, ±1/±2` (and SBCi16imm)
// into INA/DEA chains when the carry flag they would set is unused. // into INA/DEA chains when the carry flag they would set is unused.
// ADCi16imm is a pseudo (expands to CLC+ADC_Imm16); we rewrite it // ADCi16imm is a pseudo (expands to CLC+ADC_Imm16); we rewrite it