# llvm816: Current Status

LLVM/Clang backend for the WDC 65816 (Apple IIgs), forked from llvm-mos as a separate `W65816` target.

## What works

End-to-end C-to-binary toolchain that produces 65816 machine code which runs correctly under MAME (apple2gs).

**Language coverage at -O2 (no extra flags):**

- All scalar arithmetic: i8 / i16 / i32 / i64 add, sub, mul, div, mod (signed and unsigned). Carry-chained multi-word ops via ADC/SBC pseudos + ASLA16 / shift libcalls.
- Comparisons and signed/unsigned widening and narrowing (sext, zext, trunc) for all the above sizes.
- Pointer arithmetic, array indexing, struct field access, struct return-by-value (up to 8 bytes: Pair, Vec4, double).
- Bitfields, switch statements (verified up to ~12 cases + default), function pointers, function-pointer tables, indirect calls via the `__jsl_indir` trampoline.
- Recursion: factorial, Fibonacci, depth-3 binary-tree insert/sum/min/max, simple recursive quicksort.
- Loops with goto / break / continue, nested loops, state machines.
- `<stdarg.h>` varargs with int / long / unsigned long long mixed args.
- Heap: `malloc` / `free` (libc.c first-fit allocator); linked-list reverse with `cons` works.
- Strings: hand-rolled `strlen`, `strcmp`, `strcpy`, `strchr`, atoi/itoa roundtrip.
- Soft-float (single): all four ops + comparisons, MAME-verified.
- Soft-double: add, sub, mul, div all return correct bit patterns; 3-iter Newton sqrt converges. Long-running iterations may hit MAME's 1-second sim-time budget (test config issue, not a compiler bug).
- Inline assembly with `"a"`, `"x"`, `"y"` register constraints and arbitrary opcode bytes (used for the `pha;plb` bank-switch idiom).
- C++ minimal: clang++ compiles a class with virtual + non-trivial ctor (vtable + RTTI omitted; no exceptions).
- printf with `%d %x %s %c %p` and width/precision specifiers.
- `setjmp` / `longjmp` from libgcc.s.
- Static constructors via crt0's init_array walk.

**Toolchain:**

- `clang` / `llc` produce W65816 assembly + ELF object files.
- `tools/link816` resolves cross-translation-unit refs, lays out text/rodata/bss, emits a flat binary the IIgs ROM can load.
- `tools/omfEmit` produces OMF v2.1 single-segment files (the IIgs's native object format) for round-tripping with classic dev tools.
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double, libgcc into linkable objects.
- `scripts/smokeTest.sh` runs ~80 end-to-end checks (scalar ops, control flow, calling conventions, MAME execution, regressions). Currently 100% pass.

**ABI:**

- arg0 in A; arg1 in X for i32-first-arg signatures; rest pushed right-to-left (RTL) on the system stack with PHA. Caller deallocates via `tsc;clc;adc #N;tcs` or `PLY*N/2`.
- Return: i8/i16 in A; i32 in A:X; i64 in A:X:Y plus DP[$F0..$F1] for the highest 16 bits.
- Frame is empty-descending (S points to next-free); offsets account for the +1 skew vs LLVM's full-descending model.

## In flight (build-system level)

- **DWARF sidecar emission in link816** (#51): The linker should produce a separate sidecar file with line-number / variable-location info that an IDE or post-mortem dumper can consume. Skeleton not yet written; deferred until other correctness work is done.

## Known issues / workarounds

- **Greedy register allocator mis-orders spills** in two patterns (#69, #70):
  1. Functions where both `$a` and `$x` are live-in (i64-first-arg with a stack-output pointer, e.g. `udivmod(i64, i64, ptr)`). The TAX bridging `$x` to A clobbers `$a`'s value before the second STA can save it.
  2. Iterative quicksort with `if/else` recursion choice: complex live-ranges across two `swap()` calls produce wrong arg values.

  Both reproduce only at `-O1`/`-O2` with greedy. Workaround: `-mllvm -regalloc=fast` for the affected translation unit. `softDouble.c` already requires this flag for `__muldf3` (build.sh applies it automatically). Real fix is a pre-RA pass that pre-spills critical pointer arguments to memory, or a targeted fix in greedy's spill-ordering heuristic. Material work; deferred.
- **(d,s),y / (sr,s),y addressing wraps the bank** when Y holds a negative offset, which the CPU treats as a large unsigned 16-bit value. Worked around by `W65816NegYIndY` rewriting the affected ops to `TAX ; LDA/STA $0000,X`. The rewritten form stays correct for negative offsets like `arr[i-1]`.
- **(d,s),y for stack-local pointer dereferences uses DBR**, so user code that switches DBR (e.g. `pha;plb` to bank 2 to reach IIgs hardware) must not call into a function that takes the address of one of its locals, because the callee's `*p = v` will write to the wrong bank. Documented; no compiler-side mitigation beyond the existing DPF0 fake-physreg routing for the i64-return high half.

## What's still needed for a "ship-ready" toolchain

- **Greedy regalloc spill-ordering fix**: see above. Removes the need for the per-file `-regalloc=fast` workaround on `softDouble.c` and unblocks pattern-rich code that currently must be compiled at `-O0` for correctness.
- **Round-to-nearest-even in `__divdf3`**: it currently truncates toward zero, which differs from gcc by ±1 ULP in several test cases. Acceptable today (Newton iterations still converge); revisit when an exact-match test suite lands.
- **DWARF sidecar** (#51) for source-level debugging.
- **More of the C standard library**: `<math.h>` transcendental functions (sin, cos, exp, log, pow), `<string.h>` beyond what's hand-coded, `<stdio.h>` file I/O (`fopen`, `fread`, `fwrite`, `fseek`).
- **C++ runtime support**: vtable layout for multiple inheritance, RTTI, exceptions (or a documented `-fno-exceptions` requirement).
- **REP/SEP scheduling pass** (design doc §3.3): the current prologue picks one M-mode for the whole function based on whether any 8-bit accumulator value is used. A per-region scheduler would reduce the SEP/REP wrap overhead on i8 stores.
- **Toolbox / IIgs system call bindings**: header files declaring the Apple IIgs system calls (`SystemTask`, `WaitMouseUp`, `DrawString`, …) with the right inline-asm dispatch glue.
- **Real-world program coverage**: the smoke tests are microbenchmarks. A few known-good Apple IIgs C programs (e.g. a text-file pager, a small game) compiled and run end-to-end would catch issues no synthetic test currently exercises.
- **Cycle-time / size benchmarks vs Calypsi 5.16**: design doc §1 says the goal is to "match or exceed" Calypsi. We have neither baseline numbers nor a comparison harness yet.