llvm816 — Current Status
LLVM/Clang backend for the WDC 65816 (Apple IIgs), forked from
llvm-mos as a separate W65816 target.
What works
End-to-end C-to-binary toolchain that produces 65816 machine code which runs correctly under MAME (apple2gs).
Language coverage at -O2 (no extra flags):
- All scalar arithmetic: i8 / i16 / i32 / i64 add, sub, mul, div, mod
(signed and unsigned). Carry-chained multi-word ops via ADC/SBC pseudos, plus `ASLA16` / shift libcalls.
- Comparisons and signed/unsigned widening (sext, zext, trunc) for all the above sizes.
- Pointer arithmetic, array indexing, struct field access, struct return-by-value (up to 8 bytes — Pair, Vec4, double).
- Bitfields, switch statements (verified up to ~12 cases + default),
function pointers, function-pointer tables, indirect calls via the
`__jsl_indir` trampoline.
- Recursion: factorial, Fibonacci, depth-3 binary-tree insert/sum/min/max, simple recursive quicksort.
- Loops with goto / break / continue, nested loops, state machines.
- `<stdarg.h>` varargs with int / long / unsigned long long mixed args.
- Heap: `malloc`/`free` (`libc.c` first-fit allocator); linked-list reverse with `cons` works.
- Strings: hand-rolled `strlen`, `strcmp`, `strcpy`, `strchr`, `atoi`/`itoa` roundtrip.
- Soft-float (single): all four ops + comparisons, MAME-verified.
- Soft-double: add, sub, mul, div all return correct bit patterns; 3-iter Newton sqrt converges. Long-running iterations may hit MAME's 1-second sim-time budget (test config issue, not a compiler bug).
- Inline assembly with `"a"`, `"x"`, `"y"` register constraints and arbitrary opcode bytes (used for the `pha; plb` bank-switch idiom).
- C++ minimal: `clang++` compiles a class with virtual + non-trivial ctor (vtable + RTTI omitted; no exceptions).
- `printf` with `%d %x %s %c %p` and width/precision specifiers.
- `setjmp`/`longjmp` from `libgcc.s`.
- Static constructors via crt0's `init_array` walk.
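For readers unfamiliar with first-fit allocation, here is a minimal sketch of the technique named in the Heap item. It is not the code in `runtime/libc.c`. The arena size, the `Block` header layout, and the `ff_malloc`/`ff_free` names are all hypothetical. Each block carries a header. Allocation walks the list, takes the first free block that is large enough, and splits off any usable tail. Freeing marks the block and merges it with the free blocks that follow it.

```c
#include <stddef.h>

/* Hypothetical first-fit allocator over a fixed arena. Sizes, header
   layout and names are illustrative, not taken from runtime/libc.c. */

#define ARENA_SIZE 4096

typedef struct Block {
    size_t size;          /* payload bytes that follow this header */
    int free;             /* nonzero if the block is available */
    struct Block *next;   /* next block in address order */
} Block;

static _Alignas(Block) unsigned char arena[ARENA_SIZE];
static Block *head = NULL;

/* Round a request up so every header stays suitably aligned. */
static size_t align_up(size_t n) {
    size_t a = sizeof(void *);
    return (n + a - 1) & ~(a - 1);
}

void *ff_malloc(size_t n) {
    if (n == 0) return NULL;
    n = align_up(n);
    if (head == NULL) {               /* first call: the arena is one free block */
        head = (Block *)arena;
        head->size = ARENA_SIZE - sizeof(Block);
        head->free = 1;
        head->next = NULL;
    }
    for (Block *b = head; b != NULL; b = b->next) {
        if (!b->free || b->size < n) continue;   /* first fit: take the first block that fits */
        if (b->size >= n + sizeof(Block) + sizeof(void *)) {
            /* Enough left over for another header plus payload: split the tail off. */
            Block *rest = (Block *)((unsigned char *)(b + 1) + n);
            rest->size = b->size - n - sizeof(Block);
            rest->free = 1;
            rest->next = b->next;
            b->next = rest;
            b->size = n;
        }
        b->free = 0;
        return b + 1;                 /* payload starts right after the header */
    }
    return NULL;                      /* no free block is large enough */
}

void ff_free(void *p) {
    if (p == NULL) return;
    Block *b = (Block *)p - 1;
    b->free = 1;
    /* Merge with any free blocks that immediately follow in the list. */
    while (b->next != NULL && b->next->free) {
        b->size += sizeof(Block) + b->next->size;
        b->next = b->next->next;
    }
}
```

The walk always starts at the lowest block. So freeing a block and then requesting a smaller one returns the same address.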
Toolchain:
- `clang`/`llc` produce W65816 assembly + ELF object files.
- `tools/link816` resolves cross-translation-unit refs, lays out text/rodata/bss, and emits a flat binary the IIgs ROM can load.
- `tools/omfEmit` produces OMF v2.1 single-segment files (the IIgs's native object format) for round-tripping with classic dev tools.
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double, and libgcc into linkable objects.
- `scripts/smokeTest.sh` runs ~80 end-to-end checks (scalar ops, control flow, calling conventions, MAME execution, regressions). Currently 100% pass.
ABI:
- arg0 in A; arg1 in X for i32-first-arg signatures; remaining args are pushed right-to-left
on the system stack with PHA. The caller deallocates via
`tsc; clc; adc #N; tcs` or by repeating `PLY` N/2 times.
- Return: i8/i16 in A; i32 in A:X; i64 in A:X:Y plus DP[$F0..$F1] for the highest 16 bits.
- Frame is empty-descending (S points to next-free); offsets account for the +1 skew vs LLVM's full-descending model.
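The +1 skew is easier to see in a toy C model of the empty-descending stack. The model assumes native mode with a 16-bit accumulator and a bank-0 stack. It also assumes that LLVM's frame offsets count from the most recently pushed byte. The names `pha16`, `lda_sr`, `hw_offset`, and `drop` are hypothetical and are not backend APIs.

```c
#include <stdint.h>

/* Toy model of the 65816 native-mode hardware stack in bank 0.
   S addresses the next free byte (empty-descending). A 16-bit PHA stores the
   high byte at S, the low byte at S-1, then decrements S by 2.
   All names here are hypothetical. */

static uint8_t mem[0x10000];
static uint16_t S = 0x01FF;   /* arbitrary starting value */

static void pha16(uint16_t v) {
    mem[S] = (uint8_t)(v >> 8);
    mem[(uint16_t)(S - 1)] = (uint8_t)v;
    S = (uint16_t)(S - 2);
}

/* `lda off,s` with a 16-bit accumulator: low byte at S+off, high at S+off+1. */
static uint16_t lda_sr(uint16_t off) {
    uint16_t addr = (uint16_t)(S + off);
    return (uint16_t)(mem[addr] | (mem[(uint16_t)(addr + 1)] << 8));
}

/* Assuming LLVM's offset 0 names the most recently pushed byte
   (full-descending), that byte lives at S+1 here: the +1 skew. */
static uint16_t hw_offset(uint16_t llvm_offset) {
    return (uint16_t)(llvm_offset + 1);
}

/* Caller cleanup, as `tsc; clc; adc #N; tcs` (or N/2 PLYs) would do it. */
static void drop(uint16_t n) {
    S = (uint16_t)(S + n);
}
```

Suppose a caller pushes two 2-byte arguments right-to-left: `pha16(0x3333)` then `pha16(0x2222)`. The leftmost one is then read at `lda 1,s`, and `drop(4)` restores S.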
In flight (build-system level)
- DWARF sidecar emission in link816 (#51): the link step should produce a separate sidecar file with line-number / variable-location info that an IDE or post-mortem dumper can consume. Skeleton not yet written; deferred until other correctness work is done.
Known issues / workarounds
- Greedy register allocator mis-orders spills in two patterns (#69, #70):
  - Functions where both `$a` and `$x` are live-in (i64-first-arg with a stack-output pointer, e.g. `udivmod(i64, i64, ptr)`). The `TAX` bridging `$x` to A clobbers `$a`'s value before the second `STA` can save it.
  - Iterative quicksort with `if`/`else` recursion choice: complex live ranges across two `swap()` calls produce wrong arg values.

  Both reproduce only at `-O1`/`-O2` with greedy. Workaround: `-mllvm -regalloc=fast` for the affected translation unit. `softDouble.c` already requires this flag for `__muldf3` (`build.sh` applies it automatically). The real fix is a pre-RA pass that pre-spills critical pointer arguments to memory, or a targeted fix in greedy's spill-ordering heuristic. Material work; deferred.
- `(d,s),y` / `(sr,s),y` addressing carries into the next bank when Y holds a negative offset, because Y is added as a 16-bit unsigned value (e.g. -2 becomes $FFFE). Worked around by `W65816NegYIndY`, which rewrites the affected ops to `TAX ; LDA/STA $0000,X`. This stays correct for negative offsets like `arr[i-1]`.
- `(d,s),y` for stack-local pointer dereferences uses DBR. User code that switches DBR (e.g. `pha; plb` to bank 2 to reach IIgs hardware) must therefore not call into a function that takes the address of one of its locals: the callee's `*p = v` will write to the wrong bank. Documented; no compiler-side mitigation beyond the existing DPF0 fake-physreg routing for the i64-return high half.
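Both bank issues above come down to 24-bit effective-address arithmetic. The toy C model below assumes native mode and the standard 65816 rule that an indexed access adds the index as a full 24-bit sum, so it can cross into the next bank. The function names are hypothetical.

```c
#include <stdint.h>

/* (d),y and (sr,s),y: the 16-bit pointer is combined with DBR and Y is added
   as a 24-bit sum, so a carry out of bit 15 lands in the next bank.
   Function names are hypothetical. */
static uint32_t ea_indirect_y(uint8_t dbr, uint16_t ptr, uint16_t y) {
    return (((uint32_t)dbr << 16) + ptr + y) & 0xFFFFFFu;
}

/* Rewritten form: ptr + Y is computed in the 16-bit accumulator (wraps
   mod 2^16), then TAX and `LDA $0000,X`. Since $0000 + X never exceeds
   $FFFF, the access stays inside bank DBR. */
static uint32_t ea_tax_abs_x(uint8_t dbr, uint16_t ptr, uint16_t y) {
    uint16_t x = (uint16_t)(ptr + y);
    return ((uint32_t)dbr << 16) | x;
}
```

Take DBR = 2, `ptr = $1000`, and Y = -2 ($FFFE). Here `ea_indirect_y` yields $030FFE (wrong bank), while `ea_tax_abs_x` yields the intended $020FFE. The same model shows the DBR hazard: a bank-0 stack slot at $01F0, dereferenced with DBR = 2, resolves to $0201F0.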
What's still needed for a "ship-ready" toolchain
- Greedy regalloc spill-ordering fix (see above). Removes the need for the per-file `-regalloc=fast` workaround on `softDouble.c` and unblocks pattern-rich code that currently must be compiled at `-O0` for correctness.
- Round-to-nearest-even in `__divdf3`: currently truncate-toward-zero, which differs from gcc by ±1 ULP in several test cases. Acceptable today (Newton iterations still converge); revisit when an exact-match test suite lands.
- DWARF sidecar (#51) for source-level debugging.
- More of the C standard library: `<math.h>` transcendental functions (`sin`, `cos`, `exp`, `log`, `pow`), `<string.h>` beyond what's hand-coded, `<stdio.h>` file I/O (`fopen`, `fread`, `fwrite`, `fseek`).
- C++ runtime support: vtable layout for multiple inheritance, RTTI, exceptions (or a documented `-fno-exceptions` requirement).
- REP/SEP scheduling pass (design doc §3.3): the current prologue picks one M-mode for the whole function based on whether any 8-bit accumulator value is used. A per-region scheduler would reduce the SEP/REP wrap overhead on i8 stores.
- Toolbox / IIgs system call bindings: header files declaring the Apple IIgs system calls (`SystemTask`, `WaitMouseUp`, `DrawString`, …) with the right inline-asm dispatch glue.
- Real-world program coverage: the smoke tests are small synthetic programs. A few known-good Apple IIgs C programs (e.g. a text-file pager, a small game) compiled and run end-to-end would catch issues no synthetic test currently exercises.
- Cycle-time / size benchmarks vs Calypsi 5.16: design doc §1 says the goal is to "match or exceed" Calypsi. We have neither baseline numbers nor a comparison harness yet.
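For the `__divdf3` rounding item, here is a minimal sketch of how truncation differs from round-to-nearest-even. It assumes the mantissa divide produces `shift` extra low-order bits, plus a sticky flag that is set when the division left a nonzero remainder. This is the generic guard/round/sticky technique, not the structure of the runtime's `__divdf3`, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Drop the low `shift` bits (1 <= shift <= 63) of an extended mantissa. */
static uint64_t round_truncate(uint64_t m, unsigned shift) {
    return m >> shift;                 /* toward zero, like the current code */
}

/* Same, but round to nearest with ties to even. `sticky` is nonzero when
   bits below `m` (e.g. a nonzero division remainder) were already lost. */
static uint64_t round_nearest_even(uint64_t m, unsigned shift, int sticky) {
    uint64_t kept = m >> shift;
    uint64_t rem = m & ((UINT64_C(1) << shift) - 1);   /* discarded bits */
    uint64_t half = UINT64_C(1) << (shift - 1);
    if (rem > half || (rem == half && (sticky || (kept & 1))))
        kept += 1;    /* may carry into a new top bit; caller renormalizes */
    return kept;
}
```

Consider one discarded bit. 11 (binary 1011) truncates to 5 but rounds to 6, because it is a tie and 5 is odd. 9 (binary 1001) rounds to 4, because it is a tie and 4 is even. The exception is when the sticky flag is set: that shows the true value was above the halfway point, so 9 rounds up to 5.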