184 lines
8.7 KiB
Markdown
184 lines
8.7 KiB
Markdown
# llvm816 — Current Status
|
|
|
|
LLVM/Clang backend for the WDC 65816 (Apple IIgs), forked from
|
|
llvm-mos as a separate `W65816` target.
|
|
|
|
## What works
|
|
|
|
End-to-end C-to-binary toolchain that produces 65816 machine code
|
|
which runs correctly under MAME (apple2gs).
|
|
|
|
**Language coverage at -O2 (no extra flags):**
|
|
|
|
- All scalar arithmetic: i8 / i16 / i32 / i64 add, sub, mul, div, mod
|
|
(signed and unsigned). Carry-chained multi-word ops via ADC/SBC pseudos
|
|
+ ASLA16 / shift libcalls.
|
|
- Comparisons and signed/unsigned widening (sext, zext, trunc) for all
|
|
the above sizes.
|
|
- Pointer arithmetic, array indexing, struct field access, struct
|
|
return-by-value (up to 8 bytes — Pair, Vec4, double).
|
|
- Bitfields, switch statements (verified up to ~12 cases + default),
|
|
function pointers, function-pointer tables, indirect calls via
|
|
`__jsl_indir` trampoline.
|
|
- Recursion: factorial, Fibonacci, depth-3 binary-tree
|
|
insert/sum/min/max, simple recursive quicksort.
|
|
- Loops with goto / break / continue, nested loops, state machines.
|
|
- `<stdarg.h>` varargs with int / long / unsigned long long mixed args.
|
|
- Heap: `malloc` / `free` (libc.c first-fit allocator) — linked-list
|
|
reverse with `cons` works.
|
|
- Strings: hand-rolled `strlen`, `strcmp`, `strcpy`, `strchr`, atoi/itoa
|
|
roundtrip.
|
|
- Soft-float (single): all four ops + comparisons, MAME-verified.
|
|
- Soft-double: add, sub, mul, div all return correct bit patterns
|
|
bit-for-bit against gcc with round-to-nearest-even rounding;
|
|
3-iter Newton sqrt converges. Long-running iterations may hit MAME's
|
|
1-second sim-time budget (test config issue, not a compiler bug).
|
|
- Inline assembly with `"a"`, `"x"`, `"y"` register constraints and
|
|
arbitrary opcode bytes (used for the `pha;plb` bank-switch idiom).
|
|
- C++ minimal: clang++ compiles a class with virtual + non-trivial
|
|
ctor (vtable + RTTI omitted; no exceptions).
|
|
- printf with `%d %x %s %c %p` and width/precision specifiers.
|
|
- sprintf / snprintf / vsprintf / vsnprintf with the same format
|
|
coverage as printf (`%d %u %x %ld %lu %s %c %f %p %%` + width).
|
|
C99 truncation semantics for snprintf. `%.Nf` produces the
|
|
correct fractional digits with round-half-up.
|
|
- qsort + bsearch over arbitrary element size with a user `cmp`
|
|
callback (insertion-sort variant — sidesteps the greedy regalloc
|
|
bug in the recursive iterative-qsort form).
|
|
- Standard string/stdlib glue: strcat, strncat, strpbrk, strspn,
|
|
strcspn, atol, llabs (kept in their own translation unit so
|
|
vprintf's branch layout doesn't shift).
|
|
- `<math.h>`: fabs, floor, ceil, fmod, copysign, sqrt, pow,
|
|
sin, cos, exp, log, atan, atan2, asin, acos, sinh, cosh, tanh
|
|
(and float variants). Bit-twiddling for fabs/floor/ceil/copysign;
|
|
Newton iteration for sqrt; range-reduction + Taylor for sin/cos/
|
|
exp/log/atan; identities for asin/acos/atan2/sinh/cosh/tanh.
|
|
Accuracy is in the ~1e-6 range — good enough for typical numeric
|
|
work, far short of glibc-quality. These are slow (each call is
|
|
dozens to hundreds of soft-double libcalls) — pre-compute or
|
|
cache when possible.
|
|
- `setjmp` / `longjmp` from libgcc.s.
|
|
- Static constructors via crt0's init_array walk.
|
|
|
|
**Toolchain:**
|
|
|
|
- `clang` / `llc` produce W65816 assembly + ELF object files.
|
|
- `tools/link816` resolves cross-translation-unit refs, lays out
|
|
text/rodata/bss, emits a flat binary the IIgs ROM can load.
|
|
- `tools/omfEmit` produces OMF v2.1 single-segment files (the IIgs's
|
|
native object format) for round-tripping with classic dev tools.
|
|
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
|
|
libgcc into linkable objects.
|
|
- `scripts/smokeTest.sh` runs ~80 end-to-end checks (scalar ops,
|
|
control flow, calling conventions, MAME execution, regressions).
|
|
Currently 100% pass.
|
|
|
|
**ABI:**
|
|
|
|
- arg0 in A; arg1 in X for i32-first-arg signatures; rest pushed RTL
|
|
on the system stack with PHA. Caller deallocates via `tsc;clc;adc
|
|
#N;tcs` or `PLY*N/2`.
|
|
- Return: i8/i16 in A; i32 in A:X; i64 in A:X:Y plus DP[$F0..$F1] for
|
|
the highest 16 bits.
|
|
- Frame is empty-descending (S points to next-free); offsets account
|
|
for the +1 skew vs LLVM's full-descending model.
|
|
|
|
## In flight
|
|
|
|
Nothing tracked is open. Runtime now exposes a ~complete C99
|
|
subset: sprintf/snprintf with correct %.Nf precision, qsort/bsearch,
|
|
the full string.h family (strcat/strncat/strpbrk/strspn/strcspn/
|
|
strtok/strtok_r), math.h with the eleven common transcendentals
|
|
(sqrt/pow/sin/cos/exp/log/atan/atan2/asin/acos/sinh/cosh/tanh),
|
|
atol/llabs/atexit/exit/abort, and a smoke test that exercises
|
|
malloc + struct pointers + strcmp/strcpy via a working hash table
|
|
end-to-end in MAME.
|
|
|
|
`strtok` / `strtok_r` live in their own TU built at `-O0` — the
|
|
`-O2` codegen for the str==NULL continuation path miscompiles
|
|
(documented as the same backend-fragility class as #70 / qsort,
|
|
both mitigated by reaching for fast regalloc per-TU). Multi-call
|
|
strtok over "a,b,,c" works end-to-end in smoke.
|
|
|
|
A small **RPN calculator** test (smoke #87) chains strtok, atol,
|
|
push/pop over a static stack, snprintf "%ld", and strcmp to verify
|
|
the end-to-end composition under a realistic-ish workload — adds,
|
|
subs, muls, divs, and 3-deep operand stacks all work.
|
|
|
|
The **DWARF sidecar** (`link816 --debug-out FILE`) now applies
|
|
text/rodata/bss/init_array relocations to every `.debug_*` section
|
|
before writing it. PC values in `.debug_addr` and `.debug_line` end
|
|
up as final-image addresses, so a consumer can map back to source
|
|
lines without re-running the linker. Intra-debug references (e.g.
|
|
`.debug_info` -> `.debug_str` offsets) are intentionally left
|
|
object-local — sections are concatenated, not recompacted, and each
|
|
slice carries an `; OBJ ... SEC ... SIZE ...` header so a multi-TU
|
|
consumer can scope intra-debug offsets per-slice. The smoke test
|
|
verifies the address of a known function appears in the patched
|
|
sidecar bytes.
|
|
|
|
## Known issues / workarounds
|
|
|
|
- **Greedy register allocator mis-orders spills** in iterative
|
|
quicksort with `if/else` recursion choice (#70). Live-range
|
|
tracking for `hi` is wrong across the inner loop and post-loop
|
|
swap call, producing miscompiled code. Reproduces only at
|
|
`-O1`/`-O2` with greedy. Workarounds (any one):
|
|
- `__attribute__((noinline,optnone))` on the affected function —
|
|
routes through fast regalloc per-function. Verified in smoke
|
|
test; recommended for new code that hits this.
|
|
- `-mllvm -regalloc=fast` for the whole translation unit.
|
|
`softDouble.c` already uses this for `__muldf3` (build.sh
|
|
applies it automatically).
|
|
- Rewrite the loop with explicit recursion guards instead of
|
|
the iterative tail-elim form.
|
|
|
|
Real fix needs deeper greedy work; deferred behind the per-
|
|
function attribute since it covers the practical cases.
|
|
|
|
- **(d,s),y / (sr,s),y addressing wraps the bank** when Y is
|
|
negative as 16-bit unsigned. Worked around by `W65816NegYIndY`
|
|
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
|
|
correct for negative offsets like `arr[i-1]`.
|
|
|
|
- **(d,s),y for stack-local pointer dereferences uses DBR**, so
|
|
user code that switches DBR (e.g. `pha;plb` to bank 2 to reach
|
|
IIgs hardware) must not call into a function that takes the
|
|
address of one of its locals — the callee's `*p = v` will write
|
|
to the wrong bank. Documented; no compiler-side mitigation
|
|
beyond the existing DPF0 fake-physreg routing for the i64-return
|
|
high half.
|
|
|
|
## What's still needed for a "ship-ready" toolchain
|
|
|
|
- **Greedy regalloc spill-ordering fix** — see above. Removes the
|
|
need for the per-file `-regalloc=fast` workaround on
|
|
`softDouble.c` and unblocks pattern-rich code that currently
|
|
must be compiled at `-O0` for correctness.
|
|
|
|
- **More of the C standard library**: real `<stdio.h>` file I/O
|
|
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
|
|
returning success/zero) — would need a memory-backed FS or a
|
|
MAME hook; `<locale.h>` / `<wchar.h>` if any real-world code
|
|
needs them.
|
|
|
|
- **C++ runtime support**: vtable layout for multiple inheritance,
|
|
RTTI, exceptions (or a documented `-fno-exceptions` requirement).
|
|
|
|
- **REP/SEP scheduling pass** (design doc §3.3): the current
|
|
prologue picks one M-mode for the whole function based on
|
|
whether any 8-bit accumulator value is used. A per-region
|
|
scheduler would reduce the SEP/REP wrap overhead on i8 stores.
|
|
|
|
- **Toolbox / IIgs system call bindings**: header files declaring
|
|
the Apple IIgs system calls (`SystemTask`, `WaitMouseUp`,
|
|
`DrawString`, …) with the right inline-asm dispatch glue.
|
|
|
|
- **Real-world program coverage**: the smoke tests are
|
|
microbenchmarks. A few known-good Apple IIgs C programs (e.g.
|
|
a textfile pager, a small game) compiled and run end-to-end
|
|
would catch issues no synthetic test currently exercises.
|
|
|
|
- **Cycle-time / size benchmarks vs Calypsi 5.16**: design doc §1
|
|
says the goal is to "match or exceed" Calypsi. We have neither
|
|
baseline numbers nor a comparison harness yet.
|