65816-llvm-mos/docs/USAGE.md
Scott Duensing 6bff7bea3f Docs!
2026-05-14 11:23:00 -05:00


# Using llvm816
This document covers compiling a C program, linking it into an
Apple IIgs binary, and running it under MAME. It assumes you've
followed [INSTALL.md](INSTALL.md) and have a working
`tools/llvm-mos-build/bin/clang`.
## Quick reference
```bash
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime
# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o
# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
$RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o
# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
```
## Compiling C
The compiler is invoked just like a normal clang, with
`--target=w65816`:
```bash
clang --target=w65816 -O2 -c source.c -o source.o
```
**Recommended flags:**
| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Recommended optimization level. `-O0` and `-O1` work but produce roughly 3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Find `<stdio.h>` etc. |
| `-c` | Compile only: produce `.o`, don't link |

**What works at `-O2`:**
- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned,
all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`,
recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single and multiple inheritance, virtual functions,
RTTI, `dynamic_cast`. **No exceptions** (DWARF unwinder not
implemented).

See [STATUS.md](../STATUS.md) for the full feature matrix.
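
As a quick illustration of the `<stdarg.h>` support listed above, here is a standard variadic sum. Nothing in it is specific to this toolchain, and the function name is only illustrative.

```c
#include <stdarg.h>

/* Sum `count` long arguments. Callers must pass longs explicitly
   (e.g. 1L), because variadic arguments are not converted to the
   type named in va_arg. */
long sumLongs(int count, ...) {
    va_list args;
    long total = 0;
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, long);
    va_end(args);
    return total;
}
```

A call such as `sumLongs(3, 1L, 2L, 3L)` adds the three values; using `long` keeps the sketch independent of the width of `int`.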
## Linking
The linker is `tools/link816`. It produces either a raw binary
suitable for direct execution (loaded at a fixed address) or an
OMF binary suitable for the GS/OS Loader.
### Raw binary
```bash
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
```
- `--text-base 0x1000`: physical address where code is loaded.
`0x1000` is the conventional starting address; the first 4 KB
of bank 0 ($00:0000-$00:0FFF) is reserved for the stack and
zero page.
- `crt0.o`: the C runtime startup. Sets DBR, calls `main`, halts.
Always link first.
- `libc.o`: `printf`, `malloc`, `strlen`, etc.
- `libgcc.o`: compiler-helper routines (`__mulhi3`, `__umulhisi3`,
`__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial
programs.
### Additional runtime libraries
| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library — printf, malloc, strlen, etc. |
| `runtime/libgcc.o` | Compiler helpers — multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |

Link only what you use; the linker drops unreferenced symbols.
Build them all once with:
```bash
bash runtime/build.sh
```
### Multi-segment OMF (for GS/OS Loader)
For programs that need >60 KB of code (the usable bank-0 limit
after subtracting the stack, zero-page, and I/O window), build a
multi-segment OMF that GS/OS Loader can place across banks:
```bash
link816 -o myprog.bin --omf --manifest my.manifest \
--expressload \
crt0Gsos.o ... yourprog.o
```
See [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) for details
and [`scripts/runMultiSeg.sh`](../scripts/runMultiSeg.sh) for a
working example.
## Running under MAME
The supplied [`scripts/runInMame.sh`](../scripts/runInMame.sh)
launches MAME's `apple2gs` with the right ROM path, loads your
binary at `$00:1000`, runs for a few seconds, and reads back a
memory cell.
```bash
bash scripts/runInMame.sh prog.bin # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002 # dump these addrs
```
The `--check ADDR=VALUE` form returns exit 0 if `ADDR` contains
`VALUE` after the run, exit 1 otherwise. Use `????` as the value
(as in `--check 0x025000=????`) to dump the contents without checking.
MAME is invoked headless by default (no window) via
`-video none` + `SDL_VIDEODRIVER=dummy`. This works on
servers/CI runners.
### The bank-switch idiom
Bank 0 (`$00:0000-$00:FFFF`) has the I/O window at `$C000-$CFFF`
that interferes with normal data access. The convention is to
switch the data bank register (DBR) to bank 2 (`$02:0000`) before
doing any data work:
```c
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n" // 8-bit accumulator
".byte 0xa9,0x02\n" // lda #2 (force as bytes — llvm-mc bug)
"pha\n"
"plb\n" // DBR = 2
"rep #0x20\n" // back to 16-bit
);
}
```
After `switchToBank2()`, your data lives at `$02:0000` upward.
The `runInMame.sh` `--check 0x025000=...` address is `$02:5000`
— accessible via a normal store in bank 2.
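
The address given to `--check` is a flat 24-bit address: the top byte is the bank and the low 16 bits are the offset within that bank. A minimal sketch of that split in portable C follows; the helper names are hypothetical and not part of the runtime.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the llvm816 runtime): split a flat
   24-bit 65816 address into its bank byte and 16-bit offset. */
uint8_t bankOf(uint32_t addr) {
    return (uint8_t)((addr >> 16) & 0xFF);
}

uint16_t offsetOf(uint32_t addr) {
    return (uint16_t)(addr & 0xFFFF);
}

/* Rebuild the flat address from a bank and an offset. */
uint32_t flatAddr(uint8_t bank, uint16_t offset) {
    return ((uint32_t)bank << 16) | offset;
}
```

With DBR set to 2, a store through a 16-bit pointer to `0x5000` lands at `flatAddr(2, 0x5000)`, which is the `0x025000` used in the `--check` examples.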
## Examples
### Hello, integer
```c
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
);
}
int main(void) {
int x = 42;
switchToBank2();
*(volatile int *)0x5000 = x;
while (1) {}
}
```
Build & run:
```bash
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a # 0x2a = 42
```
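
The 65816 is little-endian, so the 16-bit value 42 sits in memory as the bytes `2A 00`, while the `--check` value above is written as the word `002a`. The sketch below shows that relationship. It assumes the script compares a 16-bit word, which the `002a` form suggests but this document does not state; the function name is illustrative.

```c
#include <stdint.h>

/* Combine two consecutive bytes, low byte first, into the 16-bit word
   the CPU would load from that address. */
uint16_t wordFromBytes(const uint8_t *mem) {
    return (uint16_t)(mem[0] | ((uint16_t)mem[1] << 8));
}
```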
### Recursion + printing
```c
#include <stdio.h>
#include <stdlib.h>
unsigned long fib(unsigned n) {
if (n < 2) return n;
return fib(n-1) + fib(n-2);
}
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
);
}
int main(void) {
char buf[32];
int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
switchToBank2();
// Copy buf to $025000 so we can read it after the run
for (int i = 0; i <= len; i++)
((volatile char *)0x5000)[i] = buf[i];
while (1) {}
}
```
Build (note: `snprintf` needs `runtime/snprintf.o`):
```bash
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o \
runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
```
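
Because the formatting is plain C, the expected buffer contents can be worked out on the host before going to MAME. The sketch below uses a fixed-width return type so the result does not depend on the host's integer sizes. `fibHost` and `formatFib` are hypothetical helpers, not part of the example above.

```c
#include <stdint.h>
#include <stdio.h>

/* Same recurrence as fib() above, with a fixed-width return type. */
uint32_t fibHost(unsigned n) {
    if (n < 2) return n;
    return fibHost(n - 1) + fibHost(n - 2);
}

/* Format the result the way main() does and return snprintf's length
   (which excludes the terminating NUL). */
int formatFib(char *buf, size_t size, unsigned n) {
    return snprintf(buf, size, "fib(%u) = %lu", n, (unsigned long)fibHost(n));
}
```

For `n = 10` this yields the string `fib(10) = 55`, which is 12 characters long, so the copy loop in `main` writes 13 bytes (including the NUL) starting at `$02:5000`.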
### Apple IIgs Toolbox
```c
#include <iigs/toolbox_full.h>
int main(void) {
DrawString("\pHello, World");
while (1) {}
}
```
Build:
```bash
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
runtime/libgcc.o hello_gs.o
```
Use `crt0Gsos.o` (not `crt0.o`) for programs that call into the
toolbox — it sets up the IIgs runtime environment.
## Inline assembly
The W65816 backend supports `__asm__` with operand constraints
`"a"`, `"x"`, `"y"`:
```c
unsigned short addOne(unsigned short x) {
unsigned short r;
__asm__("inc a" : "=a"(r) : "a"(x));
return r;
}
```
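
Assuming the accumulator is in 16-bit mode when the asm runs (as the `rep #0x20` comments elsewhere in this document imply), `inc a` wraps from `0xFFFF` to `0`. A portable reference version is handy for checking the asm variant; the name `addOneRef` is illustrative.

```c
#include <stdint.h>

/* Portable equivalent of addOne(): 16-bit increment with wraparound. */
uint16_t addOneRef(uint16_t x) {
    return (uint16_t)(x + 1u);
}
```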
Multi-instruction asm and raw bytes both work:
```c
__asm__ volatile (
"sep #0x20\n"
".byte 0x68\n" // pla
"rep #0x20\n"
);
```
The `.byte 0xa9, ...` form is sometimes needed to work around
llvm-mc encoding gaps — the assembler doesn't yet support every
65816 addressing mode literally. The pattern works for any
opcode whose mnemonic doesn't yet parse.
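
When a mnemonic doesn't parse, its bytes can be taken from the 65816 opcode table. As an illustration, this is the `switchToBank2` sequence fully hand-assembled. The array name is hypothetical, and the array is only a reference for comparing against `llvm-objdump` output, not something the runtime uses.

```c
#include <stdint.h>

/* Hand-assembled encoding of the bank-switch sequence. */
const uint8_t switchToBank2Bytes[] = {
    0xE2, 0x20,  /* sep #$20 : 8-bit accumulator        */
    0xA9, 0x02,  /* lda #$02 : immediate, 8-bit operand */
    0x48,        /* pha                                 */
    0xAB,        /* plb      : DBR = 2                  */
    0xC2, 0x20   /* rep #$20 : back to 16-bit           */
};
```

Each instruction can then be replaced individually with a `.byte` line inside the asm string, as the `lda` is in the examples above.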
## Tools reference
| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (`.ll` → `.s`) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from `link816` output |
| `mame` | `apt` (system-wide) | Apple IIgs emulator |
## Debugging
### Look at the asm
```bash
clang --target=w65816 -O2 -S -o prog.s prog.c
```
### Look at the MIR after each pass
```bash
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
```
Useful pass names to filter on:
| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |
### Single-pass filter
```bash
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
-mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
```
## Cycle-count benchmarks
Eight microbenchmarks live under [`benchmarks/`](../benchmarks/).
Each runs N iterations of the bench function and reports a
per-call cycle count via MAME's `emu.time()`:
```bash
bash scripts/benchCyclesPrecise.sh
```
Output:
```
| Benchmark | Per-call cycles (clang) |
|-----------|------------------------:|
| bsearch | 767 cyc/call |
| dotProduct | 2131 cyc/call |
| fib | 12617 cyc/call |
| memcmp | 989 cyc/call |
| popcount | 2864 cyc/call |
| strcpy | 2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
```
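
The exact arithmetic lives in the script. Conceptually, a per-call figure derived from elapsed emulated time looks like the sketch below. The function name and the idea of passing the CPU clock rate as a parameter are assumptions for illustration, not a description of what `benchCyclesPrecise.sh` does internally.

```c
#include <stdint.h>

/* Convert elapsed emulated seconds into cycles per call.
   clockHz is whatever CPU clock the emulated machine runs at;
   iterations must be non-zero. */
double cyclesPerCall(double elapsedSeconds, double clockHz, uint32_t iterations) {
    return (elapsedSeconds * clockHz) / (double)iterations;
}
```

Running more iterations spreads fixed overhead (loop setup, timer reads) over more calls, which is presumably why each benchmark repeats the function N times.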
The [`compare/`](../compare/) directory has side-by-side `.s`
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
Rerun with:
```bash
bash compare/regen.sh
```
## Known limitations
- **C++ exceptions** are not implemented. `try`/`catch` compiles but
doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style
throwing.
- **`stdin`** reads always return EOF. `scanf` compiles but isn't useful.
Use `sscanf` on a buffer instead.
- **File I/O** through `fopen` etc. requires a backing implementation.
The default `mfs` backing (memory-file-system) lets you simulate
files via `mfsRegister()` — useful for tests, not for real disk
I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link
against the GS/OS runtime.
- **`fork`/`exec`** are not supported; they are not applicable on a 65816.
- **Code generation gotcha:** very large frames (>200 bytes) trigger
FP-relative addressing. Most programs fit under that limit. See
the `frame-rel` discussion in
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
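
For the `stdin` limitation above, parsing from a buffer is a straightforward substitute. A minimal sketch using only standard `sscanf` follows (on the target it comes from `runtime/sscanf.o`); `parsePair` is an illustrative name.

```c
#include <stdio.h>

/* Parse two decimal integers from a string instead of stdin.
   Returns 1 on success, 0 if fewer than two values were read. */
int parsePair(const char *text, int *a, int *b) {
    return sscanf(text, "%d %d", a, b) == 2;
}
```

Checking the return value of `sscanf` against the number of expected conversions catches malformed input without touching the output variables' prior contents on a total mismatch.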
## Where to go next
- **Building real GS/OS apps:** see
[`docs/multiSegmentPlan.md`](multiSegmentPlan.md) and the
`runViaFinder.sh` script for booting through real GS/OS 6.0.2 in
MAME.
- **Backend internals (if you're hacking on the compiler):**
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
- **Smoke tests:** `scripts/smokeTest.sh` runs ~150 end-to-end checks.
Read it for examples of every feature in action.