65816-llvm-mos/docs/USAGE.md
2026-05-18 14:43:35 -05:00

# Using llvm816
This document covers compiling a C program, linking it into an
Apple IIgs binary, and running it under MAME. It assumes you've
followed [INSTALL.md](INSTALL.md) and have a working
`tools/llvm-mos-build/bin/clang`.
## Quick reference
```bash
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime
# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o
# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
$RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o
# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
```
## Compiling C
The compiler is invoked just like a normal clang, with
`--target=w65816`:
```bash
clang --target=w65816 -O2 -c source.c -o source.o
```
**Recommended flags:**
| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Default optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Find `<stdio.h>` etc. |
| `-c` | Compile only: produce `.o`, don't link |

**What works at `-O2`:**
- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned,
all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`,
recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single+multiple inheritance, virtual functions,
RTTI, `dynamic_cast`. **No exceptions** (DWARF unwinder not
implemented).

See [STATUS.md](../STATUS.md) for the full feature matrix.
## Linking
The linker is `tools/link816`. It produces either a raw binary
suitable for direct execution (loaded at a fixed address) or an
OMF binary suitable for the GS/OS Loader.
### Raw binary
```bash
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
```
- `--text-base 0x1000`: physical address where code is loaded.
`0x1000` is the conventional starting address; the first 4 KB
of bank 0 (`$00:0000-$00:0FFF`) is reserved for the stack and
zero page.
- `crt0.o`: the C runtime startup. Sets DBR, calls `main`, halts.
Always link first.
- `libc.o`: `printf`, `malloc`, `strlen`, etc.
- `libgcc.o`: compiler-helper routines (`__mulhi3`, `__umulhisi3`,
`__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial
programs.
### Additional runtime libraries
| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library — printf, malloc, strlen, etc. |
| `runtime/libgcc.o` | Compiler helpers — multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |

Link only what you use; the linker drops unreferenced symbols.
Build them all once with:
```bash
bash runtime/build.sh
```
### Multi-segment OMF (for GS/OS Loader)
For programs that need >60 KB of code (the usable bank-0 limit
after subtracting the stack, zero-page, and I/O window), build a
multi-segment OMF that GS/OS Loader can place across banks:
```bash
link816 -o myprog.bin --omf --manifest my.manifest \
--expressload \
crt0Gsos.o ... yourprog.o
```
See [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) for details
and [`scripts/runMultiSeg.sh`](../scripts/runMultiSeg.sh) for a
working example.
## Running under MAME
The supplied [`scripts/runInMame.sh`](../scripts/runInMame.sh)
launches MAME's `apple2gs` with the right ROM path, loads your
binary at `$00:1000`, runs for a few seconds, and reads back a
memory cell.
```bash
bash scripts/runInMame.sh prog.bin # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002 # dump these addrs
```
The `--check ADDR=VALUE` form exits with status 0 if `ADDR` contains
`VALUE` after the run and status 1 otherwise. Pass `????` as the
value (as in `--check 0x025000=????`) to dump the contents without
checking.
MAME is invoked headless by default (no window) via
`-video none` + `SDL_VIDEODRIVER=dummy`. This works on
servers/CI runners.
### The bank-switch idiom
#### Background — why this is necessary
The 65816 has two registers that select which bank a memory access
goes to:
- **PBR** (Program Bank Register) — selects the bank for instruction
fetches. Set by `jsl long_addr` and `rtl`.
- **DBR** (Data Bank Register) — selects the bank for data accesses
like `lda $5000`, `sta $5000`, etc.
When the IIgs boots, DBR defaults to `$00`. Bank `$00` (the same
bank as the language card / IIe-compatibility area) contains the
**I/O window at `$C000-$CFFF`**. Any data access to addresses in
that range goes to the soft-switches and slot ROMs, NOT to RAM.
This is the same I/O hole the Apple IIe has, inherited by the IIgs
for backward compatibility.
Concretely: if your DBR is `$00` and you write to address `$C100`,
the access lands in slot 1's I/O-mapped ROM space instead of RAM,
which is definitely not what you want. Similarly, `$5000` in bank 0
falls in the Apple II hi-res page 2 range (`$4000-$5FFF`), where
stores can be shadowed or redirected depending on soft-switch state.
Banks `$01`-`$DF` are full 64 KB RAM banks (`$E0`/`$E1` are the
main/aux shadow banks, `$E2`-`$FF` reserved). To do reliable data
work, switch the DBR to any of these "normal" banks. **`$02`** is
conventional in this codebase because:
1. Bank `$01` is the IIe-compatible auxiliary-memory bank, and
parts of it (such as the Super Hi-Res buffer at
`$01:2000-$01:9FFF`) are shadowed for video, so it is easy to
collide with.
2. `$02:0000-$02:FFFF` is the first "clean" bank above the
special-purpose banks.
3. The smoke-test convention is to write a result word to
`$02:5000` so `runInMame.sh` can read it back.

If your program needs more than 64 KB of data, switch DBR to
different banks as needed.
#### What the assembly does, line by line
```c
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n" // (1) Switch A to 8-bit
".byte 0xa9,0x02\n" // (2) lda #2 (8-bit immediate)
"pha\n" // (3) Push A onto stack (1 byte)
"plb\n" // (4) Pop into DBR (1 byte from stack)
"rep #0x20\n" // (5) Restore A to 16-bit
);
}
```
1. **`sep #0x20`**: sets the `M` bit in the status register `P`.
`M=1` makes A behave as 8-bit (and immediate operands become
1 byte). We need this so that the following `lda #2` loads a
1-byte immediate and `pha` pushes exactly 1 byte (matching what
`plb` expects to pop). Calling-convention prologues always run
in M=0 (16-bit), so this `sep` is required.
2. **`.byte 0xa9,0x02`**: raw bytes for `lda #$02`. We hand-encode
because llvm-mc can't yet emit an 8-bit immediate `lda #$02`
that knows it's 1 byte; the assembler keeps treating it as
16-bit. `0xa9` is the LDA-immediate opcode; `0x02` is the
1-byte operand. Result: A = `$02` (8-bit).
3. **`pha`**: pushes A. In M=1 mode, PHA pushes exactly 1 byte
(the low half of A). Stack now has `$02` on top.
4. **`plb`**: pops 1 byte from the stack and stores it in DBR.
DBR is now `$02`. All subsequent data accesses go to bank 2.
5. **`rep #0x20`**: clears the `M` bit. A returns to 16-bit mode,
matching the calling-convention contract for the rest of the
function.

The DBR change persists across function returns. Once
`switchToBank2()` returns, all data reads/writes in your program
target bank 2 until you switch DBR again.
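
To see why the `sep`/`rep` bracket matters, the sequence can be modelled on the host. The sketch below is a toy model, not an emulator: the `Cpu` struct and the `pha`, `plb`, and `switchBankModel` names are invented for illustration, and only the accumulator, the M flag, DBR, and a tiny stack are tracked. It relies on two 65816 behaviors: a 16-bit `pha` pushes the high byte first, and `plb` always pulls exactly one byte.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the registers touched by the bank-switch sequence.
 * All names are hypothetical; nothing here is part of the runtime. */
typedef struct {
    uint16_t a;         /* accumulator */
    uint8_t  dbr;       /* data bank register */
    int      m8;        /* 1 = M flag set (8-bit accumulator) */
    uint8_t  stack[4];  /* pushed bytes; stack[depth-1] is the top */
    int      depth;
} Cpu;

/* pha: one byte when M=1, two bytes (high byte first) when M=0. */
static void pha(Cpu *c) {
    assert(c->depth + 2 <= (int)sizeof(c->stack));
    if (!c->m8)
        c->stack[c->depth++] = (uint8_t)(c->a >> 8);
    c->stack[c->depth++] = (uint8_t)(c->a & 0xff);
}

/* plb: always pulls exactly one byte into DBR, whatever M is. */
static void plb(Cpu *c) {
    assert(c->depth > 0);
    c->dbr = c->stack[--c->depth];
}

/* Runs lda #bank / pha / plb, optionally bracketed by sep/rep #0x20.
 * Returns the number of stray bytes left on the stack afterwards. */
static int switchBankModel(Cpu *c, uint8_t bank, int useSepRep) {
    int before = c->depth;
    if (useSepRep) c->m8 = 1;                              /* sep #0x20 */
    if (c->m8) c->a = (uint16_t)((c->a & 0xff00) | bank);  /* 8-bit lda */
    else       c->a = bank;                                /* 16-bit lda */
    pha(c);
    plb(c);
    if (useSepRep) c->m8 = 0;                              /* rep #0x20 */
    return c->depth - before;
}
```

Starting from a zeroed `Cpu`, `switchBankModel(&cpu, 2, 1)` returns 0 and leaves `dbr` at 2. With the bracket omitted (`useSepRep` = 0) DBR still becomes 2, but the function returns 1: the 16-bit `pha` leaves its high byte behind, so the stack is no longer balanced when the function returns.
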
#### When you need it
You need to switch DBR whenever you want to access data at an
absolute address `$XXXX` and need it to land in a specific bank.
Common cases:
- **MMIO from the test harness**: `*(volatile uint16_t *)0x5000 = x;`
Without DBR=2, this would go to bank 0's `$5000`, which is not
where the harness looks. With DBR=2, it goes to `$02:5000`,
where `runInMame.sh --check 0x025000=...` reads from.
- **Anything in `$C000-$CFFF`**: bank 0 has soft-switches here.
Bank 2 has plain RAM.
- **Global arrays declared at link-time at fixed addresses**:
the linker may place them in bank 2 BSS (`--bss-base 0x020000`).
Your DBR must match.

You DON'T need DBR=2 for:
- **Local variables on the stack**: stack-relative accesses always
go to bank 0 and ignore DBR; `lda $4,s` reads from the stack
regardless of DBR.
- **Direct-page accesses**: `lda $D0` reads from bank 0
(`$00:00D0` when the direct-page register is `$0000`). DP is
anchored to bank 0.
- **Indirect-long pointers via `[dp],y`**: these include their
own bank byte and ignore DBR.
- **Function calls**: `jsl` carries a full 24-bit destination
address and updates PBR automatically; DBR is not involved.
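
The two lists above boil down to a small decision table, sketched here as host-side C. `AddrMode` and `effectiveBank` are hypothetical names used only for this illustration. The model is deliberately simplified: it ignores indexed accesses that cross a bank boundary, and it leaves out `[dp],y`, whose bank comes from the pointer stored in the direct page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum covering the addressing modes discussed above. */
typedef enum {
    MODE_ABSOLUTE,       /* lda $5000   : bank comes from DBR    */
    MODE_ABSOLUTE_LONG,  /* lda $025000 : bank is in the operand */
    MODE_DIRECT_PAGE,    /* lda $D0     : always bank 0          */
    MODE_STACK_RELATIVE  /* lda $4,s    : always bank 0          */
} AddrMode;

/* Returns the bank a data access lands in for the given mode. */
static uint8_t effectiveBank(AddrMode mode, uint8_t dbr, uint32_t operand) {
    assert(operand <= 0xffffffu);  /* operands are at most 24 bits wide */
    switch (mode) {
    case MODE_ABSOLUTE:
        return dbr;
    case MODE_ABSOLUTE_LONG:
        return (uint8_t)((operand >> 16) & 0xff);
    case MODE_DIRECT_PAGE:
    case MODE_STACK_RELATIVE:
        break;
    }
    return 0;  /* direct-page and stack accesses ignore DBR */
}
```

Only `MODE_ABSOLUTE` depends on the `dbr` argument, which is why the bank switch matters for plain `sta $5000` and for nothing else in the table.
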
#### Other ways to access non-bank-0 data
If you only need to write to a single non-bank-0 address, you can
emit the store as `STA_Long` (24-bit absolute) which encodes the
bank inline:
```c
*(volatile unsigned short *)0x025000 = 42; // becomes sta $025000
```
The W65816 backend recognizes `const-int pointer + integer offset`
and lowers to `sta long` if the address has a bank byte. No
`switchToBank2()` needed.
For frequent data work in a bank, switching DBR once and using
plain `sta $5000` (3 bytes) is smaller and faster than `sta $025000`
(4 bytes) per access.
#### Caveats
- **Save/restore is your problem.** `switchToBank2()` never
restores DBR. If your caller expected DBR=0, you've broken its
expectation. For long-running programs, that's usually fine
(you just set DBR=2 once and stay there). For toolbox calls, GS/OS
might assume DBR=0 — check the call's documentation.
- **The stack is in bank 0 regardless.** Don't try to put the
stack elsewhere; the 65816's stack-relative addressing modes
ignore DBR.
- **Keep the `sep`/`rep` window short.** The `sep` changes only
A's width, not the bank-switching machinery itself, but an
interrupt taken while M=1 enters its handler with M still set, so
a handler that assumes a 16-bit accumulator can misbehave.
- **PBR vs DBR** are independent. Code execution stays where it
was; only data accesses change.
#### How `runInMame.sh --check 0x025000=...` works
The check address `0x025000` is a 24-bit address: bank `$02`,
offset `$5000`. The MAME Lua runner reads this byte (and the next
byte if you specify a 2-byte value) directly from physical RAM,
bypassing DBR entirely. So the convention is:
1. Your program switches DBR to bank 2.
2. Your program writes its result to `*(volatile X *)0x5000`,
which becomes `sta $5000` — landing in bank 2 because of DBR.
3. MAME reads bank 2's `$5000` via the absolute 24-bit address.
4. The runner compares to your expected value.

If you forget `switchToBank2()`, your store goes to bank 0's
`$5000`, MAME's check reads bank 2's unchanged `$5000` (likely
`$00` or whatever was there), and the test fails.
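
The bank/offset split behind this convention can be sketched in a few lines of host-side C. The helper names (`bankOf`, `offsetOf`, `physicalAddress`) are made up for illustration; they are not part of the runtime or of `runInMame.sh`.

```c
#include <assert.h>
#include <stdint.h>

/* Bank byte of a 24-bit address, e.g. 0x025000 -> 0x02. */
static uint8_t bankOf(uint32_t addr24) {
    assert(addr24 <= 0xffffffu);
    return (uint8_t)(addr24 >> 16);
}

/* 16-bit offset within the bank, e.g. 0x025000 -> 0x5000. */
static uint16_t offsetOf(uint32_t addr24) {
    assert(addr24 <= 0xffffffu);
    return (uint16_t)(addr24 & 0xffff);
}

/* 24-bit address hit by an absolute store made while DBR == dbr. */
static uint32_t physicalAddress(uint8_t dbr, uint16_t offset) {
    return ((uint32_t)dbr << 16) | offset;
}
```

`physicalAddress(2, 0x5000)` gives `0x025000`, the address the check reads, while `physicalAddress(0, 0x5000)` gives `0x005000`, which is where the store ends up if the DBR switch is forgotten.
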
## Examples
### Hello, integer
```c
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
);
}
int main(void) {
int x = 42;
switchToBank2();
*(volatile int *)0x5000 = x;
while (1) {}
}
```
Build & run:
```bash
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a # 0x2a = 42
```
### Recursion + printing
```c
#include <stdio.h>
#include <stdlib.h>
unsigned long fib(unsigned n) {
if (n < 2) return n;
return fib(n-1) + fib(n-2);
}
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
);
}
int main(void) {
char buf[32];
int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
switchToBank2();
// Copy buf to $025000 so we can read it after the run
for (int i = 0; i <= len; i++)
((volatile char *)0x5000)[i] = buf[i];
while (1) {}
}
```
Build (note: need snprintf.o for `snprintf`):
```bash
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o \
runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
```
### Apple IIgs Toolbox
```c
#include <iigs/toolbox_full.h>
int main(void) {
DrawString("\pHello, World");
while (1) {}
}
```
Build:
```bash
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
runtime/libgcc.o hello_gs.o
```
Use `crt0Gsos.o` (not `crt0.o`) for programs that call into the
toolbox — it sets up the IIgs runtime environment.
## Inline assembly
The W65816 backend supports `__asm__` with operand constraints
`"a"`, `"x"`, `"y"`:
```c
unsigned short addOne(unsigned short x) {
unsigned short r;
__asm__("inc a" : "=a"(r) : "a"(x));
return r;
}
```
Multi-instruction asm and raw bytes both work:
```c
__asm__ volatile (
"sep #0x20\n"
".byte 0x68\n" // pla
"rep #0x20\n"
);
```
The `.byte 0xa9, ...` form is sometimes needed to work around
llvm-mc encoding gaps — the assembler doesn't yet support every
65816 addressing mode literally. The pattern works for any
opcode whose mnemonic doesn't yet parse.
## Tools reference
| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (`.ll` → `.s`) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from `link816` output |
| `mame` | `apt` (system-wide) | Apple IIgs emulator |
## Debugging
### Look at the asm
```bash
clang --target=w65816 -O2 -S -o prog.s prog.c
```
### Look at the MIR after each pass
```bash
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
```
Useful pass names to filter on:
| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |
### Single-pass filter
```bash
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
-mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
```
## Cycle-count benchmarks
Eight microbenchmarks live under [`benchmarks/`](../benchmarks/).
Each runs N iterations of the bench function and reports a
per-call cycle count via MAME's `emu.time()`:
```bash
bash scripts/benchCyclesPrecise.sh
```
Output:
```
| Benchmark | Per-call cycles (clang) |
|-----------|------------------------:|
| bsearch | 767 cyc/call |
| dotProduct | 2131 cyc/call |
| fib | 12617 cyc/call |
| memcmp | 989 cyc/call |
| popcount | 2864 cyc/call |
| strcpy | 2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
```
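
How the script turns MAME time into cycles is not shown in this document. As a rough sketch, assuming it takes two `emu.time()` readings around N calls and multiplies the elapsed seconds by the emulated CPU clock, the arithmetic would look like this. The function name `cyclesPerCall` and the choice to pass the clock rate as a parameter are illustrative assumptions, not the script's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* Converts an elapsed emulated time into whole cycles per call.
 * elapsedSeconds: difference of two emu.time() readings (assumed)
 * clockHz:        emulated CPU clock rate, supplied by the caller
 * iterations:     number of calls made between the two readings */
static uint32_t cyclesPerCall(double elapsedSeconds, double clockHz,
                              uint32_t iterations) {
    assert(iterations > 0 && elapsedSeconds >= 0.0 && clockHz > 0.0);
    double totalCycles = elapsedSeconds * clockHz;
    return (uint32_t)(totalCycles / iterations + 0.5);  /* round to nearest */
}
```

For example, one emulated second at a 1 MHz clock spread over 1000 calls works out to 1000 cycles per call. A real measurement would also need to subtract loop overhead, which this sketch does not attempt.
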
The [`compare/`](../compare/) directory has side-by-side `.s`
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
Rerun with:
```bash
bash compare/regen.sh
```
## Known limitations
- **C++ exceptions** are not implemented. `try`/`catch` compiles but
doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style
throwing.
- **`stdin`** always returns EOF. `scanf` compiles but isn't useful.
Use `sscanf` on a buffer instead.
- **File I/O** through `fopen` etc. requires a backing implementation.
The default `mfs` backing (memory-file-system) lets you simulate
files via `mfsRegister()` — useful for tests, not for real disk
I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link
against the GS/OS runtime.
- **`fork`/`exec`** — not applicable on a 65816, no support.
- **Code generation gotcha:** very large frames (>200 bytes) trigger
FP-relative addressing. Most programs fit under that limit. See
the `frame-rel` discussion in
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
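
For the `stdin` limitation above, parsing from a buffer with `sscanf` can be sketched as follows. `parseSize` is a hypothetical helper, and the sketch assumes the runtime's `sscanf` (linked from `runtime/sscanf.o`) handles `%d` the way the standard function does.

```c
#include <assert.h>
#include <stdio.h>

/* Parses "WIDTHxHEIGHT" (for example "320x200") from a caller-supplied
 * buffer instead of reading stdin.
 * Returns 1 on success, 0 if the text does not match. */
static int parseSize(const char *text, int *width, int *height) {
    assert(text && width && height);
    return sscanf(text, "%dx%d", width, height) == 2;
}
```

Calling `parseSize("320x200", &w, &h)` returns 1 and stores 320 and 200; a string such as `"wide"` returns 0 and leaves the outputs untouched.
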
## Where to go next
- **Building real GS/OS apps:** see
[`docs/multiSegmentPlan.md`](multiSegmentPlan.md) and the
`runViaFinder.sh` script for booting through real GS/OS 6.0.2 in
MAME.
- **Backend internals (you're hacking on the compiler):**
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
- **Smoke tests:** `scripts/smokeTest.sh` runs ~150 end-to-end checks.
Read it for examples of every feature in action.