# Using llvm816

This document covers compiling a C program, linking it into an
Apple IIgs binary, and running it under MAME. It assumes you've
followed [INSTALL.md](INSTALL.md) and have a working
`tools/llvm-mos-build/bin/clang`.

## Quick reference

```bash
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime

# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o

# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
    $RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o

# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
```

## Compiling C

The compiler is invoked just like a normal clang, with
`--target=w65816`:

```bash
clang --target=w65816 -O2 -c source.c -o source.o
```

**Recommended flags:**

| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Recommended optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Find `<stdio.h>` etc. |
| `-c` | Compile only: produce `.o`, don't link |

**What works at `-O2`:**

- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned,
  all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`,
  recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single+multiple inheritance, virtual functions,
  RTTI, `dynamic_cast`. **No exceptions** (DWARF unwinder not
  implemented).

See [STATUS.md](../STATUS.md) for the full feature matrix.

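Two of the features listed above, `<stdarg.h>` varargs and `<setjmp.h>` error recovery, can be tried with a short portable sketch like the one below. The function names (`sumInts`, `divideOr`, `checkedDivide`) are made up for illustration and are not part of the runtime; the code uses only standard C, so it should build for the host as well as with `--target=w65816`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>

/* Sum `count` int arguments passed through <stdarg.h>. */
int sumInts(int count, ...) {
    va_list args;
    int total = 0;
    assert(count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}

static jmp_buf recoverPoint;

/* Jump back to the setjmp site when the divisor is zero. */
static int checkedDivide(int num, int den) {
    if (den == 0)
        longjmp(recoverPoint, 1);
    return num / den;
}

/* Returns num/den, or `fallback` if checkedDivide() bailed out. */
int divideOr(int num, int den, int fallback) {
    if (setjmp(recoverPoint) != 0)
        return fallback;
    return checkedDivide(num, den);
}
```

Since C++ exceptions are unavailable, a `setjmp`/`longjmp` pair like this is one way to unwind out of nested calls on error.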
## Linking

The linker is `tools/link816`. It produces either a raw binary
suitable for direct execution (loaded at a fixed address) or an
OMF binary suitable for the GS/OS Loader.

### Raw binary

```bash
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
```

- `--text-base 0x1000`: physical address where code is loaded.
  `0x1000` is the conventional starting address; the first 4 KB
  of bank 0 (`$00:0000`-`$00:0FFF`) is reserved for the stack and
  zero-page.
- `crt0.o`: the C runtime startup. Sets DBR, calls `main`, halts.
  Always link first.
- `libc.o`: `printf`, `malloc`, `strlen`, etc.
- `libgcc.o`: compiler-helper routines (`__mulhi3`, `__umulhisi3`,
  `__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial
  programs.

### Additional runtime libraries

| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library: printf, malloc, strlen, etc. |
| `runtime/libgcc.o` | Compiler helpers: multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |

Link only what you use; the linker drops unreferenced symbols.

Build them all once with:

```bash
bash runtime/build.sh
```

### Multi-segment OMF (for GS/OS Loader)

For programs that need >60 KB of code (the usable bank-0 limit
after subtracting the stack, zero-page, and I/O window), build a
multi-segment OMF that GS/OS Loader can place across banks:

```bash
link816 -o myprog.bin --omf --manifest my.manifest \
    --expressload \
    crt0Gsos.o ... yourprog.o
```

See [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) for details
and [`scripts/runMultiSeg.sh`](../scripts/runMultiSeg.sh) for a
working example.

## Running under MAME

The supplied [`scripts/runInMame.sh`](../scripts/runInMame.sh)
launches MAME's `apple2gs` with the right ROM path, loads your
binary at `$00:1000`, runs for a few seconds, and reads back a
memory cell.

```bash
bash scripts/runInMame.sh prog.bin                      # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002    # dump these addrs
```

The `--check ADDR=VALUE` form returns exit 0 if `ADDR` contains
`VALUE` after the run, exit 1 otherwise. Use `????` as the value
(as in `--check 0x025000=????`) to dump the contents without
checking.

MAME is invoked headless by default (no window) via
`-video none` + `SDL_VIDEODRIVER=dummy`. This works on
servers/CI runners.

### The bank-switch idiom

#### Background: why this is necessary

The 65816 has two registers that select which bank a memory access
goes to:

- **PBR** (Program Bank Register): selects the bank for instruction
  fetches. Set by `jsl long_addr` and `rtl`.
- **DBR** (Data Bank Register): selects the bank for data accesses
  like `lda $5000`, `sta $5000`, etc.

When the IIgs boots, DBR defaults to `$00`. Bank `$00` (the same
bank as the language card / IIe-compatibility area) contains the
**I/O window at `$C000-$CFFF`**. Any data access to addresses in
that range goes to the soft-switches and slot ROMs, NOT to RAM.
This is the same I/O hole the Apple IIe has, inherited by the IIgs
for backward compatibility.

Concretely: if your DBR is `$00` and you write to address `$C100`,
you're hitting slot 1's ROM space, which is definitely not what you
want. Similarly, `$5000` in bank 0 is not a safe scratch location:
it lies inside Hi-Res graphics page 2 (`$4000-$5FFF`) and can
overlap a program loaded at `$00:1000`.

Banks `$01`-`$DF` are full 64K RAM banks (`$E0`/`$E1` are the
main/aux shadow banks, and `$E2`-`$FF` are reserved). To do reliable
data work, switch the DBR to any of these "normal" banks. **`$02`**
is conventional in this codebase because:

1. Bank `$01` is easy to confuse with page `$01`: the stack page
   `$0100-$01FF` lives in bank `$00`, not at `$01:0100`. Bank `$01`
   also serves as the IIe auxiliary-memory bank.
2. `$02:0000-$02:FFFF` is the first "clean" bank above the
   special-purpose banks.
3. The smoke-test convention is to write a result word to
   `$02:5000` so `runInMame.sh` can read it back.

If your program needs more than 64 KB of data, switch DBR to
different banks as needed.

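The relationship between DBR, a 16-bit absolute operand, and the resulting 24-bit address can be sketched in portable C. This is a host-side model for reasoning about addresses, not target code; the helper names (`bankOf`, `offsetOf`, `absoluteTarget`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bank byte (bits 16-23) of a 24-bit 65816 address. */
uint8_t bankOf(uint32_t addr24) {
    assert(addr24 <= 0xFFFFFFu);
    return (uint8_t)(addr24 >> 16);
}

/* 16-bit offset within the bank. */
uint16_t offsetOf(uint32_t addr24) {
    return (uint16_t)(addr24 & 0xFFFFu);
}

/* Effective address of an absolute data access such as `sta $5000`:
   DBR supplies the bank, the instruction supplies the offset. */
uint32_t absoluteTarget(uint8_t dbr, uint16_t offset) {
    return ((uint32_t)dbr << 16) | offset;
}
```

With this model, `absoluteTarget(0x00, 0x5000)` is `0x005000` while `absoluteTarget(0x02, 0x5000)` is `0x025000`, the cell the smoke tests check.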
#### What the assembly does, line by line

```c
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n"        // (1) Switch A to 8-bit
        ".byte 0xa9,0x02\n"  // (2) lda #2 (8-bit immediate)
        "pha\n"              // (3) Push A onto stack (1 byte)
        "plb\n"              // (4) Pop into DBR (1 byte from stack)
        "rep #0x20\n"        // (5) Restore A to 16-bit
    );
}
```

1. **`sep #0x20`**: sets the `M` bit in the status register `P`.
   `M=1` makes A behave as 8-bit (and immediate operands become
   1 byte). We need this so the following `lda #2` takes a 1-byte
   immediate and `pha` pushes exactly 1 byte (matching what `plb`
   expects to pop). Calling-convention prologues always run in
   M=0 (16-bit), so this `sep` is required.

2. **`.byte 0xa9,0x02`**: raw bytes for `lda #$02`. We hand-encode
   because llvm-mc can't yet emit an 8-bit immediate `lda #$02`
   that knows it's 1 byte; the assembler keeps treating it as
   16-bit. `0xa9` is the LDA-immediate opcode; `0x02` is the
   1-byte operand. Result: A = `$02` (8-bit).

3. **`pha`**: pushes A. In M=1 mode, PHA pushes exactly 1 byte
   (the low half of A). Stack now has `$02` on top.

4. **`plb`**: pops 1 byte from the stack and stores it in DBR.
   DBR is now `$02`. All subsequent data accesses go to bank 2.

5. **`rep #0x20`**: clears the `M` bit. A returns to 16-bit mode,
   matching the calling-convention contract for the rest of the
   function.

The DBR change persists across function returns. Once
`switchToBank2()` returns, all data reads/writes in your program
target bank 2, until you switch DBR again.

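The five steps above can be traced on the host with a toy register model. This is only a sketch under simplifying assumptions: the `Cpu` struct and helper names are invented for illustration, flag updates other than `M` are ignored, and the stack is modelled as a plain 64 KB array standing in for bank 0.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_M 0x20u  /* accumulator-width bit in P */

typedef struct {
    uint8_t  p;    /* status register */
    uint16_t a;    /* 16-bit accumulator */
    uint16_t s;    /* stack pointer */
    uint8_t  dbr;  /* data bank register */
} Cpu;

static uint8_t stackMem[0x10000];  /* bank 0, where the stack lives */

static void push8(Cpu *c, uint8_t v) { stackMem[c->s] = v; c->s--; }
static uint8_t pull8(Cpu *c) { c->s++; return stackMem[c->s]; }

/* Model of the sep / lda / pha / plb / rep sequence. */
void switchToBank2Model(Cpu *c) {
    c->p |= FLAG_M;                               /* (1) sep #0x20 */
    c->a = (uint16_t)((c->a & 0xFF00u) | 0x02u);  /* (2) 8-bit lda: low byte only */
    assert(c->p & FLAG_M);                        /* pha must push one byte */
    push8(c, (uint8_t)c->a);                      /* (3) pha */
    c->dbr = pull8(c);                            /* (4) plb */
    c->p &= (uint8_t)~FLAG_M;                     /* (5) rep #0x20 */
}
```

Running the model from `A = $1234`, `S = $01FF`, `DBR = $00` leaves `DBR = $02` with the stack pointer back at `$01FF`. It also shows that the low byte of A is overwritten (`$1202`), which is worth keeping in mind when calling the real routine.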
#### When you need it

You need to switch DBR whenever you want to access data at an
absolute address `$XXXX` and need it to land in a specific bank.
Common cases:

- **MMIO from the test harness**: `*(volatile uint16_t *)0x5000 = x;`
  Without DBR=2, this would go to bank 0's `$5000`. With DBR=2, it
  goes to `$02:5000`, which is where
  `runInMame.sh --check 0x025000=...` reads from.
- **Anything in `$C000-$CFFF`**: bank 0 has soft-switches here.
  Bank 2 has plain RAM.
- **Global arrays declared at link-time at fixed addresses**:
  the linker may place them in bank 2 BSS (`--bss-base 0x020000`).
  Your DBR must match.

You DON'T need DBR=2 for:

- **Local variables on the stack**: stack-relative accesses always
  go to bank 0 and ignore DBR; `lda $4,s` reads from the stack
  regardless of DBR.
- **Direct-page accesses**: `lda $D0` reads from `$00:00D0`
  (always bank 0). DP is anchored to bank 0.
- **Indirect-long pointers via `[dp],y`**: these include their
  own bank byte and ignore DBR.
- **Function calls**: `jsl` takes a 24-bit destination address and
  updates PBR automatically; DBR is not involved.

#### Other ways to access non-bank-0 data

If you only need to write to a single non-bank-0 address, you can
emit the store as `STA_Long` (24-bit absolute), which encodes the
bank inline:

```c
*(volatile unsigned short *)0x025000 = 42;   // becomes sta $025000
```

The W65816 backend recognizes `const-int pointer + integer offset`
and lowers to `sta long` if the address has a bank byte. No
`switchToBank2()` needed.

For frequent data work in a bank, switching DBR once and using
plain `sta $5000` (3 bytes) is smaller and faster than `sta $025000`
(4 bytes) per access.

#### Caveats

- **Save/restore is your problem.** `switchToBank2()` never
  restores DBR. If your caller expected DBR=0, you've broken its
  expectation. For long-running programs, that's usually fine
  (you just set DBR=2 once and stay there). For toolbox calls, GS/OS
  might assume DBR=0; check the call's documentation.
- **The stack is in bank 0 regardless.** Don't try to put the
  stack elsewhere; the 65816's stack-relative addressing modes
  ignore DBR.
- **Interrupts can fire inside the `sep`/`rep` window.** The `sep`
  affects A's width but not the bank-switching machinery itself.
  The hardware saves and restores `P` around a native-mode
  interrupt, but a handler that assumes 16-bit A without setting it
  would see M=1. Keep the sep/rep window short.
- **PBR vs DBR** are independent. Code execution stays where it
  was; only data accesses change.

#### How `runInMame.sh --check 0x025000=...` works

The check address `0x025000` is a 24-bit address: bank `$02`,
offset `$5000`. The MAME Lua runner reads this byte (and the next
byte if you specify a 2-byte value) directly from physical RAM,
bypassing DBR entirely. So the convention is:

1. Your program switches DBR to bank 2.
2. Your program writes its result to `*(volatile X *)0x5000`,
   which becomes `sta $5000`, landing in bank 2 because of DBR.
3. MAME reads bank 2's `$5000` via the absolute 24-bit address.
4. The runner compares to your expected value.

If you forget `switchToBank2()`, your store goes to bank 0's
`$5000`, MAME's check reads bank 2's unchanged `$5000` (likely
`$00` or whatever was there), and the test fails.

## Examples

### Hello, integer

```c
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    int x = 42;
    switchToBank2();
    *(volatile int *)0x5000 = x;
    while (1) {}
}
```

Build & run:

```bash
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a   # 0x2a = 42
```

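The 16-bit store in this example lands in memory low byte first, because the 65816 is little-endian. A small host-side sketch of that layout follows; `storeWord` and `loadWord` are illustrative helpers, and how `runInMame.sh` itself assembles the two bytes for comparison is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit value the way a 16-bit `sta` does: low byte first. */
void storeWord(uint8_t *mem, uint16_t offset, uint16_t value) {
    assert(offset < 0xFFFFu);  /* both bytes must fit inside the 64 KB bank */
    mem[offset]     = (uint8_t)(value & 0xFFu);
    mem[offset + 1] = (uint8_t)(value >> 8);
}

/* Read the word back as a little-endian pair. */
uint16_t loadWord(const uint8_t *mem, uint16_t offset) {
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}
```

Storing 42 at offset `$5000` puts `$2a` at `$5000` and `$00` at `$5001`; read back as a word, that is `$002a`, matching the `002a` in the `--check` argument.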
### Recursion + printing

```c
#include <stdio.h>
#include <stdlib.h>

unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n-1) + fib(n-2);
}

__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    char buf[32];
    int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
    switchToBank2();
    // Copy buf to $025000 so we can read it after the run
    for (int i = 0; i <= len; i++)
        ((volatile char *)0x5000)[i] = buf[i];
    while (1) {}
}
```

Build (note: need snprintf.o for `snprintf`):

```bash
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o \
    runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
```

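Before running under MAME, it can help to know what bytes to expect at `$02:5000`. The portable part of the example can be checked on the host with a sketch like this; `formatFib` is a hypothetical wrapper added for illustration, and the bank switch and fixed-address copy are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>

unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

/* Format the same message as the example above.
   Returns the length reported by snprintf (excluding the NUL). */
int formatFib(char *buf, size_t size, unsigned n) {
    assert(size > 0);
    return snprintf(buf, size, "fib(%u) = %lu", n, fib(n));
}
```

For `n = 10` the message is `fib(10) = 55`, which is 12 characters long, so the copy loop in the example (which runs while `i <= len`) writes 13 bytes including the terminating NUL.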
### Apple IIgs Toolbox

```c
#include <iigs/toolbox_full.h>

int main(void) {
    DrawString("\pHello, World");
    while (1) {}
}
```

Build:

```bash
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
    runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
    runtime/libgcc.o hello_gs.o
```

Use `crt0Gsos.o` (not `crt0.o`) for programs that call into the
toolbox: it sets up the IIgs runtime environment.

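The `"\p..."` literal in the example is a Pascal-style string: a length byte followed by the characters, with no reliance on a terminating NUL. When the text is only known at run time, such a string has to be built by hand. A minimal sketch, assuming the Toolbox call expects this length-prefixed layout as the `\p` literal suggests (`toPascalString` is a hypothetical helper, not part of the runtime):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build a length-prefixed (Pascal) string from a C string.
   Returns the total bytes written (length byte + characters), or 0
   if the text is longer than 255 bytes or the buffer is too small. */
size_t toPascalString(unsigned char *out, size_t outSize, const char *text) {
    assert(out != NULL && text != NULL);
    size_t len = strlen(text);
    if (len > 255 || outSize < len + 1)
        return 0;
    out[0] = (unsigned char)len;   /* length prefix */
    memcpy(out + 1, text, len);    /* characters, no NUL terminator */
    return len + 1;
}
```

For `"Hello, World"` the result is 13 bytes: a length byte of 12 followed by the 12 characters.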
## Inline assembly

The W65816 backend supports `__asm__` with operand constraints
`"a"`, `"x"`, `"y"`:

```c
unsigned short addOne(unsigned short x) {
    unsigned short r;
    __asm__("inc a" : "=a"(r) : "a"(x));
    return r;
}
```

Multi-instruction asm and raw bytes both work:

```c
__asm__ volatile (
    "sep #0x20\n"
    ".byte 0x68\n"   // pla
    "rep #0x20\n"
);
```

The `.byte 0xa9, ...` form is sometimes needed to work around
llvm-mc encoding gaps: the assembler doesn't yet support every
65816 addressing mode literally. The pattern works for any
opcode whose mnemonic doesn't yet parse.

## Tools reference

| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (`.ll` → `.s`) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from `link816` output |
| `mame` | `apt` (system-wide) | Apple IIgs emulator |

## Debugging

### Look at the asm

```bash
clang --target=w65816 -O2 -S -o prog.s prog.c
```

### Look at the MIR after each pass

```bash
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
```

Useful pass names to filter on:

| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |

### Single-pass filter

```bash
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
    -mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
```

## Cycle-count benchmarks

Eight microbenchmarks live under [`benchmarks/`](../benchmarks/).
Each runs N iterations of the bench function and reports a
per-call cycle count via MAME's `emu.time()`:

```bash
bash scripts/benchCyclesPrecise.sh
```

Output:

```
| Benchmark | Per-call cycles (clang) |
|-----------|------------------------:|
| bsearch | 767 cyc/call |
| dotProduct | 2131 cyc/call |
| fib | 12617 cyc/call |
| memcmp | 989 cyc/call |
| popcount | 2864 cyc/call |
| strcpy | 2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
```

The [`compare/`](../compare/) directory has side-by-side `.s`
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
Rerun with:

```bash
bash compare/regen.sh
```

## Known limitations

- **C++ exceptions** are not implemented. `try`/`catch` compiles but
  doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style
  throwing.
- **`stdin`** always returns EOF. `scanf` compiles but isn't useful.
  Use `sscanf` on a buffer instead.
- **File I/O** through `fopen` etc. requires a backing implementation.
  The default `mfs` backing (memory-file-system) lets you simulate
  files via `mfsRegister()`, which is useful for tests, not for real
  disk I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link
  against the GS/OS runtime.
- **`fork`/`exec`**: not applicable on a 65816, no support.
- **Code generation gotcha:** very large frames (>200 bytes) trigger
  FP-relative addressing. Most programs fit under that limit. See
  the `frame-rel` discussion in
  [LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).

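For the `stdin` limitation above, parsing from an in-memory buffer with `sscanf` might look like the following sketch. The `parseSize` helper and the `WIDTHxHEIGHT` input format are invented for illustration; on the target, `sscanf` comes from `runtime/sscanf.o`.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "WIDTHxHEIGHT" (e.g. "320x200") from a buffer.
   Returns 1 on success, 0 if the text does not match. */
int parseSize(const char *text, int *width, int *height) {
    assert(text && width && height);
    return sscanf(text, "%dx%d", width, height) == 2;
}
```

Checking the return value of `sscanf` against the number of expected conversions is what distinguishes a full match from a partial or failed one.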
## Where to go next

- **Building real GS/OS apps:** see
  [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) and the
  `runViaFinder.sh` script for booting through real GS/OS 6.0.2 in
  MAME.
- **Backend internals (you're hacking on the compiler):**
  [LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
- **Smoke tests:** `scripts/smokeTest.sh` runs ~150 end-to-end checks.
  Read it for examples of every feature in action.