# Using llvm816

This document covers compiling a C program, linking it into an
Apple IIgs binary, and running it under MAME. It assumes you've
followed [INSTALL.md](INSTALL.md) and have a working
`tools/llvm-mos-build/bin/clang`.

## Quick reference

```bash
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime

# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o

# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
    $RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o

# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
```

## Compiling C

The compiler is invoked just like a normal clang, with
`--target=w65816`:

```bash
clang --target=w65816 -O2 -c source.c -o source.o
```

**Recommended flags:**

| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Recommended optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Puts each function in its own section, which lets the linker drop unreferenced functions |
| `-I runtime/include` | Finds `<stdio.h>` etc. |
| `-c` | Compile only: produce `.o`, don't link |

**What works at `-O2`:**

- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned,
  all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`,
  recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single and multiple inheritance, virtual functions,
  RTTI, `dynamic_cast`. **No exceptions** (DWARF unwinder not
  implemented).

See [STATUS.md](../STATUS.md) for the full feature matrix.

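Since the list above includes `setjmp`/`longjmp` but not C++ exceptions, a non-local error exit can be written in plain C. The following is a minimal sketch in portable C, not code from this repository; the names `errorJump`, `parseDigit`, and `sumDigits` are hypothetical.

```c
#include <setjmp.h>

/* Hypothetical jump target shared by the helpers below. */
static jmp_buf errorJump;

/* Convert one decimal digit; on bad input, jump back to the setjmp site. */
static int parseDigit(char c) {
    if (c < '0' || c > '9')
        longjmp(errorJump, 1);   /* non-local exit, no unwinder involved */
    return c - '0';
}

/* Sum the digits of s, or return -1 if s contains a non-digit. */
int sumDigits(const char *s) {
    int total = 0;
    if (setjmp(errorJump) != 0)
        return -1;               /* reached via longjmp from parseDigit */
    for (; *s; s++)
        total += parseDigit(*s);
    return total;
}
```

Calling `sumDigits("123")` takes the normal path, while `sumDigits("12x")` returns through the `longjmp` branch. The error path returns a constant, so it never reads locals that were modified after `setjmp`.
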
## Linking

The linker is `tools/link816`. It produces either a raw binary
suitable for direct execution (loaded at a fixed address) or an
OMF binary suitable for the GS/OS Loader.

### Raw binary

```bash
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
```

- `--text-base 0x1000`: physical address where code is loaded.
  `0x1000` is the conventional starting address; the first 4 KB
  of bank 0 ($00:0000 – $00:0FFF) is reserved for the stack and
  zero-page.
- `crt0.o`: the C runtime startup. Sets DBR, calls `main`, halts.
  Always link first.
- `libc.o`: `printf`, `malloc`, `strlen`, etc.
- `libgcc.o`: compiler-helper routines (`__mulhi3`, `__umulhisi3`,
  `__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial
  programs.

### Additional runtime libraries

| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library: `printf`, `malloc`, `strlen`, etc. |
| `runtime/libgcc.o` | Compiler helpers: multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |

Link only what you use; the linker drops unreferenced symbols.

Build them all once with:

```bash
bash runtime/build.sh
```

### Multi-segment OMF (for GS/OS Loader)

For programs that need >60 KB of code (the usable bank-0 limit
after subtracting the stack, zero-page, and I/O window), build a
multi-segment OMF that the GS/OS Loader can place across banks:

```bash
link816 -o myprog.bin --omf --manifest my.manifest \
    --expressload \
    crt0Gsos.o ... yourprog.o
```

See [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) for details
and [`scripts/runMultiSeg.sh`](../scripts/runMultiSeg.sh) for a
working example.

## Running under MAME

The supplied [`scripts/runInMame.sh`](../scripts/runInMame.sh)
launches MAME's `apple2gs` with the right ROM path, loads your
binary at `$00:1000`, runs for a few seconds, and reads back a
memory cell.

```bash
bash scripts/runInMame.sh prog.bin                       # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002     # dump these addrs
```

The `--check ADDR=VALUE` form returns exit 0 if `ADDR` contains
`VALUE` after the run, exit 1 otherwise. Use `????` as the value
(as in `--check 0x025000=????`) to dump the value without checking.

MAME is invoked headless by default (no window) via
`-video none` + `SDL_VIDEODRIVER=dummy`. This works on
servers/CI runners.

### The bank-switch idiom

Bank 0 (`$00:0000-$00:FFFF`) contains the I/O window at `$C000-$CFFF`,
which interferes with normal data access. The convention is to
switch the data bank register (DBR) to bank 2 (`$02:0000`) before
doing any data work:

```c
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n"           // 8-bit accumulator
        ".byte 0xa9,0x02\n"     // lda #2 (emitted as raw bytes to work around an llvm-mc bug)
        "pha\n"
        "plb\n"                 // DBR = 2
        "rep #0x20\n"           // back to 16-bit
    );
}
```

After `switchToBank2()`, your data lives at `$02:0000` upward.
The address in `runInMame.sh`'s `--check 0x025000=...` is `$02:5000`,
which a normal store to `0x5000` reaches once DBR is 2.

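To make the relationship between the 24-bit address `0x025000` and the 16-bit pointer `0x5000` explicit, here is a small sketch in portable C. The helper names `bankOf`, `offsetOf`, and `makeLongAddr` are hypothetical and are not part of the runtime; they only illustrate the arithmetic.

```c
#include <stdint.h>

/* Bank byte of a 24-bit address: 0x025000 -> 0x02. */
uint8_t bankOf(uint32_t longAddr) {
    return (uint8_t)((longAddr >> 16) & 0xFF);
}

/* 16-bit offset within the bank: 0x025000 -> 0x5000. */
uint16_t offsetOf(uint32_t longAddr) {
    return (uint16_t)(longAddr & 0xFFFF);
}

/* Recombine a bank and an offset into a 24-bit address. */
uint32_t makeLongAddr(uint8_t bank, uint16_t offset) {
    return ((uint32_t)bank << 16) | offset;
}
```

With DBR set to 2, a store through a pointer holding `offsetOf(0x025000)` lands at the address that `--check 0x025000=...` reads back.
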
## Examples

### Hello, integer

```c
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    int x = 42;
    switchToBank2();
    *(volatile int *)0x5000 = x;
    while (1) {}
}
```

Build & run:

```bash
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a   # 0x2a = 42
```

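The check value `002a` is the stored `int` written as four hex digits. The sketch below shows that formatting in portable C, assuming `int` is 16 bits wide on this target (which the four-digit check value suggests); `formatCheckValue` is a hypothetical helper, not part of `runInMame.sh`.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 16-bit value as four lowercase hex digits, e.g. 42 -> "002a".
   out must have room for 5 bytes (4 digits plus the terminating NUL). */
void formatCheckValue(uint16_t value, char out[5]) {
    snprintf(out, 5, "%04x", (unsigned)value);
}
```

Passing 42 fills the buffer with the same text used after `=` in the `--check` argument above.
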
### Recursion + printing

```c
#include <stdio.h>
#include <stdlib.h>

unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n-1) + fib(n-2);
}

__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    char buf[32];
    int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
    switchToBank2();
    // Copy buf (including the terminating NUL) to $025000 so we can read it after the run
    for (int i = 0; i <= len; i++)
        ((volatile char *)0x5000)[i] = buf[i];
    while (1) {}
}
```

Build (note that `snprintf` requires `runtime/snprintf.o`):

```bash
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o \
    runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
```

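To know what to expect at `$02:5000` after the run, the same formatting can be reproduced with any host C compiler. This is a sketch for cross-checking only; `formatFib` is a hypothetical helper that mirrors the `snprintf` call in `fib.c`.

```c
#include <stdio.h>

/* Same recursive definition as in fib.c. */
unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

/* Format the message exactly as fib.c does; returns the length
   without the terminating NUL. */
int formatFib(char *buf, size_t size) {
    return snprintf(buf, size, "fib(10) = %lu", fib(10));
}
```

`fib(10)` is 55, so the buffer holds `fib(10) = 55`: 12 characters plus the terminating NUL, which is what the copy loop in `fib.c` writes to bank 2.
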
### Apple IIgs Toolbox

```c
#include <iigs/toolbox_full.h>

int main(void) {
    DrawString("\pHello, World");
    while (1) {}
}
```

Build:

```bash
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
    runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
    runtime/libgcc.o hello_gs.o
```

Use `crt0Gsos.o` (not `crt0.o`) for programs that call into the
toolbox; it sets up the IIgs runtime environment.

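The `"\p..."` literal in the example above is a Pascal-style string: a length byte followed by the characters. The sketch below builds the same layout by hand in portable C, which can help when the text is only known at run time. `toPascalString` is a hypothetical helper, not part of `runtime/iigsToolbox.o`.

```c
#include <stddef.h>
#include <string.h>

/* Build a Pascal-style string (length byte followed by the characters,
   no terminating NUL) from a C string. Returns the number of bytes
   written, or 0 if the text is longer than 255 characters or does not
   fit in out. */
size_t toPascalString(const char *text, unsigned char *out, size_t outSize) {
    size_t len = strlen(text);
    if (len > 255 || len + 1 > outSize)
        return 0;
    out[0] = (unsigned char)len;
    memcpy(out + 1, text, len);
    return len + 1;
}
```

For `"Hello, World"` the first byte is 12 and the call writes 13 bytes in total.
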
## Inline assembly

The W65816 backend supports `__asm__` with operand constraints
`"a"`, `"x"`, `"y"`:

```c
unsigned short addOne(unsigned short x) {
    unsigned short r;
    __asm__("inc a" : "=a"(r) : "a"(x));
    return r;
}
```

Multi-instruction asm and raw bytes both work:

```c
__asm__ volatile (
    "sep #0x20\n"
    ".byte 0x68\n"   // pla
    "rep #0x20\n"
);
```

The `.byte` form (for example `.byte 0xa9,0x02` for `lda #2`) is
sometimes needed to work around llvm-mc encoding gaps: the assembler
doesn't yet support every 65816 addressing mode literally. The
pattern works for any opcode whose mnemonic doesn't yet parse.

## Tools reference

| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (`.ll` → `.s`) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from `link816` output |
| `mame` | `apt` (system-wide) | Apple IIgs emulator |

## Debugging

### Look at the asm

```bash
clang --target=w65816 -O2 -S -o prog.s prog.c
```

### Look at the MIR after each pass

```bash
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
```

Useful pass names to filter on:

| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |

### Single-pass filter

```bash
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
    -mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
```

## Cycle-count benchmarks

Eight microbenchmarks live under [`benchmarks/`](../benchmarks/).
Each runs N iterations of the bench function and reports a
per-call cycle count via MAME's `emu.time()`:

```bash
bash scripts/benchCyclesPrecise.sh
```

Sample output:

```
| Benchmark    | Per-call cycles (clang) |
|--------------|------------------------:|
| bsearch      |   767 cyc/call |
| dotProduct   |  2131 cyc/call |
| fib          | 12617 cyc/call |
| memcmp       |   989 cyc/call |
| popcount     |  2864 cyc/call |
| strcpy       |  2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
```

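A per-call figure like the ones above can in principle be derived from an elapsed emulated time, a clock rate, and the iteration count. The sketch below shows only that arithmetic; it is not taken from `benchCyclesPrecise.sh`, the function name `cyclesPerCall` is hypothetical, and the clock rate is an assumption the caller must supply.

```c
/* Convert an elapsed emulated time into cycles per call.
   elapsedSeconds: difference between two time readings, in seconds
   clockHz:        CPU clock in Hz (an assumption supplied by the caller)
   iterations:     number of calls made in that interval (must be > 0) */
double cyclesPerCall(double elapsedSeconds, double clockHz,
                     unsigned long iterations) {
    return elapsedSeconds * clockHz / (double)iterations;
}
```

For example, 0.5 s at a hypothetical 1 MHz clock over 1000 calls gives 500 cycles per call.
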
The [`compare/`](../compare/) directory has side-by-side `.s`
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
Regenerate them with:

```bash
bash compare/regen.sh
```

## Known limitations

- **C++ exceptions** are not implemented. `try`/`catch` compiles but
  doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style
  throwing.
- **`stdin`** always returns EOF. `scanf` compiles but isn't useful.
  Use `sscanf` on a buffer instead.
- **File I/O** through `fopen` etc. requires a backing implementation.
  The default `mfs` backing (memory-file-system) lets you simulate
  files via `mfsRegister()`, which is useful for tests but not for
  real disk I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you
  link against the GS/OS runtime.
- **`fork`/`exec`** are not applicable on a 65816 and are not supported.
- **Code generation gotcha:** very large frames (>200 bytes) trigger
  FP-relative addressing. Most programs fit under that limit. See
  the `frame-rel` discussion in
  [LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).

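As an illustration of the `sscanf`-on-a-buffer advice above, here is a minimal sketch in standard C. The function name `parseSize` and the `WIDTHxHEIGHT` input format are hypothetical; on the target this would also need `runtime/sscanf.o` on the link line, per the runtime library table.

```c
#include <stdio.h>

/* Parse "WIDTHxHEIGHT" (e.g. "640x200") from a buffer.
   Returns 1 on success, 0 if the text does not match. */
int parseSize(const char *text, int *width, int *height) {
    return sscanf(text, "%dx%d", width, height) == 2;
}
```

Checking that `sscanf` converted exactly two fields keeps partially matching input from being accepted.
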
## Where to go next

- **Building real GS/OS apps:** see
  [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) and the
  `runViaFinder.sh` script for booting through real GS/OS 6.0.2 in
  MAME.
- **Backend internals (if you're hacking on the compiler):**
  [LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
- **Smoke tests:** `scripts/smokeTest.sh` runs ~150 end-to-end checks.
  Read it for examples of every feature in action.