# Using llvm816

This document covers compiling a C program, linking it into an Apple IIgs binary, and running it under MAME. It assumes you've followed [INSTALL.md](INSTALL.md) and have a working `tools/llvm-mos-build/bin/clang`.

## Quick reference

```bash
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime

# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o

# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
    $RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o

# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
```

## Compiling C

The compiler is invoked just like a normal clang, with `--target=w65816`:

```bash
clang --target=w65816 -O2 -c source.c -o source.o
```

**Recommended flags:**

| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Default optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Adds the runtime headers to the include search path |
| `-c` | Compile only: produce `.o`, don't link |

**What works at `-O2`:**

- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned, all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`, recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single+multiple inheritance, virtual functions, RTTI, `dynamic_cast`. **No exceptions** (DWARF unwinder not implemented).

See [STATUS.md](../STATUS.md) for the full feature matrix.

## Linking

The linker is `tools/link816`.
It produces either a raw binary suitable for direct execution (loaded at a fixed address) or an OMF binary suitable for the GS/OS Loader.

### Raw binary

```bash
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
```

- `--text-base 0x1000`: physical address where code is loaded. `0x1000` is the conventional starting address; the first 4 KB of bank 0 (`$00:0000`-`$00:0FFF`) is reserved for the stack and zero-page.
- `crt0.o`: the C runtime startup. Sets DBR, calls `main`, halts. Always link first.
- `libc.o`: `printf`, `malloc`, `strlen`, etc.
- `libgcc.o`: compiler-helper routines (`__mulhi3`, `__umulhisi3`, `__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial programs.

### Additional runtime libraries

| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library: printf, malloc, strlen, etc. |
| `runtime/libgcc.o` | Compiler helpers: multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |

Link only what you use; the linker drops unreferenced symbols.
Build them all once with:

```bash
bash runtime/build.sh
```

### Multi-segment OMF (for GS/OS Loader)

For programs that need >60 KB of code (the usable bank-0 limit after subtracting the stack, zero-page, and I/O window), build a multi-segment OMF that GS/OS Loader can place across banks:

```bash
link816 -o myprog.bin --omf --manifest my.manifest \
    --expressload \
    crt0Gsos.o ... yourprog.o
```

See [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) for details and [`scripts/runMultiSeg.sh`](../scripts/runMultiSeg.sh) for a working example.

## Running under MAME

The supplied [`scripts/runInMame.sh`](../scripts/runInMame.sh) launches MAME's `apple2gs` with the right ROM path, loads your binary at `$00:1000`, runs for a few seconds, and reads back a memory cell.

```bash
bash scripts/runInMame.sh prog.bin                        # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002      # dump these addrs
```

The `--check ADDR=VALUE` form returns exit 0 if `ADDR` contains `VALUE` after the run, exit 1 otherwise. Use `0x????` to dump the value without checking.

MAME is invoked headless by default (no window) via `-video none` + `SDL_VIDEODRIVER=dummy`. This works on servers/CI runners.

### The bank-switch idiom

#### Background: why this is necessary

The 65816 has two registers that select which bank a memory access goes to:

- **PBR** (Program Bank Register): selects the bank for instruction fetches. Set by `jsl long_addr` and `rtl`.
- **DBR** (Data Bank Register): selects the bank for data accesses like `lda $5000`, `sta $5000`, etc.

When the IIgs boots, DBR defaults to `$00`. Bank `$00` (the same bank as the language card / IIe-compatibility area) contains the **I/O window at `$C000-$CFFF`**. Any data access to addresses in that range goes to the soft-switches and slot ROMs, NOT to RAM. This is the same I/O hole the Apple IIe has, inherited by the IIgs for backward compatibility.
Concretely: if your DBR is `$00` and you write to address `$C100`, you're poking the slot-1 ROM enable register, which is definitely not what you want. Similarly, `$5000` in bank 0 is the language card area and may or may not be RAM depending on soft-switch state.

Banks `$01`-`$DF` are full 64K RAM banks (`$E0`/`$E1` are aux/main shadow, `$E2`-`$FF` reserved). To do reliable data work, switch the DBR to any of these "normal" banks. **`$02`** is conventional in this codebase because:

1. `$01:0000-$01:FFFF` overlaps the stack page (`$0100-$01FF` in any bank ends up in the same physical RAM as bank `$00`'s stack page, which is confusing).
2. `$02:0000-$02:FFFF` is the first "clean" bank above the special-purpose banks.
3. The smoke-test convention is to write a result word to `$02:5000` so `runInMame.sh` can read it back.

If your program needs more than 64 KB of data, switch DBR to different banks as needed.

#### What the assembly does, line by line

```c
__attribute__((noinline))
void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n"          // (1) Switch A to 8-bit
        ".byte 0xa9,0x02\n"    // (2) lda #2 (8-bit immediate)
        "pha\n"                // (3) Push A onto stack (1 byte)
        "plb\n"                // (4) Pop into DBR (1 byte from stack)
        "rep #0x20\n"          // (5) Restore A to 16-bit
    );
}
```

1. **`sep #0x20`**: sets the `M` bit in the status register `P`. `M=1` makes A behave as 8-bit (and immediate operands become 1 byte). We need this so that `lda #2` loads a single byte and the following `pha` pushes exactly 1 byte (matching what `plb` expects to pop). Calling-convention prologues always run in M=0 (16-bit), so this `sep` is required.
2. **`.byte 0xa9,0x02`**: raw bytes for `lda #$02`. We hand-encode because llvm-mc can't yet emit an 8-bit immediate `lda #$02` that knows it's 1 byte; the assembler keeps treating it as 16-bit. `0xa9` is the LDA-immediate opcode; `0x02` is the 1-byte operand. Result: A = `$02` (8-bit).
3. **`pha`**: pushes A. In M=1 mode, PHA pushes exactly 1 byte (the low half of A). Stack now has `$02` on top.
4. **`plb`**: pops 1 byte from the stack and stores it in DBR. DBR is now `$02`. All subsequent data accesses go to bank 2.
5. **`rep #0x20`**: clears the `M` bit. A returns to 16-bit mode, matching the calling-convention contract for the rest of the function.

The DBR change persists across function returns. Once `switchToBank2()` returns, all data reads/writes in your program target bank 2 until you switch DBR again.

#### When you need it

You need to switch DBR whenever you want to access data at an absolute address `$XXXX` and need it to land in a specific bank. Common cases:

- **MMIO from the test harness**: `*(volatile uint16 *)0x5000 = x;` Without DBR=2, this would go to bank 0's `$5000` (which is in the language card area). With DBR=2, it goes to `$02:5000` where `runInMame.sh --check 0x025000=...` reads from.
- **Anything in `$C000-$CFFF`**: bank 0 has soft-switches here. Bank 2 has plain RAM.
- **Global arrays declared at link-time at fixed addresses**: the linker may place them in bank 2 BSS (`--bss-base 0x020000`). Your DBR must match.

You don't need DBR=2 for:

- **Local variables on the stack**: stack-relative addressing ignores DBR; `lda $4,s` reads from the stack page regardless of DBR.
- **Direct-page accesses**: `lda $D0` reads from `$00:00D0` (always bank 0). DP is anchored to bank 0.
- **Indirect-long pointers via `[dp],y`**: these include their own bank byte and ignore DBR.
- **Function calls**: `jsl` uses PBR + a long destination address. PBR is updated automatically.

#### Other ways to access non-bank-0 data

If you only need to write to a single non-bank-0 address, you can emit the store as `STA_Long` (24-bit absolute), which encodes the bank inline:

```c
*(volatile unsigned short *)0x025000 = 42;   // becomes sta $025000
```

The W65816 backend recognizes `const-int pointer + integer offset` and lowers to `sta long` if the address has a bank byte. No `switchToBank2()` needed.
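To make "encodes the bank inline" concrete, here is a minimal host-side sketch of the two store encodings. It is an illustration only, not code from the backend or from `link816`; the function names `encodeStaAbs` and `encodeStaLong` are hypothetical. It relies on the standard 65816 opcodes `0x8D` (STA absolute) and `0x8F` (STA absolute long), with operands stored little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `sta $offset` (absolute, opcode 0x8D).
 * The bank is not in the instruction; the CPU takes it from DBR at run time. */
size_t encodeStaAbs(uint16_t offset, uint8_t out[3]) {
    out[0] = 0x8D;
    out[1] = (uint8_t)(offset & 0xFF);   /* low byte of the offset  */
    out[2] = (uint8_t)(offset >> 8);     /* high byte of the offset */
    return 3;
}

/* Hypothetical helper: encode `sta $bank:offset` (absolute long, opcode 0x8F).
 * The bank byte travels inside the instruction, so DBR is not consulted. */
size_t encodeStaLong(uint32_t addr24, uint8_t out[4]) {
    out[0] = 0x8F;
    out[1] = (uint8_t)(addr24 & 0xFF);          /* offset low  */
    out[2] = (uint8_t)((addr24 >> 8) & 0xFF);   /* offset high */
    out[3] = (uint8_t)((addr24 >> 16) & 0xFF);  /* bank byte   */
    return 4;
}
```

Under these assumptions, `sta $5000` encodes as `8D 00 50` (3 bytes) and `sta $025000` as `8F 00 50 02` (4 bytes); the extra byte is the bank.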
For frequent data work in a bank, switching DBR once and using plain `sta $5000` (3 bytes) is smaller and faster than `sta $025000` (4 bytes) per access.

#### Caveats

- **Save/restore is your problem.** `switchToBank2()` never restores DBR. If your caller expected DBR=0, you've broken its expectation. For long-running programs, that's usually fine (you just set DBR=2 once and stay there). For toolbox calls, GS/OS might assume DBR=0; check the call's documentation.
- **The stack is in bank 0 regardless.** Don't try to put the stack elsewhere; the 65816's stack-relative addressing modes ignore DBR.
- **Interrupts may behave differently in M=1 mode.** A handler that fires between the `sep` and the `rep` starts with A in 8-bit mode. The `sep` affects A's width but not the bank-switching machinery itself. Keep the sep/rep window short.
- **PBR and DBR are independent.** Code execution stays where it was; only data accesses change.

#### How `runInMame.sh --check 0x025000=...` works

The check address `0x025000` is a 24-bit address: bank `$02`, offset `$5000`. The MAME Lua runner reads this byte (and the next byte if you specify a 2-byte value) directly from physical RAM, bypassing DBR entirely.

So the convention is:

1. Your program switches DBR to bank 2.
2. Your program writes its result to `*(volatile X *)0x5000`, which becomes `sta $5000` and lands in bank 2 because of DBR.
3. MAME reads bank 2's `$5000` via the absolute 24-bit address.
4. The runner compares to your expected value.

If you forget `switchToBank2()`, your store goes to the language card area (bank 0's `$5000`), MAME's check reads bank 2's unchanged `$5000` (likely `$00` or whatever was there), and the test fails.
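The convention above can be modelled on the host in a few lines of C. This is a sketch under simplifying assumptions, not part of `runInMame.sh` or the runtime: the `Cpu` struct and the helpers `effectiveAbs`, `bankOf`, and `offsetOf` are hypothetical names, and only plain absolute addressing (no indexing) is modelled.

```c
#include <stdint.h>

/* Hypothetical model of the only CPU state that matters here. */
typedef struct {
    uint8_t dbr;   /* Data Bank Register */
} Cpu;

/* 24-bit address that a plain `sta $offset` (absolute addressing) hits:
 * the bank comes from DBR, the low 16 bits from the instruction operand. */
uint32_t effectiveAbs(const Cpu *cpu, uint16_t offset) {
    return ((uint32_t)cpu->dbr << 16) | offset;
}

/* Split a 24-bit check address such as 0x025000 into bank and offset. */
uint8_t bankOf(uint32_t addr24) {
    return (uint8_t)(addr24 >> 16);
}

uint16_t offsetOf(uint32_t addr24) {
    return (uint16_t)(addr24 & 0xFFFF);
}
```

In this model, a store to `0x5000` with `dbr = 0x00` resolves to `0x005000`, which is not the address the runner reads. With `dbr = 0x02` it resolves to `0x025000`, matching the `--check` address.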
## Examples

### Hello, integer

```c
__attribute__((noinline))
void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    int x = 42;
    switchToBank2();
    *(volatile int *)0x5000 = x;
    while (1) {}
}
```

Build & run:

```bash
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a    # 0x2a = 42
```

### Recursion + printing

```c
#include <stdio.h>

unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n-1) + fib(n-2);
}

__attribute__((noinline))
void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    char buf[32];
    int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
    switchToBank2();
    // Copy buf to $025000 so we can read it after the run
    for (int i = 0; i <= len; i++)
        ((volatile char *)0x5000)[i] = buf[i];
    while (1) {}
}
```

Build (note: `snprintf.o` is needed for `snprintf`):

```bash
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o \
    runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
```

### Apple IIgs Toolbox

```c
#include

int main(void) {
    DrawString("\pHello, World");
    while (1) {}
}
```

Build:

```bash
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
    runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
    runtime/libgcc.o hello_gs.o
```

Use `crt0Gsos.o` (not `crt0.o`) for programs that call into the toolbox; it sets up the IIgs runtime environment.
## Inline assembly

The W65816 backend supports `__asm__` with operand constraints `"a"`, `"x"`, `"y"`:

```c
unsigned short addOne(unsigned short x) {
    unsigned short r;
    __asm__("inc a" : "=a"(r) : "a"(x));
    return r;
}
```

Multi-instruction asm and raw bytes both work:

```c
__asm__ volatile (
    "sep #0x20\n"
    ".byte 0x68\n"   // pla
    "rep #0x20\n"
);
```

The `.byte 0xa9, ...` form is sometimes needed to work around llvm-mc encoding gaps: the assembler doesn't yet support every 65816 addressing mode literally. The pattern works for any opcode whose mnemonic doesn't yet parse.

## Tools reference

| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (`.ll` → `.s`) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from `link816` output |
| `mame` | `apt` (system-wide) | Apple IIgs emulator |

## Debugging

### Look at the asm

```bash
clang --target=w65816 -O2 -S -o prog.s prog.c
```

### Look at the MIR after each pass

```bash
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
```

Useful pass names to filter on:

| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |

### Single-pass filter

```bash
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
    -mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
```

## Cycle-count benchmarks

Eight microbenchmarks live under
[`benchmarks/`](../benchmarks/). Each runs N iterations of the bench function and reports a per-call cycle count via MAME's `emu.time()`:

```bash
bash scripts/benchCyclesPrecise.sh
```

Output:

```
| Benchmark | Per-call cycles (clang) |
|-----------|------------------------:|
| bsearch | 767 cyc/call |
| dotProduct | 2131 cyc/call |
| fib | 12617 cyc/call |
| memcmp | 989 cyc/call |
| popcount | 2864 cyc/call |
| strcpy | 2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
```

The [`compare/`](../compare/) directory has side-by-side `.s` files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32. Rerun with:

```bash
bash compare/regen.sh
```

## Known limitations

- **C++ exceptions** are not implemented. `try`/`catch` compiles but doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style throwing.
- **`stdin`** always returns EOF. `scanf` compiles but isn't useful. Use `sscanf` on a buffer instead.
- **File I/O** through `fopen` etc. requires a backing implementation. The default `mfs` backing (memory-file-system) lets you simulate files via `mfsRegister()`, which is useful for tests, not for real disk I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link against the GS/OS runtime.
- **`fork`/`exec`**: not applicable on a 65816, no support.
- **Code generation gotcha:** very large frames (>200 bytes) trigger FP-relative addressing. Most programs fit under that limit. See the `frame-rel` discussion in [LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).

## Where to go next

- **Building real GS/OS apps:** see [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) and the `runViaFinder.sh` script for booting through real GS/OS 6.0.2 in MAME.
- **Backend internals (you're hacking on the compiler):** [LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
- **Smoke tests:** `scripts/smokeTest.sh` runs ~150 end-to-end checks. Read it for examples of every feature in action.