Using llvm816
This document covers compiling a C program, linking it into an
Apple IIgs binary, and running it under MAME. It assumes you've
followed INSTALL.md and have a working
tools/llvm-mos-build/bin/clang.
Quick reference
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime
# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o
# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
$RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o
# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
Compiling C
The compiler is invoked just like a normal clang, with
--target=w65816:
clang --target=w65816 -O2 -c source.c -o source.o
Recommended flags:
| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Default optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Find `<stdio.h>` etc. |
| `-c` | Compile only: produce `.o`, don't link |
What works at -O2:
- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned, all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`, recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single+multiple inheritance, virtual functions, RTTI, `dynamic_cast`. No exceptions (DWARF unwinder not implemented).
See STATUS.md for the full feature matrix.
Linking
The linker is tools/link816. It produces either a raw binary
suitable for direct execution (loaded into a fixed address) or an
OMF binary suitable for GS/OS Loader.
Raw binary
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
- `--text-base 0x1000`: physical address where code is loaded. `0x1000` is the conventional starting address; the first 4 KB of bank 0 (`$00:0000`-`$00:0FFF`) is reserved for the stack and zero-page.
- `crt0.o`: the C runtime startup. Sets DBR, calls `main`, halts. Always link first.
- `libc.o`: `printf`, `malloc`, `strlen`, etc.
- `libgcc.o`: compiler-helper routines (`__mulhi3`, `__umulhisi3`, `__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial programs.
Additional runtime libraries
| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library: `printf`, `malloc`, `strlen`, etc. |
| `runtime/libgcc.o` | Compiler helpers: multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |
Link only what you use — the linker drops unreferenced symbols.
Build them all once with:
bash runtime/build.sh
Multi-segment OMF (for GS/OS Loader)
For programs that need >60 KB of code (the usable bank-0 limit after subtracting the stack, zero-page, and I/O window), build a multi-segment OMF that GS/OS Loader can place across banks:
link816 -o myprog.bin --omf --manifest my.manifest \
--expressload \
crt0Gsos.o ... yourprog.o
See docs/multiSegmentPlan.md for details
and scripts/runMultiSeg.sh for a
working example.
Running under MAME
The supplied scripts/runInMame.sh
launches MAME's apple2gs with the right ROM path, loads your
binary at $00:1000, runs for a few seconds, and reads back a
memory cell.
bash scripts/runInMame.sh prog.bin # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002 # dump these addrs
The --check ADDR=VALUE form returns exit 0 if ADDR contains
VALUE after the run, exit 1 otherwise. Use 0x???? to dump
the value without checking.
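The `--check ADDR=VALUE` contract can be sketched in C. This is an illustration only: `runInMame.sh` is a shell script, and `CheckSpec`, `parseCheckSpec`, and `checkExitCode` are hypothetical names. Treating a VALUE made entirely of `?` as "dump only" and giving a dump exit status 0 are both assumptions based on the quick-reference example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing one "ADDR=VALUE" argument. */
typedef struct {
    uint32_t addr;      /* 24-bit address, e.g. 0x025000 */
    uint32_t value;     /* expected value; unused when dumpOnly is set */
    int      dumpOnly;  /* nonzero when VALUE consists only of '?' */
} CheckSpec;

/* Returns 1 on success, 0 if spec is not of the form ADDR=VALUE. */
int parseCheckSpec(const char *spec, CheckSpec *out) {
    assert(spec != NULL && out != NULL);
    const char *eq = strchr(spec, '=');
    if (eq == NULL || eq == spec || eq[1] == '\0') return 0;

    char *end = NULL;
    /* Base 16 accepts an optional "0x" prefix. */
    unsigned long addr = strtoul(spec, &end, 16);
    if (end != eq || addr > 0xFFFFFFul) return 0;

    const char *val = eq + 1;
    out->addr = (uint32_t)addr;
    out->value = 0;
    out->dumpOnly = (strspn(val, "?") == strlen(val));
    if (out->dumpOnly) return 1;

    unsigned long v = strtoul(val, &end, 16);
    if (*end != '\0') return 0;
    out->value = (uint32_t)v;
    return 1;
}

/* Exit status as described above: 0 on match, 1 otherwise. */
int checkExitCode(const CheckSpec *spec, uint32_t observed) {
    assert(spec != NULL);
    if (spec->dumpOnly) return 0;  /* assumption: a dump never fails */
    return observed == spec->value ? 0 : 1;
}
```

With this sketch, `0x025000=00ff` parses to address `0x025000` and expected value `0x00ff`, and a run that leaves any other value at that address maps to exit status 1.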
MAME is invoked headless by default (no window) via
-video none + SDL_VIDEODRIVER=dummy. This works on
servers/CI runners.
The bank-switch idiom
Background — why this is necessary
The 65816 has two registers that select which bank a memory access goes to:
- PBR (Program Bank Register): selects the bank for instruction fetches. Set by `jsl long_addr` and `rtl`.
- DBR (Data Bank Register): selects the bank for data accesses like `lda $5000`, `sta $5000`, etc.
When the IIgs boots, DBR defaults to $00. Bank $00 (the same
bank as the language card / IIe-compatibility area) contains the
I/O window at $C000-$CFFF. Any data access to addresses in
that range goes to the soft-switches and slot ROMs, NOT to RAM.
This is the same I/O hole the Apple IIe has, inherited by the IIgs
for backward compatibility.
Concretely: if your DBR is $00 and you write to address $C100,
the access lands in slot 1's ROM space ($C100-$C1FF) rather than
in RAM, which is definitely not what you want. Similarly, a store
to $5000 with DBR=$00 lands in bank 0 (inside the hi-res page 2
range, $4000-$5FFF), not in the bank the test harness reads back.
Banks $01-$DF are full 64K RAM banks ($E0/$E1 are aux/main
shadow, $E0-$FF reserved). To do reliable data work, switch
the DBR to any of these "normal" banks. $02 is conventional
in this codebase because:
- `$01:0000-$01:FFFF` overlaps the stack page (`$0100-$01FF` in any bank ends up in the same physical RAM as bank `$00`'s stack page, which is confusing).
- `$02:0000-$02:FFFF` is the first "clean" bank above the special-purpose banks.
- The smoke-test convention is to write a result word to `$02:5000` so `runInMame.sh` can read it back.
If your program needs more than 64 KB of data, switch DBR to different banks as needed.
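How DBR combines with a 16-bit operand can be modelled on the host in a few lines of C. This is a sketch for reasoning about addresses, not code that runs on the IIgs; the helper names are hypothetical, and the model ignores indexed accesses that cross a bank boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Bank byte (bits 16-23) of a 24-bit address. */
uint8_t bankOf(uint32_t addr) {
    assert(addr <= 0xFFFFFFu);
    return (uint8_t)(addr >> 16);
}

/* 16-bit offset within the bank. */
uint16_t offsetOf(uint32_t addr) {
    assert(addr <= 0xFFFFFFu);
    return (uint16_t)(addr & 0xFFFFu);
}

/* Address reached by an absolute data access such as `sta $5000`. */
uint32_t dataAddress(uint8_t dbr, uint16_t offset) {
    return ((uint32_t)dbr << 16) | offset;
}

/* True when the access falls in the bank-0 I/O window ($C000-$CFFF). */
int hitsBank0IoWindow(uint32_t addr) {
    return bankOf(addr) == 0x00
        && offsetOf(addr) >= 0xC000u
        && offsetOf(addr) <= 0xCFFFu;
}
```

For example, `dataAddress(0x02, 0x5000)` yields `0x025000`, the address the smoke tests read back, while `dataAddress(0x00, 0xC100)` lands in the I/O window.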
What the assembly does, line by line
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n" // (1) Switch A to 8-bit
".byte 0xa9,0x02\n" // (2) lda #2 (8-bit immediate)
"pha\n" // (3) Push A onto stack (1 byte)
"plb\n" // (4) Pop into DBR (1 byte from stack)
"rep #0x20\n" // (5) Restore A to 16-bit
);
}
1. `sep #0x20`: sets the `M` bit in the status register `P`. `M=1` makes A behave as 8-bit (and immediate operands become 1 byte). We need this so the later `pha` pushes exactly 1 byte, matching what `plb` expects to pop. Calling-convention prologues always run in M=0 (16-bit), so this `sep` is required.
2. `.byte 0xa9,0x02`: raw bytes for `lda #$02`. We hand-encode because llvm-mc can't yet emit an `lda #$02` that it knows is a 1-byte immediate; the assembler keeps treating it as 16-bit. `0xa9` is the LDA-immediate opcode; `0x02` is the 1-byte operand. Result: A = `$02` (8-bit).
3. `pha`: pushes A. In M=1 mode, PHA pushes exactly 1 byte (the low half of A). Stack now has `$02` on top.
4. `plb`: pops 1 byte from the stack and stores it in DBR. DBR is now `$02`. All subsequent data accesses go to bank 2.
5. `rep #0x20`: clears the `M` bit. A returns to 16-bit mode, matching the calling-convention contract for the rest of the function.
The DBR change persists across function returns. Once
switchToBank2() returns, all data reads/writes in your program
target bank 2 — until you switch DBR again.
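To see what the five instructions assemble to, the host-side sketch below writes out the machine-code bytes for an arbitrary bank. `encodeDbrSwitch` is a hypothetical helper for illustration and is not part of the toolchain; the opcode values are the standard 65816 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DBR_SWITCH_LEN = 8 };

/* Writes the bytes for: sep #$20 / lda #bank / pha / plb / rep #$20.
 * Returns the number of bytes written. */
size_t encodeDbrSwitch(uint8_t bank, uint8_t *out, size_t cap) {
    assert(out != NULL && cap >= DBR_SWITCH_LEN);
    size_t n = 0;
    out[n++] = 0xE2; out[n++] = 0x20;  /* sep #$20: A becomes 8-bit */
    out[n++] = 0xA9; out[n++] = bank;  /* lda #bank, 1-byte immediate */
    out[n++] = 0x48;                   /* pha: push 1 byte */
    out[n++] = 0xAB;                   /* plb: pop 1 byte into DBR */
    out[n++] = 0xC2; out[n++] = 0x20;  /* rep #$20: A back to 16-bit */
    return n;
}
```

The whole sequence is 8 bytes, and only the fourth byte changes with the target bank.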
When you need it
You need to switch DBR whenever you want to access data at an
absolute address $XXXX and need it to land in a specific bank.
Common cases:
- MMIO from the test harness: `*(volatile uint16 *)0x5000 = x;` Without DBR=2, this would go to bank 0's `$5000`. With DBR=2, it goes to `$02:5000`, where `runInMame.sh --check 0x025000=...` reads from.
- Anything in `$C000-$CFFF`: bank 0 has soft-switches here. Bank 2 has plain RAM.
- Global arrays declared at link-time at fixed addresses: the linker may place them in bank 2 BSS (`--bss-base 0x020000`). Your DBR must match.
You DON'T need DBR=2 for:
- Local variables on the stack: stack-relative accesses ignore DBR; `lda $4,s` reads from the stack regardless of DBR.
- Direct-page accesses: `lda $D0` reads from `$00:00D0` (always bank 0). DP is anchored to bank 0.
- Indirect-long pointers via `[dp],y`: these include their own bank byte and ignore DBR.
- Function calls: `jsl` takes a full 24-bit destination address, and PBR is updated automatically.
Other ways to access non-bank-0 data
If you only need to write to a single non-bank-0 address, you can
emit the store as STA_Long (24-bit absolute) which encodes the
bank inline:
*(volatile unsigned short *)0x025000 = 42; // becomes sta $025000
The W65816 backend recognizes const-int pointer + integer offset
and lowers to sta long if the address has a bank byte. No
switchToBank2() needed.
For frequent data work in a bank, switching DBR once and using
plain sta $5000 (3 bytes: opcode plus 16-bit address) is smaller
and faster than sta $025000 (4 bytes) per access.
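The size trade-off can be worked out with a little arithmetic. The sketch below counts code bytes only, assuming the 8-byte inline switch sequence, 3 bytes per absolute `sta` (opcode plus 16-bit address), and 4 bytes per long `sta`. It leaves out the `jsl`/`rtl` overhead of calling a noinline `switchToBank2()`, and the function names are hypothetical.

```c
#include <assert.h>

enum {
    DBR_SWITCH_BYTES = 8,  /* sep(2) + lda #imm8(2) + pha(1) + plb(1) + rep(2) */
    STA_ABS_BYTES    = 3,  /* opcode + 16-bit address */
    STA_LONG_BYTES   = 4   /* opcode + 24-bit address */
};

/* Code size when DBR is switched once and every store is absolute. */
unsigned bytesWithDbrSwitch(unsigned stores) {
    assert(stores <= 1000000u);  /* keep the arithmetic far from overflow */
    return DBR_SWITCH_BYTES + stores * STA_ABS_BYTES;
}

/* Code size when every store carries its own bank byte. */
unsigned bytesWithLongStores(unsigned stores) {
    assert(stores <= 1000000u);
    return stores * STA_LONG_BYTES;
}

/* Smallest store count at which switching DBR is strictly smaller. */
unsigned breakEvenStores(void) {
    unsigned n = 0;
    while (bytesWithDbrSwitch(n) >= bytesWithLongStores(n)) n++;
    return n;
}
```

Under these assumptions the two approaches tie at 8 stores (32 bytes each), and switching DBR wins from the 9th store onward.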
Caveats
- Save/restore is your problem. `switchToBank2()` never restores DBR. If your caller expected DBR=0, you've broken its expectation. For long-running programs, that's usually fine (you just set DBR=2 once and stay there). For toolbox calls, GS/OS might assume DBR=0; check the call's documentation.
- The stack is in bank 0 regardless. Don't try to put the stack elsewhere; the 65816's stack-relative addressing modes ignore DBR.
- In M=1 mode, interrupts may behave differently. The `sep` affects A's width but not the bank-switching machinery itself. Keep the sep/rep window short.
- PBR and DBR are independent. Code execution stays where it was; only data accesses change.
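When a caller does depend on its own DBR, the usual 65816 pattern is to bracket the switch with `phb` (push DBR) and a final `plb`. The host-side model below sketches that save/restore discipline. The `Cpu` struct and every function name are hypothetical; it simulates only DBR and a small byte stack and is not code for the target.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_CAP = 16 };

/* Hypothetical host-side model: just DBR and a small byte stack. */
typedef struct {
    uint8_t dbr;
    uint8_t stack[STACK_CAP];
    int depth;
} Cpu;

void push8(Cpu *c, uint8_t v) { assert(c->depth < STACK_CAP); c->stack[c->depth++] = v; }
uint8_t pop8(Cpu *c) { assert(c->depth > 0); return c->stack[--c->depth]; }

void phb(Cpu *c) { push8(c, c->dbr); }   /* push DBR */
void plb(Cpu *c) { c->dbr = pop8(c); }   /* pop into DBR */

/* Runs body with DBR set to bank, then puts the caller's DBR back. */
void withBank(Cpu *c, uint8_t bank, void (*body)(Cpu *)) {
    phb(c);           /* save the caller's DBR */
    push8(c, bank);   /* stands in for: sep / lda #bank / pha */
    plb(c);           /* DBR = bank */
    body(c);
    plb(c);           /* restore the saved DBR */
}

/* Sample body: remembers which bank it ran under. */
uint8_t observedBank;
void recordBank(Cpu *c) { observedBank = c->dbr; }
```

In the model, `withBank(&cpu, 0x02, recordBank)` runs the body under bank 2 and leaves both DBR and the stack depth as it found them.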
How runInMame.sh --check 0x025000=... works
The check address 0x025000 is a 24-bit address: bank $02,
offset $5000. The MAME Lua runner reads this byte (and the next
byte if you specify a 2-byte value) directly from physical RAM,
bypassing DBR entirely. So the convention is:
1. Your program switches DBR to bank 2.
2. Your program writes its result to `*(volatile X *)0x5000`, which becomes `sta $5000`, landing in bank 2 because of DBR.
3. MAME reads bank 2's `$5000` via the absolute 24-bit address.
4. The runner compares to your expected value.
If you forget switchToBank2(), your store goes to bank 0's $5000,
MAME's check reads bank 2's unchanged $5000 (likely $00 or
whatever was there), and the test fails.
Examples
Hello, integer
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
);
}
int main(void) {
int x = 42;
switchToBank2();
*(volatile int *)0x5000 = x;
while (1) {}
}
Build & run:
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a # 0x2a = 42
Recursion + printing
#include <stdio.h>
#include <stdlib.h>
unsigned long fib(unsigned n) {
if (n < 2) return n;
return fib(n-1) + fib(n-2);
}
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile (
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
);
}
int main(void) {
char buf[32];
int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
switchToBank2();
// Copy buf to $025000 so we can read it after the run
for (int i = 0; i <= len; i++)
((volatile char *)0x5000)[i] = buf[i];
while (1) {}
}
Build (note: need snprintf.o for snprintf):
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o \
runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
Apple IIgs Toolbox
#include <iigs/toolbox_full.h>
int main(void) {
DrawString("\pHello, World");
while (1) {}
}
Build:
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
runtime/libgcc.o hello_gs.o
Use crt0Gsos.o (not crt0.o) for programs that call into the
toolbox — it sets up the IIgs runtime environment.
Inline assembly
The W65816 backend supports __asm__ with operand constraints
"a", "x", "y":
unsigned short addOne(unsigned short x) {
unsigned short r;
__asm__("inc a" : "=a"(r) : "a"(x));
return r;
}
Multi-instruction asm and raw bytes both work:
__asm__ volatile (
"sep #0x20\n"
".byte 0x68\n" // pla
"rep #0x20\n"
);
The .byte 0xa9, ... form is sometimes needed to work around
llvm-mc encoding gaps — the assembler doesn't yet support every
65816 addressing mode literally. The pattern works for any
opcode whose mnemonic doesn't yet parse.
Tools reference
| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (.ll → .s) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from link816 output |
| `mame` | apt (system-wide) | Apple IIgs emulator |
Debugging
Look at the asm
clang --target=w65816 -O2 -S -o prog.s prog.c
Look at the MIR after each pass
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
Useful pass names to filter on:
| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |
Single-pass filter
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
-mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
Cycle-count benchmarks
Eight microbenchmarks live under benchmarks/.
Each runs N iterations of the bench function and reports a
per-call cycle count via MAME's emu.time():
bash scripts/benchCyclesPrecise.sh
Output:
| Benchmark | Per-call cycles (clang) |
|-----------|------------------------:|
| bsearch | 767 cyc/call |
| dotProduct | 2131 cyc/call |
| fib | 12617 cyc/call |
| memcmp | 989 cyc/call |
| popcount | 2864 cyc/call |
| strcpy | 2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
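
The per-call figure comes from dividing elapsed emulated time by the iteration count. A minimal sketch of that arithmetic, assuming the two timestamps are in seconds and that `clockHz` is whatever CPU clock the benchmark script assumes (the value is not stated here); `cyclesPerCall` is a hypothetical name:

```c
#include <assert.h>

/* Per-call cycle count from two timestamps taken around N iterations. */
double cyclesPerCall(double startSec, double endSec,
                     double clockHz, unsigned long iterations) {
    assert(endSec >= startSec && clockHz > 0.0 && iterations > 0);
    return (endSec - startSec) * clockHz / (double)iterations;
}
```

For instance, 1000 iterations that take one emulated second at an assumed 1 MHz clock work out to 1000 cycles per call.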
The compare/ directory has side-by-side .s
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
Rerun with:
bash compare/regen.sh
Known limitations
- C++ exceptions are not implemented. `try`/`catch` compiles but doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style throwing.
- `stdin` always returns EOF. `scanf` compiles but isn't useful. Use `sscanf` on a buffer instead.
- File I/O through `fopen` etc. requires a backing implementation. The default `mfs` backing (memory-file-system) lets you simulate files via `mfsRegister()`, which is useful for tests but not for real disk I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link against the GS/OS runtime.
- `fork`/`exec`: not applicable on a 65816, no support.
- Code generation gotcha: very large frames (>200 bytes) trigger FP-relative addressing. Most programs fit under that limit. See the `frame-rel` discussion in LLVM_65816_DESIGN.md.
Where to go next
- Building real GS/OS apps: see `docs/multiSegmentPlan.md` and the `runViaFinder.sh` script for booting through real GS/OS 6.0.2 in MAME.
- Backend internals (you're hacking on the compiler): LLVM_65816_DESIGN.md.
- Smoke tests: `scripts/smokeTest.sh` runs ~150 end-to-end checks. Read it for examples of every feature in action.