Using llvm816
This document covers compiling a C program, linking it into an
Apple IIgs binary, and running it under MAME. It assumes you've
followed INSTALL.md and have a working
tools/llvm-mos-build/bin/clang.
Quick reference
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime
# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o
# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
$RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o
# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
Compiling C
The compiler is invoked just like a normal clang, with
--target=w65816:
clang --target=w65816 -O2 -c source.c -o source.o
Recommended flags:
| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Default optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Find `<stdio.h>` etc. |
| `-c` | Compile only: produce `.o`, don't link |
What works at -O2:
- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned, all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`, recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single+multiple inheritance, virtual functions, RTTI, `dynamic_cast`. No exceptions (DWARF unwinder not implemented).
See STATUS.md for the full feature matrix.
Linking
The linker is tools/link816. It produces either a raw binary
for direct execution (loaded at a fixed address) or an
OMF binary for the GS/OS Loader.
Raw binary
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
- `--text-base 0x1000`: physical address where code is loaded. `0x1000` is the conventional starting address; the first 4KB of bank 0 ($00:0000 – $00:0FFF) is reserved for the stack and zero-page.
- `crt0.o`: the C runtime startup. Sets DBR, calls `main`, halts. Always link first.
- `libc.o`: `printf`, `malloc`, `strlen`, etc.
- `libgcc.o`: compiler-helper routines (`__mulhi3`, `__umulhisi3`, `__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial programs.
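The 65816 has no multiply instruction, which is why even a plain `a * b` on 16-bit integers can end up as a call into `libgcc.o`. The sketch below shows the classic shift-and-add technique such a helper could use. It is an illustration only: `sketchMul16` is a hypothetical name, and the runtime's actual `__mulhi3` may be written quite differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit multiply helper such as __mulhi3.
   This is NOT the runtime's implementation, only the shift-and-add idea. */
static uint16_t sketchMul16(uint16_t a, uint16_t b) {
    uint16_t result = 0;
    while (b != 0) {
        if (b & 1u) {
            /* This bit of b is set: add the current shifted copy of a. */
            result = (uint16_t)(result + a);
        }
        a = (uint16_t)(a << 1); /* next power-of-two multiple of a */
        b >>= 1;                /* move on to the next bit of b */
    }
    return result;              /* low 16 bits of the product */
}
```

The result wraps modulo 65536, matching ordinary 16-bit unsigned arithmetic in C.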
Additional runtime libraries
| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library: printf, malloc, strlen, etc. |
| `runtime/libgcc.o` | Compiler helpers: multiply, divide, shift |
| `runtime/snprintf.o` | sprintf / snprintf / vsnprintf |
| `runtime/sscanf.o` | sscanf / vsscanf / fscanf |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | fabs, floor, sqrt, sin, cos, etc. |
| `runtime/qsort.o` | qsort / bsearch |
| `runtime/strtol.o` | strtol / strtoul / atoi / atol |
| `runtime/strtok.o` | strtok / strtok_r |
| `runtime/extras.o` | strcat, strncat, llabs, rand/srand |
| `runtime/timeExt.o` | time / gmtime / mktime |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |
Link only what you use — the linker drops unreferenced symbols.
Build them all once with:
bash runtime/build.sh
Multi-segment OMF (for GS/OS Loader)
For programs that need >60 KB of code (the usable bank-0 limit after subtracting the stack, zero-page, and I/O window), build a multi-segment OMF that GS/OS Loader can place across banks:
link816 -o myprog.bin --omf --manifest my.manifest \
--expressload \
crt0Gsos.o ... yourprog.o
See docs/multiSegmentPlan.md for details and
scripts/runMultiSeg.sh for a working example.
Running under MAME
The supplied scripts/runInMame.sh
launches MAME's apple2gs with the right ROM path, loads your
binary at $00:1000, runs for a few seconds, and reads back a
memory cell.
bash scripts/runInMame.sh prog.bin # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002 # dump these addrs
The --check ADDR=VALUE form returns exit 0 if ADDR contains
VALUE after the run, exit 1 otherwise. Use 0x???? to dump
the value without checking.
MAME is invoked headless by default (no window) via
-video none + SDL_VIDEODRIVER=dummy. This works on
servers/CI runners.
The bank-switch idiom
Bank 0 ($00:0000-$00:FFFF) has the I/O window at $C000-$CFFF
that interferes with normal data access. The convention is to
switch the data bank register (DBR) to bank 2 ($02:0000) before
doing any data work:
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n"        // 8-bit accumulator
        ".byte 0xa9,0x02\n"  // lda #2 (emitted as raw bytes to work around an llvm-mc bug)
        "pha\n"
        "plb\n"              // DBR = 2
        "rep #0x20\n"        // back to 16-bit
    );
}
After switchToBank2(), your data lives at $02:0000 upward.
The runInMame.sh --check 0x025000=... address is $02:5000
— accessible via a normal store in bank 2.
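The relationship between the 24-bit address passed to `--check` and the 16-bit pointer used in C can be sketched as plain bank/offset arithmetic. The helper names below are hypothetical and the code is host-side C for illustration, not part of the runtime.

```c
#include <stdint.h>

/* Hypothetical helpers: combine and split a 65816 bank:offset address. */
static uint32_t linearAddress(uint8_t bank, uint16_t offset) {
    /* The bank byte occupies bits 16..23 of the 24-bit address. */
    return ((uint32_t)bank << 16) | offset;
}

static uint8_t bankOf(uint32_t address) {
    return (uint8_t)(address >> 16);
}

static uint16_t offsetOf(uint32_t address) {
    return (uint16_t)(address & 0xFFFFu);
}
```

With DBR set to 2, a store through the 16-bit pointer `0x5000` lands at `linearAddress(0x02, 0x5000)`, which is `0x025000`.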
Examples
Hello, integer
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}
int main(void) {
    int x = 42;
    switchToBank2();
    *(volatile int *)0x5000 = x;
    while (1) {}
}
Build & run:
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a # 0x2a = 42
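To work out the value to pass to `--check` for other integers, the following host-side sketch formats a 16-bit value the same way as the `002a` above. It assumes the script compares a 16-bit little-endian word written as four lowercase hex digits, which is what the `002a` and `00ff` examples suggest; the script itself may behave differently. Both helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write value as four lowercase hex digits plus NUL. */
static void formatCheckValue(uint16_t value, char out[5]) {
    snprintf(out, 5, "%04x", (unsigned)value);
}

/* Hypothetical helper: the two bytes as the 65816 stores them in memory
   (little-endian: low byte at the lower address). */
static void storeLittleEndian16(uint16_t value, uint8_t bytes[2]) {
    bytes[0] = (uint8_t)(value & 0xFFu);
    bytes[1] = (uint8_t)(value >> 8);
}
```

For `int x = 42`, the byte at $02:5000 is `0x2a` and the byte at $02:5001 is `0x00`, so the word reads back as `002a`.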
Recursion + printing
#include <stdio.h>
#include <stdlib.h>
unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n-1) + fib(n-2);
}
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}
int main(void) {
    char buf[32];
    int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
    switchToBank2();
    // Copy buf to $025000 so we can read it after the run
    for (int i = 0; i <= len; i++)
        ((volatile char *)0x5000)[i] = buf[i];
    while (1) {}
}
Build (note: need snprintf.o for snprintf):
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o \
runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
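Before running under MAME, it can help to know what the buffer at $02:5000 should contain. This host-side sketch reuses the same `fib` and format string with a native compiler; `formatFib` is a hypothetical helper introduced only for this check.

```c
#include <stdio.h>

/* Same recursive definition as in fib.c. */
static unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

/* Hypothetical helper: format the result the way fib.c does and
   return the string length (excluding the terminating NUL). */
static int formatFib(char *buf, size_t size, unsigned n) {
    return snprintf(buf, size, "fib(%u) = %lu", n, fib(n));
}
```

For `n = 10` the string is `fib(10) = 55`, 12 characters long, so the copy loop in `fib.c` writes 13 bytes including the terminator.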
Apple IIgs Toolbox
#include <iigs/toolbox_full.h>
int main(void) {
    DrawString("\pHello, World");
    while (1) {}
}
Build:
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
runtime/libgcc.o hello_gs.o
Use crt0Gsos.o (not crt0.o) for programs that call into the
toolbox — it sets up the IIgs runtime environment.
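The `\p` prefix in the string literal above marks a Pascal string: a length byte followed by the characters, with no reliance on a terminating NUL. When the text is only known at run time, the same layout can be built by hand. The sketch below is host-portable C, and `toPascalString` is a hypothetical helper, not a function from the runtime headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a C string into dst as a length-prefixed
   Pascal string. Returns 0 on success, -1 if the text is longer than
   255 characters or dst is too small. */
static int toPascalString(const char *src, unsigned char *dst, size_t dstSize) {
    size_t len = strlen(src);
    if (len > 255 || dstSize < len + 1) {
        return -1;
    }
    dst[0] = (unsigned char)len;  /* length byte comes first */
    memcpy(dst + 1, src, len);    /* characters follow, no NUL needed */
    return 0;
}
```

For `"Hello, World"` the first byte is 12, followed by the 12 characters.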
Inline assembly
The W65816 backend supports __asm__ with operand constraints
"a", "x", "y":
unsigned short addOne(unsigned short x) {
    unsigned short r;
    __asm__("inc a" : "=a"(r) : "a"(x));
    return r;
}
Multi-instruction asm and raw bytes both work:
__asm__ volatile (
    "sep #0x20\n"
    ".byte 0x68\n"   // pla
    "rep #0x20\n"
);
The .byte 0xa9, ... form is sometimes needed to work around
llvm-mc encoding gaps — the assembler doesn't yet support every
65816 addressing mode literally. The pattern works for any
opcode whose mnemonic doesn't yet parse.
Tools reference
| Tool | Location | Purpose |
|---|---|---|
| clang | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| llvm-mc | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| llvm-objdump | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| llc | `tools/llvm-mos-build/bin/llc` | Standalone codegen (.ll → .s) |
| link816 | `tools/link816` | Our relocating linker |
| omfEmit | `tools/omfEmit` | Emit OMF v2.1 binary from link816 output |
| mame | apt (system-wide) | Apple IIgs emulator |
Debugging
Look at the asm
clang --target=w65816 -O2 -S -o prog.s prog.c
Look at the MIR after each pass
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
Useful pass names to filter on:
| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |
Single-pass filter
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
-mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
Cycle-count benchmarks
Eight microbenchmarks live under benchmarks/.
Each runs N iterations of the bench function and reports a
per-call cycle count via MAME's emu.time():
bash scripts/benchCyclesPrecise.sh
Output:
| Benchmark | Per-call cycles (clang) |
|-----------|------------------------:|
| bsearch | 767 cyc/call |
| dotProduct | 2131 cyc/call |
| fib | 12617 cyc/call |
| memcmp | 989 cyc/call |
| popcount | 2864 cyc/call |
| strcpy | 2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
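A per-call figure like those in the table can be derived from an elapsed emulated time, a CPU clock rate, and an iteration count. The sketch below shows that arithmetic only. How `benchCyclesPrecise.sh` actually computes its numbers is not described here, so the function name, its parameters, and the idea that the clock rate is supplied by the caller are all assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: convert elapsed emulated seconds into cycles per
   call. clockHz is an assumed parameter; pass whatever CPU clock the
   emulated machine is configured for. */
static double cyclesPerCall(double elapsedSeconds, double clockHz,
                            uint32_t iterations) {
    if (iterations == 0) {
        return 0.0; /* avoid dividing by zero */
    }
    return elapsedSeconds * clockHz / (double)iterations;
}
```

Larger iteration counts dilute fixed overhead, such as loop setup, in the per-call figure.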
The compare/ directory has side-by-side .s
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
Rerun with:
bash compare/regen.sh
Known limitations
- C++ exceptions are not implemented. `try`/`catch` compiles but doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style throwing.
- `stdin` always returns EOF. `scanf` compiles but isn't useful. Use `sscanf` on a buffer instead.
- File I/O through `fopen` etc. requires a backing implementation. The default `mfs` backing (memory-file-system) lets you simulate files via `mfsRegister()`, which is useful for tests but not for real disk I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link against the GS/OS runtime.
- `fork`/`exec`: not applicable on a 65816, no support.
- Code generation gotcha: very large frames (>200 bytes) trigger FP-relative addressing. Most programs fit under that limit. See the `frame-rel` discussion in LLVM_65816_DESIGN.md.
Where to go next
- Building real GS/OS apps: see `docs/multiSegmentPlan.md` and the `runViaFinder.sh` script for booting through real GS/OS 6.0.2 in MAME.
- Backend internals (you're hacking on the compiler): LLVM_65816_DESIGN.md.
- Smoke tests: `scripts/smokeTest.sh` runs ~150 end-to-end checks. Read it for examples of every feature in action.