Docs!
parent
42f0d16d07
commit
6bff7bea3f
18 changed files with 2100 additions and 115 deletions
102
README.md
Normal file
@@ -0,0 +1,102 @@
|
|||
# llvm816
|
||||
|
||||
LLVM/Clang C compiler for the WDC 65816 / Apple IIgs.
|
||||
|
||||
Compiles C (and a minimal subset of C++) to native 65816 machine code,
|
||||
links to a relocatable OMF binary, and runs under MAME's apple2gs.
|
||||
Speed-tuned: matches or beats hand-written 65816 assembly on the
|
||||
tight loops in benchmarks like sumOfSquares, popcount, and strcpy.
|
||||
|
||||
## What you get
|
||||
|
||||
- **`clang --target=w65816`** — full C99 + parts of C11, optimized at
|
||||
`-O2` by default. Soft-float and soft-double included.
|
||||
- **C standard library subset** — `stdio.h`, `stdlib.h`, `string.h`,
|
||||
`math.h`, `time.h`, `setjmp.h`, etc. See
|
||||
[`runtime/include/`](runtime/include/) for the complete list.
|
||||
- **`link816`** — relocating linker producing GS/OS-loadable OMF
|
||||
binaries (single- or multi-segment).
|
||||
- **MAME integration scripts** — compile, link, and run a program
|
||||
under MAME's apple2gs with one command.
|
||||
- **Apple IIgs Toolbox bindings** — `<iigs/toolbox_full.h>` exposes
|
||||
~1300 toolbox routines from 35 tool sets.
|
||||
|
||||
## Quick start
|
||||
|
||||
After installation (see [docs/INSTALL.md](docs/INSTALL.md)):
|
||||
|
||||
```bash
|
||||
# Compile a C file
|
||||
cat > hello.c <<'EOF'
|
||||
__attribute__((noinline)) void switchToBank2(void) {
|
||||
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||
}
|
||||
int main(void) {
|
||||
unsigned short x = 0;
|
||||
for (int i = 1; i <= 10; i++) x += i; // x = 55
|
||||
switchToBank2();
|
||||
*(volatile unsigned short *)0x5000 = x;
|
||||
while (1) {}
|
||||
}
|
||||
EOF
|
||||
|
||||
# Build + run under MAME (writes 0x0037 to $025000, MAME displays it)
|
||||
./tools/llvm-mos-build/bin/clang --target=w65816 -O2 -c hello.c -o hello.o
|
||||
./tools/link816 -o hello.bin --text-base 0x1000 \
|
||||
runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
|
||||
bash scripts/runInMame.sh hello.bin --check 0x025000=0037
|
||||
```
|
||||
|
||||
See [docs/USAGE.md](docs/USAGE.md) for a full walkthrough including
|
||||
multi-segment builds and the Apple IIgs Toolbox.
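
The expected value in `--check 0x025000=0037` follows from the loop in `hello.c`: the sum of 1 through 10 is 55, which is 0x37 in hexadecimal. A minimal host-side sketch (plain C for any native compiler; the helper name `sumTo` is an illustration, not part of llvm816) makes the arithmetic explicit:

```c
#include <stdint.h>

/* Mirrors the loop in hello.c: adds 1..n into a 16-bit accumulator,
 * the same width the example uses for `unsigned short x`. */
uint16_t sumTo(unsigned n) {
    uint16_t x = 0;
    for (unsigned i = 1; i <= n; i++)
        x = (uint16_t)(x + i);   /* wraps modulo 65536, like the target */
    return x;
}
```

`sumTo(10)` returns 55 (0x0037), the 16-bit word the example stores at `$02:5000`.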
|
||||
|
||||
## Project layout
|
||||
|
||||
```
|
||||
runtime/ C standard library + crt0 startup
|
||||
src/ sources (C and .s)
|
||||
include/ headers
|
||||
*.o built object files
|
||||
src/ our LLVM/Clang sources (W65816 target backend)
|
||||
clang/ clang patches
|
||||
llvm/ LLVM patches + W65816 target
|
||||
link816/ relocating linker
|
||||
patches/ patches against vanilla llvm-mos
|
||||
scripts/ install scripts, MAME runners, benchmarks
|
||||
tools/ installed compilers, MAME, ROMs, Calypsi (reference)
|
||||
benchmarks/ cycle-count and instruction-count benchmarks
|
||||
compare/ side-by-side asm vs Calypsi
|
||||
docs/ this directory — INSTALL.md, USAGE.md, design notes
|
||||
```
|
||||
|
||||
## Status
|
||||
|
||||
Stable enough to build real programs. Current cycle counts against the
commercial Calypsi 5.16 compiler (lower is better):
|
||||
|
||||
| Benchmark | Our cycles/call | Calypsi cycles/call (approx.) |
|---|---|---|
| sumOfSquares(50) | 16709 | ~16000 |
| popcount(0x12345678) | 2864 | ~2500 |
| memcmp(eq, 5) | 989 | ~700 |
| bsearch(arr, 8, 5) | 767 | ~600 |
|
||||
|
||||
Static size for the canonical `sumSquares` benchmark: 37 instructions
(ours) vs 31 (Calypsi), a ratio of **1.19×**.
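
The 1.19× figure is an instruction-count ratio: our count divided by Calypsi's. A small sketch of that arithmetic, where `instRatio` is a hypothetical helper and not part of the `compare/` scripts:

```c
/* Ratio of our static instruction count to the reference compiler's.
 * Above 1.0 means our output is larger; below 1.0 means smaller.
 * The caller must pass a nonzero reference count. */
double instRatio(unsigned ours, unsigned reference) {
    return (double)ours / (double)reference;
}
```

With the counts quoted above, `instRatio(37, 31)` is about 1.19.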
|
||||
|
||||
See [STATUS.md](STATUS.md) for full language and runtime feature
|
||||
coverage, and [LLVM_65816_DESIGN.md](LLVM_65816_DESIGN.md) for
|
||||
backend internals.
|
||||
|
||||
## Documentation
|
||||
|
||||
- [docs/INSTALL.md](docs/INSTALL.md) — system requirements and install
|
||||
steps
|
||||
- [docs/USAGE.md](docs/USAGE.md) — compile, link, run, debug
|
||||
- [STATUS.md](STATUS.md) — current language/runtime support matrix
|
||||
- [LLVM_65816_DESIGN.md](LLVM_65816_DESIGN.md) — backend design notes
|
||||
|
||||
## License
|
||||
|
||||
Apache 2.0 (matching the LLVM project's license). See
|
||||
`tools/llvm-mos/LICENSE.TXT` after install.
|
||||
12
STATUS.md
@@ -247,8 +247,8 @@ which runs correctly under MAME (apple2gs).
 - `scripts/benchCyclesPrecise.sh` measures per-call cycle counts
 via MAME's emulated time counter. Eight benchmarks under
 `benchmarks/`. Current numbers (after W65816StackSlotMerge):
-popcount 3376, bsearch 852, memcmp 1091, strcpy 2387,
-dotProduct 2302, fib(10) 12617, sumOfSquares 17391. Speed is
+popcount 2864, bsearch 767, memcmp 989, strcpy 2216,
+dotProduct 2131, fib(10) 12617, sumOfSquares 16709. Speed is
 the optimization priority, not size.
 
 - `compare/` holds three side-by-side C tests with our asm and
@@ -257,10 +257,10 @@ which runs correctly under MAME (apple2gs).
 recompiles each under both `clang --target=w65816 -O2 -S` and
 `cc65816 --speed -O 2 --64bit-doubles` and prints an
 ours/Calypsi instruction-count ratio. Current ratios (post
-W65816StackSlotMerge Phase 5/6 + extracted Phase 6/6a per-MBB
-peepholes + Pass 1c PHP-wrap CMP elim for SP-rel functions):
-sumSquares 1.81x (56 inst), evalAt 2.10x (534 inst), mul16to32
-2.25x (9 inst). See `compare/README.md`.
+StackRelToImg 9-phase pipeline including saturating-max preheader
+elimination): sumSquares **0.87×** (27 inst — we beat Calypsi's
+31), evalAt 2.10× (534 inst), mul16to32 **1.50×** (6 inst).
+See `compare/README.md`.
 
 **Backend register allocation:**
 
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
###############################################################################
|
||||
# #
|
||||
# Calypsi ISO C compiler for 65816 version 5.16 #
|
||||
# 13/May/2026 20:52:21 #
|
||||
# 14/May/2026 11:06:07 #
|
||||
# Command line: --speed -O 2 --64bit-doubles evalAt.c -o #
|
||||
# /tmp/evalAt.calypsi.elf --list-file evalAt.calypsi.lst #
|
||||
# #
|
||||
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
###############################################################################
|
||||
# #
|
||||
# Calypsi ISO C compiler for 65816 version 5.16 #
|
||||
# 13/May/2026 20:52:21 #
|
||||
# 14/May/2026 11:06:07 #
|
||||
# Command line: --speed -O 2 --64bit-doubles mul16to32.c -o #
|
||||
# /tmp/mul16to32.calypsi.elf --list-file #
|
||||
# mul16to32.calypsi.lst #
|
||||
|
|
|
|||
|
|
@@ -6,12 +6,9 @@ mul16to32: ; @mul16to32
 ; %bb.0: ; %entry
 rep #0x30
 pha
-pha
-lda 0x8, s
+lda 0x6, s
 jsl __umulhisi3
 ply
-sta 0x1, s
-ply
 rtl
 .Lfunc_end0:
 .size mul16to32, .Lfunc_end0-mul16to32
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
###############################################################################
|
||||
# #
|
||||
# Calypsi ISO C compiler for 65816 version 5.16 #
|
||||
# 13/May/2026 20:52:21 #
|
||||
# 14/May/2026 11:06:07 #
|
||||
# Command line: --speed -O 2 --64bit-doubles sumSquares.c -o #
|
||||
# /tmp/sumSquares.calypsi.elf --list-file #
|
||||
# sumSquares.calypsi.lst #
|
||||
|
|
|
|||
|
|
@ -5,67 +5,38 @@
|
|||
sumSquares: ; @sumSquares
|
||||
; %bb.0: ; %entry
|
||||
rep #0x30
|
||||
tay
|
||||
tsc
|
||||
sec
|
||||
sbc #0xc
|
||||
tcs
|
||||
tya
|
||||
sta 0x5, s
|
||||
lda #0x0
|
||||
sta 0x3, s
|
||||
sta 0x1, s
|
||||
lda 0x5, s
|
||||
bne .LBB0_1
|
||||
sta 0xd0
|
||||
stz 0xd6
|
||||
stz 0xd4
|
||||
lda 0xd0
|
||||
bne .LBB0_3
|
||||
; %bb.6: ; %entry
|
||||
brl .LBB0_5
|
||||
.LBB0_1: ; %for.body.preheader
|
||||
lda 0x5, s
|
||||
inc a
|
||||
sta 0x5, s
|
||||
cmp #0x3
|
||||
bcs .LBB0_3
|
||||
; %bb.1: ; %for.body.preheader
|
||||
; %bb.2: ; %for.body.preheader
|
||||
lda #0x2
|
||||
sta 0x5, s
|
||||
.LBB0_3: ; %for.body.preheader
|
||||
lda #0x1
|
||||
sta 0x7, s
|
||||
lda 0x5, s
|
||||
dec a
|
||||
sta 0x5, s
|
||||
lda #0x0
|
||||
sta 0x1, s
|
||||
sta 0xd2
|
||||
.LBB0_4: ; %for.body
|
||||
; =>This Inner Loop Header: Depth=1
|
||||
lda 0x7, s
|
||||
lda 0xd2
|
||||
pha
|
||||
jsl __umulhisi3
|
||||
ply
|
||||
clc
|
||||
adc 0x3, s
|
||||
sta 0x3, s
|
||||
adc 0xd6
|
||||
sta 0xd6
|
||||
txa
|
||||
adc 0x1, s
|
||||
sta 0x1, s
|
||||
lda 0x7, s
|
||||
inc a
|
||||
sta 0x7, s
|
||||
lda 0x5, s
|
||||
dec a
|
||||
sta 0x5, s
|
||||
adc 0xd4
|
||||
sta 0xd4
|
||||
inc 0xd2
|
||||
dec 0xd0
|
||||
beq .LBB0_5
|
||||
bra .LBB0_4
|
||||
.LBB0_5: ; %for.cond.cleanup
|
||||
lda 0x1, s
|
||||
lda 0xd4
|
||||
tax
|
||||
lda 0x3, s
|
||||
tay
|
||||
tsc
|
||||
clc
|
||||
adc #0xc
|
||||
tcs
|
||||
tya
|
||||
lda 0xd6
|
||||
rtl
|
||||
.Lfunc_end0:
|
||||
.size sumSquares, .Lfunc_end0-sumSquares
|
||||
|
|
|
|||
168
docs/INSTALL.md
Normal file
|
|
@@ -0,0 +1,168 @@
|
|||
# Installing llvm816
|
||||
|
||||
The project installs everything into `tools/` under the repo root, so
|
||||
the tree is self-contained and deletable without affecting your system.
|
||||
|
||||
## System requirements
|
||||
|
||||
- **Ubuntu 22.04 or 24.04** (or any Debian-based distro with apt).
|
||||
Other Linuxes work if you can install the packages listed below
|
||||
by hand.
|
||||
- **Disk:** ~10 GB free (LLVM build artifacts dominate).
|
||||
- **RAM:** 8 GB minimum, 16 GB recommended for the `--build-llvm`
|
||||
flag. The setup script's default skips the LLVM build and
|
||||
downloads a prebuilt toolchain instead — much faster, ~500 MB.
|
||||
- **Build time:** ~5 minutes for the default (prebuilt) path; 30-60
|
||||
minutes for `--build-llvm` (full LLVM source build).
|
||||
|
||||
## One-command install
|
||||
|
||||
```bash
|
||||
git clone <this-repo-url> llvm816
|
||||
cd llvm816
|
||||
./setup.sh
|
||||
```
|
||||
|
||||
`setup.sh` installs:
|
||||
|
||||
1. **System apt packages** — build-essential, cmake, ninja, clang, lld,
|
||||
python3, MAME, etc. See [`scripts/installDeps.sh`](../scripts/installDeps.sh)
|
||||
for the full list. *Requires sudo.*
|
||||
2. **llvm-mos** — source tree clone at `tools/llvm-mos/` and a prebuilt
|
||||
SDK at `tools/llvm-mos-sdk/`. With `--build-llvm` it also runs
|
||||
cmake/ninja to build a usable W65816-aware clang at
|
||||
`tools/llvm-mos-build/bin/clang`.
|
||||
3. **Apple IIgs MAME** — installs MAME via apt and downloads the
|
||||
apple2gs ROMs to `tools/mame/roms/`.
|
||||
4. **Calypsi 5.16** — reference 65816 C compiler, installed to
|
||||
`tools/calypsi/`. Used by the `compare/` benchmarks to measure
|
||||
our codegen quality against a commercial baseline.
|
||||
5. **ORCA/C** (the Byte Works 65816 C compiler for the Apple IIgs),
used as a header reference for the IIgs Toolbox bindings.
|
||||
|
||||
After `setup.sh` finishes:
|
||||
|
||||
```bash
|
||||
ls tools/llvm-mos-build/bin/clang # our compiler (present only after --build-llvm)
|
||||
ls tools/link816 # our linker
|
||||
mame -version # MAME (installed via apt)
|
||||
```
|
||||
|
||||
## Step-by-step (if `setup.sh` fails)
|
||||
|
||||
You can run each install script in isolation:
|
||||
|
||||
```bash
|
||||
scripts/installDeps.sh # apt packages
|
||||
scripts/installLlvmMos.sh # llvm-mos clone + prebuilt SDK
|
||||
scripts/installLlvmMos.sh --build # also build the source (slow)
|
||||
scripts/installMame.sh # MAME + apple2gs ROMs
|
||||
scripts/installCalypsi.sh # reference compiler (optional)
|
||||
scripts/installOrcaC.sh # reference compiler (optional)
|
||||
```
|
||||
|
||||
If you only want to build C programs (no benchmarks, no comparison
|
||||
to Calypsi), `installCalypsi.sh` and `installOrcaC.sh` are
|
||||
optional.
|
||||
|
||||
## Building the W65816 backend from source
|
||||
|
||||
The default install pulls a prebuilt LLVM SDK. To build our
|
||||
W65816-aware clang from source:
|
||||
|
||||
```bash
|
||||
./setup.sh --build-llvm
|
||||
```
|
||||
|
||||
Or, after a non-`--build-llvm` install:
|
||||
|
||||
```bash
|
||||
scripts/applyBackend.sh # symlink our W65816 sources into llvm-mos clone
|
||||
cmake --build tools/llvm-mos-build --target llc clang
|
||||
```
|
||||
|
||||
The build takes 30-60 minutes on a modern laptop. Subsequent
|
||||
incremental builds after editing W65816 backend code are ~30
|
||||
seconds.
|
||||
|
||||
## Verifying the install
|
||||
|
||||
```bash
|
||||
# Compile + disassemble a small C function
|
||||
scripts/cDemo.sh
|
||||
|
||||
# Build the runtime library (libc, libgcc, etc.)
|
||||
bash runtime/build.sh
|
||||
|
||||
# Run the smoke test suite (~150 checks, takes ~3 minutes)
|
||||
bash scripts/smokeTest.sh
|
||||
```
|
||||
|
||||
A successful smoke test ends with:
|
||||
|
||||
```
|
||||
[llvm816] all smoke checks passed
|
||||
```
|
||||
|
||||
## Updating
|
||||
|
||||
```bash
|
||||
git pull
|
||||
scripts/applyBackend.sh # re-symlink our sources into the LLVM tree
|
||||
cmake --build tools/llvm-mos-build --target llc clang
|
||||
bash runtime/build.sh
|
||||
```
|
||||
|
||||
If you want a fully clean rebuild:
|
||||
|
||||
```bash
|
||||
rm -rf tools/llvm-mos-build
|
||||
./setup.sh --build-llvm
|
||||
```
|
||||
|
||||
## Uninstalling
|
||||
|
||||
The toolchain is fully contained under `tools/`. To uninstall:
|
||||
|
||||
```bash
|
||||
rm -rf llvm816/
|
||||
sudo apt-get remove mame mame-tools # if you want MAME gone too
|
||||
```
|
||||
|
||||
The setup script doesn't touch `/usr/local` or `~/.mame`, so apart from
the apt packages there is nothing to clean up outside the repo.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**`cmake: command not found`** — run `scripts/installDeps.sh`. The
|
||||
apt packages aren't installed yet.
|
||||
|
||||
**`ROMs not found`** — the apple2gs ROM download from archive.org
|
||||
occasionally fails. Re-run `scripts/installMame.sh`. The script
|
||||
is idempotent; it skips ROMs already downloaded.
|
||||
|
||||
**`clang: error: unable to find target 'w65816'`** — the prebuilt
|
||||
SDK's clang doesn't know about our W65816 target. You need the
|
||||
source-built clang:
|
||||
|
||||
```bash
|
||||
scripts/installLlvmMos.sh --build
|
||||
# Or, more granular:
|
||||
scripts/applyBackend.sh
|
||||
cmake --build tools/llvm-mos-build --target clang
|
||||
```
|
||||
|
||||
The W65816 target lives in *our* fork at `tools/llvm-mos-build/bin/clang`,
|
||||
not in the prebuilt SDK.
|
||||
|
||||
**MAME can't find ROMs at runtime** — make sure `mame` is launched
|
||||
with `-rompath tools/mame/roms`. The provided
|
||||
[`scripts/runInMame.sh`](../scripts/runInMame.sh) does this
|
||||
automatically.
|
||||
|
||||
**`linkage error: missing __umulhisi3`** — link `runtime/libgcc.o`
|
||||
into your binary. See [USAGE.md](USAGE.md#linking).
|
||||
|
||||
**MAME pops up a window I don't want** — the `runInMame.sh`
|
||||
wrapper now runs headless (`-video none` + `SDL_VIDEODRIVER=dummy`).
|
||||
If you're invoking MAME directly, pass `-video none` and set
`SDL_VIDEODRIVER=dummy` in the environment.
|
||||
391
docs/USAGE.md
Normal file
|
|
@@ -0,0 +1,391 @@
|
|||
# Using llvm816
|
||||
|
||||
This document covers compiling a C program, linking it into an
|
||||
Apple IIgs binary, and running it under MAME. It assumes you've
|
||||
followed [INSTALL.md](INSTALL.md) and have a working
|
||||
`tools/llvm-mos-build/bin/clang`.
|
||||
|
||||
## Quick reference
|
||||
|
||||
```bash
|
||||
CLANG=tools/llvm-mos-build/bin/clang
|
||||
LINK=tools/link816
|
||||
RUNTIME=runtime
|
||||
|
||||
# 1. Compile C to object
|
||||
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o
|
||||
|
||||
# 2. Link to a raw binary (loadable at $00:1000)
|
||||
$LINK -o hello.bin --text-base 0x1000 \
|
||||
$RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o
|
||||
|
||||
# 3. Run under MAME
|
||||
bash scripts/runInMame.sh hello.bin --check 0x025000=????
|
||||
```
|
||||
|
||||
## Compiling C
|
||||
|
||||
The compiler is invoked just like a normal clang, with
|
||||
`--target=w65816`:
|
||||
|
||||
```bash
|
||||
clang --target=w65816 -O2 -c source.c -o source.o
|
||||
```
|
||||
|
||||
**Recommended flags:**
|
||||
|
||||
| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Default optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Find `<stdio.h>` etc. |
| `-c` | Compile only: produce `.o`, don't link |
|
||||
|
||||
**What works at `-O2`:**
|
||||
|
||||
- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned,
|
||||
all arithmetic operators
|
||||
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
|
||||
- Pointers, arrays, structs, unions, bitfields
|
||||
- All control flow: `if`, `for`, `while`, `goto`, `switch`,
|
||||
recursion
|
||||
- `<stdarg.h>` varargs
|
||||
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
|
||||
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
|
||||
- C++ subset: classes, single+multiple inheritance, virtual functions,
|
||||
RTTI, `dynamic_cast`. **No exceptions** (DWARF unwinder not
|
||||
implemented).
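
As a rough illustration of the feature list, the sketch below combines `<stdarg.h>` varargs, 64-bit arithmetic, and a bitfield in portable C99. It was written for this guide rather than taken from the smoke tests, and the names `Flags`, `sumInts`, and `roundTripLevel` are made up for the example.

```c
#include <stdarg.h>
#include <stdint.h>

/* Bitfield struct: two small fields packed into one storage unit. */
struct Flags {
    unsigned ready : 1;
    unsigned level : 3;
};

/* Varargs: sums `count` int arguments into a 64-bit total. */
int64_t sumInts(int count, ...) {
    va_list ap;
    int64_t total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Stores a level in the 3-bit field and reads it back. */
unsigned roundTripLevel(unsigned level) {
    struct Flags f = { 1u, 0u };
    f.level = level;   /* values above 7 keep only the low 3 bits */
    return f.level;
}
```

Compiling the same file for the host and with `--target=w65816` is a quick way to compare behavior before running under MAME.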
|
||||
|
||||
See [STATUS.md](../STATUS.md) for the full feature matrix.
|
||||
|
||||
## Linking
|
||||
|
||||
The linker is `tools/link816`. It produces either a raw binary
|
||||
suitable for direct execution (loaded into a fixed address) or an
|
||||
OMF binary suitable for GS/OS Loader.
|
||||
|
||||
### Raw binary
|
||||
|
||||
```bash
|
||||
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
|
||||
```
|
||||
|
||||
- `--text-base 0x1000` — physical address where code is loaded.
|
||||
`0x1000` is the conventional starting address; the first 4 KB
of bank 0 ($00:0000 – $00:0FFF) is reserved for the stack and
the direct page (the 65816's relocatable zero page).
|
||||
- `crt0.o` — the C runtime startup. Sets DBR, calls `main`, halts.
|
||||
Always link first.
|
||||
- `libc.o` — `printf`, `malloc`, `strlen`, etc.
|
||||
- `libgcc.o` — compiler-helper routines (`__mulhi3`, `__umulhisi3`,
|
||||
`__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial
|
||||
programs.
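
By GCC's libgcc naming convention, `hi` denotes a 16-bit operand and `si` a 32-bit one, so `__umulhisi3` is presumably an unsigned 16×16→32 widening multiply. The host-side reference below sketches that behavior under this assumption, using shift-and-add because the 65816 has no multiply instruction. `refUmulhisi3` is a hypothetical name, not the runtime's code.

```c
#include <stdint.h>

/* Reference model: unsigned 16x16 -> 32 multiply via shift-and-add. */
uint32_t refUmulhisi3(uint16_t a, uint16_t b) {
    uint32_t acc = 0;
    uint32_t addend = a;          /* widened so shifts keep high bits */
    while (b != 0) {
        if (b & 1u)
            acc += addend;        /* add the current partial product */
        addend <<= 1;
        b >>= 1;
    }
    return acc;
}
```

A model like this is handy for checking results read back from MAME against known-good values.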
|
||||
|
||||
### Additional runtime libraries
|
||||
|
||||
| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library: printf, malloc, strlen, etc. |
| `runtime/libgcc.o` | Compiler helpers: multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |
|
||||
|
||||
Link only what you use — the linker drops unreferenced symbols.
|
||||
|
||||
Build them all once with:
|
||||
|
||||
```bash
|
||||
bash runtime/build.sh
|
||||
```
|
||||
|
||||
### Multi-segment OMF (for GS/OS Loader)
|
||||
|
||||
For programs that need >60 KB of code (the usable bank-0 limit
|
||||
after subtracting the stack, zero-page, and I/O window), build a
|
||||
multi-segment OMF that GS/OS Loader can place across banks:
|
||||
|
||||
```bash
|
||||
link816 -o myprog.bin --omf --manifest my.manifest \
|
||||
--expressload \
|
||||
crt0Gsos.o ... yourprog.o
|
||||
```
|
||||
|
||||
See [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) for details
|
||||
and [`scripts/runMultiSeg.sh`](../scripts/runMultiSeg.sh) for a
|
||||
working example.
|
||||
|
||||
## Running under MAME
|
||||
|
||||
The supplied [`scripts/runInMame.sh`](../scripts/runInMame.sh)
|
||||
launches MAME's `apple2gs` with the right ROM path, loads your
|
||||
binary at `$00:1000`, runs for a few seconds, and reads back a
|
||||
memory cell.
|
||||
|
||||
```bash
|
||||
bash scripts/runInMame.sh prog.bin # just run for 5s
|
||||
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
|
||||
bash scripts/runInMame.sh prog.bin 0x025000 0x025002 # dump these addrs
|
||||
```
|
||||
|
||||
The `--check ADDR=VALUE` form returns exit 0 if `ADDR` contains
|
||||
`VALUE` after the run, exit 1 otherwise. Pass `????` as `VALUE` (as in
the quick reference) to dump the cell without checking it.
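
For illustration only, the `ADDR=VALUE` form can be modeled as a hexadecimal address, an equals sign, and a hexadecimal value. The sketch below parses such a spec in C. It is not how `runInMame.sh` (a bash script) is implemented, and `parseCheck` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Parses "ADDR=VALUE" (both hexadecimal, "0x" prefix optional).
 * Returns 0 on success, -1 if the spec is malformed. */
int parseCheck(const char *spec, unsigned long *addr, unsigned long *value) {
    const char *eq = strchr(spec, '=');
    char *end;
    if (eq == NULL || eq == spec || eq[1] == '\0')
        return -1;
    *addr = strtoul(spec, &end, 16);
    if (end != eq)                /* address must run up to the '=' */
        return -1;
    *value = strtoul(eq + 1, &end, 16);
    if (*end != '\0')             /* no trailing characters allowed */
        return -1;
    return 0;
}
```

Given `"0x025000=0037"`, the sketch yields address 0x025000 and value 0x37; a spec with no `=` is rejected.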
|
||||
|
||||
MAME is invoked headless by default (no window) via
|
||||
`-video none` + `SDL_VIDEODRIVER=dummy`. This works on
|
||||
servers/CI runners.
|
||||
|
||||
### The bank-switch idiom
|
||||
|
||||
Bank 0 (`$00:0000-$00:FFFF`) has the I/O window at `$C000-$CFFF`
|
||||
that interferes with normal data access. The convention is to
|
||||
switch the data bank register (DBR) to bank 2 (`$02:0000`) before
|
||||
doing any data work:
|
||||
|
||||
```c
|
||||
__attribute__((noinline)) void switchToBank2(void) {
|
||||
__asm__ volatile (
|
||||
"sep #0x20\n" // 8-bit accumulator
|
||||
".byte 0xa9,0x02\n" // lda #2 (force as bytes — llvm-mc bug)
|
||||
"pha\n"
|
||||
"plb\n" // DBR = 2
|
||||
"rep #0x20\n" // back to 16-bit
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
After `switchToBank2()`, your data lives at `$02:0000` upward.
|
||||
The `runInMame.sh` `--check 0x025000=...` address is `$02:5000`
|
||||
— accessible via a normal store in bank 2.
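
A 65816 long address is a bank byte followed by a 16-bit offset, so `$02:5000` and the `0x025000` passed to `--check` name the same byte. A minimal sketch of the composition (the helper names are hypothetical):

```c
#include <stdint.h>

/* Combines a bank byte and a 16-bit offset into a 24-bit address. */
uint32_t makeAddr24(uint8_t bank, uint16_t offset) {
    return ((uint32_t)bank << 16) | offset;
}

/* Extracts the bank byte from a 24-bit address. */
uint8_t bankOf(uint32_t addr24) {
    return (uint8_t)((addr24 >> 16) & 0xFFu);
}
```

`makeAddr24(2, 0x5000)` gives `0x025000`, which is why a store through `(volatile unsigned short *)0x5000` with DBR set to 2 lands where the check looks.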
|
||||
|
||||
## Examples
|
||||
|
||||
### Hello, integer
|
||||
|
||||
```c
|
||||
__attribute__((noinline)) void switchToBank2(void) {
|
||||
__asm__ volatile (
|
||||
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
|
||||
);
|
||||
}
|
||||
|
||||
int main(void) {
|
||||
int x = 42;
|
||||
switchToBank2();
|
||||
*(volatile int *)0x5000 = x;
|
||||
while (1) {}
|
||||
}
|
||||
```
|
||||
|
||||
Build & run:
|
||||
|
||||
```bash
|
||||
clang --target=w65816 -O2 -c hello.c -o hello.o
|
||||
link816 -o hello.bin --text-base 0x1000 \
|
||||
runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
|
||||
bash scripts/runInMame.sh hello.bin --check 0x025000=002a # 0x2a = 42
|
||||
```
|
||||
|
||||
### Recursion + printing
|
||||
|
||||
```c
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
unsigned long fib(unsigned n) {
|
||||
if (n < 2) return n;
|
||||
return fib(n-1) + fib(n-2);
|
||||
}
|
||||
|
||||
__attribute__((noinline)) void switchToBank2(void) {
|
||||
__asm__ volatile (
|
||||
"sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
|
||||
);
|
||||
}
|
||||
|
||||
int main(void) {
|
||||
char buf[32];
|
||||
int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
|
||||
switchToBank2();
|
||||
// Copy buf to $025000 so we can read it after the run
|
||||
for (int i = 0; i <= len; i++)
|
||||
((volatile char *)0x5000)[i] = buf[i];
|
||||
while (1) {}
|
||||
}
|
||||
```
|
||||
|
||||
Build (note: `snprintf` lives in `snprintf.o`, which must be linked explicitly):
|
||||
|
||||
```bash
|
||||
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
|
||||
link816 -o fib.bin --text-base 0x1000 \
|
||||
runtime/crt0.o runtime/libc.o runtime/libgcc.o \
|
||||
runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
|
||||
```
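
The recursion example should leave the text `fib(10) = 55` at `$02:5000`. The same logic can be checked on the host first; the sketch below reuses the `fib` function and adds a hypothetical `formatFib` wrapper around the `snprintf` call.

```c
#include <stddef.h>
#include <stdio.h>

unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

/* Formats the result the same way the example's main() does and
 * returns snprintf's character count (excluding the NUL). */
int formatFib(char *buf, size_t size) {
    return snprintf(buf, size, "fib(10) = %lu", fib(10));
}
```

The returned length is 12, so the example's copy loop writes 13 bytes including the terminating NUL.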
|
||||
|
||||
### Apple IIgs Toolbox
|
||||
|
||||
```c
|
||||
#include <iigs/toolbox_full.h>
|
||||
|
||||
int main(void) {
|
||||
DrawString("\pHello, World");
|
||||
while (1) {}
|
||||
}
|
||||
```
|
||||
|
||||
Build:
|
||||
|
||||
```bash
|
||||
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
|
||||
link816 -o hello_gs.bin --text-base 0x1000 \
|
||||
runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
|
||||
runtime/libgcc.o hello_gs.o
|
||||
```
|
||||
|
||||
Use `crt0Gsos.o` (not `crt0.o`) for programs that call into the
|
||||
toolbox — it sets up the IIgs runtime environment.
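
The `\p` prefix in `"\pHello, World"` marks a Pascal string: a length byte followed by the characters, which is the layout `DrawString` expects. `\p` is a compiler extension rather than standard C. The host-side sketch below builds the same layout by hand; `toPString` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Writes `s` as a Pascal string into `out`: out[0] holds the length,
 * followed by the characters (no NUL terminator).
 * Returns the total bytes written, or 0 if `s` does not fit. */
size_t toPString(const char *s, unsigned char *out, size_t outSize) {
    size_t len = strlen(s);
    if (len > 255 || len + 1 > outSize)
        return 0;
    out[0] = (unsigned char)len;
    memcpy(out + 1, s, len);
    return len + 1;
}
```

For `"Hello, World"` the length byte is 12 and the whole string occupies 13 bytes.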
|
||||
|
||||
## Inline assembly
|
||||
|
||||
The W65816 backend supports `__asm__` with operand constraints
|
||||
`"a"`, `"x"`, `"y"`:
|
||||
|
||||
```c
|
||||
unsigned short addOne(unsigned short x) {
|
||||
unsigned short r;
|
||||
__asm__("inc a" : "=a"(r) : "a"(x));
|
||||
return r;
|
||||
}
|
||||
```
|
||||
|
||||
Multi-instruction asm and raw bytes both work:
|
||||
|
||||
```c
|
||||
__asm__ volatile (
|
||||
"sep #0x20\n"
|
||||
".byte 0x68\n" // pla
|
||||
"rep #0x20\n"
|
||||
);
|
||||
```
|
||||
|
||||
The `.byte 0xa9, ...` form is sometimes needed to work around
|
||||
llvm-mc encoding gaps — the assembler doesn't yet support every
|
||||
65816 addressing mode literally. The pattern works for any
|
||||
opcode whose mnemonic doesn't yet parse.
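
To see what the `.byte` workaround produces, the bank-switch sequence can be written out as raw opcodes. The encodings below are the standard 65816 ones (`SEP` = `0xE2`, `LDA #imm` = `0xA9`, `PHA` = `0x48`, `PLB` = `0xAB`, `REP` = `0xC2`); the array and helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Machine code for: sep #$20 / lda #$02 / pha / plb / rep #$20.
 * With the accumulator in 8-bit mode, LDA immediate takes a single
 * operand byte, so 0xA9,0x02 is the complete instruction. */
static const uint8_t kSwitchToBank2[] = {
    0xE2, 0x20,   /* sep #$20 : 8-bit accumulator  */
    0xA9, 0x02,   /* lda #$02 : one-byte immediate */
    0x48,         /* pha      : push A             */
    0xAB,         /* plb      : pull into DBR      */
    0xC2, 0x20    /* rep #$20 : back to 16-bit     */
};

/* Number of bytes in the sequence. */
size_t switchToBank2Size(void) {
    return sizeof kSwitchToBank2;
}

/* Byte at `index`; the caller must keep index below the size. */
uint8_t switchToBank2Byte(size_t index) {
    return kSwitchToBank2[index];
}
```

Comparing these eight bytes against `llvm-objdump` output is one way to confirm the assembler emitted what was intended.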
|
||||
|
||||
## Tools reference
|
||||
|
||||
| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (`.ll` → `.s`) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from `link816` output |
| `mame` | `apt` (system-wide) | Apple IIgs emulator |
|
||||
|
||||
## Debugging
|
||||
|
||||
### Look at the asm
|
||||
|
||||
```bash
|
||||
clang --target=w65816 -O2 -S -o prog.s prog.c
|
||||
```
|
||||
|
||||
### Look at the MIR after each pass
|
||||
|
||||
```bash
|
||||
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
|
||||
```
|
||||
|
||||
Useful pass names to filter on:
|
||||
|
||||
| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |
|
||||
|
||||
### Single-pass filter
|
||||
|
||||
```bash
|
||||
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
|
||||
-mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
|
||||
```
|
||||
|
||||
## Cycle-count benchmarks
|
||||
|
||||
Eight microbenchmarks live under [`benchmarks/`](../benchmarks/).
|
||||
Each runs N iterations of the bench function and reports a
|
||||
per-call cycle count via MAME's `emu.time()`:
|
||||
|
||||
```bash
|
||||
bash scripts/benchCyclesPrecise.sh
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
| Benchmark | Per-call cycles (clang) |
|
||||
|-----------|------------------------:|
|
||||
| bsearch | 767 cyc/call |
|
||||
| dotProduct | 2131 cyc/call |
|
||||
| fib | 12617 cyc/call |
|
||||
| memcmp | 989 cyc/call |
|
||||
| popcount | 2864 cyc/call |
|
||||
| strcpy | 2216 cyc/call |
|
||||
| sumOfSquares | 16709 cyc/call |
|
||||
```
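
How the script turns `emu.time()` into a per-call figure is not spelled out here. One plausible reading is elapsed emulated time multiplied by the CPU clock rate and divided by the iteration count. The sketch below assumes exactly that, with the clock rate left as a parameter; `cyclesPerCall` is a hypothetical helper, not code from `benchCyclesPrecise.sh`.

```c
/* Estimated cycles per call from elapsed emulated time.
 * elapsedSeconds: end time minus start time
 * clockHz:        CPU clock rate (an input, not a fixed constant)
 * iterations:     number of benchmark calls; must be nonzero */
double cyclesPerCall(double elapsedSeconds, double clockHz,
                     unsigned long iterations) {
    return elapsedSeconds * clockHz / (double)iterations;
}
```

Any loop overhead around the call is included in such a figure, so it is best read as a relative measure between builds.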
|
||||
|
||||
The [`compare/`](../compare/) directory has side-by-side `.s`
|
||||
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
|
||||
Rerun with:
|
||||
|
||||
```bash
|
||||
bash compare/regen.sh
|
||||
```
|
||||
|
||||
## Known limitations
|
||||
|
||||
- **C++ exceptions** are not implemented. `try`/`catch` compiles but
|
||||
doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style
|
||||
throwing.
|
||||
- Reads from **`stdin`** always return EOF. `scanf` compiles but isn't useful.
|
||||
Use `sscanf` on a buffer instead.
|
||||
- **File I/O** through `fopen` etc. requires a backing implementation.
|
||||
The default `mfs` backing (memory-file-system) lets you simulate
|
||||
files via `mfsRegister()` — useful for tests, not for real disk
|
||||
I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link
|
||||
against the GS/OS runtime.
|
||||
- **`fork`/`exec`** — not applicable on a 65816, no support.
|
||||
- **Code generation gotcha:** very large frames (>200 bytes) trigger
|
||||
FP-relative addressing. Most programs fit under that limit. See
|
||||
the `frame-rel` discussion in
|
||||
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
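
Because reads from `stdin` return EOF, input is better parsed from a buffer. A minimal sketch, assuming `runtime/sscanf.o` is linked on the target and using a hypothetical `parsePair` helper:

```c
#include <stdio.h>

/* Parses two decimal integers from a text buffer.
 * Returns 1 when both were read, 0 otherwise. */
int parsePair(const char *text, int *a, int *b) {
    return sscanf(text, "%d %d", a, b) == 2;
}
```

The buffer can come from a string literal, an `mfs` file, or any memory the program fills itself.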
|
||||
|
||||
## Where to go next
|
||||
|
||||
- **Building real GS/OS apps:** see
|
||||
[`docs/multiSegmentPlan.md`](multiSegmentPlan.md) and the
|
||||
`runViaFinder.sh` script for booting through real GS/OS 6.0.2 in
|
||||
MAME.
|
||||
- **Backend internals (you're hacking on the compiler):**
|
||||
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
|
||||
- **Smoke tests:** `scripts/smokeTest.sh` runs ~150 end-to-end checks.
|
||||
Read it for examples of every feature in action.
|
||||
|
|
@@ -331,9 +331,11 @@ EOF
 cat "$sCmpFile" >&2
 die "setcc gt test missing: bcc/bcs (carry-based unsigned branch)"
 fi
-if ! grep -qE '^\s*cmp\s+0x[0-9a-f]+,\s*s\s*$' "$sCmpFile"; then
+# Accept either stack-relative cmp or DP-form cmp (W65816StackRelToImg
+# may promote the comparand to a DP slot when arg b is the hot slot).
+if ! grep -qE '^\s*cmp\s+0x[0-9a-f]+(,\s*s)?\s*$' "$sCmpFile"; then
 cat "$sCmpFile" >&2
-die "setcc gt test missing: cmp <off>,s (stack-relative compare to arg b)"
+die "setcc gt test missing: cmp <off>,s or cmp <dp> (compare to arg b)"
 fi
 fi
 
@@ -373,13 +375,13 @@ int max3(int a, int b, int c) {
 }
 EOF
 "$CLANG" --target=w65816 -O2 -S "$cFile3" -o "$sChainFile"
-# Expect cmp against a stack-relative slot - the signature of the
-# two-Acc16 CMP_RR custom inserter. (Earlier this test also
-# required an `sta d,s` spill, but greedy regalloc + WidenAcc16
-# avoids that spill entirely on this pattern.)
-if ! grep -qE 'cmp 0x[0-9a-f]+, s' "$sChainFile"; then
+# Expect cmp against a stack-relative slot OR a DP slot - the
+# signature of the two-Acc16 CMP_RR custom inserter. Earlier this
+# required only stack-rel; W65816StackRelToImg may promote the
+# comparand to a DP slot for hot offsets.
+if ! grep -qE 'cmp 0x[0-9a-f]+(, s|$)' "$sChainFile"; then
 cat "$sChainFile" >&2
-die "two-Acc16 (max3) didn't cmp via stack-relative"
+die "two-Acc16 (max3) didn't cmp via stack-relative or DP"
 fi
 fi
 
|
|
|
|||
|
|
@ -39,6 +39,7 @@ add_llvm_target(W65816CodeGen
|
|||
W65816ImgCalleeSave.cpp
|
||||
W65816NarrowI32Mul.cpp
|
||||
W65816PromoteFiToImg.cpp
|
||||
W65816StackRelToImg.cpp
|
||||
W65816StackSlotMerge.cpp
|
||||
W65816TargetMachine.cpp
|
||||
W65816AsmPrinter.cpp
|
||||
|
|
|
|||
|
|
@ -143,6 +143,12 @@ FunctionPass *createW65816PromoteFiToImg();
|
|||
// copy. See W65816StackSlotMerge.cpp.
|
||||
FunctionPass *createW65816StackSlotMerge();
|
||||
|
||||
// Pre-emit pass: rewrite top-N stack-rel slot offsets to IMG0..IMG7
|
||||
// DP slots ($D0..$DE). Caller-save semantics — function must only
|
||||
// call IMG-safe libgcc helpers (verified to not touch $D0..$DE).
|
||||
// See W65816StackRelToImg.cpp.
|
||||
FunctionPass *createW65816StackRelToImg();
|
||||
|
||||
// Pre-RA pass that lowers Wide32 register pairs into pairs of i16
|
||||
// vregs. Without this, greedy/basic regalloc can't fit the pair-
|
||||
// pressure of i64-via-2-i32-via-Wide32 traffic in i64-heavy
|
||||
|
|
@ -184,6 +190,7 @@ void initializeW65816ImgCalleeSavePass(PassRegistry &);
|
|||
void initializeW65816NarrowI32MulPass(PassRegistry &);
|
||||
void initializeW65816PromoteFiToImgPass(PassRegistry &);
|
||||
void initializeW65816StackSlotMergePass(PassRegistry &);
|
||||
void initializeW65816StackRelToImgPass(PassRegistry &);
|
||||
|
||||
} // namespace llvm
|
||||
|
||||
|
|
|
|||
|
|
@@ -485,7 +485,14 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
 if (It2 != MI->getParent()->end()) {
 const TargetRegisterInfo *TRI =
 MI->getParent()->getParent()->getSubtarget().getRegisterInfo();
-if (It2->modifiesRegister(W65816::A, TRI))
+// PEI doesn't load A, so the LDA's value-set is needed if
+// the next instruction READS A. JSL has implicit-def $a
+// (caller-save) AND implicit-use $a (when A is an arg) —
+// modifiesRegister returns true for both, but readsRegister
+// is what tells us if A's value is consumed. Drop the LDA
+// ONLY when the next op modifies A WITHOUT reading it.
+if (It2->modifiesRegister(W65816::A, TRI) &&
+!It2->readsRegister(W65816::A, TRI))
 ADead = true;
 }
 if (ADead) {
|
|
|
|||
|
|
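The `ADead` change in the `W65816AsmPrinter::emitInstruction` hunk above comes down to one predicate: drop the preceding LDA only when the next instruction overwrites A without consuming it first. A minimal sketch, assuming a hypothetical `NextOp` struct that stands in for the two `MachineInstr` queries (`modifiesRegister` and `readsRegister`); it illustrates the rule and is not the backend's code:

```cpp
#include <cassert>

// Hypothetical stand-in for the two MachineInstr queries used in the hunk.
struct NextOp {
    bool modifiesA; // plays the role of modifiesRegister(W65816::A, TRI)
    bool readsA;    // plays the role of readsRegister(W65816::A, TRI)
};

// The LDA's value is dead only if the following op redefines A and
// never reads the old value.
bool ldaIsDead(const NextOp &next) {
    return next.modifiesA && !next.readsA;
}
```

Under this rule a JSL that takes an argument in A (modifies and reads) keeps its LDA, while a JSL that only clobbers A lets the LDA be dropped.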
@ -188,10 +188,6 @@ bool W65816ImgCalleeSave::runOnMachineFunction(MachineFunction &MF) {
|
|||
// other spill slots — but the STAfi/LDAfi we emit reference this slot
|
||||
// by FrameIndex, and the only writes to this FI are our save/restore
|
||||
// pair, so coloring can't break the round-trip.
|
||||
//
|
||||
// (The picol-expr bug came from a SHARED slot with two DIFFERENT
|
||||
// vregs writing to it; here we have one FI per IMG and a single
|
||||
// write/read pair per function, so coloring can't trip on this.)
|
||||
MachineFrameInfo &MFI = MF.getFrameInfo();
|
||||
int FrameSlots[8];
|
||||
for (int i = 0; i < 8; ++i) {
|
||||
|
|
|
|||
|
|
@ -52,8 +52,11 @@
|
|||
#include "llvm/CodeGen/MachineFunction.h"
|
||||
#include "llvm/CodeGen/MachineFunctionPass.h"
|
||||
#include "llvm/CodeGen/MachineInstrBuilder.h"
|
||||
#include "llvm/CodeGen/MachineLoopInfo.h"
|
||||
#include "llvm/CodeGen/MachineRegisterInfo.h"
|
||||
#include "llvm/InitializePasses.h"
|
||||
#include "llvm/Support/Debug.h"
|
||||
#include "llvm/Support/Format.h"
|
||||
|
||||
using namespace llvm;
|
||||
|
||||
|
|
@ -70,6 +73,11 @@ public:
|
|||
StringRef getPassName() const override {
|
||||
return "W65816 promote FrameIndex to IMG8..15 DP slot";
|
||||
}
|
||||
void getAnalysisUsage(AnalysisUsage &AU) const override {
|
||||
AU.addRequired<MachineLoopInfoWrapperPass>();
|
||||
AU.setPreservesCFG();
|
||||
MachineFunctionPass::getAnalysisUsage(AU);
|
||||
}
|
||||
bool runOnMachineFunction(MachineFunction &MF) override;
|
||||
};
|
||||
|
||||
|
|
@ -79,7 +87,10 @@ public:
|
|||
|
||||
char W65816PromoteFiToImg::ID = 0;
|
||||
|
||||
INITIALIZE_PASS(W65816PromoteFiToImg, DEBUG_TYPE,
|
||||
INITIALIZE_PASS_BEGIN(W65816PromoteFiToImg, DEBUG_TYPE,
|
||||
"W65816 promote FI to IMG", false, false)
|
||||
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfoWrapperPass)
|
||||
INITIALIZE_PASS_END(W65816PromoteFiToImg, DEBUG_TYPE,
|
||||
"W65816 promote FI to IMG", false, false)
|
||||
|
||||
|
||||
|
|
@ -131,19 +142,20 @@ static uint8_t dpAddrForImg(unsigned ImgIdx) {
|
|||
|
||||
|
||||
bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
|
||||
// DISABLED: pass produces verifier errors ("Using an undefined physical
|
||||
// register") on the kill-flag bookkeeping when an STAfi with `killed $a`
|
||||
// is rewritten to STA_DP — the next i16-imm ADC/ADCE sees $a as dead.
|
||||
// Also, for the FUNCTIONS where it would land (no-call, high-traffic
|
||||
// slots), measured static + dynamic savings were modest and didn't
|
||||
// justify the bookkeeping complexity. Re-enable after:
|
||||
// - tightening kill-flag preservation: only carry kill if the same
|
||||
// operand will be the last user in the new MI (which depends on
|
||||
// post-rewrite scheduling — needs careful liveness re-analysis).
|
||||
// - paired-PHI promotion: when fi#A is a PHI-input and fi#B is the
|
||||
// matching PHI-output, map them to the SAME IMG slot so the
|
||||
// PHI move collapses to a no-op (where most of the dynamic win
|
||||
// would come from).
|
||||
// DISABLED again 2026-05-13 (3rd-attempt write-up). Two new findings:
|
||||
// 1. With kMaxPromote=2 and IMG0..7 (caller-save, skip ImgCalleeSave),
|
||||
// sumSquares regressed 56 → 72 inst because the FIs picked by
|
||||
// access-count (fi#2, fi#3) are intermediate spill temps, not
|
||||
// the i32-accumulator's halves (which are different FIs). The
|
||||
// loop body ends up using BOTH IMG and stack slots for related
|
||||
// values.
|
||||
// 2. To pick the RIGHT FIs (those corresponding to PHI-cycled
|
||||
// values like the i32 accumulator), we need either:
|
||||
// (a) IR-level analysis BEFORE FI assignment, or
|
||||
// (b) Post-RA dataflow analysis to identify "long-lived" FIs
|
||||
// (active across the loop back-edge with no def/use boundary).
|
||||
// This is the next blocker. Disabled until either (a) or (b) is
|
||||
// implemented.
|
||||
return false;
|
||||
if (skipFunction(MF.getFunction())) return false;
|
||||
const W65816Subtarget &STI = MF.getSubtarget<W65816Subtarget>();
|
||||
|
|
@ -151,49 +163,59 @@ bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
|
|||
MachineFrameInfo &MFI = MF.getFrameInfo();
|
||||
|
||||
// 1. Walk all instructions, count FI accesses for promotable opcodes.
|
||||
// Weight by loop depth: an access inside a depth-N loop counts as
|
||||
// 10^N to model the dynamic execution count (an inner-loop slot
|
||||
// gets executed many times per outer call).
|
||||
MachineLoopInfo &MLI =
|
||||
getAnalysis<MachineLoopInfoWrapperPass>().getLI();
|
||||
DenseMap<int, unsigned> AccessCount;
|
||||
DenseMap<int, SmallVector<MachineInstr *, 8>> AccessSites;
|
||||
for (MachineBasicBlock &MBB : MF) {
|
||||
unsigned LoopDepth = MLI.getLoopDepth(&MBB);
|
||||
unsigned Weight = 1;
|
||||
for (unsigned i = 0; i < LoopDepth && i < 3; ++i) Weight *= 10;
|
||||
for (MachineInstr &MI : MBB) {
|
||||
int FiIdx = getFiOperandIdx(MI.getOpcode());
|
||||
if (FiIdx < 0) continue;
|
||||
const MachineOperand &MO = MI.getOperand(FiIdx);
|
||||
if (!MO.isFI()) continue;
|
||||
int FI = MO.getIndex();
|
||||
// Require: 2-byte size, fixed (not variable), offset operand == 0.
|
||||
// The offset operand sits right after the FI operand.
|
||||
if (MFI.isVariableSizedObjectIndex(FI)) continue;
|
||||
if (MFI.getObjectSize(FI) != 2) continue;
|
||||
// Fixed (negative-index) slots are arg slots — leave them alone.
|
||||
// Promotion would break LowerFormalArguments's expected layout.
|
||||
if (FI < 0) continue;
|
||||
const MachineOperand &OffMO = MI.getOperand(FiIdx + 1);
|
||||
if (!OffMO.isImm() || OffMO.getImm() != 0) continue;
|
||||
AccessCount[FI]++;
|
||||
AccessCount[FI] += Weight;
|
||||
AccessSites[FI].push_back(&MI);
|
||||
}
|
||||
}
|
||||
if (AccessCount.empty()) return false;
|
||||
|
||||
// 2. Determine which IMG8..15 slots are already in use.
|
||||
// 2. Determine which IMG0..7 slots are already in use (caller-save).
|
||||
// Use caller-save IMG0..7 instead of callee-save IMG8..15: this lets
|
||||
// us skip ImgCalleeSave entirely (no prologue/epilogue overhead).
|
||||
// The trade-off: any call inside the function clobbers IMG0..7. Mark
|
||||
// any function with calls as "callees might clobber" → skip promotion.
|
||||
// This restricts wins to leaf functions (no internal calls).
|
||||
BitVector UsedImg(8, false);
|
||||
for (MachineBasicBlock &MBB : MF) {
|
||||
for (MachineInstr &MI : MBB) {
|
||||
// Skip CALL instructions — their `implicit-def dead $img0..7`
|
||||
// operand list marks every IMG slot used, but that's just the
|
||||
// caller-save annotation, not actual value-bearing usage.
|
||||
if (MI.isCall()) continue;
|
||||
for (const MachineOperand &MO : MI.operands()) {
|
||||
if (!MO.isReg() || !MO.getReg().isPhysical()) continue;
|
||||
Register R = MO.getReg();
|
||||
// IMG8..15 are not numerically contiguous with each other in
|
||||
// the W65816 register enum (subreg-pair regs sit between
|
||||
// IMG indices). Spell them out explicitly.
|
||||
unsigned ImgIdx = 16; // "not an IMG8..15"
|
||||
if (R == W65816::IMG8) ImgIdx = 0;
|
||||
else if (R == W65816::IMG9) ImgIdx = 1;
|
||||
else if (R == W65816::IMG10) ImgIdx = 2;
|
||||
else if (R == W65816::IMG11) ImgIdx = 3;
|
||||
else if (R == W65816::IMG12) ImgIdx = 4;
|
||||
else if (R == W65816::IMG13) ImgIdx = 5;
|
||||
else if (R == W65816::IMG14) ImgIdx = 6;
|
||||
else if (R == W65816::IMG15) ImgIdx = 7;
|
||||
unsigned ImgIdx = 16;
|
||||
if (R == W65816::IMG0) ImgIdx = 0;
|
||||
else if (R == W65816::IMG1) ImgIdx = 1;
|
||||
else if (R == W65816::IMG2) ImgIdx = 2;
|
||||
else if (R == W65816::IMG3) ImgIdx = 3;
|
||||
else if (R == W65816::IMG4) ImgIdx = 4;
|
||||
else if (R == W65816::IMG5) ImgIdx = 5;
|
||||
else if (R == W65816::IMG6) ImgIdx = 6;
|
||||
else if (R == W65816::IMG7) ImgIdx = 7;
|
||||
if (ImgIdx < 8) UsedImg.set(ImgIdx);
|
||||
}
|
||||
}
|
||||
|
|
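The loop-depth weighting in step 1 of the hunk above can be tried in isolation. The sketch below reproduces the `10^N` weight with the exponent capped at 3 and sums it per slot; `accessWeight`, `weightedCounts`, and the `(slot, depth)` pair layout are illustrative names, not part of the pass:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Weight of one access at the given loop depth: 10^depth, capped at 10^3.
unsigned accessWeight(unsigned loopDepth) {
    unsigned weight = 1;
    for (unsigned i = 0; i < loopDepth && i < 3; ++i)
        weight *= 10;
    return weight;
}

// Each access is (frame index, loop depth of the enclosing block).
std::map<int, unsigned>
weightedCounts(const std::vector<std::pair<int, unsigned>> &accesses) {
    std::map<int, unsigned> counts;
    for (const auto &access : accesses)
        counts[access.first] += accessWeight(access.second);
    return counts;
}
```

With this weighting, a slot touched once inside a doubly nested loop outranks a slot touched dozens of times in straight-line code, which is the bias the hunk's comment is aiming for.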
@ -215,20 +237,80 @@ bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
|
|||
// save/restore cost compounds with recursion / call frequency
|
||||
// in ways the static access count can't capture).
|
||||
bool HasCalls = false;
|
||||
bool IsRecursive = false;
|
||||
StringRef SelfName = MF.getName();
|
||||
for (MachineBasicBlock &MBB : MF) {
|
||||
for (MachineInstr &MI : MBB) {
|
||||
if (MI.isCall()) { HasCalls = true; break; }
|
||||
if (MI.isCall()) {
|
||||
HasCalls = true;
|
||||
// Check for self-call (recursive).
|
||||
for (const MachineOperand &MO : MI.operands()) {
|
||||
if (MO.isGlobal() && MO.getGlobal()->getName() == SelfName)
|
||||
IsRecursive = true;
|
||||
else if (MO.isSymbol() && SelfName == MO.getSymbolName())
|
||||
IsRecursive = true;
|
||||
}
|
||||
if (HasCalls) break;
|
||||
}
|
||||
const unsigned kAccessThreshold = HasCalls ? 999999u : 5u;
|
||||
}
|
||||
}
|
||||
// Recursive functions: skip — recursion makes per-call overhead
|
||||
// compound (each level of recursion pays the save/restore).
|
||||
if (IsRecursive) return false;
|
||||
// Caller-save IMG0..7 strategy: any internal call clobbers them, so
|
||||
// the only safe promoted slots are those whose lifetime doesn't
|
||||
// cross a call. For now, only promote in leaf functions and in
// functions whose only calls are IMG-safe libgcc helpers. This
// covers simple loops like sumSquares: it calls __umulhisi3, but
// that helper lives in libgcc.s and doesn't touch IMG0..7.
|
||||
//
|
||||
// Whitelist of libgcc functions known to not touch IMG0..7.
|
||||
auto isImgSafeLibcall = [](const MachineInstr &MI) -> bool {
|
||||
if (!MI.isCall()) return false;
|
||||
for (const MachineOperand &MO : MI.operands()) {
|
||||
StringRef Name;
|
||||
if (MO.isGlobal()) Name = MO.getGlobal()->getName();
|
||||
else if (MO.isSymbol()) Name = MO.getSymbolName();
|
||||
else continue;
|
||||
// libgcc.s multiply/divide/shift helpers — verified to only use
|
||||
// $E0..$E9 internally, no IMG0..7 touch.
|
||||
if (Name == "__umulhisi3" || Name == "__mulhi3" ||
|
||||
Name == "__mulsi3" || Name == "__udivhi3" ||
|
||||
Name == "__umodhi3" || Name == "__divhi3" ||
|
||||
Name == "__modhi3" || Name == "__udivsi3" ||
|
||||
Name == "__umodsi3" || Name == "__divsi3" ||
|
||||
Name == "__modsi3" || Name == "__ashlhi3" ||
|
||||
Name == "__lshrhi3" || Name == "__ashrhi3" ||
|
||||
Name == "__ashlsi3" || Name == "__lshrsi3" ||
|
||||
Name == "__ashrsi3")
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
return false;
|
||||
};
|
||||
bool AllCallsImgSafe = true;
|
||||
for (MachineBasicBlock &MBB : MF) {
|
||||
for (MachineInstr &MI : MBB) {
|
||||
if (MI.isCall() && !isImgSafeLibcall(MI)) {
|
||||
AllCallsImgSafe = false;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (!AllCallsImgSafe) break;
|
||||
}
|
||||
if (HasCalls && !AllCallsImgSafe) return false;
|
||||
// Threshold: each promoted access saves 1 cycle, with no save/restore overhead. We
|
||||
// just need the access count to be > 0 to win. Use a small threshold
|
||||
// for safety (avoid promoting marginal slots).
|
||||
const unsigned kAccessThreshold = 5u;
|
||||
const unsigned kMaxPromote = 2u;
|
||||
DenseMap<int, unsigned> FiToImgIdx;
|
||||
unsigned NextFreeImg = 0;
|
||||
for (int FI : Ordered) {
|
||||
if (AccessCount[FI] < kAccessThreshold) break;
|
||||
if (FiToImgIdx.size() >= kMaxPromote) break;
|
||||
while (NextFreeImg < 8 && UsedImg.test(NextFreeImg)) ++NextFreeImg;
|
||||
if (NextFreeImg >= 8) break;
|
||||
FiToImgIdx[FI] = NextFreeImg + 8; // Map to IMG8..15
|
||||
FiToImgIdx[FI] = NextFreeImg; // Map to IMG0..7 (caller-save)
|
||||
++NextFreeImg;
|
||||
}
|
||||
if (FiToImgIdx.empty()) return false;
|
||||
|
|
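The selection loop at the end of the hunk above (threshold, promotion cap, skipping IMG slots already in use) can be modeled with standard containers. This is a sketch under assumptions: `assignImgSlots` is a hypothetical name, `ordered` is assumed to be sorted by descending weighted count as `Ordered` appears to be, and `std::map` replaces `DenseMap`:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// ordered: (frame index, weighted access count), highest count first.
// usedImg[i] is true when IMG slot i already carries a live value.
std::map<int, unsigned>
assignImgSlots(const std::vector<std::pair<int, unsigned>> &ordered,
               const std::vector<bool> &usedImg,
               unsigned accessThreshold = 5, unsigned maxPromote = 2) {
    std::map<int, unsigned> fiToImg;
    unsigned nextFree = 0;
    for (const auto &entry : ordered) {
        if (entry.second < accessThreshold) break;  // remaining slots are colder
        if (fiToImg.size() >= maxPromote) break;    // promotion cap reached
        while (nextFree < usedImg.size() && usedImg[nextFree]) ++nextFree;
        if (nextFree >= usedImg.size()) break;      // no free IMG slot left
        fiToImg[entry.first] = nextFree++;
    }
    return fiToImg;
}
```

Because the input is sorted, the first slot below the threshold ends the scan; nothing after it can qualify.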
|
|||
1220
src/llvm/lib/Target/W65816/W65816StackRelToImg.cpp
Normal file
File diff suppressed because it is too large
|
|
@ -599,21 +599,32 @@ bool W65816StackSlotMerge::runOnMachineFunction(MachineFunction &MF) {
|
|||
}
|
||||
return 0;
|
||||
};
|
||||
// Collect `LDA #K ; STA_StackRel Y` pairs, grouped by Y.
|
||||
// Collect `LDA #K ; STA_StackRel Y` pairs, grouped by Y. Also
|
||||
// handles consolidated `LDA #K ; STA Y1 ; STA Y2 ; ...` where the
|
||||
// LDA is shared (Phase 6 collapsing): A stays at K across STAs.
|
||||
DenseMap<int64_t, SmallVector<std::pair<MachineInstr *, int64_t>, 4>>
|
||||
ConstStas;
|
||||
for (MachineBasicBlock &MBB : MF) {
|
||||
for (auto It = MBB.begin(); It != MBB.end(); ++It) {
|
||||
if (!isLdaImm(*It)) continue;
|
||||
int64_t K = immValue(*It);
|
||||
// Walk forward through STA_StackRel ops; collect each as an
|
||||
// init of K (A is preserved across STA). Stop on anything
|
||||
// that modifies A.
|
||||
auto NextIt = std::next(It);
|
||||
while (NextIt != MBB.end() && NextIt->isDebugInstr()) ++NextIt;
|
||||
if (NextIt == MBB.end()) continue;
|
||||
if (NextIt->getOpcode() != W65816::STA_StackRel) continue;
|
||||
while (NextIt != MBB.end()) {
|
||||
if (NextIt->isDebugInstr()) { ++NextIt; continue; }
|
||||
if (NextIt->getOpcode() == W65816::STA_StackRel) {
|
||||
int64_t Y;
|
||||
if (!srAccess(*NextIt, Y)) continue;
|
||||
if (srAccess(*NextIt, Y)) {
|
||||
ConstStas[Y].push_back({&*NextIt, K});
|
||||
}
|
||||
++NextIt;
|
||||
continue;
|
||||
}
|
||||
break; // any other op — stop (might change A or flags)
|
||||
}
|
||||
}
|
||||
}
|
||||
// For each slot Y with at least two const-init STAs, check for
|
||||
// dominator redundancy.
|
||||
|
|
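The revised collection loop above treats one `LDA #K` followed by a run of `STA_StackRel` stores as several constant initializations, since A still holds K across the stores. A simplified model of that walk, assuming a hypothetical `Inst` record in place of `MachineInstr` and omitting the `srAccess` decoding step:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

enum class Op { LdaImm, StaStackRel, Debug, Other };

struct Inst {
    Op op;
    int64_t value; // immediate K for LdaImm, slot offset Y for StaStackRel
};

// Maps slot offset Y to the (instruction index, constant K) pairs stored there.
std::map<int64_t, std::vector<std::pair<std::size_t, int64_t>>>
collectConstStores(const std::vector<Inst> &block) {
    std::map<int64_t, std::vector<std::pair<std::size_t, int64_t>>> out;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (block[i].op != Op::LdaImm) continue;
        const int64_t k = block[i].value;
        for (std::size_t j = i + 1; j < block.size(); ++j) {
            if (block[j].op == Op::Debug) continue;     // debug ops are transparent
            if (block[j].op != Op::StaStackRel) break;  // anything else may change A
            out[block[j].value].push_back({j, k});
        }
    }
    return out;
}
```

Stopping at the first non-store keeps the model conservative: a store that follows an unrelated instruction is never credited with the earlier constant.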
@ -692,6 +703,7 @@ bool W65816StackSlotMerge::runOnMachineFunction(MachineFunction &MF) {
|
|||
// flag-use (unsafe).
|
||||
MachineBasicBlock *MBB = DominatedSta->getParent();
|
||||
bool flagsSafeP5 = false;
|
||||
bool reachedMBBEnd = false;
|
||||
for (auto Fwd = std::next(DominatedSta->getIterator());
|
||||
Fwd != MBB->end(); ++Fwd) {
|
||||
if (Fwd->isDebugInstr()) continue;
|
||||
|
|
@ -701,6 +713,33 @@ bool W65816StackSlotMerge::runOnMachineFunction(MachineFunction &MF) {
|
|||
}
|
||||
if (clobbersFlagsP(*Fwd)) { flagsSafeP5 = true; break; }
|
||||
}
|
||||
// If we walked off the end of MBB, recurse one level into
|
||||
// successors. The fall-through code is in a successor MBB
|
||||
// (e.g., bb.3's preheader -> bb.4's loop body which starts
|
||||
// with an LDA, a flag-clobberer). Require ALL successors
|
||||
// to clobber flags before any flag-use.
|
||||
if (!flagsSafeP5) {
|
||||
// Recurse only when the rest of MBB contains no flag-use. If the
// scan above stopped at a flag-use, a clobber in a successor comes
// too late, so re-walk the tail of MBB to rule that out.
bool sawFlagUse = false;
for (auto It = std::next(DominatedSta->getIterator());
It != MBB->end(); ++It) {
if (It->isDebugInstr()) continue;
if (usesFlagsP(*It)) { sawFlagUse = true; break; }
}
bool allSuccClobber = !sawFlagUse && !MBB->succ_empty();
|
||||
for (MachineBasicBlock *Succ : MBB->successors()) {
|
||||
bool succClobbers = false;
|
||||
for (auto SIt = Succ->begin(); SIt != Succ->end(); ++SIt) {
|
||||
if (SIt->isDebugInstr()) continue;
|
||||
if (usesFlagsP(*SIt)) break;
|
||||
if (clobbersFlagsP(*SIt)) { succClobbers = true; break; }
|
||||
if (SIt->isTerminator() && !SIt->isConditionalBranch()) {
|
||||
succClobbers = true; break;
|
||||
}
|
||||
}
|
||||
if (!succClobbers) { allSuccClobber = false; break; }
|
||||
}
|
||||
if (allSuccClobber) flagsSafeP5 = true;
|
||||
}
|
||||
if (!flagsSafeP5) continue;
|
||||
// Erase DominatedSta and its preceding LDA #K.
|
||||
auto Prev = DominatedSta->getIterator();
|
||||
|
|
|
|||
|
|
@ -58,6 +58,7 @@ LLVMInitializeW65816Target() {
|
|||
initializeW65816NarrowI32MulPass(PR);
|
||||
initializeW65816PromoteFiToImgPass(PR);
|
||||
initializeW65816StackSlotMergePass(PR);
|
||||
initializeW65816StackRelToImgPass(PR);
|
||||
|
||||
// Default IndVarSimplify's exit-value rewriter to "never". The
|
||||
// closed-form replacement frequently widens an i16 induction var
|
||||
|
|
@ -279,6 +280,7 @@ void W65816PassConfig::addPreEmitPass() {
|
|||
// collapses when X and Y are renamed to the same slot). See
|
||||
// W65816StackSlotMerge.cpp.
|
||||
addPass(createW65816StackSlotMerge());
|
||||
addPass(createW65816StackRelToImg());
|
||||
}
|
||||
|
||||
MachineFunctionInfo *W65816TargetMachine::createMachineFunctionInfo(
|
||||
|
|
|
|||