Docs!
parent 42f0d16d07
commit 6bff7bea3f
18 changed files with 2100 additions and 115 deletions
README.md (new file, 102 lines)
@ -0,0 +1,102 @@
# llvm816

LLVM/Clang C compiler for the WDC 65816 / Apple IIgs.

Compiles C (and a minimal subset of C++) to native 65816 machine code,
links to a relocatable OMF binary, and runs under MAME's apple2gs.
Speed-tuned: matches or beats hand-written 65816 assembly on the
tight loops in benchmarks like sumOfSquares, popcount, and strcpy.

## What you get

- **`clang --target=w65816`** — full C99 + parts of C11, optimized at
  `-O2` by default. Soft-float and soft-double included.
- **C standard library subset** — `stdio.h`, `stdlib.h`, `string.h`,
  `math.h`, `time.h`, `setjmp.h`, etc. See
  [`runtime/include/`](runtime/include/) for the complete list.
- **`link816`** — relocating linker producing GS/OS-loadable OMF
  binaries (single- or multi-segment).
- **MAME integration scripts** — compile, link, and run a program
  under MAME's apple2gs with one command.
- **Apple IIgs Toolbox bindings** — `<iigs/toolbox_full.h>` exposes
  ~1300 toolbox routines from 35 tool sets.

## Quick start

After installation (see [docs/INSTALL.md](docs/INSTALL.md)):

```bash
# Write a small C program
cat > hello.c <<'EOF'
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
int main(void) {
    unsigned short x = 0;
    for (int i = 1; i <= 10; i++) x += i;   // x = 55
    switchToBank2();
    *(volatile unsigned short *)0x5000 = x;
    while (1) {}
}
EOF

# Compile, link, and run under MAME (the program stores 0x0037 at $02:5000; --check verifies it)
./tools/llvm-mos-build/bin/clang --target=w65816 -O2 -c hello.c -o hello.o
./tools/link816 -o hello.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=0037
```

See [docs/USAGE.md](docs/USAGE.md) for a full walkthrough including
multi-segment builds and the Apple IIgs Toolbox.

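The expected value in the quick start can be sanity-checked on the host before involving MAME. The following is a minimal sketch in portable C; the helper names `sumTo` and `formatCheckValue` are illustrative and not part of the toolchain, and the four-digit lowercase hex format is an assumption inferred from the `0037` in the `--check` argument.

```c
#include <assert.h>
#include <stdio.h>

/* Same arithmetic as the loop in hello.c: 1 + 2 + ... + n in 16 bits. */
unsigned short sumTo(unsigned n) {
    unsigned short x = 0;
    for (unsigned i = 1; i <= n; i++) x += (unsigned short)i;
    return x;
}

/* Format a 16-bit value as four hex digits (assumed --check value format). */
void formatCheckValue(unsigned short v, char out[5]) {
    assert(out != NULL);
    snprintf(out, 5, "%04x", (unsigned)v);
}
```

Calling `formatCheckValue(sumTo(10), buf)` yields the text that the `--check` argument compares against, which makes it easy to derive the right value when the loop bound changes.
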
## Project layout

```
runtime/        C standard library + crt0 startup
  src/          sources (C and .s)
  include/      headers
  *.o           built object files
src/            our LLVM/Clang sources (W65816 target backend)
  clang/        clang patches
  llvm/         LLVM patches + W65816 target
  link816/      relocating linker
patches/        patches against vanilla llvm-mos
scripts/        install scripts, MAME runners, benchmarks
tools/          installed compilers, MAME, ROMs, Calypsi (reference)
benchmarks/     cycle-count and instruction-count benchmarks
compare/        side-by-side asm vs Calypsi
docs/           documentation: INSTALL.md, USAGE.md, design notes
```

## Status

Stable enough to build real programs. Current quality vs commercial
Calypsi 5.16 (lower is better):

| Benchmark | Our cyc/call | Calypsi cyc/call (approx) |
|---|---|---|
| sumOfSquares(50) | 16709 | ~16000 |
| popcount(0x12345678) | 2864 | ~2500 |
| memcmp(eq, 5) | 989 | ~700 |
| bsearch(arr, 8, 5) | 767 | ~600 |

Static size for the canonical `sumSquares` benchmark: 27 inst (ours)
vs 31 inst (Calypsi), **0.87×**, matching the ratio recorded in
STATUS.md.

See [STATUS.md](STATUS.md) for full language and runtime feature
coverage, and [LLVM_65816_DESIGN.md](LLVM_65816_DESIGN.md) for
backend internals.

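The cycle table is easier to compare once each row is reduced to an ours/Calypsi ratio. A minimal sketch of that arithmetic follows; `ratioPercent` is a hypothetical helper, and because the Calypsi column is marked approximate, the resulting ratios are approximate too.

```c
#include <assert.h>

/* Ratio ours/reference in hundredths, rounded to nearest
   (a result of 104 means roughly 1.04x). */
unsigned ratioPercent(unsigned long ours, unsigned long reference) {
    assert(reference > 0);
    return (unsigned)((ours * 100 + reference / 2) / reference);
}
```

Applied to the table above, sumOfSquares comes out at about 1.04×, popcount at about 1.15×, memcmp at about 1.41×, and bsearch at about 1.28×.
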
## Documentation

- [docs/INSTALL.md](docs/INSTALL.md) — system requirements and install
  steps
- [docs/USAGE.md](docs/USAGE.md) — compile, link, run, debug
- [STATUS.md](STATUS.md) — current language/runtime support matrix
- [LLVM_65816_DESIGN.md](LLVM_65816_DESIGN.md) — backend design notes

## License

Apache 2.0 (matching the LLVM project's license). See
`tools/llvm-mos/LICENSE.TXT` after install.

STATUS.md (12 lines changed)
|
|
@ -247,8 +247,8 @@ which runs correctly under MAME (apple2gs).
|
||||||
- `scripts/benchCyclesPrecise.sh` measures per-call cycle counts
|
- `scripts/benchCyclesPrecise.sh` measures per-call cycle counts
|
||||||
via MAME's emulated time counter. Eight benchmarks under
|
via MAME's emulated time counter. Eight benchmarks under
|
||||||
`benchmarks/`. Current numbers (after W65816StackSlotMerge):
|
`benchmarks/`. Current numbers (after W65816StackSlotMerge):
|
||||||
popcount 3376, bsearch 852, memcmp 1091, strcpy 2387,
|
popcount 2864, bsearch 767, memcmp 989, strcpy 2216,
|
||||||
dotProduct 2302, fib(10) 12617, sumOfSquares 17391. Speed is
|
dotProduct 2131, fib(10) 12617, sumOfSquares 16709. Speed is
|
||||||
the optimization priority, not size.
|
the optimization priority, not size.
|
||||||
|
|
||||||
- `compare/` holds three side-by-side C tests with our asm and
|
- `compare/` holds three side-by-side C tests with our asm and
|
||||||
|
|
@ -257,10 +257,10 @@ which runs correctly under MAME (apple2gs).
|
||||||
recompiles each under both `clang --target=w65816 -O2 -S` and
|
recompiles each under both `clang --target=w65816 -O2 -S` and
|
||||||
`cc65816 --speed -O 2 --64bit-doubles` and prints an
|
`cc65816 --speed -O 2 --64bit-doubles` and prints an
|
||||||
ours/Calypsi instruction-count ratio. Current ratios (post
|
ours/Calypsi instruction-count ratio. Current ratios (post
|
||||||
W65816StackSlotMerge Phase 5/6 + extracted Phase 6/6a per-MBB
|
StackRelToImg 9-phase pipeline including saturating-max preheader
|
||||||
peepholes + Pass 1c PHP-wrap CMP elim for SP-rel functions):
|
elimination): sumSquares **0.87×** (27 inst — we beat Calypsi's
|
||||||
sumSquares 1.81x (56 inst), evalAt 2.10x (534 inst), mul16to32
|
31), evalAt 2.10× (534 inst), mul16to32 **1.50×** (6 inst).
|
||||||
2.25x (9 inst). See `compare/README.md`.
|
See `compare/README.md`.
|
||||||
|
|
||||||
**Backend register allocation:**
|
**Backend register allocation:**
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,7 +1,7 @@
|
||||||
###############################################################################
|
###############################################################################
|
||||||
# #
|
# #
|
||||||
# Calypsi ISO C compiler for 65816 version 5.16 #
|
# Calypsi ISO C compiler for 65816 version 5.16 #
|
||||||
# 13/May/2026 20:52:21 #
|
# 14/May/2026 11:06:07 #
|
||||||
# Command line: --speed -O 2 --64bit-doubles evalAt.c -o #
|
# Command line: --speed -O 2 --64bit-doubles evalAt.c -o #
|
||||||
# /tmp/evalAt.calypsi.elf --list-file evalAt.calypsi.lst #
|
# /tmp/evalAt.calypsi.elf --list-file evalAt.calypsi.lst #
|
||||||
# #
|
# #
|
||||||
|
|
|
||||||
|
|
@ -1,7 +1,7 @@
|
||||||
###############################################################################
|
###############################################################################
|
||||||
# #
|
# #
|
||||||
# Calypsi ISO C compiler for 65816 version 5.16 #
|
# Calypsi ISO C compiler for 65816 version 5.16 #
|
||||||
# 13/May/2026 20:52:21 #
|
# 14/May/2026 11:06:07 #
|
||||||
# Command line: --speed -O 2 --64bit-doubles mul16to32.c -o #
|
# Command line: --speed -O 2 --64bit-doubles mul16to32.c -o #
|
||||||
# /tmp/mul16to32.calypsi.elf --list-file #
|
# /tmp/mul16to32.calypsi.elf --list-file #
|
||||||
# mul16to32.calypsi.lst #
|
# mul16to32.calypsi.lst #
|
||||||
|
|
|
||||||
|
|
@ -6,12 +6,9 @@ mul16to32: ; @mul16to32
|
||||||
; %bb.0: ; %entry
|
; %bb.0: ; %entry
|
||||||
rep #0x30
|
rep #0x30
|
||||||
pha
|
pha
|
||||||
pha
|
lda 0x6, s
|
||||||
lda 0x8, s
|
|
||||||
jsl __umulhisi3
|
jsl __umulhisi3
|
||||||
ply
|
ply
|
||||||
sta 0x1, s
|
|
||||||
ply
|
|
||||||
rtl
|
rtl
|
||||||
.Lfunc_end0:
|
.Lfunc_end0:
|
||||||
.size mul16to32, .Lfunc_end0-mul16to32
|
.size mul16to32, .Lfunc_end0-mul16to32
|
||||||
|
|
|
||||||
|
|
@ -1,7 +1,7 @@
|
||||||
###############################################################################
|
###############################################################################
|
||||||
# #
|
# #
|
||||||
# Calypsi ISO C compiler for 65816 version 5.16 #
|
# Calypsi ISO C compiler for 65816 version 5.16 #
|
||||||
# 13/May/2026 20:52:21 #
|
# 14/May/2026 11:06:07 #
|
||||||
# Command line: --speed -O 2 --64bit-doubles sumSquares.c -o #
|
# Command line: --speed -O 2 --64bit-doubles sumSquares.c -o #
|
||||||
# /tmp/sumSquares.calypsi.elf --list-file #
|
# /tmp/sumSquares.calypsi.elf --list-file #
|
||||||
# sumSquares.calypsi.lst #
|
# sumSquares.calypsi.lst #
|
||||||
|
|
|
||||||
|
|
@ -5,67 +5,38 @@
|
||||||
sumSquares: ; @sumSquares
|
sumSquares: ; @sumSquares
|
||||||
; %bb.0: ; %entry
|
; %bb.0: ; %entry
|
||||||
rep #0x30
|
rep #0x30
|
||||||
tay
|
sta 0xd0
|
||||||
tsc
|
stz 0xd6
|
||||||
sec
|
stz 0xd4
|
||||||
sbc #0xc
|
lda 0xd0
|
||||||
tcs
|
bne .LBB0_3
|
||||||
tya
|
|
||||||
sta 0x5, s
|
|
||||||
lda #0x0
|
|
||||||
sta 0x3, s
|
|
||||||
sta 0x1, s
|
|
||||||
lda 0x5, s
|
|
||||||
bne .LBB0_1
|
|
||||||
; %bb.6: ; %entry
|
; %bb.6: ; %entry
|
||||||
brl .LBB0_5
|
brl .LBB0_5
|
||||||
.LBB0_1: ; %for.body.preheader
|
; %bb.1: ; %for.body.preheader
|
||||||
lda 0x5, s
|
|
||||||
inc a
|
|
||||||
sta 0x5, s
|
|
||||||
cmp #0x3
|
|
||||||
bcs .LBB0_3
|
|
||||||
; %bb.2: ; %for.body.preheader
|
; %bb.2: ; %for.body.preheader
|
||||||
lda #0x2
|
|
||||||
sta 0x5, s
|
|
||||||
.LBB0_3: ; %for.body.preheader
|
.LBB0_3: ; %for.body.preheader
|
||||||
lda #0x1
|
lda #0x1
|
||||||
sta 0x7, s
|
sta 0xd2
|
||||||
lda 0x5, s
|
|
||||||
dec a
|
|
||||||
sta 0x5, s
|
|
||||||
lda #0x0
|
|
||||||
sta 0x1, s
|
|
||||||
.LBB0_4: ; %for.body
|
.LBB0_4: ; %for.body
|
||||||
; =>This Inner Loop Header: Depth=1
|
; =>This Inner Loop Header: Depth=1
|
||||||
lda 0x7, s
|
lda 0xd2
|
||||||
pha
|
pha
|
||||||
jsl __umulhisi3
|
jsl __umulhisi3
|
||||||
ply
|
ply
|
||||||
clc
|
clc
|
||||||
adc 0x3, s
|
adc 0xd6
|
||||||
sta 0x3, s
|
sta 0xd6
|
||||||
txa
|
txa
|
||||||
adc 0x1, s
|
adc 0xd4
|
||||||
sta 0x1, s
|
sta 0xd4
|
||||||
lda 0x7, s
|
inc 0xd2
|
||||||
inc a
|
dec 0xd0
|
||||||
sta 0x7, s
|
|
||||||
lda 0x5, s
|
|
||||||
dec a
|
|
||||||
sta 0x5, s
|
|
||||||
beq .LBB0_5
|
beq .LBB0_5
|
||||||
bra .LBB0_4
|
bra .LBB0_4
|
||||||
.LBB0_5: ; %for.cond.cleanup
|
.LBB0_5: ; %for.cond.cleanup
|
||||||
lda 0x1, s
|
lda 0xd4
|
||||||
tax
|
tax
|
||||||
lda 0x3, s
|
lda 0xd6
|
||||||
tay
|
|
||||||
tsc
|
|
||||||
clc
|
|
||||||
adc #0xc
|
|
||||||
tcs
|
|
||||||
tya
|
|
||||||
rtl
|
rtl
|
||||||
.Lfunc_end0:
|
.Lfunc_end0:
|
||||||
.size sumSquares, .Lfunc_end0-sumSquares
|
.size sumSquares, .Lfunc_end0-sumSquares
|
||||||
|
|
|
||||||
docs/INSTALL.md (new file, 168 lines)
@ -0,0 +1,168 @@
# Installing llvm816

The project installs everything into `tools/` under the repo root, so
the tree is self-contained and deletable without affecting your system.

## System requirements

- **Ubuntu 22.04 or 24.04** (or any Debian-based distro with apt).
  Other Linuxes work if you can install the packages listed below
  by hand.
- **Disk:** ~10 GB free (LLVM build artifacts dominate).
- **RAM:** 8 GB minimum, 16 GB recommended for the `--build-llvm`
  flag. The setup script's default skips the LLVM build and
  downloads a prebuilt toolchain instead — much faster, ~500 MB.
- **Build time:** ~5 minutes for the default (prebuilt) path; 30-60
  minutes for `--build-llvm` (full LLVM source build).

## One-command install

```bash
git clone <this-repo-url> llvm816
cd llvm816
./setup.sh
```

`setup.sh` installs:

1. **System apt packages** — build-essential, cmake, ninja, clang, lld,
   python3, MAME, etc. See [`scripts/installDeps.sh`](../scripts/installDeps.sh)
   for the full list. *Requires sudo.*
2. **llvm-mos** — source tree clone at `tools/llvm-mos/` and a prebuilt
   SDK at `tools/llvm-mos-sdk/`. With `--build-llvm` it also runs
   cmake/ninja to build a usable W65816-aware clang at
   `tools/llvm-mos-build/bin/clang`.
3. **Apple IIgs MAME** — installs MAME via apt and downloads the
   apple2gs ROMs to `tools/mame/roms/`.
4. **Calypsi 5.16** — reference 65816 C compiler, installed to
   `tools/calypsi/`. Used by the `compare/` benchmarks to measure
   our codegen quality against a commercial baseline.
5. **ORCA/C** — The Byte Works' 65816 C compiler for the Apple IIgs
   (header reference for the IIgs Toolbox bindings).

After `setup.sh` finishes:

```bash
ls tools/llvm-mos-build/bin/clang     # our compiler (built only with --build-llvm)
ls tools/link816                       # our linker
mame -version                          # MAME (installed via apt)
```

## Step-by-step (if `setup.sh` fails)

You can run each install script in isolation:

```bash
scripts/installDeps.sh             # apt packages
scripts/installLlvmMos.sh          # llvm-mos clone + prebuilt SDK
scripts/installLlvmMos.sh --build  # also build the source (slow)
scripts/installMame.sh             # MAME + apple2gs ROMs
scripts/installCalypsi.sh          # reference compiler (optional)
scripts/installOrcaC.sh            # reference compiler (optional)
```

If you only want to build C programs (no benchmarks, no comparison
to Calypsi), `installCalypsi.sh` and `installOrcaC.sh` are
optional.

## Building the W65816 backend from source

The default install pulls a prebuilt LLVM SDK. To build our
W65816-aware clang from source:

```bash
./setup.sh --build-llvm
```

Or, after a non-`--build-llvm` install:

```bash
scripts/applyBackend.sh   # symlink our W65816 sources into llvm-mos clone
cmake --build tools/llvm-mos-build --target llc clang
```

The build takes 30-60 minutes on a modern laptop. Subsequent
incremental builds after editing W65816 backend code are ~30
seconds.

## Verifying the install

```bash
# Compile + disassemble a small C function
scripts/cDemo.sh

# Build the runtime library (libc, libgcc, etc.)
bash runtime/build.sh

# Run the smoke test suite (~150 checks, takes ~3 minutes)
bash scripts/smokeTest.sh
```

A successful smoke test ends with:

```
[llvm816] all smoke checks passed
```

## Updating

```bash
git pull
scripts/applyBackend.sh   # re-symlink our sources into the LLVM tree
cmake --build tools/llvm-mos-build --target llc clang
bash runtime/build.sh
```

If you want a fully clean rebuild:

```bash
rm -rf tools/llvm-mos-build
./setup.sh --build-llvm
```

## Uninstalling

The toolchain is fully contained under `tools/`. To uninstall, run
the following from the directory that contains the clone:

```bash
rm -rf llvm816/
sudo apt-get remove mame mame-tools    # if you want MAME gone too
```

The setup script doesn't touch `/usr/local` or `~/.mame`, so apart
from the apt packages installed by `scripts/installDeps.sh` there is
nothing to clean up outside the repo.

## Troubleshooting

**`cmake: command not found`** — run `scripts/installDeps.sh`. The
apt packages aren't installed yet.

**`ROMs not found`** — the apple2gs ROM download from archive.org
occasionally fails. Re-run `scripts/installMame.sh`. The script
is idempotent; it skips ROMs already downloaded.

**`clang: error: unable to find target 'w65816'`** — the prebuilt
SDK's clang doesn't know about our W65816 target. You need the
source-built clang:

```bash
scripts/installLlvmMos.sh --build
# Or, more granular:
scripts/applyBackend.sh
cmake --build tools/llvm-mos-build --target clang
```

The W65816 target lives in *our* fork at `tools/llvm-mos-build/bin/clang`,
not in the prebuilt SDK.

**MAME can't find ROMs at runtime** — make sure `mame` is launched
with `-rompath tools/mame/roms`. The provided
[`scripts/runInMame.sh`](../scripts/runInMame.sh) does this
automatically.

**`linkage error: missing __umulhisi3`** — link `runtime/libgcc.o`
into your binary. See [USAGE.md](USAGE.md#linking).

**MAME pops up a window I don't want** — the `runInMame.sh`
wrapper now runs headless (`-video none` + `SDL_VIDEODRIVER=dummy`).
If you're invoking MAME directly, add those flags.
docs/USAGE.md (new file, 391 lines)
@ -0,0 +1,391 @@
# Using llvm816

This document covers compiling a C program, linking it into an
Apple IIgs binary, and running it under MAME. It assumes you've
followed [INSTALL.md](INSTALL.md) and have a working
`tools/llvm-mos-build/bin/clang`.

## Quick reference

```bash
CLANG=tools/llvm-mos-build/bin/clang
LINK=tools/link816
RUNTIME=runtime

# 1. Compile C to object
$CLANG --target=w65816 -O2 -I$RUNTIME/include -c hello.c -o hello.o

# 2. Link to a raw binary (loadable at $00:1000)
$LINK -o hello.bin --text-base 0x1000 \
    $RUNTIME/crt0.o $RUNTIME/libc.o $RUNTIME/libgcc.o hello.o

# 3. Run under MAME
bash scripts/runInMame.sh hello.bin --check 0x025000=????
```

## Compiling C

The compiler is invoked just like a normal clang, with
`--target=w65816`:

```bash
clang --target=w65816 -O2 -c source.c -o source.o
```

**Recommended flags:**

| Flag | Meaning |
|---|---|
| `--target=w65816` | Selects the W65816 backend (required) |
| `-O2` | Default optimization level. `-O0` and `-O1` work but produce ~3-5× larger code |
| `-ffunction-sections` | Put each function in its own section. Lets the linker drop unreferenced functions |
| `-I runtime/include` | Find `<stdio.h>` etc. |
| `-c` | Compile only — produce `.o`, don't link |

**What works at `-O2`:**

- All C99 scalars: `int8_t` through `int64_t`, signed and unsigned,
  all arithmetic operators
- Soft `float` and `double` (full IEEE-754 with round-to-nearest-even)
- Pointers, arrays, structs, unions, bitfields
- All control flow: `if`, `for`, `while`, `goto`, `switch`,
  recursion
- `<stdarg.h>` varargs
- `<setjmp.h>` setjmp/longjmp (SJLJ, no DWARF unwinder)
- Inline `__asm__` with `"a"`, `"x"`, `"y"` register constraints
- C++ subset: classes, single+multiple inheritance, virtual functions,
  RTTI, `dynamic_cast`. **No exceptions** (DWARF unwinder not
  implemented).

See [STATUS.md](../STATUS.md) for the full feature matrix.

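Several of the features listed above can be exercised from one small translation unit. The sketch below is plain portable C with illustrative names (`square64`, `sumArgs`, `jumpedBack`, `roundTripMode`); it is not taken from the project's smoke tests, and it only assumes the headers named in the list are available under `runtime/include`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdint.h>

/* 64-bit arithmetic: widen before multiplying so the product is not truncated. */
uint64_t square64(uint32_t x) {
    return (uint64_t)x * x;
}

/* Variadic sum of `n` int arguments via <stdarg.h>. */
int64_t sumArgs(int n, ...) {
    va_list ap;
    int64_t total = 0;
    va_start(ap, n);
    for (int i = 0; i < n; i++) total += va_arg(ap, int);
    va_end(ap);
    return total;
}

static jmp_buf gEscape;

static void bailOut(void) {
    longjmp(gEscape, 1);      /* non-local exit back to jumpedBack() */
}

/* Returns 1 when control came back through longjmp. */
int jumpedBack(void) {
    if (setjmp(gEscape) == 0) {
        bailOut();            /* does not return normally */
        return 0;             /* not reached */
    }
    return 1;
}

/* Three bitfields totalling 16 bits. */
struct Flags {
    unsigned ready : 1;
    unsigned mode  : 3;
    unsigned count : 12;
};

/* Store a value in the 3-bit field and read it back. */
unsigned roundTripMode(unsigned mode) {
    struct Flags f = { 1, 0, 0 };
    assert(mode < 8);
    f.mode = mode;
    return f.mode;
}
```

Because nothing here is target-specific, the same file can be built with a host compiler first and then with `--target=w65816` to compare behavior.
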
## Linking

The linker is `tools/link816`. It produces either a raw binary
suitable for direct execution (loaded at a fixed address) or an
OMF binary suitable for the GS/OS Loader.

### Raw binary

```bash
link816 -o output.bin --text-base 0x1000 crt0.o libc.o libgcc.o yourprog.o
```

- `--text-base 0x1000` — physical address where code is loaded.
  `0x1000` is the conventional starting address; the first 4 KB
  of bank 0 ($00:0000 – $00:0FFF) is reserved for the stack and
  zero-page.
- `crt0.o` — the C runtime startup. Sets DBR, calls `main`, halts.
  Always link first.
- `libc.o` — `printf`, `malloc`, `strlen`, etc.
- `libgcc.o` — compiler-helper routines (`__mulhi3`, `__umulhisi3`,
  `__divhi3`, `__ashlhi3`, etc.). Required by most non-trivial
  programs.

### Additional runtime libraries

| Library | What you get |
|---|---|
| `runtime/libc.o` | Core C library — printf, malloc, strlen, etc. |
| `runtime/libgcc.o` | Compiler helpers — multiply, divide, shift |
| `runtime/snprintf.o` | `sprintf` / `snprintf` / `vsnprintf` |
| `runtime/sscanf.o` | `sscanf` / `vsscanf` / `fscanf` |
| `runtime/softDouble.o` | IEEE 754 double-precision math |
| `runtime/softFloat.o` | IEEE 754 single-precision math |
| `runtime/math.o` | `fabs`, `floor`, `sqrt`, `sin`, `cos`, etc. |
| `runtime/qsort.o` | `qsort` / `bsearch` |
| `runtime/strtol.o` | `strtol` / `strtoul` / `atoi` / `atol` |
| `runtime/strtok.o` | `strtok` / `strtok_r` |
| `runtime/extras.o` | `strcat`, `strncat`, `llabs`, `rand`/`srand` |
| `runtime/timeExt.o` | `time` / `gmtime` / `mktime` |
| `runtime/iigsToolbox.o` | Apple IIgs Toolbox call wrappers |
| `runtime/iigsGsos.o` | GS/OS call wrappers |

Link only what you use — the linker drops unreferenced symbols.

Build them all once with:

```bash
bash runtime/build.sh
```

### Multi-segment OMF (for GS/OS Loader)

For programs that need >60 KB of code (the usable bank-0 limit
after subtracting the stack, zero-page, and I/O window), build a
multi-segment OMF that the GS/OS Loader can place across banks:

```bash
link816 -o myprog.bin --omf --manifest my.manifest \
    --expressload \
    crt0Gsos.o ... yourprog.o
```

See [`docs/multiSegmentPlan.md`](multiSegmentPlan.md) for details
and [`scripts/runMultiSeg.sh`](../scripts/runMultiSeg.sh) for a
working example.

## Running under MAME

The supplied [`scripts/runInMame.sh`](../scripts/runInMame.sh)
launches MAME's `apple2gs` with the right ROM path, loads your
binary at `$00:1000`, runs for a few seconds, and reads back a
memory cell.

```bash
bash scripts/runInMame.sh prog.bin                          # just run for 5s
bash scripts/runInMame.sh prog.bin --check 0x025000=00ff
bash scripts/runInMame.sh prog.bin 0x025000 0x025002       # dump these addrs
```

The `--check ADDR=VALUE` form returns exit 0 if `ADDR` contains
`VALUE` after the run, exit 1 otherwise. Use `????` as the value
(as in `--check 0x025000=????`) to dump the cell without checking.

MAME is invoked headless by default (no window) via
`-video none` + `SDL_VIDEODRIVER=dummy`. This works on
servers/CI runners.

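To make the `--check ADDR=VALUE` semantics concrete, here is a small model of the argument in C. It is a sketch, not the logic of `runInMame.sh` (which is a shell script whose internals are not shown here): the `CheckSpec` type and both functions are hypothetical, and treating a `????` value as "always succeed" is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one `--check ADDR=VALUE` argument. */
struct CheckSpec {
    unsigned long addr;    /* 24-bit address, e.g. 0x025000 */
    unsigned long value;   /* expected 16-bit word          */
    int dumpOnly;          /* 1 when VALUE is "????"        */
};

/* Parse "0x025000=0037" style text; returns 0 on success, -1 on bad input. */
int parseCheck(const char *text, struct CheckSpec *out) {
    const char *eq = strchr(text, '=');
    char *end;
    assert(out != NULL);
    if (eq == NULL || eq == text) return -1;
    out->addr = strtoul(text, &end, 16);   /* base 16 accepts a 0x prefix */
    if (end != eq) return -1;
    out->dumpOnly = (strcmp(eq + 1, "????") == 0);
    out->value = 0;
    if (!out->dumpOnly) {
        out->value = strtoul(eq + 1, &end, 16);
        if (end == eq + 1 || *end != '\0') return -1;
    }
    return 0;
}

/* Exit status as described above: 0 on match, 1 otherwise. */
int checkStatus(const struct CheckSpec *spec, unsigned long observed) {
    if (spec->dumpOnly) return 0;          /* assumption: dumping never fails */
    return observed == spec->value ? 0 : 1;
}
```

Note that the value side is hexadecimal without a `0x` prefix in every example in this document, which is why the sketch parses it with base 16.
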
### The bank-switch idiom

Bank 0 (`$00:0000-$00:FFFF`) has the I/O window at `$C000-$CFFF`
that interferes with normal data access. The convention is to
switch the data bank register (DBR) to bank 2 (`$02:0000`) before
doing any data work:

```c
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n"          // 8-bit accumulator
        ".byte 0xa9,0x02\n"    // lda #2 (force as bytes — llvm-mc bug)
        "pha\n"
        "plb\n"                // DBR = 2
        "rep #0x20\n"          // back to 16-bit
    );
}
```

After `switchToBank2()`, absolute data accesses resolve in bank 2,
so your data lives at `$02:0000` upward. The `runInMame.sh`
`--check 0x025000=...` address is `$02:5000`, which a normal store
to `0x5000` reaches once DBR is 2.

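The relationship between the 16-bit pointer in the C code and the 24-bit address passed to `--check` is plain bit arithmetic. A minimal sketch, with hypothetical helper names, assuming the usual 65816 layout of an 8-bit bank above a 16-bit offset:

```c
#include <assert.h>
#include <stdint.h>

/* Combine an 8-bit bank (e.g. DBR) and a 16-bit offset into a 24-bit address. */
uint32_t bankAddress(uint8_t bank, uint16_t offset) {
    return ((uint32_t)bank << 16) | offset;
}

/* Low 16 bits of a 24-bit address: the part a near pointer holds. */
uint16_t offsetOf(uint32_t addr) {
    assert(addr <= 0xFFFFFFu);
    return (uint16_t)(addr & 0xFFFFu);
}

/* True when a bank-0 offset falls inside the $C000-$CFFF I/O window. */
int inIoWindow(uint16_t offset) {
    return offset >= 0xC000u && offset <= 0xCFFFu;
}
```

With DBR set to 2, a store through `(volatile unsigned short *)0x5000` lands at `bankAddress(0x02, 0x5000)`, which is the `0x025000` used in the examples.
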
## Examples

### Hello, integer

```c
__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    int x = 42;
    switchToBank2();
    *(volatile int *)0x5000 = x;
    while (1) {}
}
```

Build & run:

```bash
clang --target=w65816 -O2 -c hello.c -o hello.o
link816 -o hello.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o hello.o
bash scripts/runInMame.sh hello.bin --check 0x025000=002a   # 0x2a = 42
```

### Recursion + printing

```c
#include <stdio.h>
#include <stdlib.h>

unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n-1) + fib(n-2);
}

__attribute__((noinline)) void switchToBank2(void) {
    __asm__ volatile (
        "sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"
    );
}

int main(void) {
    char buf[32];
    int len = snprintf(buf, sizeof buf, "fib(10) = %lu", fib(10));
    switchToBank2();
    // Copy buf to $025000 so we can read it after the run
    for (int i = 0; i <= len; i++)
        ((volatile char *)0x5000)[i] = buf[i];
    while (1) {}
}
```

Build (note: need snprintf.o for `snprintf`):

```bash
clang --target=w65816 -O2 -I runtime/include -c fib.c -o fib.o
link816 -o fib.bin --text-base 0x1000 \
    runtime/crt0.o runtime/libc.o runtime/libgcc.o \
    runtime/snprintf.o runtime/softDouble.o runtime/sscanf.o fib.o
```

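Before reading the string back from emulated memory, it helps to know exactly what to expect there. The host-side sketch below reuses the same `fib` definition and format string; `formatFib` is an illustrative wrapper, not part of the runtime.

```c
#include <assert.h>
#include <stdio.h>

/* Same recursive definition as the example above. */
unsigned long fib(unsigned n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

/* Format the message as the example does; returns snprintf's length. */
int formatFib(char *buf, size_t size, unsigned n) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "fib(%u) = %lu", n, fib(n));
}
```

For `n = 10` the text is `fib(10) = 55`, twelve characters long, so the copy loop in the example (which runs while `i <= len`) writes thirteen bytes including the terminating NUL, starting at `$02:5000`.
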
### Apple IIgs Toolbox

```c
#include <iigs/toolbox_full.h>

int main(void) {
    DrawString("\pHello, World");
    while (1) {}
}
```

Build:

```bash
clang --target=w65816 -O2 -I runtime/include -c hello_gs.c -o hello_gs.o
link816 -o hello_gs.bin --text-base 0x1000 \
    runtime/crt0Gsos.o runtime/iigsToolbox.o runtime/iigsGsos.o \
    runtime/libgcc.o hello_gs.o
```

Use `crt0Gsos.o` (not `crt0.o`) for programs that call into the
toolbox — it sets up the IIgs runtime environment.

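The `"\p..."` literal in the example is the classic Apple convention for a length-prefixed (Pascal) string, which is the form `DrawString` takes. For strings built at run time, the same layout can be produced by hand. The helper below is a hypothetical sketch, under the assumption that the `\p` extension in this toolchain emits a single length byte followed by the characters.

```c
#include <assert.h>
#include <string.h>

/* Build a length-prefixed (Pascal) string from a C string.
   out[0] holds the length; the characters follow, with no terminator.
   Returns the total bytes written, or 0 if the input does not fit. */
size_t toPascalString(const char *src, unsigned char *out, size_t outSize) {
    size_t len = strlen(src);
    assert(out != NULL);
    if (len > 255 || len + 1 > outSize) return 0;
    out[0] = (unsigned char)len;
    memcpy(out + 1, src, len);
    return len + 1;
}
```

The 255-character ceiling comes from the single length byte; longer text has to be split across several calls.
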
## Inline assembly

The W65816 backend supports `__asm__` with operand constraints
`"a"`, `"x"`, `"y"`:

```c
unsigned short addOne(unsigned short x) {
    unsigned short r;
    __asm__("inc a" : "=a"(r) : "a"(x));
    return r;
}
```

Multi-instruction asm and raw bytes both work:

```c
__asm__ volatile (
    "sep #0x20\n"
    ".byte 0x68\n"      // pla
    "rep #0x20\n"
);
```

The `.byte 0xa9, ...` form is sometimes needed to work around
llvm-mc encoding gaps — the assembler doesn't yet support every
65816 addressing mode literally. The pattern works for any
opcode whose mnemonic doesn't yet parse.

## Tools reference

| Tool | Location | Purpose |
|---|---|---|
| `clang` | `tools/llvm-mos-build/bin/clang` | C/C++ compiler |
| `llvm-mc` | `tools/llvm-mos-build/bin/llvm-mc` | Assembler |
| `llvm-objdump` | `tools/llvm-mos-build/bin/llvm-objdump` | Disassembler |
| `llc` | `tools/llvm-mos-build/bin/llc` | Standalone codegen (`.ll` → `.s`) |
| `link816` | `tools/link816` | Our relocating linker |
| `omfEmit` | `tools/omfEmit` | Emit OMF v2.1 binary from `link816` output |
| `mame` | `apt` (system-wide) | Apple IIgs emulator |

## Debugging

### Look at the asm

```bash
clang --target=w65816 -O2 -S -o prog.s prog.c
```

### Look at the MIR after each pass

```bash
clang --target=w65816 -O2 -mllvm -print-after-all -S prog.c 2>&1 | less
```

Useful pass names to filter on:

| Pass name | What it does |
|---|---|
| `w65816-isel` | SDAG → MachineInstr selection |
| `w65816-widen-acc16` | Promote Acc16 vregs to Wide16 (regalloc help) |
| `w65816-stack-slot-cleanup` | Remove redundant spill/reload |
| `w65816-stackrel-to-img` | Promote hot stack slots to DP IMG slots |
| `w65816-stack-slot-merge` | Collapse PHI src/dst slot pairs |
| `w65816-branch-expand` | Long-distance Bxx → INV_Bxx skip;BRA |

### Single-pass filter

```bash
clang --target=w65816 -O2 -mllvm -print-after=w65816-isel \
    -mllvm -filter-print-funcs=myfunc -S prog.c 2>&1 | less
```

## Cycle-count benchmarks

Eight microbenchmarks live under [`benchmarks/`](../benchmarks/).
Each runs N iterations of the bench function and reports a
per-call cycle count via MAME's `emu.time()`:

```bash
bash scripts/benchCyclesPrecise.sh
```

Output:

```
| Benchmark | Per-call cycles (clang) |
|-----------|------------------------:|
| bsearch | 767 cyc/call |
| dotProduct | 2131 cyc/call |
| fib | 12617 cyc/call |
| memcmp | 989 cyc/call |
| popcount | 2864 cyc/call |
| strcpy | 2216 cyc/call |
| sumOfSquares | 16709 cyc/call |
```

The [`compare/`](../compare/) directory has side-by-side `.s`
files vs Calypsi 5.16 for sumSquares, evalAt, and mul16to32.
Rerun with:

```bash
bash compare/regen.sh
```

## Known limitations
|
||||||
|
|
||||||
|
- **C++ exceptions** are not implemented. `try`/`catch` compiles but
|
||||||
|
doesn't unwind. `-fsjlj-exceptions` works for limited SJLJ-style
|
||||||
|
throwing.
|
||||||
|
- **`stdin`** reads always return EOF, so `scanf` compiles but isn't useful.
|
||||||
|
Use `sscanf` on a buffer instead.
|
||||||
|
- **File I/O** through `fopen` etc. requires a backing implementation.
|
||||||
|
The default `mfs` backing (memory-file-system) lets you simulate
|
||||||
|
files via `mfsRegister()` — useful for tests, not for real disk
|
||||||
|
I/O. GS/OS file I/O works via `runtime/iigsGsos.o` if you link
|
||||||
|
against the GS/OS runtime.
|
||||||
|
- **`fork`/`exec`** are not supported; they are not applicable on a 65816.
|
||||||
|
- **Code generation gotcha:** very large frames (>200 bytes) trigger
|
||||||
|
FP-relative addressing. Most programs fit under that limit. See
|
||||||
|
the `frame-rel` discussion in
|
||||||
|
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
|
||||||
|
|
||||||
|
## Where to go next
|
||||||
|
|
||||||
|
- **Building real GS/OS apps:** see
|
||||||
|
[`docs/multiSegmentPlan.md`](multiSegmentPlan.md) and the
|
||||||
|
`runViaFinder.sh` script for booting through real GS/OS 6.0.2 in
|
||||||
|
MAME.
|
||||||
|
- **Backend internals (you're hacking on the compiler):**
|
||||||
|
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
|
||||||
|
- **Smoke tests:** `scripts/smokeTest.sh` runs ~150 end-to-end checks.
|
||||||
|
Read it for examples of every feature in action.
|
||||||
|
|
@ -331,9 +331,11 @@ EOF
|
||||||
cat "$sCmpFile" >&2
|
cat "$sCmpFile" >&2
|
||||||
die "setcc gt test missing: bcc/bcs (carry-based unsigned branch)"
|
die "setcc gt test missing: bcc/bcs (carry-based unsigned branch)"
|
||||||
fi
|
fi
|
||||||
if ! grep -qE '^\s*cmp\s+0x[0-9a-f]+,\s*s\s*$' "$sCmpFile"; then
|
# Accept either stack-relative cmp or DP-form cmp (W65816StackRelToImg
|
||||||
|
# may promote the comparand to a DP slot when arg b is the hot slot).
|
||||||
|
if ! grep -qE '^\s*cmp\s+0x[0-9a-f]+(,\s*s)?\s*$' "$sCmpFile"; then
|
||||||
cat "$sCmpFile" >&2
|
cat "$sCmpFile" >&2
|
||||||
die "setcc gt test missing: cmp <off>,s (stack-relative compare to arg b)"
|
die "setcc gt test missing: cmp <off>,s or cmp <dp> (compare to arg b)"
|
||||||
fi
|
fi
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
|
@ -373,13 +375,13 @@ int max3(int a, int b, int c) {
|
||||||
}
|
}
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -S "$cFile3" -o "$sChainFile"
|
"$CLANG" --target=w65816 -O2 -S "$cFile3" -o "$sChainFile"
|
||||||
# Expect cmp against a stack-relative slot - the signature of the
|
# Expect cmp against a stack-relative slot OR a DP slot - the
|
||||||
# two-Acc16 CMP_RR custom inserter. (Earlier this test also
|
# signature of the two-Acc16 CMP_RR custom inserter. Earlier this
|
||||||
# required an `sta d,s` spill, but greedy regalloc + WidenAcc16
|
# required only stack-rel; W65816StackRelToImg may promote the
|
||||||
# avoids that spill entirely on this pattern.)
|
# comparand to a DP slot for hot offsets.
|
||||||
if ! grep -qE 'cmp 0x[0-9a-f]+, s' "$sChainFile"; then
|
if ! grep -qE 'cmp 0x[0-9a-f]+(, s|$)' "$sChainFile"; then
|
||||||
cat "$sChainFile" >&2
|
cat "$sChainFile" >&2
|
||||||
die "two-Acc16 (max3) didn't cmp via stack-relative"
|
die "two-Acc16 (max3) didn't cmp via stack-relative or DP"
|
||||||
fi
|
fi
|
||||||
fi
|
fi
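The relaxed pattern from the first smoke-test hunk can be checked in isolation. The sketch below is only an approximation: it uses `std::regex` (ECMAScript grammar) instead of `grep -E`, the helper name `isCmpStackRelOrDp` is hypothetical, and the sample assembly lines in the usage note are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when a line looks like `cmp <hex>, s`
// (stack-relative) or a bare `cmp <hex>` (direct page), mirroring the
// relaxed smoke-test pattern.
bool isCmpStackRelOrDp(const std::string &line) {
  static const std::regex pattern(R"(^\s*cmp\s+0x[0-9a-f]+(,\s*s)?\s*$)");
  return std::regex_match(line, pattern);
}
```

Under these assumptions, a line such as `cmp 0x5, s` and a line such as `cmp 0xd0` are both accepted, while an immediate operand (`cmp #0x5`) or an X-indexed operand is rejected.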
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -39,6 +39,7 @@ add_llvm_target(W65816CodeGen
|
||||||
W65816ImgCalleeSave.cpp
|
W65816ImgCalleeSave.cpp
|
||||||
W65816NarrowI32Mul.cpp
|
W65816NarrowI32Mul.cpp
|
||||||
W65816PromoteFiToImg.cpp
|
W65816PromoteFiToImg.cpp
|
||||||
|
W65816StackRelToImg.cpp
|
||||||
W65816StackSlotMerge.cpp
|
W65816StackSlotMerge.cpp
|
||||||
W65816TargetMachine.cpp
|
W65816TargetMachine.cpp
|
||||||
W65816AsmPrinter.cpp
|
W65816AsmPrinter.cpp
|
||||||
|
|
|
||||||
|
|
@ -143,6 +143,12 @@ FunctionPass *createW65816PromoteFiToImg();
|
||||||
// copy. See W65816StackSlotMerge.cpp.
|
// copy. See W65816StackSlotMerge.cpp.
|
||||||
FunctionPass *createW65816StackSlotMerge();
|
FunctionPass *createW65816StackSlotMerge();
|
||||||
|
|
||||||
|
// Pre-emit pass: rewrite top-N stack-rel slot offsets to IMG0..IMG7
|
||||||
|
// DP slots ($D0..$DE). Caller-save semantics — function must only
|
||||||
|
// call IMG-safe libgcc helpers (verified to not touch $D0..$DE).
|
||||||
|
// See W65816StackRelToImg.cpp.
|
||||||
|
FunctionPass *createW65816StackRelToImg();
|
||||||
|
|
||||||
// Pre-RA pass that lowers Wide32 register pairs into pairs of i16
|
// Pre-RA pass that lowers Wide32 register pairs into pairs of i16
|
||||||
// vregs. Without this, greedy/basic regalloc can't fit the pair-
|
// vregs. Without this, greedy/basic regalloc can't fit the pair-
|
||||||
// pressure of i64-via-2-i32-via-Wide32 traffic in i64-heavy
|
// pressure of i64-via-2-i32-via-Wide32 traffic in i64-heavy
|
||||||
|
|
@ -184,6 +190,7 @@ void initializeW65816ImgCalleeSavePass(PassRegistry &);
|
||||||
void initializeW65816NarrowI32MulPass(PassRegistry &);
|
void initializeW65816NarrowI32MulPass(PassRegistry &);
|
||||||
void initializeW65816PromoteFiToImgPass(PassRegistry &);
|
void initializeW65816PromoteFiToImgPass(PassRegistry &);
|
||||||
void initializeW65816StackSlotMergePass(PassRegistry &);
|
void initializeW65816StackSlotMergePass(PassRegistry &);
|
||||||
|
void initializeW65816StackRelToImgPass(PassRegistry &);
|
||||||
|
|
||||||
} // namespace llvm
|
} // namespace llvm
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -485,7 +485,14 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
|
||||||
if (It2 != MI->getParent()->end()) {
|
if (It2 != MI->getParent()->end()) {
|
||||||
const TargetRegisterInfo *TRI =
|
const TargetRegisterInfo *TRI =
|
||||||
MI->getParent()->getParent()->getSubtarget().getRegisterInfo();
|
MI->getParent()->getParent()->getSubtarget().getRegisterInfo();
|
||||||
if (It2->modifiesRegister(W65816::A, TRI))
|
// PEI doesn't load A, so the LDA's value-set is needed if
|
||||||
|
// the next instruction READS A. JSL has implicit-def $a
|
||||||
|
// (caller-save) AND implicit-use $a (when A is an arg) —
|
||||||
|
// modifiesRegister returns true for both, but readsRegister
|
||||||
|
// is what tells us if A's value is consumed. Drop the LDA
|
||||||
|
// ONLY when the next op modifies A WITHOUT reading it.
|
||||||
|
if (It2->modifiesRegister(W65816::A, TRI) &&
|
||||||
|
!It2->readsRegister(W65816::A, TRI))
|
||||||
ADead = true;
|
ADead = true;
|
||||||
}
|
}
|
||||||
if (ADead) {
|
if (ADead) {
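The rule introduced by this hunk reduces to a two-input predicate. A minimal sketch, assuming a toy `NextOpInfo` struct as a stand-in for the `modifiesRegister` / `readsRegister` queries on the following instruction (the struct and function names are hypothetical, not part of the backend):

```cpp
// Toy stand-in for the two MachineInstr queries the hunk relies on.
struct NextOpInfo {
  bool modifiesA; // next instruction writes A (e.g. implicit-def $a)
  bool readsA;    // next instruction consumes A (e.g. A carries an argument)
};

// The LDA's result is dead only if the next op overwrites A without
// reading it first. A call that both reads and clobbers A keeps the LDA.
bool ldaResultIsDead(const NextOpInfo &next) {
  return next.modifiesA && !next.readsA;
}
```

The case the old check got wrong is `{modifiesA = true, readsA = true}`, which is what a JSL with A as an argument looks like: the old code treated the LDA as dead there, the new predicate keeps it.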
|
||||||
|
|
|
||||||
|
|
@ -188,10 +188,6 @@ bool W65816ImgCalleeSave::runOnMachineFunction(MachineFunction &MF) {
|
||||||
// other spill slots — but the STAfi/LDAfi we emit reference this slot
|
// other spill slots — but the STAfi/LDAfi we emit reference this slot
|
||||||
// by FrameIndex, and the only writes to this FI are our save/restore
|
// by FrameIndex, and the only writes to this FI are our save/restore
|
||||||
// pair, so coloring can't break the round-trip.
|
// pair, so coloring can't break the round-trip.
|
||||||
//
|
|
||||||
// (The picol-expr bug came from a SHARED slot with two DIFFERENT
|
|
||||||
// vregs writing to it; here we have one FI per IMG and a single
|
|
||||||
// write/read pair per function, so coloring can't trip on this.)
|
|
||||||
MachineFrameInfo &MFI = MF.getFrameInfo();
|
MachineFrameInfo &MFI = MF.getFrameInfo();
|
||||||
int FrameSlots[8];
|
int FrameSlots[8];
|
||||||
for (int i = 0; i < 8; ++i) {
|
for (int i = 0; i < 8; ++i) {
|
||||||
|
|
|
||||||
|
|
@ -52,8 +52,11 @@
|
||||||
#include "llvm/CodeGen/MachineFunction.h"
|
#include "llvm/CodeGen/MachineFunction.h"
|
||||||
#include "llvm/CodeGen/MachineFunctionPass.h"
|
#include "llvm/CodeGen/MachineFunctionPass.h"
|
||||||
#include "llvm/CodeGen/MachineInstrBuilder.h"
|
#include "llvm/CodeGen/MachineInstrBuilder.h"
|
||||||
|
#include "llvm/CodeGen/MachineLoopInfo.h"
|
||||||
#include "llvm/CodeGen/MachineRegisterInfo.h"
|
#include "llvm/CodeGen/MachineRegisterInfo.h"
|
||||||
|
#include "llvm/InitializePasses.h"
|
||||||
#include "llvm/Support/Debug.h"
|
#include "llvm/Support/Debug.h"
|
||||||
|
#include "llvm/Support/Format.h"
|
||||||
|
|
||||||
using namespace llvm;
|
using namespace llvm;
|
||||||
|
|
||||||
|
|
@ -70,6 +73,11 @@ public:
|
||||||
StringRef getPassName() const override {
|
StringRef getPassName() const override {
|
||||||
return "W65816 promote FrameIndex to IMG8..15 DP slot";
|
return "W65816 promote FrameIndex to IMG8..15 DP slot";
|
||||||
}
|
}
|
||||||
|
void getAnalysisUsage(AnalysisUsage &AU) const override {
|
||||||
|
AU.addRequired<MachineLoopInfoWrapperPass>();
|
||||||
|
AU.setPreservesCFG();
|
||||||
|
MachineFunctionPass::getAnalysisUsage(AU);
|
||||||
|
}
|
||||||
bool runOnMachineFunction(MachineFunction &MF) override;
|
bool runOnMachineFunction(MachineFunction &MF) override;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
@ -79,8 +87,11 @@ public:
|
||||||
|
|
||||||
char W65816PromoteFiToImg::ID = 0;
|
char W65816PromoteFiToImg::ID = 0;
|
||||||
|
|
||||||
INITIALIZE_PASS(W65816PromoteFiToImg, DEBUG_TYPE,
|
INITIALIZE_PASS_BEGIN(W65816PromoteFiToImg, DEBUG_TYPE,
|
||||||
"W65816 promote FI to IMG", false, false)
|
"W65816 promote FI to IMG", false, false)
|
||||||
|
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfoWrapperPass)
|
||||||
|
INITIALIZE_PASS_END(W65816PromoteFiToImg, DEBUG_TYPE,
|
||||||
|
"W65816 promote FI to IMG", false, false)
|
||||||
|
|
||||||
|
|
||||||
FunctionPass *llvm::createW65816PromoteFiToImg() {
|
FunctionPass *llvm::createW65816PromoteFiToImg() {
|
||||||
|
|
@ -131,19 +142,20 @@ static uint8_t dpAddrForImg(unsigned ImgIdx) {
|
||||||
|
|
||||||
|
|
||||||
bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
|
bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
|
||||||
// DISABLED: pass produces verifier errors ("Using an undefined physical
|
// DISABLED again 2026-05-13 (3rd-attempt write-up). Two new findings:
|
||||||
// register") on the kill-flag bookkeeping when an STAfi with `killed $a`
|
// 1. With kMaxPromote=2 and IMG0..7 (caller-save, skip ImgCalleeSave),
|
||||||
// is rewritten to STA_DP — the next i16-imm ADC/ADCE sees $a as dead.
|
// sumSquares regressed 56 → 72 inst because the FIs picked by
|
||||||
// Also, for the FUNCTIONS where it would land (no-call, high-traffic
|
// access-count (fi#2, fi#3) are intermediate spill temps, not
|
||||||
// slots), measured static + dynamic savings were modest and didn't
|
// the i32-accumulator's halves (which are different FIs). The
|
||||||
// justify the bookkeeping complexity. Re-enable after:
|
// loop body ends up using BOTH IMG and stack slots for related
|
||||||
// - tightening kill-flag preservation: only carry kill if the same
|
// values.
|
||||||
// operand will be the last user in the new MI (which depends on
|
// 2. To pick the RIGHT FIs (those corresponding to PHI-cycled
|
||||||
// post-rewrite scheduling — needs careful liveness re-analysis).
|
// values like the i32 accumulator), we need either:
|
||||||
// - paired-PHI promotion: when fi#A is a PHI-input and fi#B is the
|
// (a) IR-level analysis BEFORE FI assignment, or
|
||||||
// matching PHI-output, map them to the SAME IMG slot so the
|
// (b) Post-RA dataflow analysis to identify "long-lived" FIs
|
||||||
// PHI move collapses to a no-op (where most of the dynamic win
|
// (active across the loop back-edge with no def/use boundary).
|
||||||
// would come from).
|
// This is the next blocker. Disabled until either (a) or (b) is
|
||||||
|
// implemented.
|
||||||
return false;
|
return false;
|
||||||
if (skipFunction(MF.getFunction())) return false;
|
if (skipFunction(MF.getFunction())) return false;
|
||||||
const W65816Subtarget &STI = MF.getSubtarget<W65816Subtarget>();
|
const W65816Subtarget &STI = MF.getSubtarget<W65816Subtarget>();
|
||||||
|
|
@ -151,49 +163,59 @@ bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
|
||||||
MachineFrameInfo &MFI = MF.getFrameInfo();
|
MachineFrameInfo &MFI = MF.getFrameInfo();
|
||||||
|
|
||||||
// 1. Walk all instructions, count FI accesses for promotable opcodes.
|
// 1. Walk all instructions, count FI accesses for promotable opcodes.
|
||||||
|
// Weight by loop depth: an access inside a depth-N loop counts as
|
||||||
|
// 10^N to model the dynamic execution count (an inner-loop slot
|
||||||
|
// gets executed many times per outer call).
|
||||||
|
MachineLoopInfo &MLI =
|
||||||
|
getAnalysis<MachineLoopInfoWrapperPass>().getLI();
|
||||||
DenseMap<int, unsigned> AccessCount;
|
DenseMap<int, unsigned> AccessCount;
|
||||||
DenseMap<int, SmallVector<MachineInstr *, 8>> AccessSites;
|
DenseMap<int, SmallVector<MachineInstr *, 8>> AccessSites;
|
||||||
for (MachineBasicBlock &MBB : MF) {
|
for (MachineBasicBlock &MBB : MF) {
|
||||||
|
unsigned LoopDepth = MLI.getLoopDepth(&MBB);
|
||||||
|
unsigned Weight = 1;
|
||||||
|
for (unsigned i = 0; i < LoopDepth && i < 3; ++i) Weight *= 10;
|
||||||
for (MachineInstr &MI : MBB) {
|
for (MachineInstr &MI : MBB) {
|
||||||
int FiIdx = getFiOperandIdx(MI.getOpcode());
|
int FiIdx = getFiOperandIdx(MI.getOpcode());
|
||||||
if (FiIdx < 0) continue;
|
if (FiIdx < 0) continue;
|
||||||
const MachineOperand &MO = MI.getOperand(FiIdx);
|
const MachineOperand &MO = MI.getOperand(FiIdx);
|
||||||
if (!MO.isFI()) continue;
|
if (!MO.isFI()) continue;
|
||||||
int FI = MO.getIndex();
|
int FI = MO.getIndex();
|
||||||
// Require: 2-byte size, fixed (not variable), offset operand == 0.
|
|
||||||
// The offset operand sits right after the FI operand.
|
|
||||||
if (MFI.isVariableSizedObjectIndex(FI)) continue;
|
if (MFI.isVariableSizedObjectIndex(FI)) continue;
|
||||||
if (MFI.getObjectSize(FI) != 2) continue;
|
if (MFI.getObjectSize(FI) != 2) continue;
|
||||||
// Fixed (negative-index) slots are arg slots — leave them alone.
|
|
||||||
// Promotion would break LowerFormalArguments's expected layout.
|
|
||||||
if (FI < 0) continue;
|
if (FI < 0) continue;
|
||||||
const MachineOperand &OffMO = MI.getOperand(FiIdx + 1);
|
const MachineOperand &OffMO = MI.getOperand(FiIdx + 1);
|
||||||
if (!OffMO.isImm() || OffMO.getImm() != 0) continue;
|
if (!OffMO.isImm() || OffMO.getImm() != 0) continue;
|
||||||
AccessCount[FI]++;
|
AccessCount[FI] += Weight;
|
||||||
AccessSites[FI].push_back(&MI);
|
AccessSites[FI].push_back(&MI);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if (AccessCount.empty()) return false;
|
if (AccessCount.empty()) return false;
|
||||||
|
|
||||||
// 2. Determine which IMG8..15 slots are already in use.
|
// 2. Determine which IMG0..7 slots are already in use (caller-save).
|
||||||
|
// Use caller-save IMG0..7 instead of callee-save IMG8..15: this lets
|
||||||
|
// us skip ImgCalleeSave entirely (no prologue/epilogue overhead).
|
||||||
|
// The trade-off: any call inside the function clobbers IMG0..7. Mark
|
||||||
|
// any function with calls as "callees might clobber" → skip promotion.
|
||||||
|
// This restricts wins to leaf functions (no internal calls).
|
||||||
BitVector UsedImg(8, false);
|
BitVector UsedImg(8, false);
|
||||||
for (MachineBasicBlock &MBB : MF) {
|
for (MachineBasicBlock &MBB : MF) {
|
||||||
for (MachineInstr &MI : MBB) {
|
for (MachineInstr &MI : MBB) {
|
||||||
|
// Skip CALL instructions — their `implicit-def dead $img0..7`
|
||||||
|
// operand list marks every IMG slot used, but that's just the
|
||||||
|
// caller-save annotation, not actual value-bearing usage.
|
||||||
|
if (MI.isCall()) continue;
|
||||||
for (const MachineOperand &MO : MI.operands()) {
|
for (const MachineOperand &MO : MI.operands()) {
|
||||||
if (!MO.isReg() || !MO.getReg().isPhysical()) continue;
|
if (!MO.isReg() || !MO.getReg().isPhysical()) continue;
|
||||||
Register R = MO.getReg();
|
Register R = MO.getReg();
|
||||||
// IMG8..15 are not numerically contiguous with each other in
|
unsigned ImgIdx = 16;
|
||||||
// the W65816 register enum (subreg-pair regs sit between
|
if (R == W65816::IMG0) ImgIdx = 0;
|
||||||
// IMG indices). Spell them out explicitly.
|
else if (R == W65816::IMG1) ImgIdx = 1;
|
||||||
unsigned ImgIdx = 16; // "not an IMG8..15"
|
else if (R == W65816::IMG2) ImgIdx = 2;
|
||||||
if (R == W65816::IMG8) ImgIdx = 0;
|
else if (R == W65816::IMG3) ImgIdx = 3;
|
||||||
else if (R == W65816::IMG9) ImgIdx = 1;
|
else if (R == W65816::IMG4) ImgIdx = 4;
|
||||||
else if (R == W65816::IMG10) ImgIdx = 2;
|
else if (R == W65816::IMG5) ImgIdx = 5;
|
||||||
else if (R == W65816::IMG11) ImgIdx = 3;
|
else if (R == W65816::IMG6) ImgIdx = 6;
|
||||||
else if (R == W65816::IMG12) ImgIdx = 4;
|
else if (R == W65816::IMG7) ImgIdx = 7;
|
||||||
else if (R == W65816::IMG13) ImgIdx = 5;
|
|
||||||
else if (R == W65816::IMG14) ImgIdx = 6;
|
|
||||||
else if (R == W65816::IMG15) ImgIdx = 7;
|
|
||||||
if (ImgIdx < 8) UsedImg.set(ImgIdx);
|
if (ImgIdx < 8) UsedImg.set(ImgIdx);
|
||||||
}
|
}
|
||||||
}
|
}
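The loop-depth weighting added in this hunk can be read as a small pure function. The sketch below copies the loop bound from the hunk (`i < LoopDepth && i < 3`); the function name `loopDepthWeight` is hypothetical.

```cpp
// Weight of one frame-index access at the given loop depth: 10^depth,
// capped at 10^3 by the `i < 3` bound.
unsigned loopDepthWeight(unsigned loopDepth) {
  unsigned weight = 1;
  for (unsigned i = 0; i < loopDepth && i < 3; ++i)
    weight *= 10;
  return weight;
}
```

The cap keeps a deeply nested access from overflowing the `unsigned` accumulator after only a few hits, at the cost of treating every depth beyond three the same.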
|
||||||
|
|
@ -215,20 +237,80 @@ bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
|
||||||
// save/restore cost compounds with recursion / call frequency
|
// save/restore cost compounds with recursion / call frequency
|
||||||
// in ways the static access count can't capture).
|
// in ways the static access count can't capture).
|
||||||
bool HasCalls = false;
|
bool HasCalls = false;
|
||||||
|
bool IsRecursive = false;
|
||||||
|
StringRef SelfName = MF.getName();
|
||||||
for (MachineBasicBlock &MBB : MF) {
|
for (MachineBasicBlock &MBB : MF) {
|
||||||
for (MachineInstr &MI : MBB) {
|
for (MachineInstr &MI : MBB) {
|
||||||
if (MI.isCall()) { HasCalls = true; break; }
|
if (MI.isCall()) {
|
||||||
|
HasCalls = true;
|
||||||
|
// Check for self-call (recursive).
|
||||||
|
for (const MachineOperand &MO : MI.operands()) {
|
||||||
|
if (MO.isGlobal() && MO.getGlobal()->getName() == SelfName)
|
||||||
|
IsRecursive = true;
|
||||||
|
else if (MO.isSymbol() && SelfName == MO.getSymbolName())
|
||||||
|
IsRecursive = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
if (HasCalls) break;
|
|
||||||
}
|
}
|
||||||
const unsigned kAccessThreshold = HasCalls ? 999999u : 5u;
|
// Recursive functions: skip — recursion makes per-call overhead
|
||||||
|
// compound (each level of recursion pays the save/restore).
|
||||||
|
if (IsRecursive) return false;
|
||||||
|
// Caller-save IMG0..7 strategy: any internal call clobbers them, so
|
||||||
|
// the only safe promoted slots are those whose lifetime doesn't
|
||||||
|
// cross a call. For now, only promote in leaf functions (no internal
|
||||||
|
// calls at all). This catches simple loops like sumSquares (which
|
||||||
|
// calls __umulhisi3 — but that's in libgcc.s and doesn't actually
|
||||||
|
// touch IMG0..7; treat libgcc multiplies as IMG-safe).
|
||||||
|
//
|
||||||
|
// Whitelist of libgcc functions known to not touch IMG0..7.
|
||||||
|
auto isImgSafeLibcall = [](const MachineInstr &MI) -> bool {
|
||||||
|
if (!MI.isCall()) return false;
|
||||||
|
for (const MachineOperand &MO : MI.operands()) {
|
||||||
|
StringRef Name;
|
||||||
|
if (MO.isGlobal()) Name = MO.getGlobal()->getName();
|
||||||
|
else if (MO.isSymbol()) Name = MO.getSymbolName();
|
||||||
|
else continue;
|
||||||
|
// libgcc.s multiply/divide/shift helpers — verified to only use
|
||||||
|
// $E0..$E9 internally, no IMG0..7 touch.
|
||||||
|
if (Name == "__umulhisi3" || Name == "__mulhi3" ||
|
||||||
|
Name == "__mulsi3" || Name == "__udivhi3" ||
|
||||||
|
Name == "__umodhi3" || Name == "__divhi3" ||
|
||||||
|
Name == "__modhi3" || Name == "__udivsi3" ||
|
||||||
|
Name == "__umodsi3" || Name == "__divsi3" ||
|
||||||
|
Name == "__modsi3" || Name == "__ashlhi3" ||
|
||||||
|
Name == "__lshrhi3" || Name == "__ashrhi3" ||
|
||||||
|
Name == "__ashlsi3" || Name == "__lshrsi3" ||
|
||||||
|
Name == "__ashrsi3")
|
||||||
|
return true;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
};
|
||||||
|
bool AllCallsImgSafe = true;
|
||||||
|
for (MachineBasicBlock &MBB : MF) {
|
||||||
|
for (MachineInstr &MI : MBB) {
|
||||||
|
if (MI.isCall() && !isImgSafeLibcall(MI)) {
|
||||||
|
AllCallsImgSafe = false;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (!AllCallsImgSafe) break;
|
||||||
|
}
|
||||||
|
if (HasCalls && !AllCallsImgSafe) return false;
|
||||||
|
// Threshold: per-access save is 1 cyc, no save/restore overhead. We
|
||||||
|
// just need the access count to be > 0 to win. Use a small threshold
|
||||||
|
// for safety (avoid promoting marginal slots).
|
||||||
|
const unsigned kAccessThreshold = 5u;
|
||||||
|
const unsigned kMaxPromote = 2u;
|
||||||
DenseMap<int, unsigned> FiToImgIdx;
|
DenseMap<int, unsigned> FiToImgIdx;
|
||||||
unsigned NextFreeImg = 0;
|
unsigned NextFreeImg = 0;
|
||||||
for (int FI : Ordered) {
|
for (int FI : Ordered) {
|
||||||
if (AccessCount[FI] < kAccessThreshold) break;
|
if (AccessCount[FI] < kAccessThreshold) break;
|
||||||
|
if (FiToImgIdx.size() >= kMaxPromote) break;
|
||||||
while (NextFreeImg < 8 && UsedImg.test(NextFreeImg)) ++NextFreeImg;
|
while (NextFreeImg < 8 && UsedImg.test(NextFreeImg)) ++NextFreeImg;
|
||||||
if (NextFreeImg >= 8) break;
|
if (NextFreeImg >= 8) break;
|
||||||
FiToImgIdx[FI] = NextFreeImg + 8; // Map to IMG8..15
|
FiToImgIdx[FI] = NextFreeImg; // Map to IMG0..7 (caller-save)
|
||||||
++NextFreeImg;
|
++NextFreeImg;
|
||||||
}
|
}
|
||||||
if (FiToImgIdx.empty()) return false;
|
if (FiToImgIdx.empty()) return false;
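The selection loop above can be sketched outside LLVM. This version assumes that `Ordered` (built outside the lines shown in the diff) lists frame indices by descending weighted access count; the standard containers stand in for `DenseMap` and `BitVector`, and the name `pickSlots` is hypothetical.

```cpp
#include <algorithm>
#include <bitset>
#include <map>
#include <utility>
#include <vector>

// Returns frame index -> IMG index (0..7) for the slots chosen for promotion.
std::map<int, unsigned> pickSlots(const std::map<int, unsigned> &accessCount,
                                  const std::bitset<8> &usedImg,
                                  unsigned threshold = 5,
                                  unsigned maxPromote = 2) {
  // Assumption: candidates are visited hottest first.
  std::vector<std::pair<int, unsigned>> ordered(accessCount.begin(),
                                                accessCount.end());
  std::stable_sort(ordered.begin(), ordered.end(),
                   [](const std::pair<int, unsigned> &a,
                      const std::pair<int, unsigned> &b) {
                     return a.second > b.second;
                   });

  std::map<int, unsigned> fiToImg;
  unsigned nextFree = 0;
  for (const std::pair<int, unsigned> &entry : ordered) {
    if (entry.second < threshold) break;      // remaining slots are colder
    if (fiToImg.size() >= maxPromote) break;  // kMaxPromote cap
    while (nextFree < 8 && usedImg.test(nextFree)) ++nextFree;
    if (nextFree >= 8) break;                 // no free IMG slot left
    fiToImg[entry.first] = nextFree++;
  }
  return fiToImg;
}
```

With the defaults taken from the hunk (`kAccessThreshold = 5`, `kMaxPromote = 2`), only the two hottest qualifying slots are promoted, and each skips over any IMG index already marked used.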
|
||||||
|
|
|
||||||
1220
src/llvm/lib/Target/W65816/W65816StackRelToImg.cpp
Normal file
File diff suppressed because it is too large
|
|
@ -599,20 +599,31 @@ bool W65816StackSlotMerge::runOnMachineFunction(MachineFunction &MF) {
|
||||||
}
|
}
|
||||||
return 0;
|
return 0;
|
||||||
};
|
};
|
||||||
// Collect `LDA #K ; STA_StackRel Y` pairs, grouped by Y.
|
// Collect `LDA #K ; STA_StackRel Y` pairs, grouped by Y. Also
|
||||||
|
// handles consolidated `LDA #K ; STA Y1 ; STA Y2 ; ...` where the
|
||||||
|
// LDA is shared (Phase 6 collapsing): A stays at K across STAs.
|
||||||
DenseMap<int64_t, SmallVector<std::pair<MachineInstr *, int64_t>, 4>>
|
DenseMap<int64_t, SmallVector<std::pair<MachineInstr *, int64_t>, 4>>
|
||||||
ConstStas;
|
ConstStas;
|
||||||
for (MachineBasicBlock &MBB : MF) {
|
for (MachineBasicBlock &MBB : MF) {
|
||||||
for (auto It = MBB.begin(); It != MBB.end(); ++It) {
|
for (auto It = MBB.begin(); It != MBB.end(); ++It) {
|
||||||
if (!isLdaImm(*It)) continue;
|
if (!isLdaImm(*It)) continue;
|
||||||
int64_t K = immValue(*It);
|
int64_t K = immValue(*It);
|
||||||
|
// Walk forward through STA_StackRel ops; collect each as an
|
||||||
|
// init of K (A is preserved across STA). Stop on anything
|
||||||
|
// that modifies A.
|
||||||
auto NextIt = std::next(It);
|
auto NextIt = std::next(It);
|
||||||
while (NextIt != MBB.end() && NextIt->isDebugInstr()) ++NextIt;
|
while (NextIt != MBB.end()) {
|
||||||
if (NextIt == MBB.end()) continue;
|
if (NextIt->isDebugInstr()) { ++NextIt; continue; }
|
||||||
if (NextIt->getOpcode() != W65816::STA_StackRel) continue;
|
if (NextIt->getOpcode() == W65816::STA_StackRel) {
|
||||||
int64_t Y;
|
int64_t Y;
|
||||||
if (!srAccess(*NextIt, Y)) continue;
|
if (srAccess(*NextIt, Y)) {
|
||||||
ConstStas[Y].push_back({&*NextIt, K});
|
ConstStas[Y].push_back({&*NextIt, K});
|
||||||
|
}
|
||||||
|
++NextIt;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
break; // any other op — stop (might change A or flags)
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// For each slot Y with at least two const-init STAs, check for
|
// For each slot Y with at least two const-init STAs, check for
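The widened collection loop in this hunk can be modelled on a flat instruction list. The sketch below is a simplification: `Op`, `Inst`, and `collectConstStores` are hypothetical names, the real pass records the `MachineInstr *` alongside each constant, and `srAccess` is assumed to succeed for every store.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum class Op { LdaImm, StaStackRel, Debug, Other };

struct Inst {
  Op op;
  std::int64_t value; // immediate for LdaImm, slot offset for StaStackRel
};

// Maps slot offset -> constants stored there by `LDA #K ; STA y1 ; STA y2 ...`.
std::map<std::int64_t, std::vector<std::int64_t>>
collectConstStores(const std::vector<Inst> &block) {
  std::map<std::int64_t, std::vector<std::int64_t>> constStas;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].op != Op::LdaImm) continue;
    const std::int64_t k = block[i].value;
    // A still holds K after each STA, so keep collecting stores until
    // something else (which might change A or flags) appears.
    for (std::size_t j = i + 1; j < block.size(); ++j) {
      if (block[j].op == Op::Debug) continue;
      if (block[j].op != Op::StaStackRel) break;
      constStas[block[j].value].push_back(k);
    }
  }
  return constStas;
}
```

Compared with the old one-store-per-LDA logic, a sequence with a shared LDA now contributes an entry for every store that follows it, which is what lets consolidated initialisers be matched later.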
|
||||||
|
|
@ -692,6 +703,7 @@ bool W65816StackSlotMerge::runOnMachineFunction(MachineFunction &MF) {
|
||||||
// flag-use (unsafe).
|
// flag-use (unsafe).
|
||||||
MachineBasicBlock *MBB = DominatedSta->getParent();
|
MachineBasicBlock *MBB = DominatedSta->getParent();
|
||||||
bool flagsSafeP5 = false;
|
bool flagsSafeP5 = false;
|
||||||
|
bool reachedMBBEnd = false;
|
||||||
for (auto Fwd = std::next(DominatedSta->getIterator());
|
for (auto Fwd = std::next(DominatedSta->getIterator());
|
||||||
Fwd != MBB->end(); ++Fwd) {
|
Fwd != MBB->end(); ++Fwd) {
|
||||||
if (Fwd->isDebugInstr()) continue;
|
if (Fwd->isDebugInstr()) continue;
|
||||||
|
|
@ -701,6 +713,33 @@ bool W65816StackSlotMerge::runOnMachineFunction(MachineFunction &MF) {
|
||||||
}
|
}
|
||||||
if (clobbersFlagsP(*Fwd)) { flagsSafeP5 = true; break; }
|
if (clobbersFlagsP(*Fwd)) { flagsSafeP5 = true; break; }
|
||||||
}
|
}
|
||||||
|
// If we walked off the end of MBB, recurse one level into
|
||||||
|
// successors. The fall-through code is in a successor MBB
|
||||||
|
// (e.g., bb.3's preheader -> bb.4's loop body which starts
|
||||||
|
// with an LDA, a flag-clobberer). Require ALL successors
|
||||||
|
// to clobber flags before any flag-use.
|
||||||
|
if (!flagsSafeP5) {
|
||||||
|
// Did the loop exit via fall-through (no break)?
|
||||||
|
// Check by walking the same loop again, simpler check.
|
||||||
|
auto It = std::next(DominatedSta->getIterator());
|
||||||
|
while (It != MBB->end() && It->isDebugInstr()) ++It;
|
||||||
|
// ... too brittle to track via prev loop; just recurse for
|
||||||
|
// every case where flagsSafeP5 is false. Conservative.
|
||||||
|
bool allSuccClobber = !MBB->succ_empty();
|
||||||
|
for (MachineBasicBlock *Succ : MBB->successors()) {
|
||||||
|
bool succClobbers = false;
|
||||||
|
for (auto SIt = Succ->begin(); SIt != Succ->end(); ++SIt) {
|
||||||
|
if (SIt->isDebugInstr()) continue;
|
||||||
|
if (usesFlagsP(*SIt)) break;
|
||||||
|
if (clobbersFlagsP(*SIt)) { succClobbers = true; break; }
|
||||||
|
if (SIt->isTerminator() && !SIt->isConditionalBranch()) {
|
||||||
|
succClobbers = true; break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (!succClobbers) { allSuccClobber = false; break; }
|
||||||
|
}
|
||||||
|
if (allSuccClobber) flagsSafeP5 = true;
|
||||||
|
}
|
||||||
if (!flagsSafeP5) continue;
|
if (!flagsSafeP5) continue;
|
||||||
// Erase DominatedSta and its preceding LDA #K.
|
// Erase DominatedSta and its preceding LDA #K.
|
||||||
auto Prev = DominatedSta->getIterator();
|
auto Prev = DominatedSta->getIterator();
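The successor check added in this hunk can be expressed over a simplified per-instruction classification. In the sketch below, `FlagEffect` and both function names are hypothetical; the order of the tests (flag use first, then clobber, then non-conditional terminator) follows the hunk.

```cpp
#include <vector>

enum class FlagEffect { None, Uses, Clobbers, UncondTerminator };

// True if the successor overwrites the flags (or leaves via a
// non-conditional terminator) before any instruction reads them.
bool successorClobbersFirst(const std::vector<FlagEffect> &succ) {
  for (FlagEffect e : succ) {
    if (e == FlagEffect::Uses) return false;
    if (e == FlagEffect::Clobbers || e == FlagEffect::UncondTerminator)
      return true;
  }
  return false; // ran off the end without a verdict: treat as unsafe
}

// Mirrors `allSuccClobber`: false when there are no successors, and
// false as soon as one successor might read stale flags.
bool allSuccessorsClobberFirst(
    const std::vector<std::vector<FlagEffect>> &succs) {
  if (succs.empty()) return false;
  for (const std::vector<FlagEffect> &s : succs)
    if (!successorClobbersFirst(s)) return false;
  return true;
}
```

The check only looks one level deep, so a successor that falls through without touching the flags is conservatively treated as unsafe rather than followed further.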
|
||||||
|
|
|
||||||
|
|
@ -58,6 +58,7 @@ LLVMInitializeW65816Target() {
|
||||||
initializeW65816NarrowI32MulPass(PR);
|
initializeW65816NarrowI32MulPass(PR);
|
||||||
initializeW65816PromoteFiToImgPass(PR);
|
initializeW65816PromoteFiToImgPass(PR);
|
||||||
initializeW65816StackSlotMergePass(PR);
|
initializeW65816StackSlotMergePass(PR);
|
||||||
|
initializeW65816StackRelToImgPass(PR);
|
||||||
|
|
||||||
// Default IndVarSimplify's exit-value rewriter to "never". The
|
// Default IndVarSimplify's exit-value rewriter to "never". The
|
||||||
// closed-form replacement frequently widens an i16 induction var
|
// closed-form replacement frequently widens an i16 induction var
|
||||||
|
|
@ -279,6 +280,7 @@ void W65816PassConfig::addPreEmitPass() {
|
||||||
// collapses when X and Y are renamed to the same slot). See
|
// collapses when X and Y are renamed to the same slot). See
|
||||||
// W65816StackSlotMerge.cpp.
|
// W65816StackSlotMerge.cpp.
|
||||||
addPass(createW65816StackSlotMerge());
|
addPass(createW65816StackSlotMerge());
|
||||||
|
addPass(createW65816StackRelToImg());
|
||||||
}
|
}
|
||||||
|
|
||||||
MachineFunctionInfo *W65816TargetMachine::createMachineFunctionInfo(
|
MachineFunctionInfo *W65816TargetMachine::createMachineFunctionInfo(
|
||||||
|
|
|
||||||