# Session Resume — llvm816 project

Drop this into a new Claude Code session and say "read SESSION_STATE.md and
continue where we left off." Pairs with `LLVM_65816_DESIGN.md` (the design
doc — read that second).

---

## 1. Project in one sentence

Build an open-source LLVM/Clang backend for the WDC 65816 (Apple IIgs) that
matches or exceeds Calypsi's output quality, forked from llvm-mos but
maintained as our own separate W65816 target. User is Scott; expert C dev,
doesn't want hand-holding on LLVM or 65816 basics.

## 2. Where we are in the plan

Design doc section 7 lists a 12-step implementation order. We are at:

- [x] **Setup toolchain** (prior session)
- [x] **Architectural decision: separate W65816 target** (design doc §2.5)
- [x] **Repo-layout decision:** `src/` holds our authored files, `patches/`
  holds modifications to upstream llvm-mos files, `tools/llvm-mos/` is
  ephemeral and gitignored. `scripts/applyBackend.sh` stitches src +
  patches into the clone.
- [x] **Step 1 — scaffold W65816 target directory.** 41 files under
  `src/llvm/lib/Target/W65816/` + 2 files under
  `src/clang/lib/Basic/Targets/`. 4 upstream patches under `patches/`.
- [x] **Step 2 — verify the skeleton fully compiles and links.** All 8
  tablegen generators run clean, three static libs
  (LLVMW65816Info/Desc/CodeGen) build, llc links with the target
  registered, zero warnings in the W65816-local build.
  `./bin/llc -march=w65816 -filetype=null /dev/null` → exit 0.
- [x] **Step 2a — real MC-layer instructions.** `W65816InstrInfo.td` now
  holds ~90 real 65816 opcodes (LDA/STA/LDX/LDY/STX/STY across
  immediate/DP/abs/DPX/AbsX/AbsY/long where applicable; ADC/SBC/CMP/
  AND/ORA/EOR/BIT; INC/DEC/ASL/LSR/ROL/ROR; all transfers; stack
  push/pull; REP/SEP/CLC/SEC/XCE/XBA; branches; JMP/JML/JSR/JSL;
  RTS/RTL/RTI; MVN/MVP). Instructions whose size depends on M or X
  bits exist as `_Imm8`/`_Imm16` pairs carrying the appropriate
  TSFlag bits (MLow/MHigh/XLow/XHigh) for the future REP/SEP pass.
- [x] **Step 2b — wire MCCodeEmitter.** Tablegen `-gen-emitter` runs
  cleanly; `W65816MCCodeEmitter.cpp` calls the tablegen-provided
  `getBinaryCodeForInstr` and emits Size bytes little-endian.
- [x] **Step 2c — symbolic fixups.** Each operand class (imm8/imm16/
  addrDP/addrAbs/addrLong/pcrel8/pcrel16) has its own
  `EncoderMethod` that emits a `W65816::fixup_*` at the correct
  byte offset for expression operands. `W65816AsmBackend::applyFixup`
  patches the data bytes little-endian for resolved fixups and
  defers to `maybeAddReloc` for unresolved ones.
  `W65816ELFObjectWriter::getRelocType` returns placeholder
  relocation numbers 1-5 (swap for canonical `R_W65816_*` names once
  the ELF `EM_` is decided — §7 item 1).
- [x] **Step 2d — patch 0005 eliminates data-layout warning.** The
  data-layout string for `Triple::w65816` lives in
  `llvm/lib/TargetParser/TargetDataLayout.cpp`; `W65816TargetMachine`
  now calls `TT.computeDataLayout()`. Zero warnings in the W65816
  build.
- [x] **Step 2e — AsmParser scaffold.** 441-line
  `AsmParser/W65816AsmParser.cpp` ported from MSP430, stripped of
  register-operand handling (65816 has no MC register operands),
  with width-narrowing predicates on each operand class so the
  matcher picks the narrowest instruction variant the value fits
  (e.g. `sta $10` → STA_DP, `sta $1000` → STA_Abs, `sta $10000` →
  STA_Long). `#` is emitted as a literal token to match the AsmString
  tokenisation. Block-move (MVN/MVP) uses `addrDP` for both bank
  bytes so `mvn $01, $02` parses.
- [x] **Step 2f — operand bit-field wiring.** Every `Inst*` class in
  `W65816InstrFormats.td` now assigns named bitfields into
  `Inst{N-8}` (e.g. `let Inst{15-8} = imm;`). Without this
  tablegen emits an encoder that writes the opcode but leaves the
  operand bytes as zero — we had that bug for an iteration.
- [x] **Step 2g — smoke-test script.** `scripts/smokeTest.sh` checks
  llc registration, empty-module codegen, and llvm-mc encoding of
  a representative instruction mix. Run with `--build` to rebuild
  first.
- [x] **Step 2h — end-to-end ELF object.** `llvm-mc -filetype=obj`
  produces a valid ELF with relocations at correct byte offsets.
  Relocations are placeholder numbers 1-5 (§7 item 1 — decide
  EM_/R_* mapping).
- [x] **Step 2i — Disassembler.** 190-line
  `Disassembler/W65816Disassembler.cpp` tries 1/2/3/4-byte decode
  tables in ascending size order. Custom decoder callbacks for
  imm/addr/pcrel operands wrap raw bits into MCOperands.
  Mode-ambiguous opcodes (LDA/LDX/LDY/ADC/SBC/CMP/AND/ORA/EOR/BIT/CPX/
  CPY immediate forms) are parked in separate
  DecoderTableW65816{MHigh,XHigh}16 tables and the scaffold only
  reads the default tables — so those opcodes always disassemble
  as 3-byte 16-bit-immediate forms until a mode-aware decoder
  lands (alongside REP/SEP).
- [x] **Step 2j — register operands in the AsmParser.** Key fix found
  via round-trip test: tablegen treats `a`, `x`, `y` in AsmStrings
  (e.g. `"inc a"`, `"lda\t$addr, x"`) as references to the real
  register records, so the matcher expects register operands, not
  literal tokens. AsmParser now produces `k_Reg` operands for
  these identifiers. Verified: `inc a` → 0x1A, `lda $1000, x` →
  0xBD,0x00,0x10, full ELF round-trip passes.
- [x] **Step 2k — smoke test covers disassembly.** The smoke test
  now feeds raw bytes through `llvm-mc --disassemble` and checks
  for expected mnemonics, so encoder/decoder asymmetries surface
  immediately.
- [x] **Step 3a — first DAG patterns.** Type-as-mode model (approved).
  `LDAi16imm` pseudo for i16 constants; `RTL` for retglue;
  `emitPrologue` emits canonical `REP #$30`. Mode-dependent
  `_Imm8` variants are `isCodeGenOnly` so the asm matcher never
  picks them.
- [x] **Step 3c — single-arg function calls.** `LowerFormalArguments`
  receives arg 0 in A; `LowerCall` passes arg 0 in A and calls
  through a JSL pseudo that bridges the i16 symbol operand to the
  24-bit operand class of the MC `JSL_Long`. The result comes back
  in A. Multi-arg call lowering still needs a `PUSHA` SDNode +
  SP-unwind sequence — the caller side currently fatals on more
  than one arg.
- [x] **Step 3d — multi-arg via stack (callee side).**
  `LowerFormalArguments` now reads arg 1+ from stack via
  FrameIndex + load. `eliminateFrameIndex` translates LDAfi /
  STAfi / ADCfi / SBCfi / ANDfi / ORAfi / EORfi / CMPfi pseudos
  to their `LDA d,S` etc. counterparts with the offset baked in.
  Stack-relative MC instructions are in place; AsmParser
  recognises the `,s` suffix. Callee-side fully working: a
  `define i16 @sum3(i16 %a, i16 %b, i16 %c)` compiles to
  `clc; adc 4,s; clc; adc 6,s; rtl`.
- [x] **Step 3e — frame-index spill plumbing.** `storeRegToStackSlot`
  and `loadRegFromStackSlot` emit STAfi / LDAfi pseudos so the
  register allocator can spill Acc16 values when needed.
- [x] **Step 3f — multiplications via shifts.** Multiply by power-of-2
  constants inherits the `shl` patterns (1/2/3/4 bits unrolled to
  `asl a` sequences). Multiply by arbitrary constants and
  runtime values fail at ISel pending library functions.
- [x] **Step 3h — clang front end builds.** Real C → 65816 machine
  code via the full `clang -target w65816 -c` pipeline. Bumped
  clang's `IntAlign`/`LongAlign`/`PointerAlign`/`SuitableAlign`
  from 8 to 16; also overrode `allowsMisalignedMemoryAccesses` to
  return true. `scripts/cDemo.sh` shows the full front-end
  pipeline on a built-in 7-function demo. Additional patterns:
  `INC_Abs`/`DEC_Abs` for `*p = *p + 1`; `ASRA16` (PHA;ASL;PLA;ROR
  sequence) for signed shift-right by 1.
- [x] **Step 3i — frame reservation + epilogue.** `emitPrologue`
  now emits `TSC; SEC; SBC #N; TCS` to reserve N bytes for locals
  and spills, then `emitEpilogue` reverses with
  `TSC; CLC; ADC #N; TCS` before the RTL. `eliminateFrameIndex`
  translates FrameIndex operands into stack-relative offsets via
  `disp = FrameOffset + StackSize`. `hasFPImpl` returns false
  (no native frame pointer — direct page would be the logical
  home). This unblocks `clang -O0 -c` for pure-arithmetic functions
  (each arg gets spilled to its own stack slot). Stack-relative
  addressing modes for ADC/SBC/AND/ORA/EOR/CMP let the codegen
  fold loads from frame indices into the carry-arithmetic ops.
- [x] **Step 3g — basic i8 codegen.** Acc8 patterns now cover:
  `LDAi8imm` (constants), `INA_PSEUDO8` / `DEA_PSEUDO8` (inc/dec),
  `ADCi8imm` / `SBCi8imm` (add/sub immediate), `ANDi8imm` /
  `ORAi8imm` / `EORi8imm` (bitwise immediate), `LDA8abs` /
  `STA8abs` (load/store via global), `ASLA8` / `LSRA8` (1-bit
  shifts), `CMPi8imm` (compare against immediate, with BR_CC i8
  lowering). Frame lowering scans the function IR for any i8
  type usage (return, args, instruction values, operands) and
  picks `REP #$10; SEP #$20` prologue when found, else
  `REP #$30`. AsmPrinter masks i8 immediates to 8 bits before
  printing so `i8 -16` shows `0xf0` rather than `0xfff0`.
  Limitations: i8 mode is per-function only — mixed-mode
  functions get the i8 prologue (8-bit A) and i16 ops fail.
  Asm round-trip for i8 still loses M-mode info (the parser
  can't disambiguate `lda #imm` between Imm8 and Imm16); use
  `-filetype=obj` directly from llc to get the right encoding.
- [x] **Step 3b — globals, loads, stores, arithmetic, branches,
  bitwise.** `LowerOperation` custom-lowers `GlobalAddress` and
  `ExternalSymbol` to `W65816Wrapper(target...)`. Pseudo +
  AsmPrinter-expansion family covers:

  - `LDAi16imm`, `LDAabs`, `STAabs` (load/store/materialise via
    Wrapper of global)
  - `ADCi16imm`, `ADCabs`, `SBCi16imm`, `SBCabs` (add/sub with the
    required CLC/SEC carry prefix)
  - `ANDi16imm`, `ORAi16imm`, `EORi16imm` and their `*abs`
    memory-fold variants
  - `CMPi16imm`, `CMPabs` plus `W65816ISD::CMP` / `W65816ISD::BR_CC`
    SDNodes; `LowerBR_CC` swaps constant-on-LHS forms and rewrites
    SETULE/SETUGT/SETLE/SETGT to SETULT/SETUGE/SETLT/SETGE+1 so
    the canonicalised DAG hits our patterns; condition-code map
    covers BEQ/BNE/BCS/BCC plus signed BMI/BPL.
  - `BRA` for unconditional `br`.
  - `INA_PSEUDO` / `DEA_PSEUDO` for `add x, ±1` → `inc a` / `dec a`
  - `ASLA16` / `LSRA16` for `shl x, 1` and `lshr x, 1` → `asl a` /
    `lsr a`
  - `NEGA16` for `0 - x` → `eor #$ffff; inc a`
  - `(xor x, -1)` → `eor #$ffff` (bitwise NOT)
  - Zero-extending byte load: `lda addr; and #$ff`

  The end-to-end pipeline can now compile and assemble functions
  that read/write globals, do arithmetic on them, and branch
  conditionally — all with optimal-looking 65816 idioms (e.g.
  `lda x ; clc ; adc y` for `*x + *y`).
- [ ] **Step 3i — open codegen gaps:**

  1. **Multi-arg call lowering** (caller side). Callee side works;
     caller still bails on >1 arg. Needs PUSHA SDNode + SP-unwind
     in ADJCALLSTACKUP.
  2. **Frame-reserved scratch space.** Prologue doesn't reserve
     stack space for locals/spills, so any alloca'd value or
     allocator-spilled value lands at a negative SP offset and
     eliminateFrameIndex bails. Blocks: -O0 compilation of
     functions with parameters; loops with PHIs that need to
     compare two computed values; two-Acc16 binary ops in
     general. Fix: emit `TSC; CLC; ADC #-N; TCS` (or PHA-loop)
     in emitPrologue and the inverse in emitEpilogue, where N
     is the function's frame size.
  3. **Mixed-mode i8/i16.** Per-function mode only — the prologue
     picks one mode; the other type's ops fail. REP/SEP scheduling
     pass needed.
  4. **Signed `(a - b)` overflow handling.** BMI/BPL based signed
     comparisons are correct only when the subtraction can't
     overflow; pathological values give wrong results.
  5. **`sub imm, var`** and **`mul var, var`** (or non-power-of-2
     constants). Need libcall support.
  6. **SETCC and SELECT_CC i16.** Boolean conversions like
     `(int)(cond != 0)` and `(cond) ? a : b` aren't selectable.
     Custom lowering needed.
  7. **Library functions.** `__mulhi3`, etc. — no runtime yet.
- [ ] **Step 4 — real frame lowering, calling convention, REP/SEP
  scheduling pass.** The prologue `REP #$30` is unconditional;
  the REP/SEP pass will remove it when redundant.
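
The `LowerBR_CC` constant rewrite listed under Step 3b rests on the identity `x <= C` ⟺ `x < C+1` (and `x > C` ⟺ `x >= C+1`), which holds only while `C+1` does not wrap. The following is a minimal Python model of that rewrite, not the backend's C++. The names `rewrite_cc` and `evaluate`, and the choice to return `None` in the wrap case, are illustrative assumptions; this file does not say how `LowerBR_CC` treats a maximal constant.

```python
# Model of the constant-RHS condition-code rewrite; all names are hypothetical.
REWRITES = {"SETULE": "SETULT", "SETUGT": "SETUGE",
            "SETLE": "SETLT", "SETGT": "SETGE"}


def to_signed(v, bits=16):
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def rewrite_cc(cc, c, bits=16):
    """Return (new_cc, new_const), or None when C+1 would wrap."""
    if cc not in REWRITES:
        return cc, c
    mask = (1 << bits) - 1
    unsigned = cc in ("SETULE", "SETUGT")
    limit = mask if unsigned else (mask >> 1)  # UINT_MAX or INT_MAX pattern
    if (c & mask) == limit:
        return None  # x <= MAX is always true, x > MAX is always false
    return REWRITES[cc], (c + 1) & mask


def evaluate(cc, x, c, bits=16):
    """Reference semantics for the eight condition codes used above."""
    if cc in ("SETLE", "SETGT", "SETLT", "SETGE"):
        x, c = to_signed(x, bits), to_signed(c, bits)
    return {"SETULE": x <= c, "SETUGT": x > c,
            "SETULT": x < c, "SETUGE": x >= c,
            "SETLE": x <= c, "SETGT": x > c,
            "SETLT": x < c, "SETGE": x >= c}[cc]
```

For instance, `rewrite_cc("SETGT", 5)` gives `("SETGE", 6)`, while `rewrite_cc("SETULE", 0xFFFF)` gives `None` because that comparison has no `+1` form.
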
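
The width-narrowing behaviour from Step 2e can be modelled in a few lines. This Python sketch assumes the matcher simply takes the first operand class whose range contains the value; `pick_sta_variant` is a hypothetical name, and the real selection happens through operand-class predicates in the AsmParser rather than a function like this.

```python
def pick_sta_variant(value):
    """Choose the narrowest STA addressing form that can hold `value`."""
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("address does not fit in 24 bits")
    if value <= 0xFF:
        return "STA_DP"    # one-byte direct-page address
    if value <= 0xFFFF:
        return "STA_Abs"   # two-byte absolute address
    return "STA_Long"      # three-byte long address
```

The three cases correspond to the `sta $10`, `sta $1000`, and `sta $10000` examples in Step 2e.
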
## 3. What is installed and where

All under `/home/scott/claude/llvm816/tools/`:

| Tool | Path | Notes |
|---|---|---|
| llvm-mos source | `tools/llvm-mos/` | shallow clone. Backend files are symlinked in from `src/`; patches applied on top. Reset cleanly via `scripts/updateLlvmMos.sh`. |
| llvm-mos build dir | `tools/llvm-mos-build/` | cmake-generated, ephemeral |
| llvm-mos-sdk | `tools/llvm-mos-sdk/` | prebuilt toolchain |
| MAME 0.264 | `/usr/games/mame` (apt) | supports `-console` (Lua) |
| Apple IIgs ROMs | `tools/mame/roms/apple2gs.zip`, `apple2gsr1.zip` | from archive.org |
| Calypsi 5.16 | `tools/calypsi/` | extracted .deb |
| ORCA/C source | `tools/orca-c/` | reference only |

`./setup.sh --verify-only` passed all checks as of the prior session.

## 4. Repo layout (current)

```
llvm816/                              # git repo, branch main
├── LLVM_65816_DESIGN.md              # tracked
├── SESSION_STATE.md                  # this file
├── setup.sh                          # tracked
├── scripts/                          # tracked
│   ├── common.sh
│   ├── installDeps.sh  installCalypsi.sh  installOrcaC.sh
│   ├── installLlvmMos.sh             # non-destructive (see §8)
│   ├── installMame.sh  verify.sh
│   ├── applyBackend.sh               # src/ + patches/ -> tools/llvm-mos/
│   └── updateLlvmMos.sh              # reset clone, re-apply backend
├── src/                              # authored files, tracked
│   ├── llvm/lib/Target/W65816/       # 41 files
│   │   ├── MCTargetDesc/  (10 files)
│   │   ├── TargetInfo/    (3 files)
│   │   └── (28 top-level files)
│   └── clang/lib/Basic/Targets/
│       ├── W65816.h
│       └── W65816.cpp
├── patches/                          # unified diffs, tracked
│   ├── 0001-triple-add-w65816-arch.patch
│   ├── 0002-triple-cpp-add-w65816-cases.patch
│   ├── 0003-clang-basic-dispatch-w65816.patch
│   └── 0004-cmake-add-w65816-experimental.patch
├── tools/                            # gitignored, ephemeral
└── .gitignore                        # excludes tools/, .cache/
```

## 5. Key architectural decisions

### 5.1 Separate target, not MOS subtarget feature

llvm-mos has `FeatureW65816` declared in `MOSDevices.td` but codegen
unimplemented (issue #321). We are NOT extending MOS. Reasons:

- We cannot upstream an AI-assisted backend to llvm-mos anyway.
- Clean register model: `Acc8`/`Acc16`/`Idx8`/`Idx16` as separate classes.
- Independent evolution.

Recorded in design doc §2.5.

### 5.2 Symlinks + patches, not a fork

`applyBackend.sh` symlinks every file under `src/` into the corresponding
path under `tools/llvm-mos/`, then applies each `patches/*.patch` with
`git apply`. Idempotent: skips already-current symlinks and
already-applied patches (detected via `git apply --reverse --check`).

`updateLlvmMos.sh` is the ONLY script allowed to destructively reset the
clone. It reverses all patches, removes our symlinks, runs
`git reset --hard FETCH_HEAD`, then re-runs `applyBackend.sh`.

`installLlvmMos.sh` refuses to touch the clone if it is dirty or off
main — this is deliberate to protect applied patches.

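
The idempotent apply described in §5.2 can be sketched as follows. This is a Python model of the idea, not a transcription of `applyBackend.sh` (whose contents are not shown here): `link_tree` and `patch_already_applied` are hypothetical names, and refusing to overwrite a regular file is an assumption about the desired behaviour. The `git apply --reverse --check` call is the detection method the text names.

```python
import os
import subprocess
from pathlib import Path


def link_tree(src_root, dst_root):
    """Symlink every file under src_root into dst_root.

    Returns (new, current): links created vs. links that were already right.
    """
    new = current = 0
    src_root, dst_root = Path(src_root).resolve(), Path(dst_root)
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        dst = dst_root / src.relative_to(src_root)
        if dst.is_symlink() and Path(os.readlink(dst)) == src:
            current += 1              # already points at our file: skip
            continue
        if dst.is_symlink():
            dst.unlink()              # stale link: replace it
        elif dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.symlink_to(src)
        new += 1
    return new, current


def patch_already_applied(repo, patch):
    """True when the patch reverses cleanly, i.e. it is already applied."""
    result = subprocess.run(
        ["git", "-C", str(repo), "apply", "--reverse", "--check",
         str(Path(patch).resolve())],
        capture_output=True)
    return result.returncode == 0
```

Running `link_tree` twice on the same trees reports every link as current on the second pass, which is the shape of the "0 new, N current" summary quoted in §10.
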
## 6. Concrete next actions (in order)

### 6.1 Function arguments

`LowerFormalArguments` and `LowerCall` still fatal-error. Without
arguments, every function we test has to use globals as inputs. The
plan: pass i8/i16 args via the stack (push right-to-left, caller
cleans), with the first 1-2 args optionally going in A or X for
register-passing. Calypsi output is the reference for ABI choices.

### 6.2 i8 codegen

Currently every function gets `REP #$30` (16-bit mode). For i8 ops
we need either:

- A scan-and-prepend approach: if the function has any i8 op, emit
  `SEP #$20` after the REP for whichever mode dominates, plus
  toggle pseudos around the off-mode regions.
- Or commit to widening all i8 to i16 pre-ISel (simpler, but uses 2x
  the cycles for byte-heavy code).

This is the natural lead-in to the REP/SEP scheduling pass (§6.4).

### 6.3 Frame indices, stack locals

Add `eliminateFrameIndex` and frame-pointer pseudos so we can spill
to the stack. Today `W65816RegisterInfo::eliminateFrameIndex` is
`llvm_unreachable`. Stack accesses on 65816 are `,s` and `(,s),y`
indirect — needs new operand classes.

### 6.4 REP/SEP scheduling pass

The core algorithmic work. TSFlag bits on every mode-dependent
instruction are already in place; the pass walks MIR, dataflows the
required mode per region, and inserts/removes REP/SEP transitions to
minimise total mode switches. Design doc §3.3.

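
To make the transition-insertion idea concrete, here is a deliberately naive Python sketch for one straight-line block. It is not the design from design doc §3.3: it switches lazily, only when an instruction's required width differs from the tracked state, and does no cross-block dataflow or merging of adjacent switches. The tuple encoding (`0` = 16-bit, `1` = 8-bit, `None` = don't care) is an assumed stand-in for the MLow/MHigh/XLow/XHigh TSFlags, and `insert_mode_switches` is a hypothetical name.

```python
M_BIT, X_BIT = 0x20, 0x10  # status-register bits: accumulator / index width


def insert_mode_switches(instrs, entry=(0, 0)):
    """Insert REP/SEP before instructions whose required width differs.

    instrs: list of (text, need_m, need_x); need_* is 0 (16-bit),
    1 (8-bit) or None (don't care). entry: (m, x) state on block entry.
    """
    m, x = entry
    out = []
    for text, need_m, need_x in instrs:
        rep = sep = 0
        if need_m is not None and need_m != m:
            if need_m:
                sep |= M_BIT   # SEP sets the bit: 8-bit accumulator
            else:
                rep |= M_BIT   # REP clears the bit: 16-bit accumulator
            m = need_m
        if need_x is not None and need_x != x:
            if need_x:
                sep |= X_BIT
            else:
                rep |= X_BIT
            x = need_x
        if rep:
            out.append(f"rep #${rep:02x}")
        if sep:
            out.append(f"sep #${sep:02x}")
        out.append(text)
    return out
```

Don't-care instructions never force a switch, so a run of 16-bit operations separated by `clc` stays in one mode. A real pass would also have to reconcile the mode at block joins and call boundaries.
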
### 6.2 Wire frame lowering + calling convention (real)

`W65816FrameLowering.cpp` is still `llvm_unreachable`. The simplest
working version: establish an i16 stack pointer-based frame using the
native SP, locals accessed via stack-relative indirect via Y. Calypsi
output for a trivial function is a good model.

`W65816CallingConv.td` covers i8/i16 return in A but nothing for
arguments. Start with stack-based (push right-to-left, caller cleans)
per design doc §3.5.

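
The stack-relative displacements that fall out of this convention can be worked through in a short Python sketch. It assumes arg 0 travels in A, every stack argument is a 2-byte i16 pushed right-to-left, the call is a `JSL` (3-byte return address), and S points one byte below the last pushed byte. The frame offsets are inferred from the `sum3` listing in §2 rather than read from the backend, and `stack_arg_disp` is a hypothetical helper.

```python
RETURN_ADDR_BYTES = 3  # JSL pushes a 24-bit return address


def stack_arg_disp(arg_index, frame_size=0, arg_bytes=2):
    """Displacement d for `lda d,s` that reaches stack argument arg_index.

    frame_size is the number of bytes the prologue reserved (N in
    `TSC; SEC; SBC #N; TCS`), matching disp = FrameOffset + StackSize.
    """
    if arg_index < 1:
        raise ValueError("argument 0 travels in A, not on the stack")
    # 1,s..3,s hold the return address, so the first stack argument is at 4,s.
    frame_offset = RETURN_ADDR_BYTES + 1 + (arg_index - 1) * arg_bytes
    return frame_offset + frame_size
```

With no reserved frame this yields 4 and 6 for args 1 and 2, matching `adc 4,s` and `adc 6,s` in the `sum3` output. Reserving 4 bytes of locals moves the same arguments to 8 and 10.
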
### 6.3 Disassembler mode-aware decoding (deferred)

The scaffold disassembler always decodes LDA/LDX/LDY/ADC/SBC/CMP/AND/
ORA/EOR/BIT/CPX/CPY immediate forms as 3-byte 16-bit-immediate
variants. A real decoder should track the M/X bits across the stream
(consuming REP/SEP, XCE transitions) and choose between
`DecoderTableW65816` (default) and `DecoderTableW65816{MHigh,XHigh}16`
per instruction. Naturally pairs with the REP/SEP codegen pass since
both need the same M/X tracking model.

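
The M/X tracking model can be sketched in Python as a length-only decoder. This is an illustration under simplifying assumptions, not the planned disassembler: it knows only the twelve mode-ambiguous immediate opcodes, REP/SEP, and four one-byte opcodes, it ignores XCE, and it starts in 16-bit mode to match the scaffold's current default. `instruction_lengths` is a hypothetical name.

```python
M_IMM = {0xA9, 0x69, 0xE9, 0xC9, 0x29, 0x09, 0x49, 0x89}  # LDA ADC SBC CMP AND ORA EOR BIT
X_IMM = {0xA2, 0xA0, 0xE0, 0xC0}                          # LDX LDY CPX CPY
ONE_BYTE = {0xEA, 0x6B, 0x18, 0x38}                       # NOP RTL CLC SEC
REP, SEP = 0xC2, 0xE2


def instruction_lengths(code, m8=False, x8=False):
    """Return each instruction's byte length, tracking M/X via REP/SEP."""
    lengths = []
    i = 0
    while i < len(code):
        op = code[i]
        if op in (REP, SEP):
            mask = code[i + 1]
            if mask & 0x20:
                m8 = (op == SEP)   # SEP sets M (8-bit A), REP clears it
            if mask & 0x10:
                x8 = (op == SEP)   # same for the index-width bit
            n = 2
        elif op in M_IMM:
            n = 2 if m8 else 3     # immediate width follows M
        elif op in X_IMM:
            n = 2 if x8 else 3     # immediate width follows X
        elif op in ONE_BYTE:
            n = 1
        else:
            raise ValueError(f"opcode {op:#04x} not covered by this sketch")
        lengths.append(n)
        i += n
    return lengths
```

A linear walk like this cannot know the mode at a branch target or function entry, which is one reason the real decoder is deferred until the same tracking model exists for codegen.
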
### 6.4 REP/SEP scheduling pass

The core algorithmic work (design doc §3.3). Every real instruction
now carries TSFlag bits indicating which M/X mode it requires. The
pass reads those, does the width-inference / coalescing / transition
insertion dataflow, and emits REP/SEP instructions at block
boundaries. Plan to spend multiple sessions.

### 6.5 Tidy-ups (can happen in any order)

- Decide ELF `EM_` value (§7 item 1). Currently `EM_NONE`, with
  placeholder relocation numbers 1-5 in `W65816ELFObjectWriter`.
  Swap for canonical `R_W65816_*` names once chosen.
- Replace ASCII-art mnemonics (`inc a`, `dec a`, `asl a`, etc.) with
  proper InstAliases so both `INA` and `INC A` assemble to the same
  opcode. Requires AsmParser (§6.3).

## 7. Open design questions flagged by the scaffold

1. **ELF `EM_` machine number.** `W65816ELFObjectWriter.cpp` uses
   `ELF::EM_NONE` as a placeholder. llvm-mos uses `EM_MOS = 0x1966` for
   the 6502 family. Decide: share `EM_MOS`, or pick a new value?
2. **Data layout string** is hardcoded in
   `W65816TargetMachine.cpp` rather than routing through
   `Triple::computeDataLayout()`. That is OK for now — when we're
   ready to consolidate, add a case in `TargetDataLayout.cpp` and
   switch to `TT.computeDataLayout()`.
3. **i32 return convention** — does i32 return in A:X or via a hidden
   pointer? Currently `W65816CallingConv.td` only handles i8/i16. Design
   doc §3.5 says "A:X for 32-bit" but this isn't modelled yet.
4. **Register aliasing for mode-dependent widths.** `Acc8` and `Acc16`
   both contain physical register `A`. LLVM's allocator will not cope
   with this correctly. The REP/SEP management pass (§3.3) is required.
   Flagged per the design doc.
5. **Open questions from design doc §8** (GS/OS DP reservation, bank
   memory model, interrupt ABI, ORCA/C ABI compat, width-contract
   attribute, MAME cycle accuracy) — still unresolved. Punt until after
   we have a working instruction set.

## 8. Gotchas + hard-won knowledge

- **`installLlvmMos.sh` is non-destructive now.** It refuses to reset
  the clone if it is dirty or off main. Use `scripts/updateLlvmMos.sh`
  to refresh (the only script allowed to reset).
- **MAME `-console` flag** is listed by `-showusage`, NOT `-help`.
- **`log()` in `common.sh` writes to stderr.** Don't change it.
- **llvm-mos has `FeatureW65816` but not working codegen** (issue #321).
- **`RemapAllTargetPseudoPointerOperands<PtrRegs>` is required** in
  `W65816.td` or tablegen fails with 8 "missing target override for
  pseudoinstruction using PointerLikeRegClass" errors. Don't remove it.
- **`Triple::w65816` placement in Triple.h:** inserted right after
  `mos,` to keep the 65xx family clustered. See patch 0001.
- **Added to `LLVM_ALL_EXPERIMENTAL_TARGETS`** in `llvm/CMakeLists.txt`
  so `-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=all` picks up W65816. Not
  strictly required — passing the name explicitly also works.
- **Operand `OperandType` field wants LLVM's enum spelling**, not
  shortened. Use `OPERAND_IMMEDIATE`, `OPERAND_MEMORY`, `OPERAND_PCREL`
  (see `llvm/include/llvm/MC/MCInstrDesc.h` MCOI::OperandType).
  `OPERAND_IMM` / `OPERAND_IMM8` / `OPERAND_IMM16` are NOT valid.
- **`PrintMethod` signature for PC-rel operands takes `Address`.**
  Tablegen generates `printPCRel8(MI, Address, OpNo, O)` — 4 args, not
  3. Non-PC-rel PrintMethods use the 3-arg form `(MI, OpNo, O)`.
- **Several `.cpp` files needed explicit `#include`s beyond what MSP430
  ships with** because the tablegen-generated `.inc` references full
  types: `W65816RegisterInfo.cpp` needs `W65816Subtarget.h` and
  `W65816FrameLowering.h` (for `GET_REGINFO_TARGET_DESC`);
  `W65816InstPrinter.cpp` needs `llvm/MC/MCAsmInfo.h` (for
  `MAI.printExpr`).
- **Marker classes can't override mayLoad/mayStore via `let`.**
  TableGen's multi-inheritance doesn't let unrelated sibling classes
  touch fields from the base `Instruction`. Use
  `let isReturn = 1, ... in { ... }` blocks at def sites instead
  (idiomatic LLVM style).
- **Data layout is hardcoded** in `W65816TargetMachine.cpp` rather than
  computed from `TT.computeDataLayout()`, because `TargetDataLayout.cpp`
  doesn't have a case for `w65816` yet. This produces one `-Wswitch`
  warning in the llvm-mos build. §6.5 notes adding a 5th patch to
  silence it.

## 9. Disk space recovery

If space is tight before resume:

```
# safe to delete — regenerable from setup.sh + applyBackend.sh:
rm -rf /home/scott/claude/llvm816/tools/
rm -rf /home/scott/claude/llvm816/.cache/
```

Regenerate with `./setup.sh` then `./scripts/applyBackend.sh`.

The `tools/llvm-mos-build/` directory alone is ~2 GB after a full
configure+tablegen. A full ninja build will be much more.

## 10. Quick verification commands for resume

```
# Verify the scaffold is in place:
ls src/llvm/lib/Target/W65816/ | wc -l   # expect ~20 top-level files
ls patches/                              # expect 4 .patch files

# Verify apply is clean:
./scripts/applyBackend.sh   # expect 0 new, 44 current symlinks; 0 new, 4 applied patches

# Verify cmake configures:
cmake -S tools/llvm-mos/llvm -B tools/llvm-mos-build -G Ninja \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_TARGETS_TO_BUILD="" \
      -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="MOS;W65816" \
      -DLLVM_ENABLE_PROJECTS="clang" \
      -DLLVM_INCLUDE_TESTS=OFF -DLLVM_INCLUDE_EXAMPLES=OFF \
      -DLLVM_INCLUDE_BENCHMARKS=OFF

# Verify full build + llc registration (slow first time, cached after):
( cd tools/llvm-mos-build && ninja LLVMW65816Info LLVMW65816Desc LLVMW65816CodeGen llc )
./tools/llvm-mos-build/bin/llc --version | grep w65816
./tools/llvm-mos-build/bin/llc -march=w65816 -filetype=null /dev/null ; echo $?
# Expect: grep matches; llc exits 0.
```

## 11. Files changed this session (not yet committed by user)

```
scripts/applyBackend.sh               # idempotent src+patches apply
scripts/updateLlvmMos.sh              # safe reset+reapply
scripts/installLlvmMos.sh             # no longer destructively resets
scripts/smokeTest.sh                  # regression smoke test
src/llvm/lib/Target/W65816/           # full MC layer + first codegen:
                                      #   CodeGen scaffolds (~40 files)
                                      #   AsmParser/ (2 files)
                                      #   Disassembler/ (2 files)
                                      #   MCTargetDesc/ (11 files)
                                      #   TargetInfo/ (3 files)
                                      #   ~90 real instruction defs
                                      #   ~25 codegen pseudos +
                                      #     AsmPrinter expansion
src/clang/lib/Basic/Targets/W65816.{h,cpp}
patches/0001..0005.patch              # upstream llvm-mos mods
SESSION_STATE.md                      # this file
```

The tools/ tree is all ephemeral (gitignored).

### What now works end-to-end

Try it yourself:

```
./scripts/cDemo.sh                    # built-in demo
./scripts/cDemo.sh path/to/your.c
```

Sample output for the built-in demo (real C → real 65816):

```
get_counter:      lda counter ; rtl
set_counter:      sta counter ; rtl
sum_with_target:  clc ; adc target ; rtl
doubler:          asl a ; rtl
half:             lsr a ; rtl
reset:            lda #0 ; sta counter ; rtl
answer:           lda #42 ; rtl
```

### Detail: command-line invocations

```
# Round-trip asm -> bytes -> asm:
echo ' lda #0x1234' | ./bin/llvm-mc -arch=w65816 -show-encoding
# -> lda #0x1234 ; encoding: [0xa9,0x34,0x12]

echo '0xea 0xa9 0x34 0x12 0x6b' | ./bin/llvm-mc --disassemble --triple=w65816
# -> nop ; lda #0x1234 ; rtl

# Full asm -> ELF -> disasm:
./bin/llvm-mc -arch=w65816 -filetype=obj foo.s -o foo.o
./bin/llvm-objdump --triple=w65816 -d foo.o

# Real codegen. This .ll compiles cleanly:
@x = global i16 0
@y = global i16 0
define i16 @fib_step() {
  %a = load i16, ptr @x
  %b = load i16, ptr @y
  %s = add i16 %a, %b
  store i16 %a, ptr @y
  store i16 %s, ptr @x
  ret i16 %s
}
# llc emits idiomatic 65816:
#   rep #0x30
#   lda x; clc; adc y     ; A = a + b
#   sta x                 ; x = a + b
#   ...
```

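
The byte order in the `-show-encoding` output follows from how the encoder lays out the instruction word. A minimal Python model, assuming the layout described in Steps 2b and 2f (opcode in `Inst{7-0}`, operand in the bits above it, `Size` bytes emitted little-endian); `encode` is a hypothetical helper, not the code emitter's API.

```python
def encode(opcode, operand, size):
    """Pack opcode and operand into one word, then emit it little-endian."""
    word = opcode | (operand << 8)   # opcode in bits 7-0, operand above it
    return word.to_bytes(size, "little")
```

`encode(0xA9, 0x1234, 3)` yields the bytes `a9 34 12` shown above for `lda #0x1234`. An operand too wide for `size` makes `to_bytes` raise `OverflowError` instead of truncating silently.
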
### What doesn't work yet

- **Multi-arg calls** (caller side). Callee accepts stack-passed
  args; the matching push side is unimplemented. Functions with
  more than one arg can be defined and compile correctly, but
  cannot be called from another function.
- **Two-Acc16 cmp.** Loops with PHIs that need to compare two
  computed values fail at ISel — only one A.
- **i8 ops** (always 16-bit mode for now).
- **Signed overflow** in CMP-based branches: BMI/BPL test the N flag
  of the subtraction, which is incorrect when the subtract overflows.
- **`mul var, var`** (or by non-power-of-2 constants). Needs library
  functions (`__mulhi3` etc.).
- **`sub imm, var`** (only `sub var, imm` works).

See §6.1-§6.4 for the next steps.
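
The signed-overflow item can be checked with a small Python model of a 16-bit subtraction. It shows that the N flag alone gives the wrong answer when the subtraction overflows, while N xor V does not. The helper names are hypothetical. Note that `CMP` on the 65816 updates only N, Z and C, so a sequence that needs V would have to use `SEC; SBC` or an equivalent; how the backend should fix this is left open here.

```python
def to_signed16(v):
    """Reinterpret a 16-bit pattern as a two's-complement value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def sub16_flags(a, b):
    """16-bit a - b. Returns (result, N, V) as the ALU would set them."""
    r = (a - b) & 0xFFFF
    n = r >> 15
    # Overflow: operands differ in sign and the result's sign differs from a's.
    v = (((a ^ b) & (a ^ r)) >> 15) & 1
    return r, n, v


def signed_lt_naive(a, b):
    """What a bare BMI tests: the sign of the difference."""
    _, n, _ = sub16_flags(a, b)
    return n == 1


def signed_lt(a, b):
    """Correct signed less-than: N xor V."""
    _, n, v = sub16_flags(a, b)
    return (n ^ v) == 1
```

With `a = 0x7FFF` (32767) and `b = 0xFFFF` (-1) the difference is `0x8000`, so the naive test claims `a < b`; the N xor V form correctly reports that it is not.
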