Scott Duensing 873eab4922 Checkpoint.

2026-04-25 17:07:28 -05:00

Session Resume — llvm816 project

Drop this into a new Claude Code session and say "read SESSION_STATE.md and continue where we left off." Pairs with LLVM_65816_DESIGN.md (the design doc — read that second).

1. Project in one sentence

Build an open-source LLVM/Clang backend for the WDC 65816 (Apple IIgs) that matches or exceeds Calypsi's output quality, forked from llvm-mos but maintained as our own separate W65816 target. User is Scott; expert C dev, doesn't want hand-holding on LLVM or 65816 basics.

2. Where we are in the plan

Design doc section 7 lists a 12-step implementation order. We are at:

Setup toolchain (prior session)
Architectural decision: separate W65816 target (design doc §2.5)
Repo-layout decision: src/ holds our authored files, patches/ holds modifications to upstream llvm-mos files, tools/llvm-mos/ is ephemeral and gitignored. scripts/applyBackend.sh stitches src + patches into the clone.
Step 1 — scaffold W65816 target directory. 41 files under src/llvm/lib/Target/W65816/ + 2 files under src/clang/lib/Basic/Targets/. 4 upstream patches under patches/.
Step 2 — verify the skeleton fully compiles and links. All 8 tablegen generators run clean, three static libs (LLVMW65816Info/Desc/CodeGen) build, llc links with the target registered, zero warnings in the W65816-local build. ./bin/llc -march=w65816 -filetype=null /dev/null → exit 0.
Step 2a — real MC-layer instructions. W65816InstrInfo.td now holds ~90 real 65816 opcodes (LDA/STA/LDX/LDY/STX/STY across immediate/DP/abs/DPX/AbsX/AbsY/long where applicable; ADC/SBC/CMP/AND/ORA/EOR/BIT; INC/DEC/ASL/LSR/ROL/ROR; all transfers; stack push/pull; REP/SEP/CLC/SEC/XCE/XBA; branches; JMP/JML/JSR/JSL; RTS/RTL/RTI; MVN/MVP). Instructions whose size depends on M or X bits exist as _Imm8/_Imm16 pairs carrying the appropriate TSFlag bits (MLow/MHigh/XLow/XHigh) for the future REP/SEP pass.
Step 2b — wire MCCodeEmitter. Tablegen -gen-emitter runs cleanly; W65816MCCodeEmitter.cpp calls the tablegen-provided getBinaryCodeForInstr and emits Size bytes little-endian.
Step 2c — symbolic fixups. Each operand class (imm8/imm16/addrDP/addrAbs/addrLong/pcrel8/pcrel16) has its own EncoderMethod that emits a W65816::fixup_* at the correct byte offset for expression operands. W65816AsmBackend::applyFixup patches the data bytes little-endian for resolved fixups and defers to maybeAddReloc for unresolved ones. W65816ELFObjectWriter::getRelocType returns placeholder relocation numbers 1-5 (swap for canonical R_W65816_* names once the ELF EM_ is decided — §7 item 1).
Step 2d — patch 0005 eliminates data-layout warning. The data-layout string for Triple::w65816 lives in llvm/lib/TargetParser/TargetDataLayout.cpp; W65816TargetMachine now calls TT.computeDataLayout(). Zero warnings in the W65816 build.
Step 2e — AsmParser scaffold. 441-line AsmParser/W65816AsmParser.cpp ported from MSP430, stripped of register-operand handling (65816 has no MC register operands), with width-narrowing predicates on each operand class so the matcher picks the narrowest instruction variant the value fits (e.g. sta $10 → STA_DP, sta $1000 → STA_Abs, sta $10000 → STA_Long). # is emitted as a literal token to match the AsmString tokenisation. Block-move (MVN/MVP) uses addrDP for both bank bytes so mvn $01, $02 parses.
Step 2f — operand bit-field wiring. Every Inst* class in W65816InstrFormats.td now assigns named bitfields into Inst{N-8} (e.g. let Inst{15-8} = imm;). Without this tablegen emits an encoder that writes the opcode but leaves the operand bytes as zero — we had that bug for an iteration.
Step 2g — smoke-test script. scripts/smokeTest.sh checks llc registration, empty-module codegen, and llvm-mc encoding of a representative instruction mix. Run with --build to rebuild first.
Step 2h — end-to-end ELF object. llvm-mc -filetype=obj produces a valid ELF with relocations at correct byte offsets. Relocations are placeholder numbers 1-5 (§7 item 1 — decide EM_/R_* mapping).
Step 2i — Disassembler. 190-line Disassembler/W65816Disassembler.cpp tries 1/2/3/4-byte decode tables in ascending size order. Custom decoder callbacks for imm/addr/pcrel operands wrap raw bits into MCOperands. Mode-ambiguous opcodes (LDA/LDX/LDY/ADC/SBC/CMP/AND/ORA/EOR/BIT/CPX/CPY immediate forms) are parked in separate DecoderTableW65816{MHigh,XHigh}16 tables and the scaffold only reads the default tables — so those opcodes always disassemble as 3-byte 16-bit-immediate forms until a mode-aware decoder lands (alongside REP/SEP).
Step 2j — register operands in the AsmParser. Key fix found via round-trip test: tablegen treats a, x, y in AsmStrings (e.g. "inc a", "lda\t$addr, x") as references to the real register records, so the matcher expects register operands, not literal tokens. AsmParser now produces k_Reg operands for these identifiers. Verified: inc a → 0x1A, lda $1000, x → 0xBD,0x00,0x10, full ELF round-trip passes.
Step 2k — smoke test covers disassembly. The smoke test now feeds raw bytes through llvm-mc --disassemble and checks for expected mnemonics, so encoder/decoder asymmetries surface immediately.
Step 3a — first DAG patterns. Type-as-mode model (approved). LDAi16imm pseudo for i16 constants; RTL for retglue; emitPrologue emits canonical REP #$30. Mode-dependent _Imm8 variants are isCodeGenOnly so the asm matcher never picks them.
Step 3c — single-arg function calls. LowerFormalArguments receives arg 0 in A; LowerCall passes arg 0 in A and emits the call through a JSL pseudo, which bridges the i16 symbol operand to the MC JSL_Long's 24-bit operand class. The result comes back in A. Multi-arg call lowering still needs a PUSHA SDNode plus an SP unwind sequence; the caller side currently fatals on more than one argument.
Step 3d — multi-arg via stack (callee side). LowerFormalArguments now reads arg 1+ from stack via FrameIndex + load. eliminateFrameIndex translates LDAfi / STAfi / ADCfi / SBCfi / ANDfi / ORAfi / EORfi / CMPfi pseudos to their LDA d,S etc. counterparts with the offset baked in. Stack-relative MC instructions are in place; AsmParser recognises the ,s suffix. Callee-side fully working: a define i16 @sum3(i16 %a, i16 %b, i16 %c) compiles to clc; adc 4,s; clc; adc 6,s; rtl.
Step 3e — frame-index spill plumbing. storeRegToStackSlot and loadRegFromStackSlot emit STAfi / LDAfi pseudos so the register allocator can spill Acc16 values when needed.
Step 3f — multiplications via shifts. Multiply by power-of-2 constants inherits the shl patterns (1/2/3/4 bits unrolled to asl a sequences). Multiply by arbitrary constants and runtime values fail at ISel pending library functions.
Step 3h — clang front end builds. Real C → 65816 machine code via the full clang -target w65816 -c pipeline. Bumped clang's IntAlign/LongAlign/PointerAlign/SuitableAlign from 8 to 16; also overrode allowsMisalignedMemoryAccesses to return true. scripts/cDemo.sh shows the full front-end pipeline on a built-in 7-function demo. Additional patterns: INC_Abs/DEC_Abs for *p = *p + 1; ASRA16 (PHA;ASL;PLA;ROR sequence) for signed shift-right by 1.
Step 3i — frame reservation + epilogue. emitPrologue now emits TSC; SEC; SBC #N; TCS to reserve N bytes for locals and spills, then emitEpilogue reverses with TSC; CLC; ADC #N; TCS before the RTL. eliminateFrameIndex translates FrameIndex operands into stack-relative offsets via disp = FrameOffset + StackSize. hasFPImpl returns false (no native FP — direct page would be the logical home). This unblocks clang -O0 -c for pure-arithmetic functions (each arg gets spilled to its own stack slot). Stack-relative addressing modes for ADC/SBC/AND/ORA/EOR/CMP let the codegen fold loads from frame indices into the carry-arithmetic ops.
Step 3g — basic i8 codegen. Acc8 patterns now cover: LDAi8imm (constants), INA_PSEUDO8 / DEA_PSEUDO8 (inc/dec), ADCi8imm / SBCi8imm (add/sub immediate), ANDi8imm / ORAi8imm / EORi8imm (bitwise immediate), LDA8abs / STA8abs (load/store via global), ASLA8 / LSRA8 (1-bit shifts), CMPi8imm (compare against immediate, with BR_CC i8 lowering). Frame lowering scans the function IR for any i8 type usage (return, args, instruction values, operands) and picks REP #$10; SEP #$20 prologue when found, else REP #$30. AsmPrinter masks i8 immediates to 8 bits before printing so i8 -16 shows 0xf0 rather than 0xfff0. Limitations: i8 mode is per-function only — mixed-mode functions get the i8 prologue (8-bit A) and i16 ops fail. Asm round-trip for i8 still loses M-mode info (the parser can't disambiguate lda #imm between Imm8 and Imm16); use -filetype=obj directly from llc to get the right encoding.
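
As a rough illustration of the Step 2c behaviour (resolved fixups patched into the data bytes little-endian), here is a small Python model. It is not the backend's C++; the byte widths per operand class are an assumption inferred from the class names, and `apply_fixup` is a hypothetical stand-in for W65816AsmBackend::applyFixup.

```python
# Byte widths per operand class. The class names come from Step 2c; the
# widths are an ASSUMPTION inferred from the names. pcrel8/pcrel16 are left
# out because they are signed and PC-relative.
FIXUP_WIDTH = {"imm8": 1, "imm16": 2, "addrDP": 1, "addrAbs": 2, "addrLong": 3}


def apply_fixup(data: bytearray, offset: int, kind: str, value: int) -> None:
    """Patch a resolved fixup value into data, little-endian, in place."""
    width = FIXUP_WIDTH[kind]
    if not 0 <= value < (1 << (8 * width)):
        raise ValueError(f"{value:#x} does not fit in {kind}")
    data[offset:offset + width] = value.to_bytes(width, "little")


# `sta $1000` with its operand still zeroed: opcode 0x8D plus two data bytes.
code = bytearray([0x8D, 0x00, 0x00])
apply_fixup(code, 1, "addrAbs", 0x1000)
print(code.hex(" "))  # 8d 00 10
```

The fixup offset is 1 because the opcode byte comes first; unresolved fixups would instead become relocations at that same offset.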

Step 3b — globals, loads, stores, arithmetic, branches, bitwise. LowerOperation custom-lowers GlobalAddress and ExternalSymbol to W65816Wrapper(target...). Pseudo + AsmPrinter-expansion family covers:

- `LDAi16imm`, `LDAabs`, `STAabs` (load/store/materialise via
  Wrapper of global)
- `ADCi16imm`, `ADCabs`, `SBCi16imm`, `SBCabs` (add/sub with the
  required CLC/SEC carry prefix)
- `ANDi16imm`, `ORAi16imm`, `EORi16imm` and their `*abs`
  memory-fold variants
- `CMPi16imm`, `CMPabs` plus `W65816ISD::CMP` / `W65816ISD::BR_CC`
  SDNodes; `LowerBR_CC` swaps constant-on-LHS forms and rewrites
  SETULE/SETUGT/SETLE/SETGT to SETULT/SETUGE/SETLT/SETGE+1 so
  the canonicalised DAG hits our patterns; condition-code map
  covers BEQ/BNE/BCS/BCC plus signed BMI/BPL.
- `BRA` for unconditional `br`.
- `INA_PSEUDO` / `DEA_PSEUDO` for `add x, ±1` → `inc a` / `dec a`
- `ASLA16` / `LSRA16` for `shl x, 1` and `lshr x, 1` → `asl a` /
  `lsr a`
- `NEGA16` for `0 - x` → `eor #$ffff; inc a`
- `(xor x, -1)` → `eor #$ffff` (bitwise NOT)
- Zero-extending byte load: `lda addr; and #$ff`

The end-to-end pipeline can now compile and assemble functions
that read/write globals, do arithmetic on them, and branch
conditionally — all with optimal-looking 65816 idioms (e.g.
`lda x ; clc ; adc y` for `*x + *y`).
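
The `LowerBR_CC` rewrite listed above (SETULE/SETUGT/SETLE/SETGT turned into SETULT/SETUGE/SETLT/SETGE against constant + 1) can be sanity-checked with a small Python model. This is a sketch, not the backend code: `canonicalise` and `evaluate` are hypothetical names, and refusing the rewrite when constant + 1 would wrap is an assumption about how the edge case should be treated, not something this file states.

```python
MASK = 0xFFFF
REWRITE = {"SETULE": "SETULT", "SETUGT": "SETUGE",
           "SETLE": "SETLT", "SETGT": "SETGE"}


def to_signed(v: int) -> int:
    return v - 0x10000 if v & 0x8000 else v


def canonicalise(cc: str, c: int):
    """Rewrite `x <cc> c` to the +1 form; None if c + 1 would wrap."""
    if cc not in REWRITE:
        return cc, c
    limit = 0xFFFF if cc.startswith("SETU") else 0x7FFF
    if c == limit:
        return None  # ASSUMPTION: leave this case to some other lowering
    return REWRITE[cc], (c + 1) & MASK


def evaluate(cc: str, x: int, c: int) -> bool:
    """Evaluate a 16-bit comparison by condition-code name."""
    if cc.startswith("SETU"):
        a, b, op = x & MASK, c & MASK, cc[4:]
    else:
        a, b, op = to_signed(x & MASK), to_signed(c & MASK), cc[3:]
    return {"LT": a < b, "LE": a <= b, "GT": a > b, "GE": a >= b}[op]


def check_equivalence() -> bool:
    """Compare original and rewritten forms over boundary values."""
    samples = [0, 1, 0x7FFE, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF]
    for cc in REWRITE:
        for c in samples:
            rewritten = canonicalise(cc, c)
            if rewritten is None:
                continue
            new_cc, new_c = rewritten
            if any(evaluate(cc, x, c) != evaluate(new_cc, x, new_c)
                   for x in samples):
                return False
    return True
```

`check_equivalence()` walks the boundary values where an off-by-one or sign mix-up would show up first.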

Step 3i — open codegen gaps:

1. **Multi-arg call lowering** (caller side).  Callee side works;
   caller still bails on >1 arg.  Needs PUSHA SDNode + SP-unwind
   in ADJCALLSTACKUP.
2. **Frame-reserved scratch space.**  Addressed in part by the
   frame reservation in emitPrologue / emitEpilogue described
   above (`TSC; SEC; SBC #N; TCS` and its inverse, where N is
   the function's frame size), which unblocks `clang -O0 -c`
   for pure-arithmetic functions.  Before that, any alloca'd
   or allocator-spilled value landed at a negative SP offset
   and eliminateFrameIndex bailed.  Still open: loops with
   PHIs that need to compare two computed values, and
   two-Acc16 binary ops in general.
3. **Mixed-mode i8/i16.**  Per-function mode only — the prologue
   picks one mode; the other type's ops fail.  REP/SEP scheduling
   pass needed.
4. **Signed `(a - b)` overflow handling.**  BMI/BPL based signed
   comparisons are correct only when the subtraction can't
   overflow; pathological values give wrong results.
5. **`sub imm, var`** and **`mul var, var`** (or non-power-of-2
   constants).  Need libcall support.
6. **SETCC and SELECT_CC i16.**  Boolean conversions like
   `(int)(cond != 0)` and `(cond) ? a : b` aren't selectable.
   Custom lowering needed.
7. **Library functions.**  `__mulhi3`, etc. — no runtime yet.
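
Gap 4 is easy to reproduce on paper. The Python model below is an illustrative sketch with hypothetical function names; it computes the N and V flags of a 16-bit subtraction and compares an N-only "less than" with the usual N xor V test. On the 65816, CMP does not update V, so a real fix would need SEC/SBC or some other sequence. The model only shows why N alone is insufficient.

```python
def sub16_flags(a: int, b: int):
    """Return (N, V) for the 16-bit subtraction a - b."""
    a &= 0xFFFF
    b &= 0xFFFF
    r = (a - b) & 0xFFFF
    n = bool(r & 0x8000)
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the minuend's.
    v = bool((a ^ b) & (a ^ r) & 0x8000)
    return n, v


def signed_lt_n_only(a: int, b: int) -> bool:
    """What a BMI-style test sees: just the sign of the difference."""
    n, _ = sub16_flags(a, b)
    return n


def signed_lt_n_xor_v(a: int, b: int) -> bool:
    """Overflow-corrected signed less-than."""
    n, v = sub16_flags(a, b)
    return n != v


# -32768 < 1 is true, but the difference wraps to +32767, so N is clear.
print(signed_lt_n_only(0x8000, 1), signed_lt_n_xor_v(0x8000, 1))  # False True
```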

Step 4 — real frame lowering, calling convention, REP/SEP scheduling pass. The prologue REP #$30 is unconditional; the REP/SEP pass will remove it when redundant.

3. What is installed and where

All under /home/scott/claude/llvm816/tools/:

| Tool | Path | Notes |
| --- | --- | --- |
| llvm-mos source | `tools/llvm-mos/` | Shallow clone. Backend files are symlinked in from `src/`; patches applied on top. Reset cleanly via `scripts/updateLlvmMos.sh`. |
| llvm-mos build dir | `tools/llvm-mos-build/` | cmake-generated, ephemeral |
| llvm-mos-sdk | `tools/llvm-mos-sdk/` | prebuilt toolchain |
| MAME 0.264 | `/usr/games/mame` (apt) | supports `-console` (Lua) |
| Apple IIgs ROMs | `tools/mame/roms/apple2gs.zip`, `apple2gsr1.zip` | from archive.org |
| Calypsi 5.16 | `tools/calypsi/` | extracted .deb |
| ORCA/C source | `tools/orca-c/` | reference only |

./setup.sh --verify-only passed all checks as of the prior session.

4. Repo layout (current)

llvm816/                                      # git repo, branch main
├── LLVM_65816_DESIGN.md                      # tracked
├── SESSION_STATE.md                          # this file
├── setup.sh                                  # tracked
├── scripts/                                  # tracked
│   ├── common.sh
│   ├── installDeps.sh  installCalypsi.sh  installOrcaC.sh
│   ├── installLlvmMos.sh                     # non-destructive (see §8)
│   ├── installMame.sh  verify.sh
│   ├── applyBackend.sh                       # src/ + patches/ -> tools/llvm-mos/
│   └── updateLlvmMos.sh                      # reset clone, re-apply backend
├── src/                                      # authored files, tracked
│   ├── llvm/lib/Target/W65816/               # 41 files
│   │   ├── MCTargetDesc/ (10 files)
│   │   ├── TargetInfo/ (3 files)
│   │   └── (28 top-level files)
│   └── clang/lib/Basic/Targets/
│       ├── W65816.h
│       └── W65816.cpp
├── patches/                                  # unified diffs, tracked
│   ├── 0001-triple-add-w65816-arch.patch
│   ├── 0002-triple-cpp-add-w65816-cases.patch
│   ├── 0003-clang-basic-dispatch-w65816.patch
│   └── 0004-cmake-add-w65816-experimental.patch
├── tools/                                    # gitignored, ephemeral
└── .gitignore                                # excludes tools/, .cache/

5. Key architectural decisions

5.1 Separate target, not MOS subtarget feature

llvm-mos has FeatureW65816 declared in MOSDevices.td but codegen unimplemented (issue #321). We are NOT extending MOS. Reasons:

We cannot upstream an AI-assisted backend to llvm-mos anyway.
Clean register model: Acc8/Acc16/Idx8/Idx16 as separate classes.
Independent evolution.

Recorded in design doc §2.5.

5.2 Symlinks + patches, not a fork

applyBackend.sh symlinks every file under src/ into the corresponding path under tools/llvm-mos/, then applies each patches/*.patch with git apply. Idempotent: skips already-current symlinks and already-applied patches (detected via git apply --reverse --check).
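
For reference, the idempotent-symlink half of that behaviour looks roughly like this in Python. It is a sketch of the idea only: the real script is shell, `link_tree` and `demo` are hypothetical names, and the patch half (the `git apply --reverse --check` probe) is not modelled.

```python
import os
import tempfile
from pathlib import Path


def link_tree(src: Path, dst: Path):
    """Symlink every file under src into the same relative path under dst.

    Returns (new, current) counts, so a second run reports 0 new links.
    """
    new = current = 0
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        wanted = path.resolve()
        if target.is_symlink() and Path(os.readlink(target)) == wanted:
            current += 1  # already points at our file: nothing to do
            continue
        if target.is_symlink() or target.exists():
            target.unlink()  # stale link or upstream file in the way
        target.symlink_to(wanted)
        new += 1
    return new, current


def demo():
    """Run link_tree twice on a throwaway tree and return both results."""
    with tempfile.TemporaryDirectory() as root:
        src, dst = Path(root) / "src", Path(root) / "clone"
        (src / "llvm").mkdir(parents=True)
        (src / "llvm" / "W65816.td").write_text("// placeholder\n")
        dst.mkdir()
        return link_tree(src, dst), link_tree(src, dst)
```

`demo()` returns `((1, 0), (0, 1))`: one new link on the first run, one already-current link on the second, which is the same shape as the script's "new / current" summary.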

updateLlvmMos.sh is the ONLY script allowed to destructively reset the clone. It reverses all patches, removes our symlinks, git reset --hard FETCH_HEAD, then re-runs applyBackend.sh.

installLlvmMos.sh refuses to touch the clone if it is dirty or off main — this is deliberate to protect applied patches.

6. Concrete next actions (in order)

6.1 Function arguments

Partly done in Steps 3c and 3d: arg 0 is passed in A, and the callee reads arg 1+ from the stack. What remains is the caller side, where LowerCall still fatals on more than one argument. The plan is unchanged: pass i8/i16 args via the stack (push right-to-left, caller cleans), with the first 1-2 args optionally going in A or X for register-passing. Calypsi output is the reference for ABI choices.

6.2 i8 codegen

Step 3g picks one mode per function: REP #$10; SEP #$20 if the function uses any i8 type, else REP #$30 (16-bit mode). Mixed i8/i16 functions still fail. For those we need either:

A scan-and-prepend approach: if the function has any i8 op, emit SEP #$20 after the REP for whichever mode dominates, plus toggle pseudos around the off-mode regions.
Or commit to widening all i8 to i16 pre-ISel (simpler, but uses 2x the cycles for byte-heavy code).

This is the natural lead-in to the REP/SEP scheduling pass (§6.4).

6.3 Frame indices, stack locals

Mostly done in Steps 3d, 3e and 3i: eliminateFrameIndex translates the *fi pseudos into their stack-relative d,S counterparts, and spills go through STAfi / LDAfi. Stack accesses on 65816 are ,s and (,s),y indirect; the completed steps only cover the ,s form, so the indirect form still needs its operand class.
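
A tiny Python model of the displacement rule quoted in Step 3i (disp = FrameOffset + StackSize). The concrete frame offsets below are assumptions chosen to match the `sum3` example (first stack argument at 4,s, just past the 3-byte JSL return address), and `stack_relative_disp` is a hypothetical helper, not a function in the backend.

```python
def stack_relative_disp(frame_offset: int, stack_size: int) -> int:
    """Turn a frame offset into the d of a `d,s` operand.

    stack_size is the number of bytes the prologue reserved. The operand
    is a single unsigned byte, and 0,s is the free slot at S, so a usable
    displacement lies in 1..255.
    """
    disp = frame_offset + stack_size
    if not 1 <= disp <= 0xFF:
        raise ValueError(f"displacement {disp} not encodable as d,s")
    return disp


# No locals (sum3): incoming stack args sit at 4,s and 6,s.
print(stack_relative_disp(4, 0), stack_relative_disp(6, 0))  # 4 6

# ASSUMED layout with 4 bytes of locals: the same arg moves out to 8,s,
# and a local at frame offset -3 lands at 1,s.
print(stack_relative_disp(4, 4), stack_relative_disp(-3, 4))  # 8 1
```

The range check matters in practice: a frame larger than 255 bytes cannot be reached with d,s alone.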

6.4 REP/SEP scheduling pass

The core algorithmic work. TSFlag bits on every mode-dependent instruction are already in place; the pass walks MIR, dataflows the required mode per region, and inserts/removes REP/SEP transitions to minimise total mode switches. Design doc §3.3.
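
To make the shape of the problem concrete, here is a deliberately naive Python model: a straight-line walk that tracks the current M/X width and inserts REP/SEP only when the next instruction needs something different. It is an assumption-laden sketch, not the planned pass. There is no cross-block dataflow and no attempt to minimise switches globally, and the tuple format and `insert_mode_switches` name are hypothetical.

```python
M_BIT, X_BIT = 0x20, 0x10  # status-register bits toggled by REP/SEP


def insert_mode_switches(instrs, start=(16, 16)):
    """instrs: list of (text, m_width, x_width); a width is 8, 16 or None.

    None means the instruction does not care about that width. Returns
    the instruction texts with REP/SEP inserted where the tracked width
    differs from the required one.
    """
    cur = {"m": start[0], "x": start[1]}
    out = []
    for text, need_m, need_x in instrs:
        rep = sep = 0
        for key, bit, need in (("m", M_BIT, need_m), ("x", X_BIT, need_x)):
            if need is None or need == cur[key]:
                continue
            if need == 16:
                rep |= bit  # REP clears the bit: 16-bit width
            else:
                sep |= bit  # SEP sets the bit: 8-bit width
            cur[key] = need
        if rep:
            out.append(f"rep #${rep:02x}")
        if sep:
            out.append(f"sep #${sep:02x}")
        out.append(text)
    return out


body = [
    ("lda #$1234", 16, None),
    ("sta $10", 16, None),
    ("lda #$12", 8, None),
    ("ldx #$0000", None, 16),
    ("sta $12", 8, None),
    ("lda #$0001", 16, None),
]
print(insert_mode_switches(body))
```

With both widths starting at 16 bits, this yields one `sep #$20` before `lda #$12` and one `rep #$20` before the final `lda`. The `ldx` in between needs no switch because X is already 16-bit.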

6.2 Wire frame lowering + calling convention (real)

A basic version is in place as of Step 3i: emitPrologue / emitEpilogue reserve and release the frame on the native SP, and locals are reached through stack-relative addressing. The original plan also called for stack-relative indirect via Y, which is not covered yet. Calypsi output for a trivial function is a good model.

W65816CallingConv.td covers i8/i16 return in A but nothing for arguments. Start with stack-based (push right-to-left, caller cleans) per design doc §3.5.

6.3 Disassembler mode-aware decoding (deferred)

The scaffold disassembler always decodes LDA/LDX/LDY/ADC/SBC/CMP/AND/ORA/EOR/BIT/CPX/CPY immediate forms as 3-byte 16-bit-immediate variants. A real decoder should track the M/X bits across the stream (consuming REP/SEP, XCE transitions) and choose between DecoderTableW65816 (default) and DecoderTableW65816{MHigh,XHigh}16 per instruction. Naturally pairs with the REP/SEP codegen pass since both need the same M/X tracking model.
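
The M/X tracking described here can be sketched in Python as a length-only decoder. This is a model under assumptions, not the planned implementation: it knows only the immediate-form opcodes plus NOP, RTL, REP and SEP, it ignores XCE, and `instruction_lengths` is a hypothetical name.

```python
# Immediate-form opcodes whose operand width follows the M or X flag.
M_IMM = {0xA9, 0x69, 0xE9, 0xC9, 0x29, 0x09, 0x49, 0x89}  # LDA ADC SBC CMP AND ORA EOR BIT
X_IMM = {0xA2, 0xA0, 0xE0, 0xC0}                          # LDX LDY CPX CPY
FIXED_LEN = {0xEA: 1, 0x6B: 1, 0xC2: 2, 0xE2: 2}          # NOP RTL REP SEP


def instruction_lengths(code: bytes, m16: bool = True, x16: bool = True):
    """Split a byte stream into instruction lengths, tracking M/X width."""
    lengths = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op in M_IMM:
            n = 3 if m16 else 2
        elif op in X_IMM:
            n = 3 if x16 else 2
        elif op in FIXED_LEN:
            n = FIXED_LEN[op]
        else:
            raise ValueError(f"opcode {op:#04x} not in this sketch")
        if pc + n > len(code):
            raise ValueError("truncated instruction")
        if op == 0xC2:    # REP clears flag bits: 16-bit widths
            m16 = m16 or bool(code[pc + 1] & 0x20)
            x16 = x16 or bool(code[pc + 1] & 0x10)
        elif op == 0xE2:  # SEP sets flag bits: 8-bit widths
            m16 = m16 and not (code[pc + 1] & 0x20)
            x16 = x16 and not (code[pc + 1] & 0x10)
        lengths.append(n)
        pc += n
    return lengths


# sep #$20 ; lda #$12 ; ldx #$0000 ; rep #$20 ; lda #$1234 ; rtl
stream = bytes([0xE2, 0x20, 0xA9, 0x12, 0xA2, 0x00, 0x00,
                0xC2, 0x20, 0xA9, 0x34, 0x12, 0x6B])
print(instruction_lengths(stream))  # [2, 2, 3, 2, 3, 1]
```

A linear sweep like this is only right for straight-line code. Branch targets reached with a different M/X state need the same per-region reasoning as the codegen pass.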

6.4 REP/SEP scheduling pass

The core algorithmic work (design doc §3.3). Every real instruction now carries TSFlag bits indicating which M/X mode it requires. The pass reads those, does the width-inference / coalescing / transition insertion dataflow, and emits REP/SEP instructions at block boundaries. Plan to spend multiple sessions.

6.5 Tidy-ups (can happen in any order)

Decide ELF EM_ value (§7 item 1). Currently EM_NONE, with placeholder relocation numbers 1-5 in W65816ELFObjectWriter. Swap for canonical R_W65816_* names once chosen.
Replace ASCII-art mnemonics (inc a, dec a, asl a, etc.) with proper InstAliases so both INA and INC A assemble to the same opcode. The AsmParser this depends on is in place (Steps 2e and 2j).

7. Open design questions flagged by the scaffold

ELF EM_ machine number. W65816ELFObjectWriter.cpp uses ELF::EM_NONE as a placeholder. llvm-mos uses EM_MOS = 0x1966 for the 6502 family. Decide: share EM_MOS, or pick a new value?
Data layout string: resolved by patch 0005 (Step 2d). It used to be hardcoded in W65816TargetMachine.cpp; it now lives in llvm/lib/TargetParser/TargetDataLayout.cpp and W65816TargetMachine calls TT.computeDataLayout().
i32 return convention — does i32 return in A:X or via a hidden pointer? Currently W65816CallingConv.td only handles i8/i16. Design doc §3.5 says "A:X for 32-bit" but this isn't modelled yet.
Register aliasing for mode-dependent widths. Acc8 and Acc16 both contain physical register A. LLVM's allocator will not cope with this correctly. The REP/SEP management pass (§3.3) is required. Flagged per the design doc.
Open questions from design doc §8 (GS/OS DP reservation, bank memory model, interrupt ABI, ORCA/C ABI compat, width-contract attribute, MAME cycle accuracy) — still unresolved. Punt until after we have a working instruction set.

8. Gotchas + hard-won knowledge

installLlvmMos.sh is non-destructive now. It refuses to reset the clone if it is dirty or off main. Use scripts/updateLlvmMos.sh to refresh (the only script allowed to reset).
MAME -console flag is listed by -showusage, NOT -help.
log() in common.sh writes to stderr. Don't change it.
llvm-mos has FeatureW65816 but not working codegen (issue #321).
RemapAllTargetPseudoPointerOperands<PtrRegs> is required in W65816.td or tablegen fails with 8 "missing target override for pseudoinstruction using PointerLikeRegClass" errors. Don't remove it.
Triple::w65816 placement in Triple.h: inserted right after mos, to keep the 65xx family clustered. See patch 0001.
Added to LLVM_ALL_EXPERIMENTAL_TARGETS in llvm/CMakeLists.txt so -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=all picks up W65816. Not strictly required — passing the name explicitly also works.
Operand OperandType field wants LLVM's enum spelling, not shortened. Use OPERAND_IMMEDIATE, OPERAND_MEMORY, OPERAND_PCREL (see llvm/include/llvm/MC/MCInstrDesc.h MCOI::OperandType). OPERAND_IMM / OPERAND_IMM8 / OPERAND_IMM16 are NOT valid.
PrintMethod signature for PC-rel operands takes Address. Tablegen generates printPCRel8(MI, Address, OpNo, O) — 4 args, not 3. Non-PC-rel PrintMethods use the 3-arg form (MI, OpNo, O).
Several .cpp files needed explicit #includes beyond what MSP430 ships with because the tablegen-generated .inc references full types: W65816RegisterInfo.cpp needs W65816Subtarget.h and W65816FrameLowering.h (for GET_REGINFO_TARGET_DESC); W65816InstPrinter.cpp needs llvm/MC/MCAsmInfo.h (for MAI.printExpr).
Marker classes can't override mayLoad/mayStore via let. TableGen's multi-inheritance doesn't let unrelated sibling classes touch fields from the base Instruction. Use let isReturn = 1, ... in { ... } blocks at def sites instead (idiomatic LLVM style).
Data layout used to be hardcoded in W65816TargetMachine.cpp rather than computed from TT.computeDataLayout(), because TargetDataLayout.cpp had no case for w65816. That produced one -Wswitch warning in the llvm-mos build. Patch 0005 (Step 2d) added the case and removed the warning.

9. Disk space recovery

If space is tight before resume:

# safe to delete — regenerable from setup.sh + applyBackend.sh:
rm -rf /home/scott/claude/llvm816/tools/
rm -rf /home/scott/claude/llvm816/.cache/

Regenerate with ./setup.sh then ./scripts/applyBackend.sh.

The tools/llvm-mos-build/ directory alone is ~2 GB after a full configure+tablegen. A full ninja build will be much more.

10. Quick verification commands for resume

# Verify the scaffold is in place:
ls src/llvm/lib/Target/W65816/ | wc -l       # expect ~20 top-level files
ls patches/                                  # expect 5 .patch files (0001..0005)

# Verify apply is clean:
./scripts/applyBackend.sh                    # expect 0 new symlinks and 0 new patches (all current / already applied)

# Verify cmake configures:
cmake -S tools/llvm-mos/llvm -B tools/llvm-mos-build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_TARGETS_TO_BUILD="" \
  -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="MOS;W65816" \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_INCLUDE_TESTS=OFF -DLLVM_INCLUDE_EXAMPLES=OFF \
  -DLLVM_INCLUDE_BENCHMARKS=OFF

# Verify full build + llc registration (slow first time, cached after):
( cd tools/llvm-mos-build && ninja LLVMW65816Info LLVMW65816Desc LLVMW65816CodeGen llc )
./tools/llvm-mos-build/bin/llc --version | grep w65816
./tools/llvm-mos-build/bin/llc -march=w65816 -filetype=null /dev/null ; echo $?
# Expect: grep matches; llc exits 0.

11. Files changed this session (not yet committed by user)

scripts/applyBackend.sh                  # idempotent src+patches apply
scripts/updateLlvmMos.sh                 # safe reset+reapply
scripts/installLlvmMos.sh                # no longer destructively resets
scripts/smokeTest.sh                     # regression smoke test
src/llvm/lib/Target/W65816/              # full MC layer + first codegen:
                                         #   CodeGen scaffolds (~40 files)
                                         #   AsmParser/ (2 files)
                                         #   Disassembler/ (2 files)
                                         #   MCTargetDesc/ (11 files)
                                         #   TargetInfo/ (3 files)
                                         #   ~90 real instruction defs
                                         #   ~25 codegen pseudos +
                                         #     AsmPrinter expansion
src/clang/lib/Basic/Targets/W65816.{h,cpp}
patches/0001..0005.patch                 # upstream llvm-mos mods
SESSION_STATE.md                         # this file

The tools/ tree is all ephemeral (gitignored).

What now works end-to-end

Try it yourself:

./scripts/cDemo.sh    # built-in demo
./scripts/cDemo.sh path/to/your.c

Sample output for the built-in demo (real C → real 65816):

get_counter:     lda counter ; rtl
set_counter:     sta counter ; rtl
sum_with_target: clc ; adc target ; rtl
doubler:         asl a ; rtl
half:            lsr a ; rtl
reset:           lda #0 ; sta counter ; rtl
answer:          lda #42 ; rtl

Detail: command-line invocations

# Round-trip asm -> bytes -> asm:
echo '	lda #0x1234' | ./bin/llvm-mc -arch=w65816 -show-encoding
# -> lda #0x1234 ; encoding: [0xa9,0x34,0x12]

echo '0xea 0xa9 0x34 0x12 0x6b' | ./bin/llvm-mc --disassemble --triple=w65816
# -> nop ; lda #0x1234 ; rtl

# Full asm -> ELF -> disasm:
./bin/llvm-mc -arch=w65816 -filetype=obj foo.s -o foo.o
./bin/llvm-objdump --triple=w65816 -d foo.o

# Real codegen.  This .ll compiles cleanly:
@x = global i16 0
@y = global i16 0
define i16 @fib_step() {
  %a = load i16, ptr @x
  %b = load i16, ptr @y
  %s = add i16 %a, %b
  store i16 %a, ptr @y
  store i16 %s, ptr @x
  ret i16 %s
}
# llc emits idiomatic 65816:
#   rep #0x30
#   lda x; clc; adc y    ; A = a + b
#   sta x                ; x = a + b
#   ...
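
The width narrowing from Step 2e and the little-endian operand bytes in the encodings above can be modelled in a few lines of Python. This is an illustrative sketch, not the matcher: `encode_sta` is a hypothetical helper, and it ignores symbolic operands, where the width cannot be read off a literal value.

```python
# Real 65816 opcodes for STA in direct-page, absolute and long modes.
STA_OPCODES = {"STA_DP": 0x85, "STA_Abs": 0x8D, "STA_Long": 0x8F}


def encode_sta(addr: int):
    """Pick the narrowest STA variant that fits addr and encode it."""
    if not 0 <= addr <= 0xFFFFFF:
        raise ValueError(f"address {addr:#x} is outside the 24-bit space")
    if addr <= 0xFF:
        name, width = "STA_DP", 1
    elif addr <= 0xFFFF:
        name, width = "STA_Abs", 2
    else:
        name, width = "STA_Long", 3
    return name, bytes([STA_OPCODES[name]]) + addr.to_bytes(width, "little")


for address in (0x10, 0x1000, 0x10000):
    variant, encoding = encode_sta(address)
    print(f"sta ${address:x} -> {variant} {encoding.hex(' ')}")
```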

What doesn't work yet

Multi-arg calls (caller side). Callee accepts stack-passed args; the matching push side is unimplemented. Functions with more than one arg can be defined and compile correctly, but cannot be called from another function.
Two-Acc16 cmp. Loops with PHIs that need to compare two computed values fail at ISel — only one A.
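Mixed i8/i16 functions. i8 mode is per-function only (Step 3g): a function that uses both widths gets the 8-bit-A prologue and its i16 ops fail.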
Signed overflow in CMP-based branches: BMI/BPL test the N flag of the subtraction, which is incorrect when the subtract overflows.
mul var, var (or by non-power-of-2 constants). Needs library functions (__mulhi3 etc.).
sub imm, var (only sub var, imm works).

See §6.1-§6.4 for the next steps.
