65816-llvm-mos/SESSION_STATE.md
Scott Duensing 873eab4922 Checkpoint.
2026-04-25 17:07:28 -05:00

Session Resume — llvm816 project

Drop this into a new Claude Code session and say "read SESSION_STATE.md and continue where we left off." Pairs with LLVM_65816_DESIGN.md (the design doc — read that second).


1. Project in one sentence

Build an open-source LLVM/Clang backend for the WDC 65816 (Apple IIgs) that matches or exceeds Calypsi's output quality, forked from llvm-mos but maintained as our own separate W65816 target. User is Scott; expert C dev, doesn't want hand-holding on LLVM or 65816 basics.

2. Where we are in the plan

Design doc section 7 lists a 12-step implementation order. We are at:

  • Setup toolchain (prior session)

  • Architectural decision: separate W65816 target (design doc §2.5)

  • Repo-layout decision: src/ holds our authored files, patches/ holds modifications to upstream llvm-mos files, tools/llvm-mos/ is ephemeral and gitignored. scripts/applyBackend.sh stitches src + patches into the clone.

  • Step 1 — scaffold W65816 target directory. 41 files under src/llvm/lib/Target/W65816/ + 2 files under src/clang/lib/Basic/Targets/. 4 upstream patches under patches/.

  • Step 2 — verify the skeleton fully compiles and links. All 8 tablegen generators run clean, three static libs (LLVMW65816Info/Desc/CodeGen) build, llc links with the target registered, zero warnings in the W65816-local build. ./bin/llc -march=w65816 -filetype=null /dev/null → exit 0.

  • Step 2a — real MC-layer instructions. W65816InstrInfo.td now holds ~90 real 65816 opcodes (LDA/STA/LDX/LDY/STX/STY across immediate/DP/abs/DPX/AbsX/AbsY/long where applicable; ADC/SBC/CMP/AND/ORA/EOR/BIT; INC/DEC/ASL/LSR/ROL/ROR; all transfers; stack push/pull; REP/SEP/CLC/SEC/XCE/XBA; branches; JMP/JML/JSR/JSL; RTS/RTL/RTI; MVN/MVP). Instructions whose size depends on M or X bits exist as _Imm8/_Imm16 pairs carrying the appropriate TSFlag bits (MLow/MHigh/XLow/XHigh) for the future REP/SEP pass.

  • Step 2b — wire MCCodeEmitter. Tablegen -gen-emitter runs cleanly; W65816MCCodeEmitter.cpp calls the tablegen-provided getBinaryCodeForInstr and emits Size bytes little-endian.

  • Step 2c — symbolic fixups. Each operand class (imm8/imm16/addrDP/addrAbs/addrLong/pcrel8/pcrel16) has its own EncoderMethod that emits a W65816::fixup_* at the correct byte offset for expression operands. W65816AsmBackend::applyFixup patches the data bytes little-endian for resolved fixups and defers to maybeAddReloc for unresolved ones. W65816ELFObjectWriter::getRelocType returns placeholder relocation numbers 1-5 (swap for canonical R_W65816_* names once the ELF EM_ is decided; see §7 item 1).

  • Step 2d — patch 0005 eliminates data-layout warning. The data-layout string for Triple::w65816 lives in llvm/lib/TargetParser/TargetDataLayout.cpp; W65816TargetMachine now calls TT.computeDataLayout(). Zero warnings in the W65816 build.

  • Step 2e — AsmParser scaffold. 441-line AsmParser/W65816AsmParser.cpp ported from MSP430, stripped of register-operand handling (65816 has no MC register operands), with width-narrowing predicates on each operand class so the matcher picks the narrowest instruction variant the value fits (e.g. sta $10 → STA_DP, sta $1000 → STA_Abs, sta $10000 → STA_Long). # is emitted as a literal token to match the AsmString tokenisation. Block-move (MVN/MVP) uses addrDP for both bank bytes so mvn $01, $02 parses.

  • Step 2f — operand bit-field wiring. Every Inst* class in W65816InstrFormats.td now assigns named bitfields into Inst{N-8} (e.g. let Inst{15-8} = imm;). Without this, tablegen emits an encoder that writes the opcode but leaves the operand bytes as zero; we had that bug for an iteration.

  • Step 2g — smoke-test script. scripts/smokeTest.sh checks llc registration, empty-module codegen, and llvm-mc encoding of a representative instruction mix. Run with --build to rebuild first.

  • Step 2h — end-to-end ELF object. llvm-mc -filetype=obj produces a valid ELF with relocations at correct byte offsets. Relocations are placeholder numbers 1-5 (§7 item 1 — decide EM_/R_* mapping).

  • Step 2i — Disassembler. 190-line Disassembler/W65816Disassembler.cpp tries 1/2/3/4-byte decode tables in ascending size order. Custom decoder callbacks for imm/addr/pcrel operands wrap raw bits into MCOperands. Mode-ambiguous opcodes (LDA/LDX/LDY/ADC/SBC/CMP/AND/ORA/EOR/BIT/CPX/CPY immediate forms) are parked in separate DecoderTableW65816{MHigh,XHigh}16 tables and the scaffold only reads the default tables, so those opcodes always disassemble as 3-byte 16-bit-immediate forms until a mode-aware decoder lands (alongside REP/SEP).

  • Step 2j — register operands in the AsmParser. Key fix found via round-trip test: tablegen treats a, x, y in AsmStrings (e.g. "inc a", "lda\t$addr, x") as references to the real register records, so the matcher expects register operands, not literal tokens. AsmParser now produces k_Reg operands for these identifiers. Verified: inc a → 0x1A, lda $1000, x → 0xBD,0x00,0x10, full ELF round-trip passes.

  • Step 2k — smoke test covers disassembly. The smoke test now feeds raw bytes through llvm-mc --disassemble and checks for expected mnemonics, so encoder/decoder asymmetries surface immediately.

  • Step 3a — first DAG patterns. Type-as-mode model (approved). LDAi16imm pseudo for i16 constants; RTL for retglue; emitPrologue emits canonical REP #$30. Mode-dependent _Imm8 variants are isCodeGenOnly so the asm matcher never picks them.

  • Step 3c — single-arg function calls. LowerFormalArguments receives arg 0 in A; LowerCall passes arg 0 in A and emits the call through a JSL pseudo, which bridges the i16 symbol operand to the MC JSL_Long's 24-bit operand class. The result comes back in A. Multi-arg call lowering still needs a PUSHA SDNode plus an SP-unwind sequence; the caller side currently fatals on more than one arg.

  • Step 3d — multi-arg via stack (callee side). LowerFormalArguments now reads arg 1+ from stack via FrameIndex + load. eliminateFrameIndex translates LDAfi / STAfi / ADCfi / SBCfi / ANDfi / ORAfi / EORfi / CMPfi pseudos to their LDA d,S etc. counterparts with the offset baked in. Stack-relative MC instructions are in place; AsmParser recognises the ,s suffix. Callee-side fully working: a define i16 @sum3(i16 %a, i16 %b, i16 %c) compiles to clc; adc 4,s; clc; adc 6,s; rtl.

  • Step 3e — frame-index spill plumbing. storeRegToStackSlot and loadRegFromStackSlot emit STAfi / LDAfi pseudos so the register allocator can spill Acc16 values when needed.

  • Step 3f — multiplications via shifts. Multiply by power-of-2 constants inherits the shl patterns (1/2/3/4 bits unrolled to asl a sequences). Multiply by arbitrary constants and runtime values fail at ISel pending library functions.

  • Step 3h — clang front end builds. Real C → 65816 machine code via the full clang -target w65816 -c pipeline. Bumped clang's IntAlign/LongAlign/PointerAlign/SuitableAlign from 8 to 16; also overrode allowsMisalignedMemoryAccesses to return true. scripts/cDemo.sh shows the full front-end pipeline on a built-in 7-function demo. Additional patterns: INC_Abs/DEC_Abs for *p = *p + 1; ASRA16 (PHA;ASL;PLA;ROR sequence) for signed shift-right by 1.

  • Step 3i — frame reservation + epilogue. emitPrologue now emits TSC; SEC; SBC #N; TCS to reserve N bytes for locals and spills, then emitEpilogue reverses with TSC; CLC; ADC #N; TCS before the RTL. eliminateFrameIndex translates FrameIndex operands into stack-relative offsets via disp = FrameOffset + StackSize. hasFPImpl returns false (no native FP — direct page would be the logical home). This unblocks clang -O0 -c for pure-arithmetic functions (each arg gets spilled to its own stack slot). Stack-relative addressing modes for ADC/SBC/AND/ORA/EOR/CMP let the codegen fold loads from frame indices into the carry-arithmetic ops.

  • Step 3g — basic i8 codegen. Acc8 patterns now cover: LDAi8imm (constants), INA_PSEUDO8 / DEA_PSEUDO8 (inc/dec), ADCi8imm / SBCi8imm (add/sub immediate), ANDi8imm / ORAi8imm / EORi8imm (bitwise immediate), LDA8abs / STA8abs (load/store via global), ASLA8 / LSRA8 (1-bit shifts), CMPi8imm (compare against immediate, with BR_CC i8 lowering). Frame lowering scans the function IR for any i8 type usage (return, args, instruction values, operands) and picks REP #$10; SEP #$20 prologue when found, else REP #$30. AsmPrinter masks i8 immediates to 8 bits before printing so i8 -16 shows 0xf0 rather than 0xfff0. Limitations: i8 mode is per-function only — mixed-mode functions get the i8 prologue (8-bit A) and i16 ops fail. Asm round-trip for i8 still loses M-mode info (the parser can't disambiguate lda #imm between Imm8 and Imm16); use -filetype=obj directly from llc to get the right encoding.

  • Step 3b — globals, loads, stores, arithmetic, branches, bitwise. LowerOperation custom-lowers GlobalAddress and ExternalSymbol to W65816Wrapper(target...). Pseudo + AsmPrinter-expansion family covers:

    - `LDAi16imm`, `LDAabs`, `STAabs` (load/store/materialise via
      Wrapper of global)
    - `ADCi16imm`, `ADCabs`, `SBCi16imm`, `SBCabs` (add/sub with the
      required CLC/SEC carry prefix)
    - `ANDi16imm`, `ORAi16imm`, `EORi16imm` and their `*abs`
      memory-fold variants
    - `CMPi16imm`, `CMPabs` plus `W65816ISD::CMP` / `W65816ISD::BR_CC`
      SDNodes; `LowerBR_CC` swaps constant-on-LHS forms and rewrites
      SETULE/SETUGT/SETLE/SETGT to SETULT/SETUGE/SETLT/SETGE+1 so
      the canonicalised DAG hits our patterns; condition-code map
      covers BEQ/BNE/BCS/BCC plus signed BMI/BPL.
    - `BRA` for unconditional `br`.
    - `INA_PSEUDO` / `DEA_PSEUDO` for `add x, ±1` → `inc a` / `dec a`
    - `ASLA16` / `LSRA16` for `shl x, 1` and `lshr x, 1` → `asl a` /
      `lsr a`
    - `NEGA16` for `0 - x` → `eor #$ffff; inc a`
    - `(xor x, -1)` → `eor #$ffff` (bitwise NOT)
    - Zero-extending byte load: `lda addr; and #$ff`
    
    The end-to-end pipeline can now compile and assemble functions
    that read/write globals, do arithmetic on them, and branch
    conditionally — all with optimal-looking 65816 idioms (e.g.
    `lda x ; clc ; adc y` for `*x + *y`).
    
  • Step 3i — open codegen gaps:

    1. **Multi-arg call lowering** (caller side).  Callee side works;
       caller still bails on >1 arg.  Needs PUSHA SDNode + SP-unwind
       in ADJCALLSTACKUP.
    2. **Frame-reserved scratch space.**  Prologue doesn't reserve
       stack space for locals/spills, so any alloca'd value or
       allocator-spilled value lands at a negative SP offset and
       eliminateFrameIndex bails.  Blocks: -O0 compilation of
       functions with parameters; loops with PHIs that need to
       compare two computed values; two-Acc16 binary ops in
       general.  Fix: emit `TSC; CLC; ADC #-N; TCS` (or PHA-loop)
       in emitPrologue and the inverse in emitEpilogue, where N
       is the function's frame size.
    3. **Mixed-mode i8/i16.**  Per-function mode only — the prologue
       picks one mode; the other type's ops fail.  REP/SEP scheduling
       pass needed.
    4. **Signed `(a - b)` overflow handling.**  BMI/BPL based signed
       comparisons are correct only when the subtraction can't
       overflow; pathological values give wrong results.
    5. **`sub imm, var`** and **`mul var, var`** (or non-power-of-2
       constants).  Need libcall support.
    6. **SETCC and SELECT_CC i16.**  Boolean conversions like
       `(int)(cond != 0)` and `(cond) ? a : b` aren't selectable.
       Custom lowering needed.
    7. **Library functions.**  `__mulhi3`, etc. — no runtime yet.
    
  • Step 4 — real frame lowering, calling convention, REP/SEP scheduling pass. The prologue REP #$30 is unconditional; the REP/SEP pass will remove it when redundant.
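
The width-narrowing rule from Step 2e and the little-endian emission from Step 2b can be modeled in a few lines. This Python sketch is illustrative only and is not backend code: `encode_sta` and `STA_VARIANTS` are hypothetical names, the variant labels are the ones quoted in Step 2e, and the opcode bytes are the standard 65816 encodings for STA direct page, absolute, and absolute long.

```python
# Hypothetical model of "pick the narrowest variant the value fits",
# followed by opcode + little-endian operand bytes.
STA_VARIANTS = [
    ("STA_DP", 0x85, 1),    # 1 operand byte: direct page
    ("STA_Abs", 0x8D, 2),   # 2 operand bytes: absolute
    ("STA_Long", 0x8F, 3),  # 3 operand bytes: absolute long
]


def encode_sta(addr):
    """Return (variant name, encoded bytes) for `sta addr`."""
    for name, opcode, nbytes in STA_VARIANTS:
        # Variants are ordered narrowest first, so the first fit wins.
        if 0 <= addr < (1 << (8 * nbytes)):
            return name, bytes([opcode]) + addr.to_bytes(nbytes, "little")
    raise ValueError(f"address {addr:#x} does not fit in 24 bits")


if __name__ == "__main__":
    for addr in (0x10, 0x1000, 0x10000):
        name, encoded = encode_sta(addr)
        print(f"sta ${addr:x} -> {name}: {encoded.hex(' ')}")
```

The real matcher does this through per-operand-class predicates rather than an ordered table; the table is just the shortest way to show the selection order.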

3. What is installed and where

All under /home/scott/claude/llvm816/tools/:

| Tool | Path | Notes |
|------|------|-------|
| llvm-mos source | tools/llvm-mos/ | shallow clone. Backend files are symlinked in from src/; patches applied on top. Reset cleanly via scripts/updateLlvmMos.sh. |
| llvm-mos build dir | tools/llvm-mos-build/ | cmake-generated, ephemeral |
| llvm-mos-sdk | tools/llvm-mos-sdk/ | prebuilt toolchain |
| MAME 0.264 | /usr/games/mame (apt) | supports -console (Lua) |
| Apple IIgs ROMs | tools/mame/roms/apple2gs.zip, apple2gsr1.zip | from archive.org |
| Calypsi 5.16 | tools/calypsi/ | extracted .deb |
| ORCA/C source | tools/orca-c/ | reference only |

./setup.sh --verify-only passed all checks as of the prior session.

4. Repo layout (current)

llvm816/                                      # git repo, branch main
├── LLVM_65816_DESIGN.md                      # tracked
├── SESSION_STATE.md                          # this file
├── setup.sh                                  # tracked
├── scripts/                                  # tracked
│   ├── common.sh
│   ├── installDeps.sh  installCalypsi.sh  installOrcaC.sh
│   ├── installLlvmMos.sh                     # non-destructive (see §8)
│   ├── installMame.sh  verify.sh
│   ├── applyBackend.sh                       # src/ + patches/ -> tools/llvm-mos/
│   └── updateLlvmMos.sh                      # reset clone, re-apply backend
├── src/                                      # authored files, tracked
│   ├── llvm/lib/Target/W65816/               # 41 files
│   │   ├── MCTargetDesc/ (10 files)
│   │   ├── TargetInfo/ (3 files)
│   │   └── (28 top-level files)
│   └── clang/lib/Basic/Targets/
│       ├── W65816.h
│       └── W65816.cpp
├── patches/                                  # unified diffs, tracked
│   ├── 0001-triple-add-w65816-arch.patch
│   ├── 0002-triple-cpp-add-w65816-cases.patch
│   ├── 0003-clang-basic-dispatch-w65816.patch
│   └── 0004-cmake-add-w65816-experimental.patch
├── tools/                                    # gitignored, ephemeral
└── .gitignore                                # excludes tools/, .cache/

5. Key architectural decisions

5.1 Separate target, not MOS subtarget feature

llvm-mos has FeatureW65816 declared in MOSDevices.td but codegen unimplemented (issue #321). We are NOT extending MOS. Reasons:

  • We cannot upstream an AI-assisted backend to llvm-mos anyway.
  • Clean register model: Acc8/Acc16/Idx8/Idx16 as separate classes.
  • Independent evolution.

Recorded in design doc §2.5.

applyBackend.sh symlinks every file under src/ into the corresponding path under tools/llvm-mos/, then applies each patches/*.patch with git apply. Idempotent: skips already-current symlinks and already-applied patches (detected via git apply --reverse --check).

updateLlvmMos.sh is the ONLY script allowed to destructively reset the clone. It reverses all patches, removes our symlinks, git reset --hard FETCH_HEAD, then re-runs applyBackend.sh.

installLlvmMos.sh refuses to touch the clone if it is dirty or off main — this is deliberate to protect applied patches.
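
The idempotent symlink step described for applyBackend.sh can be sketched as follows. This is a Python model of the behaviour, not the script itself (which is shell): `link_if_needed` and `apply_tree` are hypothetical names, and the patch half (`git apply`, with `git apply --reverse --check` as the already-applied probe) is left out.

```python
import os


def link_if_needed(src, dst):
    """Symlink dst -> src; return 'current' if it was already correct, else 'new'."""
    src = os.path.abspath(src)
    if os.path.islink(dst) and os.readlink(dst) == src:
        return "current"  # nothing to do, which is what makes re-runs idempotent
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    if os.path.lexists(dst):
        os.remove(dst)  # stale link or plain file: replace it
    os.symlink(src, dst)
    return "new"


def apply_tree(src_root, clone_root):
    """Mirror every file under src_root into clone_root as symlinks."""
    counts = {"new": 0, "current": 0}
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            counts[link_if_needed(src, os.path.join(clone_root, rel))] += 1
    return counts
```

Running `apply_tree` twice on the same trees should report every link as new the first time and every link as current the second time, which is the shape of the "0 new, N current" summary the real script prints.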

6. Concrete next actions (in order)

6.1 Function arguments

LowerFormalArguments now takes arg 0 in A and reads args 1+ from the stack (Steps 3c and 3d). LowerCall passes a single arg in A but still fatals on more than one, so multi-arg functions can be defined but not yet called. The remaining work is the caller side: push i8/i16 args on the stack (right-to-left, caller cleans), with the first 1-2 args optionally going in A or X for register-passing. Calypsi output is the reference for ABI choices.
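
To make the stack layout concrete, here is a small Python model of where the callee finds each i16 argument. It is a sketch under stated assumptions: arg 0 is in A, the call is a JSL (3-byte return address), S points at the next free byte so `1,s` is the most recently pushed byte, and a reserved frame shifts every displacement by its size per `disp = FrameOffset + StackSize`. `arg_location` is a hypothetical helper name.

```python
RETURN_ADDR_BYTES = 3  # JSL pushes a 24-bit return address


def arg_location(index, frame_size=0, arg_bytes=2):
    """Where the callee reads argument `index` (0-based) from."""
    if index == 0:
        return "A"  # first argument travels in the accumulator
    # 1,s is the last byte pushed; skip the return address, then earlier
    # stack args (arg 1 was pushed last, so it sits nearest the return address).
    disp = 1 + RETURN_ADDR_BYTES + (index - 1) * arg_bytes + frame_size
    return f"{disp},s"


if __name__ == "__main__":
    # sum3(a, b, c) with no locals: matches "clc; adc 4,s; clc; adc 6,s; rtl".
    print([arg_location(i) for i in range(3)])
    # Same function with a 2-byte frame reserved in the prologue.
    print([arg_location(i, frame_size=2) for i in range(3)])
```

The caller-side push sequence has to produce exactly this layout, so the same arithmetic gives the number of bytes the caller must pop after the call: `arg_bytes` times the number of stack-passed args.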

6.2 i8 codegen

Currently every function gets REP #$30 (16-bit mode). For i8 ops we need either:

  • A scan-and-prepend approach: if the function has any i8 op, emit SEP #$20 after the REP for whichever mode dominates, plus toggle pseudos around the off-mode regions.
  • Or commit to widening all i8 to i16 pre-ISel (simpler, but uses 2x the cycles for byte-heavy code).

This is the natural lead-in to the REP/SEP scheduling pass (§6.4).

6.3 Frame indices, stack locals

Add eliminateFrameIndex and frame-pointer pseudos so we can spill to the stack. Today W65816RegisterInfo::eliminateFrameIndex is llvm_unreachable. Stack accesses on 65816 are ,s and (,s),y indirect — needs new operand classes.

6.4 REP/SEP scheduling pass

The core algorithmic work. TSFlag bits on every mode-dependent instruction are already in place; the pass walks MIR, dataflows the required mode per region, and inserts/removes REP/SEP transitions to minimise total mode switches. Design doc §3.3.

6.2 Wire frame lowering + calling convention (real)

W65816FrameLowering.cpp is still llvm_unreachable. The simplest working version: establish an i16 stack pointer-based frame using the native SP, locals accessed via stack-relative indirect via Y. Calypsi output for a trivial function is a good model.

W65816CallingConv.td covers i8/i16 return in A but nothing for arguments. Start with stack-based (push right-to-left, caller cleans) per design doc §3.5.

6.3 Disassembler mode-aware decoding (deferred)

The scaffold disassembler always decodes LDA/LDX/LDY/ADC/SBC/CMP/AND/ORA/EOR/BIT/CPX/CPY immediate forms as 3-byte 16-bit-immediate variants. A real decoder should track the M/X bits across the stream (consuming REP/SEP, XCE transitions) and choose between DecoderTableW65816 (default) and DecoderTableW65816{MHigh,XHigh}16 per instruction. Naturally pairs with the REP/SEP codegen pass since both need the same M/X tracking model.
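
A minimal Python model of that M/X tracking, assuming a linear byte stream that starts in 16-bit mode and ignoring XCE and control flow. Only a handful of opcodes are included (REP, SEP, LDA/LDX/LDY immediate, NOP, RTL); `disassemble` and the table names are hypothetical and do not correspond to the tablegen decoder tables.

```python
M_BIT, X_BIT = 0x20, 0x10
REP, SEP = 0xC2, 0xE2
# Immediate forms whose operand width follows the M or X status bit.
IMM_BY_FLAG = {0xA9: ("lda", M_BIT), 0xA2: ("ldx", X_BIT), 0xA0: ("ldy", X_BIT)}
IMPLIED = {0xEA: "nop", 0x6B: "rtl"}


def disassemble(data, p=0):
    """Decode `data`; `p` holds the tracked M/X bits (0 means both 16-bit)."""
    out = []
    i = 0
    while i < len(data):
        op = data[i]
        if op in (REP, SEP):
            mask = data[i + 1]
            p = (p & ~mask) if op == REP else (p | mask)  # REP clears, SEP sets
            out.append(f"{'rep' if op == REP else 'sep'} #${mask:02x}")
            i += 2
        elif op in IMM_BY_FLAG:
            name, bit = IMM_BY_FLAG[op]
            width = 1 if p & bit else 2  # flag set means 8-bit register
            value = int.from_bytes(data[i + 1:i + 1 + width], "little")
            out.append(f"{name} #${value:0{2 * width}x}")
            i += 1 + width
        elif op in IMPLIED:
            out.append(IMPLIED[op])
            i += 1
        else:
            raise ValueError(f"opcode {op:#04x} not in this sketch")
    return out
```

For example, `disassemble(bytes([0xE2, 0x20, 0xA9, 0x12, 0xC2, 0x20, 0xA9, 0x34, 0x12]))` decodes the first `lda` as a 2-byte instruction and the second as a 3-byte one. A linear scan like this goes wrong across branches and calls, which is one reason the real decoder is deferred until the shared M/X model exists.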

6.4 REP/SEP scheduling pass

The core algorithmic work (design doc §3.3). Every real instruction now carries TSFlag bits indicating which M/X mode it requires. The pass reads those, does the width-inference / coalescing / transition insertion dataflow, and emits REP/SEP instructions at block boundaries. Plan to spend multiple sessions.
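
As a starting point for thinking about the pass, here is a deliberately naive Python model: a single straight-line block, greedy insertion, no coalescing across blocks. It is not the design doc's algorithm. Each instruction carries the width it requires for M and X (8, 16, or None for "don't care"), standing in for the TSFlag bits; `insert_mode_switches` is a hypothetical name and the 16-bit entry state assumes the `REP #$30` prologue.

```python
M_BIT, X_BIT = 0x20, 0x10


def insert_mode_switches(instrs, entry_m=16, entry_x=16):
    """instrs: list of (text, need_m, need_x); returns text with REP/SEP added."""
    cur = {M_BIT: entry_m, X_BIT: entry_x}
    out = []
    for text, need_m, need_x in instrs:
        rep = sep = 0
        for bit, need in ((M_BIT, need_m), (X_BIT, need_x)):
            if need is not None and need != cur[bit]:
                if need == 16:
                    rep |= bit  # REP clears the flag: 16-bit register
                else:
                    sep |= bit  # SEP sets the flag: 8-bit register
                cur[bit] = need
        if rep:
            out.append(f"rep #${rep:02x}")
        if sep:
            out.append(f"sep #${sep:02x}")
        out.append(text)
    return out


if __name__ == "__main__":
    block = [
        ("lda #$1234", 16, None),
        ("lda #$12", 8, None),
        ("sta $10", 8, None),
        ("ldx #$0000", None, 16),
        ("lda #$5678", 16, None),
    ]
    print("\n".join(insert_mode_switches(block)))
```

Because don't-care instructions never force a switch, runs of same-width instructions share one transition. The real pass additionally has to merge states at block joins and decide where a transition is cheapest, which this sketch does not attempt.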

6.5 Tidy-ups (can happen in any order)

  • Decide ELF EM_ value (§7 item 1). Currently EM_NONE, with placeholder relocation numbers 1-5 in W65816ELFObjectWriter. Swap for canonical R_W65816_* names once chosen.
  • Replace ASCII-art mnemonics (inc a, dec a, asl a, etc.) with proper InstAliases so both INA and INC A assemble to the same opcode. Requires AsmParser (§6.3).

7. Open design questions flagged by the scaffold

  1. ELF EM_ machine number. W65816ELFObjectWriter.cpp uses ELF::EM_NONE as a placeholder. llvm-mos uses EM_MOS = 0x1966 for the 6502 family. Decide: share EM_MOS, or pick a new value?
  2. Data layout string. Resolved in Step 2d: patch 0005 adds the Triple::w65816 case to llvm/lib/TargetParser/TargetDataLayout.cpp, and W65816TargetMachine.cpp now calls TT.computeDataLayout() instead of hardcoding the string.
  3. i32 return convention — does i32 return in A:X or via a hidden pointer? Currently W65816CallingConv.td only handles i8/i16. Design doc §3.5 says "A:X for 32-bit" but this isn't modelled yet.
  4. Register aliasing for mode-dependent widths. Acc8 and Acc16 both contain physical register A. LLVM's allocator will not cope with this correctly. The REP/SEP management pass (§3.3) is required. Flagged per the design doc.
  5. Open questions from design doc §8 (GS/OS DP reservation, bank memory model, interrupt ABI, ORCA/C ABI compat, width-contract attribute, MAME cycle accuracy) — still unresolved. Punt until after we have a working instruction set.

8. Gotchas + hard-won knowledge

  • installLlvmMos.sh is non-destructive now. It refuses to reset the clone if it is dirty or off main. Use scripts/updateLlvmMos.sh to refresh (the only script allowed to reset).
  • MAME -console flag is listed by -showusage, NOT -help.
  • log() in common.sh writes to stderr. Don't change it.
  • llvm-mos has FeatureW65816 but not working codegen (issue #321).
  • RemapAllTargetPseudoPointerOperands<PtrRegs> is required in W65816.td or tablegen fails with 8 "missing target override for pseudoinstruction using PointerLikeRegClass" errors. Don't remove it.
  • Triple::w65816 placement in Triple.h: inserted right after mos, to keep the 65xx family clustered. See patch 0001.
  • Added to LLVM_ALL_EXPERIMENTAL_TARGETS in llvm/CMakeLists.txt so -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=all picks up W65816. Not strictly required — passing the name explicitly also works.
  • Operand OperandType field wants LLVM's enum spelling, not shortened. Use OPERAND_IMMEDIATE, OPERAND_MEMORY, OPERAND_PCREL (see llvm/include/llvm/MC/MCInstrDesc.h MCOI::OperandType). OPERAND_IMM / OPERAND_IMM8 / OPERAND_IMM16 are NOT valid.
  • PrintMethod signature for PC-rel operands takes Address. Tablegen generates printPCRel8(MI, Address, OpNo, O) — 4 args, not 3. Non-PC-rel PrintMethods use the 3-arg form (MI, OpNo, O).
  • Several .cpp files needed explicit #includes beyond what MSP430 ships with because the tablegen-generated .inc references full types: W65816RegisterInfo.cpp needs W65816Subtarget.h and W65816FrameLowering.h (for GET_REGINFO_TARGET_DESC); W65816InstPrinter.cpp needs llvm/MC/MCAsmInfo.h (for MAI.printExpr).
  • Marker classes can't override mayLoad/mayStore via let. TableGen's multi-inheritance doesn't let unrelated sibling classes touch fields from the base Instruction. Use let isReturn = 1, ... in { ... } blocks at def sites instead (idiomatic LLVM style).
  • Data layout used to be hardcoded in W65816TargetMachine.cpp because TargetDataLayout.cpp had no case for w65816, which produced one -Wswitch warning in the llvm-mos build. Patch 0005 (Step 2d) adds the case, and W65816TargetMachine now calls TT.computeDataLayout().

9. Disk space recovery

If space is tight before resume:

# safe to delete — regenerable from setup.sh + applyBackend.sh:
rm -rf /home/scott/claude/llvm816/tools/
rm -rf /home/scott/claude/llvm816/.cache/

Regenerate with ./setup.sh then ./scripts/applyBackend.sh.

The tools/llvm-mos-build/ directory alone is ~2 GB after a full configure+tablegen. A full ninja build will be much more.

10. Quick verification commands for resume

# Verify the scaffold is in place:
ls src/llvm/lib/Target/W65816/ | wc -l       # expect ~20 top-level files
ls patches/                                  # expect 5 .patch files (0001..0005)

# Verify apply is clean:
./scripts/applyBackend.sh                    # expect 0 new, 44 current symlinks; 0 new, 5 applied patches

# Verify cmake configures:
cmake -S tools/llvm-mos/llvm -B tools/llvm-mos-build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_TARGETS_TO_BUILD="" \
  -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="MOS;W65816" \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_INCLUDE_TESTS=OFF -DLLVM_INCLUDE_EXAMPLES=OFF \
  -DLLVM_INCLUDE_BENCHMARKS=OFF

# Verify full build + llc registration (slow first time, cached after):
( cd tools/llvm-mos-build && ninja LLVMW65816Info LLVMW65816Desc LLVMW65816CodeGen llc )
./tools/llvm-mos-build/bin/llc --version | grep w65816
./tools/llvm-mos-build/bin/llc -march=w65816 -filetype=null /dev/null ; echo $?
# Expect: grep matches; llc exits 0.

11. Files changed this session (not yet committed by user)

scripts/applyBackend.sh                  # idempotent src+patches apply
scripts/updateLlvmMos.sh                 # safe reset+reapply
scripts/installLlvmMos.sh                # no longer destructively resets
scripts/smokeTest.sh                     # regression smoke test
src/llvm/lib/Target/W65816/              # full MC layer + first codegen:
                                         #   CodeGen scaffolds (~40 files)
                                         #   AsmParser/ (2 files)
                                         #   Disassembler/ (2 files)
                                         #   MCTargetDesc/ (11 files)
                                         #   TargetInfo/ (3 files)
                                         #   ~90 real instruction defs
                                         #   ~25 codegen pseudos +
                                         #     AsmPrinter expansion
src/clang/lib/Basic/Targets/W65816.{h,cpp}
patches/0001..0005.patch                 # upstream llvm-mos mods
SESSION_STATE.md                         # this file

The tools/ tree is all ephemeral (gitignored).

What now works end-to-end

Try it yourself:

./scripts/cDemo.sh    # built-in demo
./scripts/cDemo.sh path/to/your.c

Sample output for the built-in demo (real C → real 65816):

get_counter:     lda counter ; rtl
set_counter:     sta counter ; rtl
sum_with_target: clc ; adc target ; rtl
doubler:         asl a ; rtl
half:            lsr a ; rtl
reset:           lda #0 ; sta counter ; rtl
answer:          lda #42 ; rtl

Detail: command-line invocations

# Round-trip asm -> bytes -> asm:
echo '	lda #0x1234' | ./bin/llvm-mc -arch=w65816 -show-encoding
# -> lda #0x1234 ; encoding: [0xa9,0x34,0x12]

echo '0xea 0xa9 0x34 0x12 0x6b' | ./bin/llvm-mc --disassemble --triple=w65816
# -> nop ; lda #0x1234 ; rtl

# Full asm -> ELF -> disasm:
./bin/llvm-mc -arch=w65816 -filetype=obj foo.s -o foo.o
./bin/llvm-objdump --triple=w65816 -d foo.o

# Real codegen.  This .ll compiles cleanly:
@x = global i16 0
@y = global i16 0
define i16 @fib_step() {
  %a = load i16, ptr @x
  %b = load i16, ptr @y
  %s = add i16 %a, %b
  store i16 %a, ptr @y
  store i16 %s, ptr @x
  ret i16 %s
}
# llc emits idiomatic 65816:
#   rep #0x30
#   lda x; clc; adc y    ; A = a + b
#   sta x                ; x = a + b
#   ...

What doesn't work yet

  • Multi-arg calls (caller side). Callee accepts stack-passed args; the matching push side is unimplemented. Functions with more than one arg can be defined and compile correctly, but cannot be called from another function.
  • Two-Acc16 cmp. Loops with PHIs that need to compare two computed values fail at ISel — only one A.
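  • Mixed i8/i16 within one function. The width mode is chosen per function (Step 3g): a function that uses any i8 value gets the 8-bit-A prologue, and its i16 ops then fail.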
  • Signed overflow in CMP-based branches: BMI/BPL test the N flag of the subtraction, which is incorrect when the subtract overflows.
  • mul var, var (or by non-power-of-2 constants). Needs library functions (__mulhi3 etc.).
  • sub imm, var (only sub var, imm works).
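
The signed-overflow item can be reproduced with a few lines of arithmetic. This Python sketch models the N and V flags of a 16-bit subtraction and compares the N-only test (what BMI/BPL see) with the N xor V test that stays correct under overflow. The helper names are hypothetical, and how the backend will eventually lower this is not decided here; note that on the 65816 CMP leaves V untouched, so any V-based sequence would need SEC; SBC instead.

```python
def sub16_flags(a, b):
    """Return (N, V) for the 16-bit subtraction a - b."""
    a &= 0xFFFF
    b &= 0xFFFF
    r = (a - b) & 0xFFFF
    n = bool(r & 0x8000)
    # Overflow: operands differ in sign and the result's sign differs from a's.
    v = bool((a ^ b) & (a ^ r) & 0x8000)
    return n, v


def signed_lt_n_only(a, b):
    """What a BMI after the subtraction decides."""
    n, _ = sub16_flags(a, b)
    return n


def signed_lt_n_xor_v(a, b):
    """Overflow-safe signed less-than."""
    n, v = sub16_flags(a, b)
    return n != v


if __name__ == "__main__":
    for a, b in ((5, 7), (-32768, 1), (32767, -1)):
        print(a, b, a < b, signed_lt_n_only(a, b), signed_lt_n_xor_v(a, b))
```

The pairs (-32768, 1) and (32767, -1) are the pathological cases: the N-only test gets both wrong, while N xor V matches the true signed comparison.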

See §6.1-§6.4 for the next steps.