65816-llvm-mos/SESSION_RECOVERY.md
Scott Duensing 583cee849d Checkpoint
2026-05-08 16:19:04 -05:00

Session Recovery — 2026-05-07/08

Living recovery doc. Update on every meaningful change. If session is lost, read this top-to-bottom + the memory notes referenced inside, then reread the actual diffs in tree to ground assumptions.

Headline state

  • Smoke: 131/131 green.
  • Active config: ptr32 (p:32:16), full IMG0..IMG15 caller-clobber on JSL, basic regalloc at -O1+.
  • Working tree: clean except 3 modified files listed below; all are real fixes that haven't been committed yet.
  • Branch: main, ahead of origin/main by recent checkpoint commits.

Uncommitted, must keep

These are the in-flight improvements. Rebuild after applying any of them.

  1. runtime/src/snprintf.c — removed __attribute__((optnone)) from emitULong (line 106) and snprintf (line 303). Slot-aliasing workaround that the IMG-clobber + LDAfi-IMG fixes made unnecessary.
  2. src/llvm/lib/Target/W65816/W65816InstrInfo.cpp
    • copyPhysReg virtual-register short-circuit: if SrcReg or DestReg is virtual, emit a TargetOpcode::COPY and return. Basic regalloc's InlineSpiller calls storeRegToStackSlot with vreg sources before final physreg assignment; without the short-circuit the unpaired-Wide32 default branch hits the unreachable.
    • copyPhysReg IMG-to-IMG PHA-bracket: the old expansion was lda src; sta dst, which clobbers A with no bracket. Regalloc inserted these copies between $a = COPY $img10 and the use of A; the PHA/PLA bracket preserves A.
  3. src/llvm/lib/Target/W65816/W65816SjLjFinalize.cpp — catchtab build moved BEFORE landingpad erase. Old code did LPadBB->getLandingPadInst() AFTER erasing the insts → returned nullptr → empty LSDA → catch never matched, abort. Now captures catch-clause typeinfo Constants into a DenseMap<BasicBlock*, LPadInfo> BEFORE erase; build loop reads from the saved map.

To commit when ready (do NOT amend; create new commits):

git add runtime/src/snprintf.c \
        src/llvm/lib/Target/W65816/W65816InstrInfo.cpp \
        src/llvm/lib/Target/W65816/W65816SjLjFinalize.cpp
git commit -m "..."  # message stub below

Suggested commit message: see "Fixes landed" section below; one commit per logical change is cleaner.

Already-committed in this session arc

Per git log --oneline -20 these are the recent checkpoint commits; the diffs they contain are real and load-bearing.

The big ones (search by file or grep):

  • JSLpseudo Defs += IMG0..IMG15 in W65816InstrInfo.td. With the wider Defs, regalloc spills IMG-class vregs around calls instead of treating them as preserved.
  • W65816RegisterInfo.cpp eliminateFrameIndex for STAfi: PHA-bracketed for non-A source (IMG/X/Y). The lda dp; sta d,s chain clobbered A; bracket preserves A while shifting offset by +2 between PHA and PLA. Defs=[A] kept on STAfi as safe over-approximation.
  • W65816RegisterInfo.cpp eliminateFrameIndex for LDAfi: if Dst = IMGn, append STA dp so the IMG slot actually receives the loaded value. Previously only loaded into A; downstream COPY $x = $imgN (= ldx $D?) read garbage. This was the smoking gun for dadd(1.5, 2.5) → 0x4010_0000_3000_3000.
  • W65816LowerWide32.cpp fixed-point erase loop. Was single-pass; REG_SEQUENCE got skipped if a not-yet-erased COPY consumer kept it alive at the iteration moment. Removed ~40 dead Wide32 vregs from __adddf3's pre-RA MIR.
  • src/llvm/test/CodeGen/W65816/i64-first-arg-img16.ll relaxed stx 0xd / sta 0xd to 0x{{[cd]}} (regalloc now picks IMG8..15 too).

Fixes landed (full list with rationale)

Each entry: what / why / where / what regression it would cause if reverted.

A. Hash-shell DELETE bug → IMG caller-clobber

Symptom: dbDelete("age") returned 0 ("not found") instead of 1. DELETE never ran; COUNT stayed at 2.

Root cause: dbDelete did stx 0xd0 to save k_high, called hashKey, then pei 0xd0 to push k_high to strcmp. hashKey used $D0 as scratch in its loop body (sta 0xd0 storing the iterator's running-ptr-low). $D0 was clobbered by the time pei 0xd0 ran. JSLpseudo Defs only listed A, X, Y, DPF0 — IMG slots were not modelled as caller-clobber.

Fix: JSLpseudo Defs += [IMG0..IMG15].

Cascading fallout (each required its own fix):

A1. copyPhysReg vreg fallback

storeRegToStackSlot's unpaired-Wide32 default branch hit unreachable when called with a vreg source. Basic regalloc's InlineSpiller does this. Fix: short-circuit virtual-reg cases to TargetOpcode::COPY.

A2. LowerWide32 fixed-point erase

Single-pass erase left ~40 dead Wide32 vregs in __adddf3. Pattern:

%X:wide32 = REG_SEQUENCE ...
%Y:wide32 = COPY %X
... uses of %Y rewritten by Pass 3 ...

Single-pass: REG_SEQUENCE skipped (COPY consumer still alive), then COPY erased (now %X dead but loop already passed it). Fix: iterate until no progress.
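A minimal sketch of why one sweep is not enough, using a toy instruction list rather than real MachineInstrs. ToyInst, sweepOnce, eraseSinglePass and eraseToFixedPoint are hypothetical names for illustration; this is not the W65816LowerWide32 code, only the same iterate-until-no-progress shape.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: one defined vreg plus the vregs it reads.
// Hypothetical model, not the real MIR data structures.
struct ToyInst {
    std::string def;
    std::vector<std::string> uses;
    bool hasSideEffects;  // such instructions are never erased
};

// True if any not-yet-erased instruction still reads `reg`.
static bool isUsed(const std::vector<ToyInst>& insts,
                   const std::vector<bool>& erased,
                   const std::string& reg) {
    for (std::size_t i = 0; i < insts.size(); ++i) {
        if (erased[i]) continue;
        for (const std::string& u : insts[i].uses)
            if (u == reg) return true;
    }
    return false;
}

// One top-to-bottom sweep; returns how many instructions it erased.
static std::size_t sweepOnce(const std::vector<ToyInst>& insts,
                             std::vector<bool>& erased) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < insts.size(); ++i) {
        if (erased[i] || insts[i].hasSideEffects) continue;
        if (!isUsed(insts, erased, insts[i].def)) {
            erased[i] = true;
            ++n;
        }
    }
    return n;
}

// Old behaviour: a single sweep.
std::size_t eraseSinglePass(const std::vector<ToyInst>& insts) {
    std::vector<bool> erased(insts.size(), false);
    return sweepOnce(insts, erased);
}

// Fixed behaviour: keep sweeping until a sweep erases nothing.
std::size_t eraseToFixedPoint(const std::vector<ToyInst>& insts) {
    std::vector<bool> erased(insts.size(), false);
    std::size_t total = 0;
    for (;;) {
        std::size_t n = sweepOnce(insts, erased);
        if (n == 0) break;
        total += n;
    }
    return total;
}

// The pattern from the note, after the uses of %Y have been rewritten away.
std::vector<ToyInst> examplePattern() {
    return {
        {"%X", {}, false},      // %X:wide32 = REG_SEQUENCE ...
        {"%Y", {"%X"}, false},  // %Y:wide32 = COPY %X
    };
}
```

On examplePattern(), the single sweep erases only the COPY (the REG_SEQUENCE is visited first, while its consumer is still alive); the fixed-point version erases both.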

A3. STAfi PHA-bracket

Without bracket, regalloc could schedule $img0 = COPY $a AFTER a STAfi-with-IMG-source whose internal lda dp clobbered $a, silently storing X's value where A's was expected.
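A toy model of the bracket and the +2 offset shift. It assumes a 16-bit accumulator, direct page at 0, and a stack pointer that points at the next free byte; ToyCpu and all function names are hypothetical, and the addresses and values are illustrative only.

```cpp
#include <cstdint>
#include <vector>

// Just enough 65816 state for this sketch (hypothetical model).
struct ToyCpu {
    std::uint16_t a;                // accumulator, 16-bit mode assumed
    std::uint16_t s;                // stack pointer, points at the next free byte
    std::vector<std::uint8_t> mem;  // flat memory
};

std::uint16_t read16(const ToyCpu& c, std::uint16_t addr) {
    return static_cast<std::uint16_t>(c.mem[addr] | (c.mem[addr + 1] << 8));
}

void write16(ToyCpu& c, std::uint16_t addr, std::uint16_t v) {
    c.mem[addr] = static_cast<std::uint8_t>(v & 0xFF);
    c.mem[addr + 1] = static_cast<std::uint8_t>(v >> 8);
}

// pha: S drops by 2, the pushed word then sits at S+1..S+2.
void pha(ToyCpu& c) {
    c.s = static_cast<std::uint16_t>(c.s - 2);
    write16(c, static_cast<std::uint16_t>(c.s + 1), c.a);
}

// pla: reload A from S+1..S+2, then S rises by 2.
void pla(ToyCpu& c) {
    c.a = read16(c, static_cast<std::uint16_t>(c.s + 1));
    c.s = static_cast<std::uint16_t>(c.s + 2);
}

void ldaDp(ToyCpu& c, std::uint16_t dp) { c.a = read16(c, dp); }

// sta d,s: store A at S + d.
void staStackRel(ToyCpu& c, std::uint16_t d) {
    write16(c, static_cast<std::uint16_t>(c.s + d), c.a);
}

// Unbracketed: lda dp; sta d,s. The slot is written but A is lost.
void storeImgUnbracketed(ToyCpu& c, std::uint16_t dp, std::uint16_t d) {
    ldaDp(c, dp);
    staStackRel(c, d);
}

// Bracketed: pha; lda dp; sta d+2,s; pla. The +2 compensates for the pushed word.
void storeImgBracketed(ToyCpu& c, std::uint16_t dp, std::uint16_t d) {
    pha(c);
    ldaDp(c, dp);
    staStackRel(c, static_cast<std::uint16_t>(d + 2));
    pla(c);
}

// A = 0x1234 is live, the IMG slot at 0x00D0 holds 0xBEEF (illustrative values).
ToyCpu makeCpu() {
    ToyCpu c{0x1234, 0x01F0, std::vector<std::uint8_t>(0x400)};
    write16(c, 0x00D0, 0xBEEF);
    return c;
}
```

Both variants write the same frame slot (S + 3 as seen before the sequence); only the bracketed one leaves A and S as it found them.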

A4. LDAfi-IMG-dest STA dp

The big one. With narrow IMG, regalloc kept Wide16 vregs in IMG slots across calls and so never needed $imgN = LDAfi %stack.X. With full IMG, every cross-call spill needed it. The expansion only emitted LDA d,s (load A) and never wrote to the IMG slot. Downstream COPY $x = $imgN (= ldx $D?) read stale prior data. Manifested as dadd(1.5, 2.5) → 0x4010_0000_3000_3000 (mantissa garbage).

Diagnostic that found it: diff post-RA MIR narrow vs full IMG. Pre-RA MIR was identical. Full had 6 $imgN = LDAfi instances; narrow had 0. Narrow used COPY $imgN = $a patterns instead — those work correctly.
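A toy illustration of the stale-read mechanism. ToyState and the function names are hypothetical, the IMG slot address 0xD0 and the data values are made up for the example, and the real expansion lives in eliminateFrameIndex rather than in anything shaped like this.

```cpp
#include <cstdint>
#include <map>

// Hypothetical state: A, X, direct-page words and stack-frame words.
struct ToyState {
    std::uint16_t a = 0;
    std::uint16_t x = 0;
    std::map<std::uint16_t, std::uint16_t> dp;     // keyed by direct-page address
    std::map<std::uint16_t, std::uint16_t> frame;  // keyed by d,s offset
};

// Old expansion of `$imgN = LDAfi %stack.K`: lda d,s only. The IMG slot is untouched.
void expandLdaFiOld(ToyState& st, std::uint16_t frameOff, std::uint16_t /*imgAddr*/) {
    st.a = st.frame[frameOff];
}

// Fixed expansion: lda d,s followed by sta dp, so the IMG slot receives the value.
void expandLdaFiFixed(ToyState& st, std::uint16_t frameOff, std::uint16_t imgAddr) {
    st.a = st.frame[frameOff];
    st.dp[imgAddr] = st.a;
}

// Downstream `$x = COPY $imgN`, i.e. ldx dp.
void copyImgToX(ToyState& st, std::uint16_t imgAddr) {
    st.x = st.dp[imgAddr];
}

// The spilled value 0x4010 sits in the frame; the IMG slot holds leftover 0x3000.
ToyState makeState() {
    ToyState st;
    st.frame[1] = 0x4010;
    st.dp[0xD0] = 0x3000;
    return st;
}
```

With the old expansion the later ldx picks up the leftover 0x3000; with the fixed one it sees the reloaded 0x4010.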

A5. FileCheck regex

src/llvm/test/CodeGen/W65816/i64-first-arg-img16.ll expected stx 0xd / sta 0xd. Under full IMG clobber, regalloc picks IMG8..15 ($C0/$C2) for cross-call arg saves. Relaxed to 0x{{[cd]}}.

B. C++ try/catch source-level path

Two bugs blocking real clang++ -fsjlj-exceptions source code:

B1. W65816SjLjFinalize catchtab ordering

runOnFunction erased landingpad insts at line ~245, then built the catchtab at line ~290 via LPadBB->getLandingPadInst(). By that point getLandingPadInst() returned nullptr for every landing-pad block, so the build loop's if (!LP) continue; skipped every entry. The catchtab ended up with just the (0,0) sentinel. LSDA was 4 bytes of zeros. findCatch saw ctx->lsda == 0's entry and bailed. Result: any throw aborted.

Fix: capture catch-clause typeinfo Constants into a DenseMap<BasicBlock*, LPadInfo> BEFORE erasing landingpads; the catchtab build loop reads from the saved map.
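The ordering problem in miniature. This sketch uses plain standard-library containers: std::map keyed by block name stands in for the DenseMap<BasicBlock*, LPadInfo>, and ToyBlock, ToyLPadInfo and the build functions are hypothetical names, not the pass's real types.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block that may carry a landingpad.
struct ToyBlock {
    std::string name;
    bool hasLandingPad;
    std::vector<std::string> catchTypes;  // only meaningful while hasLandingPad is true
};

// Hypothetical stand-in for the saved per-landing-pad info.
struct ToyLPadInfo {
    std::vector<std::string> catchTypes;
};

// Models erasing the landingpad insts: the clause info is gone afterwards.
void eraseLandingPads(std::vector<ToyBlock>& blocks) {
    for (ToyBlock& b : blocks) {
        b.hasLandingPad = false;
        b.catchTypes.clear();
    }
}

// Old order: erase first, then query the blocks. Every entry is skipped.
std::vector<std::string> buildCatchTabEraseFirst(std::vector<ToyBlock> blocks) {
    eraseLandingPads(blocks);
    std::vector<std::string> tab;
    for (const ToyBlock& b : blocks) {
        if (!b.hasLandingPad) continue;  // mirrors `if (!LP) continue;`
        for (const std::string& t : b.catchTypes) tab.push_back(b.name + ":" + t);
    }
    tab.push_back("sentinel");
    return tab;
}

// Fixed order: capture into a side map, erase, then build from the map.
std::vector<std::string> buildCatchTabCaptureFirst(std::vector<ToyBlock> blocks) {
    std::map<std::string, ToyLPadInfo> saved;
    for (const ToyBlock& b : blocks)
        if (b.hasLandingPad) saved[b.name] = ToyLPadInfo{b.catchTypes};

    eraseLandingPads(blocks);

    std::vector<std::string> tab;
    for (const ToyBlock& b : blocks) {
        std::map<std::string, ToyLPadInfo>::const_iterator it = saved.find(b.name);
        if (it == saved.end()) continue;
        for (const std::string& t : it->second.catchTypes) tab.push_back(b.name + ":" + t);
    }
    tab.push_back("sentinel");
    return tab;
}

// One ordinary block and one landing pad catching `int`.
std::vector<ToyBlock> exampleFunction() {
    return {
        {"entry", false, {}},
        {"lpad", true, {"int"}},
    };
}
```

The erase-first variant yields only the sentinel, matching the empty-LSDA symptom; the capture-first variant keeps the lpad entry.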

B2. copyPhysReg IMG-to-IMG PHA-bracket

Comment said "Caller is responsible for ensuring A is dead at this program point (regalloc usually arranges this)." It doesn't, in practice. Regalloc inserted IMG-to-IMG copies between $a = COPY $img10 and STAfi $a, slot. The unbracketed lda src; sta dst clobbered A, so the subsequent STAfi spilled garbage. Visible as *p = 42 after __cxa_allocate_exception storing 42 to the wrong address (the indirect-long setup got the high half in the low slot).

Fix: PHA-bracket. Cost +7 cyc / +2 bytes per IMG-IMG copy (rare).

Verified end-to-end via MAME breakpoints: begin_catch entered with correct ExcHeader, end_catch entered with A=42, doTest returns A=42 from real C++ try { throw 42; } catch (int x) { return x; }.

C. Cleanup wins

  • runtime/src/snprintf.c:106 — removed optnone on emitULong. Smoke green.
  • runtime/src/snprintf.c:303 — removed optnone on snprintf. Smoke green.

Still-open work areas

Each carries a fair-warning note for whoever picks it up.

1. qsort/bsearch optnone — REMOVED 2026-05-08

Source-restructured qsort: split the inner loop into a __attribute__((noinline)) helper qsortInner (4 args: base, cur, size, cmp). Outer qsort just iterates i = 1..nmemb-1 and calls qsortInner(base, base + i*size, size, cmp). This drops outer qsort's i32-vreg simultaneous-live count below the inline-spill OOM threshold; both halves compile cleanly at -O2 + basic regalloc.

bsearch optnone was kept-for-symmetry — once removed, it just worked. The IMG-clobber + LDAfi-IMG-store backend fixes from 2026-05-07 had already resolved its underlying pressure issue.

Smoke 131/131 stays green.
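A sketch of the split described above, not the runtime's actual source. The note gives only the helper's arguments and the outer loop, so the insertion-sort inner loop is an assumption inferred from that shape; sketchQsort, sketchQsortInner, swapBytes and cmpInt are hypothetical names, and __attribute__((noinline)) assumes a GCC/Clang-style compiler.

```cpp
#include <cstddef>

typedef int (*CmpFn)(const void*, const void*);

// Byte-wise swap of two elements of `size` bytes each.
static void swapBytes(unsigned char* a, unsigned char* b, std::size_t size) {
    for (std::size_t k = 0; k < size; ++k) {
        unsigned char t = a[k];
        a[k] = b[k];
        b[k] = t;
    }
}

// Inner loop in its own non-inlined function, so its live values do not
// add to the outer function's register pressure.
// Assumption: sinks the element at `cur` leftwards, insertion-sort style.
__attribute__((noinline))
static void sketchQsortInner(unsigned char* base, unsigned char* cur,
                             std::size_t size, CmpFn cmp) {
    while (cur > base && cmp(cur - size, cur) > 0) {
        swapBytes(cur - size, cur, size);
        cur -= size;
    }
}

// Outer loop: i = 1..nmemb-1, one helper call per element.
void sketchQsort(void* basePtr, std::size_t nmemb, std::size_t size, CmpFn cmp) {
    unsigned char* base = static_cast<unsigned char*>(basePtr);
    for (std::size_t i = 1; i < nmemb; ++i)
        sketchQsortInner(base, base + i * size, size, cmp);
}

// Example comparator for int elements.
int cmpInt(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}
```

Calling sketchQsort(v, 5, sizeof(int), cmpInt) on {4, 1, 3, 5, 2} sorts it in place to {1, 2, 3, 4, 5}. The outer function only keeps base, i, size and cmp live across the call.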

2. gmtime_r optnone

runtime/src/timeExt.c:69. NOT a backend bug — IR-level optimization issue (loop rotation + IndVar simplify mis-evaluating days >= 365L + (__isLeap(...) ? 1 : 0)). Fixing requires deciding which combine pass is wrong and why. Out of scope for backend work.

3. softDouble noinlines

runtime/src/softDouble.c:30 (dpack) and :51 (dclass). Removing dpack noinline broke dadd this session — register pressure for __adddf3/__muldf3/__divdf3. Architectural for the same reason as qsort.

4. Greedy regalloc retry — TRIED, blocked

Tested 2026-05-08. Greedy fails immediately on atoi in libc.c:

LiveRangeEdit.cpp:200: void llvm::LiveRangeEdit::eliminateDeadDef(...):
Assertion `MI->allDefsAreDead() && "Def isn't really dead"' failed.

Same upstream LLVM bug class as the dadd full-IMG attempt: sub-register pair partial defs that regalloc treats as fully dead. Greedy is genuinely incompatible with the W65816's split-half subreg-pair patterns until the upstream LLVM issue is patched. Reverted to basic regalloc. The memory note feedback_greedy_high_pressure.md already covers this.

5. gmtime_r optnone — TRIED, blocked

Tested 2026-05-08. Hoisting yearLen to a long local (avoiding the double-recompute of 365L + (__isLeap ? 1 : 0)) didn't help; adding volatile to the local also didn't help. IR optimizer is still folding the comparison to compile-time-false. Source-level C restructuring won't dodge it; needs IR-pass-level work to identify which combine pass mis-evaluates and why. optnone stays.

How to verify recovery

cd /home/scott/claude/llvm816
git status                                        # 3 modified files listed above
cd tools/llvm-mos-build && ninja llc clang        # rebuild backend
cd /home/scott/claude/llvm816
bash runtime/build.sh                             # build runtime
bash scripts/smokeTest.sh                         # should print "all smoke checks passed"

If smoke fails, the most likely cause is one of the three uncommitted files got reverted; check git status and re-apply.

Diagnostic tools that worked

For posterity — these are the patterns that paid off this session.

Pre-RA vs post-RA MIR diff

clang -mllvm -stop-before=regallocbasic -S ...     # pre-RA
clang -mllvm -stop-after=virtregrewriter -S ...    # post-RA (post-virtregrewriter)

Diff narrow-IMG vs full-IMG post-RA MIR for the failing function. Pre-RA is identical (same IR), so the diff isolates regalloc-decision divergence. Look at every NEW pattern that appears only in the failing build — $imgN = LDAfi was the smoking gun for dadd.

Pass-by-pass IR/MIR dumps

clang -mllvm -print-after=w65816-lower-wide32 -S ...
clang -mllvm -print-after-all -S ... 2>dump.txt

MAME debugger via xvfb-run

xvfb-run -a mame apple2gs ... -debug -debugger qt -oslog -seconds_to_run N

With autoboot Lua: load .bin into bank 0 (skip $C000..$CFFF I/O), set CPU state, then cpu.debug:bpset(addr, condition, action) with actions like "logerror \"...\\n\",a,x,...; go". logerror with format args goes to stdout under -oslog. Memory reads in expressions: b@(addr), w@(addr). Watchpoints: cpu.debug:wpset(prog_space, "w", addr, len, condition, action).

AVOID: add_machine_pause_notifier + cpu.debug:go() in callback — segfaults from reentrancy. printf in actions stays in debugger console (not -oslog). tracelog also debugger-console only.

Trace methodology (find divergence point)

  1. Set BPs at every JSLpseudo callee in the failing function.
  2. Capture A/X/Y/DPF0 at each return.
  3. Find first divergent return between known-good and failing builds.
  4. The instruction sequence between previous-OK and first-divergent return is where the bug lives.

This pattern found the dadd bug at jsl@0x207f → __lshrsi3(0x8001_8000, 3) in 30 minutes. Recommended.
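Step 3 can be mechanised once both traces are captured. A minimal sketch, assuming each breakpoint log line has already been parsed into a record; RetSnapshot, its field widths and firstDivergence are hypothetical, and the log parsing itself is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One captured callee return (hypothetical record layout, widths assumed).
struct RetSnapshot {
    std::uint32_t callee;  // address of the called routine
    std::uint16_t a, x, y, dpf0;
};

static bool sameSnapshot(const RetSnapshot& l, const RetSnapshot& r) {
    return l.callee == r.callee && l.a == r.a && l.x == r.x &&
           l.y == r.y && l.dpf0 == r.dpf0;
}

// Index of the first return where the failing trace differs from the
// known-good one, or -1 if the traces are identical. If one trace is a
// strict prefix of the other, the shorter length is reported.
int firstDivergence(const std::vector<RetSnapshot>& good,
                    const std::vector<RetSnapshot>& bad) {
    std::size_t n = good.size() < bad.size() ? good.size() : bad.size();
    for (std::size_t i = 0; i < n; ++i)
        if (!sameSnapshot(good[i], bad[i])) return static_cast<int>(i);
    if (good.size() != bad.size()) return static_cast<int>(n);
    return -1;
}
```

The bug then lives in the instructions executed between return index - 1 and return index of the failing build.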

Memory notes referenced

(Filenames under /home/scott/.claude/projects/-home-scott-claude-llvm816/memory/.)

  • feedback_strstr9_long_haystack.md — the hash-shell bug story.
  • feedback_cpp_subset.md — C++ subset, including the SjLj fix.
  • feedback_ptr32_frame_limit.md — was 5 days stale; updated 2026-05-07 to "DONE, 131/131 smoke green".
  • feedback_jslpseudo_caller_save.md, feedback_libcall_img_clobber.md, feedback_img_slot_expansion.md, feedback_greedy_high_pressure.md — related backend topics.

Next session candidates (ranked)

  1. Commit the uncommitted fixes. They've earned it.
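  2. Greedy regalloc retry. Tried 2026-05-08 and blocked on the LiveRangeEdit assertion (see "Still-open work areas" item 4); not worth retrying until the upstream LLVM issue is patched.
  3. qsort source restructure. Done 2026-05-08 (see "Still-open work areas" item 1); optnone is removed from both qsort and bsearch.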
  4. gmtime_r IR investigation. Find which combine miscompiles days >= 365L + (leap?1:0). IR-level, not backend.