# Session Recovery — 2026-05-07/08 Living recovery doc. Update on every meaningful change. If session is lost, read this top-to-bottom + the memory notes referenced inside, then reread the actual diffs in tree to ground assumptions. ## Headline state - **Smoke**: 131/131 green. - **Active config**: ptr32 (`p:32:16`), full IMG0..IMG15 caller-clobber on JSL, basic regalloc at -O1+. - **Working tree**: clean except 3 modified files listed below; all are real fixes that haven't been committed yet. - **Branch**: `main`, ahead of `origin/main` by recent checkpoint commits. ## Uncommitted, must keep These are the in-flight improvements. Rebuild after applying any of them. 1. `runtime/src/snprintf.c` — removed `__attribute__((optnone))` from `emitULong` (line 106) and `snprintf` (line 303). Slot-aliasing workaround that the IMG-clobber + LDAfi-IMG fixes made unnecessary. 2. `src/llvm/lib/Target/W65816/W65816InstrInfo.cpp` - `copyPhysReg` virtual-register short-circuit: if `SrcReg` or `DestReg` is virtual, emit a `TargetOpcode::COPY` and return. Basic regalloc's InlineSpiller calls `storeRegToStackSlot` with vreg sources before final physreg assignment; without the short-circuit the unpaired- Wide32 default branch hits the `unreachable`. - `copyPhysReg` IMG-to-IMG PHA-bracket: was `lda src; sta dst` — unbracketed clobber of A, regalloc inserted these copies between `$a = COPY $img10` and use-of-A. PHA/PLA bracket preserves A. 3. `src/llvm/lib/Target/W65816/W65816SjLjFinalize.cpp` — catchtab build moved BEFORE landingpad erase. Old code did `LPadBB->getLandingPadInst()` AFTER erasing the insts → returned nullptr → empty LSDA → catch never matched, abort. Now captures catch-clause typeinfo Constants into a `DenseMap` BEFORE erase; build loop reads from the saved map. To commit when ready (do NOT amend; create new commits): ```bash git add runtime/src/snprintf.c \ src/llvm/lib/Target/W65816/W65816InstrInfo.cpp \ src/llvm/lib/Target/W65816/W65816SjLjFinalize.cpp git commit -m "..." # message stub below ``` Suggested commit message: see "Fixes landed" section below; one commit per logical change is cleaner. ## Already-committed in this session arc Per `git log --oneline -20` these are the recent checkpoint commits; the diffs they contain are real and load-bearing. The big ones (search by file or grep): - **JSLpseudo Defs += IMG0..IMG15** in `W65816InstrInfo.td`. With the wider Defs, regalloc spills IMG-class vregs around calls instead of treating them as preserved. - **`W65816RegisterInfo.cpp` `eliminateFrameIndex` for `STAfi`**: PHA-bracketed for non-A source (IMG/X/Y). The `lda dp; sta d,s` chain clobbered A; bracket preserves A while shifting offset by +2 between PHA and PLA. Defs=[A] kept on STAfi as safe over-approximation. - **`W65816RegisterInfo.cpp` `eliminateFrameIndex` for `LDAfi`**: if `Dst = IMGn`, append `STA dp` so the IMG slot actually receives the loaded value. Previously only loaded into A; downstream `COPY $x = $imgN` (= `ldx $D?`) read garbage. **This was the smoking gun for `dadd(1.5, 2.5) → 0x4010_0000_3000_3000`.** - **`W65816LowerWide32.cpp`** fixed-point erase loop. Was single-pass; REG_SEQUENCE got skipped if a not-yet-erased COPY consumer kept it alive at the iteration moment. Removed ~40 dead Wide32 vregs from `__adddf3`'s pre-RA MIR. - **`src/llvm/test/CodeGen/W65816/i64-first-arg-img16.ll`** relaxed `stx 0xd / sta 0xd` to `0x{{[cd]}}` (regalloc now picks IMG8..15 too). ## Fixes landed (full list with rationale) Each entry: what / why / where / what regression it would cause if reverted. ### A. Hash-shell DELETE bug → IMG caller-clobber **Symptom**: `dbDelete("age")` returned 0 ("not found") instead of 1. DELETE never ran; `COUNT` stayed at 2. **Root cause**: `dbDelete` did `stx 0xd0` to save k_high, called `hashKey`, then `pei 0xd0` to push k_high to strcmp. `hashKey` used $D0 as scratch in its loop body (`sta 0xd0` storing the iterator's running-ptr-low). $D0 was clobbered by the time `pei 0xd0` ran. JSLpseudo Defs only listed `A, X, Y, DPF0` — IMG slots were not modelled as caller-clobber. **Fix**: `JSLpseudo Defs += [IMG0..IMG15]`. **Cascading fallout** (each required its own fix): #### A1. copyPhysReg vreg fallback `storeRegToStackSlot`'s unpaired-Wide32 default branch hit `unreachable` when called with a vreg source. Basic regalloc's InlineSpiller does this. Fix: short-circuit virtual-reg cases to `TargetOpcode::COPY`. #### A2. LowerWide32 fixed-point erase Single-pass erase left ~40 dead Wide32 vregs in `__adddf3`. Pattern: ``` %X:wide32 = REG_SEQUENCE ... %Y:wide32 = COPY %X ... uses of %Y rewritten by Pass 3 ... ``` Single-pass: REG_SEQUENCE skipped (COPY consumer still alive), then COPY erased (now %X dead but loop already passed it). Fix: iterate until no progress. #### A3. STAfi PHA-bracket Without bracket, regalloc could schedule `$img0 = COPY $a` AFTER a `STAfi`-with-IMG-source whose internal `lda dp` clobbered $a, silently storing X's value where A's was expected. #### A4. LDAfi-IMG-dest STA dp **The big one.** With narrow IMG, regalloc kept Wide16 vregs in IMG slots across calls, never needed `$imgN = LDAfi %stack.X`. With full IMG, every cross-call spill needed it. The expansion only emitted `LDA d,s` (load A) — never wrote to the IMG slot. Downstream `COPY $x = $imgN` (= `ldx $D?`) read stale prior data. Manifested as `dadd(1.5, 2.5) → 0x4010_0000_3000_3000` (mantissa garbage). **Diagnostic that found it**: diff post-RA MIR narrow vs full IMG. Pre-RA MIR was identical. Full had 6 `$imgN = LDAfi` instances; narrow had 0. Narrow used `COPY $imgN = $a` patterns instead — those work correctly. #### A5. FileCheck regex `src/llvm/test/CodeGen/W65816/i64-first-arg-img16.ll` expected `stx 0xd / sta 0xd`. Under full IMG clobber, regalloc picks IMG8..15 ($C0/$C2) for cross-call arg saves. Relaxed to `0x{{[cd]}}`. ### B. C++ try/catch source-level path Two bugs blocking real `clang++ -fsjlj-exceptions` source code: #### B1. W65816SjLjFinalize catchtab ordering `runOnFunction` erased landingpad insts at line ~245, then built the catchtab at line ~290 via `LPadBB->getLandingPadInst()`. By that point, landingpads were nullptr. The build loop's `if (!LP) continue;` skipped every entry. Catchtab ended with just `(0,0)` sentinel. LSDA was 4 bytes of zeros. `findCatch` saw `ctx->lsda == 0`'s entry and bailed. Result: any `throw` aborted. Fix: capture catch-clause typeinfo Constants into a `DenseMap` BEFORE erasing landingpads; the catchtab build loop reads from the saved map. #### B2. copyPhysReg IMG-to-IMG PHA-bracket Comment said "Caller is responsible for ensuring A is dead at this program point (regalloc usually arranges this)." It doesn't, in practice. Regalloc inserted IMG-to-IMG copies between `$a = COPY $img10` and `STAfi $a, slot`. Unbracketed `lda src; sta dst` clobbered A. The subsequent STAfi spilled garbage. Visible as `*p = 42` after `__cxa_allocate_exception` storing 42 to wrong addr (indirect-long setup got hi-half at lo-slot). Fix: PHA-bracket. Cost +7 cyc / +2 bytes per IMG-IMG copy (rare). **Verified end-to-end** via MAME breakpoints: `begin_catch` entered with correct ExcHeader, `end_catch` entered with A=42, doTest returns A=42 from real C++ `try { throw 42; } catch (int x) { return x; }`. ### C. Cleanup wins - `runtime/src/snprintf.c:106` — removed `optnone` on `emitULong`. Smoke green. - `runtime/src/snprintf.c:303` — removed `optnone` on `snprintf`. Smoke green. ## Still-open work areas Each carries a fair-warning note for whoever picks it up. ### 1. qsort/bsearch `optnone` — REMOVED 2026-05-08 Source-restructured `qsort`: split the inner loop into a `__attribute__((noinline))` helper `qsortInner` (4 args: base, cur, size, cmp). Outer `qsort` just iterates `i = 1..nmemb-1` and calls `qsortInner(base, base + i*size, size, cmp)`. This drops outer qsort's i32-vreg simultaneous-live count below the inline-spill OOM threshold; both halves compile cleanly at -O2 + basic regalloc. `bsearch` `optnone` was kept-for-symmetry — once removed, it just worked. The IMG-clobber + LDAfi-IMG-store backend fixes from 2026-05-07 had already resolved its underlying pressure issue. Smoke 131/131 stays green. ### 2. gmtime_r `optnone` `runtime/src/timeExt.c:69`. NOT a backend bug — IR-level optimization issue (loop rotation + IndVar simplify mis-evaluating `days >= 365L + (__isLeap(...) ? 1 : 0)`). Fixing requires deciding which combine pass is wrong and why. Out of scope for backend work. ### 3. softDouble noinlines `runtime/src/softDouble.c:30` (`dpack`) and `:51` (`dclass`). Removing `dpack` noinline broke dadd this session — register pressure for `__adddf3`/`__muldf3`/`__divdf3`. Architectural for the same reason as qsort. ### 4. Greedy regalloc retry — TRIED, blocked Tested 2026-05-08. Greedy fails immediately on `atoi` in libc.c: ``` LiveRangeEdit.cpp:200: void llvm::LiveRangeEdit::eliminateDeadDef(...): Assertion `MI->allDefsAreDead() && "Def isn't really dead"' failed. ``` Same upstream LLVM bug class as the dadd full-IMG attempt — sub-register pair partial defs that the regalloc treats as fully dead. Greedy is genuinely incompatible with the W65816's split-half subreg-pair patterns until the upstream LLVM issue is patched. Reverted to basic regalloc. Document `feedback_greedy_high_pressure.md` already covers this. ### 5. gmtime_r `optnone` — TRIED, blocked Tested 2026-05-08. Hoisting `yearLen` to a long local (avoiding the double-recompute of `365L + (__isLeap ? 1 : 0)`) didn't help; adding `volatile` to the local also didn't help. IR optimizer is still folding the comparison to compile-time-false. Source-level C restructuring won't dodge it; needs IR-pass-level work to identify which combine pass mis-evaluates and why. optnone stays. ## How to verify recovery ```bash cd /home/scott/claude/llvm816 git status # 3 modified files listed above cd tools/llvm-mos-build && ninja llc clang # rebuild backend cd /home/scott/claude/llvm816 bash runtime/build.sh # build runtime bash scripts/smokeTest.sh # should print "all smoke checks passed" ``` If smoke fails, the most likely cause is one of the three uncommitted files got reverted; check `git status` and re-apply. ## Diagnostic tools that worked For posterity — these are the patterns that paid off this session. ### Pre-RA vs post-RA MIR diff ```bash clang -mllvm -stop-before=regallocbasic -S ... # pre-RA clang -mllvm -stop-after=virtregrewriter -S ... # post-RA (post-virtregrewriter) ``` Diff narrow-IMG vs full-IMG post-RA MIR for the failing function. Pre-RA is identical (same IR), so the diff isolates regalloc-decision divergence. Look at every NEW pattern that appears only in the failing build — `$imgN = LDAfi` was the smoking gun for dadd. ### Pass-by-pass IR/MIR dumps ```bash clang -mllvm -print-after=w65816-lower-wide32 -S ... clang -mllvm -print-after-all -S ... 2>dump.txt ``` ### MAME debugger via xvfb-run ```bash xvfb-run -a mame apple2gs ... -debug -debugger qt -oslog -seconds_to_run N ``` With autoboot Lua: load .bin into bank 0 (skip `$C000..$CFFF` I/O), set CPU state, then `cpu.debug:bpset(addr, condition, action)` with actions like `"logerror \"...\\n\",a,x,...; go"`. `logerror` with format args goes to stdout under `-oslog`. Memory reads in expressions: `b@(addr)`, `w@(addr)`. Watchpoints: `cpu.debug:wpset(prog_space, "w", addr, len, condition, action)`. **AVOID**: `add_machine_pause_notifier` + `cpu.debug:go()` in callback — segfaults from reentrancy. `printf` in actions stays in debugger console (not -oslog). `tracelog` also debugger-console only. ### Trace methodology (find divergence point) 1. Set BPs at every `JSLpseudo` callee in the failing function. 2. Capture A/X/Y/DPF0 at each return. 3. Find first divergent return between known-good and failing builds. 4. The instruction sequence between previous-OK and first-divergent return is where the bug lives. This pattern found the dadd bug at `jsl@0x207f → __lshrsi3(0x8001_8000, 3)` in 30 minutes. Recommended. ## Memory notes referenced (Filenames under `/home/scott/.claude/projects/-home-scott-claude-llvm816/memory/`.) - `feedback_strstr9_long_haystack.md` — the hash-shell bug story. - `feedback_cpp_subset.md` — C++ subset, including the SjLj fix. - `feedback_ptr32_frame_limit.md` — was 5 days stale; updated 2026-05-07 to "DONE, 131/131 smoke green". - `feedback_jslpseudo_caller_save.md`, `feedback_libcall_img_clobber.md`, `feedback_img_slot_expansion.md`, `feedback_greedy_high_pressure.md` — related backend topics. ## Next session candidates (ranked) 1. **Commit the uncommitted fixes.** They've earned it. 2. **Greedy regalloc retry.** Cheap experiment, potentially big win. 3. **qsort source restructure.** Clear `optnone` if you're willing to reshape the algorithm. Source-level work, not backend. 4. **gmtime_r IR investigation.** Find which combine miscompiles `days >= 365L + (leap?1:0)`. IR-level, not backend.