Session Recovery — 2026-05-07/08
Living recovery doc. Update on every meaningful change. If session is lost, read this top-to-bottom + the memory notes referenced inside, then reread the actual diffs in tree to ground assumptions.
Headline state
- Smoke: 131/131 green.
- Active config: ptr32 (p:32:16), full IMG0..IMG15 caller-clobber on JSL, basic regalloc at -O1+.
- Working tree: clean except 3 modified files listed below; all are real fixes that haven't been committed yet.
- Branch: main, ahead of origin/main by recent checkpoint commits.
Uncommitted, must keep
These are the in-flight improvements. Rebuild after applying any of them.
- runtime/src/snprintf.c: removed __attribute__((optnone)) from emitULong (line 106) and snprintf (line 303). It was a slot-aliasing workaround that the IMG-clobber + LDAfi-IMG fixes made unnecessary.
- src/llvm/lib/Target/W65816/W65816InstrInfo.cpp:
  - copyPhysReg virtual-register short-circuit: if SrcReg or DestReg is virtual, emit a TargetOpcode::COPY and return. Basic regalloc's InlineSpiller calls storeRegToStackSlot with vreg sources before final physreg assignment; without the short-circuit, the unpaired-Wide32 default branch hits the unreachable.
  - copyPhysReg IMG-to-IMG PHA-bracket: was lda src; sta dst, an unbracketed clobber of A. Regalloc inserted these copies between $a = COPY $img10 and the use of A. The PHA/PLA bracket preserves A.
- src/llvm/lib/Target/W65816/W65816SjLjFinalize.cpp: catchtab build moved BEFORE landingpad erase. Old code did LPadBB->getLandingPadInst() AFTER erasing the insts → returned nullptr → empty LSDA → catch never matched, abort. Now captures catch-clause typeinfo Constants into a DenseMap<BasicBlock*, LPadInfo> BEFORE erase; build loop reads from the saved map.
To commit when ready (do NOT amend; create new commits):
git add runtime/src/snprintf.c \
src/llvm/lib/Target/W65816/W65816InstrInfo.cpp \
src/llvm/lib/Target/W65816/W65816SjLjFinalize.cpp
git commit -m "..." # message stub below
Suggested commit message: see "Fixes landed" section below; one commit per logical change is cleaner.
Already-committed in this session arc
Per git log --oneline -20 these are the recent checkpoint commits;
the diffs they contain are real and load-bearing.
The big ones (search by file or grep):
- JSLpseudo Defs += IMG0..IMG15 in W65816InstrInfo.td. With the wider Defs, regalloc spills IMG-class vregs around calls instead of treating them as preserved.
- W65816RegisterInfo.cpp eliminateFrameIndex for STAfi: PHA-bracketed for non-A source (IMG/X/Y). The lda dp; sta d,s chain clobbered A; bracket preserves A while shifting offset by +2 between PHA and PLA. Defs=[A] kept on STAfi as safe over-approximation.
- W65816RegisterInfo.cpp eliminateFrameIndex for LDAfi: if Dst = IMGn, append STA dp so the IMG slot actually receives the loaded value. Previously only loaded into A; downstream COPY $x = $imgN (= ldx $D?) read garbage. This was the smoking gun for dadd(1.5, 2.5) → 0x4010_0000_3000_3000.
- W65816LowerWide32.cpp fixed-point erase loop. Was single-pass; REG_SEQUENCE got skipped if a not-yet-erased COPY consumer kept it alive at the iteration moment. Removed ~40 dead Wide32 vregs from __adddf3's pre-RA MIR.
- src/llvm/test/CodeGen/W65816/i64-first-arg-img16.ll relaxed stx 0xd / sta 0xd to 0x{{[cd]}} (regalloc now picks IMG8..15 too).
Fixes landed (full list with rationale)
Each entry: what / why / where / what regression it would cause if reverted.
A. Hash-shell DELETE bug → IMG caller-clobber
Symptom: dbDelete("age") returned 0 ("not found") instead of 1.
DELETE never ran; COUNT stayed at 2.
Root cause: dbDelete did stx 0xd0 to save k_high, called hashKey,
then pei 0xd0 to push k_high to strcmp. hashKey used $D0 as scratch
in its loop body (sta 0xd0 storing the iterator's running-ptr-low). $D0
was clobbered by the time pei 0xd0 ran. JSLpseudo Defs only listed
A, X, Y, DPF0 — IMG slots were not modelled as caller-clobber.
Fix: JSLpseudo Defs += [IMG0..IMG15].
Cascading fallout (each required its own fix):
A1. copyPhysReg vreg fallback
storeRegToStackSlot's unpaired-Wide32 default branch hit unreachable
when called with a vreg source. Basic regalloc's InlineSpiller does
this. Fix: short-circuit virtual-reg cases to TargetOpcode::COPY.
A2. LowerWide32 fixed-point erase
Single-pass erase left ~40 dead Wide32 vregs in __adddf3. Pattern:
%X:wide32 = REG_SEQUENCE ...
%Y:wide32 = COPY %X
... uses of %Y rewritten by Pass 3 ...
Single-pass: REG_SEQUENCE skipped (COPY consumer still alive), then COPY erased (now %X dead but loop already passed it). Fix: iterate until no progress.
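The single-pass failure is easy to reproduce in a toy model. The sketch below is not the W65816LowerWide32 code; it assumes a simplified instruction shape of (defined vreg, list of used vregs), and the function names are hypothetical.

```python
# Toy model of the erase loop. Assumption: an instruction is just
# (def_vreg, [used_vregs]); live_out holds vregs still needed afterwards.

def single_pass_erase(insts, live_out):
    """Visit each instruction once, in order; erase it if its def is unused."""
    insts = list(insts)
    i = 0
    while i < len(insts):
        d, _ = insts[i]
        used = set(live_out)
        for _, uses in insts:
            used.update(uses)
        if d not in used:
            del insts[i]          # dead right now: erase
        else:
            i += 1                # still has a consumer: skip, never revisited
    return insts


def fixed_point_erase(insts, live_out):
    """Repeat the single pass until an iteration erases nothing."""
    insts = list(insts)
    while True:
        after = single_pass_erase(insts, live_out)
        if len(after) == len(insts):
            return after
        insts = after


# %X = REG_SEQUENCE ... ; %Y = COPY %X ; all uses of %Y already rewritten.
pattern = [("X", []), ("Y", ["X"])]
print(single_pass_erase(pattern, set()))   # X survives although it is dead
print(fixed_point_erase(pattern, set()))   # everything dead is gone
```

A worklist that re-queues the operands of each erased instruction would reach the same result with less rescanning; the loop above is only the simplest form of "iterate until no progress".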
A3. STAfi PHA-bracket
Without bracket, regalloc could schedule $img0 = COPY $a AFTER a
STAfi-with-IMG-source whose internal lda dp clobbered $a, silently
storing X's value where A's was expected.
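To see why the bracket also needs the +2 offset shift, here is a minimal word-granular model of A, S and memory. It is a sketch under stated assumptions: the Cpu class and the stafi_* helpers are hypothetical, a 16-bit accumulator is assumed, and byte-level layout is ignored.

```python
class Cpu:
    """Toy model: only A, S and a sparse word memory are tracked."""

    def __init__(self, a=0, s=0x01FF):
        self.a = a
        self.s = s
        self.mem = {}

    def pha(self):                 # push A; S drops by 2 (16-bit A assumed)
        self.s -= 2
        self.mem[self.s + 1] = self.a

    def pla(self):                 # pull A; S rises by 2
        self.a = self.mem[self.s + 1]
        self.s += 2

    def lda_dp(self, dp):          # lda dp
        self.a = self.mem[dp]

    def sta_sr(self, off):         # sta off,s
        self.mem[self.s + off] = self.a


def stafi_unbracketed(cpu, img_dp, off):
    cpu.lda_dp(img_dp)             # clobbers A
    cpu.sta_sr(off)


def stafi_bracketed(cpu, img_dp, off):
    cpu.pha()                      # save A
    cpu.lda_dp(img_dp)
    cpu.sta_sr(off + 2)            # S moved by 2, so the slot is 2 further away
    cpu.pla()                      # restore A


cpu = Cpu(a=0x1234)
cpu.mem[0xD0] = 0xBEEF             # illustrative IMG slot contents
stafi_bracketed(cpu, 0xD0, 3)
print(hex(cpu.a), hex(cpu.mem[0x01FF + 3]))   # A intact, slot written
```

In the unbracketed form the same call leaves 0xBEEF in A, which is exactly the value a later $img0 = COPY $a would then pick up.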
A4. LDAfi-IMG-dest STA dp
The big one. With narrow IMG, regalloc kept Wide16 vregs in IMG
slots across calls, never needed $imgN = LDAfi %stack.X. With full
IMG, every cross-call spill needed it. The expansion only emitted
LDA d,s (load A) — never wrote to the IMG slot. Downstream
COPY $x = $imgN (= ldx $D?) read stale prior data. Manifested as
dadd(1.5, 2.5) → 0x4010_0000_3000_3000 (mantissa garbage).
Diagnostic that found it: diff post-RA MIR narrow vs full IMG. Pre-RA
MIR was identical. Full had 6 $imgN = LDAfi instances; narrow had 0.
Narrow used COPY $imgN = $a patterns instead — those work correctly.
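The stale-read mechanism can be shown with a few dictionary updates. This is an illustrative sketch only: the state layout, helper names and the values 0xDEAD / 0x4010 are made up for the example and are not taken from the dadd trace.

```python
def fresh_state():
    # Assumption: one stack slot holding the spilled value, one IMG slot
    # on the direct page still holding whatever was there before.
    return {"A": 0, "X": 0, "stack": {1: 0x4010}, "dp": {0xD0: 0xDEAD}}


def ldafi_old(state, dst_dp, slot):
    state["A"] = state["stack"][slot]      # lda d,s  (IMG slot never written)


def ldafi_fixed(state, dst_dp, slot):
    state["A"] = state["stack"][slot]      # lda d,s
    state["dp"][dst_dp] = state["A"]       # sta dp   (the appended store)


def copy_x_from_img(state, src_dp):
    state["X"] = state["dp"][src_dp]       # ldx dp


for expand in (ldafi_old, ldafi_fixed):
    st = fresh_state()
    expand(st, 0xD0, 1)                    # $imgN = LDAfi %stack.X
    copy_x_from_img(st, 0xD0)              # COPY $x = $imgN
    print(expand.__name__, hex(st["X"]))
```

With the old expansion X receives the stale direct-page contents; with the appended store it receives the reloaded value.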
A5. FileCheck regex
src/llvm/test/CodeGen/W65816/i64-first-arg-img16.ll expected
stx 0xd / sta 0xd. Under full IMG clobber, regalloc picks IMG8..15
($C0/$C2) for cross-call arg saves. Relaxed to 0x{{[cd]}}.
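As a quick sanity check of what the relaxed pattern accepts, the FileCheck fragment can be mirrored with an ordinary regex. This is only an approximation of FileCheck matching (literal text plus an embedded regex); the sample operand spellings are assumptions in the style of the asm quoted elsewhere in this doc.

```python
import re

# FileCheck's `stx 0x{{[cd]}}` is the literal "stx 0x" followed by regex [cd].
RELAXED = re.compile(r"stx 0x[cd]")


def check_line_matches(asm_line):
    """True if the line would satisfy the relaxed stx check."""
    return RELAXED.search(asm_line) is not None


for line in ("stx 0xd0", "stx 0xc2", "stx 0xe0"):
    print(line, check_line_matches(line))
```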
B. C++ try/catch source-level path
Two bugs blocking real clang++ -fsjlj-exceptions source code:
B1. W65816SjLjFinalize catchtab ordering
runOnFunction erased landingpad insts at line ~245, then built the
catchtab at line ~290 via LPadBB->getLandingPadInst(). By that
point, landingpads were nullptr. The build loop's if (!LP) continue;
skipped every entry. Catchtab ended with just (0,0) sentinel. LSDA
was 4 bytes of zeros. findCatch saw ctx->lsda == 0's entry and
bailed. Result: any throw aborted.
Fix: capture catch-clause typeinfo Constants into a
DenseMap<BasicBlock*, LPadInfo> BEFORE erasing landingpads; the
catchtab build loop reads from the saved map.
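The ordering bug and its fix reduce to "read before you erase". The sketch below models basic blocks as plain dicts; it is not the pass's code, and the field names and the `_ZTIi` typeinfo string are illustrative assumptions.

```python
def make_blocks():
    # Assumption: one landing-pad block catching int, one ordinary block.
    return [
        {"name": "lpad1", "lpad": {"typeinfo": "_ZTIi"}},
        {"name": "body", "lpad": None},
    ]


def build_catchtab_old(blocks):
    for bb in blocks:                      # erase landingpads first ...
        bb["lpad"] = None
    tab = []
    for bb in blocks:                      # ... then try to read them
        lp = bb["lpad"]
        if lp is None:
            continue                       # every entry is skipped
        tab.append((bb["name"], lp["typeinfo"]))
    tab.append((0, 0))                     # sentinel
    return tab


def build_catchtab_fixed(blocks):
    saved = {bb["name"]: bb["lpad"]["typeinfo"]
             for bb in blocks if bb["lpad"] is not None}   # capture BEFORE erase
    for bb in blocks:
        bb["lpad"] = None
    tab = [(name, ti) for name, ti in saved.items()]
    tab.append((0, 0))                     # sentinel
    return tab


print(build_catchtab_old(make_blocks()))
print(build_catchtab_fixed(make_blocks()))
```

The old variant yields only the sentinel, which matches the all-zero LSDA described above.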
B2. copyPhysReg IMG-to-IMG PHA-bracket
Comment said "Caller is responsible for ensuring A is dead at this
program point (regalloc usually arranges this)." It doesn't, in
practice. Regalloc inserted IMG-to-IMG copies between $a = COPY $img10
and STAfi $a, slot. Unbracketed lda src; sta dst clobbered A.
The subsequent STAfi spilled garbage. Visible as *p = 42 after
__cxa_allocate_exception storing 42 to wrong addr (indirect-long
setup got hi-half at lo-slot).
Fix: PHA-bracket. Cost +7 cyc / +2 bytes per IMG-IMG copy (rare).
Verified end-to-end via MAME breakpoints: begin_catch entered
with correct ExcHeader, end_catch entered with A=42, doTest returns
A=42 from real C++ try { throw 42; } catch (int x) { return x; }.
C. Cleanup wins
- runtime/src/snprintf.c:106: removed optnone on emitULong. Smoke green.
- runtime/src/snprintf.c:303: removed optnone on snprintf. Smoke green.
Still-open work areas
Each carries a fair-warning note for whoever picks it up.
1. qsort/bsearch optnone — REMOVED 2026-05-08
Source-restructured qsort: split the inner loop into a
__attribute__((noinline)) helper qsortInner (4 args: base, cur,
size, cmp). Outer qsort just iterates i = 1..nmemb-1 and calls
qsortInner(base, base + i*size, size, cmp). This drops outer
qsort's i32-vreg simultaneous-live count below the inline-spill
OOM threshold; both halves compile cleanly at -O2 + basic regalloc.
bsearch optnone was kept-for-symmetry — once removed, it just
worked. The IMG-clobber + LDAfi-IMG-store backend fixes from
2026-05-07 had already resolved its underlying pressure issue.
Smoke 131/131 stays green.
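The shape of the split can be modelled over a bytearray. This is a sketch, not the runtime source: the notes above only give the helper's signature and the outer loop, so the insertion step inside qsort_inner is an assumption, and the Python names are hypothetical.

```python
def qsort_inner(base, cur, size, cmp):
    """Assumed behaviour: sink the element at byte offset cur into the sorted prefix."""
    while cur > 0:
        prev = cur - size
        a = bytes(base[prev:cur])
        b = bytes(base[cur:cur + size])
        if cmp(a, b) <= 0:
            break
        base[prev:cur] = b                 # swap the two neighbours
        base[cur:cur + size] = a
        cur = prev


def qsort(base, nmemb, size, cmp):
    """Outer loop only iterates and delegates, as described above."""
    for i in range(1, nmemb):
        qsort_inner(base, i * size, size, cmp)


def cmp_u16(a, b):
    x, y = int.from_bytes(a, "little"), int.from_bytes(b, "little")
    return (x > y) - (x < y)


data = bytearray()
for v in (300, 2, 70):
    data += v.to_bytes(2, "little")
qsort(data, 3, 2, cmp_u16)
print([int.from_bytes(data[i:i + 2], "little") for i in range(0, 6, 2)])
```

The point of the split is that the outer function holds only the loop counter and the call arguments live at once, while the helper owns the comparison and swap state.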
2. gmtime_r optnone
runtime/src/timeExt.c:69. NOT a backend bug — IR-level optimization
issue (loop rotation + IndVar simplify mis-evaluating
days >= 365L + (__isLeap(...) ? 1 : 0)). Fixing requires deciding
which combine pass is wrong and why. Out of scope for backend work.
3. softDouble noinlines
runtime/src/softDouble.c:30 (dpack) and :51 (dclass). Removing
dpack noinline broke dadd this session — register pressure for
__adddf3/__muldf3/__divdf3. Architectural for the same reason as
qsort.
4. Greedy regalloc retry — TRIED, blocked
Tested 2026-05-08. Greedy fails immediately on atoi in libc.c:
LiveRangeEdit.cpp:200: void llvm::LiveRangeEdit::eliminateDeadDef(...):
Assertion `MI->allDefsAreDead() && "Def isn't really dead"' failed.
Same upstream LLVM bug class as the dadd full-IMG attempt — sub-register
pair partial defs that the regalloc treats as fully dead. Greedy is
genuinely incompatible with the W65816's split-half subreg-pair patterns
until the upstream LLVM issue is patched. Reverted to basic regalloc.
Document feedback_greedy_high_pressure.md already covers this.
5. gmtime_r optnone — TRIED, blocked
Tested 2026-05-08. Hoisting yearLen to a long local (avoiding the
double-recompute of 365L + (__isLeap ? 1 : 0)) didn't help; adding
volatile to the local also didn't help. IR optimizer is still
folding the comparison to compile-time-false. Source-level C
restructuring won't dodge it; needs IR-pass-level work to identify
which combine pass mis-evaluates and why. optnone stays.
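When the IR investigation starts, a host-side oracle for the year loop may help separate "wrong at -O2" from "wrong in the source". The sketch below assumes the loop peels whole years off a non-negative day count from 1970-01-01 using the same days >= yearLen comparison; the real timeExt.c may be structured differently.

```python
from datetime import date, timedelta


def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def year_from_days(days):
    """Return (year, day-of-year) for a non-negative day count since 1970-01-01."""
    year = 1970
    while True:
        year_len = 365 + (1 if is_leap(year) else 0)
        if days < year_len:                # i.e. NOT (days >= year_len)
            return year, days
        days -= year_len
        year += 1


def agrees_with_datetime(days):
    """Cross-check the loop against the standard library."""
    year, yday = year_from_days(days)
    ref = date(1970, 1, 1) + timedelta(days=days)
    return (year, yday) == (ref.year, ref.timetuple().tm_yday - 1)


print(year_from_days(1096))                # first day of 1973
```

Feeding the same day counts to the target build and comparing against this table should show at which input the folded comparison first matters.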
How to verify recovery
cd /home/scott/claude/llvm816
git status # 3 modified files listed above
cd tools/llvm-mos-build && ninja llc clang # rebuild backend
cd /home/scott/claude/llvm816
bash runtime/build.sh # build runtime
bash scripts/smokeTest.sh # should print "all smoke checks passed"
If smoke fails, the most likely cause is one of the three uncommitted
files got reverted; check git status and re-apply.
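The "did one of the three files get reverted" check can be automated by parsing git status --porcelain output. A minimal sketch, assuming the default porcelain v1 format (two status characters, a space, then the path); the function name is hypothetical.

```python
EXPECTED = {
    "runtime/src/snprintf.c",
    "src/llvm/lib/Target/W65816/W65816InstrInfo.cpp",
    "src/llvm/lib/Target/W65816/W65816SjLjFinalize.cpp",
}


def missing_modifications(porcelain_output):
    """Return the expected files that do NOT show up as modified."""
    modified = set()
    for line in porcelain_output.splitlines():
        # Porcelain v1: columns 0-1 are index/worktree status, path starts at 3.
        if len(line) > 3 and "M" in line[:2]:
            modified.add(line[3:])
    return sorted(EXPECTED - modified)


sample = (
    " M runtime/src/snprintf.c\n"
    " M src/llvm/lib/Target/W65816/W65816InstrInfo.cpp\n"
)
print(missing_modifications(sample))       # lists the file that was reverted
```

To run it against the real tree, pass in the stdout of subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout from the repository root. The check only makes sense while the three fixes are still uncommitted.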
Diagnostic tools that worked
For posterity — these are the patterns that paid off this session.
Pre-RA vs post-RA MIR diff
clang -mllvm -stop-before=regallocbasic -S ... # pre-RA
clang -mllvm -stop-after=virtregrewriter -S ... # post-RA (post-virtregrewriter)
Diff narrow-IMG vs full-IMG post-RA MIR for the failing function.
Pre-RA is identical (same IR), so the diff isolates regalloc-decision
divergence. Look at every NEW pattern that appears only in the failing
build — $imgN = LDAfi was the smoking gun for dadd.
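Spotting "patterns that appear only in the failing build" by eye is slow on large MIR dumps. The sketch below counts (destination, opcode) shapes in each dump and reports the surplus in the failing one. It assumes physreg defs are printed as "$reg = OPCODE ..." and collapses all $imgN to one name; the helper names are hypothetical.

```python
import re
from collections import Counter


def shapes(mir_text):
    """Histogram of (dest register, opcode) pairs, with IMG numbers collapsed."""
    text = re.sub(r"\$img\d+", "$imgN", mir_text)
    return Counter(re.findall(r"(\$\w+) = (\w+)", text))


def only_in_failing(good_mir, bad_mir):
    """Shapes that occur more often in the failing dump than in the good one."""
    return dict(shapes(bad_mir) - shapes(good_mir))


good = "$img0 = COPY $a\n$x = COPY $img0\n"
bad = "$img0 = LDAfi %stack.0\n$x = COPY $img0\n"
print(only_in_failing(good, bad))
```

Read both dumps from the -stop-after=virtregrewriter output files and pass their contents in as strings.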
Pass-by-pass IR/MIR dumps
clang -mllvm -print-after=w65816-lower-wide32 -S ...
clang -mllvm -print-after-all -S ... 2>dump.txt
MAME debugger via xvfb-run
xvfb-run -a mame apple2gs ... -debug -debugger qt -oslog -seconds_to_run N
With autoboot Lua: load .bin into bank 0 (skip $C000..$CFFF I/O),
set CPU state, then cpu.debug:bpset(addr, condition, action) with
actions like "logerror \"...\\n\",a,x,...; go". logerror with format
args goes to stdout under -oslog. Memory reads in expressions:
b@(addr), w@(addr). Watchpoints: cpu.debug:wpset(prog_space, "w", addr, len, condition, action).
AVOID: add_machine_pause_notifier + cpu.debug:go() in callback —
segfaults from reentrancy. printf in actions stays in debugger console
(not -oslog). tracelog also debugger-console only.
Trace methodology (find divergence point)
- Set BPs at every JSLpseudo callee in the failing function.
- Capture A/X/Y/DPF0 at each return.
- Find first divergent return between known-good and failing builds.
- The instruction sequence between previous-OK and first-divergent return is where the bug lives.
This pattern found the dadd bug at jsl@0x207f → __lshrsi3(0x8001_8000, 3)
in 30 minutes. Recommended.
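Once both builds have produced a per-return log, finding the first divergence is mechanical. A minimal sketch, assuming each trace is a list of (callee, A, X, Y) tuples parsed from the -oslog output; the tuple layout and the sample values are illustrative, not captured data.

```python
def first_divergence(good_trace, bad_trace):
    """Return (index, good_entry, bad_entry) for the first mismatch, else None."""
    for i, (g, b) in enumerate(zip(good_trace, bad_trace)):
        if g != b:
            return i, g, b
    if len(good_trace) != len(bad_trace):
        # One build returned from more calls than the other.
        i = min(len(good_trace), len(bad_trace))
        return i, None, None
    return None


good = [("f", 0x0001, 0, 0), ("g", 0x0010, 0, 0), ("h", 0x0100, 0, 0)]
bad = [("f", 0x0001, 0, 0), ("g", 0x0011, 0, 0), ("h", 0x0100, 0, 0)]
print(first_divergence(good, bad))
```

The bug then lives between the return at index - 1 and the return at index in the failing build.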
Memory notes referenced
(Filenames under /home/scott/.claude/projects/-home-scott-claude-llvm816/memory/.)
- feedback_strstr9_long_haystack.md: the hash-shell bug story.
- feedback_cpp_subset.md: C++ subset, including the SjLj fix.
- feedback_ptr32_frame_limit.md: was 5 days stale; updated 2026-05-07 to "DONE, 131/131 smoke green".
- feedback_jslpseudo_caller_save.md, feedback_libcall_img_clobber.md, feedback_img_slot_expansion.md, feedback_greedy_high_pressure.md: related backend topics.
Next session candidates (ranked)
- Commit the uncommitted fixes. They've earned it.
- Greedy regalloc retry. Cheap experiment, potentially big win. (Tried 2026-05-08 and blocked; see item 4 under "Still-open work areas".)
- qsort source restructure. Clear optnone if you're willing to reshape the algorithm. Source-level work, not backend. (Done 2026-05-08; see item 1 under "Still-open work areas".)
- gmtime_r IR investigation. Find which combine miscompiles days >= 365L + (leap?1:0). IR-level, not backend.