65816-llvm-mos/tests/benchSummary_2026_06_03.md
Scott Duensing 3388f3c5a5 More updates
2026-06-03 20:46:31 -05:00


// Benchmark cycle regression sweep: 2026-06-03
//
// Methodology
//
// - scripts/benchCyclesPrecise.sh harness (default Layer 1, no
//   W65816_CC_EXTRA), measured via emu.time() inside MAME.
// - Three back-to-back runs; numbers were byte-identical across
//   runs (emu.time() is deterministic when MAME is driven from the
//   same Lua boot script). No MAME flakiness involved.
// - Compared against the most recent recorded baseline in each
//   bench's MEMORY.md entry (see the "Baseline source" column).
//
// Suspected cause of regressions: commit 09f7405 (2026-06-03,
// "Updates") removed three major peephole/pass bodies:
//
// - W65816UnLSR.cpp lost processReturnedCounter (-241 lines).
//   This was the strLen-style counter-PHI-to-pointer-PHI undo that
//   enabled the downstream Y-as-counter peephole in StackRelToImg.
//   Without it, strLen / strcpy / memcmp loops emit the
//   pre-2026-05-25 22 cyc/iter form instead of the 13 cyc/iter
//   form.
// - W65816SepRepCleanup.cpp lost the store-forwarding pass body
//   (-370 lines, including 358 comment+code lines). This was the
//   PHI-copy memory-to-memory eliminator that fed djb2Hash and
//   popcount.
// - W65816WidenAcc16.cpp lost the Phase-2 PHI cycle widening
//   scaffolding (-214 lines). Its effect on the benches is less
//   direct but correlates with the djb2Hash, popcount, and memcmp
//   regressions.
//
// The commit message says only "Updates"; the diff is a wholesale
// removal of "disabled" / "experimental" #if-0'd code blocks. Some of
// those blocks were actually wired in: UnLSR.processReturnedCounter
// was not gated behind any disable. Per memory, the call site at
// line ~107 was
//   Changed |= processReturnedCounter(L);
// and the "disabled" comment now shows the call removed.
//
//
// Results
//
// benchCyclesPrecise.sh on commit HEAD (09f7405), default Layer 1
// (no -mllvm -w65816-dbr-safe-ptrs), all benches 3x consistent.
// Cycle counts are as reported by the harness; "n/cmp" means the
// old number is not comparable.
//
// | Bench         | Baseline | Current |  Delta % | Regression?   | Baseline source                               |
// |---------------|---------:|--------:|---------:|:--------------|-----------------------------------------------|
// | bsearch       |      767 |     767 |    +0.0% | NO            | feedback_remaining_optimization_opportunities |
// | bubbleSort    |    15004 |   15004 |    +0.0% | NO            | feedback_layer2_loop_miscompile (L1 baseline) |
// | crc32         |      n/a |   55839 |      n/a | NO BASELINE   | first measurement                             |
// | djb2Hash      |     2387 |    2728 |   +14.3% | YES           | feedback_mul_const_strength_reduce 2026-05-25 |
// | dotProduct    |     1620 |    1620 |    +0.0% | NO            | feedback_dpf0_setup_collapse 2026-05-15       |
// | fib           |    11594 |   11764 |    +1.5% | marginal      | feedback_stackrel_dead_store_fib 2026-05-27   |
// | memcmp        |      716 |     887 |   +23.9% | YES           | feedback_dp_dead_store_elim 2026-05-25        |
// | popcount      |     1194 |    1228 |    +2.8% | YES (mild)    | feedback_popcount_carry_trick 2026-05-26      |
// | strcpy        |     1108 |    1705 |   +53.9% | YES           | feedback_stackrel_dead_store_elim 2026-05-27  |
// | strLen        |      767 |    2643 |  +244.6% | YES (severe)  | feedback_y_as_counter_strlen 2026-05-27       |
// | sumOfSquares  |    n/cmp |    6820 |      n/a | NO (improved) | harness change since 18755 number             |
// | globalArr8Sum |      n/a |    3922 |      n/a | NO BASELINE   | first measurement                             |
// | globalArrFill |      n/a |    8184 |      n/a | NO BASELINE   | first measurement                             |
// | globalArrSum  |      n/a |    8525 |      n/a | NO BASELINE   | first measurement                             |
//
//
// Notes per regression
//
// strLen   +244.6%  The 767-cyc baseline came from the y-as-counter
//                   peephole in W65816StackRelToImg, whose INPUT
//                   pattern is produced by W65816UnLSR's
//                   processReturnedCounter (the strLen-style undo).
//                   With that undo removed, StackRelToImg sees the
//                   LSR-widened counter-PHI form and bails to
//                   generic codegen. The peephole code is still
//                   present in StackRelToImg.cpp (lines 2941, 3106),
//                   but it never matches.
//
// strcpy   +53.9%   Same root cause: UnLSR's processReturnedCounter
//                   also fed the strcpy-style pointer-walk shapes.
//                   The "stack-rel dead-store elim" peephole in
//                   StackRelToImg (which produced the 1108 cyc
//                   baseline) is upstream of the pattern collapse
//                   that UnLSR removed.
//
// memcmp   +23.9%   Two-pointer deref loop; same family of patterns.
//                   The Pass-2c DPF0-setup-collapse in
//                   W65816StackSlotCleanup (which produced 818 cyc
//                   and was later tightened to 716 via dead-store
//                   elim) is still present, but its upstream
//                   structural shape isn't being produced.
//
// djb2Hash +14.3%   Hash loop with i32 accumulator. The
//                   store-forwarding pass removed from
//                   SepRepCleanup was the eliminator for the PHI
//                   memory copy at end of body (the 2387-cyc
//                   baseline required it).
//
// popcount +2.8%    Slight regression; the carry-trick peephole
//                   is still present (StackRelToImg.cpp line 2541),
//                   but the lagged-PHI store-forwarding step it
//                   relied on is gone, costing 3 cyc/iter * 16 iters
//                   plus a few cleanup cycles at exit.
//
// fib      +1.5%    Marginal. Stack-rel dead-store-elim is still
//                   present per StackRelToImg.cpp; the small
//                   regression may be CMake / regalloc noise from
//                   the unrelated WidenAcc16 changes.
//
//
// Verdict: REGRESSIONS FOUND.
//
// Five clear regressions (strLen, strcpy, memcmp, djb2Hash, popcount)
// and one marginal (fib) attributable to commit 09f7405 (2026-06-03,
// "Updates"), which removed perf-critical pass bodies from
// W65816UnLSR.cpp, W65816SepRepCleanup.cpp, and W65816WidenAcc16.cpp.
//
// Fix path (not this agent): restore the deleted blocks (especially
// W65816UnLSR::processReturnedCounter and its registration in
// runOnFunction), then re-run this sweep to confirm strLen 2643 →
// 767, strcpy 1705 → 1108, memcmp 887 → 716, djb2Hash 2728 → 2387.
//
// Files unchanged by this agent: src/llvm/lib/Target/W65816/*.
// New file created by this agent: tests/benchSummary_2026_06_03.md
// (this file).
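//
// Appendix: recomputing the Delta % column
//
// A minimal sketch for re-deriving the Delta % column after the fix
// lands, assuming delta = (current - baseline) / baseline * 100,
// rounded to one decimal. The cycle counts are copied from the
// results table; the names RESULTS, delta_percent, and format_delta
// are illustrative and not part of benchCyclesPrecise.sh.

```python
# Baseline and current cycle counts copied from the results table.
# Benches without a comparable baseline are left out.
RESULTS = {
    "bsearch":    (767, 767),
    "bubbleSort": (15004, 15004),
    "djb2Hash":   (2387, 2728),
    "dotProduct": (1620, 1620),
    "fib":        (11594, 11764),
    "memcmp":     (716, 887),
    "popcount":   (1194, 1228),
    "strcpy":     (1108, 1705),
    "strLen":     (767, 2643),
}


def delta_percent(baseline: int, current: int) -> float:
    """Relative change in cycles, as a percentage of the baseline."""
    return (current - baseline) / baseline * 100.0


def format_delta(baseline: int, current: int) -> str:
    """Format the change the way the table does, e.g. '+14.3%'."""
    return f"{delta_percent(baseline, current):+.1f}%"


if __name__ == "__main__":
    # One row per bench: name, baseline, current, extra cycles, delta %.
    for bench, (baseline, current) in RESULTS.items():
        extra = current - baseline
        print(f"{bench:<12} {baseline:>6} {current:>6} {extra:>+6} "
              f"{format_delta(baseline, current)}")
```

// Once the deleted blocks are restored, replacing the "current"
// values with the new measurements should bring the strLen, strcpy,
// memcmp, and djb2Hash rows back to +0.0% if the fix is complete.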