65816-llvm-mos/tests/benchSummary_2026_06_03.md
Scott Duensing 3388f3c5a5 More updates
2026-06-03 20:46:31 -05:00

// Benchmark cycle regression sweep — 2026-06-03
//
// Methodology
//
// - scripts/benchCyclesPrecise.sh harness (default Layer 1, no
// W65816_CC_EXTRA), measured via emu.time() inside MAME.
// - Three back-to-back runs; numbers were byte-identical across
// runs (emu.time() is deterministic when MAME is driven from the
// same Lua boot script). No MAME flakiness involved.
// - Compared against the most recent recorded baseline in each
// bench's MEMORY.md entry (see the "Baseline source" column).
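//
// A minimal sketch of the three-run consistency check described
// above. The function name and the dictionary layout are
// hypothetical; the harness's actual output format is not shown in
// this report, so the per-run results are assumed to be already
// parsed into bench-name to cycle-count mappings.

```python
def runs_are_identical(runs):
    """Return True when every run reports the same cycles for every bench."""
    if not runs:
        raise ValueError("need at least one run")
    first = runs[0]
    return all(run == first for run in runs[1:])


# Hypothetical parsed results for three back-to-back runs
# (cycle counts taken from this sweep; only two benches shown).
runs = [
    {"bsearch": 767, "strLen": 2643},
    {"bsearch": 767, "strLen": 2643},
    {"bsearch": 767, "strLen": 2643},
]

print(runs_are_identical(runs))
```

// Comparing whole mappings also catches a bench that is missing
// from one run, not just a differing cycle count.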
//
// Suspected cause of regressions: commit 09f7405 (2026-06-03,
// "Updates") removed three major peephole/pass bodies:
//
// - W65816UnLSR.cpp lost processReturnedCounter (-241 lines).
// This was the strLen-style counter-PHI-to-pointer-PHI undo that
// enabled the downstream Y-as-counter peephole in StackRelToImg.
// Without it, strLen / strcpy / memcmp loops emit the
// pre-2026-05-25 22 cyc/iter form instead of the 13 cyc/iter
// form.
// - W65816SepRepCleanup.cpp lost the store-forwarding pass body
// (-370 lines, of which 358 were comment and code lines). This
// was the PHI-copy memory-to-memory eliminator that fed djb2Hash
// and popcount.
// - W65816WidenAcc16.cpp lost the Phase-2 PHI cycle widening
// scaffolding (-214 lines). Effect on benches less direct but
// correlates with djb2Hash, popcount, memcmp regressions.
//
// The commit message says only "Updates", and the diff reads as a
// wholesale removal of "disabled" / "experimental" #if-0'd code
// blocks. Some of those blocks were in fact live:
// UnLSR's processReturnedCounter was not gated behind any disable.
// Per memory, the call site at line ~107 was
// `Changed |= processReturnedCounter(L);`, and the current source
// shows a "disabled" comment where that call was removed.
//
//
// Results
//
// benchCyclesPrecise.sh on commit HEAD (09f7405), default Layer 1
// (no -mllvm -w65816-dbr-safe-ptrs), all benches 3x consistent.
//
// | Bench | Baseline | Current | Delta % | Regression? | Baseline source |
// |---------------|---------:|--------:|---------:|:-------------|----------------------------------------------|
// | bsearch | 767 | 767 | +0.0% | NO | feedback_remaining_optimization_opportunities |
// | bubbleSort | 15004 | 15004 | +0.0% | NO | feedback_layer2_loop_miscompile (L1 baseline) |
// | crc32 | n/a | 55839 | n/a | NO BASELINE | first measurement |
// | djb2Hash | 2387 | 2728 | +14.3% | YES | feedback_mul_const_strength_reduce 2026-05-25 |
// | dotProduct | 1620 | 1620 | +0.0% | NO | feedback_dpf0_setup_collapse 2026-05-15 |
// | fib | 11594 | 11764 | +1.5% | marginal | feedback_stackrel_dead_store_fib 2026-05-27 |
// | memcmp | 716 | 887 | +23.9% | YES | feedback_dp_dead_store_elim 2026-05-25 |
// | popcount | 1194 | 1228 | +2.8% | YES (mild) | feedback_popcount_carry_trick 2026-05-26 |
// | strcpy | 1108 | 1705 | +53.9% | YES | feedback_stackrel_dead_store_elim 2026-05-27 |
// | strLen | 767 | 2643 | +244.6% | YES (severe) | feedback_y_as_counter_strlen 2026-05-27 |
// | sumOfSquares | n/cmp | 6820 | n/a | NO (improved)| harness change since 18755 number |
// | globalArr8Sum | n/a | 3922 | n/a | NO BASELINE | first measurement |
// | globalArrFill | n/a | 8184 | n/a | NO BASELINE | first measurement |
// | globalArrSum | n/a | 8525 | n/a | NO BASELINE | first measurement |
//
// n/a = no recorded baseline. n/cmp = not comparable: the earlier
// sumOfSquares figure (18755) predates a harness change, so the
// drop to 6820 is not reported as a measured delta.
//
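// The Delta % column can be reproduced from the Baseline and Current
// columns. A short sketch, with the helper name chosen here for
// illustration and only the benches that have a comparable baseline
// and a nonzero delta included:

```python
def delta_percent(baseline, current):
    """Relative change of current versus baseline, in percent."""
    return (current - baseline) / baseline * 100.0


# (baseline, current) cycle counts copied from the results table.
results = {
    "djb2Hash": (2387, 2728),
    "fib": (11594, 11764),
    "memcmp": (716, 887),
    "popcount": (1194, 1228),
    "strcpy": (1108, 1705),
    "strLen": (767, 2643),
}

for name, (baseline, current) in results.items():
    cycles = current - baseline
    percent = delta_percent(baseline, current)
    print(f"{name:<10} {cycles:>5} cyc  {percent:+.1f}%")
```

// Printing the absolute cycle delta next to the percentage makes it
// easier to compare small benches such as memcmp with large ones
// such as fib.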
//
// Notes per regression
//
// strLen +244.6% The 767-cyc baseline came from the y-as-counter
// peephole in W65816StackRelToImg, whose INPUT
// pattern is produced by W65816UnLSR's
// processReturnedCounter (the strLen-style undo).
// With that undo removed, StackRelToImg sees the
// LSR-widened counter-PHI form and bails to
// generic codegen. The peephole code is still
// present in StackRelToImg.cpp lines 2941, 3106 —
// but it never matches.
//
// strcpy +53.9% Same root cause: UnLSR's processReturnedCounter
// also fed the strcpy-style pointer-walk shapes.
// The "stack-rel dead-store elim" peephole in
// StackRelToImg (which produced the 1108 cyc
// baseline) runs downstream of the pattern
// collapse that was removed from UnLSR, so it no
// longer sees the shape it matches.
//
// memcmp +23.9% Two-pointer deref loop; same family of patterns.
// The Pass-2c DPF0-setup-collapse in
// W65816StackSlotCleanup (which produced 818 cyc
// and was later tightened to 716 via dead-store
// elim) is still present, but its upstream
// structural shape isn't being produced.
//
// djb2Hash +14.3% Hash loop with i32 accumulator. The
// store-forwarding pass removed from
// SepRepCleanup was the eliminator for the PHI
// memory copy at end of body (2387-cyc baseline
// required it).
//
// popcount +2.8% Slight regression; the carry-trick peephole
// is still present (StackRelToImg.cpp line 2541),
// but the lagged-PHI store-forwarding step it
// relied on is gone. The measured cost is 34
// cycles in total (1228 - 1194) across the 16
// iters and the cleanup at exit.
//
// fib +1.5% Marginal. Stack-rel dead-store-elim still
// present per StackRelToImg.cpp; the small
// regression may be CMake / regalloc noise from
// the unrelated WidenAcc16 changes.
//
//
// Verdict: REGRESSIONS FOUND.
//
// Five clear regressions (strLen, strcpy, memcmp, djb2Hash, popcount)
// and one marginal (fib) attributable to commit 09f7405 (2026-06-03,
// "Updates") which removed perf-critical pass bodies from
// W65816UnLSR.cpp, W65816SepRepCleanup.cpp, and W65816WidenAcc16.cpp.
//
// Fix path (not this agent): restore the deleted blocks (especially
// W65816UnLSR::processReturnedCounter and its registration in
// runOnFunction), then re-run this sweep to confirm strLen 2643 →
// 767, strcpy 1705 → 1108, memcmp 887 → 716, djb2Hash 2728 → 2387.
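//
// A minimal sketch of how the confirmation step could be checked
// once the sweep is re-run. The target values are the baselines
// listed above; the function name and the shape of the measured
// results are assumptions, not part of the existing harness.

```python
# Baseline cycle counts the restored passes are expected to recover.
TARGETS = {"strLen": 767, "strcpy": 1108, "memcmp": 716, "djb2Hash": 2387}


def unmet_targets(measured, targets):
    """Return {bench: (measured, target)} for benches missing or above target."""
    unmet = {}
    for name, target in targets.items():
        value = measured.get(name)
        if value is None or value > target:
            unmet[name] = (value, target)
    return unmet


# Current (regressed) numbers from this sweep: every target is unmet.
current = {"strLen": 2643, "strcpy": 1705, "memcmp": 887, "djb2Hash": 2728}
for name, (value, target) in unmet_targets(current, TARGETS).items():
    print(f"{name}: measured {value}, target {target}")
```

// An empty result from unmet_targets means the fix restored every
// listed baseline; a bench absent from the measured results is
// reported as unmet rather than silently skipped.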
//
// Files unchanged by this agent: src/llvm/lib/Target/W65816/*.
// New file created by this agent: tests/benchSummary_2026_06_03.md
// (this file).