llvm816 gap-closure: comprehensive step plan
This is the full, ordered, dependency-aware plan for closing the 18 feature gaps identified in the 2026-05-30 audit, rolled together with every prerequisite the adversarial reviewers found. The earlier "master plan" stopped at milestones + per-item criticism; this document is the actionable step list.
Total effort (reviewer-adjusted): roughly 700-1000 hours across all 18 items plus the Phase 0 + Phase 1 prerequisites the reviewers added. Original briefs sum to ~280h; reviewers added ~420h of hidden work most planners missed.
Default audience: expert C/C++ developer porting code to the Apple IIgs, with a secondary path for retrocomputing tinkerers who want source-level debugging.
The plan is organized in six phases, with hard dependency arrows. Each step is a concrete, individually-shippable piece of work; nothing is "TBD" at the step level.
Phase 0 - Architectural decisions (DECIDED 2026-05-31)
| # | Topic | Decision |
|---|---|---|
| 0.1 | EH model | SJLJ + _Unwind_RaiseException-over-SJLJ stub |
| 0.2 | LTO model | ThinLTO |
| 0.3 | Sanitizer scope | UBSan-min + coverage only (no ASan) |
| 0.4 | localvars split | Full -O0 + -O2/IMG in one shot (override of recommendation) |
| 0.5 | cxxstdlib split | cxxchrono first, then cxxstream+format+path |
| 0.6 | Sprite harness | Standalone first, desktop-coupled follow-up |
| 0.7 | Resource fork delivery | AppleSingle blob |
| 0.8 | clangdwarffix approach | BPF-style MAI flag spike first, escalate to new relocs only if needed |
| 0.9 | IigsSoundParmT | Fix as breaking change (existing demos likely already broken) |
| 0.10 | rename cross-dir | Implement copy+delete fallback now (override of recommendation) |
Impact of overrides:
- 0.4 (full localvars in one landing): Phase 3.2 and 3.3 collapse into a single ~55h item. Risk profile is higher (multi-stage delivery is now one-stage), but with Phase 1.5 DBG_VALUE audit landed first the foundation is solid. Expect 3-5 additional clang DWARF bugs to surface during -O2 / IMG work; budget contingency.
- 0.10 (rename copy+delete fallback): Adds ~6h to Phase 2.3. Real work is the error-recovery path (partial-copy state, partial-delete state, source vanished mid-operation). Use the existing GS/OS class-1 calls (Open/Read/Write/Create/Close/Destroy) to compose the fallback; no new toolbox wrappers required.
Why these (rationale)
0.1 Pick exception-handling model (sanitizers, unwinder, cxxstdlib all depend on this)
Three options:
- A. Keep SJLJ as default. Ship `_Unwind_RaiseException`-over-SJLJ stub for third-party C++ libraries. ~20h. Loses no functionality. Strongly recommended.
- B. Build a real DWARF CFI unwinder (libunwind port + backend CFI emission + per-MIR-pass CFI annotation + `__jsl_indir` hand-CFI). ~260h floor. Real throw across non-instrumented frames.
- C. Make EH model a Subtarget feature with two MCAsmInfo subclasses, ship both. ~20h extra plumbing on top of (A) or (B).
Recommendation: A. Reviewer for unwinder called (B) a multi-week
structural change and a foot-in-the-door for a long string of follow-on bugs.
0.2 Pick LTO model
- A. ThinLTO. Preserves per-TU codegen attachments so Lua's per-file `-mllvm -regalloc=basic` keeps working. Summary-based inlining decisions are less prone to the over-inlining the project already fought (`feedback_lapi_inline_threshold.md`, `feedback_coremark_matrix_test_regression.md`).
- B. Full LTO with whole-module merge. Simpler tool, harder integration. Per-TU regalloc becomes unrepresentable.
Recommendation: A. Reviewer called the original "full LTO is simpler" choice exactly backwards for this codebase.
0.3 Pick sanitizer scope
- A. UBSan-minimal + coverage only. Achievable. ~22h.
- B. UBSan + ASan + coverage. ASan's 8:1 shadow-memory model does not fit the 65816 (full 16MB → 2MB shadow; programs run in 1-2 banks). Reviewer called the brief wrong on architectural grounds.
Recommendation: A. Document ASan as out-of-scope.
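The shadow-memory arithmetic behind rejecting option B can be checked in a few lines. This is a back-of-envelope sketch, not project code: the 24-bit address space and 64 KiB bank size are standard 65816 facts, while the function and constant names are made up for illustration.

```python
# Back-of-envelope check of ASan's 8:1 shadow ratio on a 24-bit address space.
ADDRESS_BITS = 24          # 65816 addresses are 24-bit (bank:offset)
BANK_SIZE = 64 * 1024      # one bank = 64 KiB
SHADOW_RATIO = 8           # ASan maps 8 application bytes to 1 shadow byte


def shadow_bytes(app_bytes, ratio=SHADOW_RATIO):
    """Shadow memory needed to cover app_bytes of address space."""
    return app_bytes // ratio


full_space = 1 << ADDRESS_BITS
shadow = shadow_bytes(full_space)
print(f"address space: {full_space // (1024 * 1024)} MiB")
print(f"shadow needed: {shadow // (1024 * 1024)} MiB = {shadow // BANK_SIZE} banks")
```

Covering the whole address space costs 32 banks of shadow, which is far more than the 1-2 banks a typical program runs in.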
0.4 Pick localvars split
- A. Ship -O0 stack-resident locals only (20-30h). Faster delivery, narrower payload.
- B. Ship -O0 + -O2/IMG-resident + location-list crossing PCs + inlined subroutines (50-80h).
Recommendation: A first, then (B) as a follow-up. Reviewer's split point is the natural project boundary.
0.5 Pick cxxstdlib split
- A. Ship `cxxchrono` only (etl::chrono + libc time hooks). 3-4h.
- B. Ship `cxxstream+format+path` (string_stream + format + iigs::path + `<cmath>` shim + smoke). 12-15h.
- C. Both, but as separate landings.
Recommendation: C, with A landing first. Reviewer noted mixing them hides
the format/<cmath> rabbit hole.
0.6 Pick sprite-engine harness
- A. Standalone. sprite.c does its own `$C029` + SCB + palette init. Use `runInMame.sh` bare-metal harness.
- B. Desktop-coupled. Relies on `paintDesktopBackdrop`, runs through GS/OS Finder launch (`runViaFinder.sh`).
- C. Both, ship A first.
Recommendation: C.
0.7 Pick resource-fork delivery shape
- A. AppleSingle blob (one file = data fork + resource fork; cadius auto-detects). Cleaner.
- B. `_ResourceFork.bin` sidecar (cadius `Prodos_Add.c:386` supports it). Cleaner separation.
Recommendation: A. Verify via a 1-hour disposable spike before writing the bundler.
0.8 Pick clangdwarffix approach
- A. BPF-style one-liner: `MAI->setDwarfUsesRelocationsAcrossSections(false)`. Reviewer cites `BPFTargetMachine.cpp:87` as the precedent. If it works the whole 14h plan collapses to ~1h.
- B. New `R_W65816_DATA32` + `R_W65816_PCREL32` relocs through the full pipeline (MC → ELF writer → link816 → `pc2line.py`).
Recommendation: Spike A first (10 minutes). Fall back to B only if A introduces new failures. Reviewer documented (B) is a strict superset of work covered by (A).
0.9 Decide whether to fix IigsSoundParmT as a breaking change
The current in-tree struct (6 bytes) does not match ORCA's authoritative
SoundParamBlock (18 bytes). iigsPlayDocSample is almost certainly silently
broken today. Either:
- A. Fix the struct as a breaking change. Existing demos relying on it will need migration. Reviewer believes none actually work today, so the "breakage" is theoretical.
- B. Add new corrected API (`iigsPlaySoundV2`?), leave broken one.
Recommendation: A. Item docram cannot be delivered honestly otherwise.
0.10 Decide rename() / cross-directory policy
GS/OS ChangePath ($2004) is rename-in-place-only. POSIX rename(old, new)
across directories is impossible without an explicit copy-then-delete path.
- A. Reject cross-dir new-paths up front with EINVAL.
- B. Implement copy-then-delete fallback. ~6h, error-recovery is the hard part.
- C. Accept divergence; document loudly.
Recommendation: A initially, with (B) deferred until a real user complains.
Phase 1 - Foundational prerequisites (everything later depends on these)
These are NOT in the original 18 items but were surfaced as preconditions by multiple reviewers. They unblock the actual feature work.
1.1 GS/OS fopen hang investigation [BLOCKS: resourcemgr runtime, tmpfile real path, cxxstdlib::filesystem, posixfile real I/O]
JSL $E100A8 doesn't return under real GS/OS 6.0.2 (per STATUS.md).
- Reproduce under MAME with the `-debug -debugger qt -oslog` bpset workflow documented in `SESSION_RECOVERY.md`.
- Bisect: ABI mismatch, stack-shape mismatch, missing tool init, DP/SP layout.
- If unsolvable in a 4-8h budget: document the limitation and route all GS/OS-dependent items through the stub linker (`iigsGsosStub.s`) with the honest-failure sentinel from step 1.2.
Effort: 8-16h investigation. If fix is found, +4-8h to land. If unsolvable, project moves on with stub-only paths for affected items.
1.2 Stub-mode sentinel for honest iigsGsosStub.s [BLOCKS: tmpfile, posixfile, cursor]
iigsGsosStub.s currently makes every GS/OS call succeed silently.
__gsosAvailable() returns TRUE in stub mode, so newly-added wrappers
(gsosDestroy, gsosChangePath, prefix/dir/info) will fall through the
catch-all stub and appear to succeed while doing nothing.
Two options to fix:
- Add explicit error stubs for each new wrapper (returns -1 / sets errno).
- Add a `__gsosIsRealImpl()` sentinel that distinguishes "real GS/OS linked" from "universal-success stub linked".
Recommendation: Sentinel. ~3h. One source of truth.
1.3 FK_Data_4 → R_W65816_*32 reloc fix [BLOCKS: clangdwarffix → debugger → localvars → profiler]
Independent of Phase 0.8 choice — once the BPF-style spike either passes or fails, this is the actual step.
Reviewer surfaced multiple landmines the original plan missed:
- `ELFObjectWriter::recordRelocation` (`MC/ELFObjectWriter.cpp:1329-1349`) converts in-section diffs to PC-relative. Need BOTH `R_W65816_DATA32` AND `R_W65816_PCREL32`.
- `link816.cpp:1275` has a hardcoded `r.offset + 3 > sec.size` width check inside `writeDebugSidecar` that must become reloc-type-driven.
- Reloc-type emission must land BEFORE the MC change starts emitting the new types or every intermediate `-g` build dies on unknown reloc.
- A `ninja clean && ninja` is required (TableGen-emitted enum dependencies do not play well with incremental builds).
Steps:
- (a) Spike `setDwarfUsesRelocationsAcrossSections(false)` in `W65816MCAsmInfo`. Rebuild clang, `xxd` `.debug_line` of a `-g` `hello.c`, verify non-zero `unit_length`/`header_length`. If green: skip to step 1.3.h.
- (b) Add `R_W65816_DATA32` + `R_W65816_PCREL32` to `W65816FixupKinds.h` / `W65816AsmBackend.cpp`.
- (c) Extend `W65816ELFObjectWriter::getRelocType` to dispatch FK_Data_4 by `IsPCRel`.
- (d) Add 4-byte reloc handlers to `link816.cpp::applyReloc` (DATA32 = write `sectionBase + addend`; PCREL32 = write `target - patchAddr + addend`).
- (e) Generalize the `r.offset + 3 > sec.size` width check to use a small switch on reloc type.
- (f) Land link816 + AsmBackend in one commit (so intermediate builds don't die). Then land the MC switch that starts emitting the new types.
- (g) Update `pc2line.py` to use the now-correct `unit_length`/`header_length`, keeping the tolerant zero-fallback for older artifacts.
- (h) Audit `emitAbsoluteSymbolDiff` / `emitDwarfUnitLength` / `makeEndMinusStartExpr` callers; verify `.debug_frame`, `.debug_loclists`, `.eh_frame` also work.
- (i) Drop `llvm-dwarfdump consumes without warnings` from `shipsAs`: `EM_NONE` will still warn. File EM_ assignment as a separate gap item.
Effort: 1h (best case, MAI flag works) or 14-20h (full reloc path). Risk HIGH on rebase pain.
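Steps (d) and (e) can be modelled on the host before touching `link816.cpp`. The sketch below is a Python model, not the linker's code: `DATA32` and `PCREL32` follow the formulas in step (d), while `ABS24` is a hypothetical stand-in for the existing 3-byte reloc implied by the hardcoded `+ 3` check, and all function and parameter names are illustrative.

```python
# Reloc kinds. DATA32 and PCREL32 are named in the plan; ABS24 is a
# hypothetical placeholder for the existing 3-byte reloc.
ABS24, DATA32, PCREL32 = "ABS24", "DATA32", "PCREL32"

# Width in bytes per reloc kind: this table replaces the literal "+ 3".
RELOC_WIDTH = {ABS24: 3, DATA32: 4, PCREL32: 4}


def apply_reloc(sec, kind, offset, addend, section_base=0, target=0, patch_addr=0):
    """Patch one little-endian reloc into the bytearray `sec`; return the value written."""
    width = RELOC_WIDTH[kind]
    if offset + width > len(sec):
        raise ValueError(f"{kind} at offset {offset:#x} overruns section")
    if kind == PCREL32:
        value = target - patch_addr + addend
    else:
        # Assumption: ABS24 uses the same base+addend formula as DATA32.
        value = section_base + addend
    value &= (1 << (8 * width)) - 1   # wrap negative results to the field width
    sec[offset:offset + width] = value.to_bytes(width, "little")
    return value
```

Keeping the width in one table means a new reloc type cannot be added without also declaring its bounds-check width.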
1.4 Backend prerequisites bundle [BLOCKS: sanitizers, lto, unwinder]
Three small backend changes that unblock multiple items:
- (a) `setOperationAction(ISD::RETURNADDR, MVT::i32, Expand)` in `W65816ISelLowering.cpp`. Today any code calling `__builtin_return_address(0)` ICEs clang (since pointers are i32 but RETURNADDR is registered for i16 Expand only). Required by UBSan's caller-pc dedup AND by user code. ~30 min.
- (b) `setOperationAction(ISD::TRAP, MVT::Other, Custom)` + lower to `BRK_pseudo` that writes a sentinel to `$70` before halting. Required by `-fsanitize-trap=undefined`. ~2h.
- (c) Minimal `W65816TTI` (TargetTransformInfo) returning 4× generic cost for i32 ops and 20× for soft-float libcalls. Required by LTO inliner so it doesn't over-inline based on generic cost defaults. ~6h.
Effort: ~9h total. Land as three separate commits.
1.5 DBG_VALUE preservation audit across custom MIR passes [BLOCKS: -O2 localvars]
Custom MIR passes (W65816StackRelToImg, W65816StackSlotMerge,
W65816SepRepCleanup, W65816LowerWide32, W65816ImgCalleeSave,
W65816SpillToX, W65816TiedDefSpill) only use getDebugLoc() for source-line
info. None call MachineInstr::transferDbgValues() when slots move/coalesce
or when stack slots get promoted to IMG slots.
For each pass: grep for slot/register replacement. Each call site that substitutes one operand for another must propagate DBG_VALUE.
Effort: 8-15h. Without this, -O2 locals are vapor regardless of how good the DWARF parser is.
1.6 IigsSoundParmT correction [BLOCKS: docram]
Phase 0.9 decided this is a breaking change. Steps:
- Replace 6-byte struct in `runtime/include/iigs/sound.h` with ORCA's 18-byte layout (Pointer waveStart (4B) / Word waveSize pages (2B) / Word freqOffset (2B) / Word docBuffer (2B) / Word bufferSize (2B) / Pointer nextWavePtr (4B) / Word volSetting (2B)).
- Rewrite `iigsPlayDocSample` to populate the corrected struct. Move channel out of the struct into `FFStartSound`'s arg0.
- Audit existing callsite at `smokeTest.sh:1147` and migrate.
- Update `README.md:144-147` and `STATUS.md` claim that DOC-RAM staging is not wrapped; those lines are about to be wrong.
- Verify under real GS/OS or a known-good MAME version (silence vs. audio is the validation gate).
Effort: 4-6h.
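The 18-byte figure can be sanity-checked from the field list above before editing the header. This sketch uses Python's `struct` module as a host-side calculator; the field names and widths are taken from the plan's list, and whether they match ORCA's header byte for byte is the plan's claim, not something this snippet verifies.

```python
import struct

# Field order and widths as listed in the plan. "<" means little-endian with
# no padding, so the size reflects a packed layout. Pointer = 4 bytes (I),
# Word = 2 bytes (H).
SOUND_PARM_FORMAT = "<IHHHHIH"
SOUND_PARM_FIELDS = (
    "waveStart", "waveSize", "freqOffset", "docBuffer",
    "bufferSize", "nextWavePtr", "volSetting",
)


def field_offsets():
    """Byte offset of every field, computed from the format string."""
    offsets, pos = {}, 0
    for name, code in zip(SOUND_PARM_FIELDS, SOUND_PARM_FORMAT[1:]):
        offsets[name] = pos
        pos += struct.calcsize("<" + code)
    return offsets


def pack_sound_parm(**fields):
    """Pack a parameter block; fields not supplied default to 0."""
    values = [fields.get(name, 0) for name in SOUND_PARM_FIELDS]
    return struct.pack(SOUND_PARM_FORMAT, *values)


print(struct.calcsize(SOUND_PARM_FORMAT), field_offsets())
```

The same offsets can be turned into `_Static_assert(offsetof(...))` checks in `sound.h` so the C struct cannot silently drift again.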
1.7 Build/harness prerequisites bundle
- (a) `runInMame.sh --check-u8 <addr>=<val>` for byte-level SHR pixel checks. Required by sprites. ~1h.
- (b) `runViaFinder.sh --data /PATH=file` injection. Required by any GS/OS demo with file I/O (`tmpfile`, `posixfile` GS/OS path, eventually `cxxstdlib::filesystem`). ~1h.
- (c) buildGno-launched MAME smoke harness. Currently smoke runs C++ via inline cpp HEREDOCs at build-time only; the `cxxsmoke`, `cxxstdlib`, and `cursor` smoke checks need actual MAME-launched OMF execution. Mirror `tests/lua/` pattern. ~4h.
- (d) Fix `softDouble.o.bak` in `runtime/` (15KB stale dated May 1). Required before `buildsystem` can do `file(GLOB)` over `runtime/*.o`. Either delete the .bak or generate the imports manifest from `runtime/build.sh`. ~30 min.
- (e) Generate `W65816RuntimeImports.cmake` from `runtime/build.sh` (or have build.sh emit a manifest). Single source of truth for the runtime .o list. ~2h.
Effort: ~9h.
1.8 srand seeding + ReadTimeHex toolbox call [BLOCKS: tmpfile uniqueness, posixfile mkstemp]
extras.c:124 seeds rand() to constant 1. mkstemp's claimed uniqueness
guarantee is a lie without time-based seeding.
- Expose `ReadTimeHex` ($0D03) in `iigsToolbox.s` (currently absent).
- Add `srand` hook in `crt0Gsos.s` + `crt0Gno.s` that reads time and seeds.
Effort: ~2-3h.
1.9 <cmath> C++ shim [BLOCKS: cxxstdlib (format / chrono with FP)]
clang++ on llvm-mos has no system C++ stdlib; #include <cmath> fails. ETL's
format.h #includes <cmath> when ETL_USING_FORMAT_FLOATING_POINT=1.
- Create `runtime/include/c++/cmath` that pulls `<math.h>` (already extern-C-wrapped) and exports `std::` aliases for the libc functions.
- Optionally add `<cstdlib>`, `<cstddef>` shims following the same pattern.
- Decide `ETL_USING_FORMAT_FLOATING_POINT` default policy in `etl_profile.h`: recommend OFF by default with `--layer2` opt-in for FP format builds.
Effort: ~3h.
1.10 PATH_MAX and friends in limits.h [BLOCKS: posixfile]
PATH_MAX is not defined anywhere. Add to runtime/include/limits.h with a
comment tying it to GSString.length being u16 and the practical
NUL-terminated-path-fits-in-256-bytes rule.
Effort: ~30 min.
1.11 Weak-extern survival policy for LTO [BLOCKS: lto] [DONE]
libc.c declares dozens of __attribute__((weak)) extern GS/OS calls
(gsosOpen/Read/Write/...). Under LTO, the inliner may decide a weak-extern
is undefined and propagate that as constant 0 / NULL through callers, then DCE
the surrounding code.
- Marked all weak-extern decls in `libc.c` with `__attribute__((weak, retain, used))`: the GS/OS dispatchers (`gsosOpen/Read/Write/Close/GetEOF/SetEOF/SetMark/GetMark/Create`), `__gsosIsRealImpl`, `__putByte`, `__getByte`, `__putByteErr`, `__heap_start`, `__heap_end`. `used` keeps the compiler from dropping references; `retain` survives linker GC; both are no-ops in non-LTO builds.
- `libcxxabi.c::abiRunCxaAtexit` (`__run_cxa_atexit`) annotated with `__attribute__((retain, used))`: its only callers live in crt0*.s (`jsl __run_cxa_atexit`), which is invisible to LTO's IR view, so without the attributes LTO would strip the body and crt0 would JSL into the weak no-op fallback in libgcc.s and C++ global dtors would never run.
- Definitions in `libcGno.c` left unannotated: link-pull-in from libc.c's weak-externs already keeps them alive; the LTO hazard is on the declaration side, not the definition side (the linker pulls libcGno.o in to resolve the libc.c weak-externs regardless of LTO).
- 145 smoke checks pass.
1.12 LTO × Layer 2 silent-miscompile gate [BLOCKS: lto]
-mllvm -w65816-dbr-safe-ptrs is per-TU. Mixing in an LTO set produces silent
wrong code.
Build the gate FIRST, before any LTO codegen work:
- Embed Layer 2 flag in IR as a module-level attribute on every TU.
- In the LTO driver pre-pass, hard-fail if attributes disagree.
Effort: ~3h.
1.13 ELF EM_ assignment [BLOCKS: clangdwarffix, llvm-dwarfdump tooling]
llvm-dwarfdump warns persistently because EM_NONE is set on output.
Assign a real (vendor-private if needed) EM_ value.
Effort: ~2h.
Phase 1 total: ~60-90h.
Phase 2 - M1 quick wins (parallel, no DWARF dependency)
These items have no cross-dependencies and can run concurrently once Phase 1 lands the build-harness prerequisites.
2.1 clangdwarffix (continued from Phase 1.3)
Phase 1.3 covered the reloc plumbing. Remaining work:
- Update smoke checks at `smokeTest.sh:5347` (encodes 3-byte width; the new 4-byte LE address starts with the same 3 LE bytes, so green, but fragile).
- Add `pc2line.py` cleanup to drop the zero-length fallback.
- Update docs (`USAGE.md`, `STATUS.md`) to drop the "llvm-dwarfdump warns" caveat (depends on Phase 1.13).
Effort: ~3h after Phase 1.3 + 1.13.
2.2 hexfloat (%a / %A printf)
- Decide subnormal canonical form (recommend `0x0.{mantissa}p-1022`).
- Decide trailing-zero stripping policy (recommend glibc-style: strip when precision unspecified).
- Implement `emitHexFloat` in `runtime/src/snprintf.c` with local width/leftAlign/zeroPad arithmetic (do NOT reuse `emitNumber`'s monolithic numeric body; only use it for the exponent).
- Use 4 u16 words instead of u64 shifts to dodge i64-codegen surprises (>>52 and 12-bit mask paths).
- Bring `%f`/`%g`/`%e` to Inf/NaN parity OR document the asymmetry (don't half-do it).
- Add a new smoke probe block (don't extend the existing 0x7f bitmap, which is used by two checks at `smokeTest.sh:2407` and `:2581`).
- Update `STATUS.md:48-52` (printf conversion table) and snprintf.c banner at lines 21-23.
Effort: 6-8h.
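A host-side reference model is useful as an oracle for the smoke probe. The sketch below is not `emitHexFloat`: it is a Python model of the recommended policies (subnormals as `0x0.{mantissa}p-1022`, trailing zeros stripped, no precision argument) that reads the double as four 16-bit words the way the C code is meant to. The function name and the lowercase `inf`/`nan` spellings are assumptions.

```python
import struct


def hexfloat(x):
    """Format a double like printf("%a") with no precision, from four u16 words."""
    w0, w1, w2, w3 = struct.unpack("<4H", struct.pack("<d", x))  # w3 is the top word
    sign = "-" if w3 & 0x8000 else ""
    exp = (w3 >> 4) & 0x7FF                      # 11-bit biased exponent
    mant_words = (w3 & 0x000F, w2, w1, w0)       # 4 + 16 + 16 + 16 = 52 mantissa bits
    if exp == 0x7FF:
        return sign + ("nan" if any(mant_words) else "inf")
    digits = "%x%04x%04x%04x" % mant_words       # always 13 hex digits
    digits = digits.rstrip("0")                  # strip trailing zeros
    if exp == 0:
        if not digits:
            return sign + "0x0p+0"               # signed zero
        lead, e = "0", -1022                     # subnormal canonical form
    else:
        lead, e = "1", exp - 1023
    frac = "." + digits if digits else ""
    return f"{sign}0x{lead}{frac}p{e:+d}"


for value in (1.0, 0.5, -2.5, 5e-324):
    print(value, hexfloat(value))
```

Where the mantissa has no trailing zeros, the result can be cross-checked against Python's own `float.hex()`, which never strips them.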
2.3 tmpfile / tmpnam / rename
Following Phase 0.10's decision (copy+delete fallback for cross-dir rename):
- Per-FILE owned name buffer (extend FILE struct or use parallel `tmpNames[MFS_MAX_FILES][L_tmpnam]` table). Update `__mfs[]` initializer.
- Add `gsosDestroy` ($2002 pCount=1) and `gsosChangePath` ($2004) wrappers in `iigsGsos.s` + `iigsGsosStub.s` (real stub semantics from Phase 1.2).
- Promote `remove()` from mfs-only to mfs-then-GS/OS-Destroy.
- Promote `tmpfile()` from stub: generate unique name via `tmpnam`, open O_CREAT|O_EXCL, set the auto-delete-on-close flag in the FILE.
- Promote `tmpnam()` from stub: read time via Phase 1.8 srand seed, format `/RAM5/T{16-hex-chars}.TMP` or similar.
- Promote `rename()` from stub:
  - Fast path: if new-path is in the same directory, route to ChangePath.
  - Cross-dir copy+delete fallback: Open source RDONLY, Create destination, chunked Read/Write loop (8KB buffer), Close both, Destroy source. Error recovery: if Write fails mid-loop, Destroy destination + return -1. If final Destroy of source fails, leave dest in place + return -1 with errno set + emit a debug log line (destructive partial-state, but the data is preserved). Source-vanished-mid-op is rare under GS/OS (no concurrent process); leave as best-effort.
  - Use GSString256 stack scratch (already present at `__gsosPathBuf` in libc.c).
- Update mfs-path detection to auto-detect the `/` vs `:` separator.
- Smoke tests:
  - create + write + close + remove + verify destroyed.
  - rename within same dir (ChangePath path).
  - rename across dirs (copy+delete fallback): write 10KB file, verify contents byte-identical post-rename, verify source gone.
Effort: 16-18h (was 10-12h; +6h for copy+delete fallback per Phase 0.10).
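The error-recovery rules for the cross-dir fallback are easier to review as control flow. The following is a host-side Python model, not the libc implementation: `os` calls stand in for the GS/OS class-1 calls (Open/Read/Write/Create/Close/Destroy), and the function name, the `log` hook, and the choice to refuse an existing destination are assumptions made for the sketch.

```python
import os

CHUNK = 8 * 1024   # 8KB copy buffer, as in the plan


def rename_with_fallback(old, new, log=print):
    """Model of rename(): same-dir fast path, else copy then delete. Returns 0 or -1."""
    old_dir = os.path.dirname(os.path.abspath(old))
    new_dir = os.path.dirname(os.path.abspath(new))
    if old_dir == new_dir:
        try:
            os.rename(old, new)           # fast path (ChangePath on GS/OS)
            return 0
        except OSError:
            return -1
    created = False                       # only clean up a file this call created
    try:
        with open(old, "rb") as src:
            with open(new, "xb") as dst:  # Create destination (fails if it exists)
                created = True
                while True:
                    block = src.read(CHUNK)
                    if not block:
                        break
                    dst.write(block)
    except OSError:
        # Copy failed part-way: remove the partial destination, keep the source.
        if created and os.path.exists(new):
            os.remove(new)
        return -1
    try:
        os.remove(old)                    # Destroy source
    except OSError:
        # Data is preserved in the destination; report the partial state.
        log("rename: copied, but source could not be removed; destination kept")
        return -1
    return 0
```

The `created` flag matters: without it, a failure to create the destination would delete a file that existed before the call.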
2.4 docram (DOC-RAM sample upload)
Phase 1.6 already corrected IigsSoundParmT. Remaining work:
- Add `iigsLoadDocSample(const int8_t *wave, uint16_t size, uint16_t docOffset)` wrapper around `WriteRamBlock` toolbox call.
- Update `iigsPlaySoundV2`/`iigsPlayDocSample` to consume corrected struct.
- Add `demos/helloSample.c` standalone demo.
- Wire `runtime/src/sound.c` into `demos/build.sh` (currently missing).
- Add standalone MMStartUp+SoundStartUp helper to `iigs/sound.h` (since `startdesk()` is too heavy for a CLI-style sample probe).
- Smoke test: WriteRamBlock returns cleanly + a marker store fires.
Effort: 6-8h after Phase 1.6.
2.5 cursor helpers
- Add `IigsCursorT` typedef to `runtime/include/iigs/toolbox.h`.
- Add `runtime/src/cursor.c` with `iigsCursorPushArrow`, `iigsCursorPushBusy`, `iigsCursorPop`, `iigsCursorRegister(region, cursor)` (via TaskMaster wmTaskMask cursor auto-track, NOT a custom idle hook).
- Save-stack stores a COPY of the CursorRecord (not the pointer; toolset memory can move).
- Hard-error or asserted-no-op before `startdesk()` (InitCursor invariant).
- Decide: drop embedded cursor blobs from scope (just wrappers + Wait/IBeam ROM shapes via `GetCursorAdr` ($800c)) OR hand-code 4 cursor blobs and budget ~3-4h for mask/hotspot debugging.
- Recommend: drop embedded blobs; expose `SetIigsCursor(const IigsCursorT*)` + `iigsCursorBusy()`/`iigsCursorArrow()`.
- Update `runtime/build.sh` (use `__attribute__((section(...)))` per cursor blob if embedded; OR use `-fdata-sections` target-wide and re-verify smoke).
- Smoke: $70-marker MAME region-transition probe.
Effort: 14-18h.
2.6 buildsystem (CMake + Make integration)
- Decide on `TYPE` enumeration: `flat|flatMultiSeg|gsos|gno` (four values, not three; reviewer caught this).
- Build `CMAKE_C_LINK_EXECUTABLE` override that fully bypasses CMake's link-line generator (link816 takes no `-L`/`-l`/`-Wl`/response files).
- Generate `W65816RuntimeImports.cmake` from Phase 1.7.e (single source of truth).
- Per-source-file CFLAGS override: `set_source_files_properties(... PROPERTIES W65816_LAYER2 ON W65816_REGALLOC basic)`.
- Wrap all four runner harnesses (`runInMame.sh`, `runMultiSeg.sh`, `runViaFinder.sh`, `runInGno.sh`) under `add_w65816_mame_test()`.
- Hand-build the link line in exact order (libcGno.o BEFORE libc.o for weak override).
- ProDOS filetype/aux: pass `--filetype` to link816, emit `.meta` sidecar, ctest wrapper reads `.meta` to construct cadius `#XX0000` suffix.
- Guard at CMake configure time: `TYPE=gno` + `SEGMENT_CAP` is an error (omfEmit rejects this combo at `omfEmit.cpp:723-724`).
- C++ auto-link of `libcxxabi.o` + `libcxxabiSjlj.o` AFTER `libc.o`: read SOURCES extensions, branch in CMake function body (genex can't reorder).
- Make template: scope explicitly to single-binary single-mode flat hello-world ONLY. Document the gap.
- Smoke integration under `ulimit -t 90s`: cold-cache CMake configure can take 30+s; ensure graceful skip when `command -v cmake` fails.
- Optional: GENERATE_DEBUG keyword + ctest hookup for `pc2line.py` (depends on Phase 1.3).
Effort: 55h. HIGH risk on link-line override.
2.7 cxxsmoke (modern C++ smoke coverage)
- Pre-spike: run each candidate snippet as a one-off demo through buildGno.sh + runInGno.sh BEFORE writing smoke checks. 30-min sanity gate.
- Decide demo placement: create `tests/cxxSmoke/` mirroring `tests/coremark/` / `tests/lua/` pattern, NOT in `demos/` (where `buildGno.sh` auto-discovery would build them as GNO commands).
- Add `-include etl_profile.h` to smoke compile line OR replace `etl::tuple` structured-binding check with a user struct that has tuple_size / tuple_element specializations defined in the heredoc.
- Five checks: range-for, generic lambda + capture-by-reference of i32 local (the i32 path is where most recent fixes have lived, so it is the most likely to regress), variadic templates, structured bindings, fold expressions.
- Each check: a buildGno-style probe with $70 marker on success.
- Smoke harness from Phase 1.7.c launches each under MAME and verifies marker.
- If any check fails: stop work, XFAIL the test with TODO note, book a separate codegen-fix PR.
Effort: 10-12h (clean run). Best case 4h, worst case multi-day if a codegen bug surfaces.
Phase 2 total: ~110-130h.
Phase 3 - M2 source-level debugging end-to-end
3.1 debugger (interactive GDB-style front-end)
Reviewer's critical findings:
- `cpu.debug:bpset(addr)` 1-arg form CRASHES MAME. Use `bpset(pc, '', 'logerror "BP-HIT PC=%X A=%X X=%X Y=%X S=%X DBR=%X\n",pc,a,x,y,s,db; go')`.
- `SESSION_RECOVERY.md:362-385` already documents the working `-debug -debugger qt -oslog` workflow. Reuse, do not reinvent.
- Reentrancy SEGFAULT: `add_machine_pause_notifier` + `cpu.debug:go()` from a callback. Design must NOT call `go()` from Lua resume command callbacks.
- MAME under `-debug` starts with `execution_state = 'stop'`. Harness must explicitly set `dbg.execution_state = 'run'`.
- Drop `bt` from initial scope OR downgrade to best-effort single-frame parent only. Real multi-frame `bt` requires either DW_AT_frame_base in .debug_info or a per-function frame-size sidecar from link816 (new work item, not budgeted).
- Add `finish`/return command (run-until-current-frame-RTL/RTS): easier than step-over JSL and the natural escape from accidental step-into.
Steps:
- Add `demos/build.sh --debug` mode (adds `-g` to clang, `--debug-out`/`--map` to link816, `_dbg` output naming).
- Add `demos/buildGno.sh --debug` mode equivalent.
- Build Python front-end consuming `-oslog` stream (one-way pipe). Use `machine.debugger.command(string)` to inject debugger console commands at runtime for set-bp / step / continue.
- Pre-spike: confirm `bpset(pc, '', '')` form, verify bank-aware bp matching (24-bit PB:PC vs 16-bit PC), confirm execution_state behavior after pre-run bpset. 2h spike.
- Implement commands: `b FUNC | FILE:LINE`, `c`, `s` (step-instr), `n` (step-over: temp-bp at jsl_pc+4), `finish`, `p &GLOBAL` (map lookup only; `p VAR` deferred to `localvars`).
- Update `SESSION_RECOVERY.md` (not a new doc; keep one source of truth) to reference the new workflow.
- Add `--trace` mode that sets bp at `main`, captures one BP-HIT via -oslog, asserts pc2line.py resolves it. Default-on smoke, no `DEBUGGER_E2E=1` gate.
- Gate interactive `(dbg)` prompt portion behind `DEBUGGER_E2E=1` only.
Effort: 24-30h.
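The front-end's input side reduces to parsing the `BP-HIT` lines that the `logerror` breakpoint action writes to the `-oslog` stream. A minimal sketch, assuming the line format quoted in the findings above arrives unmodified apart from a possible log prefix; the function names are illustrative, and whether MAME reports `pc` as a 24-bit PB:PC value is exactly what the 2h pre-spike has to confirm.

```python
import re

# Matches the logerror format string from the plan's bpset action.
BP_HIT = re.compile(
    r"BP-HIT PC=([0-9A-F]+) A=([0-9A-F]+) X=([0-9A-F]+) "
    r"Y=([0-9A-F]+) S=([0-9A-F]+) DBR=([0-9A-F]+)",
    re.IGNORECASE,
)
REGS = ("pc", "a", "x", "y", "s", "dbr")
JSL_LENGTH = 4   # JSL = 1 opcode byte + 3-byte long address


def parse_bp_hit(line):
    """Return a dict of register values for a BP-HIT line, or None for other lines."""
    m = BP_HIT.search(line)
    if m is None:
        return None
    return {name: int(text, 16) for name, text in zip(REGS, m.groups())}


def step_over_target(jsl_pc):
    """Temp-breakpoint address for `n` over a JSL at 24-bit address jsl_pc.

    The 16-bit program counter wraps inside its bank, so only the low
    16 bits are advanced.
    """
    bank = jsl_pc & 0xFF0000
    return bank | ((jsl_pc + JSL_LENGTH) & 0xFFFF)


print(parse_bp_hit("BP-HIT PC=12345 A=1 X=2 Y=3 S=1FF DBR=0"))
```

Returning `None` for non-matching lines lets the reader loop skip ordinary log noise without special cases.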
3.2 localvars (-O0 + -O2/IMG + location-lists + inlined subroutines, per Phase 0.4)
Depends on Phase 1.3 (DWARF reloc fix) + Phase 1.5 (DBG_VALUE preservation).
Per Phase 0.4 decision: full surface in one landing.
Steps:
- Verify llvm-dwarfdump can parse a `-g` `.o` after Phase 1.3. Hard precondition.
- Validate +1 stack skew convention with deliberate probe (int x=0xABCD; int y=0x1234; int z=0x5678; read fbreg offsets from memdump, verify alignment). Add as smoke check.
- Extend `pc2line.py` into a full DIE walker for `.debug_info` + `.debug_abbrev` + `.debug_addr` + `.debug_str` + `.debug_str_offsets`.
- Implement a DW_OP evaluator for: DW_OP_fbreg, DW_OP_addr, DW_OP_constN, DW_OP_reg0..7, DW_OP_breg0..7, DW_OP_call_frame_cfa.
- Add `--locals 0xPC` mode that reads from a MAME memdump (snapshot or `-oslog` register dump).
- Wire `p VAR` in debugger (3.1) to call `pc2line.py --locals`.
- -O2 / IMG-resident locals: rewrite DW_OP_regN refs to IMG slot indices (IMG0..IMG15) into `DW_OP_breg<DP_base>+offset` form. LLVM emits the fictitious-register form; pc2line maps it to actual DP $C0..$DE locations.
- Location lists: parse `.debug_loclists` (DWARF 5) for PC-range-keyed location expressions. Resolve to the correct entry for the queried PC.
- Inlined subroutines: DW_TAG_inlined_subroutine descent. Multiple-DIE-per-PC handling. Show inlined frame stack at the queried PC.
- Smoke checks (covering -O0 AND -O2 paths):
  - `add(3, 4)` -O0: locals print `a=3 b=4 c=7`.
  - `popcount(0xF0F0)` -O2 with IMG-resident vars: locals resolve correctly.
  - Multi-CU program (Lua-scale): locals from any CU resolve.
  - Inlined-helper case: stack shows the inlined frame.
- Expect 3-5 additional clang DWARF bugs to surface as -O2 / IMG / loclists work probes `.debug_info` deeper. Each is its own upstream-or-local-patch decision; budget contingency in this phase.
Effort: 50-75h (combined slice). Risk: HIGH (Phase 0.4 override accepts this). Mitigation: land Phase 1.5 DBG_VALUE audit FIRST.
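The DW_OP evaluator step is small enough to sketch up front. The opcode values below are the standard DWARF encodings; everything else is an assumption for illustration: the function names, the 4-byte `DW_OP_addr` operand (the plan says pointers are i32), and the 2-byte IMG slot stride, which is inferred from "IMG0..IMG15" spanning DP $C0..$DE. `DW_OP_constN` is left out to keep the sketch short.

```python
# Standard DWARF opcode values.
DW_OP_addr = 0x03
DW_OP_reg0 = 0x50
DW_OP_breg0 = 0x70
DW_OP_fbreg = 0x91
DW_OP_call_frame_cfa = 0x9C

ADDR_SIZE = 4        # assumption: 4-byte DW_OP_addr operands
IMG_DP_BASE = 0xC0   # from the plan: IMG slots live at DP $C0..$DE
IMG_STRIDE = 2       # inferred: 16 slots across $C0..$DE


def sleb128(data, pos):
    """Decode a signed LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:
                result -= 1 << shift      # sign-extend
            return result, pos


def evaluate(expr, frame_base=0, cfa=0, regs=None):
    """Evaluate a location expression; return ("addr", value) or ("reg", n)."""
    regs = regs or {}
    stack, pos = [], 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_addr:
            stack.append(int.from_bytes(expr[pos:pos + ADDR_SIZE], "little"))
            pos += ADDR_SIZE
        elif op == DW_OP_fbreg:
            offset, pos = sleb128(expr, pos)
            stack.append(frame_base + offset)
        elif op == DW_OP_call_frame_cfa:
            stack.append(cfa)
        elif DW_OP_breg0 <= op <= DW_OP_breg0 + 7:
            offset, pos = sleb128(expr, pos)
            stack.append(regs[op - DW_OP_breg0] + offset)
        elif DW_OP_reg0 <= op <= DW_OP_reg0 + 7:
            return ("reg", op - DW_OP_reg0)   # value lives in the register itself
        else:
            raise NotImplementedError(f"DW_OP {op:#x}")
    return ("addr", stack[-1])


def img_slot_to_dp(slot):
    """Direct-page address of an IMG slot under the inferred 2-byte stride."""
    if not 0 <= slot <= 15:
        raise ValueError("IMG slot out of range")
    return IMG_DP_BASE + IMG_STRIDE * slot
```

Raising on unknown opcodes, rather than skipping them, makes each newly encountered clang expression form show up as a hard failure in smoke.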
3.3 posixfile (POSIX file helpers)
Depends on Phase 0.10 (cross-dir policy), Phase 1.7.b (--data injection), Phase 1.8 (srand), Phase 1.10 (PATH_MAX).
Steps:
- Add 3 new GS/OS class-1 wrappers to `iigsGsos.s`:
  - `Get_Prefix` ($200A) for `realpath`
  - `Get_File_Info` ($2006) for `dirname`/`basename` semantics
  - `Get_Dir_Entry` ($201C) for `glob`/directory iteration
- Add corresponding parm-block typedefs to `runtime/include/iigs/gsos.h`.
- Add stub-mode counterparts to `iigsGsosStub.s` (using Phase 1.2 sentinel).
- Pre-spike: write `demos/gsosProbeDirEntry.c` exercising directory open + Get_Dir_Entry iteration. Run under `runInGno.sh + GSOS_FILE_SMOKE=1` BEFORE committing to glob's API. ~2h.
- Implement `realpath` (uses prefix resolution + Get_File_Info).
- Implement `dirname`/`basename` with auto-detect `/` vs `:` separator.
- Implement `fnmatch` with FULL bracket-set support (`[A-Z]*`, `[!a-z]`), MANDATORY per reviewer, not optional.
- Implement `glob` using directory iteration + fnmatch.
- Implement `mkstemp` using Phase 1.8 srand seed. Template-must-be-writable invariant (refuse non-writable template, document rodata-write risk in header).
- Smoke check each: 6 helpers × ~20 min.
- Document GNO/POSIX-VFS limitation: realpath/glob route through GS/OS class-1 on both bare-metal-with-GS/OS and GNO. GNO chdir-via-K* not honored.
Effort: 18-26h.
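Since bracket-set support is the mandatory part of `fnmatch`, a compact reference model helps pin down the expected behaviour before writing the C version. This is a Python sketch of one common backtracking approach, not the runtime's implementation: it handles `*`, `?`, ranges, and `!` negation, ignores flags and escapes, and treats an unterminated `[` as a literal. Both function names are made up.

```python
def match_bracket(pat, i, ch):
    """Match ch against the set opening at pat[i] == '['.

    Returns (matched, index_after_set), or None if the set has no closing ']'.
    """
    j = i + 1
    negate = j < len(pat) and pat[j] == "!"
    if negate:
        j += 1
    matched = False
    first = True                      # a ']' right after '[' or '[!' is literal
    while j < len(pat) and (pat[j] != "]" or first):
        lo = pat[j]
        if j + 2 < len(pat) and pat[j + 1] == "-" and pat[j + 2] != "]":
            if lo <= ch <= pat[j + 2]:
                matched = True
            j += 3
        else:
            if ch == lo:
                matched = True
            j += 1
        first = False
    if j >= len(pat):
        return None
    return (matched != negate, j + 1)


def fnmatch_model(pat, s):
    """Return True if string s matches glob pattern pat."""
    p = t = 0
    star_p = star_t = -1              # position of the last '*' for backtracking
    while t < len(s):
        if p < len(pat) and pat[p] == "*":
            star_p, star_t = p, t
            p += 1
            continue
        ok, nxt = False, p + 1
        if p < len(pat):
            c = pat[p]
            if c == "?":
                ok = True
            elif c == "[":
                result = match_bracket(pat, p, s[t])
                if result is None:
                    ok = s[t] == "["  # unterminated set: literal '['
                else:
                    ok, nxt = result
            else:
                ok = c == s[t]
        if ok:
            p, t = nxt, t + 1
        elif star_p >= 0:
            star_t += 1               # let the last '*' absorb one more char
            p, t = star_p + 1, star_t
        else:
            return False
    while p < len(pat) and pat[p] == "*":
        p += 1
    return p == len(pat)
```

The single-backtrack-point loop needs no recursion, which suits a target with a small stack.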
3.4 resourcemgr (deferred or stub-only per Phase 1.1 outcome)
If Phase 1.1 resolves GS/OS fopen hang, proceed. Otherwise: stub-only landing documented as such.
Steps (full version):
- Decide bundler input format: `TYPECODE_ID.bin` per reviewer recommendation (16-bit type + 16-bit ID encoded in filename like `8005_0001.bin`).
- Verify AppleSingle round-trip with disposable 1-hour cadius spike before writing full bundler.
- Install or build ORCA's `rez` as hard dependency for layout cross-checking.
- Write `tools/rsrcBundle/rsrcBundle.py`:
  - Read TYPECODE_ID.bin files
  - Build rResourceMap + rIndex
  - Stitch with OMF data fork
  - Emit AppleSingle
- Write `tools/rsrcBundle/dumpFork.py` for diffing against rez output.
- Implement `resourceProbeInit()` in `runtime/src/resource.c` (MMStartUp + TLStartUp + ResourceStartUp + OpenResourceFile-on-own-pathname).
- Build typed-C façade: LoadResource, GetResourceSize, HLock semantics (handle relocation via Memory Manager).
- Add ResourceShutDown hook via `__cxa_atexit`.
- Build `demos/rsrcProbe.c` with marker discipline (write $025000=0x99 + while(1); runViaFinder LAUNCHES only, no keypress automation).
- Add `--rsrc <applesingle>` mode to `runViaFinder.sh`.
- Update `demos/build.sh` to call `rsrcBundle` as post-step when `.rsrc/` dir present.
- WriteResource + UpdateResourceFile DEFERRED to a separate item (persistent write needs disk-extract-and-diff verification).
Effort: 40-50h.
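The bundler's first step, reading `TYPECODE_ID.bin` names, can be pinned down independently of the resource-map layout. A minimal sketch, assuming exactly four hex digits per field (consistent with the 16-bit type and ID in the `8005_0001.bin` example); the function names and the duplicate-rejection rule are illustrative choices, not decided behaviour.

```python
import re

# Assumption: exactly 4 hex digits for the type, 4 for the ID.
RESOURCE_NAME = re.compile(r"^([0-9A-Fa-f]{4})_([0-9A-Fa-f]{4})\.bin$")


def parse_resource_name(filename):
    """Return (type_code, resource_id) from a TYPECODE_ID.bin file name."""
    m = RESOURCE_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a TYPECODE_ID.bin name: {filename!r}")
    return int(m.group(1), 16), int(m.group(2), 16)


def collect_resources(filenames):
    """Map (type, id) to file name in sorted order, rejecting duplicate keys."""
    found = {}
    for name in sorted(filenames):
        key = parse_resource_name(name)
        if key in found:
            raise ValueError(f"duplicate resource {key}: {found[key]} and {name}")
        found[key] = name
    return found


print(parse_resource_name("8005_0001.bin"))
```

Rejecting duplicates matters because hex digits are case-insensitive, so `800a_0001.bin` and `800A_0001.bin` would otherwise collide silently.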
Phase 3 total: ~120-180h.
Phase 4 - M3 IIgs application authoring kit (parallel with Phase 3)
4.1 menubuilder
Steps:
- Pre-verify: does DrawMenuBar actually still hang post-InitCursor-landing? Drop paintMenuBarTitles fallback if not. 30-min check.
- Side-by-side dump struct offsets of NewWindowParm vs ORCA's window.h. 30-min ABI check.
- Reconcile WmTaskRec (used in all 5 demos) with IigsEventT (used in eventLoop.h). Either align field offsets or document why both exist.
- Build menu mini-format assembler in `runtime/src/uiBuilder.c`:
  - Handles `>>` (menu start), `>>@` (Apple menu), `\X` (icon), `\N###` (numeric ID), `*Xx` (cmd-key), `--` (item prefix), `---` (divider), `D` (disabled), `V` (visible/check), `*` (separator), `.\r` (terminator).
  - Round-trip test against Menu Mgr's parser. 6h.
- Window builder + control wrappers (cButton/cCheckBox/cEditLine/cScrollBar using abstract 32-bit proc constants, NOT bank-E1 ROM addresses). 4h.
- Add cmdId→itemID lookup table to IigsMenuT. Document dispatch contract.
- Extend IigsEventCallbacksT with `onCmd` (menu-pick dispatcher).
- Migrate ALL FIVE affected demos (frame.c, orcaFrame.c, minicad.c, reversi.c, helloWindow.c). 6h.
- Either include AlertTemplate/ItemTemplate wrapper (`uiBuilderAlert`) in scope OR carve out a separate `alertbuilder` item. Recommend in-scope.
uiBuilderAlert) in scope OR carve out a separatealertbuilderitem. Recommend in-scope. - Smoke check: install menu with one item, simulate keystroke via scripted MAME input, verify onCmd fires by setting $70=0x99. 4h.
- Re-baseline OMF sizes; verify cRELOC budget headroom.
Effort: 25-30h.
4.2 sprites (320 mode, standalone per Phase 0.6)
Steps:
- Standalone init: sprite.c does its own `$C029` NEWVIDEO bit 7 + SCB ($E1:9D00) + palette ($E1:9E00). 2h.
- SHR-safe heap policy:
  - Document `$C035` shadow register interaction.
  - Sprite save buffers MUST live above $A000 OR in a bank != 0 (since bank-0 $2000..$9FFF mirrors to $E1:2000..$E1:9FFF).
  - Add `iigsSpriteAttachBuffer(void *buf, size_t size)` so caller controls placement.
  - Document this in `iigs/sprite.h` and STATUS.md.
- Software sprite engine:
  - 16×16 fixed sprite shape, 4bpp packed.
  - Background save/restore.
  - Transparent blit (mask).
  - Sprite list (Begin/Add/RenderAll/EraseAll).
  - Integration with eventLoop's TaskMaster frame cadence.
- Demo (`demos/spriteProbe.c`):
  - Init SHR.
  - Place 8 sprites.
  - One frame of update.
  - Verify via `runInMame.sh --check-u8` (from Phase 1.7.a) at known SHR offsets.
- Cycle benchmarks in `tests/sprites/`: "blit one 16×16 sprite in <2000 cyc", "erase + redraw 8 sprites in <16000 cyc / 1 frame".
- 640 mode DEFERRED to follow-up item (Phase 0.6 decision).
- `pha;plb` DBR-to-$E1 optimization in inner loop: only if blit doesn't call any libgcc helper while DBR is contaminated. Audit before enabling.
Effort: 22-28h.
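The SHR-safe heap policy above reduces to a placement predicate that `iigsSpriteAttachBuffer` could apply before accepting a caller's buffer. The sketch below is an assumption-laden illustration: `spriteBufferIsShrSafe` is a hypothetical helper, and treating a bank-crossing or zero-length buffer as unsafe is an extra rule added here, not something the plan states.

```c
#include <stddef.h>
#include <stdint.h>

/* First bank-0 offset the plan allows for save buffers ("above $A000"). */
#define SPRITE_BANK0_SAFE_START 0xA000u

/*
 * Hypothetical check: returns 1 when a save buffer of `size` bytes at
 * bank:offset satisfies the stated policy, 0 otherwise.
 *   - any bank other than 0 is accepted
 *   - in bank 0 the buffer must start at or above $A000, clear of the
 *     $2000..$9FFF window that mirrors into $E1
 * Assumption added for the sketch: empty buffers and buffers that run
 * past the end of their 64 KB bank are rejected.
 */
int spriteBufferIsShrSafe(uint8_t bank, uint16_t offset, size_t size) {
    size_t last;
    if (size == 0) return 0;
    last = (size_t)offset + size - 1;
    if (last > 0xFFFFu) return 0;          /* would cross a bank boundary */
    if (bank != 0) return 1;
    return offset >= SPRITE_BANK0_SAFE_START;
}
```

Keeping the rule in one predicate means the same check can back both a runtime guard and the documentation in `iigs/sprite.h`.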
Phase 4 total: ~50-60h.
Phase 5 - M4 production-grade C++ toolchain
Per Phase 0.1/0.2, this is materially smaller than the original brief.
5.1 unwinder — _Unwind_RaiseException-over-SJLJ stub (Phase 0.1 option A)
Not a real DWARF unwinder. Provides the Itanium surface third-party C++ libraries expect.
- `runtime/src/libunwindStub.c`: `_Unwind_RaiseException`, `_Unwind_Resume`, `_Unwind_GetIP`, `_Unwind_GetCFA` routed to existing SJLJ jmpbuf.
- Smoke: probe that throws + catches via the stub.
- Document: "third-party libcxx-using code links; throw across non-instrumented frames terminates."
Effort: ~20h.
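A host-runnable sketch can show what "routed to existing SJLJ jmpbuf" means in practice. This is not the planned `libunwindStub.c`: `StubFrameT`, `stubPushFrame`, `stubPopFrame`, `stubRaise`, and `stubCurrentException` are hypothetical names, and the real stub would expose the `_Unwind_*` entry points and the project's own SJLJ frame layout instead.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handler record: one per active "try" region. */
typedef struct StubFrameT {
    jmp_buf            env;
    struct StubFrameT *prev;
} StubFrameT;

static StubFrameT *gTopFrame = NULL;   /* innermost active handler */
static void       *gInFlight = NULL;   /* exception object being raised */

/* Call before setjmp(frame->env) when entering a guarded region. */
void stubPushFrame(StubFrameT *frame) {
    frame->prev = gTopFrame;
    gTopFrame = frame;
}

/* Call when a guarded region exits normally. */
void stubPopFrame(void) {
    if (gTopFrame != NULL) gTopFrame = gTopFrame->prev;
}

/* Stand-in for the raise entry point: jump to the innermost handler,
 * or terminate when no instrumented frame is active. */
void stubRaise(void *exceptionObject) {
    StubFrameT *frame = gTopFrame;
    if (frame == NULL) abort();
    gInFlight = exceptionObject;
    gTopFrame = frame->prev;           /* handler is consumed by the jump */
    longjmp(frame->env, 1);
}

void *stubCurrentException(void) {
    return gInFlight;
}

/* Smoke-style demo: returns 1 when the throw lands in the handler. */
int stubDemoCatch(void) {
    static int payload = 42;
    StubFrameT frame;
    stubPushFrame(&frame);
    if (setjmp(frame.env) == 0) {
        stubRaise(&payload);           /* does not return */
        stubPopFrame();
        return 0;
    }
    return stubCurrentException() == &payload;
}
```

The `abort()` path mirrors the documented limitation: a throw with no instrumented frame on the handler chain terminates rather than unwinding.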
5.2 lto (ThinLTO per Phase 0.2)
Depends on Phase 1.4.c (TTI), Phase 1.11 (weak-extern survival), Phase 1.12 (Layer 2 gate).
Steps:
- Pre-spike (30 min): build llvm-link + llvm-dis, ThinLTO 3 small TUs (extras.c + strtok.c + libcGno.c), `--mtriple=w65816 -inline-threshold=50`, link with asm objects, run helloBeep. Validates the pipeline.
- Add `llvm-link`, `llvm-as`, `llvm-dis` to `installLlvmMos.sh` ninja targets. Extend existence-check at lines 75-78.
- Build `scripts/ltoLink.sh` that:
  - Reads bitcode + native asm objects
  - Runs `llvm-link` on bitcode
  - Runs `opt -O2 --mtriple=w65816 -inline-threshold=50` (explicitly set; opt does NOT invoke TargetPassConfig so the TM-init hook for inline-threshold doesn't fire).
  - Runs `llc -filetype=obj`
  - Hands resulting .o to link816.
- Verify GlobalDCE doesn't strip `.init_array` boundary symbols. Mark with `llvm.used` if needed.
- Document: per-file `-mllvm -regalloc=basic` for Lua's lvm.c / ldebug.c / ltablib.c is preserved by ThinLTO's per-TU codegen attachment.
- CoreMark + Lua LTO smoke: success criterion "produces a working binary at parity size or better."
- Document LTO × Layer 2 hard-fail behavior (Phase 1.12).
Effort: 30-40h.
Status (2026-06-02 PARTIAL - NoTTI-Lite mode):
- `scripts/ltoLink.sh` LANDED. Driver: llvm-link merges bitcode, opt -passes='w65816-layer2-gate' enforces Phase 1.12 (refuses on mismatch), opt --mtriple=w65816 -passes='default' -inline-threshold=50 runs IR-level optimization with the W65816-appropriate inline threshold, llc -filetype=obj produces the final native object. Flags: -o, --keep-temps, --layer2 (caller-asserts), --inline-threshold N (override), --emit-ll (debug).
- `installLlvmMos.sh` now builds llvm-link / llvm-as / llvm-dis / opt as part of the toolchain ninja targets and gates the existence check on all four. Phase 5.2 step 2.
- W65816TTI (`W65816TargetTransformInfo.h` + override in W65816TargetMachine) WIRED but `kMildCostModelEnabled = false`. The Phase 1.4c bsearch hang (smoke #77) RE-SURFACED when qsort.c was recompiled under TTI-active multipliers (2x i32, 5x float), meeting the "if bsearch smoke fails, ship NoTTI-Lite" criterion in the spec. The TTI plumbing ships present-but-bypassed so flipping `kMildCostModelEnabled` to true is the only change needed to enable full Phase 5.2 cost-driven inlining once the underlying i32 termination-compare codegen bug is fixed.
- Layer 2 LTO hard-fail behavior (Phase 1.12) is documented in W65816Layer2Gate.cpp header comment + ltoLink.sh step 2 comment. The gate has been end-to-end-verified: mixed Layer 2 + non-Layer 2 bitcode IS rejected at LTO time with a deterministic `LLVM ERROR: W65816 Layer 2 LTO gate: Layer 2 mode disagreement`.
- Per-TU codegen attachment (`-mllvm -regalloc=basic` for Lua's lvm.c / ldebug.c / ltablib.c) is preserved by ThinLTO's per-function attribute mechanism; those flags translate to function-level attributes that survive bitcode merge. No code change needed.
- Size parity probe: `demos/ltoProbe.c` + `demos/ltoProbeHelper.c` through ltoLink.sh produces 37781-byte GNO OMF vs 37785 bytes for non-LTO (parity-or-better met). Runs cleanly under MAME + GNO with the harness marker hit.
- All 162 smoke checks green after Phase 5.2 land + TTI bring-up.
Deferred to a future phase:
- Enabling the 2x i32 / 5x float TTI multipliers. Requires fixing the i32 termination-compare codegen bug that the original Phase 1.4c attempt surfaced (smoke #77 bsearch hang). Reproducer: `kMildCostModelEnabled = true` + rebuild runtime + run smoke.
- CoreMark / Lua LTO smoke probes (the spec's step 6). CoreMark's bank-budget pressure under aggressive inlining is exactly what TTI was meant to address; without TTI active, ThinLTO of CoreMark is expected to bloat past Layer 2's single-bank budget. Re-attempt after the TTI re-enable lands.
5.3 cxxchrono (Phase 0.5 split — chrono only)
- Add `etl_get_steady_clock` + `etl_get_high_resolution_clock` + `etl_get_system_clock` C-side hooks in `runtime/src/libc.c`.
- Verify ETL chrono milliseconds rep is i32 or i64 with `static_assert`. Set `ETL_CHRONO_*_CLOCK_DURATION` in `etl_profile.h` to force i32 if i64.
- Add prototype to `runtime/include/time.h`.
- Smoke: chrono::steady_clock::now() returns monotonically increasing millisecond values.
Effort: 3-4h.
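One possible shape for the steady-clock hook is sketched below. Several things are assumptions made for illustration: the 60 Hz tick source, the `readTickCount` stub standing in for whatever the runtime really reads, the `int32_t` return type (matching the i32 rep the plan forces), and the helper name `ticksToMilliseconds`.

```c
#include <stdint.h>

/* Stand-in tick source for the sketch; the real hook would read the
 * platform tick counter. Assumed to advance at 60 Hz. */
static uint32_t gFakeTicks = 0;

static uint32_t readTickCount(void) {
    return gFakeTicks;
}

/*
 * Converts 60 Hz ticks to milliseconds without 64-bit math:
 * 1000/60 == 50/3, so split into whole groups of 3 ticks plus remainder.
 * The result is masked to stay non-negative in a 32-bit signed rep, so
 * it wraps after roughly 24.8 days of uptime.
 */
int32_t ticksToMilliseconds(uint32_t ticks) {
    uint32_t whole = ticks / 3u;
    uint32_t rem   = ticks % 3u;
    uint32_t ms    = whole * 50u + (rem * 50u) / 3u;
    return (int32_t)(ms & 0x7FFFFFFFu);
}

/* C-side hook named in the plan; return type is an assumption here. */
int32_t etl_get_steady_clock(void) {
    return ticksToMilliseconds(readTickCount());
}
```

Avoiding a 64-bit intermediate keeps the conversion cheap on a 16-bit target, at the cost of truncating fractional milliseconds per tick.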
5.4 cxxstream+format+path (Phase 0.5 split — the rest)
Depends on Phase 1.9 (<cmath> shim).
Steps:
- Set `ETL_USING_FORMAT_FLOATING_POINT=0` default in `etl_profile.h`. FP-format build is a separate `--layer2` target.
- Define `runtime/include/c++/iigs/path.h` with ProDOS-aware path operations (64-char component / 8-component / `:` separator limits validated).
- `etl::string_stream` + `printf("%s", ss.str().c_str())` is the cout replacement. Drop the `iigs/console.h` cout-shim idea: it adds surface area without value.
- Add `runtime/include/c++/cstdlib`, `<cstddef>` shims.
- 1-hour `etl::format` size spike before committing: measure `format_to(buf, "{}", 42)` vs etlProbe size. If >10KB delta for one int format, document and downgrade scope.
- Smoke: cxxStdlibProbe demo through buildGno+MAME via Phase 1.7.c harness.
- Document `std::iostream`, `std::regex`, `std::filesystem`, `std::format` (the full versions, not ETL substitutes) as explicit out-of-scope with reasons (size, locale dependencies, GS/OS fopen).
- Set explicit per-component size budgets up front (regex link budget, filesystem code budget). Skip with documentation if exceeded.
Status (2026-06-02 LANDED):
- `ETL_USING_FORMAT_FLOATING_POINT=0` default confirmed in `runtime/include/c++/etl_profile.h` (via the `ETL_FORMAT_NO_FLOATING_POINT` gate); FP-format is a `-UETL_FORMAT_NO_FLOATING_POINT` opt-in.
- `runtime/include/c++/iigs/path.h` provides `pathNormalize` / `pathJoin` / `pathSplit` with 64-char component + 8-depth + `:`-or-`/` separator validation. Header-only, no link footprint when unreferenced.
- `runtime/include/c++/sstream` aliases `etl::string_stream` as `std::stringstream` so portable code that names `std::stringstream` resolves to the ETL fixed-capacity surface. Cout-replacement idiom documented in `iigs/path.h` header preamble and in the `<sstream>` shim itself: `etl::string_stream ss(buf); ss << ...; printf("%s", ss.str().c_str());`
- `<cstdlib>` / `<cstddef>` / `<cmath>` shims already exist (Phase 1.9).
- Chrono::milliseconds rep is i32 on the W65816 by way of the `ETL_CHRONO_*_CLOCK_DURATION` overrides; `cxxStreamProbe` carries a `static_assert(sizeof(etl::chrono::steady_clock::duration::rep) == 4)` that fails compile if the override regresses.
- `etl::format` size spike (step 5): a 1-line `format_to(buf, "{}", 42)` added ~82 KB to the binary over the no-format flavor. Hard downgrade per the step-5 rule (>10 KB threshold). `etl::format` is the layer2-opt-in surface, NOT default; gated by `-DCXX_STREAM_PROBE_WITH_FORMAT=1` in the demo.
- `demos/cxxStreamProbe.cpp` exercises stream<<int + path join/normalize/split + chrono i32 contract + format sentinel. Bin 19199 bytes (well under bank-0 budget). Smoke check 9/9 green under GS/OS 6.0.4 + GNO in MAME.
- Smoke-check entry added to `scripts/smokeTest.sh` after the `cxxChronoProbe` check (~6422 area).
Explicit out-of-scope (step 7), documented here for future reviewers:
- `std::iostream` (full): locale-aware num_put/num_get machinery, ctype tables, and per-stream sentry construction cost ~15-25 KB even for a single `cout << int`. Replacement: `etl::string_stream` + `printf("%s", ss.str().c_str())`. Aliased as `std::stringstream` in `<sstream>` for code-portability.
- `std::regex`: full NFA + DFA construction is a ~30-40 KB code budget on the W65816 even with a single-character-class regex. No locale surface available either. Replacement: caller-supplied scanner or hand-rolled state machine. Documented out-of-scope.
- `std::filesystem`: directory-iterator + canonical-path resolution + permission-bit handling rely on POSIX surface the GS/OS FST does not provide (no `lstat`, no `realpath`, no permission bits beyond ProDOS access byte). Replacement: `iigs::path::*` + the existing libc `opendir` / `readdir` / `stat` surface in `runtime/include/dirent.h` and `runtime/include/sys/stat.h`. Documented out-of-scope.
- `std::format` (the C++20 surface): the ETL surrogate (`etl::format_to`) measured at +82 KB for one int; the C++20 std:: surface would be larger again (full charconv float-to-text, locale hooks). Documented out-of-scope; the layer2-opt-in `etl::format` is the replacement.
Effort: 12-15h.
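The path limits quoted above (64-char component, 8 components deep, `:` or `/` separators) can be expressed as a single validation pass. The sketch is written in C for illustration and is not the contents of `iigs/path.h`: `iigsPathValidate`, the error codes, and the choice to skip empty components (leading or doubled separators) are all assumptions.

```c
#include <stddef.h>

#define IIGS_PATH_MAX_COMPONENT 64   /* characters per component */
#define IIGS_PATH_MAX_DEPTH      8   /* components per path */

/*
 * Hypothetical validator. Returns 0 when the path respects both limits,
 * -1 for an empty path or one with no components, -2 when a component is
 * too long, -3 when the path is too deep.
 */
int iigsPathValidate(const char *path) {
    size_t componentLen = 0;
    int    depth = 0;

    if (path == NULL || *path == '\0') return -1;

    for (const char *p = path; ; p++) {
        char c = *p;
        if (c == ':' || c == '/' || c == '\0') {
            if (componentLen > 0) {          /* close the current component */
                depth++;
                if (depth > IIGS_PATH_MAX_DEPTH) return -3;
                componentLen = 0;
            }
            if (c == '\0') break;
        } else {
            componentLen++;
            if (componentLen > IIGS_PATH_MAX_COMPONENT) return -2;
        }
    }
    return depth > 0 ? 0 : -1;
}
```

Distinct error codes make it straightforward for a smoke probe to assert which limit tripped.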
Phase 5 total: ~65-90h (vs original brief's 120-220h — Phase 0 decisions collapse the unwinder cost dramatically).
Phase 6 - M5 observability
6.1 profiler (function-attribution under MAME)
Depends on Phase 1.3 (DWARF reloc fix) + Phase 3.2 (pc2line DIE walker).
Steps:
- Pre-spike (2-3h): minimum-viable PC sampler as one-off script. Validate `emu.register_periodic` fires with usable density. Run against three representative shapes: short hot bench (strLen), libcall-dominated bench (popcount), multi-seg (Lua). If <30 samples or >50% misattribution, pivot to `-debug` mode + `cpu.debug.bpset`-with-counter (additional 6h).
- Switch attribution model to "sample count + hits-percent" (NOT emu.time() weighting; sample sparsity makes cycle% dishonest).
- Have link816 emit ALL local symbols (not just globalSyms) to a separate map file, gated by `--map-locals`. Required for meaningful libgcc / libc attribution. 1-2h link816 edit.
- CLOCK_HZ as CLI arg (slow-mode default 1023000; `--fast-mode` for GS/OS demos).
- Add `--sample` mode to `runInMameCycles.sh` (and `runMultiSeg.sh`). Do NOT fork into a separate `runInMameProfile.sh`; keep single-sourced.
- Smoke: assert ≤10% samples in '?' (unattributed) + assert dominant bucket matches expectation.
- Defer `--line` mode to a follow-up.
Effort: 14-20h.
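The "sample count + hits-percent" model and the unattributed-bucket smoke rule can be sketched as follows. Everything named here is hypothetical: `MapSymT` assumes the map file yields a start address and size per symbol, sorted by start, and `mapLookup`, `attributeSamples`, and `hitsPercent` are illustrative helpers rather than part of any existing script.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map-file entry; the table must be sorted by `start`. */
typedef struct {
    uint32_t    start;
    uint32_t    size;
    const char *name;
} MapSymT;

/* Binary search: index of the symbol containing pc, or -1 ('?' bucket). */
int mapLookup(const MapSymT *syms, size_t count, uint32_t pc) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= pc) lo = mid + 1;
        else hi = mid;
    }
    if (lo == 0) return -1;
    return (pc - syms[lo - 1].start < syms[lo - 1].size) ? (int)(lo - 1) : -1;
}

/* Fills hits[0..count-1] with per-symbol sample counts and returns the
 * number of samples that matched no symbol. */
uint32_t attributeSamples(const MapSymT *syms, size_t count,
                          const uint32_t *pcs, size_t sampleCount,
                          uint32_t *hits) {
    uint32_t unattributed = 0;
    for (size_t i = 0; i < count; i++) hits[i] = 0;
    for (size_t i = 0; i < sampleCount; i++) {
        int idx = mapLookup(syms, count, pcs[i]);
        if (idx < 0) unattributed++;
        else hits[idx]++;
    }
    return unattributed;
}

/* Integer hits-percent; 0 when there are no samples. */
uint32_t hitsPercent(uint32_t hits, uint32_t total) {
    return total ? (uint32_t)(((uint64_t)hits * 100u) / total) : 0;
}
```

Reporting raw counts next to the percentage keeps sparse runs honest: 7 of 10 samples and 7000 of 10000 both read as 70%, but only the counts show how much to trust the figure.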
6.2 sanitizers (UBSan-minimal + coverage per Phase 0.3)
Depends on Phase 1.4.a (RETURNADDR i32) + Phase 1.4.b (TRAP→BRK).
Steps:
- Document ASan as out-of-scope. STATUS.md + USAGE.md.
- Driver toolchain decision: Option (a) skip driver-side changes; users pass `-fsanitize=undefined -fsanitize-minimal-runtime` manually plus link `runtime/ubsan.o`. RECOMMENDED: 10h effort. Option (b) is +6h.
- Hand-roll `runtime/src/ubsan.c` based on `ubsan_minimal_handlers.cpp`:
  - Macro-substitute `__builtin_return_address` (Phase 1.4.a makes it work but at unknown cost; use Phase 1.4.b BRK trap PC for caller-pc dedup).
  - `caller_pcs` dedup table OR stub it out.
  - All 24 HANDLER pairs (recover + abort) + 2 RECOVER-only.
- Route ubsan messages via `__putByteErr` (stderr, fd 3 in GNO).
- Compile ubsan.c with `-fno-sanitize=undefined` (recursive ubsan footgun). Update `runtime/build.sh`.
- Add `tests/ubsan/` mirroring `tests/coremark/` pattern: build.sh, ubsanProbe.c, manifest.
- Probe scope: signed-overflow (add/sub/mul) + shift + divide. Three checks verified via $025000 sentinels.
- Document object-size cost honestly: empirically a 9-line indexed-read function expands from 12 to 682 lines instrumented. 3 intentionally-triggering ops may not fit single-bank.
- Coverage: `-fprofile-instr-generate -fcoverage-mapping` smoke check that verifies counters write to expected `.profraw` shape.
Effort: 22-28h.
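The `caller_pcs` dedup table can be as small as the sketch below. It is loosely modelled on the idea of reporting each call site once and is not a copy of the upstream handlers: `ubsanShouldReport`, the table capacity of 20, the silent behaviour when the table is full, and the absence of atomics (assuming a single-threaded target) are all choices made for this illustration.

```c
#include <stdint.h>

/* Assumed capacity for the sketch; tune against the memory budget. */
#define UBSAN_MAX_CALLER_PCS 20

static uintptr_t gCallerPcs[UBSAN_MAX_CALLER_PCS];
static unsigned  gCallerPcCount = 0;

/*
 * Returns 1 the first time a given pc is seen (the handler should print
 * its message), 0 for a repeat or once the table is full. Linear search
 * is adequate at this size and needs no extra storage.
 */
int ubsanShouldReport(uintptr_t pc) {
    for (unsigned i = 0; i < gCallerPcCount; i++) {
        if (gCallerPcs[i] == pc) return 0;
    }
    if (gCallerPcCount == UBSAN_MAX_CALLER_PCS) return 0;
    gCallerPcs[gCallerPcCount++] = pc;
    return 1;
}
```

A handler would call `ubsanShouldReport` with the trap or caller PC before routing any text to stderr, which bounds the output from a hot loop that overflows on every iteration.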
Phase 6 total: ~36-48h.
Critical-path summary
The dependency arrows that gate everything else:
Phase 1.3 (DWARF reloc fix)
├─→ 2.1 clangdwarffix completion
│ └─→ 3.1 debugger
│ └─→ 3.2 localvars (full -O0 + -O2/IMG slice per Phase 0.4)
│ └─→ 6.1 profiler
└─→ 1.5 DBG_VALUE audit (must land before 3.2)
Phase 1.1 (GS/OS fopen hang)
├─→ 3.3 posixfile real I/O
├─→ 3.4 resourcemgr (or defer to stub)
└─→ 5.4 cxxstream+format+path::filesystem (or document gap)
Phase 1.4 (backend prereqs)
├─→ 5.2 lto (1.4.c TTI)
└─→ 6.2 sanitizers (1.4.a RETURNADDR, 1.4.b TRAP→BRK)
Phase 1.6 (IigsSoundParmT fix)
└─→ 2.4 docram
Phase 1.11 + 1.12 (LTO weak-extern + Layer 2 gate)
└─→ 5.2 lto
Recommended landing order (calendar weeks)
| Week | Phase | Items |
|---|---|---|
| 1 | Phase 0 (DONE) + 1.1 spike + 1.3.a spike | GS/OS fopen + MAI flag spikes |
| 2 | Phase 1.1-1.6 | Foundational prerequisites |
| 3 | Phase 1.7-1.13 | Build/harness + LTO gates |
| 4-5 | Phase 2 (parallel) | M1 quick wins: clangdwarffix, hexfloat, tmpfile (+copy/delete fallback), docram, cursor, buildsystem, cxxsmoke |
| 6-7 | Phase 3.1 + Phase 4 (parallel) | debugger; menubuilder + sprites |
| 8-9 | Phase 3.2 | localvars full slice (-O0 + -O2/IMG + loclists + inlined) |
| 10 | Phase 3.3-3.4 | posixfile; resourcemgr (or stub-only landing) |
| 11 | Phase 5 | unwinder-stub + ThinLTO + cxxchrono + cxxstream/format/path |
| 12 | Phase 6 | profiler + sanitizers (UBSan-min + coverage) |
Total: 12 weeks of focused work for ~750-950h with Phase 0 decisions locked.
Phase 0.4 override (full localvars in one shot) adds ~10-15h vs the split approach; Phase 0.10 override (rename copy+delete) adds ~6h. Both are absorbed in the per-phase budgets above.
Risks I'm worried about (final list)
- `FK_Data_4` truncation discovery cascade. The reviewer for `localvars` found the IMM24 truncation bug while planning DWARF work. The bug is fixed in Phase 1.3, but it's almost certainly the FIRST of several clang DWARF bugs for this target. Budget contingency in Phase 3.2-3.3.
- `cxxsmoke` surfaces silent codegen regressions. Every prior C++ probe this project has run (cxxProbe, etlProbe) has surfaced at least one backend bug. Phase 2.7 will likely do the same. Budget contingency.
- GS/OS fopen hang is unsolvable in budget. If Phase 1.1 doesn't yield a fix within 8-16h, multiple downstream items (`resourcemgr`, `cxxstdlib::filesystem`, `tmpfile` real path, `posixfile` real I/O) ship stub-only with documented limitations. This is acceptable but worth confirming up front.
- Layer-2-aware LTO miscompile. Phase 1.12 gate must be built FIRST. If skipped, the resulting binaries are silently wrong in the most performance-sensitive code path.
- `menubuilder` cRELOC budget pressure. reversi.omf already at 40.5KB; adding uiBuilder.c may push some demos past the cRELOC threshold. Re-baseline post-migration.
- `unwinder` scope creep. Phase 0.1 must be a hard decision. Going from (A) stub to (B) real DWARF mid-work would derail the schedule.
- MEMORY.md truncation. The index is already past the 200-line load limit. Before starting any item, grep for `feedback_*<item-substring>*.md` in the memory dir to surface anything the loaded portion doesn't show.
- `sprites` SHR shadow scribble. Phase 4.2.2 heap-vs-shadow policy is load-bearing. Without explicit handling, sprite save buffers will land in the visible display window and corrupt user pixels.
How to use this document
- Start at Phase 0. Make each decision EXPLICITLY before any Phase 1 work.
- Phase 1 is FOUNDATIONAL. Skip nothing. Items in later phases will fail silently if any Phase 1 prerequisite is missing.
- For any item touching DWARF: Phase 1.3 MUST be green first.
- For any item that does GS/OS file I/O: Phase 1.1 MUST be investigated.
- Reviewer-adjusted hours are working estimates; brief hours are systematically low across the board.
- The `Critical-path summary` is the dependency graph; respect it.