65816-llvm-mos/docs/GAP_CLOSURE_PLAN.md
Scott Duensing da095402ec Updated
2026-06-02 23:17:57 -05:00



# llvm816 gap-closure: comprehensive step plan

This is the full, ordered, dependency-aware plan for closing the 18 feature gaps
identified in the 2026-05-30 audit, rolled together with every prerequisite the
adversarial reviewers found. The earlier "master plan" stopped at milestones +
per-item criticism; this document is the actionable step list.

**Total effort (reviewer-adjusted):** roughly 700-1000 hours across all 18 items
plus the Phase 0 + Phase 1 prerequisites the reviewers added. Original briefs
sum to ~280h; reviewers added ~420h of hidden work most planners missed.

**Default audience:** expert C/C++ developer porting code to the Apple IIgs,
with a secondary path for retrocomputing tinkerers who want source-level
debugging.

The plan is organized in six phases, with hard dependency arrows. Each step is
a concrete, individually-shippable piece of work; nothing is "TBD" at the step
level.

---
## Phase 0 - Architectural decisions (DECIDED 2026-05-31)
| # | Topic | Decision |
|---|-------|----------|
| 0.1 | EH model | **SJLJ + `_Unwind_RaiseException`-over-SJLJ stub** |
| 0.2 | LTO model | **ThinLTO** |
| 0.3 | Sanitizer scope | **UBSan-min + coverage only** (no ASan) |
| 0.4 | `localvars` split | **Full -O0 + -O2/IMG in one shot** (override of recommendation) |
| 0.5 | `cxxstdlib` split | **`cxxchrono` first, then `cxxstream+format+path`** |
| 0.6 | Sprite harness | **Standalone first, desktop-coupled follow-up** |
| 0.7 | Resource fork delivery | **AppleSingle blob** |
| 0.8 | `clangdwarffix` approach | **BPF-style MAI flag spike first, escalate to new relocs only if needed** |
| 0.9 | `IigsSoundParmT` | **Fix as breaking change** (existing demos likely already broken) |
| 0.10 | `rename` cross-dir | **Implement copy+delete fallback now** (override of recommendation) |

**Impact of overrides:**
- **0.4 (full localvars in one landing):** Phase 3.2 and 3.3 collapse into a
single ~55h item. Risk profile is higher (multi-stage delivery is now
one-stage), but with Phase 1.5 DBG_VALUE audit landed first the foundation
is solid. Expect 3-5 additional clang DWARF bugs to surface during -O2 /
IMG work; budget contingency.
- **0.10 (rename copy+delete fallback):** Adds ~6h to Phase 2.3. Real work is
the error-recovery path (partial-copy state, partial-delete state, source
vanished mid-operation). Use the existing GS/OS class-1 calls
(Open/Read/Write/Create/Close/Destroy) to compose the fallback; no new
toolbox wrappers required.
### Why these (rationale)
### 0.1 Pick exception-handling model (sanitizers, unwinder, cxxstdlib all depend on this)
Three options:
- **A.** Keep SJLJ as default. Ship `_Unwind_RaiseException`-over-SJLJ stub for
third-party C++ libraries. ~20h. Loses no functionality. Strongly
recommended.
- **B.** Build a real DWARF CFI unwinder (libunwind port + backend CFI emission
+ per-MIR-pass CFI annotation + `__jsl_indir` hand-CFI). ~260h floor. Real
throw across non-instrumented frames.
- **C.** Make EH model a Subtarget feature with two MCAsmInfo subclasses, ship
both. ~20h extra plumbing on top of (A) or (B).
**Recommendation:** A. Reviewer for `unwinder` called (B) a multi-week
structural change and a foot-in-the-door for a long string of follow-on bugs.
### 0.2 Pick LTO model
- **A.** ThinLTO. Preserves per-TU codegen attachments so Lua's per-file
`-mllvm -regalloc=basic` keeps working. Summary-based inlining decisions
less prone to over-inlining the project already fought
(`feedback_lapi_inline_threshold.md`, `feedback_coremark_matrix_test_regression.md`).
- **B.** Full LTO with whole-module merge. Simpler tool, harder integration.
Per-TU regalloc becomes unrepresentable.
**Recommendation:** A. Reviewer called the original "full LTO is simpler"
choice exactly backwards for this codebase.
### 0.3 Pick sanitizer scope
- **A.** UBSan-minimal + coverage only. Achievable. ~22h.
- **B.** UBSan + ASan + coverage. ASan's 8:1 shadow-memory model does not fit
the 65816 (full 16MB → 2MB shadow; programs run in 1-2 banks). Reviewer
called the brief wrong on architectural grounds.
**Recommendation:** A. Document ASan as out-of-scope.
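
The 16MB → 2MB figure in option B follows directly from ASan's 8:1 ratio. A quick arithmetic check, assuming a direct-mapped shadow that covers the whole 24-bit address space (the helper name is illustrative, not project code):

```python
# Back-of-envelope check of ASan's 8:1 shadow mapping on a 24-bit address space.
ADDRESS_SPACE = 1 << 24   # 65816: 16 MiB of addressable memory
BANK_SIZE = 1 << 16       # one 64 KiB bank
SHADOW_RATIO = 8          # ASan: one shadow byte per 8 application bytes


def shadow_bytes(app_bytes: int) -> int:
    """Shadow bytes needed to cover app_bytes of application memory."""
    return app_bytes // SHADOW_RATIO


full_shadow = shadow_bytes(ADDRESS_SPACE)
print(full_shadow // (1 << 20), "MiB of shadow for the full address space")
print(full_shadow // BANK_SIZE, "banks of shadow")
```

A program that itself lives in one or two banks would therefore need a shadow region many times its own size to cover the full address space.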
### 0.4 Pick `localvars` split
- **A.** Ship -O0 stack-resident locals only (20-30h). Faster delivery, narrower
payload.
- **B.** Ship -O0 + -O2/IMG-resident + location-list crossing PCs + inlined
subroutines (50-80h).
**Recommendation:** A first, then (B) as a follow-up. Reviewer's split point
is the natural project boundary.
### 0.5 Pick `cxxstdlib` split
- **A.** Ship `cxxchrono` only (etl::chrono + libc time hooks). 3-4h.
- **B.** Ship `cxxstream+format+path` (string_stream + format + iigs::path +
`<cmath>` shim + smoke). 12-15h.
- **C.** Both, but as separate landings.
**Recommendation:** C, with A landing first. Reviewer noted mixing them hides
the format/`<cmath>` rabbit hole.
### 0.6 Pick sprite-engine harness
- **A.** Standalone. sprite.c does its own `$C029` + SCB + palette init. Use
`runInMame.sh` bare-metal harness.
- **B.** Desktop-coupled. Relies on `paintDesktopBackdrop`, runs through GS/OS
Finder launch (`runViaFinder.sh`).
- **C.** Both, ship A first.
**Recommendation:** C.
### 0.7 Pick resource-fork delivery shape
- **A.** AppleSingle blob (one file = data fork + resource fork; cadius
auto-detects). Cleaner.
- **B.** `_ResourceFork.bin` sidecar (cadius `Prodos_Add.c:386` supports it).
Cleaner separation.
**Recommendation:** A. Verify via a 1-hour disposable spike before writing the
bundler.
### 0.8 Pick `clangdwarffix` approach
- **A.** BPF-style one-liner: `MAI->setDwarfUsesRelocationsAcrossSections(false)`.
Reviewer cites `BPFTargetMachine.cpp:87` as the precedent. If it works the
whole 14h plan collapses to ~1h.
- **B.** New `R_W65816_DATA32` + `R_W65816_PCREL32` relocs through the full
pipeline (MC → ELF writer → link816 → `pc2line.py`).
**Recommendation:** Spike A first (10 minutes). Fall back to B only if A
introduces new failures. Reviewer documented that (B) is a strict superset of
the work covered by (A).
### 0.9 Decide whether to fix `IigsSoundParmT` as a breaking change
The current in-tree struct (6 bytes) does not match ORCA's authoritative
`SoundParamBlock` (18 bytes). `iigsPlayDocSample` is almost certainly silently
broken today. Either:
- **A.** Fix the struct as a breaking change. Existing demos relying on it
will need migration. Reviewer believes none actually work today, so the
"breakage" is theoretical.
- **B.** Add new corrected API (`iigsPlaySoundV2`?), leave broken one.
**Recommendation:** A. Item `docram` cannot be delivered honestly otherwise.
### 0.10 Decide rename() / cross-directory policy
GS/OS `ChangePath ($2004)` is rename-in-place-only. POSIX rename(old, new)
across directories is impossible without an explicit copy-then-delete path.
- **A.** Reject cross-dir new-paths up front with EINVAL.
- **B.** Implement copy-then-delete fallback. ~6h, error-recovery is the hard
part.
- **C.** Accept divergence; document loudly.
**Recommendation:** A initially, with (B) deferred until a real user complains.

---
## Phase 1 - Foundational prerequisites (everything later depends on these)
These are NOT in the original 18 items but were surfaced as preconditions by
multiple reviewers. They unblock the actual feature work.
### 1.1 GS/OS `fopen` hang investigation [BLOCKS: resourcemgr runtime, tmpfile real path, cxxstdlib::filesystem, posixfile real I/O]
`JSL $E100A8` doesn't return under real GS/OS 6.0.2 (per `STATUS.md`).
- Reproduce under MAME with the `-debug -debugger qt -oslog` bpset workflow
documented in `SESSION_RECOVERY.md`.
- Bisect: ABI mismatch, stack-shape mismatch, missing tool init, DP/SP layout.
- If unsolvable in a 4-8h budget: document the limitation and route all
GS/OS-dependent items through the stub linker (`iigsGsosStub.s`) with the
honest-failure sentinel from step 1.2.
**Effort:** 8-16h investigation. If fix is found, +4-8h to land. If
unsolvable, project moves on with stub-only paths for affected items.
### 1.2 Stub-mode sentinel for honest `iigsGsosStub.s` [BLOCKS: tmpfile, posixfile, cursor]
`iigsGsosStub.s` currently makes every GS/OS call succeed silently.
`__gsosAvailable()` returns TRUE in stub mode, so newly-added wrappers
(`gsosDestroy`, `gsosChangePath`, prefix/dir/info) will fall through the
catch-all stub and *appear to succeed while doing nothing*.
Two options to fix:
- Add explicit error stubs for each new wrapper (returns -1 / sets errno).
- Add a `__gsosIsRealImpl()` sentinel that distinguishes "real GS/OS linked"
from "universal-success stub linked".
**Recommendation:** Sentinel. ~3h. One source of truth.
### 1.3 `FK_Data_4` → `R_W65816_*32` reloc fix [BLOCKS: clangdwarffix → debugger → localvars → profiler]
Independent of Phase 0.8 choice — once the BPF-style spike either passes or
fails, this is the actual step.
Reviewer surfaced multiple landmines the original plan missed:
- `ELFObjectWriter::recordRelocation` (`MC/ELFObjectWriter.cpp:1329-1349`)
converts in-section diffs to PC-relative. Need BOTH `R_W65816_DATA32` AND
`R_W65816_PCREL32`.
- `link816.cpp:1275` has a hardcoded `r.offset + 3 > sec.size` width check
inside `writeDebugSidecar` that must become reloc-type-driven.
- Reloc-type emission must land BEFORE the MC change starts emitting the new
types or every intermediate `-g` build dies on unknown reloc.
- A `ninja clean && ninja` is required (TableGen-emitted enum dependencies do
not play well with incremental builds).
**Steps:**
1. (a) Spike `setDwarfUsesRelocationsAcrossSections(false)` in
`W65816MCAsmInfo`. Rebuild clang, `xxd .debug_line` of a `-g` hello.c,
verify non-zero `unit_length` / `header_length`. If green: skip to step 1.3.h.
2. (b) Add `R_W65816_DATA32` + `R_W65816_PCREL32` to
`W65816FixupKinds.h` / `W65816AsmBackend.cpp`.
3. (c) Extend `W65816ELFObjectWriter::getRelocType` to dispatch FK_Data_4 by
`IsPCRel`.
4. (d) Add 4-byte reloc handlers to `link816.cpp::applyReloc` (DATA32 = write
`sectionBase + addend`; PCREL32 = write `target - patchAddr + addend`).
5. (e) Generalize the `r.offset + 3 > sec.size` width check to use a small
switch on reloc type.
6. (f) Land link816 + AsmBackend in one commit (so intermediate builds don't
die). Then land the MC switch that starts emitting the new types.
7. (g) Update `pc2line.py` to use the now-correct `unit_length` /
`header_length`, keeping the tolerant zero-fallback for older artifacts.
8. (h) Audit `emitAbsoluteSymbolDiff` / `emitDwarfUnitLength` /
`makeEndMinusStartExpr` callers; verify `.debug_frame`, `.debug_loclists`,
`.eh_frame` also work.
9. (i) Drop `llvm-dwarfdump consumes without warnings` from `shipsAs`:
`EM_NONE` will still warn. File EM_ assignment as a separate gap item
(tracked in this plan as Phase 1.13).
**Effort:** 1h (best case, MAI flag works) or 14-20h (full reloc path). Risk
HIGH on rebase pain.
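
Steps (d) and (e) can be modelled in a few lines. The sketch below is a Python stand-in for the C++ in `link816.cpp`, not the actual code: the reloc tags, the `DATA24` name for the existing 3-byte type, and the function signature are all assumptions made for illustration. It shows the two formulas from step (d) and a width check driven by reloc type rather than a hardcoded 3.

```python
# Hypothetical reloc tags; the real values come from the W65816 ELF reloc enum.
R_DATA24, R_DATA32, R_PCREL32 = "DATA24", "DATA32", "PCREL32"

# Patch width in bytes per reloc type (replaces the hardcoded "+ 3" check).
RELOC_WIDTH = {R_DATA24: 3, R_DATA32: 4, R_PCREL32: 4}


def apply_reloc(section: bytearray, kind: str, offset: int, section_base: int,
                target: int, patch_addr: int, addend: int) -> None:
    """Patch one relocation into section, little-endian."""
    width = RELOC_WIDTH[kind]
    if offset + width > len(section):
        raise ValueError(f"{kind} at {offset:#x} overruns section")
    if kind == R_PCREL32:
        value = target - patch_addr + addend   # PCREL32 formula from step (d)
    else:
        value = section_base + addend          # DATA32 formula from step (d)
    mask = (1 << (8 * width)) - 1              # wrap negatives to the field width
    section[offset:offset + width] = (value & mask).to_bytes(width, "little")
```

With a fixed 3-byte check, a 4-byte reloc placed at `len(section) - 3` would pass validation and then write one byte past the section, which is why the check has to follow the reloc type.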
### 1.4 Backend prerequisites bundle [BLOCKS: sanitizers, lto, unwinder]
Three small backend changes that unblock multiple items:
- (a) `setOperationAction(ISD::RETURNADDR, MVT::i32, Expand)` in
`W65816ISelLowering.cpp`. Today any code calling
`__builtin_return_address(0)` ICEs clang (since pointers are i32 but
RETURNADDR is registered for i16 Expand only). Required by UBSan's
caller-pc dedup AND by user code. ~30 min.
- (b) `setOperationAction(ISD::TRAP, MVT::Other, Custom)` + lower to
`BRK_pseudo` that writes a sentinel to `$70` before halting. Required by
`-fsanitize-trap=undefined`. ~2h.
- (c) Minimal `W65816TTI` (TargetTransformInfo) returning 4× generic cost for
i32 ops and 20× for soft-float libcalls. Required by LTO inliner so it
doesn't over-inline based on generic cost defaults. ~6h.
**Effort:** ~9h total. Land as three separate commits.
### 1.5 DBG_VALUE preservation audit across custom MIR passes [BLOCKS: -O2 localvars]
Custom MIR passes (`W65816StackRelToImg`, `W65816StackSlotMerge`,
`W65816SepRepCleanup`, `W65816LowerWide32`, `W65816ImgCalleeSave`,
`W65816SpillToX`, `W65816TiedDefSpill`) only use `getDebugLoc()` for source-line
info. None call `MachineInstr::transferDbgValues()` when slots move/coalesce
or when stack slots get promoted to IMG slots.
For each pass: grep for slot/register replacement. Each call site that
substitutes one operand for another must propagate DBG_VALUE.
**Effort:** 8-15h. Without this, -O2 locals are vapor regardless of how good
the DWARF parser is.
### 1.6 `IigsSoundParmT` correction [BLOCKS: docram]
Phase 0.9 decided this is a breaking change. Steps:
1. Replace 6-byte struct in `runtime/include/iigs/sound.h` with ORCA's 18-byte
layout (Pointer waveStart (4B) / Word waveSize pages (2B) / Word freqOffset
(2B) / Word docBuffer (2B) / Word bufferSize (2B) / Pointer nextWavePtr
(4B) / Word volSetting (2B)).
2. Rewrite `iigsPlayDocSample` to populate the corrected struct. Move channel
out of the struct into `FFStartSound`'s arg0.
3. Audit existing callsite at `smokeTest.sh:1147` and migrate.
4. Update `README.md:144-147` and `STATUS.md` claim that DOC-RAM staging is not
wrapped — those lines are about to be wrong.
5. Verify under real GS/OS or a known-good MAME version (silence vs. audio is
the validation gate).
**Effort:** 4-6h.
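
The field widths in step 1 can be sanity-checked against the 18-byte total before touching `sound.h`. This is a sketch only: the field names and order are taken from the list above, while the little-endian unpadded packing and the helper name are assumptions.

```python
import struct

# Little-endian, no padding: Pointer = 4 bytes ("I"), Word = 2 bytes ("H").
SOUND_PARM_FORMAT = "<IHHHHIH"
SOUND_PARM_FIELDS = ("waveStart", "waveSize", "freqOffset", "docBuffer",
                     "bufferSize", "nextWavePtr", "volSetting")


def pack_sound_parm(**fields: int) -> bytes:
    """Pack the seven fields in declaration order (hypothetical helper)."""
    return struct.pack(SOUND_PARM_FORMAT,
                       *(fields[name] for name in SOUND_PARM_FIELDS))


# 4 + 2 + 2 + 2 + 2 + 4 + 2 = 18, matching the ORCA size quoted above.
assert struct.calcsize(SOUND_PARM_FORMAT) == 18
```

A C-side `_Static_assert(sizeof(IigsSoundParmT) == 18, ...)` would pin the same invariant in the header.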
### 1.7 Build/harness prerequisites bundle
- (a) `runInMame.sh --check-u8 <addr>=<val>` for byte-level SHR pixel checks.
Required by sprites. ~1h.
- (b) `runViaFinder.sh --data /PATH=file` injection. Required by any GS/OS
demo with file I/O (`tmpfile`, `posixfile` GS/OS path, eventually
`cxxstdlib::filesystem`). ~1h.
- (c) buildGno-launched MAME smoke harness. Currently smoke runs C++ via
inline cpp HEREDOCs at build-time only; the `cxxsmoke`, `cxxstdlib`, and
`cursor` smoke checks need actual MAME-launched OMF execution. Mirror
`tests/lua/` pattern. ~4h.
- (d) Fix `softDouble.o.bak` in `runtime/` (15KB stale dated May 1). Required
before `buildsystem` can do `file(GLOB)` over `runtime/*.o`. Either delete
the .bak or generate the imports manifest from `runtime/build.sh`. ~30 min.
- (e) Generate `W65816RuntimeImports.cmake` from `runtime/build.sh` (or have
build.sh emit a manifest). Single source of truth for the runtime .o list.
~2h.
**Effort:** ~9h.
### 1.8 `srand` seeding + `ReadTimeHex` toolbox call [BLOCKS: tmpfile uniqueness, posixfile mkstemp]
`extras.c:124` seeds `rand()` to constant 1. `mkstemp`'s claimed uniqueness
guarantee is a lie without time-based seeding.
- Expose `ReadTimeHex` ($0D03) in `iigsToolbox.s` (currently absent).
- Add `srand` hook in `crt0Gsos.s` + `crt0Gno.s` that reads time and seeds.
**Effort:** ~2-3h.
### 1.9 `<cmath>` C++ shim [BLOCKS: cxxstdlib (format / chrono with FP)]
clang++ on llvm-mos has no system C++ stdlib; `#include <cmath>` fails. ETL's
`format.h` `#include`s `<cmath>` when `ETL_USING_FORMAT_FLOATING_POINT=1`.
- Create `runtime/include/c++/cmath` that pulls `<math.h>` (already
extern-C-wrapped) and exports `std::` aliases for the libc functions.
- Optionally add `<cstdlib>`, `<cstddef>` shims following the same pattern.
- Decide `ETL_USING_FORMAT_FLOATING_POINT` default policy in `etl_profile.h`:
recommend OFF by default with `--layer2` opt-in for FP format builds.
**Effort:** ~3h.
### 1.10 `PATH_MAX` and friends in `limits.h` [BLOCKS: posixfile]
`PATH_MAX` is not defined anywhere. Add to `runtime/include/limits.h` with a
comment tying it to `GSString.length` being u16 and the practical
NUL-terminated-path-fits-in-256-bytes rule.
**Effort:** ~30 min.
### 1.11 Weak-extern survival policy for LTO [BLOCKS: lto] [DONE]
`libc.c` declares dozens of `__attribute__((weak)) extern` GS/OS calls
(`gsosOpen/Read/Write/...`). Under LTO, the inliner may decide a weak-extern
is undefined and propagate that as constant 0 / NULL through callers, then DCE
the surrounding code.
- Marked all weak-extern decls in `libc.c` with `__attribute__((weak, retain, used))`:
the GS/OS dispatchers (`gsosOpen/Read/Write/Close/GetEOF/SetEOF/SetMark/GetMark/Create`),
`__gsosIsRealImpl`, `__putByte`, `__getByte`, `__putByteErr`, `__heap_start`,
`__heap_end`. `used` keeps the compiler from dropping references; `retain`
survives linker GC; both are no-ops in non-LTO builds.
- `libcxxabi.c::abiRunCxaAtexit` (`__run_cxa_atexit`) annotated with
`__attribute__((retain, used))` — its only callers live in crt0*.s
(`jsl __run_cxa_atexit`), which is invisible to LTO's IR view, so without
the attributes LTO would strip the body and crt0 would JSL into the weak
no-op fallback in libgcc.s and C++ global dtors would never run.
- Definitions in `libcGno.c` left unannotated: link-pull-in from libc.c's
weak-externs already keeps them alive; the LTO hazard is on the
declaration side, not the definition side (the linker pulls libcGno.o
in to resolve the libc.c weak-externs regardless of LTO).
- 145 smoke checks pass.
### 1.12 LTO × Layer 2 silent-miscompile gate [BLOCKS: lto]
`-mllvm -w65816-dbr-safe-ptrs` is per-TU. Mixing in an LTO set produces silent
wrong code.
Build the gate FIRST, before any LTO codegen work:
- Embed Layer 2 flag in IR as a module-level attribute on every TU.
- In the LTO driver pre-pass, hard-fail if attributes disagree.
**Effort:** ~3h.
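
The gate logic itself is small. A minimal sketch, assuming the driver pre-pass has already extracted the per-TU Layer 2 attribute into a mapping (the function name and input shape are hypothetical):

```python
from typing import Dict


def check_layer2_agreement(modules: Dict[str, bool]) -> bool:
    """modules maps TU name -> built with -w65816-dbr-safe-ptrs?

    Returns the agreed setting, or raises if the LTO set is mixed.
    """
    if len(set(modules.values())) > 1:
        on = sorted(name for name, flag in modules.items() if flag)
        off = sorted(name for name, flag in modules.items() if not flag)
        raise ValueError(
            f"LTO set mixes Layer 2 settings: on={on} off={off}")
    # An empty set has nothing to disagree about; treat it as Layer 2 off.
    return next(iter(modules.values()), False)
```

Listing both groups in the error keeps the failure actionable: the user sees which TUs to rebuild rather than just that a mismatch exists.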
### 1.13 ELF EM_ assignment [BLOCKS: clangdwarffix, llvm-dwarfdump tooling]
`llvm-dwarfdump` warns persistently because `EM_NONE` is set on output.
Assign a real (vendor-private if needed) `EM_` value.
**Effort:** ~2h.
**Phase 1 total: ~60-90h.**

---
## Phase 2 - M1 quick wins (parallel, no DWARF dependency)
These items have no cross-dependencies and can run concurrently once Phase 1
lands the build-harness prerequisites.
### 2.1 `clangdwarffix` (continued from Phase 1.3)
Phase 1.3 covered the reloc plumbing. Remaining work:
- Update smoke checks at `smokeTest.sh:5347` (encodes 3-byte width — the new
4-byte LE address starts with the same 3 LE bytes, so green, but fragile).
- Add `pc2line.py` cleanup to drop the zero-length fallback.
- Update docs (`USAGE.md`, `STATUS.md`) to drop the "llvm-dwarfdump warns"
caveat — depends on Phase 1.13.
**Effort:** ~3h after Phase 1.3 + 1.13.
### 2.2 `hexfloat` (`%a` / `%A` printf)
- Decide subnormal canonical form (recommend `0x0.{mantissa}p-1022`).
- Decide trailing-zero stripping policy (recommend glibc-style: strip when
precision unspecified).
- Implement `emitHexFloat` in `runtime/src/snprintf.c` with local
width/leftAlign/zeroPad arithmetic (do NOT reuse `emitNumber`'s monolithic
numeric body — only use it for the exponent).
- Use 4 u16 words instead of u64 shifts to dodge i64-codegen surprises (>>52
and 12-bit mask paths).
- Bring `%f`/`%g`/`%e` to Inf/NaN parity OR document the asymmetry (don't half-do
it).
- Add a *new* smoke probe block (don't extend the existing 0x7f bitmap — used
by two checks at `smokeTest.sh:2407` and `:2581`).
- Update `STATUS.md:48-52` (printf conversion table) and snprintf.c banner at
lines 21-23.
**Effort:** 6-8h.
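
A reference model is useful for generating expected strings for the smoke probe. The sketch below is such a model in Python, not the C implementation: it assumes the recommended policies above (subnormals as `0x0.{mantissa}p-1022`, trailing zeros stripped when precision is unspecified) and ignores width, precision, and padding. It works on four 16-bit words, mirroring the "4 u16 words" approach.

```python
import struct

HEX_DIGITS = "0123456789abcdef"


def hexfloat(value: float, upper: bool = False) -> str:
    """Format a double like %a (or %A) with no precision given."""
    # Four little-endian u16 words; w3 holds sign, exponent, top mantissa bits.
    w0, w1, w2, w3 = struct.unpack("<4H", struct.pack("<d", value))
    sign = "-" if w3 & 0x8000 else ""
    biased = (w3 >> 4) & 0x7FF
    nibbles = [w3 & 0xF]                       # 13 nibbles = 52 mantissa bits
    for word in (w2, w1, w0):
        nibbles += [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]
    if biased == 0x7FF:                        # Inf / NaN
        text = sign + ("nan" if any(nibbles) else "inf")
        return text.upper() if upper else text
    if biased == 0:                            # zero or subnormal
        lead, exp = "0", (-1022 if any(nibbles) else 0)
    else:                                      # normal: implicit leading 1
        lead, exp = "1", biased - 1023
    while nibbles and nibbles[-1] == 0:        # strip trailing zero digits
        nibbles.pop()
    frac = "".join(HEX_DIGITS[n] for n in nibbles)
    text = (sign + "0x" + lead + ("." + frac if frac else "")
            + "p" + ("+" if exp >= 0 else "-") + str(abs(exp)))
    return text.upper() if upper else text
```

Because every shift stays within a 16-bit word, the same structure ports to C without touching the i64 `>>52` and 12-bit-mask paths the step list warns about.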
### 2.3 `tmpfile` / `tmpnam` / `rename`
Following Phase 0.10's decision (copy+delete fallback for cross-dir rename):
- Per-FILE owned name buffer (extend FILE struct or use parallel
`tmpNames[MFS_MAX_FILES][L_tmpnam]` table). Update `__mfs[]` initializer.
- Add `gsosDestroy` ($2002 pCount=1) and `gsosChangePath` ($2004) wrappers in
`iigsGsos.s` + `iigsGsosStub.s` (real stub semantics from Phase 1.2).
- Promote `remove()` from mfs-only to mfs-then-GS/OS-Destroy.
- Promote `tmpfile()` from stub: generate unique name via `tmpnam`, open
O_CREAT|O_EXCL, set the auto-delete-on-close flag in the FILE.
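- Promote `tmpnam()` from stub: read time via Phase 1.8 srand seed, format
`/RAM5/T{16-hex-chars}.TMP` or similar (ProDOS volumes cap file names at 15
characters, so the hex field must shrink to fit on a ProDOS-formatted `/RAM5`).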
- Promote `rename()` from stub:
- **Fast path:** if new-path is in the same directory, route to ChangePath.
- **Cross-dir copy+delete fallback:** Open source RDONLY, Create destination,
chunked Read/Write loop (8KB buffer), Close both, Destroy source. Error
recovery: if Write fails mid-loop, Destroy destination + return -1. If
final Destroy of source fails, leave dest in place + return -1 with errno
set + emit a debug log line (destructive partial-state, but the data is
preserved). Source-vanished-mid-op is rare under GS/OS (no concurrent
process); leave as best-effort.
- Use GSString256 stack scratch (already present at `__gsosPathBuf` in
libc.c).
- Update mfs-path detection to auto-detect `/` vs `:` separator.
- Smoke tests:
- create + write + close + remove + verify destroyed.
- rename within same dir (ChangePath path).
- rename across dirs (copy+delete fallback) — write 10KB file, verify
contents byte-identical post-rename, verify source gone.
**Effort:** 16-18h (was 10-12h; +6h for copy+delete fallback per Phase 0.10).
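
The control flow of the cross-dir fallback, including the two recovery rules, can be prototyped on a host filesystem first. This sketch uses Python's `os` calls as stand-ins for the GS/OS class-1 calls (`os.rename` for ChangePath, `os.remove` for Destroy) and raises `OSError` where the C version would return -1 with errno set; the function name is hypothetical.

```python
import os

CHUNK = 8 * 1024  # 8KB copy buffer, as in the plan


def rename_with_fallback(old: str, new: str) -> None:
    """Rename old to new, copying and deleting when directories differ."""
    old_dir = os.path.dirname(os.path.abspath(old))
    new_dir = os.path.dirname(os.path.abspath(new))
    if old_dir == new_dir:
        os.rename(old, new)          # fast path: stands in for ChangePath
        return
    with open(old, "rb") as src:
        dst = open(new, "xb")        # Create: fails if the destination exists
        try:
            with dst:
                while chunk := src.read(CHUNK):
                    dst.write(chunk)
        except OSError:
            os.remove(new)           # copy failed mid-loop: destroy partial dest
            raise
    # If this final delete fails, the destination stays in place and the
    # error propagates: the data is preserved, but both paths now exist.
    os.remove(old)
```

The destination is removed only after this call created it, so a pre-existing file at `new` is never destroyed by a failed create.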
### 2.4 `docram` (DOC-RAM sample upload)
Phase 1.6 already corrected `IigsSoundParmT`. Remaining work:
- Add `iigsLoadDocSample(const int8_t *wave, uint16_t size, uint16_t docOffset)`
wrapper around `WriteRamBlock` toolbox call.
- Update `iigsPlayDocSample` callers to consume the corrected struct (Phase 0.9
chose the in-place fix, so no separate `iigsPlaySoundV2` entry point is added).
- Add `demos/helloSample.c` standalone demo.
- Wire `runtime/src/sound.c` into `demos/build.sh` (currently missing).
- Add standalone MMStartUp+SoundStartUp helper to `iigs/sound.h` (since
`startdesk()` is too heavy for a CLI-style sample probe).
- Smoke test: WriteRamBlock returns cleanly + a marker store fires.
**Effort:** 6-8h after Phase 1.6.
### 2.5 `cursor` helpers
- Add `IigsCursorT` typedef to `runtime/include/iigs/toolbox.h`.
- Add `runtime/src/cursor.c` with `iigsCursorPushArrow`, `iigsCursorPushBusy`,
`iigsCursorPop`, `iigsCursorRegister(region, cursor)` (via TaskMaster
wmTaskMask cursor auto-track, NOT a custom idle hook).
- Save-stack stores a COPY of the CursorRecord (not the pointer — toolset
memory can move).
- Hard-error or asserted-no-op before `startdesk()` (InitCursor invariant).
- Decide: drop embedded cursor blobs from scope (just wrappers + Wait/IBeam ROM
shapes via `GetCursorAdr($800c)`) OR hand-code 4 cursor blobs and budget
~3-4h for mask/hotspot debugging.
- Recommend: drop embedded blobs; expose
`SetIigsCursor(const IigsCursorT*)` + `iigsCursorBusy()`/`iigsCursorArrow()`.
- Update `runtime/build.sh` (use `__attribute__((section(...)))` per cursor
blob if embedded; OR use `-fdata-sections` target-wide and re-verify smoke).
- Smoke: $70-marker MAME region-transition probe.
**Effort:** 14-18h.
### 2.6 `buildsystem` (CMake + Make integration)
- Decide on `TYPE` enumeration: `flat` | `flatMultiSeg` | `gsos` | `gno` (four
values, not three — reviewer caught this).
- Build `CMAKE_C_LINK_EXECUTABLE` override that fully bypasses CMake's link-line
generator (link816 takes no `-L`/`-l`/`-Wl`/response files).
- Generate `W65816RuntimeImports.cmake` from Phase 1.7.e (single source of
truth).
- Per-source-file CFLAGS override:
`set_source_files_properties(... PROPERTIES W65816_LAYER2 ON W65816_REGALLOC basic)`.
- Wrap all four runner harnesses (`runInMame.sh`, `runMultiSeg.sh`,
`runViaFinder.sh`, `runInGno.sh`) under `add_w65816_mame_test()`.
- Hand-build the link line in exact order (libcGno.o BEFORE libc.o for weak
override).
- ProDOS filetype/aux: pass `--filetype` to link816, emit `.meta` sidecar,
ctest wrapper reads `.meta` to construct cadius `#XX0000` suffix.
- Guard at CMake configure time: `TYPE=gno` + `SEGMENT_CAP` is an error
(omfEmit rejects this combo at `omfEmit.cpp:723-724`).
- C++ auto-link of `libcxxabi.o` + `libcxxabiSjlj.o` AFTER `libc.o`: read
SOURCES extensions, branch in CMake function body (genex can't reorder).
- Make template: scope explicitly to single-binary single-mode flat hello-world
ONLY. Document the gap.
- Smoke integration under `ulimit -t 90` (CPU seconds): cold-cache CMake configure can take
30+s; ensure graceful skip when `command -v cmake` fails.
- Optional: GENERATE_DEBUG keyword + ctest hookup for `pc2line.py` (depends on
Phase 1.3).
**Effort:** 55h. HIGH risk on link-line override.
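
The two ordering constraints above (libcGno.o before libc.o, the C++ ABI objects after libc.o) are easy to get wrong in a hand-built link line. A minimal sketch of the ordering rule, written in Python for illustration; the real logic would live in the CMake function body, the function names are hypothetical, and crt0 plus the other runtime objects from the imports manifest are omitted.

```python
from typing import List

CXX_EXTENSIONS = (".cpp", ".cc", ".cxx")


def has_cxx_sources(sources: List[str]) -> bool:
    """True if any source file looks like C++ (by extension)."""
    return any(src.endswith(CXX_EXTENSIONS) for src in sources)


def order_link_objects(user_objs: List[str], runtime_dir: str,
                       gno: bool, cxx: bool) -> List[str]:
    """Return objects in link order: user, [libcGno], libc, [C++ ABI]."""
    objs = list(user_objs)
    if gno:
        # Must come BEFORE libc.o so it overrides libc's weak definitions.
        objs.append(f"{runtime_dir}/libcGno.o")
    objs.append(f"{runtime_dir}/libc.o")
    if cxx:
        # C++ ABI objects are auto-linked AFTER libc.o.
        objs += [f"{runtime_dir}/libcxxabi.o",
                 f"{runtime_dir}/libcxxabiSjlj.o"]
    return objs
```

Keeping the rule in one function gives the CMake port a single place to assert against in a configure-time test.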
### 2.7 `cxxsmoke` (modern C++ smoke coverage)
- Pre-spike: run each candidate snippet as a one-off demo through buildGno.sh
+ runInGno.sh BEFORE writing smoke checks. 30-min sanity gate.
- Decide demo placement: create `tests/cxxSmoke/` mirroring `tests/coremark/` /
`tests/lua/` pattern, NOT in `demos/` (where `buildGno.sh` auto-discovery
would build them as GNO commands).
- Add `-include etl_profile.h` to smoke compile line OR replace `etl::tuple`
structured-binding check with a user struct that has tuple_size /
tuple_element specializations defined in the heredoc.
- Five checks: range-for, generic lambda + capture-by-reference of i32 local
(the i32 path is where most recent fixes have lived — most likely to
regress), variadic templates, structured bindings, fold expressions.
- Each check: a buildGno-style probe with $70 marker on success.
- Smoke harness from Phase 1.7.c launches each under MAME and verifies marker.
- If any check fails: stop work, XFAIL the test with TODO note, book a
separate codegen-fix PR.
**Effort:** 10-12h (clean run). Best case 4h, worst case multi-day if a
codegen bug surfaces.
**Phase 2 total: ~110-130h.**

---
## Phase 3 - M2 source-level debugging end-to-end
### 3.1 `debugger` (interactive GDB-style front-end)
Reviewer's critical findings:
- `cpu.debug:bpset(addr)` 1-arg form CRASHES MAME. Use
`bpset(pc, '', 'logerror "BP-HIT PC=%X A=%X X=%X Y=%X S=%X DBR=%X\n",pc,a,x,y,s,db; go')`.
- `SESSION_RECOVERY.md:362-385` already documents the working
`-debug -debugger qt -oslog` workflow. Reuse, do not reinvent.
- Reentrancy SEGFAULT: `add_machine_pause_notifier` + `cpu.debug:go()` from a
callback. Design must NOT call `go()` from Lua resume command callbacks.
- MAME under `-debug` starts with `execution_state = 'stop'`. Harness must
explicitly call `dbg.execution_state = 'run'`.
- Drop `bt` from initial scope OR downgrade to best-effort single-frame parent
only. Real multi-frame `bt` requires either DW_AT_frame_base in .debug_info
or a per-function frame-size sidecar from link816 (new work item, not
budgeted).
- Add `finish`/return command (run-until-current-frame-RTL/RTS) — easier than
step-over JSL and the natural escape from accidental step-into.
**Steps:**
1. Add `demos/build.sh --debug` mode (adds `-g` to clang, `--debug-out`/`--map`
to link816, `_dbg` output naming).
2. Add `demos/buildGno.sh --debug` mode equivalent.
3. Build Python front-end consuming `-oslog` stream (one-way pipe). Use
`machine.debugger.command(string)` to inject debugger console commands at
runtime for set-bp / step / continue.
4. Pre-spike: confirm `bpset(pc, '', '')` form, verify bank-aware bp matching
(24-bit PB:PC vs 16-bit PC), confirm execution_state behavior after pre-run
bpset. 2h spike.
5. Implement commands: `b FUNC | FILE:LINE`, `c`, `s` (step-instr), `n`
(step-over: temp-bp at jsl_pc+4), `finish`, `p &GLOBAL` (map lookup only —
`p VAR` deferred to `localvars`).
6. Update `SESSION_RECOVERY.md` (not a new doc — keep one source of truth) to
reference the new workflow.
7. Add `--trace` mode that sets bp at `main`, captures one BP-HIT via -oslog,
asserts pc2line.py resolves it. Default-on smoke, no `DEBUGGER_E2E=1` gate.
8. Gate interactive `(dbg)` prompt portion behind `DEBUGGER_E2E=1` only.
**Effort:** 24-30h.
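
The one-way `-oslog` pipe in step 3 reduces to scanning lines for the `BP-HIT` record emitted by the `bpset` action shown above. A sketch of that parser, assuming the record appears verbatim somewhere in each log line (any prefix MAME adds is unknown here, so the pattern searches rather than anchors); the function name is hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator

# Mirrors the logerror format string: PC, A, X, Y, S, DBR as hex.
BP_HIT = re.compile(
    r"BP-HIT PC=([0-9A-Fa-f]+) A=([0-9A-Fa-f]+) X=([0-9A-Fa-f]+) "
    r"Y=([0-9A-Fa-f]+) S=([0-9A-Fa-f]+) DBR=([0-9A-Fa-f]+)")
FIELDS = ("pc", "a", "x", "y", "s", "dbr")


def parse_bp_hits(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Yield one register dict per BP-HIT record found in the log lines."""
    for line in lines:
        match = BP_HIT.search(line)
        if match:
            yield dict(zip(FIELDS, (int(g, 16) for g in match.groups())))
```

The `--trace` smoke in step 7 could feed the first yielded `pc` straight to `pc2line.py`; whether `pc` carries the bank byte is one of the questions the step 4 spike is meant to settle.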
### 3.2 `localvars` (-O0 + -O2/IMG + location-lists + inlined subroutines, per Phase 0.4)
Depends on Phase 1.3 (DWARF reloc fix) + Phase 1.5 (DBG_VALUE preservation).
Per Phase 0.4 decision: full surface in one landing.
**Steps:**
1. Verify llvm-dwarfdump can parse a `-g` `.o` after Phase 1.3. Hard
precondition.
2. Validate +1 stack skew convention with deliberate probe (int x=0xABCD; int
y=0x1234; int z=0x5678; read fbreg offsets from memdump, verify alignment).
Add as smoke check.
3. Extend `pc2line.py` into a full DIE walker for `.debug_info` + `.debug_abbrev`
+ `.debug_addr` + `.debug_str` + `.debug_str_offsets`.
4. Implement a DW_OP evaluator for: DW_OP_fbreg, DW_OP_addr, DW_OP_constN,
DW_OP_reg0..7, DW_OP_breg0..7, DW_OP_call_frame_cfa.
5. Add `--locals 0xPC` mode that reads from a MAME memdump (snapshot or
`-oslog` register dump).
6. Wire `p VAR` in debugger (3.1) to call `pc2line.py --locals`.
7. **-O2 / IMG-resident locals:** rewrite DW_OP_regN references that name IMG
slots (IMG0..IMG15) into `DW_OP_breg<DP_base>+offset` form. LLVM emits the
fictitious-register form; pc2line maps it to actual DP $C0..$DE locations.
8. **Location lists:** parse `.debug_loclists` (DWARF 5) for PC-range-keyed
location expressions. Resolve to the correct entry for the queried PC.
9. **Inlined subroutines:** DW_TAG_inlined_subroutine descent.
Multiple-DIE-per-PC handling. Show inlined frame stack at the queried PC.
10. Smoke checks (covering -O0 AND -O2 paths):
- `add(3, 4)` -O0: locals print `a=3 b=4 c=7`.
- `popcount(0xF0F0)` -O2 with IMG-resident vars: locals resolve correctly.
- Multi-CU program (Lua-scale): locals from any CU resolve.
- Inlined-helper case: stack shows the inlined frame.
11. Expect 3-5 additional clang DWARF bugs to surface as -O2 / IMG / loclists
work probes `.debug_info` deeper. Each is its own upstream-or-local-patch
decision; budget contingency in this phase.
**Effort:** 50-75h (combined slice). Risk: HIGH (Phase 0.4 override accepts
this). Mitigation: land Phase 1.5 DBG_VALUE audit FIRST.
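
The core of steps 4 and 7 is small enough to sketch. The opcode values below are the standard DWARF ones; everything else is an assumption made for illustration: single-operation expressions only, a caller-supplied frame base (the +1 stack skew from step 2 is not modelled), and a 2-byte IMG slot stride inferred from the $C0..$DE range quoted above.

```python
from typing import Tuple

# Standard DWARF opcode values.
DW_OP_addr, DW_OP_fbreg = 0x03, 0x91

# Assumption: IMGn occupies two bytes at DP $C0 + 2*n (IMG15 -> $DE).
IMG_DP_BASE, IMG_SLOT_BYTES, IMG_SLOT_COUNT = 0xC0, 2, 16


def read_sleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a signed LEB128 at data[pos]; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:              # sign-extend from the last group
                result -= 1 << shift
            return result, pos


def img_slot_address(slot: int) -> int:
    """Direct-page address of an IMG slot under the stride assumption."""
    if not 0 <= slot < IMG_SLOT_COUNT:
        raise ValueError(f"no such IMG slot: {slot}")
    return IMG_DP_BASE + IMG_SLOT_BYTES * slot


def locate(expr: bytes, frame_base: int, addr_size: int = 4) -> int:
    """Evaluate a single-op location expression to a memory address."""
    op = expr[0]
    if op == DW_OP_fbreg:
        offset, _ = read_sleb128(expr, 1)
        return frame_base + offset
    if op == DW_OP_addr:
        return int.from_bytes(expr[1:1 + addr_size], "little")
    raise NotImplementedError(f"DW_OP {op:#x}")
```

How DWARF register numbers map onto IMG slots depends on the backend's register numbering, so `img_slot_address` takes a slot index rather than a `DW_OP_regN` opcode.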
### 3.3 `posixfile` (POSIX file helpers)
Depends on Phase 0.10 (cross-dir policy), Phase 1.7.b (--data injection),
Phase 1.8 (srand), Phase 1.10 (PATH_MAX).
**Steps:**
1. Add 3 new GS/OS class-1 wrappers to `iigsGsos.s`:
- `Get_Prefix` ($200A) for `realpath`
- `Get_File_Info` ($2006) for `dirname`/`basename` semantics
- `Get_Dir_Entry` ($201C) for `glob`/directory iteration
2. Add corresponding parm-block typedefs to `runtime/include/iigs/gsos.h`.
3. Add stub-mode counterparts to `iigsGsosStub.s` (using Phase 1.2 sentinel).
4. Pre-spike: write `demos/gsosProbeDirEntry.c` exercising directory open +
Get_Dir_Entry iteration. Run under `runInGno.sh + GSOS_FILE_SMOKE=1`
BEFORE committing to glob's API. ~2h.
5. Implement `realpath` (uses prefix resolution + Get_File_Info).
6. Implement `dirname` / `basename` with auto-detect `/` vs `:` separator.
7. Implement `fnmatch` with FULL bracket-set support (`[A-Z]*`, `[!a-z]`) —
MANDATORY per reviewer, not optional.
8. Implement `glob` using directory iteration + fnmatch.
9. Implement `mkstemp` using Phase 1.8 srand seed. Template-must-be-writable
invariant (refuse non-writable template, document rodata-write risk in
header).
10. Smoke check each: 6 helpers × ~20 min.
11. Document GNO/POSIX-VFS limitation: realpath/glob route through GS/OS
class-1 on both bare-metal-with-GS/OS and GNO. GNO chdir-via-K* not
honored.
**Effort:** 18-26h.
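
Since step 7 makes bracket sets mandatory, a small reference matcher helps pin down expected results for the smoke checks. This is a Python model of the matching logic, not the C implementation: it handles `*`, `?`, `[set]`, `[!set]`, and ranges, treats an unterminated `[` as a literal, and is case-sensitive (ProDOS case folding would be a separate decision).

```python
def match_bracket(pattern: str, pos: int, ch: str):
    """Match ch against the set opening at pattern[pos] == '['.

    Returns (matched, index after ']'), or None if the set never closes.
    """
    i = pos + 1
    negate = i < len(pattern) and pattern[i] == "!"
    if negate:
        i += 1
    matched, first = False, True
    while i < len(pattern) and (first or pattern[i] != "]"):
        first = False                       # a leading ']' is a literal member
        if i + 2 < len(pattern) and pattern[i + 1] == "-" and pattern[i + 2] != "]":
            matched |= pattern[i] <= ch <= pattern[i + 2]
            i += 3
        else:
            matched |= ch == pattern[i]
            i += 1
    if i >= len(pattern):
        return None
    return matched != negate, i + 1


def fnmatch(pattern: str, name: str) -> bool:
    """Glob-style match with single-star backtracking."""
    p = n = 0
    star_p = star_n = -1
    while n < len(name):
        if p < len(pattern) and pattern[p] == "*":
            star_p, star_n = p, n           # remember where to retry from
            p += 1
            continue
        ok, nxt = False, p + 1
        if p < len(pattern):
            if pattern[p] == "?":
                ok = True
            elif pattern[p] == "[":
                res = match_bracket(pattern, p, name[n])
                ok, nxt = res if res is not None else (name[n] == "[", p + 1)
            else:
                ok = pattern[p] == name[n]
        if ok:
            p, n = nxt, n + 1
        elif star_p >= 0:                   # let the last '*' absorb one more char
            star_n += 1
            p, n = star_p + 1, star_n
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)
```

The iterative backtracking form avoids recursion, which matters on a target with a small stack.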
### 3.4 `resourcemgr` (deferred or stub-only per Phase 1.1 outcome)
If Phase 1.1 resolves GS/OS fopen hang, proceed. Otherwise: stub-only landing
documented as such.
**Steps (full version):**
1. Decide bundler input format: `TYPECODE_ID.bin` per reviewer recommendation
(16-bit type + 16-bit ID encoded in filename like `8005_0001.bin`).
2. Verify AppleSingle round-trip with disposable 1-hour cadius spike before
writing full bundler.
3. Install or build ORCA's `rez` as hard dependency for layout cross-checking.
4. Write `tools/rsrcBundle/rsrcBundle.py`:
- Read TYPECODE_ID.bin files
- Build rResourceMap + rIndex
- Stitch with OMF data fork
- Emit AppleSingle
5. Write `tools/rsrcBundle/dumpFork.py` for diffing against rez output.
6. Implement `resourceProbeInit()` in `runtime/src/resource.c` (MMStartUp +
TLStartUp + ResourceStartUp + OpenResourceFile-on-own-pathname).
7. Build typed-C façade: LoadResource, GetResourceSize, HLock semantics
(handle relocation via Memory Manager).
8. Add ResourceShutDown hook via `__cxa_atexit`.
9. Build `demos/rsrcProbe.c` with marker discipline (write $025000=0x99 +
while(1); runViaFinder LAUNCHES only, no keypress automation).
10. Add `--rsrc <applesingle>` mode to `runViaFinder.sh`.
11. Update `demos/build.sh` to call `rsrcBundle` as post-step when `.rsrc/`
dir present.
12. WriteResource + UpdateResourceFile DEFERRED to a separate item (persistent
write needs disk-extract-and-diff verification).
**Effort:** 40-50h.
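
The "Emit AppleSingle" sub-step of step 4 amounts to a fixed big-endian header plus one 12-byte descriptor per fork. The sketch below shows that layout in C for illustration only (the plan's bundler is a Python tool). `appleSingleBuild` is a hypothetical name, and the sketch assumes only the data-fork and resource-fork entries are needed; a real bundle would probably also carry a ProDOS file-info entry for the file type and aux type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    AS_MAGIC       = 0x00051600,  /* AppleSingle magic number */
    AS_VERSION     = 0x00020000,  /* format version 2 */
    AS_ENTRY_DATA  = 1,           /* entry ID: data fork */
    AS_ENTRY_RSRC  = 2,           /* entry ID: resource fork */
    AS_HEADER_SIZE = 26,          /* magic + version + 16 filler bytes + entry count */
    AS_DESC_SIZE   = 12           /* entry ID + offset + length */
};

static void putBe16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void putBe32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Writes header, two descriptors, data fork, then resource fork.
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t appleSingleBuild(uint8_t *out, size_t outCap,
                        const uint8_t *data, uint32_t dataLen,
                        const uint8_t *rsrc, uint32_t rsrcLen) {
    const uint32_t dataOff = AS_HEADER_SIZE + 2 * AS_DESC_SIZE;
    const uint32_t rsrcOff = dataOff + dataLen;
    const size_t   total   = (size_t)rsrcOff + rsrcLen;
    uint8_t *d;

    if (outCap < total) {
        return 0;
    }
    memset(out, 0, dataOff);              /* also zeroes the 16 filler bytes */
    putBe32(out + 0, AS_MAGIC);
    putBe32(out + 4, AS_VERSION);
    putBe16(out + 24, 2);                 /* number of entries */

    d = out + AS_HEADER_SIZE;
    putBe32(d + 0, AS_ENTRY_DATA);
    putBe32(d + 4, dataOff);
    putBe32(d + 8, dataLen);
    d += AS_DESC_SIZE;
    putBe32(d + 0, AS_ENTRY_RSRC);
    putBe32(d + 4, rsrcOff);
    putBe32(d + 8, rsrcLen);

    if (dataLen != 0) memcpy(out + dataOff, data, dataLen);
    if (rsrcLen != 0) memcpy(out + rsrcOff, rsrc, rsrcLen);
    return total;
}
```

A byte-for-byte comparison of this header against what the cadius spike in step 2 accepts is the cheapest way to confirm the layout before the full bundler is written.
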
**Phase 3 total: ~120-180h.**
---
## Phase 4 - M3 IIgs application authoring kit (parallel with Phase 3)
### 4.1 `menubuilder`
**Steps:**
1. Pre-verify: does DrawMenuBar actually still hang post-InitCursor-landing?
Drop paintMenuBarTitles fallback if not. 30-min check.
2. Side-by-side dump struct offsets of NewWindowParm vs ORCA's window.h.
30-min ABI check.
3. Reconcile WmTaskRec (used in all 5 demos) with IigsEventT (used in
eventLoop.h). Either align field offsets or document why both exist.
4. Build menu mini-format assembler in `runtime/src/uiBuilder.c`:
- Handles `>>` (menu start), `>>@` (Apple menu), `\X` (icon), `\N###`
(numeric ID), `*Xx` (cmd-key), `--` (item prefix), `---` (divider), `D`
(disabled), `V` (visible/check), `*` (separator), `.\r` (terminator).
- Round-trip test against Menu Mgr's parser. 6h.
5. Window builder + control wrappers (cButton/cCheckBox/cEditLine/cScrollBar
using abstract 32-bit proc constants — NOT bank-E1 ROM addresses). 4h.
6. Add cmdId→itemID lookup table to IigsMenuT. Document dispatch contract.
7. Extend IigsEventCallbacksT with `onCmd` (menu-pick dispatcher).
8. Migrate ALL FIVE affected demos (frame.c, orcaFrame.c, minicad.c,
reversi.c, helloWindow.c). 6h.
9. Either include AlertTemplate/ItemTemplate wrapper (`uiBuilderAlert`) in
scope OR carve out a separate `alertbuilder` item. Recommend in-scope.
10. Smoke check: install menu with one item, simulate keystroke via scripted
MAME input, verify onCmd fires by setting $70=0x99. 4h.
11. Re-baseline OMF sizes; verify cRELOC budget headroom.
**Effort:** 25-30h.
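
One possible shape for the step 6 lookup table and the step 7 `onCmd` dispatch is sketched below. All names (`MenuCmdMapT`, `menuItemForCmd`, `menuDispatch`, `recordCmd`) are hypothetical and do not reflect the actual `IigsMenuT` or `IigsEventCallbacksT` layouts. The sketch assumes item ID 0 is never assigned, so 0 can mean "not found".

```c
#include <stddef.h>
#include <stdint.h>

/* One row per menu item: application command ID paired with the item ID. */
typedef struct {
    uint16_t cmdId;
    uint16_t itemId;
} MenuCmdMapT;

typedef void (*OnCmdFn)(uint16_t cmdId, void *ctx);

/* cmdId -> itemID, e.g. to enable or check an item by command.
 * Returns 0 when the command has no menu item (assumes 0 is unused). */
uint16_t menuItemForCmd(const MenuCmdMapT *map, size_t count, uint16_t cmdId) {
    for (size_t i = 0; i < count; i++) {
        if (map[i].cmdId == cmdId) {
            return map[i].itemId;
        }
    }
    return 0;
}

/* Menu pick: translate the picked item ID back to a command and fire onCmd.
 * Returns 1 if the item was known, 0 otherwise (caller falls back). */
int menuDispatch(const MenuCmdMapT *map, size_t count, uint16_t itemId,
                 OnCmdFn onCmd, void *ctx) {
    for (size_t i = 0; i < count; i++) {
        if (map[i].itemId == itemId) {
            if (onCmd != NULL) {
                onCmd(map[i].cmdId, ctx);
            }
            return 1;
        }
    }
    return 0;
}

/* Example callback: stores the command ID in the uint16_t that ctx points at. */
void recordCmd(uint16_t cmdId, void *ctx) {
    *(uint16_t *)ctx = cmdId;
}
```

A linear scan is likely adequate at menu-bar sizes; the dispatch contract to document is what happens when `menuDispatch` returns 0.
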
### 4.2 `sprites` (320 mode, standalone per Phase 0.6)
**Steps:**
1. Standalone init: sprite.c does its own `$C029` NEWVIDEO bit 7 + SCB ($E1:9D00)
+ palette ($E1:9E00). 2h.
2. SHR-safe heap policy:
- Document `$C035` shadow register interaction.
- Sprite save buffers MUST live above $A000 OR in a bank != 0 (since
bank-0 $2000..$9FFF mirrors to $E1:2000..$E1:9FFF).
- Add `iigsSpriteAttachBuffer(void *buf, size_t size)` so caller controls
placement.
- Document this in `iigs/sprite.h` and `STATUS.md`.
3. Software sprite engine:
- 16×16 fixed sprite shape, 4bpp packed.
- Background save/restore.
- Transparent blit (mask).
- Sprite list (Begin/Add/RenderAll/EraseAll).
4. Integration with eventLoop's TaskMaster frame cadence.
5. Demo (`demos/spriteProbe.c`):
- Init SHR.
- Place 8 sprites.
- One frame of update.
- Verify via `runInMame.sh --check-u8` (from Phase 1.7.a) at known SHR
offsets.
6. Cycle benchmarks in `tests/sprites/`: "blit one 16×16 sprite in <2000 cyc",
"erase + redraw 8 sprites in <16000 cyc / 1 frame".
7. 640 mode DEFERRED to follow-up item (Phase 0.6 decision).
8. `pha;plb` DBR-to-$E1 optimization in inner loop: only if blit doesn't call
any libgcc helper while DBR is contaminated. Audit before enabling.
**Effort:** 22-28h.
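
The step 3 engine core (background save, masked blit, restore) can be sketched on a plain byte buffer. This is a host-testable illustration with hypothetical names (`SpriteT`, `spriteDraw`, `spriteErase`). It assumes byte-aligned X positions (even pixel columns), no clipping, and a mask convention where set bits keep the background.

```c
#include <stdint.h>

enum {
    SHR_STRIDE       = 160,  /* bytes per scanline in 320 mode (2 pixels per byte) */
    SPRITE_ROWS      = 16,
    SPRITE_ROW_BYTES = 8     /* 16 pixels at 4bpp */
};

typedef struct {
    const uint8_t *pixels;  /* 16 rows x 8 bytes, 4bpp packed */
    const uint8_t *mask;    /* same shape; set bits keep the background */
    uint8_t       *save;    /* caller-owned 128-byte background save buffer */
} SpriteT;

static uint8_t *spriteOrigin(uint8_t *fb, uint16_t byteX, uint16_t y) {
    return fb + (uint32_t)y * SHR_STRIDE + byteX;
}

/* Save the background under the sprite, then blit through the mask. */
void spriteDraw(uint8_t *fb, uint16_t byteX, uint16_t y, const SpriteT *s) {
    uint8_t *dst = spriteOrigin(fb, byteX, y);
    uint16_t i = 0;
    for (uint16_t row = 0; row < SPRITE_ROWS; row++) {
        for (uint16_t col = 0; col < SPRITE_ROW_BYTES; col++, i++) {
            uint8_t bg = dst[col];
            uint8_t m  = s->mask[i];
            s->save[i] = bg;
            dst[col] = (uint8_t)((bg & m) | (s->pixels[i] & (uint8_t)~m));
        }
        dst += SHR_STRIDE;
    }
}

/* Restore the bytes saved by the most recent spriteDraw at the same position. */
void spriteErase(uint8_t *fb, uint16_t byteX, uint16_t y, const SpriteT *s) {
    uint8_t *dst = spriteOrigin(fb, byteX, y);
    uint16_t i = 0;
    for (uint16_t row = 0; row < SPRITE_ROWS; row++) {
        for (uint16_t col = 0; col < SPRITE_ROW_BYTES; col++, i++) {
            dst[col] = s->save[i];
        }
        dst += SHR_STRIDE;
    }
}
```

Because `save` is caller-owned, this shape composes with the `iigsSpriteAttachBuffer` placement control from step 2. Overlapping sprites need erase in reverse draw order, which the sprite list would have to enforce.
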
**Phase 4 total: ~50-60h.**
---
## Phase 5 - M4 production-grade C++ toolchain
Per Phase 0.1/0.2, this is materially smaller than the original brief.
### 5.1 `unwinder` — `_Unwind_RaiseException`-over-SJLJ stub (Phase 0.1 option A)
Not a real DWARF unwinder. Provides the Itanium surface third-party C++
libraries expect.
- `runtime/src/libunwindStub.c`: `_Unwind_RaiseException`, `_Unwind_Resume`,
`_Unwind_GetIP`, `_Unwind_GetCFA` routed to existing SJLJ jmpbuf.
- Smoke: probe that throws + catches via the stub.
- Document: "third-party libcxx-using code links; throw across
non-instrumented frames terminates."
**Effort:** ~20h.
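
The control flow the stub provides can be sketched with plain `setjmp`/`longjmp`. This is a minimal model with hypothetical names (`SjljFrameT`, `stubRaise`, `demoThrowCatch`), not the project's jmpbuf layout. The real entry points take a `struct _Unwind_Exception *`; the value 5 below assumes the Itanium ABI's `_URC_END_OF_STACK` reason code.

```c
#include <setjmp.h>
#include <stddef.h>

/* One registered catch site; frames form a singly linked stack. */
typedef struct SjljFrame {
    jmp_buf           env;
    struct SjljFrame *prev;
} SjljFrameT;

static SjljFrameT *gTop      = NULL;  /* innermost active catch frame */
static void       *gInFlight = NULL;  /* exception object being propagated */

void stubPushFrame(SjljFrameT *f) {
    f->prev = gTop;
    gTop = f;
}

void stubPopFrame(void) {
    if (gTop != NULL) {
        gTop = gTop->prev;
    }
}

/* Transfers control to the innermost registered frame.
 * Returns 5 (end of stack) only when no frame is registered; the caller
 * is then expected to terminate. */
int stubRaise(void *exc) {
    SjljFrameT *f = gTop;
    if (f == NULL) {
        return 5;
    }
    gTop = f->prev;       /* the handler runs with its own frame popped */
    gInFlight = exc;
    longjmp(f->env, 1);
}

static void thrower(void *exc) {
    stubRaise(exc);
}

/* Registers a frame, calls a function that raises, and returns what was
 * caught (NULL if nothing was raised). */
void *demoThrowCatch(void *exc) {
    SjljFrameT frame;
    stubPushFrame(&frame);
    if (setjmp(frame.env) == 0) {
        thrower(exc);
        stubPopFrame();   /* reached only if thrower did not raise */
        return NULL;
    }
    return gInFlight;
}
```

Frames that never call `stubPushFrame` are skipped entirely, which is the "throw across non-instrumented frames" limitation the bullet above documents.
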
### 5.2 `lto` (ThinLTO per Phase 0.2)
Depends on Phase 1.4.c (TTI), Phase 1.11 (weak-extern survival),
Phase 1.12 (Layer 2 gate).
**Steps:**
1. Pre-spike (30 min): build llvm-link + llvm-dis, ThinLTO 3 small TUs
(extras.c + strtok.c + libcGno.c), `--mtriple=w65816 -inline-threshold=50`,
link with asm objects, run helloBeep. Validates the pipeline.
2. Add `llvm-link`, `llvm-as`, `llvm-dis` to `installLlvmMos.sh` ninja
targets. Extend existence-check at lines 75-78.
3. Build `scripts/ltoLink.sh` that:
- Reads bitcode + native asm objects
- Runs `llvm-link` on bitcode
- Runs `opt -O2 --mtriple=w65816 -inline-threshold=50` (explicitly set;
opt does NOT invoke TargetPassConfig so the TM-init hook for
inline-threshold doesn't fire).
- Runs `llc -filetype=obj`
- Hands resulting .o to link816.
4. Verify GlobalDCE doesn't strip `.init_array` boundary symbols. Mark with
`llvm.used` if needed.
5. Document: per-file `-mllvm -regalloc=basic` for Lua's lvm.c / ldebug.c /
ltablib.c is preserved by ThinLTO's per-TU codegen attachment.
6. CoreMark + Lua LTO smoke: success criterion "produces a working binary at
parity size or better."
7. Document LTO × Layer 2 hard-fail behavior (Phase 1.12).
**Effort:** 30-40h.
**Status (2026-06-02 PARTIAL - NoTTI-Lite mode):**
- `scripts/ltoLink.sh` LANDED. Driver: llvm-link merges bitcode, opt
-passes='w65816-layer2-gate' enforces Phase 1.12 (refuses on
mismatch), opt --mtriple=w65816 -passes='default<O2>'
-inline-threshold=50 runs IR-level optimization with the W65816-
appropriate inline threshold, llc -filetype=obj produces the final
native object. Flags: -o, --keep-temps, --layer2 (caller-asserts),
--inline-threshold N (override), --emit-ll (debug).
- `installLlvmMos.sh` now builds llvm-link / llvm-as / llvm-dis / opt
as part of the toolchain ninja targets and gates the existence check
on all four. Phase 5.2 step 2.
- W65816TTI (`W65816TargetTransformInfo.h` + override in
W65816TargetMachine) WIRED but `kMildCostModelEnabled = false`. The
Phase 1.4.c bsearch hang (smoke #77) RE-SURFACED when qsort.c was
recompiled under TTI-active multipliers (2x i32, 5x float) — meeting
the "if bsearch smoke fails, ship NoTTI-Lite" criterion in the spec.
The TTI plumbing ships present-but-bypassed so flipping
`kMildCostModelEnabled` to true is the only change needed to enable
full Phase 5.2 cost-driven inlining once the underlying i32
termination-compare codegen bug is fixed.
- Layer 2 LTO hard-fail behavior (Phase 1.12) is documented in
W65816Layer2Gate.cpp header comment + ltoLink.sh step 2 comment.
The gate has been end-to-end-verified: mixed Layer 2 + non-Layer 2
bitcode IS rejected at LTO time with a deterministic
`LLVM ERROR: W65816 Layer 2 LTO gate: Layer 2 mode disagreement`.
- Per-TU codegen attachment (`-mllvm -regalloc=basic` for Lua's
lvm.c / ldebug.c / ltablib.c) is preserved by ThinLTO's per-function
attribute mechanism — those flags translate to function-level
attributes that survive bitcode merge. No code change needed.
- Size parity probe: `demos/ltoProbe.c` + `demos/ltoProbeHelper.c`
through ltoLink.sh produces 37781-byte GNO OMF vs 37785 bytes for
non-LTO (parity-or-better met). Runs cleanly under MAME + GNO with
the harness marker hit.
- All 162 smoke checks green after the Phase 5.2 landing + TTI bring-up.
**Deferred to a future phase:**
- Enabling the 2x i32 / 5x float TTI multipliers. Requires fixing the
i32 termination-compare codegen bug that the original Phase 1.4.c
attempt surfaced (smoke #77 bsearch hang). Reproducer:
`kMildCostModelEnabled = true` + rebuild runtime + run smoke.
- CoreMark / Lua LTO smoke probes (the spec's step 6). CoreMark's
bank-budget pressure under aggressive inlining is exactly what TTI
was meant to address; without TTI active, ThinLTO of CoreMark is
expected to bloat past Layer 2's single-bank budget. Re-attempt
after the TTI re-enable lands.
### 5.3 `cxxchrono` (Phase 0.5 split — chrono only)
- Add `etl_get_steady_clock` + `etl_get_high_resolution_clock` +
`etl_get_system_clock` C-side hooks in `runtime/src/libc.c`.
- Verify ETL chrono milliseconds rep is i32 or i64 with `static_assert`. Set
`ETL_CHRONO_*_CLOCK_DURATION` in `etl_profile.h` to force i32 if i64.
- Add prototypes for the three hooks to `runtime/include/time.h`.
- Smoke: chrono::steady_clock::now() returns monotonically increasing
millisecond values.
**Effort:** 3-4h.
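
A C-side hook such as `etl_get_steady_clock` needs a tick-to-millisecond conversion that stays inside an i32 rep. The sketch below assumes a 60 Hz tick source (an assumption, not stated in the plan); `ticksToMs` is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumption: the underlying counter advances at 60 Hz. */
enum { TICKS_PER_SECOND = 60 };

/* ms = ticks * 1000 / 60 = ticks * 50 / 3.
 * Splitting into quotient and remainder keeps every intermediate in 32 bits
 * and makes the result non-decreasing in ticks. */
int32_t ticksToMs(uint32_t ticks) {
    uint32_t whole = ticks / 3u;
    uint32_t rem   = ticks % 3u;
    return (int32_t)(whole * 50u + (rem * 50u) / 3u);
}
```

With an i32 millisecond rep the value wraps after roughly 24.8 days of uptime, which is worth stating next to the monotonicity smoke check.
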
### 5.4 `cxxstream+format+path` (Phase 0.5 split — the rest)
Depends on Phase 1.9 (`<cmath>` shim).
**Steps:**
1. Set `ETL_USING_FORMAT_FLOATING_POINT=0` default in `etl_profile.h`.
FP-format build is a separate `--layer2` target.
2. Define `runtime/include/c++/iigs/path.h` with ProDOS-aware path operations
(64-char component / 8-component / `:` separator limits validated).
3. `etl::string_stream` + `printf("%s", ss.str().c_str())` is the cout
replacement. Drop the `iigs/console.h` cout-shim idea — adds surface area
without value.
4. Add `runtime/include/c++/cstdlib`, `<cstddef>` shims.
5. 1-hour `etl::format` size spike before committing: measure `format_to(buf,
"{}", 42)` vs etlProbe size. If >10KB delta for one int format, document
and downgrade scope.
6. Smoke: cxxStdlibProbe demo through buildGno+MAME via Phase 1.7.c harness.
7. Document `std::iostream`, `std::regex`, `std::filesystem`, `std::format`
(the full versions, not ETL substitutes) as explicit out-of-scope with
reasons (size, locale dependencies, GS/OS fopen).
8. Set explicit per-component size budgets up front (regex link budget,
filesystem code budget). Skip with documentation if exceeded.
**Status (2026-06-02 LANDED):**
- `ETL_USING_FORMAT_FLOATING_POINT=0` default confirmed in
`runtime/include/c++/etl_profile.h` (via the `ETL_FORMAT_NO_FLOATING_POINT`
gate); FP-format is a `-UETL_FORMAT_NO_FLOATING_POINT` opt-in.
- `runtime/include/c++/iigs/path.h` provides `pathNormalize` / `pathJoin` /
`pathSplit` with 64-char component + 8-depth + `:`-or-`/` separator
validation. Header-only, no link footprint when unreferenced.
- `runtime/include/c++/sstream` aliases `etl::string_stream` as
`std::stringstream` so portable code that names `std::stringstream`
resolves to the ETL fixed-capacity surface. Cout-replacement idiom
documented in `iigs/path.h` header preamble and in the `<sstream>`
shim itself: `etl::string_stream ss(buf); ss << ...; printf("%s",
ss.str().c_str());`
- `<cstdlib>` / `<cstddef>` / `<cmath>` shims already exist (Phase 1.9).
- The `chrono::milliseconds` rep is i32 on the W65816 by way of the
`ETL_CHRONO_*_CLOCK_DURATION` overrides; `cxxStreamProbe` carries a
`static_assert(sizeof(etl::chrono::steady_clock::duration::rep) == 4)`
that fails compile if the override regresses.
- `etl::format` size spike (step 5): a 1-line `format_to(buf, "{}", 42)`
added **~82 KB** to the binary over the no-format flavor. Hard
downgrade per the step-5 rule (>10 KB threshold). `etl::format` is
the layer2-opt-in surface, NOT default; gated by
`-DCXX_STREAM_PROBE_WITH_FORMAT=1` in the demo.
- `demos/cxxStreamProbe.cpp` exercises stream<<int + path join/normalize/
split + chrono i32 contract + format sentinel. Bin 19199 bytes
(well under bank-0 budget). Smoke check 9/9 green under GS/OS 6.0.4 +
GNO in MAME.
- Smoke-check entry added to `scripts/smokeTest.sh` after the
`cxxChronoProbe` check (around line 6422).
**Explicit out-of-scope (step 7), documented here for future reviewers:**
- `std::iostream` (full): locale-aware num_put/num_get machinery, ctype
tables, and per-stream sentry construction cost ~15-25 KB even for a
single `cout << int`. Replacement: `etl::string_stream` +
`printf("%s", ss.str().c_str())`. Aliased as `std::stringstream` in
`<sstream>` for code-portability.
- `std::regex`: full NFA + DFA construction is a ~30-40 KB code budget
on the W65816 even with a single-character-class regex. No locale
surface available either. Replacement: caller-supplied scanner or
hand-rolled state machine. Documented out-of-scope.
- `std::filesystem`: directory-iterator + canonical-path resolution
+ permission-bit handling rely on POSIX surface the GS/OS FST does
not provide (no `lstat`, no `realpath`, no permission bits beyond
ProDOS access byte). Replacement: `iigs::path::*` + the existing
libc `opendir`/`readdir`/`stat` surface in `runtime/include/dirent.h`
and `runtime/include/sys/stat.h`. Documented out-of-scope.
- `std::format` (the C++20 surface): the ETL surrogate
(`etl::format_to`) measured at +82 KB for one int; the C++20 `std::`
surface would be larger again (full charconv float-to-text, locale
hooks). Documented out-of-scope; the layer2-opt-in `etl::format`
is the replacement.
**Effort:** 12-15h.
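
The limit validation described for `iigs/path.h` (64-char component, 8-depth, `:` or `/` separator) reduces to a single pass over the string. The real header is C++; the sketch below is plain C for illustration. `pathWithinLimits` is a hypothetical name, and it assumes empty components (leading or doubled separators) are ignored rather than rejected.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
    PATH_MAX_COMPONENT = 64,  /* characters per component */
    PATH_MAX_DEPTH     = 8    /* components per path */
};

static bool isSep(char c) {
    return c == ':' || c == '/';
}

/* Returns true if every component is at most 64 characters and the path
 * has at most 8 non-empty components. */
bool pathWithinLimits(const char *path) {
    size_t depth = 0;
    size_t len   = 0;
    for (const char *p = path; ; p++) {
        if (*p == '\0' || isSep(*p)) {
            if (len > 0) {
                depth++;
                if (depth > PATH_MAX_DEPTH) {
                    return false;
                }
            }
            len = 0;
            if (*p == '\0') {
                break;
            }
        } else {
            len++;
            if (len > PATH_MAX_COMPONENT) {
                return false;
            }
        }
    }
    return true;
}
```

Accepting both separators in one predicate mirrors the `:`-or-`/` behavior noted in the status entry; whether mixed separators in a single path should be an error is a policy choice this sketch leaves open.
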
**Phase 5 total: ~65-90h (vs original brief's 120-220h — Phase 0 decisions
collapse the unwinder cost dramatically).**
---
## Phase 6 - M5 observability
### 6.1 `profiler` (function-attribution under MAME)
Depends on Phase 1.3 (DWARF reloc fix) + Phase 3.2 (`pc2line` DIE walker).
**Steps:**
1. Pre-spike (2-3h): minimum-viable PC sampler as one-off script. Validate
`emu.register_periodic` fires with usable density. Run against three
representative shapes: short hot bench (strLen), libcall-dominated bench
(popcount), multi-seg (Lua). If <30 samples or >50% misattribution, pivot
to `-debug` mode + `cpu.debug.bpset`-with-counter (additional 6h).
2. Switch attribution model to "sample count + hits-percent" (NOT emu.time()
weighting — sample sparsity makes cycle% dishonest).
3. Have link816 emit ALL local symbols (not just globalSyms) to a separate
map file, gated by `--map-locals`. Required for meaningful libgcc / libc
attribution. 1-2h link816 edit.
4. CLOCK_HZ as CLI arg (slow-mode default 1023000; `--fast-mode` for GS/OS
demos).
5. Add `--sample` mode to `runInMameCycles.sh` (and `runMultiSeg.sh`). Do NOT
fork into a separate `runInMameProfile.sh` — keep single-sourced.
6. Smoke: assert ≤10% samples in '?' (unattributed) + assert dominant bucket
matches expectation.
7. Defer `--line` mode to a follow-up.
**Effort:** 14-20h.
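
The "sample count + hits-percent" model from step 2 and the step 6 assertion can be sketched as a symbol lookup plus two counters. This is a host-side illustration with hypothetical names (`SymT`, `symFor`, `attributeSamples`, `hitsPercent`). It assumes the symbol table is sorted by ascending address and that the image end address is known from the map file.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    addr;   /* 24-bit start address, table sorted ascending */
    const char *name;
    uint32_t    hits;   /* samples attributed to this symbol */
} SymT;

/* Binary search for the last symbol whose start address is <= pc.
 * Returns NULL when pc lies below the first symbol. */
SymT *symFor(SymT *syms, size_t n, uint32_t pc) {
    size_t lo = 0;
    size_t hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= pc) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (lo == 0) ? NULL : &syms[lo - 1];
}

/* Attributes each sampled PC to a symbol and returns the number of samples
 * that fell outside the image (the '?' bucket). */
uint32_t attributeSamples(SymT *syms, size_t n,
                          const uint32_t *pcs, size_t nPcs,
                          uint32_t imageEnd) {
    uint32_t unknown = 0;
    for (size_t i = 0; i < nPcs; i++) {
        SymT *s = symFor(syms, n, pcs[i]);
        if (s == NULL || pcs[i] >= imageEnd) {
            unknown++;
        } else {
            s->hits++;
        }
    }
    return unknown;
}

/* Integer percentage of total samples; 0 when there are no samples. */
uint32_t hitsPercent(uint32_t hits, uint32_t total) {
    return (total != 0) ? (hits * 100u) / total : 0;
}
```

The step 6 gate is then `unknown * 10 <= total`. Without the `--map-locals` symbols from step 3, static helpers would be charged to whichever global symbol precedes them, which is why that step matters for attribution quality.
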
### 6.2 `sanitizers` (UBSan-minimal + coverage per Phase 0.3)
Depends on Phase 1.4.a (RETURNADDR i32) + Phase 1.4.b (TRAP→BRK).
**Steps:**
1. Document ASan as out-of-scope. STATUS.md + USAGE.md.
2. Driver toolchain decision: Option (a) skip driver-side changes; users pass
`-fsanitize=undefined -fsanitize-minimal-runtime` manually plus link
`runtime/ubsan.o`. RECOMMENDED — 10h effort. Option (b) is +6h.
3. Hand-roll `runtime/src/ubsan.c` based on `ubsan_minimal_handlers.cpp`:
- Macro-substitute `__builtin_return_address` (Phase 1.4.a makes it work
but at unknown cost; use Phase 1.4.b BRK trap PC for caller-pc dedup).
- `caller_pcs` dedup table OR stub it out.
- All 24 HANDLER pairs (recover + abort) + 2 RECOVER-only.
4. Route ubsan messages via `__putByteErr` (stderr, fd 3 in GNO).
5. Compile ubsan.c with `-fno-sanitize=undefined` (recursive ubsan footgun).
Update `runtime/build.sh`.
6. Add `tests/ubsan/` mirroring `tests/coremark/` pattern: build.sh,
ubsanProbe.c, manifest.
7. Probe scope: signed-overflow (add/sub/mul) + shift + divide. Three checks
verified via $025000 sentinels.
8. Document object-size cost honestly: empirically, a 9-line indexed-read
function expands from 12 lines to 682 lines when instrumented, so even 3
intentionally triggering ops may not fit in a single bank.
9. Coverage: `-fprofile-instr-generate -fcoverage-mapping` smoke check that
verifies counters write to expected `.profraw` shape.
**Effort:** 22-28h.
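
The `caller_pcs` dedup from step 3 and the byte-wise output route from step 4 can be sketched as below. This is loosely modeled on the minimal-runtime idea, not a copy of `ubsan_minimal_handlers.cpp`. All names are hypothetical: `ubsanFirstReport`, `ubsanReport`, and the `logByte` sink, which stands in for `__putByteErr` so the sketch runs on a host. The table size of 20 is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAX_CALLER_PCS = 20 };   /* assumption: small fixed dedup table */

static uintptr_t gCallerPcs[MAX_CALLER_PCS];
static size_t    gCallerCount = 0;

/* Returns 1 the first time a pc is seen, 0 for repeats.
 * Once the table is full, unseen pcs are silently dropped. */
int ubsanFirstReport(uintptr_t pc) {
    for (size_t i = 0; i < gCallerCount; i++) {
        if (gCallerPcs[i] == pc) {
            return 0;
        }
    }
    if (gCallerCount == MAX_CALLER_PCS) {
        return 0;
    }
    gCallerPcs[gCallerCount++] = pc;
    return 1;
}

typedef void (*PutByteFn)(char c);

static void putStr(PutByteFn put, const char *s) {
    while (*s != '\0') {
        put(*s++);
    }
}

/* Shared body for a recover-style handler: report once per call site. */
void ubsanReport(PutByteFn put, const char *kind, uintptr_t pc) {
    if (!ubsanFirstReport(pc)) {
        return;
    }
    putStr(put, "ubsan: ");
    putStr(put, kind);
    put('\n');
}

/* Host-side stand-in for the stderr byte sink: appends to a small log. */
static char   gLog[128];
static size_t gLogLen = 0;

void logByte(char c) {
    if (gLogLen + 1 < sizeof gLog) {
        gLog[gLogLen++] = c;
        gLog[gLogLen] = '\0';
    }
}

size_t logLen(void) {
    return gLogLen;
}
```

Each HANDLER pair would wrap `ubsanReport` with its own kind string, with the abort variant trapping afterwards. The pc argument is where the step 3 choice between `__builtin_return_address` and the BRK trap PC plugs in.
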
**Phase 6 total: ~36-48h.**
---
## Critical-path summary
The dependency arrows that gate everything else:
```
Phase 1.3 (DWARF reloc fix)
├─→ 2.1 clangdwarffix completion
│ └─→ 3.1 debugger
│ └─→ 3.2 localvars (full -O0 + -O2/IMG slice per Phase 0.4)
│ └─→ 6.1 profiler
└─→ 1.5 DBG_VALUE audit (must land before 3.2)
Phase 1.1 (GS/OS fopen hang)
├─→ 3.3 posixfile real I/O
├─→ 3.4 resourcemgr (or defer to stub)
└─→ 5.4 cxxstream+format+path::filesystem (or document gap)
Phase 1.4 (backend prereqs)
├─→ 5.2 lto (1.4.c TTI)
└─→ 6.2 sanitizers (1.4.a RETURNADDR, 1.4.b TRAP→BRK)
Phase 1.6 (IigsSoundParmT fix)
└─→ 2.4 docram
Phase 1.11 + 1.12 (LTO weak-extern + Layer 2 gate)
└─→ 5.2 lto
```
---
## Recommended landing order (calendar weeks)
| Week | Phase | Items |
|------|-------|-------|
| 1 | Phase 0 (DONE) + 1.1 spike + 1.3.a spike | GS/OS fopen + MAI flag spikes |
| 2 | Phase 1.1-1.6 | Foundational prerequisites |
| 3 | Phase 1.7-1.13 | Build/harness + LTO gates |
| 4-5 | Phase 2 (parallel) | M1 quick wins: clangdwarffix, hexfloat, tmpfile (+copy/delete fallback), docram, cursor, buildsystem, cxxsmoke |
| 6-7 | Phase 3.1 + Phase 4 (parallel) | debugger; menubuilder + sprites |
| 8-9 | Phase 3.2 | localvars full slice (-O0 + -O2/IMG + loclists + inlined) |
| 10 | Phase 3.3-3.4 | posixfile; resourcemgr (or stub-only landing) |
| 11 | Phase 5 | unwinder-stub + ThinLTO + cxxchrono + cxxstream/format/path |
| 12 | Phase 6 | profiler + sanitizers (UBSan-min + coverage) |
**Total: 12 weeks of focused work for ~750-950h with Phase 0 decisions locked.**
Phase 0.4 override (full localvars in one shot) adds ~10-15h vs the split
approach; Phase 0.10 override (rename copy+delete) adds ~6h. Both are
absorbed in the per-phase budgets above.
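
The week table can be checked mechanically against the critical-path arrows. The sketch below encodes one reading of both: it treats the 2.1, 3.1, 3.2, 6.1 entries as a chain and uses each item's main landing week, ignoring the week-1 spikes. The names (`LandingT`, `EdgeT`, `planViolations`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *item;
    int         firstWeek;
    int         lastWeek;
} LandingT;

typedef struct {
    const char *before;   /* prerequisite */
    const char *after;    /* dependent */
} EdgeT;

/* Landing weeks as read from the table above. */
static const LandingT kLanding[] = {
    {"1.1", 2, 2},   {"1.3", 2, 2},   {"1.4", 2, 2},   {"1.5", 2, 2},
    {"1.6", 2, 2},   {"1.11", 3, 3},  {"1.12", 3, 3},
    {"2.1", 4, 5},   {"2.4", 4, 5},
    {"3.1", 6, 7},   {"3.2", 8, 9},   {"3.3", 10, 10}, {"3.4", 10, 10},
    {"5.2", 11, 11}, {"5.4", 11, 11},
    {"6.1", 12, 12}, {"6.2", 12, 12},
};

/* Dependency arrows as read from the critical-path summary. */
static const EdgeT kEdges[] = {
    {"1.3", "2.1"},  {"2.1", "3.1"},  {"3.1", "3.2"}, {"3.2", "6.1"},
    {"1.5", "3.2"},
    {"1.1", "3.3"},  {"1.1", "3.4"},  {"1.1", "5.4"},
    {"1.4", "5.2"},  {"1.4", "6.2"},
    {"1.6", "2.4"},
    {"1.11", "5.2"}, {"1.12", "5.2"},
};

static const LandingT *findLanding(const char *item) {
    for (size_t i = 0; i < sizeof kLanding / sizeof kLanding[0]; i++) {
        if (strcmp(kLanding[i].item, item) == 0) {
            return &kLanding[i];
        }
    }
    return NULL;
}

/* Counts arrows whose prerequisite does not finish before its dependent
 * starts; an item missing from the table also counts as a violation. */
int planViolations(void) {
    int bad = 0;
    for (size_t i = 0; i < sizeof kEdges / sizeof kEdges[0]; i++) {
        const LandingT *a = findLanding(kEdges[i].before);
        const LandingT *b = findLanding(kEdges[i].after);
        if (a == NULL || b == NULL || a->lastWeek >= b->firstWeek) {
            bad++;
        }
    }
    return bad;
}
```

Under this reading the schedule has no ordering violations; the check becomes useful when a week slips and the table is edited by hand.
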
---
## Risks I'm worried about (final list)
1. **`FK_Data_4` truncation discovery cascade.** The reviewer for `localvars`
found the IMM24 truncation bug while planning DWARF work. The bug is fixed
in Phase 1.3, but it's almost certainly the FIRST of several clang DWARF
bugs for this target. Budget contingency in Phase 3.1-3.2.
2. **`cxxsmoke` surfaces silent codegen regressions.** Every prior C++ probe
this project has run (cxxProbe, etlProbe) has surfaced at least one backend
bug. Phase 2.7 will likely do the same. Budget contingency.
3. **GS/OS fopen hang is unsolvable in budget.** If Phase 1.1 doesn't yield a
fix within 8-16h, multiple downstream items (`resourcemgr`,
`cxxstdlib::filesystem`, `tmpfile` real path, `posixfile` real I/O) ship
stub-only with documented limitations. This is acceptable but worth
confirming up front.
4. **Layer-2-aware LTO miscompile.** Phase 1.12 gate must be built FIRST. If
skipped, the resulting binaries are silently wrong in the most
performance-sensitive code path.
5. **`menubuilder` cRELOC budget pressure.** reversi.omf already at 40.5KB;
adding uiBuilder.c may push some demos past the cRELOC threshold. Re-
baseline post-migration.
6. **`unwinder` scope creep.** Phase 0.1 must be a hard decision. Going from
(A) stub to (B) real DWARF mid-work would derail the schedule.
7. **MEMORY.md truncation.** The index is already past the 200-line load
limit. Before starting any item, grep for
`feedback_*<item-substring>*.md` in the memory dir to surface anything the
loaded portion doesn't show.
8. **`sprites` SHR shadow scribble.** Phase 4.2.2 heap-vs-shadow policy is
load-bearing. Without explicit handling, sprite save buffers will land in
the visible display window and corrupt user pixels.
---
## How to use this document
- Start at Phase 0. Make each decision EXPLICITLY before any Phase 1 work.
- Phase 1 is FOUNDATIONAL. Skip nothing. Items in later phases will fail
silently if any Phase 1 prerequisite is missing.
- For any item touching DWARF: Phase 1.3 MUST be green first.
- For any item that does GS/OS file I/O: Phase 1.1 MUST be investigated.
- Reviewer-adjusted hours are working estimates; brief hours are systematically
low across the board.
- The `Critical-path summary` is the dependency graph — respect it.