# llvm816 gap-closure: comprehensive step plan

This is the full, ordered, dependency-aware plan for closing the 18 feature gaps
identified in the 2026-05-30 audit, rolled together with every prerequisite the
adversarial reviewers found. The earlier "master plan" stopped at milestones +
per-item criticism; this document is the actionable step list.

**Total effort (reviewer-adjusted):** roughly 700-1000 hours across all 18 items
plus the Phase 0 + Phase 1 prerequisites the reviewers added. Original briefs
sum to ~280h; reviewers added ~420h of hidden work most planners missed.

**Default audience:** expert C/C++ developer porting code to the Apple IIgs,
with a secondary path for retrocomputing tinkerers who want source-level
debugging.

The plan is organized in six phases, with hard dependency arrows. Each step is
a concrete, individually-shippable piece of work; nothing is "TBD" at the step
level.

---

## Phase 0 - Architectural decisions (DECIDED 2026-05-31)

| # | Topic | Decision |
|---|-------|----------|
| 0.1 | EH model | **SJLJ + `_Unwind_RaiseException`-over-SJLJ stub** |
| 0.2 | LTO model | **ThinLTO** |
| 0.3 | Sanitizer scope | **UBSan-min + coverage only** (no ASan) |
| 0.4 | `localvars` split | **Full -O0 + -O2/IMG in one shot** (override of recommendation) |
| 0.5 | `cxxstdlib` split | **`cxxchrono` first, then `cxxstream+format+path`** |
| 0.6 | Sprite harness | **Standalone first, desktop-coupled follow-up** |
| 0.7 | Resource fork delivery | **AppleSingle blob** |
| 0.8 | `clangdwarffix` approach | **BPF-style MAI flag spike first, escalate to new relocs only if needed** |
| 0.9 | `IigsSoundParmT` | **Fix as breaking change** (existing demos likely already broken) |
| 0.10 | `rename` cross-dir | **Implement copy+delete fallback now** (override of recommendation) |

**Impact of overrides:**

- **0.4 (full localvars in one landing):** Phase 3.2 and 3.3 collapse into a
  single ~55h item. Risk profile is higher (multi-stage delivery is now
  one-stage), but with Phase 1.5 DBG_VALUE audit landed first the foundation
  is solid. Expect 3-5 additional clang DWARF bugs to surface during -O2 /
  IMG work; budget contingency.
- **0.10 (rename copy+delete fallback):** Adds ~6h to Phase 2.3. Real work is
  the error-recovery path (partial-copy state, partial-delete state, source
  vanished mid-operation). Use the existing GS/OS class-1 calls
  (Open/Read/Write/Create/Close/Destroy) to compose the fallback; no new
  toolbox wrappers required.

### Why these (rationale)

### 0.1 Pick exception-handling model (sanitizers, unwinder, cxxstdlib all depend on this)

Three options:

- **A.** Keep SJLJ as default. Ship `_Unwind_RaiseException`-over-SJLJ stub for
  third-party C++ libraries. ~20h. Loses no functionality. Strongly
  recommended.
- **B.** Build a real DWARF CFI unwinder (libunwind port + backend CFI
  emission + per-MIR-pass CFI annotation + `__jsl_indir` hand-CFI). ~260h
  floor. Real throw across non-instrumented frames.
- **C.** Make EH model a Subtarget feature with two MCAsmInfo subclasses, ship
  both. ~20h extra plumbing on top of (A) or (B).

**Recommendation:** A. Reviewer for `unwinder` called (B) a multi-week
structural change and a foot-in-the-door for a long string of follow-on bugs.

### 0.2 Pick LTO model

- **A.** ThinLTO. Preserves per-TU codegen attachments so Lua's per-file
  `-mllvm -regalloc=basic` keeps working. Summary-based inlining decisions
  are less prone to the over-inlining the project already fought
  (`feedback_lapi_inline_threshold.md`, `feedback_coremark_matrix_test_regression.md`).
- **B.** Full LTO with whole-module merge. Simpler tool, harder integration.
  Per-TU regalloc becomes unrepresentable.

**Recommendation:** A. Reviewer called the original "full LTO is simpler"
choice exactly backwards for this codebase.

### 0.3 Pick sanitizer scope

- **A.** UBSan-minimal + coverage only. Achievable. ~22h.
- **B.** UBSan + ASan + coverage. ASan's 8:1 shadow-memory model does not fit
  the 65816 (full 16MB → 2MB shadow; programs run in 1-2 banks). Reviewer
  called the brief wrong on architectural grounds.

**Recommendation:** A. Document ASan as out-of-scope.

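The shadow-memory objection in 0.3 is plain arithmetic, and a short host-side sketch makes the mismatch concrete. The constant names are illustrative; only the 8:1 ratio, the 16MB address space, and the 64KB bank size come from the text above.

```python
# Back-of-the-envelope ASan shadow sizing for the 65816's 24-bit address space.
ADDRESS_BITS = 24          # flat 16 MiB address space
SHADOW_RATIO = 8           # ASan maps 8 application bytes to 1 shadow byte
BANK_SIZE = 64 * 1024      # one 65816 bank


def shadow_bytes(app_bytes: int) -> int:
    """Shadow memory needed to cover app_bytes at the 8:1 ratio."""
    return app_bytes // SHADOW_RATIO


full_space = 1 << ADDRESS_BITS
full_shadow = shadow_bytes(full_space)

# The shadow region alone spans many more banks than a typical program uses.
print(f"shadow for full space: {full_shadow // 1024} KiB "
      f"({full_shadow // BANK_SIZE} banks)")
print(f"typical program: {BANK_SIZE // 1024}-{2 * BANK_SIZE // 1024} KiB (1-2 banks)")
```

Covering the whole address space costs 32 banks of shadow for a program that itself occupies one or two, which is the architectural mismatch the reviewer flagged.
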
### 0.4 Pick `localvars` split

- **A.** Ship -O0 stack-resident locals only (20-30h). Faster delivery, narrower
  payload.
- **B.** Ship -O0 + -O2/IMG-resident + location-list crossing PCs + inlined
  subroutines (50-80h).

**Recommendation:** A first, then (B) as a follow-up. Reviewer's split point
is the natural project boundary.

### 0.5 Pick `cxxstdlib` split

- **A.** Ship `cxxchrono` only (etl::chrono + libc time hooks). 3-4h.
- **B.** Ship `cxxstream+format+path` (string_stream + format + iigs::path +
  `<cmath>` shim + smoke). 12-15h.
- **C.** Both, but as separate landings.

**Recommendation:** C, with A landing first. Reviewer noted mixing them hides
the format/`<cmath>` rabbit hole.

### 0.6 Pick sprite-engine harness

- **A.** Standalone. sprite.c does its own `$C029` + SCB + palette init. Use
  `runInMame.sh` bare-metal harness.
- **B.** Desktop-coupled. Relies on `paintDesktopBackdrop`, runs through GS/OS
  Finder launch (`runViaFinder.sh`).
- **C.** Both, ship A first.

**Recommendation:** C.

### 0.7 Pick resource-fork delivery shape

- **A.** AppleSingle blob (one file = data fork + resource fork; cadius
  auto-detects). Cleaner.
- **B.** `_ResourceFork.bin` sidecar (cadius `Prodos_Add.c:386` supports it).
  Cleaner separation.

**Recommendation:** A. Verify via a 1-hour disposable spike before writing the
bundler.

### 0.8 Pick `clangdwarffix` approach

- **A.** BPF-style one-liner: `MAI->setDwarfUsesRelocationsAcrossSections(false)`.
  Reviewer cites `BPFTargetMachine.cpp:87` as the precedent. If it works the
  whole 14h plan collapses to ~1h.
- **B.** New `R_W65816_DATA32` + `R_W65816_PCREL32` relocs through the full
  pipeline (MC → ELF writer → link816 → `pc2line.py`).

**Recommendation:** Spike A first (10 minutes). Fall back to B only if A
introduces new failures. Reviewer documented that (B) is a strict superset of
the work covered by (A).

### 0.9 Decide whether to fix `IigsSoundParmT` as a breaking change

The current in-tree struct (6 bytes) does not match ORCA's authoritative
`SoundParamBlock` (18 bytes). `iigsPlayDocSample` is almost certainly silently
broken today. Either:

- **A.** Fix the struct as a breaking change. Existing demos relying on it
  will need migration. Reviewer believes none actually work today, so the
  "breakage" is theoretical.
- **B.** Add new corrected API (`iigsPlaySoundV2`?), leave broken one.

**Recommendation:** A. Item `docram` cannot be delivered honestly otherwise.

### 0.10 Decide rename() / cross-directory policy

GS/OS `ChangePath ($2004)` is rename-in-place-only. POSIX rename(old, new)
across directories is impossible without an explicit copy-then-delete path.

- **A.** Reject cross-dir new-paths up front with EINVAL.
- **B.** Implement copy-then-delete fallback. ~6h, error-recovery is the hard
  part.
- **C.** Accept divergence; document loudly.

**Recommendation:** A initially, with (B) deferred until a real user complains.

---

## Phase 1 - Foundational prerequisites (everything later depends on these)

These are NOT in the original 18 items but were surfaced as preconditions by
multiple reviewers. They unblock the actual feature work.

### 1.1 GS/OS `fopen` hang investigation [BLOCKS: resourcemgr runtime, tmpfile real path, cxxstdlib::filesystem, posixfile real I/O]

`JSL $E100A8` doesn't return under real GS/OS 6.0.2 (per `STATUS.md`).

- Reproduce under MAME with the `-debug -debugger qt -oslog` bpset workflow
  documented in `SESSION_RECOVERY.md`.
- Bisect: ABI mismatch, stack-shape mismatch, missing tool init, DP/SP layout.
- If unsolvable in a 4-8h budget: document the limitation and route all
  GS/OS-dependent items through the stub linker (`iigsGsosStub.s`) with the
  honest-failure sentinel from step 1.2.

**Effort:** 8-16h investigation. If fix is found, +4-8h to land. If
unsolvable, project moves on with stub-only paths for affected items.

### 1.2 Stub-mode sentinel for honest `iigsGsosStub.s` [BLOCKS: tmpfile, posixfile, cursor]

`iigsGsosStub.s` currently makes every GS/OS call succeed silently.
`__gsosAvailable()` returns TRUE in stub mode, so newly-added wrappers
(`gsosDestroy`, `gsosChangePath`, prefix/dir/info) will fall through the
catch-all stub and *appear to succeed while doing nothing*.

Two options to fix:

- Add explicit error stubs for each new wrapper (returns -1 / sets errno).
- Add a `__gsosIsRealImpl()` sentinel that distinguishes "real GS/OS linked"
  from "universal-success stub linked".

**Recommendation:** Sentinel. ~3h. One source of truth.

### 1.3 `FK_Data_4` → `R_W65816_*32` reloc fix [BLOCKS: clangdwarffix → debugger → localvars → profiler]

Independent of the Phase 0.8 choice: once the BPF-style spike either passes or
fails, this is the actual step.

Reviewer surfaced multiple landmines the original plan missed:

- `ELFObjectWriter::recordRelocation` (`MC/ELFObjectWriter.cpp:1329-1349`)
  converts in-section diffs to PC-relative. Need BOTH `R_W65816_DATA32` AND
  `R_W65816_PCREL32`.
- `link816.cpp:1275` has a hardcoded `r.offset + 3 > sec.size` width check
  inside `writeDebugSidecar` that must become reloc-type-driven.
- Reloc-type emission must land BEFORE the MC change starts emitting the new
  types or every intermediate `-g` build dies on unknown reloc.
- A `ninja clean && ninja` is required (TableGen-emitted enum dependencies do
  not play well with incremental builds).

**Steps:**

1. (a) Spike `setDwarfUsesRelocationsAcrossSections(false)` in
   `W65816MCAsmInfo`. Rebuild clang, `xxd .debug_line` of a `-g` hello.c,
   verify non-zero `unit_length` / `header_length`. If green: skip to step 1.3.h.
2. (b) Add `R_W65816_DATA32` + `R_W65816_PCREL32` to
   `W65816FixupKinds.h` / `W65816AsmBackend.cpp`.
3. (c) Extend `W65816ELFObjectWriter::getRelocType` to dispatch FK_Data_4 by
   `IsPCRel`.
4. (d) Add 4-byte reloc handlers to `link816.cpp::applyReloc` (DATA32 = write
   `sectionBase + addend`; PCREL32 = write `target - patchAddr + addend`).
5. (e) Generalize the `r.offset + 3 > sec.size` width check to use a small
   switch on reloc type.
6. (f) Land link816 + AsmBackend in one commit (so intermediate builds don't
   die). Then land the MC switch that starts emitting the new types.
7. (g) Update `pc2line.py` to use the now-correct `unit_length` /
   `header_length`, keeping the tolerant zero-fallback for older artifacts.
8. (h) Audit `emitAbsoluteSymbolDiff` / `emitDwarfUnitLength` /
   `makeEndMinusStartExpr` callers; verify `.debug_frame`, `.debug_loclists`,
   `.eh_frame` also work.
9. (i) Drop `llvm-dwarfdump consumes without warnings` from `shipsAs`;
   `EM_NONE` will still warn. File EM_ assignment as a separate gap item.

**Effort:** 1h (best case, MAI flag works) or 14-20h (full reloc path). Risk
HIGH on rebase pain.

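Steps (d) and (e) can be modeled on the host before touching `link816.cpp`. The sketch below is a Python model only, not the linker's code: the reloc-type tags, the `apply_reloc` signature, and the 3-byte `DATA24` entry are hypothetical stand-ins. The two value formulas and the type-driven width check follow the step text.

```python
# Hypothetical reloc-type tags; the real enum values live in the backend.
R_DATA24, R_DATA32, R_PCREL32 = "DATA24", "DATA32", "PCREL32"

# Width table replacing the hardcoded `offset + 3 > size` check (step e).
RELOC_WIDTH = {R_DATA24: 3, R_DATA32: 4, R_PCREL32: 4}


def apply_reloc(sec: bytearray, kind: str, offset: int,
                base: int, addend: int, patch_addr: int = 0) -> None:
    """Patch one little-endian relocation into a section image in place."""
    width = RELOC_WIDTH[kind]
    if offset + width > len(sec):
        raise ValueError(f"{kind} at {offset:#x} overruns {len(sec)}-byte section")
    if kind == R_PCREL32:
        value = base - patch_addr + addend   # target - patchAddr + addend
    else:
        value = base + addend                # sectionBase + addend
    # Mask so negative PC-relative values wrap to two's complement.
    mask = (1 << (8 * width)) - 1
    sec[offset:offset + width] = (value & mask).to_bytes(width, "little")


section = bytearray(8)
apply_reloc(section, R_DATA32, 0, base=0x012345, addend=0x10)
apply_reloc(section, R_PCREL32, 4, base=0x2000, addend=0, patch_addr=0x2010)
print(section.hex())
```

Keeping the width in one table means the bounds check and the write can never disagree about how many bytes a reloc type occupies.
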
### 1.4 Backend prerequisites bundle [BLOCKS: sanitizers, lto, unwinder]

Three small backend changes that unblock multiple items:

- (a) `setOperationAction(ISD::RETURNADDR, MVT::i32, Expand)` in
  `W65816ISelLowering.cpp`. Today any code calling
  `__builtin_return_address(0)` ICEs clang (since pointers are i32 but
  RETURNADDR is registered for i16 Expand only). Required by UBSan's
  caller-pc dedup AND by user code. ~30 min.
- (b) `setOperationAction(ISD::TRAP, MVT::Other, Custom)` + lower to
  `BRK_pseudo` that writes a sentinel to `$70` before halting. Required by
  `-fsanitize-trap=undefined`. ~2h.
- (c) Minimal `W65816TTI` (TargetTransformInfo) returning 4× generic cost for
  i32 ops and 20× for soft-float libcalls. Required by LTO inliner so it
  doesn't over-inline based on generic cost defaults. ~6h.

**Effort:** ~9h total. Land as three separate commits.

### 1.5 DBG_VALUE preservation audit across custom MIR passes [BLOCKS: -O2 localvars]

Custom MIR passes (`W65816StackRelToImg`, `W65816StackSlotMerge`,
`W65816SepRepCleanup`, `W65816LowerWide32`, `W65816ImgCalleeSave`,
`W65816SpillToX`, `W65816TiedDefSpill`) only use `getDebugLoc()` for source-line
info. None call `MachineInstr::transferDbgValues()` when slots move/coalesce
or when stack slots get promoted to IMG slots.

For each pass: grep for slot/register replacement. Each call site that
substitutes one operand for another must propagate DBG_VALUE.

**Effort:** 8-15h. Without this, -O2 locals are vapor regardless of how good
the DWARF parser is.

### 1.6 `IigsSoundParmT` correction [BLOCKS: docram]

Phase 0.9 decided this is a breaking change. Steps:

1. Replace 6-byte struct in `runtime/include/iigs/sound.h` with ORCA's 18-byte
   layout (Pointer waveStart (4B) / Word waveSize pages (2B) / Word freqOffset
   (2B) / Word docBuffer (2B) / Word bufferSize (2B) / Pointer nextWavePtr
   (4B) / Word volSetting (2B)).
2. Rewrite `iigsPlayDocSample` to populate the corrected struct. Move channel
   out of the struct into `FFStartSound`'s arg0.
3. Audit existing callsite at `smokeTest.sh:1147` and migrate.
4. Update `README.md:144-147` and `STATUS.md` claim that DOC-RAM staging is not
   wrapped; those lines are about to be wrong.
5. Verify under real GS/OS or a known-good MAME version (silence vs. audio is
   the validation gate).

**Effort:** 4-6h.

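The 18-byte figure in step 1 can be checked mechanically from the listed field widths. This is a host-side sketch using Python's `struct` module, assuming little-endian packing with no padding; the format string and the `pack_sound_parm` helper are illustrative and are not part of the runtime.

```python
import struct

# Field order and widths as listed in step 1 (I = 4-byte Pointer, H = 2-byte Word).
# "<" selects little-endian with no alignment padding (an assumption about the ABI).
SOUND_PARM_FORMAT = "<IHHHHIH"
SOUND_PARM_FIELDS = (
    "waveStart", "waveSize", "freqOffset", "docBuffer",
    "bufferSize", "nextWavePtr", "volSetting",
)


def pack_sound_parm(**values: int) -> bytes:
    """Pack a parameter block; fields not supplied default to zero."""
    unknown = set(values) - set(SOUND_PARM_FIELDS)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    ordered = (values.get(name, 0) for name in SOUND_PARM_FIELDS)
    return struct.pack(SOUND_PARM_FORMAT, *ordered)


block = pack_sound_parm(waveStart=0x012000, waveSize=1, volSetting=255)
print(struct.calcsize(SOUND_PARM_FORMAT), block.hex())
```

A byte-exact reference like this gives the step 5 validation something to compare a MAME memory dump against when the result is silence.
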
### 1.7 Build/harness prerequisites bundle

- (a) `runInMame.sh --check-u8 <addr>=<val>` for byte-level SHR pixel checks.
  Required by sprites. ~1h.
- (b) `runViaFinder.sh --data /PATH=file` injection. Required by any GS/OS
  demo with file I/O (`tmpfile`, `posixfile` GS/OS path, eventually
  `cxxstdlib::filesystem`). ~1h.
- (c) buildGno-launched MAME smoke harness. Currently smoke runs C++ via
  inline cpp HEREDOCs at build-time only; the `cxxsmoke`, `cxxstdlib`, and
  `cursor` smoke checks need actual MAME-launched OMF execution. Mirror
  `tests/lua/` pattern. ~4h.
- (d) Fix `softDouble.o.bak` in `runtime/` (15KB, stale, dated May 1). Required
  before `buildsystem` can do `file(GLOB)` over `runtime/*.o`. Either delete
  the .bak or generate the imports manifest from `runtime/build.sh`. ~30 min.
- (e) Generate `W65816RuntimeImports.cmake` from `runtime/build.sh` (or have
  build.sh emit a manifest). Single source of truth for the runtime .o list.
  ~2h.

**Effort:** ~9h.

### 1.8 `srand` seeding + `ReadTimeHex` toolbox call [BLOCKS: tmpfile uniqueness, posixfile mkstemp]

`extras.c:124` seeds `rand()` to constant 1. `mkstemp`'s claimed uniqueness
guarantee is a lie without time-based seeding.

- Expose `ReadTimeHex` ($0D03) in `iigsToolbox.s` (currently absent).
- Add `srand` hook in `crt0Gsos.s` + `crt0Gno.s` that reads time and seeds.

**Effort:** ~2-3h.

### 1.9 `<cmath>` C++ shim [BLOCKS: cxxstdlib (format / chrono with FP)]

clang++ on llvm-mos has no system C++ stdlib; `#include <cmath>` fails. ETL's
`format.h` `#include`s `<cmath>` when `ETL_USING_FORMAT_FLOATING_POINT=1`.

- Create `runtime/include/c++/cmath` that pulls `<math.h>` (already
  extern-C-wrapped) and exports `std::` aliases for the libc functions.
- Optionally add `<cstdlib>`, `<cstddef>` shims following the same pattern.
- Decide `ETL_USING_FORMAT_FLOATING_POINT` default policy in `etl_profile.h`:
  recommend OFF by default with `--layer2` opt-in for FP format builds.

**Effort:** ~3h.

### 1.10 `PATH_MAX` and friends in `limits.h` [BLOCKS: posixfile]

`PATH_MAX` is not defined anywhere. Add to `runtime/include/limits.h` with a
comment tying it to `GSString.length` being u16 and the practical
NUL-terminated-path-fits-in-256-bytes rule.

**Effort:** ~30 min.

### 1.11 Weak-extern survival policy for LTO [BLOCKS: lto] [DONE]

`libc.c` declares dozens of `__attribute__((weak)) extern` GS/OS calls
(`gsosOpen/Read/Write/...`). Under LTO, the inliner may decide a weak-extern
is undefined and propagate that as constant 0 / NULL through callers, then DCE
the surrounding code.

- Marked all weak-extern decls in `libc.c` with `__attribute__((weak, retain, used))`:
  the GS/OS dispatchers (`gsosOpen/Read/Write/Close/GetEOF/SetEOF/SetMark/GetMark/Create`),
  `__gsosIsRealImpl`, `__putByte`, `__getByte`, `__putByteErr`, `__heap_start`,
  `__heap_end`. `used` keeps the compiler from dropping references; `retain`
  survives linker GC; both are no-ops in non-LTO builds.
- `libcxxabi.c::abiRunCxaAtexit` (`__run_cxa_atexit`) annotated with
  `__attribute__((retain, used))`: its only callers live in crt0*.s
  (`jsl __run_cxa_atexit`), which is invisible to LTO's IR view, so without
  the attributes LTO would strip the body, crt0 would JSL into the weak
  no-op fallback in libgcc.s, and C++ global dtors would never run.
- Definitions in `libcGno.c` left unannotated: link-pull-in from libc.c's
  weak-externs already keeps them alive; the LTO hazard is on the
  declaration side, not the definition side (the linker pulls libcGno.o
  in to resolve the libc.c weak-externs regardless of LTO).
- 145 smoke checks pass.

### 1.12 LTO × Layer 2 silent-miscompile gate [BLOCKS: lto]

`-mllvm -w65816-dbr-safe-ptrs` is per-TU. Mixing in an LTO set produces silent
wrong code.

Build the gate FIRST, before any LTO codegen work:

- Embed Layer 2 flag in IR as a module-level attribute on every TU.
- In the LTO driver pre-pass, hard-fail if attributes disagree.

**Effort:** ~3h.

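The gate logic itself is small; the real work is embedding and reading the attribute. As a rough model of the pre-pass decision only, the sketch below takes a mapping from TU name to its recorded Layer 2 flag and fails hard on disagreement. The `Layer2Mismatch` exception, the function name, and the dictionary input are all hypothetical; the actual driver would read the module-level attribute from each IR file.

```python
class Layer2Mismatch(Exception):
    """Raised when TUs in one LTO set disagree on the Layer 2 flag."""


def check_layer2_agreement(modules):
    """Return the shared Layer 2 setting, or raise if the TUs disagree.

    `modules` maps a TU name to the boolean flag recorded in that module.
    """
    enabled = sorted(name for name, flag in modules.items() if flag)
    disabled = sorted(name for name, flag in modules.items() if not flag)
    if enabled and disabled:
        raise Layer2Mismatch(
            "Layer 2 flag differs across LTO inputs: "
            f"on={enabled} off={disabled}"
        )
    return bool(enabled)


print(check_layer2_agreement({"lapi.o": True, "lvm.o": True}))
```

Listing both groups in the error message points the user at the odd TU directly instead of leaving them to diff build flags.
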
### 1.13 ELF EM_ assignment [BLOCKS: clangdwarffix, llvm-dwarfdump tooling]

`llvm-dwarfdump` warns persistently because `EM_NONE` is set on output.
Assign a real (vendor-private if needed) `EM_` value.

**Effort:** ~2h.

**Phase 1 total: ~60-90h.**

---

## Phase 2 - M1 quick wins (parallel, no DWARF dependency)

These items have no cross-dependencies and can run concurrently once Phase 1
lands the build-harness prerequisites.

### 2.1 `clangdwarffix` (continued from Phase 1.3)

Phase 1.3 covered the reloc plumbing. Remaining work:

- Update smoke checks at `smokeTest.sh:5347` (encodes 3-byte width: the new
  4-byte LE address starts with the same 3 LE bytes, so green, but fragile).
- Add `pc2line.py` cleanup to drop the zero-length fallback.
- Update docs (`USAGE.md`, `STATUS.md`) to drop the "llvm-dwarfdump warns"
  caveat (depends on Phase 1.13).

**Effort:** ~3h after Phase 1.3 + 1.13.

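The "green, but fragile" remark about the 3-byte smoke check follows from little-endian layout, as this small illustration shows. The address value is arbitrary and the prefix-comparison helper is a hypothetical stand-in for what the smoke check does, not a copy of it.

```python
def matches_three_byte_prefix(field: bytes, address: int) -> bool:
    """Model of a check that only compares the first three LE bytes."""
    return field[:3] == address.to_bytes(3, "little")


address = 0x012345
good_field = address.to_bytes(4, "little")                  # correct 4-byte field
bad_field = (address | 0xFF000000).to_bytes(4, "little")    # corrupt top byte

# Both pass the 3-byte comparison, so the old check cannot see the fourth byte.
print(matches_three_byte_prefix(good_field, address))
print(matches_three_byte_prefix(bad_field, address))
```

Comparing all four bytes closes that gap once the new reloc width lands.
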
### 2.2 `hexfloat` (`%a` / `%A` printf)

- Decide subnormal canonical form (recommend `0x0.{mantissa}p-1022`).
- Decide trailing-zero stripping policy (recommend glibc-style: strip when
  precision unspecified).
- Implement `emitHexFloat` in `runtime/src/snprintf.c` with local
  width/leftAlign/zeroPad arithmetic (do NOT reuse `emitNumber`'s monolithic
  numeric body; only use it for the exponent).
- Use 4 u16 words instead of u64 shifts to dodge i64-codegen surprises (>>52
  and 12-bit mask paths).
- Bring `%f`/`%g`/`%e` to Inf/NaN parity OR document the asymmetry (don't half-do
  it).
- Add a *new* smoke probe block (don't extend the existing 0x7f bitmap; it is
  used by two checks at `smokeTest.sh:2407` and `:2581`).
- Update `STATUS.md:48-52` (printf conversion table) and snprintf.c banner at
  lines 21-23.

**Effort:** 6-8h.

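The four-u16-words approach can be prototyped on the host to pin down the expected strings before writing `emitHexFloat`. The following is a Python reference model only, assuming the recommended policies above (subnormals as `0x0.{mantissa}p-1022`, trailing zeros stripped when no precision is given). The function name is hypothetical, and width, precision, and padding handling are deliberately left out.

```python
import struct


def hex_float(x: float, upper: bool = False) -> str:
    """Format a double like %a / %A using only 16-bit word operations."""
    # Split the IEEE-754 double into four 16-bit words, most significant first.
    w3, w2, w1, w0 = struct.unpack(">4H", struct.pack(">d", x))
    sign = "-" if w3 & 0x8000 else ""
    exponent = (w3 >> 4) & 0x7FF
    # 52 mantissa bits = 13 hex digits: low nibble of w3, then w2, w1, w0.
    digits = f"{w3 & 0xF:x}{w2:04x}{w1:04x}{w0:04x}"
    mantissa_is_zero = (w3 & 0xF) == 0 and w2 == 0 and w1 == 0 and w0 == 0

    if exponent == 0x7FF:
        out = sign + ("inf" if mantissa_is_zero else "nan")
    else:
        if exponent == 0:
            # Zero prints with exponent 0; subnormals use the fixed -1022.
            lead, exp10 = "0", (0 if mantissa_is_zero else -1022)
        else:
            lead, exp10 = "1", exponent - 1023
        frac = digits.rstrip("0")          # glibc-style trailing-zero strip
        point = f".{frac}" if frac else ""
        out = f"{sign}0x{lead}{point}p{exp10:+d}"
    return out.upper() if upper else out


for value in (1.0, 3.0, -0.5, 0.0, 5e-324):
    print(hex_float(value))
```

Because `float.fromhex` accepts this syntax, each output can be round-tripped on the host as a cheap oracle for the strings the new smoke probe block should expect.
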
### 2.3 `tmpfile` / `tmpnam` / `rename`

Following Phase 0.10's decision (copy+delete fallback for cross-dir rename):

- Per-FILE owned name buffer (extend FILE struct or use parallel
  `tmpNames[MFS_MAX_FILES][L_tmpnam]` table). Update `__mfs[]` initializer.
- Add `gsosDestroy` ($2002 pCount=1) and `gsosChangePath` ($2004) wrappers in
  `iigsGsos.s` + `iigsGsosStub.s` (real stub semantics from Phase 1.2).
- Promote `remove()` from mfs-only to mfs-then-GS/OS-Destroy.
- Promote `tmpfile()` from stub: generate unique name via `tmpnam`, open
  O_CREAT|O_EXCL, set the auto-delete-on-close flag in the FILE.
- Promote `tmpnam()` from stub: read time via Phase 1.8 srand seed, format
  `/RAM5/T{16-hex-chars}.TMP` or similar.
- Promote `rename()` from stub:
  - **Fast path:** if new-path is in the same directory, route to ChangePath.
  - **Cross-dir copy+delete fallback:** Open source RDONLY, Create destination,
    chunked Read/Write loop (8KB buffer), Close both, Destroy source. Error
    recovery: if Write fails mid-loop, Destroy destination + return -1. If
    final Destroy of source fails, leave dest in place + return -1 with errno
    set + emit a debug log line (destructive partial-state, but the data is
    preserved). Source-vanished-mid-op is rare under GS/OS (no concurrent
    process); leave as best-effort.
- Use GSString256 stack scratch (already present at `__gsosPathBuf` in
  libc.c).
- Update mfs-path detection to auto-detect the `/` vs `:` separator.
- Smoke tests:
  - create + write + close + remove + verify destroyed.
  - rename within same dir (ChangePath path).
  - rename across dirs (copy+delete fallback): write a 10KB file, verify
    contents byte-identical post-rename, verify source gone.

**Effort:** 16-18h (was 10-12h; +6h for copy+delete fallback per Phase 0.10).

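The control flow of the cross-directory fallback, including its two recovery rules, can be sketched with host file APIs standing in for the GS/OS calls. This is a Python model under stated assumptions: `os.rename` plays the role of ChangePath, exclusive-create `open(..., "xb")` plays Create, and `os.remove` plays Destroy. The function name is hypothetical and the real implementation is C in the runtime.

```python
import os

CHUNK_SIZE = 8 * 1024   # 8KB copy buffer, as in the plan


def rename_with_fallback(src: str, dst: str) -> None:
    """Rename src to dst, copying then deleting when directories differ."""
    src_dir = os.path.dirname(os.path.abspath(src))
    dst_dir = os.path.dirname(os.path.abspath(dst))
    if src_dir == dst_dir:
        os.rename(src, dst)              # fast path (stands in for ChangePath)
        return

    fout = open(dst, "xb")               # Create: fails if dst already exists
    try:
        with fout, open(src, "rb") as fin:
            while True:
                chunk = fin.read(CHUNK_SIZE)
                if not chunk:
                    break
                fout.write(chunk)
    except OSError:
        os.remove(dst)                   # copy failed: Destroy the partial dest
        raise

    # If this final Destroy fails, the error propagates and dst stays in place,
    # so the data is preserved even though both copies now exist.
    os.remove(src)
```

Creating the destination before entering the `try` keeps the cleanup from deleting a file that already existed at `dst` before the call.
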
### 2.4 `docram` (DOC-RAM sample upload)

Phase 1.6 already corrected `IigsSoundParmT`. Remaining work:

- Add `iigsLoadDocSample(const int8_t *wave, uint16_t size, uint16_t docOffset)`
  wrapper around `WriteRamBlock` toolbox call.
- Update `iigsPlaySoundV2` / `iigsPlayDocSample` to consume corrected struct.
- Add `demos/helloSample.c` standalone demo.
- Wire `runtime/src/sound.c` into `demos/build.sh` (currently missing).
- Add standalone MMStartUp+SoundStartUp helper to `iigs/sound.h` (since
  `startdesk()` is too heavy for a CLI-style sample probe).
- Smoke test: WriteRamBlock returns cleanly + a marker store fires.

**Effort:** 6-8h after Phase 1.6.

### 2.5 `cursor` helpers

- Add `IigsCursorT` typedef to `runtime/include/iigs/toolbox.h`.
- Add `runtime/src/cursor.c` with `iigsCursorPushArrow`, `iigsCursorPushBusy`,
  `iigsCursorPop`, `iigsCursorRegister(region, cursor)` (via TaskMaster
  wmTaskMask cursor auto-track, NOT a custom idle hook).
- Save-stack stores a COPY of the CursorRecord (not the pointer: toolset
  memory can move).
- Hard-error or asserted-no-op before `startdesk()` (InitCursor invariant).
- Decide: drop embedded cursor blobs from scope (just wrappers + Wait/IBeam ROM
  shapes via `GetCursorAdr($800c)`) OR hand-code 4 cursor blobs and budget
  ~3-4h for mask/hotspot debugging.
  - Recommend: drop embedded blobs; expose
    `SetIigsCursor(const IigsCursorT*)` + `iigsCursorBusy()`/`iigsCursorArrow()`.
- Update `runtime/build.sh` (use `__attribute__((section(...)))` per cursor
  blob if embedded; OR use `-fdata-sections` target-wide and re-verify smoke).
- Smoke: $70-marker MAME region-transition probe.

**Effort:** 14-18h.

### 2.6 `buildsystem` (CMake + Make integration)

- Decide on `TYPE` enumeration: `flat` | `flatMultiSeg` | `gsos` | `gno` (four
  values, not three; reviewer caught this).
- Build `CMAKE_C_LINK_EXECUTABLE` override that fully bypasses CMake's link-line
  generator (link816 takes no `-L`/`-l`/`-Wl`/response files).
- Generate `W65816RuntimeImports.cmake` from Phase 1.7.e (single source of
  truth).
- Per-source-file CFLAGS override:
  `set_source_files_properties(... PROPERTIES W65816_LAYER2 ON W65816_REGALLOC basic)`.
- Wrap all four runner harnesses (`runInMame.sh`, `runMultiSeg.sh`,
  `runViaFinder.sh`, `runInGno.sh`) under `add_w65816_mame_test()`.
- Hand-build the link line in exact order (libcGno.o BEFORE libc.o for weak
  override).
- ProDOS filetype/aux: pass `--filetype` to link816, emit `.meta` sidecar,
  ctest wrapper reads `.meta` to construct cadius `#XX0000` suffix.
- Guard at CMake configure time: `TYPE=gno` + `SEGMENT_CAP` is an error
  (omfEmit rejects this combo at `omfEmit.cpp:723-724`).
- C++ auto-link of `libcxxabi.o` + `libcxxabiSjlj.o` AFTER `libc.o`: read
  SOURCES extensions, branch in CMake function body (genex can't reorder).
- Make template: scope explicitly to single-binary single-mode flat hello-world
  ONLY. Document the gap.
- Smoke integration under `ulimit -t 90`: cold-cache CMake configure can take
  30+s; ensure graceful skip when `command -v cmake` fails.
- Optional: GENERATE_DEBUG keyword + ctest hookup for `pc2line.py` (depends on
  Phase 1.3).

**Effort:** 55h. HIGH risk on link-line override.

### 2.7 `cxxsmoke` (modern C++ smoke coverage)

- Pre-spike: run each candidate snippet as a one-off demo through
  buildGno.sh + runInGno.sh BEFORE writing smoke checks. 30-min sanity gate.
- Decide demo placement: create `tests/cxxSmoke/` mirroring `tests/coremark/` /
  `tests/lua/` pattern, NOT in `demos/` (where `buildGno.sh` auto-discovery
  would build them as GNO commands).
- Add `-include etl_profile.h` to smoke compile line OR replace `etl::tuple`
  structured-binding check with a user struct that has tuple_size /
  tuple_element specializations defined in the heredoc.
- Five checks: range-for, generic lambda + capture-by-reference of i32 local
  (the i32 path is where most recent fixes have lived, so it is the most
  likely to regress), variadic templates, structured bindings, fold
  expressions.
- Each check: a buildGno-style probe with $70 marker on success.
- Smoke harness from Phase 1.7.c launches each under MAME and verifies marker.
- If any check fails: stop work, XFAIL the test with TODO note, book a
  separate codegen-fix PR.

**Effort:** 10-12h (clean run). Best case 4h, worst case multi-day if a
codegen bug surfaces.

**Phase 2 total: ~110-130h.**

---

## Phase 3 - M2 source-level debugging end-to-end

### 3.1 `debugger` (interactive GDB-style front-end)

Reviewer's critical findings:

- `cpu.debug:bpset(addr)` 1-arg form CRASHES MAME. Use
  `bpset(pc, '', 'logerror "BP-HIT PC=%X A=%X X=%X Y=%X S=%X DBR=%X\n",pc,a,x,y,s,db; go')`.
- `SESSION_RECOVERY.md:362-385` already documents the working
  `-debug -debugger qt -oslog` workflow. Reuse, do not reinvent.
- Reentrancy SEGFAULT: `add_machine_pause_notifier` + `cpu.debug:go()` from a
  callback. Design must NOT call `go()` from Lua resume command callbacks.
- MAME under `-debug` starts with `execution_state = 'stop'`. Harness must
  explicitly call `dbg.execution_state = 'run'`.
- Drop `bt` from initial scope OR downgrade to best-effort single-frame parent
  only. Real multi-frame `bt` requires either DW_AT_frame_base in .debug_info
  or a per-function frame-size sidecar from link816 (new work item, not
  budgeted).
- Add `finish`/return command (run-until-current-frame-RTL/RTS); it is easier
  than step-over JSL and the natural escape from accidental step-into.

**Steps:**

1. Add `demos/build.sh --debug` mode (adds `-g` to clang, `--debug-out`/`--map`
   to link816, `_dbg` output naming).
2. Add `demos/buildGno.sh --debug` mode equivalent.
3. Build Python front-end consuming `-oslog` stream (one-way pipe). Use
   `machine.debugger.command(string)` to inject debugger console commands at
   runtime for set-bp / step / continue.
4. Pre-spike: confirm `bpset(pc, '', '')` form, verify bank-aware bp matching
   (24-bit PB:PC vs 16-bit PC), confirm execution_state behavior after pre-run
   bpset. 2h spike.
5. Implement commands: `b FUNC | FILE:LINE`, `c`, `s` (step-instr), `n`
   (step-over: temp-bp at jsl_pc+4), `finish`, `p &GLOBAL` (map lookup only;
   `p VAR` deferred to `localvars`).
6. Update `SESSION_RECOVERY.md` (not a new doc; keep one source of truth) to
   reference the new workflow.
7. Add `--trace` mode that sets bp at `main`, captures one BP-HIT via -oslog,
   asserts pc2line.py resolves it. Default-on smoke, no `DEBUGGER_E2E=1` gate.
8. Gate interactive `(dbg)` prompt portion behind `DEBUGGER_E2E=1` only.

**Effort:** 24-30h.

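The front-end in step 3 mostly reduces to recognizing `BP-HIT` lines in the `-oslog` stream. The sketch below parses the register dump produced by the `logerror` format string quoted above. The `parse_bp_hit` helper is hypothetical, and any prefix MAME prepends to the line is an unknown, so the pattern searches rather than anchoring at the start.

```python
import re

# Mirrors: logerror "BP-HIT PC=%X A=%X X=%X Y=%X S=%X DBR=%X\n"
BP_HIT_RE = re.compile(
    r"BP-HIT"
    r" PC=(?P<pc>[0-9A-Fa-f]+)"
    r" A=(?P<a>[0-9A-Fa-f]+)"
    r" X=(?P<x>[0-9A-Fa-f]+)"
    r" Y=(?P<y>[0-9A-Fa-f]+)"
    r" S=(?P<s>[0-9A-Fa-f]+)"
    r" DBR=(?P<dbr>[0-9A-Fa-f]+)"
)


def parse_bp_hit(line: str):
    """Return a dict of register values, or None if the line is not a BP-HIT."""
    match = BP_HIT_RE.search(line)
    if match is None:
        return None
    return {name: int(text, 16) for name, text in match.groupdict().items()}


# The "[:maincpu]" prefix here is invented for illustration only.
sample = "[:maincpu] BP-HIT PC=12345 A=1 X=0 Y=FF S=1FF DBR=2"
print(parse_bp_hit(sample))
```

The `--trace` smoke in step 7 could feed the parsed `pc` straight into `pc2line.py` and assert that it resolves to a source line.
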
### 3.2 `localvars` (-O0 + -O2/IMG + location-lists + inlined subroutines, per Phase 0.4)
|
||
|
||
Depends on Phase 1.3 (DWARF reloc fix) + Phase 1.5 (DBG_VALUE preservation).
|
||
|
||
Per Phase 0.4 decision: full surface in one landing.
|
||
|
||
**Steps:**
|
||
1. Verify llvm-dwarfdump can parse a `-g` `.o` after Phase 1.3. Hard
|
||
precondition.
|
||
2. Validate +1 stack skew convention with deliberate probe (int x=0xABCD; int
|
||
y=0x1234; int z=0x5678; read fbreg offsets from memdump, verify alignment).
|
||
Add as smoke check.
|
||
3. Extend `pc2line.py` into a full DIE walker for `.debug_info` + `.debug_abbrev`
|
||
+ `.debug_addr` + `.debug_str` + `.debug_str_offsets`.
|
||
4. Implement a DW_OP evaluator for: DW_OP_fbreg, DW_OP_addr, DW_OP_constN,
|
||
DW_OP_reg0..7, DW_OP_breg0..7, DW_OP_call_frame_cfa.
|
||
5. Add `--locals 0xPC` mode that reads from a MAME memdump (snapshot or
|
||
`-oslog` register dump).
|
||
6. Wire `p VAR` in debugger (3.1) to call `pc2line.py --locals`.
|
||
7. **-O2 / IMG-resident locals:** rewrite DW_OP_regN refs to IMG slot indices
|
||
(IMG0..IMG15) into `DW_OP_breg<DP_base>+offset` form. LLVM emits the
|
||
fictitious-register form; pc2line maps it to actual DP $C0..$DE locations.
|
||
8. **Location lists:** parse `.debug_loclists` (DWARF 5) for PC-range-keyed
|
||
location expressions. Resolve to the correct entry for the queried PC.
|
||
9. **Inlined subroutines:** DW_TAG_inlined_subroutine descent. Multiple-
|
||
DIE-per-PC handling. Show inlined frame stack at the queried PC.
|
||
10. Smoke checks (covering -O0 AND -O2 paths):
|
||
- `add(3, 4)` -O0: locals print `a=3 b=4 c=7`.
|
||
- `popcount(0xF0F0)` -O2 with IMG-resident vars: locals resolve correctly.
|
||
- Multi-CU program (Lua-scale): locals from any CU resolve.
|
||
- Inlined-helper case: stack shows the inlined frame.
|
||
11. Expect 3-5 additional clang DWARF bugs to surface as -O2 / IMG / loclists
|
||
work probes `.debug_info` deeper. Each is its own upstream-or-local-patch
|
||
decision; budget contingency in this phase.

**Effort:** 50-75h (combined slice). Risk: HIGH (Phase 0.4 override accepts
this). Mitigation: land Phase 1.5 DBG_VALUE audit FIRST.
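
Steps 7 and 8 of 3.2 reduce to two small lookups once the sections are decoded. The sketch below assumes the location list has already been parsed into `(start, end, expr)` tuples, and assumes 2-byte IMG slots (inferred from 16 slots spanning DP $C0..$DE). The function names are hypothetical.

```python
IMG_DP_BASE = 0xC0      # from step 7: IMG0..IMG15 live at DP $C0..$DE
IMG_SLOT_BYTES = 2      # assumption: 16 slots * 2 bytes covers $C0..$DF


def img_slot_address(slot, direct_page=0x0000):
    """Map an IMG slot index to its direct-page address."""
    if not 0 <= slot <= 15:
        raise ValueError(f"IMG slot out of range: {slot}")
    return direct_page + IMG_DP_BASE + slot * IMG_SLOT_BYTES


def pick_location(entries, pc):
    """Return the expression whose half-open [start, end) range covers pc.

    Returns None when no entry covers pc, which a caller would report as
    "optimized out at this PC" rather than as an error.
    """
    for start, end, expr in entries:
        if start <= pc < end:
            return expr
    return None
```

DWARF ranges are half-open, so a PC equal to an entry's end address belongs to the next entry, not the current one.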

### 3.3 `posixfile` (POSIX file helpers)

Depends on Phase 0.10 (cross-dir policy), Phase 1.7.b (--data injection),
Phase 1.8 (srand), Phase 1.10 (PATH_MAX).

**Steps:**
1. Add 3 new GS/OS class-1 wrappers to `iigsGsos.s`:
   - `Get_Prefix` ($200A) for `realpath`
   - `Get_File_Info` ($2006) for `dirname`/`basename` semantics
   - `Get_Dir_Entry` ($201C) for `glob`/directory iteration
2. Add corresponding parm-block typedefs to `runtime/include/iigs/gsos.h`.
3. Add stub-mode counterparts to `iigsGsosStub.s` (using Phase 1.2 sentinel).
4. Pre-spike: write `demos/gsosProbeDirEntry.c` exercising directory open +
   Get_Dir_Entry iteration. Run under `runInGno.sh + GSOS_FILE_SMOKE=1`
   BEFORE committing to glob's API. ~2h.
5. Implement `realpath` (uses prefix resolution + Get_File_Info).
6. Implement `dirname` / `basename` with auto-detect `/` vs `:` separator.
7. Implement `fnmatch` with FULL bracket-set support (`[A-Z]*`, `[!a-z]`).
   MANDATORY per reviewer, not optional.
8. Implement `glob` using directory iteration + fnmatch.
9. Implement `mkstemp` using Phase 1.8 srand seed. Template-must-be-writable
   invariant (refuse non-writable template, document rodata-write risk in
   header).
10. Smoke check each: 6 helpers × ~20 min.
11. Document GNO/POSIX-VFS limitation: realpath/glob route through GS/OS
    class-1 on both bare-metal-with-GS/OS and GNO. GNO chdir-via-K* not
    honored.
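
The bracket-set requirement in step 7 is the part of `fnmatch` that is easiest to get subtly wrong, so here is a reference model in Python that could serve as an oracle for the C smoke checks. It is a sketch under assumptions: it handles `*`, `?`, ranges, and `!` negation only, with no backslash escapes and no `FNM_*` flags. The function names are hypothetical.

```python
def match_bracket(pattern, pi, ch):
    """Match ch against the bracket set that starts at pattern[pi] == '['.

    Returns (matched, index_after_set), or None if the set never closes.
    """
    i = pi + 1
    negate = i < len(pattern) and pattern[i] == "!"
    if negate:
        i += 1
    matched = False
    first = True                     # a leading ']' is a literal member
    while i < len(pattern) and (first or pattern[i] != "]"):
        first = False
        lo = pattern[i]
        if i + 2 < len(pattern) and pattern[i + 1] == "-" and pattern[i + 2] != "]":
            matched |= lo <= ch <= pattern[i + 2]
            i += 3
        else:
            matched |= ch == lo
            i += 1
    if i >= len(pattern):
        return None
    return (matched != negate, i + 1)


def fnmatch_sketch(pattern, name):
    """Iterative glob match with single-star backtracking."""
    pi = ni = 0
    star_pi = star_ni = -1
    while ni < len(name):
        if pi < len(pattern) and pattern[pi] == "*":
            star_pi, star_ni = pi, ni
            pi += 1
            continue
        ok, nxt = False, pi
        if pi < len(pattern):
            c = pattern[pi]
            if c == "?":
                ok, nxt = True, pi + 1
            elif c == "[":
                res = match_bracket(pattern, pi, name[ni])
                if res is None:      # unterminated set: treat '[' literally
                    ok, nxt = name[ni] == "[", pi + 1
                else:
                    ok, nxt = res
            else:
                ok, nxt = c == name[ni], pi + 1
        if ok:
            pi, ni = nxt, ni + 1
        elif star_pi >= 0:           # retry: let the last '*' eat one more char
            star_ni += 1
            pi, ni = star_pi + 1, star_ni
        else:
            return False
    while pi < len(pattern) and pattern[pi] == "*":
        pi += 1
    return pi == len(pattern)
```

The iterative form avoids recursion, which matters if the same shape is ported to C on a target with a small stack.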

**Effort:** 18-26h.

### 3.4 `resourcemgr` (deferred or stub-only per Phase 1.1 outcome)

If Phase 1.1 resolves the GS/OS fopen hang, proceed. Otherwise: stub-only
landing, documented as such.

**Steps (full version):**
1. Decide bundler input format: `TYPECODE_ID.bin` per reviewer recommendation
   (16-bit type + 16-bit ID encoded in filename like `8005_0001.bin`).
2. Verify AppleSingle round-trip with disposable 1-hour cadius spike before
   writing full bundler.
3. Install or build ORCA's `rez` as hard dependency for layout cross-checking.
4. Write `tools/rsrcBundle/rsrcBundle.py`:
   - Read TYPECODE_ID.bin files
   - Build rResourceMap + rIndex
   - Stitch with OMF data fork
   - Emit AppleSingle
5. Write `tools/rsrcBundle/dumpFork.py` for diffing against rez output.
6. Implement `resourceProbeInit()` in `runtime/src/resource.c` (MMStartUp +
   TLStartUp + ResourceStartUp + OpenResourceFile-on-own-pathname).
7. Build typed-C façade: LoadResource, GetResourceSize, HLock semantics
   (handle relocation via Memory Manager).
8. Add ResourceShutDown hook via `__cxa_atexit`.
9. Build `demos/rsrcProbe.c` with marker discipline (write $025000=0x99 +
   while(1); runViaFinder LAUNCHES only, no keypress automation).
10. Add `--rsrc <applesingle>` mode to `runViaFinder.sh`.
11. Update `demos/build.sh` to call `rsrcBundle` as post-step when `.rsrc/`
    dir present.
12. WriteResource + UpdateResourceFile DEFERRED to a separate item (persistent
    write needs disk-extract-and-diff verification).
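
Two pieces of the bundler in steps 1 and 4 are mechanical enough to sketch now: decoding the `TYPECODE_ID.bin` filename and emitting the AppleSingle container. This is a rough sketch, not `rsrcBundle.py` itself. The function names are hypothetical, the resource fork is passed in as an opaque blob (building rResourceMap + rIndex is out of scope here), and a real bundle would likely also need a ProDOS file-info entry for file type and auxtype, which is omitted.

```python
import re
import struct

APPLESINGLE_MAGIC = 0x00051600
APPLESINGLE_VERSION = 0x00020000
ENTRY_DATA_FORK = 1
ENTRY_RESOURCE_FORK = 2


def parse_resource_name(filename):
    """Decode '8005_0001.bin' into (type, id) as 16-bit integers."""
    m = re.fullmatch(r"([0-9A-Fa-f]{4})_([0-9A-Fa-f]{4})\.bin", filename)
    if m is None:
        raise ValueError(f"not a TYPECODE_ID.bin name: {filename!r}")
    return int(m.group(1), 16), int(m.group(2), 16)


def build_applesingle(data_fork, resource_fork):
    """Wrap two forks in an AppleSingle v2 container (big-endian)."""
    entries = [(ENTRY_DATA_FORK, data_fork),
               (ENTRY_RESOURCE_FORK, resource_fork)]
    # Header: magic, version, 16 filler bytes, entry count = 26 bytes.
    out = struct.pack(">II16xH", APPLESINGLE_MAGIC, APPLESINGLE_VERSION,
                      len(entries))
    offset = 26 + 12 * len(entries)   # payloads start after the descriptors
    for entry_id, payload in entries:
        out += struct.pack(">III", entry_id, offset, len(payload))
        offset += len(payload)
    for _, payload in entries:
        out += payload
    return out
```

Keeping the container writer this small makes the step 2 round-trip spike cheap: feed the output to cadius and diff what comes back.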

**Effort:** 40-50h.

**Phase 3 total: ~120-180h.**

---

## Phase 4 - M3 IIgs application authoring kit (parallel with Phase 3)

### 4.1 `menubuilder`

**Steps:**
1. Pre-verify: does DrawMenuBar actually still hang post-InitCursor-landing?
   Drop paintMenuBarTitles fallback if not. 30-min check.
2. Side-by-side dump struct offsets of NewWindowParm vs ORCA's window.h.
   30-min ABI check.
3. Reconcile WmTaskRec (used in all 5 demos) with IigsEventT (used in
   eventLoop.h). Either align field offsets or document why both exist.
4. Build menu mini-format assembler in `runtime/src/uiBuilder.c`:
   - Handles `>>` (menu start), `>>@` (Apple menu), `\X` (icon), `\N###`
     (numeric ID), `*Xx` (cmd-key), `--` (item prefix), `---` (divider), `D`
     (disabled), `V` (visible/check), `*` (separator), `.\r` (terminator).
   - Round-trip test against Menu Mgr's parser. 6h.
5. Window builder + control wrappers (cButton/cCheckBox/cEditLine/cScrollBar
   using abstract 32-bit proc constants, NOT bank-E1 ROM addresses). 4h.
6. Add cmdId→itemID lookup table to IigsMenuT. Document dispatch contract.
7. Extend IigsEventCallbacksT with `onCmd` (menu-pick dispatcher).
8. Migrate ALL FIVE affected demos (frame.c, orcaFrame.c, minicad.c,
   reversi.c, helloWindow.c). 6h.
9. Either include AlertTemplate/ItemTemplate wrapper (`uiBuilderAlert`) in
   scope OR carve out a separate `alertbuilder` item. Recommend in-scope.
10. Smoke check: install menu with one item, simulate keystroke via scripted
    MAME input, verify onCmd fires by setting $70=0x99. 4h.
11. Re-baseline OMF sizes; verify cRELOC budget headroom.

**Effort:** 25-30h.

### 4.2 `sprites` (320 mode, standalone per Phase 0.6)

**Steps:**
1. Standalone init: sprite.c does its own `$C029` NEWVIDEO bit 7, SCB
   ($E1:9D00), and palette ($E1:9E00) setup. 2h.
2. SHR-safe heap policy:
   - Document `$C035` shadow register interaction.
   - Sprite save buffers MUST live above $A000 OR in a bank != 0 (since
     bank-0 $2000..$9FFF mirrors to $E1:2000..$E1:9FFF).
   - Add `iigsSpriteAttachBuffer(void *buf, size_t size)` so caller controls
     placement.
   - Document this in `iigs/sprite.h` and `STATUS.md`.
3. Software sprite engine:
   - 16×16 fixed sprite shape, 4bpp packed.
   - Background save/restore.
   - Transparent blit (mask).
   - Sprite list (Begin/Add/RenderAll/EraseAll).
4. Integration with eventLoop's TaskMaster frame cadence.
5. Demo (`demos/spriteProbe.c`):
   - Init SHR.
   - Place 8 sprites.
   - One frame of update.
   - Verify via `runInMame.sh --check-u8` (from Phase 1.7.a) at known SHR
     offsets.
6. Cycle benchmarks in `tests/sprites/`: "blit one 16×16 sprite in <2000 cyc",
   "erase + redraw 8 sprites in <16000 cyc / 1 frame".
7. 640 mode DEFERRED to follow-up item (Phase 0.6 decision).
8. `pha;plb` DBR-to-$E1 optimization in inner loop: only if blit doesn't call
   any libgcc helper while DBR is contaminated. Audit before enabling.
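
The save/blit/erase cycle in step 3 can be modelled on the host to produce the expected byte values for the `--check-u8` probes in step 5. The sketch below is a host-side model, not the 65816 engine. It assumes 320 mode at 160 bytes per scanline, an even x coordinate (so no nibble shifting), and a mask convention where 1 bits keep the background. All names are hypothetical.

```python
SHR_BYTES_PER_LINE = 160   # 320 pixels at 4bpp, two pixels per byte
SPRITE_W_BYTES = 8         # 16 pixels wide
SPRITE_H = 16


def shr_offset(x, y):
    """Byte offset of pixel (x, y) from the start of SHR pixel data."""
    return y * SHR_BYTES_PER_LINE + x // 2


def blit_masked(screen, sprite, mask, x, y):
    """Draw one sprite and return the background bytes it covered."""
    saved = bytearray()
    for row in range(SPRITE_H):
        base = shr_offset(x, y + row)
        for col in range(SPRITE_W_BYTES):
            i = row * SPRITE_W_BYTES + col
            bg = screen[base + col]
            saved.append(bg)
            # Keep background where mask bits are 1, sprite elsewhere.
            screen[base + col] = (bg & mask[i]) | (sprite[i] & ~mask[i] & 0xFF)
    return bytes(saved)


def erase(screen, saved, x, y):
    """Restore the background captured by blit_masked."""
    for row in range(SPRITE_H):
        base = shr_offset(x, y + row)
        chunk = saved[row * SPRITE_W_BYTES:(row + 1) * SPRITE_W_BYTES]
        screen[base:base + SPRITE_W_BYTES] = chunk
```

A model like this gives the probe a known-good value for each checked SHR offset without hand-computing nibble merges.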

**Effort:** 22-28h.

**Phase 4 total: ~50-60h.**

---

## Phase 5 - M4 production-grade C++ toolchain

Per Phase 0.1/0.2, this is materially smaller than the original brief.

### 5.1 `unwinder`: `_Unwind_RaiseException`-over-SJLJ stub (Phase 0.1 option A)

Not a real DWARF unwinder. Provides the Itanium surface third-party C++
libraries expect.

- `runtime/src/libunwindStub.c`: `_Unwind_RaiseException`, `_Unwind_Resume`,
  `_Unwind_GetIP`, `_Unwind_GetCFA` routed to existing SJLJ jmpbuf.
- Smoke: probe that throws + catches via the stub.
- Document: "third-party libcxx-using code links; throw across
  non-instrumented frames terminates."

**Effort:** ~20h.

### 5.2 `lto` (ThinLTO per Phase 0.2)

Depends on Phase 1.4.c (TTI), Phase 1.11 (weak-extern survival),
Phase 1.12 (Layer 2 gate).

**Steps:**
1. Pre-spike (30 min): build llvm-link + llvm-dis, ThinLTO 3 small TUs
   (extras.c + strtok.c + libcGno.c), `--mtriple=w65816 -inline-threshold=50`,
   link with asm objects, run helloBeep. Validates the pipeline.
2. Add `llvm-link`, `llvm-as`, `llvm-dis` to `installLlvmMos.sh` ninja
   targets. Extend existence-check at lines 75-78.
3. Build `scripts/ltoLink.sh` that:
   - Reads bitcode + native asm objects
   - Runs `llvm-link` on bitcode
   - Runs `opt -O2 --mtriple=w65816 -inline-threshold=50` (explicitly set;
     opt does NOT invoke TargetPassConfig so the TM-init hook for
     inline-threshold doesn't fire).
   - Runs `llc -filetype=obj`
   - Hands resulting .o to link816.
4. Verify GlobalDCE doesn't strip `.init_array` boundary symbols. Mark with
   `llvm.used` if needed.
5. Document: per-file `-mllvm -regalloc=basic` for Lua's lvm.c / ldebug.c /
   ltablib.c is preserved by ThinLTO's per-TU codegen attachment.
6. CoreMark + Lua LTO smoke: success criterion "produces a working binary at
   parity size or better."
7. Document LTO × Layer 2 hard-fail behavior (Phase 1.12).
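
The three-stage pipeline in step 3 is easier to review as data than as shell. The sketch below only composes the command lines; it does not run them, and it is not `ltoLink.sh`. The temp-file naming and the function name are assumptions, and the O2 pipeline is spelled `-passes=default<O2>` as in the status notes for this item. The hand-off to link816 is left out because its command line is not described here.

```python
def lto_commands(bitcode_files, out_obj, inline_threshold=50,
                 triple="w65816"):
    """Return the llvm-link / opt / llc argv lists for one LTO link."""
    merged = out_obj + ".merged.bc"     # hypothetical temp-file names
    optimized = out_obj + ".opt.bc"
    return [
        ["llvm-link", *bitcode_files, "-o", merged],
        # The threshold is passed explicitly because opt does not run the
        # target-machine init hook that would otherwise set it.
        ["opt", f"--mtriple={triple}", "-passes=default<O2>",
         f"-inline-threshold={inline_threshold}", merged, "-o", optimized],
        ["llc", "-filetype=obj", optimized, "-o", out_obj],
    ]
```

A caller could hand each list to `subprocess.run(cmd, check=True)` so that a failing stage stops the link instead of producing a stale object.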

**Effort:** 30-40h.

**Status (2026-06-02 PARTIAL - NoTTI-Lite mode):**

- `scripts/ltoLink.sh` LANDED. Driver: `llvm-link` merges bitcode;
  `opt -passes='w65816-layer2-gate'` enforces Phase 1.12 (refuses on
  mismatch); `opt --mtriple=w65816 -passes='default<O2>'
  -inline-threshold=50` runs IR-level optimization with the
  W65816-appropriate inline threshold; `llc -filetype=obj` produces the
  final native object. Flags: `-o`, `--keep-temps`, `--layer2`
  (caller-asserts), `--inline-threshold N` (override), `--emit-ll` (debug).
- `installLlvmMos.sh` now builds llvm-link / llvm-as / llvm-dis / opt
  as part of the toolchain ninja targets and gates the existence check
  on all four. Phase 5.2 step 2.
- W65816TTI (`W65816TargetTransformInfo.h` + override in
  W65816TargetMachine) WIRED but `kMildCostModelEnabled = false`. The
  Phase 1.4c bsearch hang (smoke #77) RE-SURFACED when qsort.c was
  recompiled under TTI-active multipliers (2x i32, 5x float), meeting
  the "if bsearch smoke fails, ship NoTTI-Lite" criterion in the spec.
  The TTI plumbing ships present-but-bypassed so flipping
  `kMildCostModelEnabled` to true is the only change needed to enable
  full Phase 5.2 cost-driven inlining once the underlying i32
  termination-compare codegen bug is fixed.
- Layer 2 LTO hard-fail behavior (Phase 1.12) is documented in
  W65816Layer2Gate.cpp header comment + ltoLink.sh step 2 comment.
  The gate has been end-to-end-verified: mixed Layer 2 + non-Layer 2
  bitcode IS rejected at LTO time with a deterministic
  `LLVM ERROR: W65816 Layer 2 LTO gate: Layer 2 mode disagreement`.
- Per-TU codegen attachment (`-mllvm -regalloc=basic` for Lua's
  lvm.c / ldebug.c / ltablib.c) is preserved by ThinLTO's per-function
  attribute mechanism: those flags translate to function-level
  attributes that survive bitcode merge. No code change needed.
- Size parity probe: `demos/ltoProbe.c` + `demos/ltoProbeHelper.c`
  through ltoLink.sh produces 37781-byte GNO OMF vs 37785 bytes for
  non-LTO (parity-or-better met). Runs cleanly under MAME + GNO with
  the harness marker hit.
- All 162 smoke checks green after Phase 5.2 land + TTI bring-up.

**Deferred to a future phase:**

- Enabling the 2x i32 / 5x float TTI multipliers. Requires fixing the
  i32 termination-compare codegen bug that the original Phase 1.4c
  attempt surfaced (smoke #77 bsearch hang). Reproducer:
  `kMildCostModelEnabled = true` + rebuild runtime + run smoke.
- CoreMark / Lua LTO smoke probes (the spec's step 6). CoreMark's
  bank-budget pressure under aggressive inlining is exactly what TTI
  was meant to address; without TTI active, ThinLTO of CoreMark is
  expected to bloat past Layer 2's single-bank budget. Re-attempt
  after the TTI re-enable lands.

### 5.3 `cxxchrono` (Phase 0.5 split: chrono only)

- Add `etl_get_steady_clock` + `etl_get_high_resolution_clock` +
  `etl_get_system_clock` C-side hooks in `runtime/src/libc.c`.
- Verify with `static_assert` whether the ETL chrono milliseconds rep is i32
  or i64. If it is i64, set `ETL_CHRONO_*_CLOCK_DURATION` in `etl_profile.h`
  to force i32.
- Add prototype to `runtime/include/time.h`.
- Smoke: chrono::steady_clock::now() returns monotonically increasing
  millisecond values.

**Effort:** 3-4h.
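
Forcing the milliseconds rep to i32 puts a ceiling on how long a tick count stays monotonic, which is worth quantifying before writing the smoke check. The arithmetic below is a host-side illustration, not project code, and the helper names are hypothetical. It shows the positive range of a signed 32-bit millisecond counter and the usual unsigned-subtraction trick for measuring an interval across a wrap.

```python
MS_PER_DAY = 86_400_000


def i32_positive_range_days():
    """Days until a signed 32-bit millisecond counter overflows."""
    return 2**31 / MS_PER_DAY


def wrap_i32(value):
    """Reduce an integer to signed 32-bit two's-complement range."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def elapsed_ms(start, now):
    """Interval between two i32 tick values, valid across one wrap."""
    return (now - start) & 0xFFFFFFFF
```

The range works out to a little under 25 days, so a smoke check that runs for seconds is safe, while long-running code should compare intervals rather than absolute tick values.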

### 5.4 `cxxstream+format+path` (Phase 0.5 split: the rest)

Depends on Phase 1.9 (`<cmath>` shim).

**Steps:**
1. Set `ETL_USING_FORMAT_FLOATING_POINT=0` default in `etl_profile.h`.
   FP-format build is a separate `--layer2` target.
2. Define `runtime/include/c++/iigs/path.h` with ProDOS-aware path operations
   (64-char component / 8-component / `:` separator limits validated).
3. `etl::string_stream` + `printf("%s", ss.str().c_str())` is the cout
   replacement. Drop the `iigs/console.h` cout-shim idea: it adds surface
   area without value.
4. Add `runtime/include/c++/cstdlib`, `<cstddef>` shims.
5. 1-hour `etl::format` size spike before committing: measure
   `format_to(buf, "{}", 42)` vs etlProbe size. If >10KB delta for one int
   format, document and downgrade scope.
6. Smoke: cxxStdlibProbe demo through buildGno+MAME via Phase 1.7.c harness.
7. Document `std::iostream`, `std::regex`, `std::filesystem`, `std::format`
   (the full versions, not ETL substitutes) as explicit out-of-scope with
   reasons (size, locale dependencies, GS/OS fopen).
8. Set explicit per-component size budgets up front (regex link budget,
   filesystem code budget). Skip with documentation if exceeded.
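
The limits named in step 2 (64 characters per component, 8 components, `:` separator) can be captured as a small executable spec for the header to be tested against. This sketch is not the `iigs/path.h` API. The function names are hypothetical, and the rule "use `:` if the path contains one, else `/`" is an assumption about how separator detection might work.

```python
MAX_COMPONENT_CHARS = 64   # limits as stated in step 2
MAX_DEPTH = 8


def split_components(path):
    """Split on ':' when present, otherwise on '/'; drop empty parts."""
    sep = ":" if ":" in path else "/"
    return [part for part in path.split(sep) if part]


def check_path(path):
    """Return None when the path is within limits, else a reason string."""
    parts = split_components(path)
    if len(parts) > MAX_DEPTH:
        return f"depth {len(parts)} exceeds {MAX_DEPTH}"
    for part in parts:
        if len(part) > MAX_COMPONENT_CHARS:
            return f"component longer than {MAX_COMPONENT_CHARS} chars"
    return None
```

Returning a reason string rather than a bare boolean keeps smoke-check failures self-explanatory.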

**Status (2026-06-02 LANDED):**

- `ETL_USING_FORMAT_FLOATING_POINT=0` default confirmed in
  `runtime/include/c++/etl_profile.h` (via the `ETL_FORMAT_NO_FLOATING_POINT`
  gate); FP-format is a `-UETL_FORMAT_NO_FLOATING_POINT` opt-in.
- `runtime/include/c++/iigs/path.h` provides `pathNormalize` / `pathJoin` /
  `pathSplit` with 64-char component + 8-depth + `:`-or-`/` separator
  validation. Header-only, no link footprint when unreferenced.
- `runtime/include/c++/sstream` aliases `etl::string_stream` as
  `std::stringstream` so portable code that names `std::stringstream`
  resolves to the ETL fixed-capacity surface. Cout-replacement idiom
  documented in `iigs/path.h` header preamble and in the `<sstream>`
  shim itself:
  `etl::string_stream ss(buf); ss << ...; printf("%s", ss.str().c_str());`
- `<cstdlib>` / `<cstddef>` / `<cmath>` shims already exist (Phase 1.9).
- The `chrono::milliseconds` rep is i32 on the W65816 by way of the
  `ETL_CHRONO_*_CLOCK_DURATION` overrides; `cxxStreamProbe` carries a
  `static_assert(sizeof(etl::chrono::steady_clock::duration::rep) == 4)`
  that fails compile if the override regresses.
- `etl::format` size spike (step 5): a 1-line `format_to(buf, "{}", 42)`
  added **~82 KB** to the binary over the no-format flavor. Hard
  downgrade per the step-5 rule (>10 KB threshold). `etl::format` is
  the layer2-opt-in surface, NOT default; gated by
  `-DCXX_STREAM_PROBE_WITH_FORMAT=1` in the demo.
- `demos/cxxStreamProbe.cpp` exercises stream<<int + path
  join/normalize/split + chrono i32 contract + format sentinel. Binary is
  19199 bytes (well under bank-0 budget). Smoke check 9/9 green under
  GS/OS 6.0.4 + GNO in MAME.
- Smoke-check entry added to `scripts/smokeTest.sh` after the
  `cxxChronoProbe` check (~6422 area).

**Explicit out-of-scope (step 7), documented here for future reviewers:**

- `std::iostream` (full): locale-aware num_put/num_get machinery, ctype
  tables, and per-stream sentry construction cost ~15-25 KB even for a
  single `cout << int`. Replacement: `etl::string_stream` +
  `printf("%s", ss.str().c_str())`. Aliased as `std::stringstream` in
  `<sstream>` for code-portability.
- `std::regex`: full NFA + DFA construction is a ~30-40 KB code budget
  on the W65816 even with a single-character-class regex. No locale
  surface available either. Replacement: caller-supplied scanner or
  hand-rolled state machine. Documented out-of-scope.
- `std::filesystem`: directory-iterator, canonical-path resolution, and
  permission-bit handling rely on POSIX surface the GS/OS FST does
  not provide (no `lstat`, no `realpath`, no permission bits beyond
  ProDOS access byte). Replacement: `iigs::path::*` + the existing
  libc `opendir`/`readdir`/`stat` surface in `runtime/include/dirent.h`
  and `runtime/include/sys/stat.h`. Documented out-of-scope.
- `std::format` (the C++20 surface): the ETL surrogate
  (`etl::format_to`) measured at +82 KB for one int; the C++20 `std::`
  surface would be larger again (full charconv float-to-text, locale
  hooks). Documented out-of-scope; the layer2-opt-in `etl::format`
  is the replacement.

**Effort:** 12-15h.

**Phase 5 total: ~65-90h (vs original brief's 120-220h; Phase 0 decisions
collapse the unwinder cost dramatically).**

---

## Phase 6 - M5 observability

### 6.1 `profiler` (function-attribution under MAME)

Depends on Phase 1.3 (DWARF reloc fix) + Phase 3.2 (`pc2line` DIE walker).

**Steps:**
1. Pre-spike (2-3h): minimum-viable PC sampler as one-off script. Validate
   `emu.register_periodic` fires with usable density. Run against three
   representative shapes: short hot bench (strLen), libcall-dominated bench
   (popcount), multi-seg (Lua). If <30 samples or >50% misattribution, pivot
   to `-debug` mode + `cpu.debug.bpset`-with-counter (additional 6h).
2. Switch attribution model to "sample count + hits-percent" (NOT emu.time()
   weighting: sample sparsity makes cycle% dishonest).
3. Have link816 emit ALL local symbols (not just globalSyms) to a separate
   map file, gated by `--map-locals`. Required for meaningful libgcc / libc
   attribution. 1-2h link816 edit.
4. CLOCK_HZ as CLI arg (slow-mode default 1023000; `--fast-mode` for GS/OS
   demos).
5. Add `--sample` mode to `runInMameCycles.sh` (and `runMultiSeg.sh`). Do NOT
   fork into a separate `runInMameProfile.sh`; keep it single-sourced.
6. Smoke: assert ≤10% samples in '?' (unattributed) + assert dominant bucket
   matches expectation.
7. Defer `--line` mode to a follow-up.
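
The "sample count + hits-percent" model from step 2 and the smoke assertion from step 6 fit in a few lines. This is a sketch under assumptions: it takes symbols as `(start, size, name)` tuples, which presumes the `--map-locals` file carries sizes, and the function names are hypothetical.

```python
import bisect
from collections import Counter


def attribute_samples(symbols, samples):
    """Count PC samples per symbol; unmatched PCs go to the '?' bucket."""
    table = sorted(symbols)
    starts = [start for start, _, _ in table]
    hits = Counter()
    for pc in samples:
        i = bisect.bisect_right(starts, pc) - 1
        if i >= 0 and pc < table[i][0] + table[i][1]:
            hits[table[i][2]] += 1
        else:
            hits["?"] += 1
    return hits


def smoke_ok(hits, expected_top, max_unattributed=0.10):
    """Step 6: at most 10% in '?' and the dominant bucket as expected."""
    total = sum(hits.values())
    if total == 0:
        return False
    named = [name for name in hits if name != "?"]
    top = max(named, key=hits.__getitem__, default=None)
    return hits["?"] / total <= max_unattributed and top == expected_top
```

Binary search over sorted start addresses keeps attribution cheap even for a Lua-scale map with thousands of local symbols.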

**Effort:** 14-20h.

### 6.2 `sanitizers` (UBSan-minimal + coverage per Phase 0.3)

Depends on Phase 1.4.a (RETURNADDR i32) + Phase 1.4.b (TRAP→BRK).

**Steps:**
1. Document ASan as out-of-scope. STATUS.md + USAGE.md.
2. Driver toolchain decision: Option (a) skip driver-side changes; users pass
   `-fsanitize=undefined -fsanitize-minimal-runtime` manually plus link
   `runtime/ubsan.o`. RECOMMENDED (10h effort). Option (b) is +6h.
3. Hand-roll `runtime/src/ubsan.c` based on `ubsan_minimal_handlers.cpp`:
   - Macro-substitute `__builtin_return_address` (Phase 1.4.a makes it work
     but at unknown cost; use Phase 1.4.b BRK trap PC for caller-pc dedup).
   - `caller_pcs` dedup table OR stub it out.
   - All 24 HANDLER pairs (recover + abort) + 2 RECOVER-only.
4. Route ubsan messages via `__putByteErr` (stderr, fd 3 in GNO).
5. Compile ubsan.c with `-fno-sanitize=undefined` (recursive ubsan footgun).
   Update `runtime/build.sh`.
6. Add `tests/ubsan/` mirroring `tests/coremark/` pattern: build.sh,
   ubsanProbe.c, manifest.
7. Probe scope: signed-overflow (add/sub/mul) + shift + divide. Three checks
   verified via $025000 sentinels.
8. Document object-size cost honestly: empirically a 9-line indexed-read
   function expands from 12 to 682 lines instrumented. 3
   intentionally-triggering ops may not fit single-bank.
9. Coverage: `-fprofile-instr-generate -fcoverage-mapping` smoke check that
   verifies counters write to expected `.profraw` shape.
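
The `caller_pcs` dedup table from step 3 exists so that a check firing in a loop reports once instead of flooding stderr. The behaviour can be modelled as below before committing to a C layout. This is an illustrative model only: the capacity of 20 and the "stay silent once full" policy are assumptions, and the class name is hypothetical.

```python
MAX_CALLER_PCS = 20   # assumed capacity; tune against the bank budget


class CallerPcTable:
    """Fixed-capacity record of PCs that have already been reported."""

    def __init__(self, capacity=MAX_CALLER_PCS):
        self.capacity = capacity
        self.pcs = []

    def should_report(self, pc):
        """True the first time a PC is seen, while the table has room."""
        if pc in self.pcs:
            return False
        if len(self.pcs) >= self.capacity:
            return False          # table full: suppress further reports
        self.pcs.append(pc)
        return True
```

A linear scan over 20 entries is a reasonable fit for the target: it needs no hashing and the table costs only a few dozen bytes.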

**Effort:** 22-28h.

**Phase 6 total: ~36-48h.**

---

## Critical-path summary

The dependency arrows that gate everything else:

```
Phase 1.3 (DWARF reloc fix)
  ├─→ 2.1 clangdwarffix completion
  │     └─→ 3.1 debugger
  │           └─→ 3.2 localvars (full -O0 + -O2/IMG slice per Phase 0.4)
  │                 └─→ 6.1 profiler
  └─→ 1.5 DBG_VALUE audit (must land before 3.2)

Phase 1.1 (GS/OS fopen hang)
  ├─→ 3.3 posixfile real I/O
  ├─→ 3.4 resourcemgr (or defer to stub)
  └─→ 5.4 cxxstream+format+path::filesystem (or document gap)

Phase 1.4 (backend prereqs)
  ├─→ 5.2 lto (1.4.c TTI)
  └─→ 6.2 sanitizers (1.4.a RETURNADDR, 1.4.b TRAP→BRK)

Phase 1.6 (IigsSoundParmT fix)
  └─→ 2.4 docram

Phase 1.11 + 1.12 (LTO weak-extern + Layer 2 gate)
  └─→ 5.2 lto
```
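
The arrows above form a small DAG, so a proposed landing order can be checked mechanically rather than by eye. The sketch below transcribes the edges into a dictionary and uses the standard-library `graphlib` module (Python 3.9+). The identifiers are this plan's item numbers, the helper names are hypothetical, and the 3.1 → 3.2 → 6.1 chain follows the "Depends on" lines of those items.

```python
from graphlib import TopologicalSorter

# item -> set of items that must land first
DEPENDS_ON = {
    "2.1": {"1.3"},
    "1.5": {"1.3"},
    "3.1": {"2.1"},
    "3.2": {"3.1", "1.5"},
    "6.1": {"3.2"},
    "3.3": {"1.1"},
    "3.4": {"1.1"},
    "5.4": {"1.1"},
    "5.2": {"1.4", "1.11", "1.12"},
    "6.2": {"1.4"},
    "2.4": {"1.6"},
}


def landing_order():
    """One valid order in which every prerequisite precedes its item."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def violations(order):
    """List (item, prerequisite) pairs where the item lands too early."""
    position = {item: i for i, item in enumerate(order)}
    return [(item, dep)
            for item, deps in DEPENDS_ON.items()
            for dep in sorted(deps)
            if position[dep] > position[item]]
```

Running `violations()` against a hand-edited schedule would flag, for example, 3.2 being moved ahead of the 1.5 DBG_VALUE audit.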

---

## Recommended landing order (calendar weeks)

| Week | Phase | Items |
|------|-------|-------|
| 1 | Phase 0 (DONE) + 1.1 spike + 1.3.a spike | GS/OS fopen + MAI flag spikes |
| 2 | Phase 1.1-1.6 | Foundational prerequisites |
| 3 | Phase 1.7-1.13 | Build/harness + LTO gates |
| 4-5 | Phase 2 (parallel) | M1 quick wins: clangdwarffix, hexfloat, tmpfile (+copy/delete fallback), docram, cursor, buildsystem, cxxsmoke |
| 6-7 | Phase 3.1 + Phase 4 (parallel) | debugger; menubuilder + sprites |
| 8-9 | Phase 3.2 | localvars full slice (-O0 + -O2/IMG + loclists + inlined) |
| 10 | Phase 3.3-3.4 | posixfile; resourcemgr (or stub-only landing) |
| 11 | Phase 5 | unwinder-stub + ThinLTO + cxxchrono + cxxstream/format/path |
| 12 | Phase 6 | profiler + sanitizers (UBSan-min + coverage) |

**Total: 12 weeks of focused work for ~750-950h with Phase 0 decisions locked.**

Phase 0.4 override (full localvars in one shot) adds ~10-15h vs the split
approach; Phase 0.10 override (rename copy+delete) adds ~6h. Both are
absorbed in the per-phase budgets above.

---

## Risks I'm worried about (final list)

1. **`FK_Data_4` truncation discovery cascade.** The reviewer for `localvars`
   found the IMM24 truncation bug while planning DWARF work. The bug is fixed
   in Phase 1.3, but it's almost certainly the FIRST of several clang DWARF
   bugs for this target. Budget contingency in Phase 3.2-3.3.
2. **`cxxsmoke` surfaces silent codegen regressions.** Every prior C++ probe
   this project has run (cxxProbe, etlProbe) has surfaced at least one backend
   bug. Phase 2.7 will likely do the same. Budget contingency.
3. **GS/OS fopen hang is unsolvable in budget.** If Phase 1.1 doesn't yield a
   fix within 8-16h, multiple downstream items (`resourcemgr`,
   `cxxstdlib::filesystem`, `tmpfile` real path, `posixfile` real I/O) ship
   stub-only with documented limitations. This is acceptable but worth
   confirming up front.
4. **Layer-2-aware LTO miscompile.** Phase 1.12 gate must be built FIRST. If
   skipped, the resulting binaries are silently wrong in the most
   performance-sensitive code path.
5. **`menubuilder` cRELOC budget pressure.** reversi.omf already at 40.5KB;
   adding uiBuilder.c may push some demos past the cRELOC threshold.
   Re-baseline post-migration.
6. **`unwinder` scope creep.** Phase 0.1 must be a hard decision. Going from
   (A) stub to (B) real DWARF mid-work would derail the schedule.
7. **MEMORY.md truncation.** The index is already past the 200-line load
   limit. Before starting any item, grep for
   `feedback_*<item-substring>*.md` in the memory dir to surface anything the
   loaded portion doesn't show.
8. **`sprites` SHR shadow scribble.** Phase 4.2.2 heap-vs-shadow policy is
   load-bearing. Without explicit handling, sprite save buffers will land in
   the visible display window and corrupt user pixels.

---

## How to use this document

- Start at Phase 0. Make each decision EXPLICITLY before any Phase 1 work.
- Phase 1 is FOUNDATIONAL. Skip nothing. Items in later phases will fail
  silently if any Phase 1 prerequisite is missing.
- For any item touching DWARF: Phase 1.3 MUST be green first.
- For any item that does GS/OS file I/O: Phase 1.1 MUST be investigated.
- Reviewer-adjusted hours are working estimates; brief hours are systematically
  low across the board.
- The `Critical-path summary` is the dependency graph; respect it.