Checkpoint

This commit is contained in:
Scott Duensing 2026-05-03 21:31:53 -05:00
parent d056cd026f
commit f542f4fa01
12 changed files with 2056 additions and 92 deletions

158
STATUS.md
View file

@ -124,15 +124,37 @@ which runs correctly under MAME (apple2gs).
IIgs IO window ($C000-$CFFF) if needed. `--gc-sections` IIgs IO window ($C000-$CFFF) if needed. `--gc-sections`
(default ON) drops unreachable functions: a minimal program (default ON) drops unreachable functions: a minimal program
with full runtime linked shrinks from ~43KB to ~1.5KB. with full runtime linked shrinks from ~43KB to ~1.5KB.
- `tools/omfEmit` produces OMF v2.1 single-segment files (the IIgs's - `link816 --segment-cap N` packs `.text` greedily into multiple
native object format) for round-tripping with classic dev tools. bank-aligned segments, capped at N bytes per segment. Segment 1
stays at `--text-base` in bank 0 (alongside rodata + bss + init);
segments 2..M start at `--segment-bank-base` (default $040000)
in successive banks. `--manifest path.json` writes a JSON file
listing each segment's image, base, and entry offset.
Cross-bank `JSL` (IMM24 reloc) just works — patched at link
time with the full 24-bit address. Cross-bank IMM16 is
permitted (uses DBR for bank — caller pins DBR to data's bank);
cross-bank PCREL is rejected with a clear diagnostic.
`scripts/runMultiSeg.sh` is a mini in-Lua loader for MAME that
reads the manifest, places each segment's bytes, and runs from
segment 1's entry — used by smoke to verify cross-bank JSL
end-to-end (helper3 chain across 3 bank-aligned segments).
- `tools/omfEmit` produces OMF v2.1 files in two modes:
(a) single-segment — `--input flat.bin --map flat.map --base
ADDR --entry SYM`, KIND=0x0000 (CODE, dynamic), ORG=0 (loader
picks bank); (b) multi-segment — `--manifest path.json` reads
link816's manifest and emits one OMF segment per entry with
KIND=0x8800 (STATIC|ABSBANK|CODE) + ORG=segment-base, asking
the GS/OS Loader to place each at its declared bank-aligned
address. All intra-segment relocations were already patched by
the linker, so no INTERSEG/RELOC opcodes are needed for v1
static placement.
- `link816 --debug-out FILE` writes a DWARF sidecar with text/ - `link816 --debug-out FILE` writes a DWARF sidecar with text/
rodata/bss/init_array relocations applied to every `.debug_*` rodata/bss/init_array relocations applied to every `.debug_*`
section, so `.debug_addr` / `.debug_line` PC values are final- section, so `.debug_addr` / `.debug_line` PC values are final-
image addresses. image addresses.
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double, - `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
libgcc into linkable objects. libgcc into linkable objects.
- `scripts/smokeTest.sh` runs 124 end-to-end checks at -O2: - `scripts/smokeTest.sh` runs 126 end-to-end checks at -O2:
scalar ops, control flow, calling conventions, MAME execution scalar ops, control flow, calling conventions, MAME execution
regressions, link816 bss-base safety + weak-symbol resolution + regressions, link816 bss-base safety + weak-symbol resolution +
heap_end-vs-heap_start sanity, iigs/toolbox.h compile + link, heap_end-vs-heap_start sanity, iigs/toolbox.h compile + link,
@ -242,17 +264,125 @@ RAM through $FFFF, gaining 8KB of bank-0 space.)
## Yet to come ## Yet to come
(Empty — no known blocking gaps. C++ exceptions through clang - **Multi-bank BSS / init_array** — multi-segment splits text
`-fsjlj-exceptions` now compile, link, and execute. The smoke across banks but BSS + init_array still live in segment 1's bank
harness can't reliably DRIVE the C++ exception path through MAME (bank 0). Programs whose zero-init data exceeds the ~60KB bank-0
because of an unrelated MAME-side flakiness — its apple2gs CPU budget would need crt0 to walk a per-segment table of `(start,
emulation crashes intermittently when the test program exercises end)` pairs. Not blocking >64KB *code* programs; only matters
the full SJLJ flow with smoke's I/O environment, even though the for programs with very large global arrays.
same binary executes correctly when invoked interactively. The
pure-C SJLJ runtime smoke test exercises every runtime function - **GS/OS Loader OMF format compatibility** — the OMF format we
end-to-end, and the C++ frontend → backend path is verified at emit is now byte-equivalent to real Apple S16 segments at the
compile/link time only. This is a workaround, not a defect in header level. Verified by extracting the ABOUT segment from
our code: same binary runs fine outside the harness.) real `/SYSTEM/START` (FINDER) via Cadius (`/tmp/cadius/cadius`,
not AppleCommander which can't extract forks) and comparing
field-by-field against ours. Five fixes landed in
`src/link816/omfEmit.cpp` along the way:
(1) VERSION byte 0x21 → 0x02 (was BCD-style "2.1"; real format
is enum where 0x02 = v2.1). Cleared error $1102.
(2) Body opcode 0xF1 (DS = N zeros) → 0xF2 (compact LCONST,
2-byte length + N data bytes). Long-form 0xF5 LCONST is in
the spec but real Loader appears to mis-parse it (3 stale
copies of the segment ended up scattered in RAM). Every real
segment we decoded uses 0xF2.
(3) KIND 0x0000 (CODE) → 0x8000 (CODE|STATIC) for legacy
single-segment mode. Real ABOUT segment uses 0x8000; with
0x0000 the Loader returns $110A loadSegFailErr. Multi-segment
mode keeps 0x8800 (CODE|STATIC|ABSBANK) since each seg has a
fixed ORG.
(4) BANKSIZE 0 → 0x10000 (matches real code segments).
(5) LOAD_NAME emitted as 10 bytes of zeros immediately after
the 44-byte header (some sources omit it, real OMFs include it).
GS/OS 6.0.2 is installed under `tools/gsos/` and boots cleanly
to Finder in MAME. Replacing `/SYSTEM/START` with a known-good
OMF (the extracted ABOUT segment) gives error `$005C`
identical to what we get with our test program — meaning our
OMF is indistinguishable from real Apple S16 as far as the
Loader is concerned. The $005C is *not* OMF rejection; it is
the boot-launcher path failing because a minimal `/SYSTEM/START`
doesn't chain to a real Finder via QUIT-with-pathname.
`runtime/src/crt0Gsos.s` is committed: skips SEI/LC-reconfig
(GS/OS owns CPU state), zeros BSS, runs init_array, calls
main, then QUIT(pcount=2) chained to `gChainPath` (default
`/SYSTEM/START.ORIG`). Linkage works.
Tested with a marker write as the very first instruction of
crt0Gsos, replacing `/SYSTEM/START` with our OMF and saving
the original as `/SYSTEM/START.ORIG` for chain-back. After
110-second boot: marker `$00/0078` is still 0 — the Loader
places our segment in RAM (entry signature found in 3 banks
via memory search) but **never JSLs entry**. Tested ENTRY=0,
ENTRY=1 (with NOP pad), auxtype=0 and =DB03; all give the
same $005C without ever calling our code. Conclusion: the
boot-launcher path requires the `~ExpressLoad` segment that
every real `/SYSTEM/START` carries. Without ExpressLoad,
the bootstrap takes a code path that loads our segment but
never auto-calls it.
**OMF format → fully Loader-compatible** after reading
Merlin32 source. Final canonical fields (single-segment
Finder-launchable app):
- KIND=0x1000 (CODE|PRIV) — was 0x8000 (CODE|STATIC) which
came from extracting ABOUT from real FINDER, but ABOUT is a
sub-segment called as a subroutine, not a launchable app
- LABLEN=10 (fixed-width 10-byte LOAD_NAME and SEG_NAME,
space-padded) — was 0 (length-prefixed) which is what
/SYSTEM/START FINDER uses but the Loader will only LOAD,
not JSL-into, that format
- VERSION=0x02 (OMF v2.1)
- BANKSIZE=0x10000 for code segs
- Body opcode 0xF2 LCONST with NUMLEN-byte (=4) count
ExpressLoad emission also landed (`omfEmit --expressload`):
6-byte header + segment list + remap list + header info,
byte-equivalent to Merlin32's `BuildExpressLoadSegment`.
End-to-end runtime verification: new `scripts/runViaFinder.sh`
injects an OMF as `/SYSTEM.DISK/HELLO`, boots GS/OS in MAME,
drives Finder via Lua keyboard automation (S+Cmd-O to open
System.Disk, H+Cmd-O to launch HELLO), samples specified
memory addresses to verify execution. Pattern adapted from
`joeylib/scripts/run-iigs-mame.sh` from a sibling project.
Pure-asm marker tests (`sta $000078 long, value=$42`) are
confirmed running under real GS/OS Loader with
`runViaFinder.sh hello.omf --check 0x000078=0x42` returning
exit 0.
**Compiled C now runs under real GS/OS Loader.** Implemented
option (a) from the analysis: OMF cRELOC opcode emission.
- `link816 --reloc-out FILE` records every R_W65816_IMM24
relocation site (intra-segment 24-bit refs only — GS/OS
dispatcher calls and other cross-bank refs are filtered out)
as a binary sidecar of (patchOff, offsetRef) pairs.
- `omfEmit --relocs FILE` reads the sidecar and emits a
cRELOC opcode (0xF5) per site between the LCONST data and the
END opcode. Format per Merlin32: `0xF5 ByteCnt(=3) Shift(=0)
OffsetPatch(2) OffsetReference(2)` = 7 bytes.
- The Loader rewrites segment[OffsetPatch..OffsetPatch+2] to
`(segPlacedBase + OffsetReference)` at load time, fixing
every `jsl`/`jml`/`sta long`/`lda long` operand that targets
an in-segment symbol.
- End-to-end verified: a real C function call + for loop
(`sumTo(10)` → 55, `sumTo(100)` → 5050) compiled with clang
-O2, linked, OMF-emitted with cRELOC, injected as
`/SYSTEM.DISK/HELLO`, launched from Finder via MAME-Lua
keyboard automation, marker bytes verified at the expected
values. Smoke check #62 verifies cRELOC opcode count
matches the link816 sidecar count.
Smoke tests #59-#60 (omfEmit single + multi-segment) verify
the structural format invariants (VERSION=0x02, KIND=0x8000
or 0x8800, body opcode 0xF2 LCONST) so regressions are
caught. `scripts/runMultiSeg.sh` mini-loader continues to
cover the >64KB use case end-to-end.
- **C++ exceptions in CI smoke** — runs reliably outside smoke;
see context below. The SJLJ runtime end-to-end test passes;
the C++ frontend→backend path is compile/link verified in
smoke; full execution path is left out due to a MAME-side I/O
flakiness (same binary runs fine interactively).
- **GS/OS validated against a real ProDOS volume** — the wrapper - **GS/OS validated against a real ProDOS volume** — the wrapper
contract (PHA + PEA 0 + LDX + JSL $E100A8 + post-call SP fixup) contract (PHA + PEA 0 + LDX + JSL $E100A8 + post-call SP fixup)

232
docs/multiSegmentPlan.md Normal file
View file

@ -0,0 +1,232 @@
# Multi-segment OMF support — plan
## Why
Single-segment cap: ~60KB usable in bank 0 after the IO window ($C000-
$CFFF), the stack at $0FFF, and crt0 / runtime overhead. Real IIgs
applications need 100s of KB across multiple banks. GS/OS Loader is
designed for this — load each segment into its chosen bank, fix up
inter-segment references at load time, jump to the entry segment.
## Today
- `link816` produces a flat binary covering `[--text-base, ...]` in a
single bank-0 image. All sections are concatenated into one address
space. Inter-section relocations are resolved at link time.
- `omfEmit` wraps that flat binary in a single OMF segment (KIND=CODE,
ORG=0, SEGNUM=1, body = one DS opcode + END). No relocation records
emitted (image is already absolute).
- `crt0` enables LC RAM, zeroes BSS, runs `.init_array`, calls `main`.
- All cross-function calls already use JSL (3-byte long) — we never
emit JSR. That's accidentally helpful for multi-segment.
## Target
A program that builds 4 segments — say:
- Segment 1 ("MAIN"): crt0 + main + a few hot routines, in bank 1
- Segment 2 ("CODE"): bulk of code, in bank 2
- Segment 3 ("DATA"): rodata, in bank 3
- Segment 4 ("BSS"): uninitialized data + heap, in bank 4
GS/OS Loader places each segment, applies inter-segment relocations
(every `JSL foo` where `foo` lives in a different segment becomes a
`JSL <segment-relative-addr>` patched at load time with the absolute
address), and jumps to the entry.
## The four hard problems
### 1. Section → segment assignment policy
We need a deterministic rule that maps every input object's `.text` /
`.rodata` / `.bss` / `.init_array` section into a specific segment.
Three options:
**A. Per-object → one segment.** Each `.o` becomes one segment. Simple
mental model; bad locality (many tiny segments, lots of inter-segment
JSLs); GS/OS Loader has 8KB+ minimum overhead per segment.
**B. Greedy bin-packing.** Compute total code size; cap each segment at
N bytes (e.g. 32KB to leave headroom); pack `.text` sections into
segments greedily in input order. Same for `.rodata` / `.bss`.
Predictable, but a function near the end of segment N might want to
JSL a function at the start of segment N+1 — common pattern, every
call becomes inter-segment.
**C. Static call graph + clustering.** Compute call graph from the
relocations, cluster co-calling functions together, pack clusters into
segments to minimize inter-segment edges. Best locality, real linker
work.
**Recommendation: B for v1.** Add a `--segment-cap` option (default
32768). Real applications will want C eventually, but B unblocks
"my program is bigger than 64KB" today.
### 2. Inter-segment relocation tracking
When a `JSL foo` reloc resolves to a function in a different segment,
we MUST emit an OMF relocation record instead of patching the bytes
in-place. Currently `link816` patches everything at link time and emits
zero reloc records.
The reloc model becomes per-segment:
- Intra-segment IMM16 / PCREL: patch at link time, no OMF record.
- Intra-segment IMM24 (JSL): patch at link time (low 24 bits = segment-
relative offset for now; loader adjusts at load time when segment is
placed). Actually need OMF reloc here too because we don't know the
load bank.
- Inter-segment IMM24 (cross-bank JSL): emit `INTERSEG` opcode (`E2`)
pointing at `(target_segment_num, offset_within_segment)`.
- Inter-segment IMM16 data ref: requires the data segment to land in
the same bank as the referencing code OR we need the loader to fail
(16-bit absolute can't cross banks). In v1, force all data refs to be
to a "data segment" that's in a fixed bank, OR rewrite to long
addressing.
The IMM16 cross-segment problem is the killer. Three responses:
i. **Punt:** Disallow it. All `.rodata` references must be in the
same segment as the code, OR refs to global data must use long
addressing (rewrite at compile time via `__attribute__((far))`).
ii. **Promote to long at link time:** Detect IMM16 cross-segment
refs, rewrite the instruction's encoding from absolute (3-byte)
to absolute-long (4-byte). Changes code size, shifts everything
after the patched site — invasive.
iii. **Same-bank constraint:** Ensure the data segment's bank ==
the code's DBR. Means all code segments share one DBR, all data
lives in one segment in that DBR's bank.
**Recommendation: iii for v1.** All `.rodata` lives in one segment
in the bank our code uses for DBR. We already pin DBR to bank 0 in
crt0 (well, code does `pha;plb` for bank 2 sometimes for tests, but
not in general). For v1, all `.rodata` goes in bank 0 alongside the
first text segment, and code segments in higher banks reference data
via long absolute addressing. Need to confirm what addressing modes
our backend actually emits for global access.
### 3. crt0 / loader contract
Current crt0 assumes flat layout:
```
__start:
setup CPU mode, stack
enable LC RAM
zero BSS [__bss_start..__bss_end]
run .init_array
jsl main
spin
```
Multi-segment changes:
- BSS may span multiple segments (bank 0 LC + bank N segment). The
`__bss_start` / `__bss_end` symbols need to be per-segment, OR a
loop over a list of `(start, end)` pairs the linker emits.
- `.init_array` ditto.
- LC RAM enable only applies to bank 0 — fine.
- The OMF Loader will handle the actual memory placement; crt0 just
runs after Loader is done.
- The Loader's entry call lands at the segment marked with the entry
field. By convention that's segment 1.
**Decision:** Designate segment 1 as "init segment" containing crt0 +
its required symbols (`__bss_start_seg1`, `__init_array_start_seg1`,
etc.) and the linker emits a `__bss_table` and `__init_array_table`
arrays of `(start, end)` pointers walked by crt0. Same idea Mac OS X's
loader uses for multi-segment programs.
### 4. Build pipeline + tests
- `link816 --segment-cap N` emits multiple `(image, base, syms)`
triples plus inter-segment reloc records.
- New intermediate format between linker and `omfEmit`: a small
manifest file listing each segment's body, base, name, and reloc
records. Easier than passing all that on the CLI.
- `omfEmit` reads the manifest and emits a single multi-segment OMF
file with proper INTERSEG opcodes.
- Smoke needs new test: build a program with `--segment-cap 8192` so it
forces ≥2 segments even for our small benches; verify under MAME via
a GS/OS-loader-aware test path. (We don't have GS/OS-loaded tests
today — see "Risks" below.)
## Phased implementation
### Phase 1: linker emits per-segment images + manifest
- `link816 --segment-cap N --manifest manifest.json -o out`
- Pack `.text` greedy into segments 1..K capped at N bytes each.
- All `.rodata` into segment K+1 (the "data segment").
- All `.bss` into segment K+2.
- Resolve intra-segment relocations.
- Write inter-segment relocations into the manifest.
- Emit one flat binary per segment; manifest references them by path.
### Phase 2: omfEmit consumes manifest, emits multi-segment OMF
- One OMF segment header per manifest entry.
- DS opcodes for body bytes.
- INTERSEG (`E2`) opcodes for inter-segment reloc patch sites.
- RELOC (`E0`) opcodes for intra-segment relocations that need
load-time fixup (JSL targets within same segment but different bank
than expected).
- END opcode terminator per segment.
### Phase 3: runtime updates
- Linker emits `__bss_table[]` and `__init_array_table[]` instead of
single `__bss_start`/`__bss_end` symbols.
- crt0 walks those tables.
- `crt0.s` removes the LC-enable hardcoding from segment 1 if segment
1 isn't bank 0 (configurable).
### Phase 4: tests + smoke
- Bench harness builds with `--segment-cap 8192` to force multi-segment
even for small programs; verify output size growth (should be small —
just OMF headers + reloc records overhead).
- Need a GS/OS-aware MAME test path (boot a ProDOS volume with our OMF
binary, let GS/OS Loader load it, check markers in bank 2). This is
the test we deferred earlier in the GS/OS smoke task. **Phase 4
reopens the GS/OS-volume smoke decision** — multi-segment is the
main reason to even care about that.
## Scope estimate
- Phase 1: 2-3 sessions (linker rework, careful with reloc accounting)
- Phase 2: 1 session (mostly OMF format work, well-specified)
- Phase 3: 1 session (crt0 + linker symbol table changes)
- Phase 4: 2-3 sessions (GS/OS-loaded test infra is the slog, not the
multi-segment logic itself)
Total: ~6-8 focused sessions. Phases 1-3 deliver a working multi-
segment binary; phase 4 makes it testable in CI.
## Risks
- **DBR management is genuinely tricky.** Code in segment 2 (bank 2)
doing `lda foo` where foo is in segment K+1 (bank 0): the absolute
fetch uses DBR. If DBR != bank-of-foo, we read garbage. The cleanest
rule (DBR=0 always; data refs use long via `__attribute__((far))` or
a backend pass that promotes them) requires backend cooperation
we don't have. v1's "all data in one segment in DBR's bank" works
but constrains data size to ~60KB.
- **The Loader's behaviour around segment placement is poorly
documented.** Apple's GS/OS Loader picks banks dynamically; we may
end up with code segments in banks the loader chose, with relocations
that work, but layouts that surprise us. Mitigation: use STATIC
segments (KIND bit) initially so the loader can't move them.
- **Smoke needs a real GS/OS volume image.** This is the same blocker
as the deferred GS/OS file I/O smoke — needs a 2img/po image with
ProDOS volume + a way to run our OMF through the actual loader.
Without that, multi-segment logic is testable only by inspection of
the OMF bytes and a hand-rolled mini-loader (which we'd have to
write).
## Recommendation
Start Phase 1. The linker work is contained, mostly mechanical, and
the manifest format gives us a clean handoff to `omfEmit` work in
Phase 2. We can validate Phase 1 by inspecting the per-segment images
+ manifest before any OMF / loader work.
Phase 4's GS/OS-volume test path is the biggest unknown. Reasonable to
defer that decision until Phases 1-3 are working — at that point we
can decide whether to invest in proper GS/OS-loaded smoke or accept
"multi-segment OMF emits valid bytes per the spec" as the test bar.

View file

@ -32,6 +32,7 @@ cc() {
} }
asm "$SRC/crt0.s" asm "$SRC/crt0.s"
asm "$SRC/crt0Gsos.s"
asm "$SRC/libgcc.s" asm "$SRC/libgcc.s"
cc "$SRC/libc.c" cc "$SRC/libc.c"
cc "$SRC/strtol.c" cc "$SRC/strtol.c"

155
runtime/src/crt0Gsos.s Normal file
View file

@ -0,0 +1,155 @@
; crt0Gsos.s — GS/OS S16 application crt0.
;
; Use this INSTEAD OF crt0.s when building an OMF that the real
; GS/OS Loader will launch. Differences from crt0.s:
; - No SEI / interrupt-source clearing (GS/OS owns the IRQ chain).
; - No language-card RAM enable (GS/OS configures memory).
; - No stack base reset (GS/OS allocated and set our SP).
; - Honors GS/OS's DBR=our-bank, DP=allocated-page setup.
; - On main() return, calls GS/OS QUIT(pcount=2) to chain to a
; known next application (default: /SYSTEM/START.ORIG which
; test setups must save off the original boot launcher to).
;
; Entry from the System Loader (per Apple IIgs Toolbox Reference):
; E=0 (native), M=0 (16-bit accumulator), X=0 (16-bit index)
; DBR = the bank into which our entry segment was placed
; DP = pointer to a Memory-Manager-allocated DP page
; Stack at entry, top-down:
; PCL PCH PBR (3 bytes — JSL return addr to launcher)
; flags-lo flags-hi (2 bytes — launch flags)
; path-low path-mid path-bank pad (4 bytes — pathname long ptr)
;
; QUIT discards the entire stack so we never need to pop the launch
; frame ourselves.
.text
.globl __start
__start:
; Set DP=0. The C compiler assumes DP=0 for all `sta dp` and
; `[dp],y`-style accesses; GS/OS hands us a Memory-Manager-
; allocated DP page that we discard.
rep #0x30
lda #0
tcd
; BSS zero-init. With DBR=our bank, `stz abs,X` writes to
; ourBank:X — correct as long as __bss_start/__bss_end fit in
; the segment's bank.
rep #0x30
ldx #__bss_start
.Lbss_loop:
cpx #__bss_end
bcs .Lbss_done
sep #0x20
stz 0x0000, x
rep #0x20
inx
bra .Lbss_loop
.Lbss_done:
; Walk .init_array (C++ ctors).
;
; ⚠ KNOWN BROKEN under real GS/OS Loader for non-zero-bank
; placement: `jsl __jsl_indir` bakes a bank-0 operand at link
; time. When the Loader places us at bank $1f or similar, the
; JSL targets bank 0 (= GS/OS code) instead of our actual bank
; — so this loop crashes if init_array has any entries. Same
; applies to `jsl main` below. Closing the gap requires either
; RELOC opcode emission in omfEmit (so the Loader patches the
; JSL bank bytes at load time) or runtime self-patching of JSL
; opcodes in crt0. Tracked separately.
rep #0x30
ldx #__init_array_start
.Linit_loop:
cpx #__init_array_end
bcs .Linit_done
stx 0xe0
ldy #0
lda (0xe0), y
sta __indirTarget
phx
jsl __jsl_indir
plx
inx
inx
bra .Linit_loop
.Linit_done:
; Call main. Standard W65816 C ABI: arg0 in A; we pass none.
rep #0x30
jsl main
; ---- QUIT (pcount=2) chain to gChainPath ---------------------
; Parm block layout in DP $80..$87:
; $80,$81 pcount = 2
; $82..$85 pathname long ptr (lo, mid, bank, pad)
; $86,$87 flags = 0
;
; The path is a GSString (2-byte length + chars). It must live
; in bank-0 memory (GS/OS reads parm fields as bank-0). DP is in
; bank 0, so we copy the GSString from our segment into DP $A0.
rep #0x30
; Copy length byte first to compute total bytes to copy.
sep #0x20
lda gChainPath ; low byte of GSString length
clc
adc #2 ; +2 for the length word itself
tay ; Y = bytes to copy (paths < 256 chars)
rep #0x20
ldx #0
.LcopyPath:
sep #0x20
lda gChainPath, x ; DBR-relative read (DBR = our bank)
sta 0xa0, x ; DP write (in bank 0)
rep #0x20
inx
dey
bne .LcopyPath
; Build parm block at DP $80.
rep #0x30
lda #2
sta 0x80 ; pcount
tdc
clc
adc #0xa0
sta 0x82 ; pathname long-ptr low+mid 16
lda #0
sta 0x84 ; bank byte (0) + pad byte (0)
sta 0x86 ; flags = 0
; Push 32-bit parm-block pointer (low half + bank-0).
tdc
clc
adc #0x80
pha
pea 0
ldx #0x2029 ; QUIT class-1 call number
jsl 0xe100a8 ; GS/OS dispatcher
; QUIT only returns on failure. Clean up + BRK.
pla
pla
.byte 0x00, 0x00
.size __start, . - __start
; gChainPath — GSString chain target for QUIT after main(). Default
; is "/SYSTEM/START.ORIG" (saved-original boot launcher). Programs
; that need a different target must rename this symbol; the linker
; resolves whichever def is present.
;
; GSString: 2-byte length word + N chars. Length here = 18
; ("/SYSTEM/START.ORIG").
.section .rodata,"a"
.globl gChainPath
gChainPath:
.byte 18, 0
.ascii "/SYSTEM/START.ORIG"

View file

@ -322,9 +322,17 @@ __mulsi3:
; Clear running product at $e8/$ea. ; Clear running product at $e8/$ea.
stz 0xe8 stz 0xe8
stz 0xea stz 0xea
; Loop 32 times: examine LSB of multiplier, conditionally add ; Fast path: if multiplier's high half ($e2) is 0, we only
; multiplicand to product, then shift multiplier right and ; need 16 loop iterations (the full 32-iter shift-out would
; multiplicand left. Use Y as a 16-bit counter (X mode = 16). ; just shift in zeros after iter 16). Common in C code where
; both source operands are zext'd from i16 — e.g. `i*i` with
; i a `unsigned short`. Saves ~half the multiply cycles in
; that case (sumOfSquares: 80000 → ~40000 cyc/call).
lda 0xe2
bne .Lmulsi_full
ldy #0x10
bra .Lmulsi_loop
.Lmulsi_full:
ldy #0x20 ldy #0x20
.Lmulsi_loop: .Lmulsi_loop:
; Test bit 0 of multiplier (lo word). ; Test bit 0 of multiplier (lo word).

136
scripts/benchCyclesPrecise.sh Executable file
View file

@ -0,0 +1,136 @@
#!/usr/bin/env bash
# benchCyclesPrecise.sh — measure per-call cycle counts via the
# emu.time()-based runner (scripts/runInMameCycles.sh).
#
# For each benchmark in benchmarks/, build a wrapper that calls the
# bench function ITERS times between START / DONE markers; the runner
# captures emulated time and converts to cycles assuming the IIgs
# slow-mode clock (1023000 Hz — IIe-compatible default; our binary
# doesn't enable fast mode unless its wrapper does).
#
# Output: markdown table with cycles-per-call. Both clang and the
# Calypsi numbers (from `tools/calypsi/cc65816`) are reported when
# Calypsi is installed.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
BENCH_DIR="$PROJECT_ROOT/benchmarks"
CLANG="$PROJECT_ROOT/tools/llvm-mos-build/bin/clang"
LLVM_MC="$PROJECT_ROOT/tools/llvm-mos-build/bin/llvm-mc"
LINK="$PROJECT_ROOT/tools/link816"
RUNNER="$PROJECT_ROOT/scripts/runInMameCycles.sh"
oCrt0=$(mktemp --suffix=.o)
oLibgcc=$(mktemp --suffix=.o)
"$LLVM_MC" -arch=w65816 -filetype=obj "$PROJECT_ROOT/runtime/src/crt0.s" -o "$oCrt0"
"$LLVM_MC" -arch=w65816 -filetype=obj "$PROJECT_ROOT/runtime/src/libgcc.s" -o "$oLibgcc"
# Per-benchmark inputs / extern decls (mirrors benchCycles.sh).
benchInputs() {
case "$1" in
sumOfSquares) echo 'sumOfSquares(50)';;
fib) echo 'fib(10)';;
strcpy) echo 'mystrcpy(dst, "hello world!")';;
memcmp) echo 'mymemcmp("hello", "hello", 5)';;
bsearch) echo 'bsearch(arr, 8, 5)';;
dotProduct) echo 'dotProduct(va, vb, 4)';;
popcount) echo 'popcount(0x12345678UL)';;
crc32) echo 'crc32((const unsigned char *)"hello", 5)';;
*) echo "/* unknown */";;
esac
}
benchExtern() {
case "$1" in
sumOfSquares) echo 'extern unsigned long sumOfSquares(unsigned short n);';;
fib) echo 'extern unsigned short fib(unsigned short n);';;
strcpy) echo 'extern char *mystrcpy(char *d, const char *s); static char dst[16];';;
memcmp) echo 'extern int mymemcmp(const void *a, const void *b, unsigned int n);';;
bsearch) echo 'extern int bsearch(const int *arr, int n, int key); static const int arr[] = {1,2,3,4,5,6,7,8};';;
dotProduct) echo 'extern long dotProduct(const short *a, const short *b, unsigned int n); static const short va[] = {1,2,3,4}; static const short vb[] = {5,6,7,8};';;
popcount) echo 'extern int popcount(unsigned long x);';;
crc32) echo 'extern unsigned long crc32(const unsigned char *p, unsigned int n);';;
*) echo '';;
esac
}
# How many iterations to run each bench for. Bigger = more
# precise (smaller relative measurement noise) but longer runtime.
# Heavy benches get fewer iters; cheap benches get more.
benchIters() {
case "$1" in
sumOfSquares) echo 50;; # ~1600 cyc/call → ~80k cyc total
fib) echo 100;;
strcpy) echo 200;;
memcmp) echo 500;;
bsearch) echo 200;;
dotProduct) echo 200;;
popcount) echo 500;;
crc32) echo 200;;
*) echo 100;;
esac
}
runOneBench() {
local name="$1"
local extern_decl call_expr iters
extern_decl=$(benchExtern "$name")
call_expr=$(benchInputs "$name")
iters=$(benchIters "$name")
if [ -z "$extern_decl" ] || [ "$call_expr" = "/* unknown */" ]; then
echo "(no input config)"; return
fi
local cwrap obench owrap bin
cwrap=$(mktemp --suffix=.c)
owrap=$(mktemp --suffix=.o)
obench=$(mktemp --suffix=.o)
bin=$(mktemp --suffix=.bin)
cat > "$cwrap" <<EOF
$extern_decl
__attribute__((noinline)) static void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
volatile unsigned long sink;
#define ITERS $iters
int main(void) {
switchToBank2();
/* warm-up */
for (int w = 0; w < 5; w++) sink = (unsigned long)($call_expr);
*(volatile unsigned short *)0x5000 = 0xa1a1; /* START */
for (int i = 0; i < ITERS; i++) sink = (unsigned long)($call_expr);
*(volatile unsigned short *)0x5002 = 0xa2a2; /* DONE */
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cwrap" -o "$owrap" 2>/dev/null \
|| { echo "compile-fail"; rm -f "$cwrap" "$owrap"; return; }
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$BENCH_DIR/$name.c" -o "$obench" 2>/dev/null \
|| { echo "compile-fail"; rm -f "$cwrap" "$owrap" "$obench"; return; }
"$LINK" -o "$bin" --text-base 0x1000 "$oCrt0" "$oLibgcc" "$owrap" "$obench" 2>/dev/null \
|| { echo "link-fail"; rm -f "$cwrap" "$owrap" "$obench" "$bin"; return; }
local val
val=$(bash "$RUNNER" "$bin" "$iters" 2>&1 | grep -oE 'cyc_per_call=[0-9.]+' | head -1 | sed 's/cyc_per_call=//')
rm -f "$cwrap" "$owrap" "$obench" "$bin"
if [ -z "$val" ]; then
echo "(no read)"
else
printf '%.0f cyc/call' "$val"
fi
}
printf '| Benchmark | Per-call cycles (clang) |\n'
printf '|-----------|------------------------:|\n'
for src in "$BENCH_DIR"/*.c; do
name=$(basename "$src" .c)
result=$(runOneBench "$name")
printf '| %s | %s |\n' "$name" "$result"
done
rm -f "$oCrt0" "$oLibgcc"

109
scripts/runInMameCycles.sh Executable file
View file

@ -0,0 +1,109 @@
#!/usr/bin/env bash
# runInMameCycles.sh — measure emulated CPU time between START / DONE
# markers via MAME's emu.time().
#
# Usage: runInMameCycles.sh <binary> <iters>
# binary: 65816 image to load at $00:1000
# iters: number of bench iterations the binary ran (used to
# normalize delta to per-iteration cycles)
#
# The binary MUST:
# 1. Switch DBR to bank 2 (so the marker writes are observable
# at $025000 / $025002 — bank 0 there is also fine but harder
# to find atomically).
# 2. Write 0xA1A1 to $025000 *immediately before* the bench loop.
# 3. Write 0xA2A2 to $025002 *immediately after* the bench loop.
# 4. while(1){} after the DONE marker.
#
# Output (stdout):
# MAME-CYCLES iters=N delta_us=... cyc_per_call=... start_us=... done_us=...
# Exit 0 on success, 1 on time-out / missing markers.
#
# IIgs CPU clock rate. MAME's apple2gs starts in IIgs slow mode
# (1.023 MHz, IIe-compatible) until the IIgs ROM enables fast mode
# via $C036. We're booting our binary directly without going through
# the ROM, so we stay in slow mode unless the binary itself writes
# $80 to $C036. For the cycle harness we calibrate against slow
# mode (1023000 Hz) — both clang and Calypsi binaries run under
# the same emulator state, so the ratio is what matters. If you
# want fast-mode numbers, have the bench wrapper enable it.
set -euo pipefail
source "$(dirname "$0")/common.sh"
BIN="$1"
ITERS="${2:-100}"
SECS=10
CLOCK_HZ=1023000
[ -f "$BIN" ] || die "binary not found: $BIN"
LUA_PATH=$(mktemp --suffix=.lua)
trap 'rm -f "$LUA_PATH"' EXIT
cat > "$LUA_PATH" <<EOF
local frame = 0
local loaded = false
local start_t = nil
local done_t = nil
emu.register_frame_done(function()
frame = frame + 1
local cpu = manager.machine.devices[":maincpu"]
local mem = cpu.spaces["program"]
if frame == 30 and not loaded then
local f = io.open("$BIN", "rb")
if not f then print("BIN-MISSING"); manager.machine:exit(); return end
local data = f:read("*all"); f:close()
for i = 1, #data do
local addr = 0x001000 + i - 1
if not (addr >= 0x00C000 and addr < 0x00D000) then
mem:write_u8(addr, data:byte(i))
end
end
loaded = true
cpu.state["PC"].value = 0x1000
cpu.state["PB"].value = 0x00
cpu.state["DB"].value = 0x00
cpu.state["D"].value = 0x00
cpu.state["P"].value = 0x34
cpu.state["E"].value = 0
cpu.state["S"].value = 0x01FF
print("MAME-LOADED bytes=" .. #data)
return
end
if not loaded then return end
-- Poll markers on every frame after load. Capture emu.time()
-- the first frame each marker appears.
if not start_t and mem:read_u16(0x025000) == 0xa1a1 then
start_t = emu.time()
print(string.format("MAME-MARK START frame=%d t=%.9f", frame, start_t))
end
if start_t and not done_t and mem:read_u16(0x025002) == 0xa2a2 then
done_t = emu.time()
print(string.format("MAME-MARK DONE frame=%d t=%.9f", frame, done_t))
local delta = done_t - start_t
local delta_us = delta * 1e6
local cyc = delta * $CLOCK_HZ
local per_call = cyc / $ITERS
print(string.format("MAME-CYCLES iters=$ITERS delta_us=%.3f total_cyc=%.0f cyc_per_call=%.2f",
delta_us, cyc, per_call))
manager.machine:exit()
end
end)
EOF
OUT=$(timeout 60 mame apple2gs \
-rompath "$PROJECT_ROOT/tools/mame/roms" \
-plugins -autoboot_script "$LUA_PATH" \
-window -sound none -nothrottle -seconds_to_run "$SECS" 2>&1 | grep "^MAME-")
echo "$OUT"
if echo "$OUT" | grep -q "MAME-CYCLES"; then
exit 0
fi
warn "no MAME-CYCLES output — markers not observed within $SECS sec"
exit 1

127
scripts/runMultiSeg.sh Executable file
View file

@ -0,0 +1,127 @@
#!/usr/bin/env bash
# runMultiSeg.sh — run a multi-segment program in MAME via a
# mini in-Lua loader. Reads the link816 manifest, loads each
# segment's image at its base address, sets PC to segment 1's
# entry, lets the program run, then reads check-address values.
#
# Usage: runMultiSeg.sh <manifest.json> [check args like runInMame.sh]
set -euo pipefail
source "$(dirname "$0")/common.sh"
MANIFEST="$1"
shift
SECS=3
# Build address list as Lua table entries, mirroring runInMame.sh.
LUA_CHECKS=""
EXPECT_LIST=()
ADDR_LIST=()
if [ "$1" = "--check" ]; then
shift
for pair in "$@"; do
ADDR="${pair%=*}"
EXP="${pair#*=}"
ADDR_LIST+=("$ADDR")
EXPECT_LIST+=("$EXP")
LUA_CHECKS="$LUA_CHECKS print(string.format('MAME-READ addr=0x%06x val=0x%04x', $ADDR, mem:read_u16($ADDR)))"$'\n'
done
else
ADDR="$1"
EXP="$2"
ADDR_LIST+=("$ADDR")
EXPECT_LIST+=("$EXP")
LUA_CHECKS="print(string.format('MAME-READ addr=0x%06x val=0x%04x', $ADDR, mem:read_u16($ADDR)))"
fi
[ -f "$MANIFEST" ] || die "manifest not found: $MANIFEST"
# Parse manifest with python (every machine has it). Emit a Lua
# table of (image_path, base, entry_offset_from_seg1).
PARSED=$(python3 - <<EOF
import json, os, sys
m = json.load(open("$MANIFEST"))
for s in m["segments"]:
base = int(s["base"], 16)
entry = int(s.get("entry_offset", "0x0"), 16) if s["num"] == 1 else 0
sz = s["size"]
print(f'{s["image"]}|{base}|{entry}|{sz}')
EOF
)
[ -n "$PARSED" ] || die "manifest parse failed"
LUA_PATH=$(mktemp --suffix=.lua)
trap 'rm -f "$LUA_PATH"' EXIT
# Build the per-segment load lua.
LOAD_LUA=""
ENTRY_BASE=0
ENTRY_OFF=0
while IFS='|' read -r img base entry sz; do
LOAD_LUA="$LOAD_LUA
do
local f = io.open('$img', 'rb')
if not f then print('SEG-MISSING $img'); manager.machine:exit(); return end
local data = f:read('*all'); f:close()
for i = 1, #data do
local addr = $base + i - 1
if not (addr >= 0x00C000 and addr < 0x00D000) then
mem:write_u8(addr, data:byte(i))
end
end
print('SEG-LOADED base=0x' .. string.format('%06x', $base) .. ' bytes=' .. #data)
end
"
if [ "$entry" != "0" ] || [ "$ENTRY_BASE" = "0" ]; then
ENTRY_BASE="$base"
ENTRY_OFF="$entry"
fi
done <<< "$PARSED"
cat > "$LUA_PATH" <<EOF
local frame = 0
local loaded = false
emu.register_frame_done(function()
frame = frame + 1
if frame == 30 and not loaded then
local cpu = manager.machine.devices[":maincpu"]
local mem = cpu.spaces["program"]
$LOAD_LUA
loaded = true
cpu.state["PC"].value = $ENTRY_BASE + $ENTRY_OFF
cpu.state["PB"].value = ($ENTRY_BASE >> 16) & 0xff
cpu.state["DB"].value = 0x00
cpu.state["D"].value = 0x00
cpu.state["P"].value = 0x34
cpu.state["E"].value = 0
cpu.state["S"].value = 0x01FF
print('MAME-READY pc=0x' .. string.format('%06x', $ENTRY_BASE + $ENTRY_OFF))
end
if frame == 60 then
local cpu = manager.machine.devices[":maincpu"]
local mem = cpu.spaces["program"]
$LUA_CHECKS
manager.machine:exit()
end
end)
EOF
OUT=$(timeout 30 mame apple2gs \
-rompath "$PROJECT_ROOT/tools/mame/roms" \
-plugins -autoboot_script "$LUA_PATH" \
-window -sound none -nothrottle -seconds_to_run "$SECS" 2>&1 | grep -E "^(MAME-|SEG-)")
echo "$OUT"
mapfile -t GOT_LIST < <(printf '%s\n' "$OUT" | grep -oE 'val=0x[0-9a-f]+' | sed 's/val=0x//')
ok=1
for i in "${!EXPECT_LIST[@]}"; do
if [ "${GOT_LIST[$i]:-}" != "${EXPECT_LIST[$i]}" ]; then
warn "MAME mismatch at ${ADDR_LIST[$i]}: got 0x${GOT_LIST[$i]:-MISSING} expected 0x${EXPECT_LIST[$i]}"
ok=0
fi
done
if [ $ok -eq 1 ]; then
log "MAME (multi-seg) OK: ${#EXPECT_LIST[@]} reads matched"
exit 0
fi
exit 1

113
scripts/runViaFinder.sh Executable file
View file

@ -0,0 +1,113 @@
#!/usr/bin/env bash
# runViaFinder.sh — boot real GS/OS 6.0.2 in MAME, drive Finder via
# Lua keyboard automation to launch a user OMF, sample memory at
# specific frames to verify the program executed.
#
# Usage: runViaFinder.sh <omf-file> --check <addr>=<value>...
# The OMF file is injected as /SYSTEM.DISK/HELLO (top-level on the
# boot disk). Lua then waits for Finder, types S+Cmd-O to open the
# System.Disk volume window, then H+Cmd-O to launch HELLO.
#
# Memory checks happen at frame 5400 (~90s emulated, well after the
# launch path completes) and exit 0 / 1 depending on whether each
# requested address holds the requested value.
#
# Requires:
# - tools/gsos/sys602.po (GS/OS 6.0.2 boot disk)
# - /tmp/cadius/cadius (forked-file-aware ProDOS tool)
# - mame apple2gs in PATH
set -euo pipefail
OMF="$1"
shift
[ -f "$OMF" ] || { echo "missing: $OMF" >&2; exit 2; }
[ "${1:-}" = "--check" ] || { echo "usage: $0 <omf> --check <addr>=<val>..." >&2; exit 2; }
shift
PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
CADIUS=${CADIUS:-/tmp/cadius/cadius}
SYSDISK=${SYSDISK:-$PROJECT_ROOT/tools/gsos/sys602.po}
[ -x "$CADIUS" ] || { echo "cadius not found at $CADIUS" >&2; exit 2; }
[ -f "$SYSDISK" ] || { echo "sysdisk not found at $SYSDISK" >&2; exit 2; }
WORK=$(mktemp -d -t finderlaunch.XXXXXX)
trap 'rm -rf "$WORK"' EXIT
cp "$SYSDISK" "$WORK/disk.po"
cp "$OMF" "$WORK/HELLO#B30000"
"$CADIUS" ADDFILE "$WORK/disk.po" /SYSTEM.DISK "$WORK/HELLO#B30000" >/dev/null
LUA_CHECKS=""
EXPECTS=()
for pair in "$@"; do
[ "$pair" = "--check" ] && continue
addr="${pair%=*}"; val="${pair#*=}"
EXPECTS+=("$pair")
LUA_CHECKS="$LUA_CHECKS print(string.format('MAME-READ %s=%02x', '$addr', mem:read_u8($addr)))"$'\n'
done
cat > "$WORK/finder.lua" <<LUA
-- Boot Finder, navigate to HELLO icon, launch via Cmd-O.
local cpu = manager.machine.devices[":maincpu"]
local mem = cpu.spaces["program"]
local nat = manager.machine.natkeyboard
local frame = 0
local idx = 1
local function get_field(port, name)
local p = manager.machine.ioport.ports[port]
if p == nil then return nil end
return p.fields[name]
end
local key_cmd = get_field(":macadb:KEY3", "Command / Open Apple")
local function press(f) if f then f:set_value(1) end end
local function release(f) if f then f:set_value(0) end end
-- Keystroke timeline: open System.Disk, then launch HELLO.
local steps = {
{3300, function() nat:post("S") end}, -- select System.Disk
{3540, function() press(key_cmd) end},
{3546, function() nat:post("o") end}, -- Cmd-O opens volume
{3600, function() release(key_cmd) end},
{4200, function() nat:post("H") end}, -- select HELLO
{4500, function() press(key_cmd) end},
{4506, function() nat:post("o") end}, -- Cmd-O launches
{4560, function() release(key_cmd) end},
{5400, function()
$LUA_CHECKS
manager.machine:exit()
end},
}
emu.register_frame_done(function()
frame = frame + 1
while idx <= #steps and frame >= steps[idx][1] do
steps[idx][2]()
idx = idx + 1
end
end)
LUA
OUT=$(timeout 130 mame apple2gs -rompath "$PROJECT_ROOT/tools/mame/roms" \
-window -nothrottle -sound none \
-seconds_to_run 110 -flop3 "$WORK/disk.po" \
-autoboot_script "$WORK/finder.lua" </dev/null 2>&1)
# Verify each expected value.
fail=0
for pair in "${EXPECTS[@]}"; do
addr="${pair%=*}"; want="${pair#*=}"
line=$(echo "$OUT" | grep "MAME-READ $addr=" | tail -1)
got=$(echo "$line" | sed -E 's/.*=([0-9a-f]+).*/\1/')
# Compare numerically (handles case differences and 0x prefix variants).
gotN=$(printf '%d' "0x$got" 2>/dev/null || echo -1)
wantN=$(printf '%d' "$want" 2>/dev/null || echo -2)
if [ "$gotN" = "$wantN" ]; then
echo " $addr = 0x$got (want $want) ✓"
else
echo " $addr = 0x$got (want $want) ✗"
fail=1
fi
done
exit $fail

View file

@ -4424,6 +4424,48 @@ EOF
fi fi
rm -f "$cShFile" "$oShFile" "$binShFile" rm -f "$cShFile" "$oShFile" "$binShFile"
# Multi-segment link: --segment-cap forces >1 text segments
# at bank-aligned bases; mini multi-segment loader
# (scripts/runMultiSeg.sh) loads each + runs. helper3(10,20)
# chains compute → helper1 → helper2 → helper3 across
# whatever segment boundaries the packer landed on; result
# must be 0xBF ((31+61)*2+7 = 191). Verifies (a) text
# splitting at the cap, (b) bank-aligned segment placement,
# (c) cross-bank JSL works.
log "check: link816 --segment-cap splits text + cross-bank JSL works"
cMsegFile="$(mktemp --suffix=.c)"
oMsegFile="$(mktemp --suffix=.o)"
binMseg="$(mktemp --suffix=.bin)"
mfMseg="$(mktemp --suffix=.json)"
cat > "$cMsegFile" <<'EOF'
__attribute__((noinline)) static void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
__attribute__((noinline)) static int compute(int x) { return x * 3 + 1; }
__attribute__((noinline)) static int helper1(int a, int b) { return compute(a) + compute(b); }
__attribute__((noinline)) static int helper2(int a, int b) { return helper1(a, b) * 2; }
__attribute__((noinline)) static int helper3(int a, int b) { return helper2(a, b) + 7; }
int main(void) {
switchToBank2();
int r = helper3(10, 20);
*(volatile unsigned short *)0x5000 = (unsigned short)r;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cMsegFile" -o "$oMsegFile"
"$PROJECT_ROOT/tools/link816" -o "$binMseg" --text-base 0x1000 \
--segment-cap 512 --manifest "$mfMseg" \
"$oCrt0F" "$oLibgccFile" "$oMsegFile" >/dev/null 2>&1
if ! grep -q '"num": 2' "$mfMseg"; then
die "link816 --segment-cap 512 did not split into multiple segments"
fi
if ! bash "$PROJECT_ROOT/scripts/runMultiSeg.sh" "$mfMseg" --check \
0x025000=00bf </dev/null >/dev/null 2>&1; then
die "MAME: multi-segment helper3(10,20) != 0xBF"
fi
rm -f "$cMsegFile" "$oMsegFile" "$binMseg" "$mfMseg" \
"${binMseg%.bin}".seg*.bin
rm -f "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \ rm -f "$oLibcF" "$oStrtolF" "$oSnprintfF" "$oQsortF" \
"$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oCrt0F" "$oExtrasF" "$oStrtokF" "$oMathF" "$oSfF" "$oSdF" "$oCrt0F"
else else
@ -4921,12 +4963,203 @@ EOF
if [ ! -s "$omfFile" ]; then if [ ! -s "$omfFile" ]; then
die "omfEmit produced empty/missing OMF" die "omfEmit produced empty/missing OMF"
fi fi
# Sanity-check the OMF: VERSION byte at offset 15 should be 0x21 # Sanity-check the OMF. VERSION byte at offset 15 is the OMF
# (OMF v2.1). KIND at offset 20-21 should be 0x0000 (CODE). # spec enum: 0x00=v1.0, 0x01=v2.0, 0x02=v2.1. Real GS/OS apps
# all have 0x02 — it is not BCD "2.1" as some online docs suggest.
# KIND at offset 20-21 should be 0x1000 (CODE|PRIV) — verified
# via Merlin32 reference: Merlin's hello.s16 with KIND=0x1000
# ran successfully under MAME-Lua-driven Finder launch on real
# GS/OS 6.0.2 (marker bytes at $00/0078 set to $42/$99 confirmed).
# LABLEN must be 10 (fixed-width space-padded names) — LABLEN=0
# (length-prefixed) is in the spec but not Loader-launchable.
ver=$(od -An -tx1 -N 1 -j 15 "$omfFile" | tr -d ' ') ver=$(od -An -tx1 -N 1 -j 15 "$omfFile" | tr -d ' ')
if [ "$ver" != "21" ]; then if [ "$ver" != "02" ]; then
die "OMF version byte at offset 15 is 0x$ver (expected 0x21 = v2.1)" die "OMF version byte at offset 15 is 0x$ver (expected 0x02 = v2.1)"
fi fi
lablen=$(od -An -tu1 -N 1 -j 13 "$omfFile" | tr -d ' ')
if [ "$lablen" != "10" ]; then
die "OMF LABLEN is $lablen (expected 10 = fixed-width names)"
fi
kindLo=$(od -An -tx1 -N 1 -j 20 "$omfFile" | tr -d ' ')
kindHi=$(od -An -tx1 -N 1 -j 21 "$omfFile" | tr -d ' ')
if [ "$kindLo" != "00" ] || [ "$kindHi" != "10" ]; then
die "OMF KIND is 0x$kindHi$kindLo (expected 0x1000 = CODE|PRIV)"
fi
# Body opcode at offset DISPDATA: should be 0xF2 (LCONST, what
# every real GS/OS app segment uses).
dispdata=$(od -An -tu2 -N 2 -j 42 "$omfFile" | tr -d ' ')
bodyOp=$(od -An -tx1 -N 1 -j "$dispdata" "$omfFile" | tr -d ' ')
if [ "$bodyOp" != "f2" ]; then
die "OMF body opcode at offset $dispdata is 0x$bodyOp (expected 0xF2 LCONST)"
fi
# omfEmit --manifest path: read a link816 multi-segment manifest
# and emit one OMF segment per entry. Each segment header has
# KIND=0x8800 (STATIC|ABSBANK|CODE), ORG=base address, SEGNUM
# 1..N. Smoke just verifies we get N>1 segments at the expected
# bank-aligned ORGs; the actual loader-side execution is covered
# by the in-tree mini-loader in the multi-segment MAME smoke
# check above.
log "check: omfEmit --manifest produces valid multi-segment OMF"
cMomfFile="$(mktemp --suffix=.c)"
oMomfFile="$(mktemp --suffix=.o)"
binMomf="$(mktemp --suffix=.bin)"
mfMomf="$(mktemp --suffix=.json)"
omfMomf="$(mktemp --suffix=.omf)"
cCrt0Momf="$(mktemp --suffix=.o)"
oLgMomf="$(mktemp --suffix=.o)"
cat > "$cMomfFile" <<'EOF'
__attribute__((noinline)) static int compute(int x) { return x * 3 + 1; }
__attribute__((noinline)) static int helper1(int a, int b) { return compute(a) + compute(b); }
__attribute__((noinline)) static int helper2(int a, int b) { return helper1(a, b) * 2; }
int main(void) { return helper2(10, 20); }
EOF
"$LLVM_MC" -arch=w65816 -filetype=obj "$PROJECT_ROOT/runtime/src/crt0.s" -o "$cCrt0Momf"
"$LLVM_MC" -arch=w65816 -filetype=obj "$PROJECT_ROOT/runtime/src/libgcc.s" -o "$oLgMomf"
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cMomfFile" -o "$oMomfFile"
"$PROJECT_ROOT/tools/link816" -o "$binMomf" --text-base 0x1000 \
--segment-cap 256 --manifest "$mfMomf" \
"$cCrt0Momf" "$oLgMomf" "$oMomfFile" >/dev/null 2>&1
"$PROJECT_ROOT/tools/omfEmit" --manifest "$mfMomf" --output "$omfMomf" >/dev/null 2>&1
if [ ! -s "$omfMomf" ]; then
die "omfEmit --manifest produced empty/missing OMF"
fi
# Walk segments, count + verify KIND + ORG.
nSeg=$(python3 -c "
import struct
data = open('$omfMomf','rb').read()
pos = 0; n = 0; bad = 0
while pos < len(data):
n += 1
bytecnt = struct.unpack_from('<I', data, pos)[0]
kind = struct.unpack_from('<H', data, pos+20)[0]
if kind != 0x8800: bad += 1
pos += bytecnt
print(n if bad == 0 else 0)
")
if [ "${nSeg:-0}" -lt 2 ]; then
die "omfEmit --manifest: expected >=2 segments with KIND=0x8800, got $nSeg"
fi
rm -f "$cMomfFile" "$oMomfFile" "$binMomf" "$mfMomf" "$omfMomf" \
"$cCrt0Momf" "$oLgMomf" "${binMomf%.bin}".seg*.bin
# omfEmit --expressload: emit a 2-segment OMF where seg 1 is
# ~ExpressLoad (KIND=0x8001 DATA|STATIC) and seg 2 is the user
# code (KIND=0x8000 CODE|STATIC). Verifies the ExpressLoad load
# script structure: 8-byte header, segment list with self-rel
# offset, remap list, header info entry containing data offset
# that points exactly at user seg's LCONST data start (= body
# opcode offset + 5 for 0xF2 + 4-byte length).
log "check: omfEmit --expressload produces valid 2-seg ExpressLoad OMF"
cElFile="$(mktemp --suffix=.c)"
oElFile="$(mktemp --suffix=.o)"
binEl="$(mktemp --suffix=.bin)"
mapEl="$(mktemp --suffix=.map)"
omfEl="$(mktemp --suffix=.omf)"
cat > "$cElFile" <<'EOF'
int main(void) { return 0; }
EOF
"$CLANG" --target=w65816 -O2 -c "$cElFile" -o "$oElFile"
"$PROJECT_ROOT/tools/link816" -o "$binEl" --text-base 0x1000 \
--map "$mapEl" --no-gc-sections \
"$PROJECT_ROOT/runtime/crt0Gsos.o" "$oElFile" \
"$PROJECT_ROOT/runtime/libgcc.o" >/dev/null 2>&1
"$PROJECT_ROOT/tools/omfEmit" --input "$binEl" --map "$mapEl" \
--base 0x1000 --entry __start --output "$omfEl" \
--name HELLO --expressload >/dev/null 2>&1
if [ ! -s "$omfEl" ]; then
die "omfEmit --expressload produced empty/missing OMF"
fi
# Validate structure with Python.
python3 -c "
import struct, sys
b = open('$omfEl','rb').read()
seg1_bytecnt = struct.unpack_from('<I', b, 0)[0]
seg1_kind = struct.unpack_from('<H', b, 20)[0]
seg2_off = seg1_bytecnt
seg2_kind = struct.unpack_from('<H', b, seg2_off+20)[0]
seg2_dispdata = struct.unpack_from('<H', b, seg2_off+42)[0]
seg2_body_op = b[seg2_off + seg2_dispdata]
seg2_data_off = seg2_off + seg2_dispdata + 5
# Walk ExpressLoad: header(6 = 4-byte reserved + 2-byte count) + segtbl entry(8) + remap(2) + offsets(16) ...
el_data_start = 0x48 # body op @ 0x43, len bytes 4, data at 0x48
# Segment list entry 0 starts at offset 6 in ExpressLoad data (after 6-byte header)
sr = struct.unpack_from('<H', b, el_data_start + 6)[0]
hdrinfo_off = el_data_start + 6 + sr
data_off_in_hdrinfo = struct.unpack_from('<I', b, hdrinfo_off)[0]
errs = []
if seg1_kind != 0x8001: errs.append(f'seg1 KIND={seg1_kind:#x} (want 0x8001)')
if seg2_kind != 0x1000: errs.append(f'seg2 KIND={seg2_kind:#x} (want 0x1000)')
if seg2_body_op != 0xF2: errs.append(f'seg2 body op={seg2_body_op:#x} (want 0xf2)')
if data_off_in_hdrinfo != seg2_data_off:
errs.append(f'ExpressLoad data_off={data_off_in_hdrinfo:#x} != seg2 data start {seg2_data_off:#x}')
if errs: print('FAIL:', '; '.join(errs)); sys.exit(1)
print('OK')
" || die "omfEmit --expressload structure validation failed"
rm -f "$cElFile" "$oElFile" "$binEl" "$mapEl" "$omfEl"
# link816 --reloc-out + omfEmit --relocs round-trip: emit IMM24
# site list, decode it, verify cRELOC opcodes appear at the end of
# the OMF body. This is the mechanism that makes compiled C
# runnable under real GS/OS Loader: cRELOC tells the Loader to
# rewrite intra-segment 24-bit refs (JSL/JML/STAlong) when placing
# the segment at non-zero bank.
log "check: link816 --reloc-out + omfEmit --relocs emit cRELOC opcodes"
cR1="$(mktemp --suffix=.c)"
oR1="$(mktemp --suffix=.o)"
binR1="$(mktemp --suffix=.bin)"
mapR1="$(mktemp --suffix=.map)"
relR1="$(mktemp --suffix=.reloc)"
omfR1="$(mktemp --suffix=.omf)"
cat > "$cR1" <<'EOF'
__attribute__((noinline)) static int helper(int x) { return x + 1; }
void main(void) {
*(volatile unsigned char *)0x00007F = (unsigned char)helper(0x40);
}
EOF
"$CLANG" --target=w65816 -O2 -c "$cR1" -o "$oR1"
"$PROJECT_ROOT/tools/link816" -o "$binR1" --text-base 0x1000 \
--map "$mapR1" --reloc-out "$relR1" --no-gc-sections \
"$PROJECT_ROOT/runtime/crt0Gsos.o" "$oR1" \
"$PROJECT_ROOT/runtime/libgcc.o" >/dev/null 2>&1
"$PROJECT_ROOT/tools/omfEmit" --input "$binR1" --map "$mapR1" \
--base 0x1000 --entry __start --output "$omfR1" \
--name HELLO --relocs "$relR1" >/dev/null 2>&1
if [ ! -s "$omfR1" ] || [ ! -s "$relR1" ]; then
die "link816 --reloc-out / omfEmit --relocs produced empty output"
fi
python3 -c "
import struct, sys
b = open('$omfR1','rb').read()
r = open('$relR1','rb').read()
nRel = struct.unpack_from('<I', r, 0)[0]
if nRel < 1: print(f'FAIL: expected >=1 reloc site, got {nRel}'); sys.exit(1)
# Body opcode at DISPDATA; LCONST data length follows. Walk body and
# count cRELOC opcodes (0xF5).
dispdata = struct.unpack_from('<H', b, 42)[0]
length = struct.unpack_from('<I', b, 8)[0]
bytecnt = struct.unpack_from('<I', b, 0)[0]
body_op = b[dispdata]
if body_op != 0xF2: print(f'FAIL: body op {body_op:#x} != 0xF2'); sys.exit(1)
lconst_len = struct.unpack_from('<I', b, dispdata+1)[0]
post = dispdata + 1 + 4 + lconst_len
nCreloc = 0
while post < bytecnt:
op = b[post]
if op == 0xF5:
# 1 + 1 (ByteCnt) + 1 (BitShift) + 2 (OffsetPatch) + 2 (OffsetReference) = 7 bytes
if b[post+1] != 3: print(f'FAIL: cRELOC ByteCnt {b[post+1]} != 3'); sys.exit(1)
nCreloc += 1
post += 7
elif op == 0x00:
break
else:
print(f'FAIL: unexpected body op {op:#x} at offset {post:#x}'); sys.exit(1)
if nCreloc != nRel:
print(f'FAIL: cRELOC count {nCreloc} != reloc-sidecar count {nRel}'); sys.exit(1)
print(f'OK: {nCreloc} cRELOC opcodes match sidecar')
" || die "cRELOC structure validation failed"
rm -f "$cR1" "$oR1" "$binR1" "$mapR1" "$relR1" "$omfR1"
fi fi
log "all smoke checks passed" log "all smoke checks passed"

View file

@ -290,13 +290,42 @@ struct InputObject {
// ---------------------------------------------------------------- Linker // ---------------------------------------------------------------- Linker
// Multi-segment text layout. Segment 1 lives in bank 0 alongside
// rodata/bss/init_array (the existing single-segment layout). When
// --segment-cap is set and total text exceeds it, additional code
// segments get bank-aligned bases starting at --segment-bank-base
// (default 0x040000 = bank 4 — chosen to leave bank 0 + LC RAM for
// segment 1, banks 1-3 free for data / markers / future use).
struct TextSeg {
uint32_t segNum = 1; // 1-based; 1 is the bank-0 segment
uint32_t base = 0; // 24-bit load address
uint32_t size = 0; // bytes occupied
std::vector<uint8_t> body; // patched bytes ready to write
};
struct Layout { struct Layout {
uint32_t textBase, textSize; uint32_t textBase, textSize; // segment 1's text (bank 0)
uint32_t rodataBase, rodataSize; uint32_t rodataBase, rodataSize;
uint32_t bssBase, bssSize; uint32_t bssBase, bssSize;
uint32_t initBase, initSize; uint32_t initBase, initSize;
// segments[0] = segment 1 (bank 0); segments[1..] = bank-N+ overflow.
// Always at least one entry.
std::vector<TextSeg> segments;
}; };
// One IMM24 (3-byte absolute) relocation site, recorded for OMF
// cRELOC emission. The Loader will rewrite the 3 bytes at `patchOff`
// to be (segPlacedBase + offsetRef) when the segment is placed at
// runtime — this is what makes our compiled C runnable from Finder
// when the segment lands at e.g. bank $1F instead of bank 0.
struct Imm24Site {
uint32_t patchOff; // offset within text image (== patchAddr - textBase)
uint32_t offsetRef; // offset within text image of target symbol
};
static std::vector<Imm24Site> gImm24Sites;
static uint32_t gTextBaseForSites = 0;
static bool gRecordSites = false;
static void applyReloc(std::vector<uint8_t> &buf, uint32_t off, static void applyReloc(std::vector<uint8_t> &buf, uint32_t off,
uint32_t patchAddr, uint32_t target, uint32_t patchAddr, uint32_t target,
uint8_t rtype, const std::string &symName) { uint8_t rtype, const std::string &symName) {
@ -309,9 +338,12 @@ static void applyReloc(std::vector<uint8_t> &buf, uint32_t off,
buf[off] = static_cast<uint8_t>(target & 0xFF); buf[off] = static_cast<uint8_t>(target & 0xFF);
break; break;
case R_W65816_IMM16: case R_W65816_IMM16:
if (target > 0xFFFF) // Keep only the low 16 bits. In single-bank programs this
die("R_W65816_IMM16 to '" + symName + "' = 0x" + // is a tautology (target IS 16-bit); in multi-segment
std::to_string(target) + " out of range"); // programs the target may live in a different bank, but
// IMM16 absolute uses DBR for the bank at runtime, so
// patching just the low 16 bits is correct as long as the
// caller's DBR points at the target's bank.
buf[off] = static_cast<uint8_t>(target & 0xFF); buf[off] = static_cast<uint8_t>(target & 0xFF);
buf[off + 1] = static_cast<uint8_t>((target >> 8) & 0xFF); buf[off + 1] = static_cast<uint8_t>((target >> 8) & 0xFF);
break; break;
@ -322,6 +354,23 @@ static void applyReloc(std::vector<uint8_t> &buf, uint32_t off,
buf[off] = static_cast<uint8_t>(target & 0xFF); buf[off] = static_cast<uint8_t>(target & 0xFF);
buf[off + 1] = static_cast<uint8_t>((target >> 8) & 0xFF); buf[off + 1] = static_cast<uint8_t>((target >> 8) & 0xFF);
buf[off + 2] = static_cast<uint8_t>((target >> 16) & 0xFF); buf[off + 2] = static_cast<uint8_t>((target >> 16) & 0xFF);
// Record the site for OMF cRELOC emission (only if recording is
// enabled — gRecordSites is set by the CLI when --reloc-out is
// requested). The patch offset is within the segment image; the
// reference offset is the in-segment offset of the target.
if (gRecordSites) {
// Only intra-segment refs need cRELOC; cross-bank refs (to
// GS/OS dispatcher etc.) target absolute fixed addresses
// and shouldn't be relocated by the Loader.
uint32_t targetBank = target & 0xFF0000;
uint32_t baseBank = gTextBaseForSites & 0xFF0000;
if (targetBank == baseBank) {
Imm24Site s;
s.patchOff = patchAddr - gTextBaseForSites;
s.offsetRef = target - gTextBaseForSites;
gImm24Sites.push_back(s);
}
}
break; break;
case R_W65816_PCREL8: case R_W65816_PCREL8:
Signed = static_cast<int64_t>(target) - (static_cast<int64_t>(patchAddr) + 1); Signed = static_cast<int64_t>(target) - (static_cast<int64_t>(patchAddr) + 1);
@ -357,6 +406,13 @@ struct Linker {
uint32_t rodataBase = 0; uint32_t rodataBase = 0;
uint32_t bssBase = 0x2000; uint32_t bssBase = 0x2000;
bool gcSections = true; bool gcSections = true;
// Multi-segment support. segmentCap == 0 means "no cap" — produce
// a single-segment image (existing behaviour). Non-zero caps the
// bytes per text segment; overflow text sections get bank-aligned
// bases starting at segmentBankBase.
uint32_t segmentCap = 0;
uint32_t segmentBankBase = 0x040000;
std::string manifestPath;
// Per-section identity: (object index, section index within obj). // Per-section identity: (object index, section index within obj).
using SecID = std::pair<size_t, uint32_t>; using SecID = std::pair<size_t, uint32_t>;
@ -453,12 +509,17 @@ struct Linker {
} }
// Per-object, per-section: in-merged-text/rodata/bss offset. // Per-object, per-section: in-merged-text/rodata/bss offset.
// For text: textWithin gives the offset within the *segment* the
// section is placed in; textSegOf names which segment (1-based).
// Single-segment builds put everything in segment 1; multi-segment
// builds may scatter sections across segments.
struct ObjOffsets { struct ObjOffsets {
uint32_t textBaseInMerged = 0; uint32_t textBaseInMerged = 0;
uint32_t rodataBaseInMerged = 0; uint32_t rodataBaseInMerged = 0;
uint32_t bssBaseInMerged = 0; uint32_t bssBaseInMerged = 0;
uint32_t initBaseInMerged = 0; uint32_t initBaseInMerged = 0;
std::map<uint32_t, uint32_t> textWithin; std::map<uint32_t, uint32_t> textWithin; // offset within its segment
std::map<uint32_t, uint32_t> textSegOf; // section idx -> segment num (1-based)
std::map<uint32_t, uint32_t> rodataWithin; std::map<uint32_t, uint32_t> rodataWithin;
std::map<uint32_t, uint32_t> bssWithin; std::map<uint32_t, uint32_t> bssWithin;
std::map<uint32_t, uint32_t> initWithin; std::map<uint32_t, uint32_t> initWithin;
@ -494,8 +555,12 @@ struct Linker {
uint32_t base = 0; uint32_t base = 0;
if (kind == "text") { if (kind == "text") {
auto wIt = oo.textWithin.find(sym.shndx); auto wIt = oo.textWithin.find(sym.shndx);
base = lastLayout.textBase + oo.textBaseInMerged auto sIt = oo.textSegOf.find(sym.shndx);
+ (wIt == oo.textWithin.end() ? 0 : wIt->second); uint32_t segNum = (sIt == oo.textSegOf.end()) ? 1 : sIt->second;
uint32_t segBase = (segNum >= 1 && segNum <= lastLayout.segments.size())
? lastLayout.segments[segNum - 1].base
: lastLayout.textBase;
base = segBase + (wIt == oo.textWithin.end() ? 0 : wIt->second);
} else if (kind == "rodata") { } else if (kind == "rodata") {
auto wIt = oo.rodataWithin.find(sym.shndx); auto wIt = oo.rodataWithin.find(sym.shndx);
base = lastLayout.rodataBase + oo.rodataBaseInMerged base = lastLayout.rodataBase + oo.rodataBaseInMerged
@ -531,18 +596,39 @@ struct Linker {
Layout link(std::vector<uint8_t> &outImage) { Layout link(std::vector<uint8_t> &outImage) {
// 1. Layout: each obj's sections at running offsets. // 1. Layout: each obj's sections at running offsets.
// Text is segment-aware: when --segment-cap is set and total
// text would exceed it, sections spill into segments 2, 3, ...
// each based at successive bank boundaries starting from
// segmentBankBase. Other sections (rodata/bss/init_array)
// stay in segment 1's bank for v1 — multi-bank data refs
// would need IMM16 promotion to long which we don't do yet.
objOff.resize(objs.size()); objOff.resize(objs.size());
uint32_t curText = 0, curRodata = 0, curBss = 0, curInit = 0; uint32_t curText = 0, curRodata = 0, curBss = 0, curInit = 0;
// gc-sections: compute the live-section set before accumulating std::vector<uint32_t> segSizes = {0}; // bytes packed into each segment (1-based; index 0 = seg 1)
// so dead sections drop out of every later layout/reloc step. uint32_t curSeg = 1;
computeLiveSet(); computeLiveSet();
for (size_t fi = 0; fi < objs.size(); ++fi) { for (size_t fi = 0; fi < objs.size(); ++fi) {
ObjOffsets &oo = objOff[fi]; ObjOffsets &oo = objOff[fi];
oo.textBaseInMerged = curText; oo.textBaseInMerged = curText;
for (uint32_t idx : objs[fi]->sectionsByKind("text")) { for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
if (!isLive(fi, idx)) continue; if (!isLive(fi, idx)) continue;
oo.textWithin[idx] = curText - oo.textBaseInMerged; uint32_t sz = objs[fi]->sections[idx].size;
curText += objs[fi]->sections[idx].size; // If adding this section would exceed the cap, start a
// new segment. Skip empty sections in the cap check
// (they fit anywhere). Sections larger than the cap
// get their own segment (we don't split a single
// section across banks — it'd violate intra-section
// PCREL and 16-bit absolute addressing).
if (segmentCap && sz > 0 &&
segSizes[curSeg - 1] > 0 &&
segSizes[curSeg - 1] + sz > segmentCap) {
curSeg++;
segSizes.push_back(0);
}
oo.textSegOf[idx] = curSeg;
oo.textWithin[idx] = segSizes[curSeg - 1];
segSizes[curSeg - 1] += sz;
curText += sz;
} }
oo.rodataBaseInMerged = curRodata; oo.rodataBaseInMerged = curRodata;
for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) { for (uint32_t idx : objs[fi]->sectionsByKind("rodata")) {
@ -563,13 +649,25 @@ struct Linker {
curInit += objs[fi]->sections[idx].size; curInit += objs[fi]->sections[idx].size;
} }
} }
// Build the segment list with bases.
std::vector<TextSeg> segments;
segments.resize(segSizes.size());
segments[0].segNum = 1;
segments[0].base = textBase;
segments[0].size = segSizes[0];
for (size_t k = 1; k < segSizes.size(); ++k) {
segments[k].segNum = static_cast<uint32_t>(k + 1);
segments[k].base = segmentBankBase + 0x10000u * (k - 1);
segments[k].size = segSizes[k];
}
Layout L; Layout L;
L.textBase = textBase; L.textBase = textBase;
L.textSize = curText; L.textSize = segSizes[0]; // segment-1 text size (bank 0)
L.bssSize = curBss; L.bssSize = curBss;
L.rodataBase = rodataBase ? rodataBase : (textBase + curText); L.rodataBase = rodataBase ? rodataBase : (textBase + segSizes[0]);
L.rodataSize = curRodata; L.rodataSize = curRodata;
L.segments = std::move(segments);
// Reject a --rodata-base that overlaps text. Without this // Reject a --rodata-base that overlaps text. Without this
// check, the gap between text-end and rodata-base goes // check, the gap between text-end and rodata-base goes
// negative, the unsigned subtraction wraps to ~4GB, and the // negative, the unsigned subtraction wraps to ~4GB, and the
@ -739,8 +837,12 @@ struct Linker {
uint32_t addr = 0; uint32_t addr = 0;
if (kind == "text") { if (kind == "text") {
auto it = oo.textWithin.find(sym.shndx); auto it = oo.textWithin.find(sym.shndx);
addr = textBase + oo.textBaseInMerged auto sIt = oo.textSegOf.find(sym.shndx);
+ (it == oo.textWithin.end() ? 0 : it->second) uint32_t segNum = (sIt == oo.textSegOf.end()) ? 1 : sIt->second;
uint32_t segBase = (segNum >= 1 && segNum <= L.segments.size())
? L.segments[segNum - 1].base
: textBase;
addr = segBase + (it == oo.textWithin.end() ? 0 : it->second)
+ sym.value; + sym.value;
} else if (kind == "rodata") { } else if (kind == "rodata") {
auto it = oo.rodataWithin.find(sym.shndx); auto it = oo.rodataWithin.find(sym.shndx);
@ -776,16 +878,20 @@ struct Linker {
} }
} }
// 3. Build text and rodata buffers. Skip dead sections under // 3. Build per-segment text buffers + rodata. Skip dead
// gc-sections (isLive() returns true for everything when gc // sections under gc-sections.
// is off). std::vector<std::vector<uint8_t>> segTextBufs(L.segments.size());
std::vector<uint8_t> textBuf; for (size_t k = 0; k < L.segments.size(); ++k)
textBuf.reserve(curText); segTextBufs[k].reserve(L.segments[k].size);
for (size_t fi = 0; fi < objs.size(); ++fi) { for (size_t fi = 0; fi < objs.size(); ++fi) {
const auto &oo = objOff[fi];
for (uint32_t idx : objs[fi]->sectionsByKind("text")) { for (uint32_t idx : objs[fi]->sectionsByKind("text")) {
if (!isLive(fi, idx)) continue; if (!isLive(fi, idx)) continue;
auto sIt = oo.textSegOf.find(idx);
uint32_t segNum = (sIt == oo.textSegOf.end()) ? 1 : sIt->second;
const uint8_t *p = objs[fi]->sectionData(idx); const uint8_t *p = objs[fi]->sectionData(idx);
textBuf.insert(textBuf.end(), p, p + objs[fi]->sections[idx].size); auto &buf = segTextBufs[segNum - 1];
buf.insert(buf.end(), p, p + objs[fi]->sections[idx].size);
} }
} }
std::vector<uint8_t> rodataBuf; std::vector<uint8_t> rodataBuf;
@ -799,7 +905,7 @@ struct Linker {
} }
} }
// 4. Apply relocations to text buffer. // 4. Apply relocations to text buffers (each in its own segment).
for (size_t fi = 0; fi < objs.size(); ++fi) { for (size_t fi = 0; fi < objs.size(); ++fi) {
const auto &obj = *objs[fi]; const auto &obj = *objs[fi];
const auto &oo = objOff[fi]; const auto &oo = objOff[fi];
@ -807,20 +913,53 @@ struct Linker {
if (!isLive(fi, textIdx)) continue; if (!isLive(fi, textIdx)) continue;
auto it = obj.relocs.find(textIdx); auto it = obj.relocs.find(textIdx);
if (it == obj.relocs.end()) continue; if (it == obj.relocs.end()) continue;
uint32_t inMerged = oo.textBaseInMerged + oo.textWithin.at(textIdx); auto sIt = oo.textSegOf.find(textIdx);
uint32_t segNum = (sIt == oo.textSegOf.end()) ? 1 : sIt->second;
uint32_t inSeg = oo.textWithin.at(textIdx);
uint32_t segBase = L.segments[segNum - 1].base;
auto &textBuf = segTextBufs[segNum - 1];
for (const Reloc &r : it->second) { for (const Reloc &r : it->second) {
uint32_t patchOff = inMerged + r.offset; uint32_t patchOff = inSeg + r.offset;
uint32_t patchAddr = textBase + patchOff; uint32_t patchAddr = segBase + patchOff;
uint32_t target; uint32_t target;
std::string resolvedName; std::string resolvedName;
if (!resolveSym(obj, oo, r, target, resolvedName)) if (!resolveSym(obj, oo, r, target, resolvedName))
die(obj.path + ": .text reloc to unresolved '" die(obj.path + ": .text reloc to unresolved '"
+ resolvedName + "'"); + resolvedName + "'");
// PCREL relocs can't span banks (the displacement
// is intra-bank only). Detect and report so the
// user can adjust packing. IMM16 cross-bank is
// tolerated: 16-bit absolute uses DBR for the
// bank, which we keep at 0 by default (so refs
// to bank-0 data work from any code segment),
// and we can't statically know the target bank
// intent anyway.
if (segmentCap && (r.type == R_W65816_PCREL16 ||
r.type == R_W65816_PCREL8)) {
uint32_t targetSegBank = target & 0xFF0000;
uint32_t patchSegBank = segBase & 0xFF0000;
if (targetSegBank != patchSegBank) {
char msg[200];
std::snprintf(msg, sizeof(msg),
"%s: cross-bank PCREL reloc to '%s' (target bank "
"0x%X, code bank 0x%X) — adjust --segment-cap "
"or pack referenced section into the same segment",
obj.path.c_str(), resolvedName.c_str(),
targetSegBank, patchSegBank);
die(msg);
}
}
applyReloc(textBuf, patchOff, patchAddr, target, r.type, applyReloc(textBuf, patchOff, patchAddr, target, r.type,
resolvedName); resolvedName);
} }
} }
} }
// Move per-segment patched text into the Layout for output.
for (size_t k = 0; k < L.segments.size(); ++k)
L.segments[k].body = std::move(segTextBufs[k]);
// Re-publish layout now that segment bodies are populated —
// writeMultiSegment reads from lastLayout.
lastLayout = L;
// 4b. Apply relocations to rodata/data buffer. Globals like // 4b. Apply relocations to rodata/data buffer. Globals like
// `int *p = &v;` need their initializer patched at link time // `int *p = &v;` need their initializer patched at link time
@ -849,11 +988,16 @@ struct Linker {
} }
} }
// 5. Compose output: text || (gap) || rodata. bss is virtual. // 5. Compose output: segment-1 text || (gap) || rodata.
// bss is virtual. Multi-segment builds emit additional text
// segments separately (see writeSegmentImages); the main -o
// output stays segment 1's image so existing single-segment
// smoke checks still work unchanged.
outImage.clear(); outImage.clear();
outImage = std::move(textBuf); const uint32_t seg1TextSize = static_cast<uint32_t>(L.segments[0].body.size());
if (L.rodataBase != textBase + curText) { outImage = L.segments[0].body;
uint32_t gap = L.rodataBase - (textBase + curText); if (L.rodataBase != textBase + seg1TextSize) {
uint32_t gap = L.rodataBase - (textBase + seg1TextSize);
outImage.insert(outImage.end(), gap, 0); outImage.insert(outImage.end(), gap, 0);
} }
outImage.insert(outImage.end(), rodataBuf.begin(), rodataBuf.end()); outImage.insert(outImage.end(), rodataBuf.begin(), rodataBuf.end());
@ -1025,6 +1169,82 @@ struct Linker {
} }
} }
// Write per-segment images for segments 2..N (segment 1 is the
// main -o output) and a JSON manifest describing all segments.
// Image filename convention: <outBase>.seg<N>.bin where outBase is
// the -o path with any trailing extension stripped. Manifest JSON
// is at the user-supplied --manifest path.
void writeMultiSegment(const std::string &mainOutPath,
const std::string &mfPath,
const std::string &entrySym) const {
if (lastLayout.segments.empty()) return;
// Strip the extension from mainOutPath for per-segment names.
std::string outBase = mainOutPath;
size_t dot = outBase.find_last_of('.');
size_t slash = outBase.find_last_of('/');
if (dot != std::string::npos &&
(slash == std::string::npos || dot > slash)) {
outBase = outBase.substr(0, dot);
}
// Per-segment images for K >= 2.
for (size_t k = 1; k < lastLayout.segments.size(); ++k) {
const auto &seg = lastLayout.segments[k];
char name[256];
std::snprintf(name, sizeof(name), "%s.seg%u.bin",
outBase.c_str(), seg.segNum);
std::ofstream f(name, std::ios::binary);
if (!f) die(std::string("cannot open '") + name + "' for writing");
f.write(reinterpret_cast<const char *>(seg.body.data()),
seg.body.size());
}
// Manifest. Hand-rolled JSON (no external dep).
if (mfPath.empty()) return;
std::ofstream mf(mfPath);
if (!mf) die("cannot open '" + mfPath + "' for writing");
char buf[512];
std::snprintf(buf, sizeof(buf),
"{\n"
" \"version\": 1,\n"
" \"main\": \"%s\",\n"
" \"entry\": \"%s\",\n"
" \"segments\": [\n", mainOutPath.c_str(), entrySym.c_str());
mf << buf;
for (size_t k = 0; k < lastLayout.segments.size(); ++k) {
const auto &seg = lastLayout.segments[k];
std::string imgPath = mainOutPath;
if (k > 0) {
char nm[256];
std::snprintf(nm, sizeof(nm), "%s.seg%u.bin",
outBase.c_str(), seg.segNum);
imgPath = nm;
}
uint32_t entryOff = 0;
// Set entry_offset on whichever segment actually contains
// the entry symbol — usually segment 1 (crt0's __start)
// but could be any segment if user picks a non-standard
// entry point.
auto it = globalSyms.find(entrySym);
if (it != globalSyms.end() && it->second >= seg.base &&
it->second < seg.base + seg.body.size()) {
entryOff = it->second - seg.base;
}
std::snprintf(buf, sizeof(buf),
" {\n"
" \"num\": %u,\n"
" \"name\": \"SEG%u\",\n"
" \"base\": \"0x%06x\",\n"
" \"size\": %zu,\n"
" \"image\": \"%s\",\n"
" \"entry_offset\": \"0x%04x\"\n"
" }%s\n",
seg.segNum, seg.segNum, seg.base, seg.body.size(),
imgPath.c_str(), entryOff,
(k + 1 < lastLayout.segments.size()) ? "," : "");
mf << buf;
}
mf << " ]\n}\n";
}
// Stash the last layout so writeMap can use it. // Stash the last layout so writeMap can use it.
Layout lastLayout; Layout lastLayout;
}; };
@ -1047,8 +1267,13 @@ static void usage(const char *argv0) {
std::fprintf(stderr, std::fprintf(stderr,
"usage: %s -o <output> [--text-base ADDR] [--rodata-base ADDR]\n" "usage: %s -o <output> [--text-base ADDR] [--rodata-base ADDR]\n"
" [--bss-base ADDR] [--map FILE] [--debug-out FILE]\n" " [--bss-base ADDR] [--map FILE] [--debug-out FILE]\n"
" [--no-gc-sections]\n" " [--reloc-out FILE] [--no-gc-sections]\n"
" <input.o> ...\n", " <input.o> ...\n"
"\n"
" --reloc-out FILE write IMM24 relocation site list (binary:\n"
" <count:u32><patchOff:u32 offsetRef:u32>...)\n"
" consumed by omfEmit --relocs to emit cRELOC\n"
" opcodes for runtime bank-byte fixup.\n",
argv0); argv0);
std::exit(2); std::exit(2);
} }
@ -1059,6 +1284,7 @@ int main(int argc, char **argv) {
std::string outPath; std::string outPath;
std::string mapPath; std::string mapPath;
std::string debugOutPath; std::string debugOutPath;
std::string relocOutPath;
Linker linker; Linker linker;
int i = 1; int i = 1;
@ -1082,6 +1308,9 @@ int main(int argc, char **argv) {
} else if (a == "--debug-out") { } else if (a == "--debug-out") {
if (++i >= argc) usage(argv[0]); if (++i >= argc) usage(argv[0]);
debugOutPath = argv[i++]; debugOutPath = argv[i++];
} else if (a == "--reloc-out") {
if (++i >= argc) usage(argv[0]);
relocOutPath = argv[i++];
} else if (a == "--gc-sections") { } else if (a == "--gc-sections") {
// Drop sections not reachable from __start / main / // Drop sections not reachable from __start / main /
// init_array. Requires `-ffunction-sections` (so each // init_array. Requires `-ffunction-sections` (so each
@ -1094,6 +1323,15 @@ int main(int argc, char **argv) {
} else if (a == "--no-gc-sections") { } else if (a == "--no-gc-sections") {
linker.gcSections = false; linker.gcSections = false;
i++; i++;
} else if (a == "--segment-cap") {
if (++i >= argc) usage(argv[0]);
linker.segmentCap = parseInt(argv[i++]);
} else if (a == "--segment-bank-base") {
if (++i >= argc) usage(argv[0]);
linker.segmentBankBase = parseInt(argv[i++]);
} else if (a == "--manifest") {
if (++i >= argc) usage(argv[0]);
linker.manifestPath = argv[i++];
} else if (a == "-h" || a == "--help") { } else if (a == "-h" || a == "--help") {
usage(argv[0]); usage(argv[0]);
} else if (!a.empty() && a[0] == '-') { } else if (!a.empty() && a[0] == '-') {
@ -1105,6 +1343,14 @@ int main(int argc, char **argv) {
} }
if (outPath.empty() || linker.objs.empty()) usage(argv[0]); if (outPath.empty() || linker.objs.empty()) usage(argv[0]);
// Enable IMM24 site recording before linking, so applyReloc populates
// gImm24Sites for cRELOC sidecar emission.
if (!relocOutPath.empty()) {
gRecordSites = true;
gTextBaseForSites = linker.textBase;
gImm24Sites.clear();
}
std::vector<uint8_t> image; std::vector<uint8_t> image;
Layout L = linker.link(image); Layout L = linker.link(image);
@ -1114,13 +1360,49 @@ int main(int argc, char **argv) {
if (!mapPath.empty()) linker.writeMap(mapPath); if (!mapPath.empty()) linker.writeMap(mapPath);
if (!debugOutPath.empty()) linker.writeDebugSidecar(debugOutPath); if (!debugOutPath.empty()) linker.writeDebugSidecar(debugOutPath);
if (!relocOutPath.empty()) {
// Sidecar binary format:
// u32 count
// { u32 patchOff; u32 offsetRef; } × count
// Both offsets are within the segment image (== link-time addr
// minus textBase). Consumed by omfEmit --relocs to emit cRELOC
// opcodes after the LCONST data.
std::ofstream rf(relocOutPath, std::ios::binary);
if (!rf) die("cannot open '" + relocOutPath + "' for writing");
uint32_t count = (uint32_t)gImm24Sites.size();
rf.write(reinterpret_cast<const char *>(&count), 4);
for (const auto &s : gImm24Sites) {
uint32_t po = s.patchOff, off = s.offsetRef;
rf.write(reinterpret_cast<const char *>(&po), 4);
rf.write(reinterpret_cast<const char *>(&off), 4);
}
}
// Multi-segment: write per-segment images + manifest if there's
// more than one segment OR --manifest was requested.
if (L.segments.size() > 1 || !linker.manifestPath.empty()) {
// Default entry symbol is __start (crt0's program entry,
// which calls main). GS/OS Loader runs from segment 1's
// entry; crt0 lives in segment 1 by convention (first
// input object is typically the runtime/crt0).
linker.writeMultiSegment(outPath, linker.manifestPath, "__start");
}
std::fprintf(stderr, std::fprintf(stderr,
"linked: text=[0x%04x+%u] rodata=[0x%04x+%u] bss=[0x%04x+%u] " "linked: text=[0x%04x+%u] rodata=[0x%04x+%u] bss=[0x%04x+%u] "
"-> %s (%zu bytes)\n", "-> %s (%zu bytes)",
L.textBase, L.textSize, L.rodataBase, L.rodataSize, L.textBase, L.textSize, L.rodataBase, L.rodataSize,
L.bssBase, L.bssSize, L.bssBase, L.bssSize,
outPath.c_str(), image.size()); outPath.c_str(), image.size());
if (L.segments.size() > 1) {
std::fprintf(stderr, " + %zu extra segments",
L.segments.size() - 1);
for (size_t k = 1; k < L.segments.size(); ++k) {
std::fprintf(stderr, " seg%u=[0x%06x+%zu]",
L.segments[k].segNum, L.segments[k].base,
L.segments[k].body.size());
}
}
std::fprintf(stderr, "\n");
return 0; return 0;
} }

View file

@ -1,14 +1,25 @@
// omfEmit — wrap a flat binary in a minimal Apple IIgs OMF v2.1 // omfEmit — wrap a flat binary (or a multi-segment manifest from
// container so GS/OS can load and execute it. // link816) in an Apple IIgs OMF v2.1 container.
// //
// Single-segment output (CODE, kind=0), no INTERSEG opcodes (multi- // Single-segment mode (legacy): one CODE segment with KIND=0,
// segment output is a follow-on). Header layout per OMF 2.1 spec: // no INTERSEG opcodes, ORG=0 (loader picks bank). Header layout
// 44-byte fixed header + 10-byte LOAD_NAME + 32-byte SEG_NAME, then // per OMF 2.1 spec: 44-byte fixed header + 10-byte LOAD_NAME +
// the body (DS opcode for the payload, END opcode terminator). // 32-byte SEG_NAME, then the body (DS opcode for the payload,
// END opcode terminator).
// //
// CLI mirrors the Python tool exactly:
// omfEmit --input flat.bin --map flat.map --base 0x8000 // omfEmit --input flat.bin --map flat.map --base 0x8000
// --entry main --output prog.omf [--name SEG] // --entry main --output prog.omf [--name SEG]
//
// Multi-segment mode: read the JSON manifest emitted by
// `link816 --manifest`, write one OMF segment per manifest entry.
// Each segment's ORG is set to its declared base (bank-aligned)
// so the loader places it at the exact address the linker assumed
// when it patched intra-segment IMM24 / IMM16 relocations. KIND
// uses the STATIC + ABSBANK attributes to ask the loader not to
// move segments around — necessary because all relocs were already
// baked in at link time (no INTERSEG opcodes emitted yet).
//
// omfEmit --manifest manifest.json --output prog.omf
#include <cstdint> #include <cstdint>
#include <cstdio> #include <cstdio>
@ -26,6 +37,14 @@ namespace {
std::exit(1); std::exit(1);
} }
// Populated by --relocs from a link816 sidecar. Each entry is
// (OffsetPatch, OffsetReference) — the in-segment offset to patch
// (3 bytes wide) and the in-segment offset of the target. Consumed
// by emitOneSeg to write cRELOC opcodes between LCONST and END.
} // close namespace
std::vector<std::pair<uint16_t, uint16_t>> gReloc24Sites;
namespace {
static std::vector<uint8_t> readFile(const std::string &path) { static std::vector<uint8_t> readFile(const std::string &path) {
std::ifstream f(path, std::ios::binary); std::ifstream f(path, std::ios::binary);
if (!f) die("cannot open '" + path + "' for reading"); if (!f) die("cannot open '" + path + "' for reading");
@ -67,59 +86,107 @@ static void put16(std::vector<uint8_t> &v, uint16_t x) {
v.push_back((x >> 8) & 0xFF); v.push_back((x >> 8) & 0xFF);
} }
static std::vector<uint8_t> emitOMF(const std::vector<uint8_t> &image, // Emit one OMF segment record. Caller composes multiple records
uint32_t entryOffset, // back-to-back to form a multi-segment OMF file.
const std::string &name) { //
// Body: DS (literal data) + END. // `org` : absolute load address. 0 means "loader picks" (single-
// segment mode). Non-zero (typical for multi-segment)
// requests STATIC ABSBANK placement at that exact address.
// `segNum` : 1-based segment number.
// `entryOff`: offset within this segment to the program entry point;
// only meaningful for the entry segment (typically 1),
// ignored otherwise.
// `kind` : OMF KIND field. Caller picks; v1 uses 0x8800 (STATIC |
// ABSBANK | CODE) for multi-segment static placement, or
// 0x0000 (CODE, dynamic) for single-segment legacy mode.
static std::vector<uint8_t> emitOneSeg(const std::vector<uint8_t> &image,
uint32_t entryOff,
uint32_t org,
uint16_t segNum,
uint16_t kind,
const std::string &name) {
std::vector<uint8_t> body; std::vector<uint8_t> body;
if (!image.empty()) { if (!image.empty()) {
body.push_back(0xF1); // DS opcode // LCONST opcode 0xF2: takes a NUMLEN-byte count followed by N
// literal bytes. With NUMLEN=4 (standard for v2.1), the count
// field is 4 bytes. Verified empirically against real /SYSTEM/
// START on GS/OS 6.0.2: every segment uses 0xF2 + 4-byte count.
body.push_back(0xF2); // LCONST opcode
put32(body, static_cast<uint32_t>(image.size())); put32(body, static_cast<uint32_t>(image.size()));
body.insert(body.end(), image.begin(), image.end()); body.insert(body.end(), image.begin(), image.end());
} }
// cRELOC opcodes (0xF5): one per IMM24 reloc site. Format per
// Merlin32's BuildOMFFile:
// 1B opcode (0xF5)
// 1B ByteCnt (3 for IMM24)
// 1B BitShift (0 = no shift)
// 2B OffsetPatch (offset in segment to patch)
// 2B OffsetReference (in-segment offset of target)
// The Loader rewrites segment[OffsetPatch..OffsetPatch+2] to be
// (segPlacedBase + OffsetReference) at load time. This is what
// makes JSL/JML/STAlong/etc. with intra-segment targets work when
// the Loader places us at non-zero bank.
for (const auto &s : ::gReloc24Sites) {
body.push_back(0xF5);
body.push_back(3); // ByteCnt
body.push_back(0); // BitShift
put16(body, s.first); // OffsetPatch
put16(body, s.second); // OffsetReference
}
body.push_back(0x00); // END opcode body.push_back(0x00); // END opcode
// LOAD_NAME: 10 bytes, space-padded. // Real OMF format (Merlin32 convention, verified GS/OS Loader-launchable):
std::string loadName = name.substr(0, 10); // - LABLEN = 10: both LOAD_NAME and SEG_NAME are 10 bytes wide,
while (loadName.size() < 10) loadName += ' '; // space-padded. This is what Merlin32 emits and what GS/OS
// Loader accepts when launching from Finder. Length-prefixed
// SEG_NAME: 1-byte length prefix + 31 bytes (truncated, padded with NUL). // names (LABLEN=0, what /SYSTEM/START FINDER and TOOL.SETUP
std::string segNameTxt = name.substr(0, 31); // use) is documented in the OMF spec but NOT accepted by the
std::vector<uint8_t> segName; // Loader for app launch — empirical finding: switching from
segName.push_back(static_cast<uint8_t>(segNameTxt.size())); // LABLEN=0 to LABLEN=10 was the key change that took our hello
for (char c : segNameTxt) segName.push_back((uint8_t)c); // from "OMF loaded but entry never JSL'd → $005C error" to
while (segName.size() < 32) segName.push_back(0); // "marker $0078 = $42 set, code ran".
constexpr uint8_t LABLEN_VAL = 10;
std::vector<uint8_t> loadName(10, 0x20); // 10 spaces
std::string segNameTxt = name.substr(0, 10); // truncate to LABLEN
std::vector<uint8_t> segName(LABLEN_VAL, 0x20); // 10-byte field, space-padded
for (size_t i = 0; i < segNameTxt.size(); i++)
segName[i] = (uint8_t)segNameTxt[i];
constexpr uint16_t DISPNAME = 44; constexpr uint16_t DISPNAME = 44;
const uint16_t DISPDATA = DISPNAME + 10 + 32; const uint16_t DISPDATA = static_cast<uint16_t>(
DISPNAME + loadName.size() + segName.size());
const uint32_t LENGTH = static_cast<uint32_t>(image.size()); const uint32_t LENGTH = static_cast<uint32_t>(image.size());
const uint32_t BYTECNT = DISPDATA + static_cast<uint32_t>(body.size()); const uint32_t BYTECNT = DISPDATA + static_cast<uint32_t>(body.size());
const uint32_t RESSPC = 0; const uint32_t RESSPC = 0;
// BANKSIZE = 0x10000 — segment fits in one 64KB bank.
// Earlier I tried 0 (matched one decoded file) but real
// executable code segments use 0x10000.
const uint32_t BANKSIZE = 0x10000; const uint32_t BANKSIZE = 0x10000;
const uint16_t KIND = 0x0000; // CODE
const uint32_t ORG = 0;
const uint32_t ALIGN = 0; const uint32_t ALIGN = 0;
const uint8_t NUMSEX = 0; const uint8_t NUMSEX = 0;
const uint16_t SEGNUM = 1;
const uint32_t ENTRY = entryOffset;
std::vector<uint8_t> hdr; std::vector<uint8_t> hdr;
put32(hdr, BYTECNT); put32(hdr, BYTECNT);
put32(hdr, RESSPC); put32(hdr, RESSPC);
put32(hdr, LENGTH); put32(hdr, LENGTH);
hdr.push_back(0x00); // undefined hdr.push_back(0x00); // undefined
hdr.push_back(10); // LABLEN hdr.push_back(LABLEN_VAL); // LABLEN (10 = fixed-width names)
hdr.push_back(4); // NUMLEN hdr.push_back(4); // NUMLEN
hdr.push_back(0x21); // VERSION 2.1 hdr.push_back(0x02); // VERSION (0x02 = OMF v2.1; 0x01 = v2.0)
// Earlier we used 0x21 here thinking it was BCD-encoded "2.1" —
// it's not. The VERSION byte uses an enum: 0x00=v1.0, 0x01=v2.0,
// 0x02=v2.1. Real GS/OS apps decoded from a system disk have
// 0x02 here. GS/OS Loader rejects 0x21 with error $1102 because
// there's no version with that code.
put32(hdr, BANKSIZE); put32(hdr, BANKSIZE);
put16(hdr, KIND); put16(hdr, kind);
hdr.push_back(0x00); hdr.push_back(0x00); // undefined (2 bytes) hdr.push_back(0x00); hdr.push_back(0x00); // undefined (2 bytes)
put32(hdr, ORG); put32(hdr, org);
put32(hdr, ALIGN); put32(hdr, ALIGN);
hdr.push_back(NUMSEX); hdr.push_back(NUMSEX);
hdr.push_back(0x00); // undefined hdr.push_back(0x00); // undefined
put16(hdr, SEGNUM); put16(hdr, segNum);
put32(hdr, ENTRY); put32(hdr, entryOff);
put16(hdr, DISPNAME); put16(hdr, DISPNAME);
put16(hdr, DISPDATA); put16(hdr, DISPDATA);
@ -133,6 +200,303 @@ static std::vector<uint8_t> emitOMF(const std::vector<uint8_t> &image,
return out; return out;
} }
// Legacy single-segment wrapper.
//
// KIND=0x1000 (CODE | PRIV). This is what Merlin32 emits for single-
// segment GS/OS apps and what GS/OS Loader actually launches via
// Finder double-click. KIND=0x8000 (CODE|STATIC) was earlier hypothesis
// based on extracting ABOUT from real FINDER, but ABOUT is a sub-
// segment of FINDER, not a standalone app — so its KIND isn't a valid
// model. PRIV bit signals "loaded with the rest of the app" and is the
// reliable choice empirically validated by Merlin32-built hello.s16
// running successfully under MAME-Lua-driven Finder launch.
static std::vector<uint8_t> emitOMF(const std::vector<uint8_t> &image,
uint32_t entryOffset,
const std::string &name) {
return emitOneSeg(image, entryOffset, /*org*/0, /*segNum*/1,
/*kind*/0x1000, name);
}
// Emit an ExpressLoad-able OMF wrapping a single user segment. This is
// what real GS/OS apps look like: a `~ExpressLoad` segment as seg 1,
// then the actual code as seg 2.
//
// Why we need ExpressLoad: replacing /SYSTEM/START with a single-
// segment OMF (no ExpressLoad) makes the GS/OS Loader place our
// segment in RAM but never JSL the entry — verified by writing a
// marker as the first instruction of crt0Gsos and observing the
// marker remained 0 across the entire boot.
//
// ExpressLoad format reverse-engineered from real /SYSTEM/START
// (FINDER) on GS/OS 6.0.2 disk. Each ExpressLoad-able file's seg 1
// is a `~ExpressLoad` data segment containing a load script.
//
// The load script (stored as the LCONST data of the ExpressLoad seg):
// +0..1 word file_ref = 0
// +2..3 word reserved = 0
// +4..5 word extra = 0 (Neil Parker's docs omit this)
// +6..7 word count = N - 2 where N = total segs
// +8.. 8B/seg segment list = (N - 1) entries:
// +0..1: self-rel offset to header info entry
// +2..3: flags = 0
// +4..7: handle = 0
// +Var 2B/seg remap list = (N - 1) words:
// new segment number for old position
// +Var Var/seg header info entries:
// +0..3: data offset in file (= body op + 5)
// +4..7: data length (= seg LENGTH field)
// +8..11: reloc offset in file (0 if no relocs)
// +12..15: reloc length (0 if no relocs)
// +16..47: header copy bytes [12..43] of the
// user segment, with DISPDATA zeroed
// +48..57: LOAD_NAME (10 bytes)
// +58.. : SEG_NAME (length-prefixed)
//
// All counts use NUMLEN=4 (4-byte length on LCONST opcodes).
static std::vector<uint8_t> emitOmfExpressLoad(
const std::vector<uint8_t> &image,
uint32_t entryOffset,
const std::string &userSegName) {
// Step 1: build the user segment using KIND=0x1000 (CODE|PRIV).
// Same KIND emitOMF uses for single-segment apps. Verified
// Loader-launchable via the Finder smoke path.
auto userSeg = emitOneSeg(image, entryOffset, /*org*/0, /*segNum*/2,
/*kind*/0x1000, userSegName);
// Step 2: figure out the file offsets we'll need to bake into the
// load script. We don't know the ExpressLoad segment's total size
// yet — but we can compute it because each component is a fixed
// function of the user segment name length.
//
// ExpressLoad LCONST data layout (matches Merlin32 source — see
// BuildExpressLoadSegment in Merlin32's a65816_OMF.c):
// 6 bytes header (4-byte reserved DWORD + 2-byte count WORD)
// 8 bytes segment list (1 entry per non-ExpressLoad segment)
// 2 bytes remap list (1 entry per non-ExpressLoad segment)
// 16 bytes header info offsets (data_off, data_len, reloc_off, reloc_len)
// + header_xpress: bytes [12..43] of user header (32 bytes) + LOAD_NAME (10) + SEG_NAME (1+N)
// = 6 + 8 + 2 + 16 + 32 + 10 + 1 + N = 75 + N bytes
//
// KEY FIX from earlier emitter version: header is 6 bytes, NOT 8.
// I had written 8 bytes (file_ref WORD + reserved WORD + extra WORD +
// count WORD) based on misreading /SYSTEM/START's bytes. Merlin32
// uses (reserved DWORD + count WORD) = 6 bytes total. /SYSTEM/START
// has count=0 in the 6-byte interpretation which means it uses some
// other variant (maybe APW Express's older format), but Merlin32's
// format is what we know is GS/OS-loader-accepted today.
constexpr uint32_t HDR_SIZE = 44;
constexpr uint32_t LOAD_NAME_SIZE = 10;
constexpr uint32_t SEG_NAME_SIZE = 10; // LABLEN=10 → fixed-width SEG_NAME
const uint32_t userNameLen = (uint32_t)userSegName.size();
const uint32_t userNameAreaSize = LOAD_NAME_SIZE + SEG_NAME_SIZE;
// ExpressLoad's own segment metrics. The name "~ExpressLoad" is 12
// chars and won't fit in a LABLEN=10 field, so the ExpressLoad seg
// uses LABLEN=0 (length-prefixed name): 1 length byte + 12 chars.
const std::string elName = "~ExpressLoad";
const uint32_t elNameAreaSize = LOAD_NAME_SIZE + 1 + (uint32_t)elName.size();
// header_xpress_length = (header bytes 12..43) + LOAD_NAME + SEG_NAME
// = 32 + 10 + 10 = 52 bytes
// Per-segment ExpressLoad data: 8 (table) + 2 (remap) + 16 (offsets) + 52 = 78 bytes
// Header (6 bytes) + per-segment data: 6 + 78 = 84
const uint32_t elDataSize = 84;
(void)userNameLen; // truncated in user seg name; LABLEN=10 fixed
// Body size = 1 byte LCONST opcode + 4 byte length + data + 1 byte END
const uint32_t elBodySize = 1 + 4 + elDataSize + 1;
const uint32_t elSegSize = HDR_SIZE + elNameAreaSize + elBodySize;
// User segment file offsets (after ExpressLoad seg).
const uint32_t userSegStart = elSegSize;
const uint32_t userBodyOpOff = userSegStart + HDR_SIZE + userNameAreaSize;
const uint32_t userDataOff = userBodyOpOff + 5; // 1 op + 4 length
// Step 3: build the ExpressLoad LCONST data.
std::vector<uint8_t> elData;
// Header (6 bytes): reserved DWORD + count WORD
put32(elData, 0); // reserved
put16(elData, 0); // count = N-2 = 0 (for 2 segs)
// Segment list (1 × 8 bytes)
// Self-rel offset = (header info offset within elData) - (this entry pos)
// = 16 - 6 = 10
constexpr uint32_t segListEntryOff = 6;
const uint32_t headerInfoOff = 6 + 8 + 2; // header + segtable + remap
put16(elData, (uint16_t)(headerInfoOff - segListEntryOff));
put16(elData, 0); // flags
put32(elData, 0); // handle
// Remap list: old seg 1 (which would be our user seg without
// ExpressLoad) maps to new seg 2 (since ExpressLoad takes seg 1).
put16(elData, 2);
// Header info entry for the user segment.
put32(elData, userDataOff); // data offset in file
put32(elData, (uint32_t)image.size()); // data length
put32(elData, 0); // reloc offset (0 = no relocs)
put32(elData, 0); // reloc length
// Header copy: bytes [12..43] of user segment header, DISPDATA → 0.
if (userSeg.size() < HDR_SIZE) die("internal: user seg too small");
elData.insert(elData.end(), userSeg.begin() + 12, userSeg.begin() + HDR_SIZE);
// DISPDATA is at offset 42..43 of the original header; in the copy
// (which omits the first 12 bytes), it lands at offset 30..31.
elData[elData.size() - 32 + 30] = 0;
elData[elData.size() - 32 + 31] = 0;
// LOAD_NAME (10 bytes, space-padded — matches Merlin convention)
for (int i = 0; i < (int)LOAD_NAME_SIZE; i++) elData.push_back(0x20);
// SEG_NAME (10 bytes fixed-width, space-padded)
std::string truncated = userSegName.substr(0, SEG_NAME_SIZE);
for (size_t i = 0; i < SEG_NAME_SIZE; i++) {
elData.push_back(i < truncated.size() ? (uint8_t)truncated[i] : 0x20);
}
if (elData.size() != elDataSize)
die("internal: ExpressLoad data size mismatch");
// Step 4: build the ExpressLoad segment header.
// KIND=0x8001 (DATA|STATIC), BANKSIZE=0 (DATA segs use 0, not 0x10000).
std::vector<uint8_t> elHdr;
const uint32_t elBytecnt = HDR_SIZE + elNameAreaSize + elBodySize;
put32(elHdr, elBytecnt); // BYTECNT
put32(elHdr, 0); // RESSPC
put32(elHdr, elDataSize); // LENGTH (= LCONST data size)
elHdr.push_back(0); // undef
elHdr.push_back(0); // LABLEN
elHdr.push_back(4); // NUMLEN
elHdr.push_back(2); // VERSION (0x02 = v2.1)
put32(elHdr, 0); // BANKSIZE = 0 for DATA seg
put16(elHdr, 0x8001); // KIND = DATA|STATIC
elHdr.push_back(0); elHdr.push_back(0); // undef
put32(elHdr, 0); // ORG
put32(elHdr, 0); // ALIGN
elHdr.push_back(0); // NUMSEX
elHdr.push_back(0); // undef
put16(elHdr, 1); // SEGNUM = 1
put32(elHdr, 0); // ENTRY = 0
put16(elHdr, (uint16_t)HDR_SIZE); // DISPNAME = 44
put16(elHdr, (uint16_t)(HDR_SIZE + elNameAreaSize)); // DISPDATA
if (elHdr.size() != HDR_SIZE) die("internal: el hdr size != 44");
// Step 5: assemble the ExpressLoad segment.
std::vector<uint8_t> elSeg;
elSeg.insert(elSeg.end(), elHdr.begin(), elHdr.end());
for (int i = 0; i < (int)LOAD_NAME_SIZE; i++) elSeg.push_back(0);
elSeg.push_back((uint8_t)elName.size());
for (char c : elName) elSeg.push_back((uint8_t)c);
// Body: LCONST opcode + 4-byte length + data + END
elSeg.push_back(0xF2);
put32(elSeg, elDataSize);
elSeg.insert(elSeg.end(), elData.begin(), elData.end());
elSeg.push_back(0x00);
if (elSeg.size() != elSegSize)
die("internal: ExpressLoad segment size mismatch");
// Step 6: concatenate ExpressLoad + user segment.
std::vector<uint8_t> result;
result.insert(result.end(), elSeg.begin(), elSeg.end());
result.insert(result.end(), userSeg.begin(), userSeg.end());
return result;
}
// Bare-bones manifest parser. link816's manifest is structured as
// `{ "segments": [ { "num": N, "base": "0xHHHHHH", "size": N,
// "image": "PATH", "entry_offset": "0xHHHH" }, ... ] }` with strict
// formatting (one field per line, no nested whitespace tricks). We
// match each field with simple regex/find — good enough since we're
// the only producer of this format.
struct ManifestSeg {
uint32_t num = 0;
uint32_t base = 0;
uint32_t entryOff = 0;
std::string image;
std::string name;
};
static std::string extractStringField(const std::string &block,
const std::string &key) {
std::string needle = "\"" + key + "\":";
size_t p = block.find(needle);
if (p == std::string::npos) return {};
// Skip whitespace after the colon. If the next non-space char
// isn't a quote, the value is a bare number — return empty so
// the caller falls through to the bare-number path (without
// accidentally consuming the next field's quoted string).
p += needle.size();
while (p < block.size() && std::isspace((unsigned char)block[p])) p++;
if (p >= block.size() || block[p] != '"') return {};
size_t e = block.find('"', p + 1);
if (e == std::string::npos) return {};
return block.substr(p + 1, e - p - 1);
}
static uint32_t extractNumberField(const std::string &block,
const std::string &key) {
// Number can appear bare (size: 1234) or as a hex string ("0x...").
std::string s = extractStringField(block, key);
if (!s.empty()) {
return static_cast<uint32_t>(std::stoul(s, nullptr, 0));
}
std::string needle = "\"" + key + "\":";
size_t p = block.find(needle);
if (p == std::string::npos) return 0;
p += needle.size();
while (p < block.size() && std::isspace((unsigned char)block[p])) p++;
size_t e = p;
while (e < block.size() &&
(std::isdigit((unsigned char)block[e]) ||
block[e] == 'x' || block[e] == 'X' ||
(block[e] >= 'a' && block[e] <= 'f') ||
(block[e] >= 'A' && block[e] <= 'F'))) e++;
if (e == p) return 0;
return static_cast<uint32_t>(std::stoul(block.substr(p, e - p),
nullptr, 0));
}
static std::vector<ManifestSeg> parseManifest(const std::string &path) {
std::ifstream f(path);
if (!f) die("cannot open '" + path + "' for reading");
std::string text((std::istreambuf_iterator<char>(f)),
std::istreambuf_iterator<char>());
std::vector<ManifestSeg> segs;
// Find "segments": [ ... ] then split into per-segment {} blocks.
size_t arrStart = text.find("\"segments\"");
if (arrStart == std::string::npos) die("manifest missing 'segments'");
arrStart = text.find('[', arrStart);
if (arrStart == std::string::npos) die("manifest 'segments' not array");
size_t pos = arrStart + 1;
while (pos < text.size()) {
size_t obStart = text.find('{', pos);
if (obStart == std::string::npos) break;
// Match closing } via brace depth.
int depth = 1;
size_t obEnd = obStart + 1;
while (obEnd < text.size() && depth > 0) {
if (text[obEnd] == '{') depth++;
else if (text[obEnd] == '}') depth--;
if (depth > 0) obEnd++;
}
if (depth != 0) die("manifest segment block unterminated");
std::string block = text.substr(obStart, obEnd - obStart + 1);
ManifestSeg seg;
seg.num = extractNumberField(block, "num");
seg.base = extractNumberField(block, "base");
seg.entryOff = extractNumberField(block, "entry_offset");
seg.image = extractStringField(block, "image");
seg.name = extractStringField(block, "name");
if (seg.image.empty()) die("manifest segment missing 'image'");
if (seg.name.empty()) seg.name = "SEG" + std::to_string(seg.num);
segs.push_back(std::move(seg));
pos = obEnd + 1;
size_t closing = text.find_first_not_of(" \t\n\r,", pos);
if (closing != std::string::npos && text[closing] == ']') break;
}
if (segs.empty()) die("manifest has no segments");
return segs;
}
static uint32_t parseInt(const std::string &s) { static uint32_t parseInt(const std::string &s) {
char *end = nullptr; char *end = nullptr;
unsigned long v = std::strtoul(s.c_str(), &end, 0); unsigned long v = std::strtoul(s.c_str(), &end, 0);
@ -146,17 +510,28 @@ static uint32_t parseInt(const std::string &s) {
static void usage(const char *argv0) { static void usage(const char *argv0) {
std::fprintf(stderr, std::fprintf(stderr,
"usage: %s --input FLAT --map FILE --base ADDR --entry SYM\n" "usage: %s --input FLAT --map FILE --base ADDR --entry SYM\n"
" --output OMF [--name NAME]\n", " --output OMF [--name NAME] [--expressload]\n"
argv0); " [--relocs FILE]\n"
" %s --manifest MFEST --output OMF\n"
"\n"
" --expressload emit ExpressLoad-able OMF (required for boot\n"
" launchers under real GS/OS Loader).\n"
" --relocs FILE read IMM24 reloc list from link816's --reloc-out\n"
" sidecar; emit cRELOC (0xF5) opcodes after LCONST\n"
" so the Loader patches intra-segment 24-bit refs\n"
" (JSL/JML/STAlong/etc.) when placing the segment.\n",
argv0, argv0);
std::exit(2); std::exit(2);
} }
} // namespace } // namespace
int main(int argc, char **argv) { int main(int argc, char **argv) {
std::string input, mapFile, output, entry = "main", name; std::string input, mapFile, output, entry = "main", name, manifest;
std::string relocFile;
uint32_t base = 0; uint32_t base = 0;
bool baseSet = false; bool baseSet = false;
bool expressload = false;
int i = 1; int i = 1;
while (i < argc) { while (i < argc) {
@ -166,11 +541,70 @@ int main(int argc, char **argv) {
else if (a == "--base") { if (++i >= argc) usage(argv[0]); base = parseInt(argv[i++]); baseSet = true; } else if (a == "--base") { if (++i >= argc) usage(argv[0]); base = parseInt(argv[i++]); baseSet = true; }
else if (a == "--entry") { if (++i >= argc) usage(argv[0]); entry = argv[i++]; } else if (a == "--entry") { if (++i >= argc) usage(argv[0]); entry = argv[i++]; }
else if (a == "--name") { if (++i >= argc) usage(argv[0]); name = argv[i++]; } else if (a == "--name") { if (++i >= argc) usage(argv[0]); name = argv[i++]; }
else if (a == "--manifest") { if (++i >= argc) usage(argv[0]); manifest = argv[i++]; }
else if (a == "--output" || a == "-o") { if (++i >= argc) usage(argv[0]); output = argv[i++]; } else if (a == "--output" || a == "-o") { if (++i >= argc) usage(argv[0]); output = argv[i++]; }
else if (a == "--expressload") { expressload = true; i++; }
else if (a == "--relocs") { if (++i >= argc) usage(argv[0]); relocFile = argv[i++]; }
else if (a == "-h" || a == "--help") usage(argv[0]); else if (a == "-h" || a == "--help") usage(argv[0]);
else die("unknown option '" + a + "'"); else die("unknown option '" + a + "'");
} }
if (input.empty() || mapFile.empty() || !baseSet || output.empty()) if (output.empty()) usage(argv[0]);
// Load IMM24 reloc list, if provided.
if (!relocFile.empty()) {
auto raw = readFile(relocFile);
if (raw.size() < 4) die("--relocs file too small");
uint32_t cnt = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8)
| ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
if (raw.size() != 4 + 8 * cnt)
die("--relocs file size mismatch: count=" + std::to_string(cnt)
+ " expected " + std::to_string(4 + 8*cnt) + " bytes, got "
+ std::to_string(raw.size()));
gReloc24Sites.reserve(cnt);
for (uint32_t k = 0; k < cnt; k++) {
size_t off = 4 + k * 8;
uint32_t patchOff = (uint32_t)raw[off] | ((uint32_t)raw[off+1] << 8)
| ((uint32_t)raw[off+2] << 16) | ((uint32_t)raw[off+3] << 24);
uint32_t offRef = (uint32_t)raw[off+4] | ((uint32_t)raw[off+5] << 8)
| ((uint32_t)raw[off+6] << 16) | ((uint32_t)raw[off+7] << 24);
if (patchOff > 0xFFFF || offRef > 0xFFFF)
die("reloc site out of 16-bit range — segment too large?");
gReloc24Sites.emplace_back((uint16_t)patchOff, (uint16_t)offRef);
}
}
// Multi-segment mode.
if (!manifest.empty()) {
auto segs = parseManifest(manifest);
std::vector<uint8_t> blob;
size_t totalPayload = 0;
for (size_t k = 0; k < segs.size(); ++k) {
const auto &s = segs[k];
auto img = readFile(s.image);
// Multi-segment: STATIC | ABSBANK | CODE. STATIC tells
// the loader not to relocate the segment (we baked all
// intra-segment relocations at link time and have no
// INTERSEG / RELOC opcodes); ABSBANK + ORG=base pins it
// to a specific bank. CODE is the default (type 0).
uint16_t kind = (k == 0) ? 0x8800u : 0x8800u;
uint32_t entryOff = (k == 0) ? s.entryOff : 0;
auto seg = emitOneSeg(img, entryOff, s.base,
static_cast<uint16_t>(s.num),
kind, s.name);
blob.insert(blob.end(), seg.begin(), seg.end());
totalPayload += img.size();
}
std::ofstream f(output, std::ios::binary);
if (!f) die("cannot open '" + output + "' for writing");
f.write(reinterpret_cast<const char *>(blob.data()), blob.size());
std::fprintf(stderr,
"OMF: %zu segments, %zu bytes payload -> %s (%zu bytes total)\n",
segs.size(), totalPayload, output.c_str(), blob.size());
return 0;
}
// Legacy single-segment mode (--input/--map/--base).
if (input.empty() || mapFile.empty() || !baseSet)
usage(argv[0]); usage(argv[0]);
auto image = readFile(input); auto image = readFile(input);
@ -193,14 +627,18 @@ int main(int argc, char **argv) {
name = (dot == std::string::npos) ? base_n : base_n.substr(0, dot); name = (dot == std::string::npos) ? base_n : base_n.substr(0, dot);
} }
auto blob = emitOMF(image, entryOff, name); auto blob = expressload
? emitOmfExpressLoad(image, entryOff, name)
: emitOMF(image, entryOff, name);
std::ofstream f(output, std::ios::binary); std::ofstream f(output, std::ios::binary);
if (!f) die("cannot open '" + output + "' for writing"); if (!f) die("cannot open '" + output + "' for writing");
f.write(reinterpret_cast<const char *>(blob.data()), blob.size()); f.write(reinterpret_cast<const char *>(blob.data()), blob.size());
std::fprintf(stderr, std::fprintf(stderr,
"OMF: 1 segment, %zu bytes payload, entry='%s' at +0x%x -> %s " "OMF: %d segment%s%s, %zu bytes payload, entry='%s' at +0x%x -> %s "
"(%zu bytes total)\n", "(%zu bytes total)\n",
expressload ? 2 : 1, expressload ? "s" : "",
expressload ? " (ExpressLoad)" : "",
image.size(), entry.c_str(), entryOff, image.size(), entry.c_str(), entryOff,
output.c_str(), blob.size()); output.c_str(), blob.size());
return 0; return 0;