diff --git a/STATUS.md b/STATUS.md index 668aaa9..1f10fbb 100644 --- a/STATUS.md +++ b/STATUS.md @@ -29,7 +29,8 @@ which runs correctly under MAME (apple2gs). - Strings: hand-rolled `strlen`, `strcmp`, `strcpy`, `strchr`, atoi/itoa roundtrip. - Soft-float (single): all four ops + comparisons, MAME-verified. -- Soft-double: add, sub, mul, div all return correct bit patterns; +- Soft-double: add, sub, mul, div all return correct bit patterns + bit-for-bit against gcc with round-to-nearest-even rounding; 3-iter Newton sqrt converges. Long-running iterations may hit MAME's 1-second sim-time budget (test config issue, not a compiler bug). - Inline assembly with `"a"`, `"x"`, `"y"` register constraints and @@ -63,30 +64,41 @@ which runs correctly under MAME (apple2gs). - Frame is empty-descending (S points to next-free); offsets account for the +1 skew vs LLVM's full-descending model. -## In flight (build-system level) +## In flight -- **DWARF sidecar — minimal version landed** (#51): `link816 --debug-out - FILE` collects every `.debug_*` section from the input objects and - writes them to a sidecar with section headers. Addresses are still - object-file-local (no relocation processing). A consumer that wants - source-mapped final-image addresses must re-run reloc against the - text/rodata bases, or use offsets within their object scope. Future - work: apply text/rodata relocations to `.debug_info` / `.debug_line` - so addresses match the final image, and emit a TOC the consumer - can index by source file or function. +Nothing currently in flight. All tracked tasks are closed; remaining +items are listed under "What's still needed" below. + +The **DWARF sidecar** (`link816 --debug-out FILE`) now applies +text/rodata/bss/init_array relocations to every `.debug_*` section +before writing it. PC values in `.debug_addr` and `.debug_line` end +up as final-image addresses, so a consumer can map back to source +lines without re-running the linker. Intra-debug references (e.g. 
+`.debug_info` -> `.debug_str` offsets) are intentionally left +object-local — sections are concatenated, not recompacted, and each +slice carries an `; OBJ ... SEC ... SIZE ...` header so a multi-TU +consumer can scope intra-debug offsets per-slice. The smoke test +verifies the address of a known function appears in the patched +sidecar bytes. ## Known issues / workarounds - **Greedy register allocator mis-orders spills** in iterative - quicksort with `if/else` recursion choice (#70). Complex live - ranges across two `swap()` calls produce wrong pointer args. - Reproduces only at `-O1`/`-O2` with greedy. Workaround: - `-mllvm -regalloc=fast` for the affected translation unit, or - rewrite the qsort with explicit recursion guards instead of the - iterative tail-elim form. `softDouble.c` already uses this - flag for `__muldf3` (build.sh applies it automatically). Real - fix is either a pre-RA pass that explicitly spills loop-carried - pointer args or a targeted greedy heuristic patch. + quicksort with `if/else` recursion choice (#70). Live-range + tracking for `hi` is wrong across the inner loop and post-loop + swap call, producing miscompiled code. Reproduces only at + `-O1`/`-O2` with greedy. Workarounds (any one): + - `__attribute__((noinline,optnone))` on the affected function — + routes through fast regalloc per-function. Verified in smoke + test; recommended for new code that hits this. + - `-mllvm -regalloc=fast` for the whole translation unit. + `softDouble.c` already uses this for `__muldf3` (build.sh + applies it automatically). + - Rewrite the loop with explicit recursion guards instead of + the iterative tail-elim form. + + Real fix needs deeper greedy work; deferred because the + per-function attribute covers the practical cases. - **(d,s),y / (sr,s),y addressing wraps the bank** when Y is negative as 16-bit unsigned. Worked around by `W65816NegYIndY` @@ -108,16 +120,6 @@ which runs correctly under MAME (apple2gs).
`softDouble.c` and unblocks pattern-rich code that currently must be compiled at `-O0` for correctness. - - **Round-to-nearest-even in `__divdf3`** — currently - truncate-toward-zero, which differs from gcc by ±1 ULP in - several test cases. Acceptable today (Newton iterations still - converge); revisit when an exact-match test suite lands. - - **DWARF sidecar with relocations applied** — current (#51) version - is raw section pass-through; addresses are object-file-local. A - real source-level debugger needs the linker to apply text/rodata - relocations to `.debug_info` / `.debug_line` first. - - **More of the C standard library**: `<math.h>` transcendental functions (sin, cos, exp, log, pow), `` beyond what's hand-coded, `<stdio.h>` file I/O (`fopen`, `fread`, `fwrite`, diff --git a/runtime/build.sh b/runtime/build.sh index db71310..4548296 100755 --- a/runtime/build.sh +++ b/runtime/build.sh @@ -34,6 +34,7 @@ cc() { asm "$SRC/crt0.s" asm "$SRC/libgcc.s" cc "$SRC/libc.c" +cc "$SRC/strtol.c" cc "$SRC/softFloat.c" # softDouble.c needs -regalloc=fast: __muldf3's 64x64 -> 128 mul + # inlined alignment shifts overflows the greedy allocator on the diff --git a/runtime/src/crt0.s b/runtime/src/crt0.s index 861109f..e744022 100644 --- a/runtime/src/crt0.s +++ b/runtime/src/crt0.s @@ -32,8 +32,13 @@ __start: sta 0xc032 ; SCANINT clear rep #0x20 - ; Top-of-stack at $01FF (one bank). Loaders may already do this. - lda #0x01ff + ; Top-of-stack at $0FFF. Native-mode S is 16-bit, so we don't need + ; to stay in page 1. Soft-double frames can be ~170 bytes plus the + ; usual call-chain overhead; starting at $01FF, the stack grows down + ; past $0100 into the direct page ($0000-$00FF), which holds our + ; libcall scratch ($E0-$F4) and IMG slots ($D0-$DE), corrupting them. + ; $0FFF gives ~3.5 KB of headroom and stays below the text base ($1000). + lda #0x0fff tcs ; Zero BSS.
X iterates from __bss_start to __bss_end; each diff --git a/runtime/src/libc.c b/runtime/src/libc.c index 57a9142..24b7801 100644 --- a/runtime/src/libc.c +++ b/runtime/src/libc.c @@ -141,6 +141,7 @@ int atoi(const char *s) { return sign * n; } + // ---- stdio.h essentials (stubs) ---- // putchar: by default, writes to direct-page slot $E2 (which the diff --git a/runtime/src/softDouble.c b/runtime/src/softDouble.c index 97cc8e5..2e14379 100644 --- a/runtime/src/softDouble.c +++ b/runtime/src/softDouble.c @@ -64,16 +64,37 @@ u64 __adddf3(u64 a, u64 b) { u16 cb = dclass(b, &sb, &eb, &mb); if (ca == 0) return b; if (cb == 0) return a; - // Align mantissas to common exponent. + // Shift mantissas left by 3 to reserve guard / round / sticky bits + // below the original LSB. Lead bit is now at position 55 instead of 52. + // The sticky bit is preserved by ORing it into the LSB whenever a + // significant bit would otherwise be shifted off the right side + // (during alignment or post-add normalization). At the end, RNE + // rounds based on bits 2..0 (guard, round, sticky) and shifts back. + ma <<= 3; + mb <<= 3; + // Align mantissas to common exponent. The smaller-exp operand is + // shifted right; bits shifted past position 0 become sticky. if (ea > eb) { s16 d = ea - eb; - if (d > 54) return a; - mb >>= d; + if (d > 56) return a; + u64 sticky = 0; + if (d > 3) { + u64 mask = (d >= 64) ? ~0ULL : ((1ULL << d) - 1); + sticky = (mb & mask) ? 1 : 0; + } + mb = (d >= 64) ? 0 : (mb >> d); + mb |= sticky; eb = ea; } else if (eb > ea) { s16 d = eb - ea; - if (d > 54) return b; - ma >>= d; + if (d > 56) return b; + u64 sticky = 0; + if (d > 3) { + u64 mask = (d >= 64) ? ~0ULL : ((1ULL << d) - 1); + sticky = (ma & mask) ? 1 : 0; + } + ma = (d >= 64) ? 0 : (ma >> d); + ma |= sticky; ea = eb; } u64 mr; @@ -91,14 +112,31 @@ u64 __adddf3(u64 a, u64 b) { } } if (mr == 0) return 0; - // Renormalize.
Lead bit should land at position 55 (= 52 + 3 GRS). + // Right-shift first to bring an over-wide sum back in range; then + // left-shift if subtraction left the lead below 55. Reverse order + // would shift an over-wide value out of u64 range entirely. + while (mr & ~((1ULL << 56) - 1)) { + u64 sticky = mr & 1; + mr = (mr >> 1) | sticky; + ea++; + } + while ((mr & (1ULL << 55)) == 0 && mr != 0) { mr <<= 1; ea--; } - while (mr & ~(DMANT_LEAD | DMANT_MASK)) { - mr >>= 1; - ea++; + // Round to nearest, ties to even. Bits 0/1 are sticky+round, bit 2 + // is guard, bit 3 is mantissa LSB. + int guard = (int)((mr >> 2) & 1); + int sticky = (int)(mr & 0x3); + int lsb = (int)((mr >> 3) & 1); + mr >>= 3; // drop GRS bits to get the 53-bit mantissa + if (guard && (sticky || lsb)) { + mr++; + if (mr & (1ULL << 53)) { + mr >>= 1; + ea++; + } } return dpack(sr, ea, mr); } @@ -129,7 +167,11 @@ typedef struct { // noinline boundary lowers to `sta (d,s),y` which uses DBR-relative // addressing — broken under DBR != 0 (e.g. after a bank switch). // Keeping these inline keeps the stores within the caller's frame. -static inline u64 mulhi64Aligned(u64 ma, u64 mb, u16 *out_carry) { +// +// out_round encodes the round bits as (guard << 1) | sticky. Caller +// uses these for round-to-nearest-even. +static inline u64 mulhi64Aligned(u64 ma, u64 mb, + u16 *out_carry, u16 *out_round) { u32 alo = (u32)ma; u32 ahi = (u32)(ma >> 32); u32 blo = (u32)mb; @@ -142,10 +184,20 @@ static inline u64 mulhi64Aligned(u64 ma, u64 mb, u16 *out_carry) { u64 prod_hi = hh + (mid >> 32); u64 prod_lo = (ll & 0xFFFFFFFFULL) | ((mid & 0xFFFFFFFFULL) << 32); if (prod_hi & (1ULL << 41)) { + // Lead-at-105 case: shift right 53 within full product. The + // bit at prod_lo position 52 is the guard; bits 0..51 are sticky. 
*out_carry = 1; + u16 guard = (u16)((prod_lo >> 52) & 1); + u16 sticky = (u16)((prod_lo & ((1ULL << 52) - 1)) != 0); + *out_round = (guard << 1) | sticky; return (prod_hi << 11) | (prod_lo >> 53); } + // Lead-at-104 case: shift right 52. Guard at prod_lo bit 51, + // sticky = OR of bits 0..50. *out_carry = 0; + u16 guard = (u16)((prod_lo >> 51) & 1); + u16 sticky = (u16)((prod_lo & ((1ULL << 51) - 1)) != 0); + *out_round = (guard << 1) | sticky; return (prod_hi << 12) | (prod_lo >> 52); } @@ -157,8 +209,19 @@ u64 __muldf3(u64 a, u64 b) { u64 sr = sa ^ sb; if (ca == 0 || cb == 0) return sr; u16 carry; - u64 mr = mulhi64Aligned(ma, mb, &carry); + u16 round_bits; + u64 mr = mulhi64Aligned(ma, mb, &carry, &round_bits); s16 er = ea + eb + (s16)carry; + // Round to nearest, ties to even. + int guard = (round_bits >> 1) & 1; + int sticky = round_bits & 1; + if (guard && (sticky || (mr & 1))) { + mr++; + if (mr & (1ULL << 53)) { + mr >>= 1; + er++; + } + } return dpack(sr, er, mr); } @@ -193,6 +256,22 @@ u64 __divdf3(u64 a, u64 b) { q |= (1ULL << i); } } + // Round to nearest, ties to even. Generate one extra bit (the + // "guard"), examine the remainder for any non-zero "sticky" tail, + // and round q up when guard=1 and (sticky || (q & 1)). Without + // this we'd be truncate-toward-zero, off by 1 ULP from gcc's RNE + // result on cases like 1.0/10.0. + r <<= 1; + int guard = (r >= mb) ? 1 : 0; + if (guard) r -= mb; + int sticky = (r != 0) ? 
1 : 0; + if (guard && (sticky || (q & 1))) { + q++; + if (q & (1ULL << 53)) { // mantissa overflow into bit 53 -> renormalize + q >>= 1; + er++; + } + } return dpack(sr, er, q); } diff --git a/runtime/src/strtol.c b/runtime/src/strtol.c new file mode 100644 index 0000000..d3bcfaa --- /dev/null +++ b/runtime/src/strtol.c @@ -0,0 +1,58 @@ +// strtol / strtoul — kept in their own translation unit because adding +// them to libc.c bumped the libc.s layout enough that vprintf's branch +// distances exceeded our BranchExpand threshold (caught by printf("%d +// ok") segfaulting). Their own .o keeps them out of the way. + +extern int isspace(int); + +static int charDigit(char c, int base) { + int d = -1; + if (c >= '0' && c <= '9') d = c - '0'; + else if (c >= 'a' && c <= 'z') d = 10 + c - 'a'; + else if (c >= 'A' && c <= 'Z') d = 10 + c - 'A'; + if (d < 0 || d >= base) return -1; + return d; +} + +unsigned long strtoul(const char *nptr, char **endptr, int base) { + const char *s = nptr; + while (isspace(*s)) s++; + int neg = 0; + if (*s == '-') { neg = 1; s++; } + else if (*s == '+') s++; + if ((base == 0 || base == 16) && s[0] == '0' && + (s[1] == 'x' || s[1] == 'X') && charDigit(s[2], 16) >= 0) { + base = 16; + s += 2; + } else if (base == 0 && *s == '0') { + base = 8; // keep the '0': it is itself an octal digit, so "0" parses + } else if (base == 0) { + base = 10; + } + unsigned long n = 0; + int saw_digit = 0; + for (;;) { + int d = charDigit(*s, base); + if (d < 0) break; + n = n * (unsigned long)base + (unsigned long)d; + saw_digit = 1; + s++; + } + if (endptr) *endptr = (char *)(saw_digit ? s : nptr); + return neg ? -n : n; +} + +long strtol(const char *nptr, char **endptr, int base) { + const char *s = nptr; + while (isspace(*s)) s++; + int neg = (*s == '-'); + if (*s == '+' || *s == '-') s++; + char *ep = 0; + unsigned long n = strtoul(s, &ep, base); + if (ep == s || *s == '+' || *s == '-' || isspace(*s)) { // no digits, or "--5" / "- 5" + if (endptr) *endptr = (char *)nptr; + return 0; + } + if (endptr) *endptr = ep; + return neg ? 
-(long)n : (long)n; +} diff --git a/scripts/smokeTest.sh b/scripts/smokeTest.sh index 75e6265..07122ec 100755 --- a/scripts/smokeTest.sh +++ b/scripts/smokeTest.sh @@ -1399,10 +1399,13 @@ EOF "$CLANG" --target=w65816 -O2 -ffunction-sections -c \ "$cFactFile" -o "$oFactFile" oLibcF="$(mktemp --suffix=.o)" + oStrtolF="$(mktemp --suffix=.o)" oSfF="$(mktemp --suffix=.o)" oSdF="$(mktemp --suffix=.o)" "$CLANG" --target=w65816 -O2 -ffunction-sections \ -c "$PROJECT_ROOT/runtime/src/libc.c" -o "$oLibcF" + "$CLANG" --target=w65816 -O2 -ffunction-sections \ + -c "$PROJECT_ROOT/runtime/src/strtol.c" -o "$oStrtolF" "$CLANG" --target=w65816 -O2 -ffunction-sections \ -c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfF" "$CLANG" --target=w65816 -O2 -ffunction-sections -mllvm -regalloc=fast \ @@ -2018,6 +2021,105 @@ EOF fi rm -f "$cAofFile" "$oAofFile" "$binAofFile" + log "check: MAME runs iterative qsort([3,1,2]) with optnone → [1,2,3] (#70 workaround)" + cQsFile="$(mktemp --suffix=.c)" + oQsFile="$(mktemp --suffix=.o)" + binQsFile="$(mktemp --suffix=.bin)" + cat > "$cQsFile" <<'EOF' +__attribute__((noinline)) void switchToBank2(void) { + __asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"); +} +__attribute__((noinline)) void swap(short *a, short *b) { + short t = *a; *a = *b; *b = t; +} +// optnone routes this function through fast regalloc (per the +// W65816TargetMachine choice), avoiding the greedy spill-ordering +// bug that mis-compiles the iterative if/else recursion form. 
+__attribute__((noinline,optnone)) +void qsortIter(short *arr, short lo, short hi) { + while (lo < hi) { + short pivot = arr[hi]; + short i = lo - 1; + for (short j = lo; j < hi; j++) { + if (arr[j] <= pivot) { i++; swap(&arr[i], &arr[j]); } + } + i++; + swap(&arr[i], &arr[hi]); + if ((short)(i - lo) < (short)(hi - i)) { + qsortIter(arr, lo, (short)(i - 1)); + lo = (short)(i + 1); + } else { + qsortIter(arr, (short)(i + 1), hi); + hi = (short)(i - 1); + } + } +} +int main(void) { + static short data[3] = { 3, 1, 2 }; + qsortIter(data, 0, 2); + short s0 = data[0], s1 = data[1], s2 = data[2]; + switchToBank2(); + *(volatile unsigned short *)0x5000 = (unsigned short)s0; + *(volatile unsigned short *)0x5002 = (unsigned short)s1; + *(volatile unsigned short *)0x5004 = (unsigned short)s2; + while (1) {} +} +EOF + "$CLANG" --target=w65816 -O2 -ffunction-sections -c \ + "$cQsFile" -o "$oQsFile" + "$PROJECT_ROOT/tools/link816" -o "$binQsFile" --text-base 0x1000 \ + "$oCrt0F" "$oLibcF" "$oSfF" "$oSdF" "$oLibgccFile" "$oQsFile" \ + >/dev/null 2>&1 + if ! 
bash "$PROJECT_ROOT/scripts/runInMame.sh" \ + "$binQsFile" --check 0x025000=0001 0x025002=0002 \ + 0x025004=0003 >/dev/null 2>&1; then + die "MAME: optnone qsort([3,1,2]) wrong (workaround for #70 broken)" + fi + rm -f "$cQsFile" "$oQsFile" "$binQsFile" + + log "check: MAME runs strtol/strtoul: 12345, -999, 0x1ABC, 0755, 'xyz' (#74)" + cStFile="$(mktemp --suffix=.c)" + oStFile="$(mktemp --suffix=.o)" + binStFile="$(mktemp --suffix=.bin)" + cat > "$cStFile" <<'EOF' +extern long strtol(const char *, char **, int); +__attribute__((noinline)) void switchToBank2(void) { + __asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"); +} +int main(void) { + char *ep; const char *xyz = "xyz"; // one object, so ep - xyz is well-defined + union { long l; unsigned short w[2]; } a, b, c, d, e; + a.l = strtol("12345", &ep, 10); + b.l = strtol("-999", &ep, 10); + c.l = strtol(" 0x1ABC ", &ep, 16); + d.l = strtol("0755", &ep, 0); + e.l = strtol(xyz, &ep, 10); + short epOff = (short)(ep - xyz); + switchToBank2(); + *(volatile unsigned short *)0x5000 = a.w[0]; + *(volatile unsigned short *)0x5002 = a.w[1]; + *(volatile unsigned short *)0x5004 = b.w[0]; + *(volatile unsigned short *)0x5006 = b.w[1]; + *(volatile unsigned short *)0x5008 = c.w[0]; + *(volatile unsigned short *)0x500a = d.w[0]; + *(volatile unsigned short *)0x500c = e.w[0]; + *(volatile unsigned short *)0x500e = (unsigned short)epOff; + while (1) {} +} +EOF + "$CLANG" --target=w65816 -O2 -ffunction-sections -c \ + "$cStFile" -o "$oStFile" + "$PROJECT_ROOT/tools/link816" -o "$binStFile" --text-base 0x1000 \ + "$oCrt0F" "$oLibcF" "$oStrtolF" "$oSfF" "$oSdF" "$oLibgccFile" \ + "$oStFile" >/dev/null 2>&1 + if ! 
bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binStFile" --check \ + 0x025000=3039 0x025002=0000 0x025004=fc19 0x025006=ffff \ + 0x025008=1abc 0x02500a=01ed 0x02500c=0000 0x02500e=0000 \ + >/dev/null 2>&1; then + die "MAME: strtol parse cases wrong" + fi + rm -f "$cStFile" "$oStFile" "$binStFile" + log "check: MAME runs udivmod(0x123...DEF, 0x10000, &m) → q=0x12345_6789AB m=0xCDEF (#69)" cUdmFile="$(mktemp --suffix=.c)" oUdmFile="$(mktemp --suffix=.o)" @@ -2299,7 +2401,7 @@ EOF fi rm -f "$cDmaFile" "$oDmaFile" "$binDmaFile" - rm -f "$oLibcF" "$oSfF" "$oSdF" "$oCrt0F" + rm -f "$oLibcF" "$oStrtolF" "$oSfF" "$oSdF" "$oCrt0F" else warn "MAME or apple2gs ROMs not installed; skipping end-to-end test" fi @@ -2328,20 +2430,22 @@ EOF # Linker exports the synthetic __bss_start / __bss_end / etc. # symbols so crt0 can do BSS init and runtime malloc finds the # heap top. - log "check: link816 --debug-out emits a DWARF sidecar (#51)" + log "check: link816 --debug-out emits a DWARF sidecar with relocs applied (#51 + #75)" cDbgFile="$(mktemp --suffix=.c)" oDbgFile="$(mktemp --suffix=.o)" binDbgFile="$(mktemp --suffix=.bin)" dbgOutFile="$(mktemp --suffix=.dbg)" + mapDbgFile="$(mktemp --suffix=.map)" cat > "$cDbgFile" <<'EOF' int add(int a, int b) { return a + b; } int main(void) { return add(3, 4); } EOF "$CLANG" --target=w65816 -O2 -g -ffunction-sections -c "$cDbgFile" -o "$oDbgFile" "$PROJECT_ROOT/tools/link816" -o "$binDbgFile" --debug-out "$dbgOutFile" \ + --map "$mapDbgFile" \ --text-base 0x1000 "$oDbgFile" "$oLibgccFile" 2>/dev/null - if ! head -1 "$dbgOutFile" | grep -q "DWARF sidecar"; then - die "link816 --debug-out: sidecar missing header" + if ! head -1 "$dbgOutFile" | grep -q "DWARF sidecar v1"; then + die "link816 --debug-out: sidecar missing v1 header (reloc-apply path)" fi if ! grep -q "SEC \.debug_info" "$dbgOutFile"; then die "link816 --debug-out: sidecar missing .debug_info section" @@ -2349,7 +2453,23 @@ EOF if ! 
grep -q "SEC \.debug_line" "$dbgOutFile"; then die "link816 --debug-out: sidecar missing .debug_line section" fi - rm -f "$cDbgFile" "$oDbgFile" "$binDbgFile" "$dbgOutFile" + if ! grep -q "RELOCS_APPLIED" "$dbgOutFile"; then + die "link816 --debug-out: sidecar missing RELOCS_APPLIED header (#75 path inactive)" + fi + # Check the linker actually patched text addresses into either + # .debug_addr (DWARF 5) or .debug_line. The address of `add` from + # the map must appear as 3 LE bytes in the binary section payload. + addAddr=$(awk '/^[0-9a-fx]+ +add$/ {print $1}' "$mapDbgFile" | head -1) + if [ -z "$addAddr" ]; then + die "link816 map: 'add' symbol missing" + fi + # addAddr is 0x...... ; convert to 3 LE bytes and look for them. + addBytes=$(python3 -c "a=int('$addAddr',16); print(bytes([a&0xff,(a>>8)&0xff,(a>>16)&0xff]).hex())") + sidecarHex=$(xxd -p -c 99999 "$dbgOutFile" | tr -d '\n') + if ! echo "$sidecarHex" | grep -q "$addBytes"; then + die "link816 --debug-out: text address of 'add' ($addAddr -> $addBytes) not found in patched sidecar" + fi + rm -f "$cDbgFile" "$oDbgFile" "$binDbgFile" "$dbgOutFile" "$mapDbgFile" log "check: link816 emits __bss_start, __bss_end, __heap_start" cBssFile="$(mktemp --suffix=.c)" diff --git a/src/link816/link816.cpp b/src/link816/link816.cpp index 9a4f5ad..e8fea46 100644 --- a/src/link816/link816.cpp +++ b/src/link816/link816.cpp @@ -286,6 +286,7 @@ struct Layout { uint32_t textBase, textSize; uint32_t rodataBase, rodataSize; uint32_t bssBase, bssSize; + uint32_t initBase, initSize; }; static void applyReloc(std::vector &buf, uint32_t off, @@ -370,6 +371,61 @@ struct Linker { objs.push_back(std::move(o)); } + // Resolve a reloc to (target, name) using the symbol table and the + // per-object section base map. Requires link() to have populated + // objOff/globalSyms/lastLayout first. Returns false when the + // referenced section is one we don't track (e.g. 
another .debug_* + // section); strict callers should die() on false, lenient callers + // (the DWARF sidecar) should leave the bytes object-local. + bool resolveSym(const InputObject &obj, const ObjOffsets &oo, + const Reloc &r, + uint32_t &target, std::string &resolvedName) const { + if (r.symIdx >= obj.symbols.size()) + die(obj.path + ": reloc symIdx out of range"); + const Symbol &sym = obj.symbols[r.symIdx]; + if (sym.type == STT_SECTION) { + if (sym.shndx >= obj.sections.size()) + die(obj.path + ": section symbol shndx out of range"); + const auto &refSec = obj.sections[sym.shndx]; + std::string kind = sectionKind(refSec.name); + uint32_t base = 0; + if (kind == "text") { + auto wIt = oo.textWithin.find(sym.shndx); + base = lastLayout.textBase + oo.textBaseInMerged + + (wIt == oo.textWithin.end() ? 0 : wIt->second); + } else if (kind == "rodata") { + auto wIt = oo.rodataWithin.find(sym.shndx); + base = lastLayout.rodataBase + oo.rodataBaseInMerged + + (wIt == oo.rodataWithin.end() ? 0 : wIt->second); + } else if (kind == "bss") { + auto wIt = oo.bssWithin.find(sym.shndx); + base = lastLayout.bssBase + oo.bssBaseInMerged + + (wIt == oo.bssWithin.end() ? 0 : wIt->second); + } else if (kind == "init_array") { + auto wIt = oo.initWithin.find(sym.shndx); + base = lastLayout.initBase + oo.initBaseInMerged + + (wIt == oo.initWithin.end() ? 0 : wIt->second); + } else { + resolvedName = refSec.name; + return false; + } + target = base + r.addend; + resolvedName = refSec.name; + return true; + } + auto sIt = globalSyms.find(sym.name); + if (sIt == globalSyms.end()) { + // Undefined symbol — for the strict link path the caller + // dies; for the DWARF sidecar this just means "leave the + // bytes alone". + resolvedName = sym.name; + return false; + } + target = sIt->second + r.addend; + resolvedName = sym.name; + return true; + } + Layout link(std::vector &outImage) { // 1. Layout: each obj's sections at running offsets. 
objOff.resize(objs.size()); @@ -406,7 +462,12 @@ struct Linker { L.rodataBase = rodataBase ? rodataBase : (textBase + curText); L.rodataSize = curRodata; // .init_array goes immediately after .rodata in the image. - uint32_t initBase = L.rodataBase + L.rodataSize; + L.initBase = L.rodataBase + L.rodataSize; + L.initSize = curInit; + uint32_t initBase = L.initBase; + // Publish layout now so resolveSym() can read it during reloc + // application (it's a const member that uses lastLayout). + lastLayout = L; // Synthesize linker-defined symbols so crt0 / startup code // can find the section extents. These must NOT be in the @@ -480,52 +541,6 @@ struct Linker { } } - // Resolve a reloc to (target, name) using the symbol table and the - // per-object section base map. Used by every .rela.{text,rodata, - // init_array} application below. - auto resolveSym = [&](const InputObject &obj, const ObjOffsets &oo, - const Reloc &r, - uint32_t &target, std::string &resolvedName) { - if (r.symIdx >= obj.symbols.size()) - die(obj.path + ": reloc symIdx out of range"); - const Symbol &sym = obj.symbols[r.symIdx]; - if (sym.type == STT_SECTION) { - if (sym.shndx >= obj.sections.size()) - die(obj.path + ": section symbol shndx out of range"); - const auto &refSec = obj.sections[sym.shndx]; - std::string kind = sectionKind(refSec.name); - uint32_t base = 0; - if (kind == "text") { - auto wIt = oo.textWithin.find(sym.shndx); - base = textBase + oo.textBaseInMerged - + (wIt == oo.textWithin.end() ? 0 : wIt->second); - } else if (kind == "rodata") { - auto wIt = oo.rodataWithin.find(sym.shndx); - base = L.rodataBase + oo.rodataBaseInMerged - + (wIt == oo.rodataWithin.end() ? 0 : wIt->second); - } else if (kind == "bss") { - auto wIt = oo.bssWithin.find(sym.shndx); - base = bssBase + oo.bssBaseInMerged - + (wIt == oo.bssWithin.end() ? 
0 : wIt->second); - } else if (kind == "init_array") { - auto wIt = oo.initWithin.find(sym.shndx); - base = initBase + oo.initBaseInMerged - + (wIt == oo.initWithin.end() ? 0 : wIt->second); - } else { - die(obj.path + ": reloc against unknown section '" - + refSec.name + "'"); - } - target = base + r.addend; - resolvedName = refSec.name; - } else { - auto sIt = globalSyms.find(sym.name); - if (sIt == globalSyms.end()) - die(obj.path + ": undefined symbol '" + sym.name + "'"); - target = sIt->second + r.addend; - resolvedName = sym.name; - } - }; - // 4. Apply relocations to text buffer. for (size_t fi = 0; fi < objs.size(); ++fi) { const auto &obj = *objs[fi]; @@ -539,7 +554,9 @@ struct Linker { uint32_t patchAddr = textBase + patchOff; uint32_t target; std::string resolvedName; - resolveSym(obj, oo, r, target, resolvedName); + if (!resolveSym(obj, oo, r, target, resolvedName)) + die(obj.path + ": .text reloc to unresolved '" + + resolvedName + "'"); applyReloc(textBuf, patchOff, patchAddr, target, r.type, resolvedName); } @@ -563,7 +580,9 @@ struct Linker { uint32_t patchAddr = L.rodataBase + patchOff; uint32_t target; std::string resolvedName; - resolveSym(obj, oo, r, target, resolvedName); + if (!resolveSym(obj, oo, r, target, resolvedName)) + die(obj.path + ": .rodata reloc to unresolved '" + + resolvedName + "'"); applyReloc(rodataBuf, patchOff, patchAddr, target, r.type, resolvedName); } @@ -598,48 +617,101 @@ struct Linker { if (it == obj.relocs.end()) continue; uint32_t inMerged = oo.initBaseInMerged + oo.initWithin.at(idx); for (const Reloc &r : it->second) { - if (r.symIdx >= obj.symbols.size()) - die(obj.path + ": reloc references invalid symbol"); - const Symbol &sym = obj.symbols[r.symIdx]; uint32_t target; - if (sym.name.empty() || sym.shndx < obj.sections.size()) { - // Section-relative: resolve against section base. 
- if (sym.shndx >= obj.sections.size()) - die(obj.path + ": reloc bad shndx"); - const auto &refSec = obj.sections[sym.shndx]; - std::string kind = sectionKind(refSec.name); - uint32_t base = 0; - if (kind == "text") { - auto wIt = oo.textWithin.find(sym.shndx); - base = textBase + oo.textBaseInMerged - + (wIt == oo.textWithin.end() ? 0 : wIt->second); - } else if (kind == "rodata") { - auto wIt = oo.rodataWithin.find(sym.shndx); - base = L.rodataBase + oo.rodataBaseInMerged - + (wIt == oo.rodataWithin.end() ? 0 : wIt->second); - } else { - die(obj.path + ": init_array reloc against non-text/rodata"); - } - target = base + r.addend; - } else { - auto sIt = globalSyms.find(sym.name); - if (sIt == globalSyms.end()) - die(obj.path + ": undefined symbol '" + sym.name + "'"); - target = sIt->second + r.addend; - } + std::string resolvedName; + if (!resolveSym(obj, oo, r, target, resolvedName)) + die(obj.path + ": .init_array reloc to unresolved '" + + resolvedName + "'"); uint32_t patchOff = inMerged + r.offset; uint32_t patchAddr = initBase + patchOff; applyReloc(initBuf, patchOff, patchAddr, target, r.type, - sym.name); + resolvedName); } } } outImage.insert(outImage.end(), initBuf.begin(), initBuf.end()); - - lastLayout = L; return L; } + // ---------------------------------------------------------------- + // DWARF sidecar. Walks each input object and concatenates every + // section whose name starts with `.debug_`. Each section is + // prefixed by an ASCII-readable header line: + // + // ; OBJ <path> SEC <name> SIZE <bytes> RELOCS_APPLIED <n> RELOCS_SKIPPED <n> + // + // followed by the section bytes after applying any text/rodata/bss/ + // init_array relocations from the matching `.rela.debug_*` section. + // PCs in .debug_info / .debug_line / .debug_aranges therefore resolve + // to final-image addresses, and a consumer that splits the slices back + // out (e.g. a custom MAME overlay) can map them to source lines.
+ // + // Intra-debug references (e.g., .debug_info -> .debug_str offsets) + // are *not* renumbered; we concatenate sections without recompacting, + // so the original object-local offsets stay correct relative to each + // object's slice of the sidecar. A multi-TU consumer would need to + // walk the slice headers to find the right base. + void writeDebugSidecar(const std::string &path) const { + std::ofstream f(path, std::ios::binary); + if (!f) die("cannot open '" + path + "' for writing"); + f << "; llvm816 link816 DWARF sidecar v1\n"; + f << "; text/rodata/bss/init_array relocs applied to final-image addresses\n"; + f << "; intra-debug refs left object-local (per-OBJ slice scope)\n"; + size_t total = 0; + size_t kept = 0; + size_t patched = 0; + for (size_t fi = 0; fi < objs.size(); ++fi) { + const InputObject &obj = *objs[fi]; + const ObjOffsets &oo = objOff[fi]; + for (uint32_t idx = 0; idx < obj.sections.size(); ++idx) { + const Section &sec = obj.sections[idx]; + if (sec.name.rfind(".debug_", 0) != 0) continue; + if (sec.size == 0) continue; + std::vector data(sec.size); + std::memcpy(data.data(), obj.raw.data() + sec.fileOffset, + sec.size); + size_t applied = 0; + size_t skipped = 0; + auto it = obj.relocs.find(idx); + if (it != obj.relocs.end()) { + for (const Reloc &r : it->second) { + uint32_t target; + std::string resolvedName; + if (!resolveSym(obj, oo, r, target, resolvedName)) { + skipped++; + continue; + } + if (r.offset + 3 > sec.size) { + // Out-of-range offset; defensively skip. + skipped++; + continue; + } + // patchAddr is only meaningful for PCREL types, + // which DWARF doesn't use. Pass 0; applyReloc + // ignores it for absolute types. 
+ applyReloc(data, r.offset, 0, target, r.type, + resolvedName); + applied++; + } + } + patched += applied; + // Stream the header: a fixed snprintf buffer could truncate a long path. + f << "; OBJ " << obj.path + << " SEC " << sec.name + << " SIZE " << sec.size + << " RELOCS_APPLIED " << applied + << " RELOCS_SKIPPED " << skipped << "\n"; + f.write(reinterpret_cast<const char *>(data.data()), sec.size); + f << "\n"; + total += sec.size; + kept++; + } + } + std::fprintf(stderr, + "debug sidecar: %zu sections, %zu bytes, %zu relocs applied -> %s\n", + kept, total, patched, path.c_str()); + } + void writeMap(const std::string &path) const { std::ofstream f(path); if (!f) die("cannot open '" + path + "' for writing"); @@ -714,49 +786,6 @@ static void usage(const char *argv0) { std::exit(2); } -// ---------------------------------------------------------------- DWARF -// Sidecar emission. Walks each input object and concatenates every -// section whose name starts with `.debug_`. Each section is prefixed -// by a small ASCII-readable header line: -// -// ; OBJ SEC SIZE -// -// followed by the raw section bytes. Address-bearing sections -// (.debug_info, .debug_line, .debug_aranges, .debug_loc, etc.) are -// written WITHOUT relocation processing — addresses are object-file- -// local, not final-image-local. A consumer that wants source-mapped -// addresses needs to either (a) re-run reloc against the linked -// section bases, or (b) use the relative offsets within their object -// scope. Better than nothing for a single-TU debug session.
-static void writeDebugSidecar( - const std::string &path, - const std::vector> &objs) { - std::ofstream f(path, std::ios::binary); - if (!f) die("cannot open '" + path + "' for writing"); - f << "; llvm816 link816 DWARF sidecar v0\n"; - f << "; Object-file-local addresses; not relocated to final image.\n"; - size_t total = 0; - size_t kept = 0; - for (const auto &objPtr : objs) { - const InputObject &obj = *objPtr; - for (const Section &sec : obj.sections) { - if (sec.name.rfind(".debug_", 0) != 0) continue; - if (sec.size == 0) continue; - f << "; OBJ " << obj.path << " SEC " << sec.name - << " SIZE " << sec.size << "\n"; - f.write(reinterpret_cast(obj.raw.data() - + sec.fileOffset), - sec.size); - f << "\n"; - total += sec.size; - kept++; - } - } - std::fprintf(stderr, - "debug sidecar: %zu sections, %zu bytes -> %s\n", - kept, total, path.c_str()); -} - } // anonymous namespace int main(int argc, char **argv) { @@ -805,7 +834,7 @@ int main(int argc, char **argv) { f.write(reinterpret_cast(image.data()), image.size()); if (!mapPath.empty()) linker.writeMap(mapPath); - if (!debugOutPath.empty()) writeDebugSidecar(debugOutPath, linker.objs); + if (!debugOutPath.empty()) linker.writeDebugSidecar(debugOutPath); std::fprintf(stderr, "linked: text=[0x%04x+%u] rodata=[0x%04x+%u] bss=[0x%04x+%u] "