Checkpoint.

Scott Duensing 2026-04-30 20:12:11 -05:00
parent f80a49dc1e
commit a702f4a970
8 changed files with 469 additions and 173 deletions

@ -29,7 +29,8 @@ which runs correctly under MAME (apple2gs).
- Strings: hand-rolled `strlen`, `strcmp`, `strcpy`, `strchr`, atoi/itoa - Strings: hand-rolled `strlen`, `strcmp`, `strcpy`, `strchr`, atoi/itoa
roundtrip. roundtrip.
- Soft-float (single): all four ops + comparisons, MAME-verified. - Soft-float (single): all four ops + comparisons, MAME-verified.
- Soft-double: add, sub, mul, div all return correct bit patterns; - Soft-double: add, sub, mul, div all return correct bit patterns
bit-for-bit against gcc with round-to-nearest-even rounding;
3-iter Newton sqrt converges. Long-running iterations may hit MAME's 3-iter Newton sqrt converges. Long-running iterations may hit MAME's
1-second sim-time budget (test config issue, not a compiler bug). 1-second sim-time budget (test config issue, not a compiler bug).
- Inline assembly with `"a"`, `"x"`, `"y"` register constraints and - Inline assembly with `"a"`, `"x"`, `"y"` register constraints and
@ -63,30 +64,41 @@ which runs correctly under MAME (apple2gs).
- Frame is empty-descending (S points to next-free); offsets account - Frame is empty-descending (S points to next-free); offsets account
for the +1 skew vs LLVM's full-descending model. for the +1 skew vs LLVM's full-descending model.
## In flight (build-system level) ## In flight
- **DWARF sidecar — minimal version landed** (#51): `link816 --debug-out Nothing currently in flight. All tracked tasks are closed; remaining
FILE` collects every `.debug_*` section from the input objects and items are listed under "What's still needed" below.
writes them to a sidecar with section headers. Addresses are still
object-file-local (no relocation processing). A consumer that wants The **DWARF sidecar** (`link816 --debug-out FILE`) now applies
source-mapped final-image addresses must re-run reloc against the text/rodata/bss/init_array relocations to every `.debug_*` section
text/rodata bases, or use offsets within their object scope. Future before writing it. PC values in `.debug_addr` and `.debug_line` end
work: apply text/rodata relocations to `.debug_info` / `.debug_line` up as final-image addresses, so a consumer can map back to source
so addresses match the final image, and emit a TOC the consumer lines without re-running the linker. Intra-debug references (e.g.
can index by source file or function. `.debug_info` -> `.debug_str` offsets) are intentionally left
object-local — sections are concatenated, not recompacted, and each
slice carries an `; OBJ ... SEC ... SIZE ...` header so a multi-TU
consumer can scope intra-debug offsets per-slice. The smoke test
verifies the address of a known function appears in the patched
sidecar bytes.
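
The smoke-test check described above (the address of a known function must appear in the patched sidecar) comes down to a byte search for a 24-bit little-endian value. A minimal sketch in C, assuming 3-byte addresses as in the test script; the helper names `encodeAddr24` and `containsAddr24` are hypothetical and not part of `link816`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a 24-bit address as 3 little-endian bytes (hypothetical helper). */
static void encodeAddr24(uint32_t addr, uint8_t out[3]) {
    assert(addr <= 0xFFFFFFu);              /* must fit in 24 bits */
    out[0] = (uint8_t)(addr & 0xff);
    out[1] = (uint8_t)((addr >> 8) & 0xff);
    out[2] = (uint8_t)((addr >> 16) & 0xff);
}

/* Return 1 if the 3-byte encoding of addr occurs anywhere in buf, else 0. */
static int containsAddr24(const uint8_t *buf, size_t len, uint32_t addr) {
    uint8_t pat[3];
    encodeAddr24(addr, pat);
    for (size_t i = 0; i + 3 <= len; i++) {
        if (memcmp(buf + i, pat, 3) == 0) return 1;
    }
    return 0;
}
```

A plain byte search like this can match by coincidence, so it works as a smoke test rather than as proof that a specific DWARF field was patched.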
## Known issues / workarounds ## Known issues / workarounds
- **Greedy register allocator mis-orders spills** in iterative - **Greedy register allocator mis-orders spills** in iterative
quicksort with `if/else` recursion choice (#70). Complex live quicksort with `if/else` recursion choice (#70). Live-range
ranges across two `swap()` calls produce wrong pointer args. tracking for `hi` is wrong across the inner loop and post-loop
Reproduces only at `-O1`/`-O2` with greedy. Workaround: swap call, producing miscompiled code. Reproduces only at
`-mllvm -regalloc=fast` for the affected translation unit, or `-O1`/`-O2` with greedy. Workarounds (any one):
rewrite the qsort with explicit recursion guards instead of the - `__attribute__((noinline,optnone))` on the affected function —
iterative tail-elim form. `softDouble.c` already uses this routes through fast regalloc per-function. Verified in smoke
flag for `__muldf3` (build.sh applies it automatically). Real test; recommended for new code that hits this.
fix is either a pre-RA pass that explicitly spills loop-carried - `-mllvm -regalloc=fast` for the whole translation unit.
pointer args or a targeted greedy heuristic patch. `softDouble.c` already uses this for `__muldf3` (build.sh
applies it automatically).
- Rewrite the loop with explicit recursion guards instead of
the iterative tail-elim form.
Real fix needs deeper greedy work; deferred behind the per-
function attribute since it covers the practical cases.
- **(d,s),y / (sr,s),y addressing wraps the bank** when Y is - **(d,s),y / (sr,s),y addressing wraps the bank** when Y is
negative as 16-bit unsigned. Worked around by `W65816NegYIndY` negative as 16-bit unsigned. Worked around by `W65816NegYIndY`
@ -108,16 +120,6 @@ which runs correctly under MAME (apple2gs).
`softDouble.c` and unblocks pattern-rich code that currently `softDouble.c` and unblocks pattern-rich code that currently
must be compiled at `-O0` for correctness. must be compiled at `-O0` for correctness.
- **Round-to-nearest-even in `__divdf3`** — currently
truncate-toward-zero, which differs from gcc by ±1 ULP in
several test cases. Acceptable today (Newton iterations still
converge); revisit when an exact-match test suite lands.
- **DWARF sidecar with relocations applied** — current (#51) version
is raw section pass-through; addresses are object-file-local. A
real source-level debugger needs the linker to apply text/rodata
relocations to `.debug_info` / `.debug_line` first.
- **More of the C standard library**: `<math.h>` transcendental - **More of the C standard library**: `<math.h>` transcendental
functions (sin, cos, exp, log, pow), `<string.h>` beyond what's functions (sin, cos, exp, log, pow), `<string.h>` beyond what's
hand-coded, `<stdio.h>` file I/O (`fopen`, `fread`, `fwrite`, hand-coded, `<stdio.h>` file I/O (`fopen`, `fread`, `fwrite`,

@ -34,6 +34,7 @@ cc() {
asm "$SRC/crt0.s" asm "$SRC/crt0.s"
asm "$SRC/libgcc.s" asm "$SRC/libgcc.s"
cc "$SRC/libc.c" cc "$SRC/libc.c"
cc "$SRC/strtol.c"
cc "$SRC/softFloat.c" cc "$SRC/softFloat.c"
# softDouble.c needs -regalloc=fast: __muldf3's 64x64 -> 128 mul + # softDouble.c needs -regalloc=fast: __muldf3's 64x64 -> 128 mul +
# inlined alignment shifts overflows the greedy allocator on the # inlined alignment shifts overflows the greedy allocator on the

@ -32,8 +32,13 @@ __start:
sta 0xc032 ; SCANINT clear sta 0xc032 ; SCANINT clear
rep #0x20 rep #0x20
; Top-of-stack at $01FF (one bank). Loaders may already do this. ; Top-of-stack at $0FFF. Native-mode S is 16-bit, so we don't need
lda #0x01ff ; to stay in page 1. Soft-double frames can be ~170 bytes plus the
; usual call-chain overhead — at $01FF stack growth wraps into the
; direct page ($0000-$00FF) which holds our libcall scratch
; ($E0-$F4) and IMG slots ($D0-$DE), corrupting them. $0FFF gives
; ~3.5 KB of headroom and stays below the text base ($1000).
lda #0x0fff
tcs tcs
; Zero BSS. X iterates from __bss_start to __bss_end; each ; Zero BSS. X iterates from __bss_start to __bss_end; each
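
The headroom argument in the new stack comment can be checked with simple arithmetic. A rough sketch, assuming the stack may grow down to $0100 (the first byte above the direct page at $0000-$00FF); `stackHeadroom` is a hypothetical helper used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a descending stack can use when its first slot is `top` and
   `lastReserved` is the highest address it must not touch. */
static uint32_t stackHeadroom(uint16_t top, uint16_t lastReserved) {
    assert(top > lastReserved);
    return (uint32_t)top - (uint32_t)lastReserved;
}
```

Under that assumption, `stackHeadroom(0x01FF, 0x00FF)` is 256 bytes, which is little more than one ~170-byte soft-double frame, while `stackHeadroom(0x0FFF, 0x00FF)` is 3840 bytes, in the same ballpark as the figure quoted in the comment.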

@ -141,6 +141,7 @@ int atoi(const char *s) {
return sign * n; return sign * n;
} }
// ---- stdio.h essentials (stubs) ---- // ---- stdio.h essentials (stubs) ----
// putchar: by default, writes to direct-page slot $E2 (which the // putchar: by default, writes to direct-page slot $E2 (which the

@ -64,16 +64,37 @@ u64 __adddf3(u64 a, u64 b) {
u16 cb = dclass(b, &sb, &eb, &mb); u16 cb = dclass(b, &sb, &eb, &mb);
if (ca == 0) return b; if (ca == 0) return b;
if (cb == 0) return a; if (cb == 0) return a;
// Align mantissas to common exponent. // Shift mantissas left by 3 to reserve guard / round / sticky bits
// below position 0. Lead bit is now at position 55 instead of 52.
// The sticky bit is preserved by ORing it into the LSB whenever a
// significant bit would otherwise be shifted off the right side
// (during alignment or post-add normalization). At the end, RNE
// rounds based on bits 2..0 (guard, round, sticky) and shifts back.
ma <<= 3;
mb <<= 3;
// Align mantissas to common exponent. The smaller-exp operand is
// shifted right; bits shifted past position 0 become sticky.
if (ea > eb) { if (ea > eb) {
s16 d = ea - eb; s16 d = ea - eb;
if (d > 54) return a; if (d > 56) return a;
mb >>= d; u64 sticky = 0;
if (d > 3) {
u64 mask = (d >= 64) ? ~0ULL : ((1ULL << d) - 1);
sticky = (mb & mask) ? 1 : 0;
}
mb = (d >= 64) ? 0 : (mb >> d);
mb |= sticky;
eb = ea; eb = ea;
} else if (eb > ea) { } else if (eb > ea) {
s16 d = eb - ea; s16 d = eb - ea;
if (d > 54) return b; if (d > 56) return b;
ma >>= d; u64 sticky = 0;
if (d > 3) {
u64 mask = (d >= 64) ? ~0ULL : ((1ULL << d) - 1);
sticky = (ma & mask) ? 1 : 0;
}
ma = (d >= 64) ? 0 : (ma >> d);
ma |= sticky;
ea = eb; ea = eb;
} }
u64 mr; u64 mr;
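
The alignment step in this hunk is a right shift that folds every discarded bit into the LSB. A standalone sketch of that idea, using `<stdint.h>` types rather than the file's `u64`; the name `shiftRightSticky` is an assumption, not a function in `softDouble.c`:

```c
#include <stdint.h>

/* Shift m right by d bits. If any 1 bit is shifted out, OR a 1 into the
   LSB of the result so later rounding can still see that the value was
   inexact. Shifts of 64 or more are handled explicitly because
   `m >> 64` is undefined behavior in C. */
static uint64_t shiftRightSticky(uint64_t m, unsigned d) {
    if (d == 0) return m;
    if (d >= 64) return (m != 0) ? 1 : 0;
    uint64_t lost = m & ((1ULL << d) - 1);   /* bits that fall off */
    return (m >> d) | (lost != 0 ? 1ULL : 0ULL);
}
```

The diff skips the sticky computation for `d <= 3`. That is safe there because the low three bits are the freshly reserved GRS positions and are known to be zero; the sketch does not rely on that and always inspects the lost bits.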
@ -91,15 +112,32 @@ u64 __adddf3(u64 a, u64 b) {
} }
} }
if (mr == 0) return 0; if (mr == 0) return 0;
// Renormalize. // Renormalize. Lead bit should land at position 55 (= 52 + 3 GRS).
while ((mr & DMANT_LEAD) == 0 && (mr & ~DMANT_MASK) == 0) { // Right-shift first to bring an over-wide sum back in range; then
// left-shift if subtraction left the lead below 55. Reverse order
// would shift an over-wide value out of u64 range entirely.
while (mr & ~((1ULL << 56) - 1)) {
u64 sticky = mr & 1;
mr = (mr >> 1) | sticky;
ea++;
}
while ((mr & (1ULL << 55)) == 0 && mr != 0) {
mr <<= 1; mr <<= 1;
ea--; ea--;
} }
while (mr & ~(DMANT_LEAD | DMANT_MASK)) { // Round to nearest, ties to even. Bits 0/1 are sticky+round, bit 2
// is guard, bit 3 is mantissa LSB.
int guard = (int)((mr >> 2) & 1);
int sticky = (int)(mr & 0x3);
int lsb = (int)((mr >> 3) & 1);
mr >>= 3; // drop GRS bits to get the 53-bit mantissa
if (guard && (sticky || lsb)) {
mr++;
if (mr & (1ULL << 53)) {
mr >>= 1; mr >>= 1;
ea++; ea++;
} }
}
return dpack(sr, ea, mr); return dpack(sr, ea, mr);
} }
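
The final rounding step of `__adddf3` can be looked at in isolation. A minimal sketch, assuming the same layout as the diff (bit 3 is the mantissa LSB, bit 2 the guard, bits 1..0 the remaining tail); `roundNearestEvenGrs` is a hypothetical name:

```c
#include <stdint.h>

/* Round a value carrying three extra low bits to nearest, ties to even.
   Returns the value with those three bits dropped. The caller still has
   to handle a carry out of the top mantissa bit, as the diff does by
   checking bit 53. */
static uint64_t roundNearestEvenGrs(uint64_t m) {
    unsigned guard = (unsigned)((m >> 2) & 1);   /* the half-ULP bit */
    unsigned tail  = (unsigned)(m & 3);          /* anything below half a ULP */
    unsigned lsb   = (unsigned)((m >> 3) & 1);   /* current mantissa LSB */
    uint64_t r = m >> 3;
    /* Round up when above the halfway point (guard and tail set), or
       exactly halfway (guard only) with an odd LSB. */
    if (guard && (tail || lsb)) r++;
    return r;
}
```

For example, an input of binary `1100` is an exact tie with an odd LSB and rounds up to 2, while binary `0100` is a tie with an even LSB and stays at 0.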
@ -129,7 +167,11 @@ typedef struct {
// noinline boundary lowers to `sta (d,s),y` which uses DBR-relative // noinline boundary lowers to `sta (d,s),y` which uses DBR-relative
// addressing — broken under DBR != 0 (e.g. after a bank switch). // addressing — broken under DBR != 0 (e.g. after a bank switch).
// Keeping these inline keeps the stores within the caller's frame. // Keeping these inline keeps the stores within the caller's frame.
static inline u64 mulhi64Aligned(u64 ma, u64 mb, u16 *out_carry) { //
// out_round encodes the round bits as (guard << 1) | sticky. Caller
// uses these for round-to-nearest-even.
static inline u64 mulhi64Aligned(u64 ma, u64 mb,
u16 *out_carry, u16 *out_round) {
u32 alo = (u32)ma; u32 alo = (u32)ma;
u32 ahi = (u32)(ma >> 32); u32 ahi = (u32)(ma >> 32);
u32 blo = (u32)mb; u32 blo = (u32)mb;
@ -142,10 +184,20 @@ static inline u64 mulhi64Aligned(u64 ma, u64 mb, u16 *out_carry) {
u64 prod_hi = hh + (mid >> 32); u64 prod_hi = hh + (mid >> 32);
u64 prod_lo = (ll & 0xFFFFFFFFULL) | ((mid & 0xFFFFFFFFULL) << 32); u64 prod_lo = (ll & 0xFFFFFFFFULL) | ((mid & 0xFFFFFFFFULL) << 32);
if (prod_hi & (1ULL << 41)) { if (prod_hi & (1ULL << 41)) {
// Lead-at-105 case: shift right 53 within full product. The
// bit at prod_lo position 52 is the guard; bits 0..51 are sticky.
*out_carry = 1; *out_carry = 1;
u16 guard = (u16)((prod_lo >> 52) & 1);
u16 sticky = (u16)((prod_lo & ((1ULL << 52) - 1)) != 0);
*out_round = (guard << 1) | sticky;
return (prod_hi << 11) | (prod_lo >> 53); return (prod_hi << 11) | (prod_lo >> 53);
} }
// Lead-at-104 case: shift right 52. Guard at prod_lo bit 51,
// sticky = OR of bits 0..50.
*out_carry = 0; *out_carry = 0;
u16 guard = (u16)((prod_lo >> 51) & 1);
u16 sticky = (u16)((prod_lo & ((1ULL << 51) - 1)) != 0);
*out_round = (guard << 1) | sticky;
return (prod_hi << 12) | (prod_lo >> 52); return (prod_hi << 12) | (prod_lo >> 52);
} }
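
Part of the body of `mulhi64Aligned` is elided between hunks, so the following is not the author's routine. It is a generic sketch of the underlying technique: building a full 64x64 -> 128-bit product from 32-bit halves, which is where the guard and sticky bits in `prod_lo` come from. The name `mul64x64` is an assumption:

```c
#include <stdint.h>

/* Full 128-bit product of a and b, returned as high and low 64-bit words. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t alo = (uint32_t)a, ahi = a >> 32;
    uint64_t blo = (uint32_t)b, bhi = b >> 32;
    uint64_t ll = alo * blo;
    uint64_t lh = alo * bhi;
    uint64_t hl = ahi * blo;
    uint64_t hh = ahi * bhi;
    /* Sum the three terms that land in bits 32..63. Each is below 2^32,
       so the sum fits in a u64 and its upper half is the carry. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;
    *lo = (mid << 32) | (uint32_t)ll;
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

With two 53-bit mantissas whose lead bit is at position 52, the product's lead bit lands at position 104 or 105, which matches the two cases the diff distinguishes by testing bit 41 of `prod_hi`.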
@ -157,8 +209,19 @@ u64 __muldf3(u64 a, u64 b) {
u64 sr = sa ^ sb; u64 sr = sa ^ sb;
if (ca == 0 || cb == 0) return sr; if (ca == 0 || cb == 0) return sr;
u16 carry; u16 carry;
u64 mr = mulhi64Aligned(ma, mb, &carry); u16 round_bits;
u64 mr = mulhi64Aligned(ma, mb, &carry, &round_bits);
s16 er = ea + eb + (s16)carry; s16 er = ea + eb + (s16)carry;
// Round to nearest, ties to even.
int guard = (round_bits >> 1) & 1;
int sticky = round_bits & 1;
if (guard && (sticky || (mr & 1))) {
mr++;
if (mr & (1ULL << 53)) {
mr >>= 1;
er++;
}
}
return dpack(sr, er, mr); return dpack(sr, er, mr);
} }
@ -193,6 +256,22 @@ u64 __divdf3(u64 a, u64 b) {
q |= (1ULL << i); q |= (1ULL << i);
} }
} }
// Round to nearest, ties to even. Generate one extra bit (the
// "guard"), examine the remainder for any non-zero "sticky" tail,
// and round q up when guard=1 and (sticky || (q & 1)). Without
// this we'd be truncate-toward-zero, off by 1 ULP from gcc's RNE
// result on cases like 1.5/2.5.
r <<= 1;
int guard = (r >= mb) ? 1 : 0;
if (guard) r -= mb;
int sticky = (r != 0) ? 1 : 0;
if (guard && (sticky || (q & 1))) {
q++;
if (q & (1ULL << 53)) { // mantissa overflow into bit 53 -> renormalize
q >>= 1;
er++;
}
}
return dpack(sr, er, q); return dpack(sr, er, q);
} }
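
The rounding added to `__divdf3` runs one more step of long division to obtain the guard bit, then treats any remaining remainder as sticky. A standalone sketch of that step, leaving out the exponent adjustment on mantissa overflow; `roundQuotientRne` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Given a truncated quotient q and remainder r (0 <= r < d) from long
   division by d, return q rounded to nearest, ties to even. */
static uint64_t roundQuotientRne(uint64_t q, uint64_t r, uint64_t d) {
    assert(d != 0 && r < d && d <= (UINT64_MAX >> 1)); /* r << 1 must not overflow */
    r <<= 1;                          /* one extra division step */
    unsigned guard = (r >= d);        /* next quotient bit */
    if (guard) r -= d;
    unsigned sticky = (r != 0);       /* anything left below the guard bit */
    if (guard && (sticky || (q & 1))) q++;
    return q;
}
```

With `d = 2` and `r = 1` the discarded fraction is exactly one half, so an even `q` is kept and an odd `q` is bumped to the next even value.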

runtime/src/strtol.c (new file, 59 lines)

@ -0,0 +1,59 @@
// strtol / strtoul — kept in their own translation unit because adding
// them to libc.c bumped the libc.s layout enough that vprintf's branch
// distances exceeded our BranchExpand threshold (caught by printf("%d
// ok") segfaulting). Their own .o keeps them out of the way.
extern int isspace(int);
static int charDigit(char c, int base) {
int d = -1;
if (c >= '0' && c <= '9') d = c - '0';
else if (c >= 'a' && c <= 'z') d = 10 + c - 'a';
else if (c >= 'A' && c <= 'Z') d = 10 + c - 'A';
if (d < 0 || d >= base) return -1;
return d;
}
unsigned long strtoul(const char *nptr, char **endptr, int base) {
const char *s = nptr;
while (isspace(*s)) s++;
int neg = 0;
if (*s == '-') { neg = 1; s++; }
else if (*s == '+') s++;
if ((base == 0 || base == 16) && s[0] == '0' &&
(s[1] == 'x' || s[1] == 'X') && charDigit(s[2], 16) >= 0) {
base = 16;
s += 2;
} else if (base == 0 && *s == '0') {
// Leave the leading '0' for the digit loop so that a bare "0" still counts as a parsed digit.
base = 8;
} else if (base == 0) {
base = 10;
}
unsigned long n = 0;
int saw_digit = 0;
for (;;) {
int d = charDigit(*s, base);
if (d < 0) break;
n = n * (unsigned long)base + (unsigned long)d;
saw_digit = 1;
s++;
}
if (endptr) *endptr = (char *)(saw_digit ? s : nptr);
return neg ? (unsigned long)-(long)n : n;
}
long strtol(const char *nptr, char **endptr, int base) {
const char *s = nptr;
while (isspace(*s)) s++;
int neg = (*s == '-');
if (*s == '+' || *s == '-') s++;
char *ep = 0;
unsigned long n = strtoul(s, &ep, base);
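// A second sign, or whitespace after the sign, is not a valid number;
// strtoul would otherwise accept it (e.g. "--5" or "- 5").
if (ep == s || *s == '+' || *s == '-' || isspace(*s)) {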
if (endptr) *endptr = (char *)nptr;
return 0;
}
if (endptr) *endptr = ep;
return neg ? -(long)n : (long)n;
}
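
For reference, these are the standard `strtol` semantics that the smoke test later in this commit relies on: base 0 auto-detects a `0x` or `0` prefix, leading whitespace is skipped, and `endptr` equals the input pointer when no digits were consumed. The sketch calls the host C library's `strtol`, not the new `runtime/src/strtol.c`; `parseLong` is a hypothetical wrapper:

```c
#include <stdlib.h>

/* Parse s in the given base. Returns 1 and stores the value in *out if
   at least one digit was consumed; returns 0 and leaves *out untouched
   otherwise. */
static int parseLong(const char *s, int base, long *out) {
    char *end;
    long v = strtol(s, &end, base);
    if (end == s) return 0;   /* no conversion performed */
    *out = v;
    return 1;
}
```

The values 12345, -999, 0x1ABC and 0755 (decimal 493) correspond to the words the smoke test expects at `0x025000` onward: `3039 0000`, `fc19 ffff`, `1abc` and `01ed`.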

@ -1399,10 +1399,13 @@ EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \ "$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cFactFile" -o "$oFactFile" "$cFactFile" -o "$oFactFile"
oLibcF="$(mktemp --suffix=.o)" oLibcF="$(mktemp --suffix=.o)"
oStrtolF="$(mktemp --suffix=.o)"
oSfF="$(mktemp --suffix=.o)" oSfF="$(mktemp --suffix=.o)"
oSdF="$(mktemp --suffix=.o)" oSdF="$(mktemp --suffix=.o)"
"$CLANG" --target=w65816 -O2 -ffunction-sections \ "$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/libc.c" -o "$oLibcF" -c "$PROJECT_ROOT/runtime/src/libc.c" -o "$oLibcF"
"$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/strtol.c" -o "$oStrtolF"
"$CLANG" --target=w65816 -O2 -ffunction-sections \ "$CLANG" --target=w65816 -O2 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfF" -c "$PROJECT_ROOT/runtime/src/softFloat.c" -o "$oSfF"
"$CLANG" --target=w65816 -O2 -ffunction-sections -mllvm -regalloc=fast \ "$CLANG" --target=w65816 -O2 -ffunction-sections -mllvm -regalloc=fast \
@ -2018,6 +2021,105 @@ EOF
fi fi
rm -f "$cAofFile" "$oAofFile" "$binAofFile" rm -f "$cAofFile" "$oAofFile" "$binAofFile"
log "check: MAME runs iterative qsort([3,1,2]) with optnone → [1,2,3] (#70 workaround)"
cQsFile="$(mktemp --suffix=.c)"
oQsFile="$(mktemp --suffix=.o)"
binQsFile="$(mktemp --suffix=.bin)"
cat > "$cQsFile" <<'EOF'
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
__attribute__((noinline)) void swap(short *a, short *b) {
short t = *a; *a = *b; *b = t;
}
// optnone routes this function through fast regalloc (per the
// W65816TargetMachine choice), avoiding the greedy spill-ordering
// bug that mis-compiles the iterative if/else recursion form.
__attribute__((noinline,optnone))
void qsortIter(short *arr, short lo, short hi) {
while (lo < hi) {
short pivot = arr[hi];
short i = lo - 1;
for (short j = lo; j < hi; j++) {
if (arr[j] <= pivot) { i++; swap(&arr[i], &arr[j]); }
}
i++;
swap(&arr[i], &arr[hi]);
if ((short)(i - lo) < (short)(hi - i)) {
qsortIter(arr, lo, (short)(i - 1));
lo = (short)(i + 1);
} else {
qsortIter(arr, (short)(i + 1), hi);
hi = (short)(i - 1);
}
}
}
int main(void) {
static short data[3] = { 3, 1, 2 };
qsortIter(data, 0, 2);
short s0 = data[0], s1 = data[1], s2 = data[2];
switchToBank2();
*(volatile unsigned short *)0x5000 = (unsigned short)s0;
*(volatile unsigned short *)0x5002 = (unsigned short)s1;
*(volatile unsigned short *)0x5004 = (unsigned short)s2;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cQsFile" -o "$oQsFile"
"$PROJECT_ROOT/tools/link816" -o "$binQsFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oSfF" "$oSdF" "$oLibgccFile" "$oQsFile" \
>/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" \
"$binQsFile" --check 0x025000=0001 0x025002=0002 \
0x025004=0003 >/dev/null 2>&1; then
die "MAME: optnone qsort([3,1,2]) wrong (workaround for #70 broken)"
fi
rm -f "$cQsFile" "$oQsFile" "$binQsFile"
log "check: MAME runs strtol/strtoul: 12345, -999, 0x1ABC, 0755, 'xyz' (#74)"
cStFile="$(mktemp --suffix=.c)"
oStFile="$(mktemp --suffix=.o)"
binStFile="$(mktemp --suffix=.bin)"
cat > "$cStFile" <<'EOF'
extern long strtol(const char *, char **, int);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
int main(void) {
char *ep;
union { long l; unsigned short w[2]; } a, b, c, d, e;
a.l = strtol("12345", &ep, 10);
b.l = strtol("-999", &ep, 10);
c.l = strtol(" 0x1ABC ", &ep, 16);
d.l = strtol("0755", &ep, 0);
e.l = strtol("xyz", &ep, 10);
short epOff = (short)(ep - "xyz");
switchToBank2();
*(volatile unsigned short *)0x5000 = a.w[0];
*(volatile unsigned short *)0x5002 = a.w[1];
*(volatile unsigned short *)0x5004 = b.w[0];
*(volatile unsigned short *)0x5006 = b.w[1];
*(volatile unsigned short *)0x5008 = c.w[0];
*(volatile unsigned short *)0x500a = d.w[0];
*(volatile unsigned short *)0x500c = e.w[0];
*(volatile unsigned short *)0x500e = (unsigned short)epOff;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cStFile" -o "$oStFile"
"$PROJECT_ROOT/tools/link816" -o "$binStFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oStrtolF" "$oSfF" "$oSdF" "$oLibgccFile" \
"$oStFile" >/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binStFile" --check \
0x025000=3039 0x025002=0000 0x025004=fc19 0x025006=ffff \
0x025008=1abc 0x02500a=01ed 0x02500c=0000 0x02500e=0000 \
>/dev/null 2>&1; then
die "MAME: strtol parse cases wrong"
fi
rm -f "$cStFile" "$oStFile" "$binStFile"
log "check: MAME runs udivmod(0x123...DEF, 0x10000, &m) → q=0x12345_6789AB m=0xCDEF (#69)" log "check: MAME runs udivmod(0x123...DEF, 0x10000, &m) → q=0x12345_6789AB m=0xCDEF (#69)"
cUdmFile="$(mktemp --suffix=.c)" cUdmFile="$(mktemp --suffix=.c)"
oUdmFile="$(mktemp --suffix=.o)" oUdmFile="$(mktemp --suffix=.o)"
@ -2299,7 +2401,7 @@ EOF
fi fi
rm -f "$cDmaFile" "$oDmaFile" "$binDmaFile" rm -f "$cDmaFile" "$oDmaFile" "$binDmaFile"
rm -f "$oLibcF" "$oSfF" "$oSdF" "$oCrt0F" rm -f "$oLibcF" "$oStrtolF" "$oSfF" "$oSdF" "$oCrt0F"
else else
warn "MAME or apple2gs ROMs not installed; skipping end-to-end test" warn "MAME or apple2gs ROMs not installed; skipping end-to-end test"
fi fi
@ -2328,20 +2430,22 @@ EOF
# Linker exports the synthetic __bss_start / __bss_end / etc. # Linker exports the synthetic __bss_start / __bss_end / etc.
# symbols so crt0 can do BSS init and runtime malloc finds the # symbols so crt0 can do BSS init and runtime malloc finds the
# heap top. # heap top.
log "check: link816 --debug-out emits a DWARF sidecar (#51)" log "check: link816 --debug-out emits a DWARF sidecar with relocs applied (#51 + #75)"
cDbgFile="$(mktemp --suffix=.c)" cDbgFile="$(mktemp --suffix=.c)"
oDbgFile="$(mktemp --suffix=.o)" oDbgFile="$(mktemp --suffix=.o)"
binDbgFile="$(mktemp --suffix=.bin)" binDbgFile="$(mktemp --suffix=.bin)"
dbgOutFile="$(mktemp --suffix=.dbg)" dbgOutFile="$(mktemp --suffix=.dbg)"
mapDbgFile="$(mktemp --suffix=.map)"
cat > "$cDbgFile" <<'EOF' cat > "$cDbgFile" <<'EOF'
int add(int a, int b) { return a + b; } int add(int a, int b) { return a + b; }
int main(void) { return add(3, 4); } int main(void) { return add(3, 4); }
EOF EOF
"$CLANG" --target=w65816 -O2 -g -ffunction-sections -c "$cDbgFile" -o "$oDbgFile" "$CLANG" --target=w65816 -O2 -g -ffunction-sections -c "$cDbgFile" -o "$oDbgFile"
"$PROJECT_ROOT/tools/link816" -o "$binDbgFile" --debug-out "$dbgOutFile" \ "$PROJECT_ROOT/tools/link816" -o "$binDbgFile" --debug-out "$dbgOutFile" \
--map "$mapDbgFile" \
--text-base 0x1000 "$oDbgFile" "$oLibgccFile" 2>/dev/null --text-base 0x1000 "$oDbgFile" "$oLibgccFile" 2>/dev/null
if ! head -1 "$dbgOutFile" | grep -q "DWARF sidecar"; then if ! head -1 "$dbgOutFile" | grep -q "DWARF sidecar v1"; then
die "link816 --debug-out: sidecar missing header" die "link816 --debug-out: sidecar missing v1 header (reloc-apply path)"
fi fi
if ! grep -q "SEC \.debug_info" "$dbgOutFile"; then if ! grep -q "SEC \.debug_info" "$dbgOutFile"; then
die "link816 --debug-out: sidecar missing .debug_info section" die "link816 --debug-out: sidecar missing .debug_info section"
@ -2349,7 +2453,23 @@ EOF
if ! grep -q "SEC \.debug_line" "$dbgOutFile"; then if ! grep -q "SEC \.debug_line" "$dbgOutFile"; then
die "link816 --debug-out: sidecar missing .debug_line section" die "link816 --debug-out: sidecar missing .debug_line section"
fi fi
rm -f "$cDbgFile" "$oDbgFile" "$binDbgFile" "$dbgOutFile" if ! grep -q "RELOCS_APPLIED" "$dbgOutFile"; then
die "link816 --debug-out: sidecar missing RELOCS_APPLIED header (#75 path inactive)"
fi
# Check the linker actually patched text addresses into either
# .debug_addr (DWARF 5) or .debug_line. The address of `add` from
# the map must appear as 3 LE bytes in the binary section payload.
addAddr=$(awk '/^[0-9a-fx]+ +add$/ {print $1}' "$mapDbgFile" | head -1)
if [ -z "$addAddr" ]; then
die "link816 map: 'add' symbol missing"
fi
# addAddr is 0x...... ; convert to 3 LE bytes and look for them.
addBytes=$(python3 -c "a=int('$addAddr',16); print(bytes([a&0xff,(a>>8)&0xff,(a>>16)&0xff]).hex())")
sidecarHex=$(xxd -p -c 99999 "$dbgOutFile" | tr -d '\n')
if ! echo "$sidecarHex" | grep -q "$addBytes"; then
die "link816 --debug-out: text address of 'add' ($addAddr -> $addBytes) not found in patched sidecar"
fi
rm -f "$cDbgFile" "$oDbgFile" "$binDbgFile" "$dbgOutFile" "$mapDbgFile"
log "check: link816 emits __bss_start, __bss_end, __heap_start" log "check: link816 emits __bss_start, __bss_end, __heap_start"
cBssFile="$(mktemp --suffix=.c)" cBssFile="$(mktemp --suffix=.c)"

@ -286,6 +286,7 @@ struct Layout {
uint32_t textBase, textSize; uint32_t textBase, textSize;
uint32_t rodataBase, rodataSize; uint32_t rodataBase, rodataSize;
uint32_t bssBase, bssSize; uint32_t bssBase, bssSize;
uint32_t initBase, initSize;
}; };
static void applyReloc(std::vector<uint8_t> &buf, uint32_t off, static void applyReloc(std::vector<uint8_t> &buf, uint32_t off,
@ -370,6 +371,61 @@ struct Linker {
objs.push_back(std::move(o)); objs.push_back(std::move(o));
} }
// Resolve a reloc to (target, name) using the symbol table and the
// per-object section base map. Requires link() to have populated
// objOff/globalSyms/lastLayout first. Returns false when the
// referenced section is one we don't track (e.g. another .debug_*
// section); strict callers should die() on false, lenient callers
// (the DWARF sidecar) should leave the bytes object-local.
bool resolveSym(const InputObject &obj, const ObjOffsets &oo,
const Reloc &r,
uint32_t &target, std::string &resolvedName) const {
if (r.symIdx >= obj.symbols.size())
die(obj.path + ": reloc symIdx out of range");
const Symbol &sym = obj.symbols[r.symIdx];
if (sym.type == STT_SECTION) {
if (sym.shndx >= obj.sections.size())
die(obj.path + ": section symbol shndx out of range");
const auto &refSec = obj.sections[sym.shndx];
std::string kind = sectionKind(refSec.name);
uint32_t base = 0;
if (kind == "text") {
auto wIt = oo.textWithin.find(sym.shndx);
base = lastLayout.textBase + oo.textBaseInMerged
+ (wIt == oo.textWithin.end() ? 0 : wIt->second);
} else if (kind == "rodata") {
auto wIt = oo.rodataWithin.find(sym.shndx);
base = lastLayout.rodataBase + oo.rodataBaseInMerged
+ (wIt == oo.rodataWithin.end() ? 0 : wIt->second);
} else if (kind == "bss") {
auto wIt = oo.bssWithin.find(sym.shndx);
base = lastLayout.bssBase + oo.bssBaseInMerged
+ (wIt == oo.bssWithin.end() ? 0 : wIt->second);
} else if (kind == "init_array") {
auto wIt = oo.initWithin.find(sym.shndx);
base = lastLayout.initBase + oo.initBaseInMerged
+ (wIt == oo.initWithin.end() ? 0 : wIt->second);
} else {
resolvedName = refSec.name;
return false;
}
target = base + r.addend;
resolvedName = refSec.name;
return true;
}
auto sIt = globalSyms.find(sym.name);
if (sIt == globalSyms.end()) {
// Undefined symbol — for the strict link path the caller
// dies; for the DWARF sidecar this just means "leave the
// bytes alone".
resolvedName = sym.name;
return false;
}
target = sIt->second + r.addend;
resolvedName = sym.name;
return true;
}
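
The refactor above turns `resolveSym` from a helper that dies on unknown sections into one that reports failure, so each caller can pick its own policy. A simplified sketch of that contract in C (the linker itself is C++); the `SectionBase` table and `resolveSectionTarget` are hypothetical stand-ins for the per-object offset maps:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened view of "final base address per section kind". */
typedef struct {
    const char *kind;   /* "text", "rodata", "bss", "init_array" */
    uint32_t base;      /* final-image address where this kind starts */
} SectionBase;

/* Compute base + within + addend for a tracked section kind. Returns
   false for kinds that are not tracked (e.g. another .debug_* section),
   leaving *target untouched. */
static bool resolveSectionTarget(const SectionBase *bases, size_t n,
                                 const char *kind, uint32_t within,
                                 int32_t addend, uint32_t *target) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(bases[i].kind, kind) == 0) {
            *target = bases[i].base + within + (uint32_t)addend;
            return true;
        }
    }
    return false;
}
```

A strict caller (the text, rodata and init_array passes) would abort on `false`, while a lenient caller (the DWARF sidecar) would skip the relocation and leave the bytes object-local.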
Layout link(std::vector<uint8_t> &outImage) { Layout link(std::vector<uint8_t> &outImage) {
// 1. Layout: each obj's sections at running offsets. // 1. Layout: each obj's sections at running offsets.
objOff.resize(objs.size()); objOff.resize(objs.size());
@ -406,7 +462,12 @@ struct Linker {
L.rodataBase = rodataBase ? rodataBase : (textBase + curText); L.rodataBase = rodataBase ? rodataBase : (textBase + curText);
L.rodataSize = curRodata; L.rodataSize = curRodata;
// .init_array goes immediately after .rodata in the image. // .init_array goes immediately after .rodata in the image.
uint32_t initBase = L.rodataBase + L.rodataSize; L.initBase = L.rodataBase + L.rodataSize;
L.initSize = curInit;
uint32_t initBase = L.initBase;
// Publish layout now so resolveSym() can read it during reloc
// application (it's a const member that uses lastLayout).
lastLayout = L;
// Synthesize linker-defined symbols so crt0 / startup code // Synthesize linker-defined symbols so crt0 / startup code
// can find the section extents. These must NOT be in the // can find the section extents. These must NOT be in the
@ -480,52 +541,6 @@ struct Linker {
} }
} }
// Resolve a reloc to (target, name) using the symbol table and the
// per-object section base map. Used by every .rela.{text,rodata,
// init_array} application below.
auto resolveSym = [&](const InputObject &obj, const ObjOffsets &oo,
const Reloc &r,
uint32_t &target, std::string &resolvedName) {
if (r.symIdx >= obj.symbols.size())
die(obj.path + ": reloc symIdx out of range");
const Symbol &sym = obj.symbols[r.symIdx];
if (sym.type == STT_SECTION) {
if (sym.shndx >= obj.sections.size())
die(obj.path + ": section symbol shndx out of range");
const auto &refSec = obj.sections[sym.shndx];
std::string kind = sectionKind(refSec.name);
uint32_t base = 0;
if (kind == "text") {
auto wIt = oo.textWithin.find(sym.shndx);
base = textBase + oo.textBaseInMerged
+ (wIt == oo.textWithin.end() ? 0 : wIt->second);
} else if (kind == "rodata") {
auto wIt = oo.rodataWithin.find(sym.shndx);
base = L.rodataBase + oo.rodataBaseInMerged
+ (wIt == oo.rodataWithin.end() ? 0 : wIt->second);
} else if (kind == "bss") {
auto wIt = oo.bssWithin.find(sym.shndx);
base = bssBase + oo.bssBaseInMerged
+ (wIt == oo.bssWithin.end() ? 0 : wIt->second);
} else if (kind == "init_array") {
auto wIt = oo.initWithin.find(sym.shndx);
base = initBase + oo.initBaseInMerged
+ (wIt == oo.initWithin.end() ? 0 : wIt->second);
} else {
die(obj.path + ": reloc against unknown section '"
+ refSec.name + "'");
}
target = base + r.addend;
resolvedName = refSec.name;
} else {
auto sIt = globalSyms.find(sym.name);
if (sIt == globalSyms.end())
die(obj.path + ": undefined symbol '" + sym.name + "'");
target = sIt->second + r.addend;
resolvedName = sym.name;
}
};
// 4. Apply relocations to text buffer. // 4. Apply relocations to text buffer.
for (size_t fi = 0; fi < objs.size(); ++fi) { for (size_t fi = 0; fi < objs.size(); ++fi) {
const auto &obj = *objs[fi]; const auto &obj = *objs[fi];
@ -539,7 +554,9 @@ struct Linker {
uint32_t patchAddr = textBase + patchOff; uint32_t patchAddr = textBase + patchOff;
uint32_t target; uint32_t target;
std::string resolvedName; std::string resolvedName;
resolveSym(obj, oo, r, target, resolvedName); if (!resolveSym(obj, oo, r, target, resolvedName))
die(obj.path + ": .text reloc to unresolved '"
+ resolvedName + "'");
applyReloc(textBuf, patchOff, patchAddr, target, r.type, applyReloc(textBuf, patchOff, patchAddr, target, r.type,
resolvedName); resolvedName);
} }
@ -563,7 +580,9 @@ struct Linker {
uint32_t patchAddr = L.rodataBase + patchOff; uint32_t patchAddr = L.rodataBase + patchOff;
uint32_t target; uint32_t target;
std::string resolvedName; std::string resolvedName;
resolveSym(obj, oo, r, target, resolvedName); if (!resolveSym(obj, oo, r, target, resolvedName))
die(obj.path + ": .rodata reloc to unresolved '"
+ resolvedName + "'");
applyReloc(rodataBuf, patchOff, patchAddr, target, applyReloc(rodataBuf, patchOff, patchAddr, target,
r.type, resolvedName); r.type, resolvedName);
} }
@ -598,48 +617,101 @@ struct Linker {
if (it == obj.relocs.end()) continue; if (it == obj.relocs.end()) continue;
uint32_t inMerged = oo.initBaseInMerged + oo.initWithin.at(idx); uint32_t inMerged = oo.initBaseInMerged + oo.initWithin.at(idx);
for (const Reloc &r : it->second) { for (const Reloc &r : it->second) {
if (r.symIdx >= obj.symbols.size())
die(obj.path + ": reloc references invalid symbol");
const Symbol &sym = obj.symbols[r.symIdx];
uint32_t target; uint32_t target;
if (sym.name.empty() || sym.shndx < obj.sections.size()) { std::string resolvedName;
// Section-relative: resolve against section base. if (!resolveSym(obj, oo, r, target, resolvedName))
if (sym.shndx >= obj.sections.size()) die(obj.path + ": .init_array reloc to unresolved '"
die(obj.path + ": reloc bad shndx"); + resolvedName + "'");
const auto &refSec = obj.sections[sym.shndx];
std::string kind = sectionKind(refSec.name);
uint32_t base = 0;
if (kind == "text") {
auto wIt = oo.textWithin.find(sym.shndx);
base = textBase + oo.textBaseInMerged
+ (wIt == oo.textWithin.end() ? 0 : wIt->second);
} else if (kind == "rodata") {
auto wIt = oo.rodataWithin.find(sym.shndx);
base = L.rodataBase + oo.rodataBaseInMerged
+ (wIt == oo.rodataWithin.end() ? 0 : wIt->second);
} else {
die(obj.path + ": init_array reloc against non-text/rodata");
}
target = base + r.addend;
} else {
auto sIt = globalSyms.find(sym.name);
if (sIt == globalSyms.end())
die(obj.path + ": undefined symbol '" + sym.name + "'");
target = sIt->second + r.addend;
}
uint32_t patchOff = inMerged + r.offset; uint32_t patchOff = inMerged + r.offset;
uint32_t patchAddr = initBase + patchOff; uint32_t patchAddr = initBase + patchOff;
applyReloc(initBuf, patchOff, patchAddr, target, r.type, applyReloc(initBuf, patchOff, patchAddr, target, r.type,
sym.name); resolvedName);
} }
} }
} }
outImage.insert(outImage.end(), initBuf.begin(), initBuf.end()); outImage.insert(outImage.end(), initBuf.begin(), initBuf.end());
lastLayout = L;
return L; return L;
} }
// ----------------------------------------------------------------
// DWARF sidecar. Walks each input object and concatenates every
// section whose name starts with `.debug_`. Each section is
// prefixed by an ASCII-readable header line:
//
// ; OBJ <objname> SEC <sectionname> SIZE <bytes> RELOCS <n>
//
// followed by the section bytes after applying any text/rodata/bss/
// init_array relocations from `.rela.<sec>`. This means PCs in
// .debug_info / .debug_line / .debug_aranges resolve to final-image
// addresses and a consumer like llvm-dwarfdump (or a custom MAME
// overlay) can map them back to source lines.
//
// Intra-debug references (e.g., .debug_info -> .debug_str offsets)
// are *not* renumbered; we concatenate sections without recompacting,
// so the original object-local offsets stay correct relative to each
// object's slice of the sidecar. A multi-TU consumer would need to
// walk the slice headers to find the right base.
void writeDebugSidecar(const std::string &path) const {
std::ofstream f(path, std::ios::binary);
if (!f) die("cannot open '" + path + "' for writing");
f << "; llvm816 link816 DWARF sidecar v1\n";
f << "; text/rodata/bss/init_array relocs applied to final-image addresses\n";
f << "; intra-debug refs left object-local (per-OBJ slice scope)\n";
size_t total = 0;
size_t kept = 0;
size_t patched = 0;
for (size_t fi = 0; fi < objs.size(); ++fi) {
const InputObject &obj = *objs[fi];
const ObjOffsets &oo = objOff[fi];
for (uint32_t idx = 0; idx < obj.sections.size(); ++idx) {
const Section &sec = obj.sections[idx];
if (sec.name.rfind(".debug_", 0) != 0) continue;
if (sec.size == 0) continue;
std::vector<uint8_t> data(sec.size);
std::memcpy(data.data(), obj.raw.data() + sec.fileOffset,
sec.size);
size_t applied = 0;
size_t skipped = 0;
auto it = obj.relocs.find(idx);
if (it != obj.relocs.end()) {
for (const Reloc &r : it->second) {
uint32_t target;
std::string resolvedName;
if (!resolveSym(obj, oo, r, target, resolvedName)) {
skipped++;
continue;
}
if (r.offset + 3 > sec.size) {
// Out-of-range offset; defensively skip.
skipped++;
continue;
}
// patchAddr is only meaningful for PCREL types,
// which DWARF doesn't use. Pass 0; applyReloc
// ignores it for absolute types.
applyReloc(data, r.offset, 0, target, r.type,
resolvedName);
applied++;
}
}
patched += applied;
char hdr[256];
std::snprintf(hdr, sizeof(hdr),
"; OBJ %s SEC %s SIZE %u RELOCS_APPLIED %zu RELOCS_SKIPPED %zu\n",
obj.path.c_str(), sec.name.c_str(), sec.size,
applied, skipped);
f.write(hdr, std::strlen(hdr));
f.write(reinterpret_cast<const char *>(data.data()), sec.size);
f << "\n";
total += sec.size;
kept++;
}
}
std::fprintf(stderr,
"debug sidecar: %zu sections, %zu bytes, %zu relocs applied -> %s\n",
kept, total, patched, path.c_str());
}
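Review note: the header format written by `writeDebugSidecar` (`; OBJ ... SEC ... SIZE ... RELOCS_APPLIED ... RELOCS_SKIPPED ...`, then `SIZE` raw bytes and a trailing newline) implies a consumer has to walk the slices by length, not by scanning for newlines, because the section bytes may themselves contain `\n`. A minimal sketch of such a reader follows. `SidecarSlice` and `parseSidecar` are hypothetical names that are not part of link816, and the sketch assumes the v1 layout shown in this commit and object paths without spaces.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// One `.debug_*` slice recovered from the sidecar (hypothetical type).
struct SidecarSlice {
    std::string obj;             // object path taken from the header line
    std::string sec;             // section name, e.g. ".debug_line"
    std::vector<uint8_t> bytes;  // section contents as written by the linker
};

// Hypothetical reader for the v1 sidecar layout. Assumes object paths
// contain no spaces. Returns false on malformed input.
bool parseSidecar(const std::string &blob, std::vector<SidecarSlice> &out) {
    size_t pos = 0;
    while (pos < blob.size()) {
        size_t eol = blob.find('\n', pos);
        if (eol == std::string::npos) return false;
        std::string line = blob.substr(pos, eol - pos);
        pos = eol + 1;
        // Banner lines start with "; " but are not slice headers; skip them.
        if (line.rfind("; OBJ ", 0) != 0) continue;
        char obj[256];
        char sec[64];
        unsigned size = 0;
        if (std::sscanf(line.c_str(), "; OBJ %255s SEC %63s SIZE %u",
                        obj, sec, &size) != 3)
            return false;
        // The payload is length-delimited; it may contain '\n' bytes.
        if (size > blob.size() - pos) return false;
        SidecarSlice s;
        s.obj = obj;
        s.sec = sec;
        s.bytes.assign(blob.begin() + pos, blob.begin() + pos + size);
        out.push_back(std::move(s));
        pos += size;
        // The writer emits one newline after each payload.
        if (pos < blob.size() && blob[pos] == '\n') ++pos;
    }
    return true;
}
```

Because intra-debug references stay object-local, a multi-TU consumer would presumably group the returned slices by `obj` and resolve `.debug_str`-style offsets within each group.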
void writeMap(const std::string &path) const { void writeMap(const std::string &path) const {
std::ofstream f(path); std::ofstream f(path);
if (!f) die("cannot open '" + path + "' for writing"); if (!f) die("cannot open '" + path + "' for writing");
@ -714,49 +786,6 @@ static void usage(const char *argv0) {
std::exit(2); std::exit(2);
} }
// ---------------------------------------------------------------- DWARF
// Sidecar emission. Walks each input object and concatenates every
// section whose name starts with `.debug_`. Each section is prefixed
// by a small ASCII-readable header line:
//
// ; OBJ <objname> SEC <sectionname> SIZE <bytes>
//
// followed by the raw section bytes. Address-bearing sections
// (.debug_info, .debug_line, .debug_aranges, .debug_loc, etc.) are
// written WITHOUT relocation processing — addresses are object-file-
// local, not final-image-local. A consumer that wants source-mapped
// addresses needs to either (a) re-run reloc against the linked
// section bases, or (b) use the relative offsets within their object
// scope. Better than nothing for a single-TU debug session.
static void writeDebugSidecar(
const std::string &path,
const std::vector<std::unique_ptr<InputObject>> &objs) {
std::ofstream f(path, std::ios::binary);
if (!f) die("cannot open '" + path + "' for writing");
f << "; llvm816 link816 DWARF sidecar v0\n";
f << "; Object-file-local addresses; not relocated to final image.\n";
size_t total = 0;
size_t kept = 0;
for (const auto &objPtr : objs) {
const InputObject &obj = *objPtr;
for (const Section &sec : obj.sections) {
if (sec.name.rfind(".debug_", 0) != 0) continue;
if (sec.size == 0) continue;
f << "; OBJ " << obj.path << " SEC " << sec.name
<< " SIZE " << sec.size << "\n";
f.write(reinterpret_cast<const char *>(obj.raw.data()
+ sec.fileOffset),
sec.size);
f << "\n";
total += sec.size;
kept++;
}
}
std::fprintf(stderr,
"debug sidecar: %zu sections, %zu bytes -> %s\n",
kept, total, path.c_str());
}
} // anonymous namespace } // anonymous namespace
int main(int argc, char **argv) { int main(int argc, char **argv) {
@ -805,7 +834,7 @@ int main(int argc, char **argv) {
f.write(reinterpret_cast<const char *>(image.data()), image.size()); f.write(reinterpret_cast<const char *>(image.data()), image.size());
if (!mapPath.empty()) linker.writeMap(mapPath); if (!mapPath.empty()) linker.writeMap(mapPath);
if (!debugOutPath.empty()) writeDebugSidecar(debugOutPath, linker.objs); if (!debugOutPath.empty()) linker.writeDebugSidecar(debugOutPath);
std::fprintf(stderr, std::fprintf(stderr,
"linked: text=[0x%04x+%u] rodata=[0x%04x+%u] bss=[0x%04x+%u] " "linked: text=[0x%04x+%u] rodata=[0x%04x+%u] bss=[0x%04x+%u] "