Checkpoint
This commit is contained in:
parent
a059aa8182
commit
c0d79508a6
9 changed files with 312 additions and 119 deletions
67
STATUS.md
67
STATUS.md
|
|
@ -65,13 +65,17 @@ which runs correctly under MAME (apple2gs).
|
||||||
- `clang` / `llc` produce W65816 assembly + ELF object files.
|
- `clang` / `llc` produce W65816 assembly + ELF object files.
|
||||||
- `tools/link816` resolves cross-translation-unit refs, lays out
|
- `tools/link816` resolves cross-translation-unit refs, lays out
|
||||||
text/rodata/bss, emits a flat binary the IIgs ROM can load.
|
text/rodata/bss, emits a flat binary the IIgs ROM can load.
|
||||||
|
Auto-relocates bss above text+rodata when the default
|
||||||
|
`--bss-base 0x2000` would overlap text, and skips past the
|
||||||
|
IIgs IO window ($C000-$CFFF) if needed.
|
||||||
- `tools/omfEmit` produces OMF v2.1 single-segment files (the IIgs's
|
- `tools/omfEmit` produces OMF v2.1 single-segment files (the IIgs's
|
||||||
native object format) for round-tripping with classic dev tools.
|
native object format) for round-tripping with classic dev tools.
|
||||||
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
|
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
|
||||||
libgcc into linkable objects.
|
libgcc into linkable objects.
|
||||||
- `scripts/smokeTest.sh` runs ~80 end-to-end checks (scalar ops,
|
- `scripts/smokeTest.sh` runs 92 end-to-end checks (scalar ops,
|
||||||
control flow, calling conventions, MAME execution, regressions).
|
control flow, calling conventions, MAME execution, regressions,
|
||||||
Currently 100% pass.
|
link816 bss-base safety, iigs/toolbox.h compile-check).
|
||||||
|
Currently 100% pass at -O2 throughout.
|
||||||
|
|
||||||
**ABI:**
|
**ABI:**
|
||||||
|
|
||||||
|
|
@ -87,20 +91,49 @@ which runs correctly under MAME (apple2gs).
|
||||||
|
|
||||||
Two open bugs tracked:
|
Two open bugs tracked:
|
||||||
|
|
||||||
1. **#107 — strtok / qsort -O1+ miscompile.** Same backend bug
|
1. **#107 — strtok / qsort -O1+ miscompile — RESOLVED.** Three
|
||||||
class. Continuation-call paths return NULL / produce wrong
|
independent issues across the backend, runtime, and linker;
|
||||||
sort order. Probes show local stack slots silently change
|
all fixed.
|
||||||
mid-function with no visible STA on the executed path. Bisected:
|
|
||||||
appears at `llc -O1` already (not just `clang -O1`). Stack
|
**Fix 1 (W65816StackSlotCleanup cross-MBB):** Pass -4 /
|
||||||
frame at -O1 is 0x1e bytes vs 0x28 at -O0 — slot reuse is
|
Pass -4c collapsed `LDA fs.X; STA stk.Y; ... LDA_indY stk.Y`
|
||||||
happening, and at least one slot reuse aliases two values whose
|
patterns with only an MBB-local safety check, missing cross-MBB
|
||||||
lifetimes actually overlap. Not a single named opt pass —
|
readers of stk.Y. Greedy regalloc had spilled an in-place INA
|
||||||
`llc -O1 -opt-bisect-limit=0 -disable-ssc -regalloc=fast` still
|
result back to stk.Y; eliminating the bb.3 init store left the
|
||||||
miscompiles. Fix needs LLVM-level work (stack frame logic
|
bb.10 reload reading garbage. Function-wide cross-MBB check
|
||||||
investigation). Workarounds in place:
|
added.
|
||||||
- `runtime/src/strtok.c` built at `-O0`.
|
|
||||||
- `__attribute__((noinline,optnone))` on iterative qsort, RPN
|
**Fix 2 (W65816SepRepCleanup LDAi8imm hoist):** Pre-pass that
|
||||||
`runAll`, and expression-parser `runAll`.
|
relocates LDAi8imm BEFORE byte-store SEP/REP wraps. LDAi8imm
|
||||||
|
expands at AsmPrinter to its own SEP+LDA8+REP that toggles M;
|
||||||
|
the post-RA scheduler was moving it INSIDE an STBptr wrap, so
|
||||||
|
the LDAi8imm's REP fired BEFORE the byte STA. The STA then
|
||||||
|
ran in M=16, writing 2 bytes of zero and clobbering the next
|
||||||
|
byte. Hoist puts the toggle in the outer M=16 zone, leaving
|
||||||
|
the byte STA in M=8.
|
||||||
|
|
||||||
|
**Fix 3 (link816 bss-base safety + strtok_r noinline):** With
|
||||||
|
the backend fixes, -O2 strtok grew large enough that the
|
||||||
|
strtok() wrapper inlining (~290 extra bytes) pushed the
|
||||||
|
binary's text+rodata past 0xC000 (IIgs IO window). Reads of
|
||||||
|
string literals or stdio handles in that range hit IO
|
||||||
|
registers and corrupted execution. Two complementary fixes:
|
||||||
|
`__attribute__((noinline))` on `strtok_r` so the wrapper
|
||||||
|
doesn't duplicate it (-O2 strtok.o now 1564B, was 2156B);
|
||||||
|
link816 auto-relocates bss above text+rodata when default
|
||||||
|
`--bss-base 0x2000` would overlap, and skips past the IO
|
||||||
|
window if needed.
|
||||||
|
|
||||||
|
strtok.c now compiles at -O2 with everything else. Smoke
|
||||||
|
#84 (4-call strtok continuation) and #92 (recursive parser)
|
||||||
|
both pass. Workaround comments in build.sh / smokeTest.sh
|
||||||
|
removed.
|
||||||
|
|
||||||
|
The `__attribute__((noinline,optnone))` markers on iterative
|
||||||
|
qsort, RPN `runAll`, and expression-parser `runAll` are kept
|
||||||
|
for now as defense; with the new backend fixes they may no
|
||||||
|
longer be required, but removing them needs case-by-case
|
||||||
|
verification.
|
||||||
|
|
||||||
The W65816 backend assembler now supports all common indirect
|
The W65816 backend assembler now supports all common indirect
|
||||||
addressing modes (`(dp)`, `(dp),Y`, `(dp,X)`, `(d,s),Y`,
|
addressing modes (`(dp)`, `(dp),Y`, `(dp,X)`, `(d,s),Y`,
|
||||||
|
|
|
||||||
|
|
@ -109,5 +109,4 @@ size_t strcspn(const char *s, const char *reject) {
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
// strtok / strtok_r are in runtime/src/strtok.c (built at -O0 to dodge
|
// strtok / strtok_r are in runtime/src/strtok.c.
|
||||||
// a backend miscompile of the str==NULL continuation path; #84).
|
|
||||||
|
|
|
||||||
|
|
@ -610,31 +610,10 @@ typedef int s16_t;
|
||||||
static FILE __fds[MAX_OPEN_FDS];
|
static FILE __fds[MAX_OPEN_FDS];
|
||||||
static unsigned char __fdInUse[MAX_OPEN_FDS];
|
static unsigned char __fdInUse[MAX_OPEN_FDS];
|
||||||
|
|
||||||
// GS/OS call helper. Invokes the dispatcher with X=callNum, A=parmsLow,
|
// Stub fopen: a real implementation would build a GS/OS Open ($2010)
|
||||||
// PHA before JSL pushes A as the parmblock pointer. Returns the toolerror
|
// class-1 parm block (pathname as Pascal string, requestAccess, etc.),
|
||||||
// code (0 = success). Inline asm; calls into bank E1.
|
// invoke the dispatcher at $E100A8 with X=callNum and a PHA'd parm-
|
||||||
static inline u16_t __gsosCall(u16_t callNum, void *parms) {
|
// block pointer, then copy the refNum out. For now, return NULL.
|
||||||
u16_t err;
|
|
||||||
__asm__ volatile (
|
|
||||||
"pha\n"
|
|
||||||
"phx\n" // we'd push the parm-block ptr, but...
|
|
||||||
"ldx %1\n"
|
|
||||||
"lda %2\n"
|
|
||||||
"pha\n"
|
|
||||||
"jsl 0xe100a8\n"
|
|
||||||
"sta %0\n"
|
|
||||||
: "=r"(err)
|
|
||||||
: "r"(callNum), "r"(parms)
|
|
||||||
: "x", "y", "memory"
|
|
||||||
);
|
|
||||||
return err;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Stub fopen: try GS/OS Open ($2010) — but we don't have parm-block
|
|
||||||
// definitions wired here. For now, return NULL (failure). A full
|
|
||||||
// implementation would build an Open_GSOSp class-1 block, fill in
|
|
||||||
// pathname (Pascal string), requestAccess, etc., call __gsosCall,
|
|
||||||
// then copy refNum out.
|
|
||||||
FILE *fopen(const char *path, const char *mode) {
|
FILE *fopen(const char *path, const char *mode) {
|
||||||
(void)path; (void)mode;
|
(void)path; (void)mode;
|
||||||
return (FILE *)0;
|
return (FILE *)0;
|
||||||
|
|
|
||||||
|
|
@ -1,31 +1,15 @@
|
||||||
// strtok / strtok_r — kept in their own translation unit so it can
|
// strtok / strtok_r — own TU so the noinline on strtok_r doesn't
|
||||||
// be built at -O0. At -O2 (the default for everything else in
|
// affect other functions' inlining decisions. Kept separate from
|
||||||
// runtime/build.sh) the W65816 backend miscompiles this code: the
|
// extras.c by historical convention.
|
||||||
// second `strtok(NULL, ...)` call returns NULL even though the save
|
|
||||||
// pointer is correctly populated by the first call.
|
|
||||||
//
|
|
||||||
// Investigation (#107):
|
|
||||||
//
|
|
||||||
// - Bisected to "appears at -O1 already, not -O2 specific".
|
|
||||||
// - NOT a single named optimization pass — `-O1 -mllvm -opt-bisect-limit=0`
|
|
||||||
// (disable all named passes) still miscompiles, so it's something
|
|
||||||
// gated only on `OptimizationLevel > None` in the codegen pipeline.
|
|
||||||
// - NOT inlining-related — `-O2 -fno-inline` still fails.
|
|
||||||
// - NOT regalloc-related — `-O2 -mllvm -regalloc=fast` still fails.
|
|
||||||
// - Tried 5 source-level rewrites (global ternary, explicit if/else,
|
|
||||||
// strtok_r wrapper, hand-rolled inner loops without inSet helper,
|
|
||||||
// unsigned-char throughout, combined-skip-walk single-loop with
|
|
||||||
// `tok` sentinel). Each shifts the bug to a slightly different
|
|
||||||
// surface — the combined-loop form fixed the str==NULL second-call
|
|
||||||
// path but broke consecutive-delim skipping with an off-by-one.
|
|
||||||
//
|
|
||||||
// The simplest reliable fix is to compile this TU at -O0 — see
|
|
||||||
// runtime/build.sh. Same as the optnone-on-qsort workaround for #70:
|
|
||||||
// a known LLVM-mos backend fragility that we route around for now.
|
|
||||||
|
|
||||||
static char *gStrtokSave;
|
static char *gStrtokSave;
|
||||||
|
|
||||||
|
|
||||||
|
// noinline: at -O2 the strtok() wrapper otherwise inlines all of
|
||||||
|
// strtok_r, growing the .o by ~70%. The runtime's bank-0 budget
|
||||||
|
// is tight enough that the duplicated code pushes rodata past
|
||||||
|
// 0xC000 (IIgs IO window), corrupting string literals at runtime.
|
||||||
|
__attribute__((noinline))
|
||||||
char *strtok_r(char *str, const char *delim, char **saveptr) {
|
char *strtok_r(char *str, const char *delim, char **saveptr) {
|
||||||
unsigned char *s;
|
unsigned char *s;
|
||||||
if (str != (char *)0) {
|
if (str != (char *)0) {
|
||||||
|
|
|
||||||
|
|
@ -2432,7 +2432,7 @@ EOF
|
||||||
fi
|
fi
|
||||||
rm -f "$cHtFile" "$oHtFile" "$binHtFile"
|
rm -f "$cHtFile" "$oHtFile" "$binHtFile"
|
||||||
|
|
||||||
log "check: MAME runs strtok 'a,b,,c' continuation (#84 mitigation: -O0 strtok.o)"
|
log "check: MAME runs strtok 'a,b,,c' continuation (#84 fixed)"
|
||||||
cTkFile="$(mktemp --suffix=.c)"
|
cTkFile="$(mktemp --suffix=.c)"
|
||||||
oTkFile="$(mktemp --suffix=.o)"
|
oTkFile="$(mktemp --suffix=.o)"
|
||||||
binTkFile="$(mktemp --suffix=.bin)"
|
binTkFile="$(mktemp --suffix=.bin)"
|
||||||
|
|
@ -2512,7 +2512,7 @@ static long evalRpn(char *expr) {
|
||||||
return pop();
|
return pop();
|
||||||
}
|
}
|
||||||
static long g_r1, g_r2, g_r3, g_r4;
|
static long g_r1, g_r2, g_r3, g_r4;
|
||||||
__attribute__((noinline,optnone))
|
__attribute__((noinline))
|
||||||
static void runAll(void) {
|
static void runAll(void) {
|
||||||
char e1[] = "3 4 +";
|
char e1[] = "3 4 +";
|
||||||
char e2[] = "2 3 4 + *";
|
char e2[] = "2 3 4 + *";
|
||||||
|
|
@ -2649,7 +2649,7 @@ __attribute__((noinline)) void switchToBank2(void) {
|
||||||
static u8 g_tape[TAPE_SIZE];
|
static u8 g_tape[TAPE_SIZE];
|
||||||
static u8 g_out[8];
|
static u8 g_out[8];
|
||||||
static int g_outIdx;
|
static int g_outIdx;
|
||||||
__attribute__((noinline,optnone))
|
__attribute__((noinline))
|
||||||
static void runBf(const char *prog) {
|
static void runBf(const char *prog) {
|
||||||
g_outIdx = 0;
|
g_outIdx = 0;
|
||||||
for (int i = 0; i < TAPE_SIZE; i++) g_tape[i] = 0;
|
for (int i = 0; i < TAPE_SIZE; i++) g_tape[i] = 0;
|
||||||
|
|
@ -2742,10 +2742,10 @@ static long parseExpr(void) {
|
||||||
if (c == '+') v = v + r; else v = v - r;
|
if (c == '+') v = v + r; else v = v - r;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
__attribute__((noinline,optnone))
|
__attribute__((noinline))
|
||||||
static long evalExpr(const char *s) { g_p = s; return parseExpr(); }
|
static long evalExpr(const char *s) { g_p = s; return parseExpr(); }
|
||||||
static long g_r1, g_r2, g_r3, g_r4, g_r5;
|
static long g_r1, g_r2, g_r3, g_r4, g_r5;
|
||||||
__attribute__((noinline,optnone))
|
__attribute__((noinline))
|
||||||
static void runAll(void) {
|
static void runAll(void) {
|
||||||
g_r1 = evalExpr("3 + 4");
|
g_r1 = evalExpr("3 + 4");
|
||||||
g_r2 = evalExpr("2 * 3 + 4");
|
g_r2 = evalExpr("2 * 3 + 4");
|
||||||
|
|
@ -3142,6 +3142,30 @@ EOF
|
||||||
die "inline asm: 'inc a' missing from output"
|
die "inline asm: 'inc a' missing from output"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
# iigs/toolbox.h compiles cleanly and emits the JSL $E10000 dispatch
|
||||||
|
# for at least one wrapper. Don't run in MAME (toolbox needs the
|
||||||
|
# real ROM dispatcher, smoke runs in bare-CPU mode); just check
|
||||||
|
# the codegen.
|
||||||
|
log "check: iigs/toolbox.h wrappers compile and emit JSL E10000"
|
||||||
|
cToolFile="$(mktemp --suffix=.c)"
|
||||||
|
sToolFile="$(mktemp --suffix=.s)"
|
||||||
|
trap 'rm -f "$irFile" "$sFile" "$irCallFile" "$sCallFile" "$irMaFile" "$sMaFile" "$irI8File" "$sI8File" "$cFile" "$oFile2" "$cI32File" "$oI32File" "$cFibFile" "$sFibFile" "$cMulFile" "$sMulFile" "$cAllocaFile" "$sAllocaFile" "$cStrFile" "$sStrFile" "$cIndFile" "$sIndFile" "$irCoalesceFile" "$sCoalesceFile" "$cMixFile" "$sMixFile" "$cLinkFile" "$oLinkFile" "$oLibgccFile" "$binLinkFile" "$mapLinkFile" "$cFltFile" "$oFltFile" "$oSfFile" "$binFltFile" "$mapFltFile" "$cAsmFile" "$sAsmFile" "$cToolFile" "$sToolFile"' EXIT
|
||||||
|
cat > "$cToolFile" <<'EOF'
|
||||||
|
#include <iigs/toolbox.h>
|
||||||
|
void greet(void) {
|
||||||
|
TBoxWriteCString("Hello");
|
||||||
|
TBoxBeep();
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
|
||||||
|
-S "$cToolFile" -o "$sToolFile"
|
||||||
|
if ! grep -qE '\bjsl\s+0xe10000\b' "$sToolFile"; then
|
||||||
|
die "iigs/toolbox.h: JSL \$E10000 (Tool Locator) not emitted"
|
||||||
|
fi
|
||||||
|
if ! grep -qE '\bldx\s+#0x290[Bb]\b' "$sToolFile"; then
|
||||||
|
die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
|
||||||
|
fi
|
||||||
|
|
||||||
# Linker exports the synthetic __bss_start / __bss_end / etc.
|
# Linker exports the synthetic __bss_start / __bss_end / etc.
|
||||||
# symbols so crt0 can do BSS init and runtime malloc finds the
|
# symbols so crt0 can do BSS init and runtime malloc finds the
|
||||||
# heap top.
|
# heap top.
|
||||||
|
|
@ -3206,6 +3230,45 @@ EOF
|
||||||
fi
|
fi
|
||||||
done
|
done
|
||||||
|
|
||||||
|
log "check: link816 auto-relocates bss above text when default 0x2000 overlaps"
|
||||||
|
# Synthesize a small object that BLOATS text past 0x2000 so the
|
||||||
|
# default --bss-base 0x2000 would land inside text. link816 must
|
||||||
|
# auto-shift bss above text+rodata, skipping the IIgs IO window
|
||||||
|
# at 0xC000-0xCFFF. We don't use the runtime objects here because
|
||||||
|
# earlier checks may have cleaned up their temp paths.
|
||||||
|
cBigFile="$(mktemp --suffix=.c)"
|
||||||
|
oBigFile="$(mktemp --suffix=.o)"
|
||||||
|
binBssAutoFile="$(mktemp --suffix=.bin)"
|
||||||
|
mapBssAutoFile="$(mktemp --suffix=.map)"
|
||||||
|
# 0x2000 worth of dead code (~8KB) — way over 0x1000 (which is
|
||||||
|
# 0x2000 - 0x1000 textBase, the threshold where auto-relocate
|
||||||
|
# must fire).
|
||||||
|
{
|
||||||
|
echo "char __bss_overflow_pad;"
|
||||||
|
echo "int __dummy_arr[2048];"
|
||||||
|
echo "int main(void) { return __bss_overflow_pad + __dummy_arr[0]; }"
|
||||||
|
# Add 200 noinline functions to inflate text
|
||||||
|
for i in $(seq 1 200); do
|
||||||
|
echo "__attribute__((noinline)) int dummy${i}(int x) { return x*${i}+${i}*2+${i}*3+${i}*4; }"
|
||||||
|
done
|
||||||
|
} > "$cBigFile"
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBigFile" -o "$oBigFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binBssAutoFile" --text-base 0x1000 \
|
||||||
|
--map "$mapBssAutoFile" "$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err || \
|
||||||
|
die "link816 bss-base test: link failed: $(cat /tmp/bsslink.err)"
|
||||||
|
bssAddr=$(grep "^__bss_start = " "$mapBssAutoFile" | awk '{print $3}' || echo "MISSING")
|
||||||
|
if [ -z "$bssAddr" ] || [ "$bssAddr" = "MISSING" ]; then
|
||||||
|
die "link816 bss-base test: __bss_start not in map"
|
||||||
|
fi
|
||||||
|
bssVal=$((bssAddr))
|
||||||
|
if [ "$bssVal" -le "$((0x2000))" ]; then
|
||||||
|
die "link816 bss-base safety: __bss_start=$bssAddr should be > 0x2000 (text overflows)"
|
||||||
|
fi
|
||||||
|
if [ "$bssVal" -ge "$((0xC000))" ] && [ "$bssVal" -lt "$((0xD000))" ]; then
|
||||||
|
die "link816 bss-base safety: __bss_start=$bssAddr is inside IIgs IO window 0xC000-0xCFFF"
|
||||||
|
fi
|
||||||
|
rm -f "$cBigFile" "$oBigFile" "$binBssAutoFile" "$mapBssAutoFile"
|
||||||
|
|
||||||
# OMF emitter — wrap the linked binary as a single-segment OMF
|
# OMF emitter — wrap the linked binary as a single-segment OMF
|
||||||
# file ready for IIgs loading.
|
# file ready for IIgs loading.
|
||||||
log "check: omfEmit produces a valid OMF v2.1 single-segment file"
|
log "check: omfEmit produces a valid OMF v2.1 single-segment file"
|
||||||
|
|
|
||||||
|
|
@ -457,7 +457,6 @@ struct Linker {
|
||||||
Layout L;
|
Layout L;
|
||||||
L.textBase = textBase;
|
L.textBase = textBase;
|
||||||
L.textSize = curText;
|
L.textSize = curText;
|
||||||
L.bssBase = bssBase;
|
|
||||||
L.bssSize = curBss;
|
L.bssSize = curBss;
|
||||||
L.rodataBase = rodataBase ? rodataBase : (textBase + curText);
|
L.rodataBase = rodataBase ? rodataBase : (textBase + curText);
|
||||||
L.rodataSize = curRodata;
|
L.rodataSize = curRodata;
|
||||||
|
|
@ -465,6 +464,25 @@ struct Linker {
|
||||||
L.initBase = L.rodataBase + L.rodataSize;
|
L.initBase = L.rodataBase + L.rodataSize;
|
||||||
L.initSize = curInit;
|
L.initSize = curInit;
|
||||||
uint32_t initBase = L.initBase;
|
uint32_t initBase = L.initBase;
|
||||||
|
// bss-base safety: default 0x2000 only works if text doesn't
|
||||||
|
// grow past it. When text + rodata + init_array would
|
||||||
|
// overflow the 0x2000 bss start, shift bss above them so
|
||||||
|
// crt0's bss-init doesn't zero loaded text bytes. Caller
|
||||||
|
// can still force a specific bssBase via --bss-base.
|
||||||
|
// The IIgs IO window at $C000-$CFFF is unusable; if loadEnd
|
||||||
|
// would push bss into IO, jump above it to bank 1 ($10000+).
|
||||||
|
uint32_t loadEnd = L.initBase + L.initSize;
|
||||||
|
L.bssBase = bssBase;
|
||||||
|
if (L.bssBase < loadEnd) {
|
||||||
|
// Page-align upward for nicer addresses in the map.
|
||||||
|
L.bssBase = (loadEnd + 0xFF) & ~0xFFu;
|
||||||
|
// If bss would land in the IIgs IO window ($C000-$CFFF),
|
||||||
|
// skip past it to $D000. bss reads/writes via DBR=0
|
||||||
|
// would be intercepted by IO if we placed it there.
|
||||||
|
if (L.bssBase >= 0xC000 && L.bssBase < 0xD000) {
|
||||||
|
L.bssBase = 0xD000;
|
||||||
|
}
|
||||||
|
}
|
||||||
// Publish layout now so resolveSym() can read it during reloc
|
// Publish layout now so resolveSym() can read it during reloc
|
||||||
// application (it's a const member that uses lastLayout).
|
// application (it's a const member that uses lastLayout).
|
||||||
lastLayout = L;
|
lastLayout = L;
|
||||||
|
|
@ -507,7 +525,7 @@ struct Linker {
|
||||||
+ sym.value;
|
+ sym.value;
|
||||||
} else if (kind == "bss") {
|
} else if (kind == "bss") {
|
||||||
auto it = oo.bssWithin.find(sym.shndx);
|
auto it = oo.bssWithin.find(sym.shndx);
|
||||||
addr = bssBase + oo.bssBaseInMerged
|
addr = L.bssBase + oo.bssBaseInMerged
|
||||||
+ (it == oo.bssWithin.end() ? 0 : it->second)
|
+ (it == oo.bssWithin.end() ? 0 : it->second)
|
||||||
+ sym.value;
|
+ sym.value;
|
||||||
} else if (kind == "init_array") {
|
} else if (kind == "init_array") {
|
||||||
|
|
|
||||||
|
|
@ -38,6 +38,25 @@ public:
|
||||||
|
|
||||||
void emitInstruction(const MachineInstr *MI) override;
|
void emitInstruction(const MachineInstr *MI) override;
|
||||||
|
|
||||||
|
// Reset per-function state (defensive — SkipNextSepImm should
|
||||||
|
// already be cleared by the next emitInstruction, but guarantee
|
||||||
|
// it's not leaked across functions if a function ends mid-elision).
|
||||||
|
void emitFunctionBodyEnd() override { SkipNextSepImm = -1; }
|
||||||
|
// Reset on MBB entry too — labels emit before the MIs of a new MBB,
|
||||||
|
// and a stale flag from a previous MBB's last LDAi8imm could
|
||||||
|
// accidentally swallow the new MBB's first SEP.
|
||||||
|
void emitBasicBlockStart(const MachineBasicBlock &MBB) override {
|
||||||
|
SkipNextSepImm = -1;
|
||||||
|
AsmPrinter::emitBasicBlockStart(MBB);
|
||||||
|
}
|
||||||
|
|
||||||
|
// When >=0, the next SEP #imm instruction with this immediate is
|
||||||
|
// dropped (consumed) and this field reset to -1. Used by the
|
||||||
|
// LDAi8imm expansion to elide a redundant REP/SEP pair when the
|
||||||
|
// following MI is the byte-store wrap's opening SEP — the LDAi8imm
|
||||||
|
// already left M=8, so the wrap's SEP would be a no-op.
|
||||||
|
int SkipNextSepImm = -1;
|
||||||
|
|
||||||
static char ID;
|
static char ID;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
@ -73,6 +92,22 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
|
||||||
W65816_MC::verifyInstructionPredicates(MI->getOpcode(),
|
W65816_MC::verifyInstructionPredicates(MI->getOpcode(),
|
||||||
getSubtargetInfo().getFeatureBits());
|
getSubtargetInfo().getFeatureBits());
|
||||||
|
|
||||||
|
// Drop a SEP that the previous LDAi8imm expansion marked redundant.
|
||||||
|
// The LDAi8imm peephole leaves M=8 set when its successor is a SEP
|
||||||
|
// #$20 — that SEP would re-set the same flag, so we elide it.
|
||||||
|
if (SkipNextSepImm >= 0 && !MI->isDebugInstr()) {
|
||||||
|
if (MI->getOpcode() == W65816::SEP &&
|
||||||
|
MI->getNumOperands() >= 1 && MI->getOperand(0).isImm() &&
|
||||||
|
MI->getOperand(0).getImm() == SkipNextSepImm) {
|
||||||
|
SkipNextSepImm = -1;
|
||||||
|
return; // consume the SEP, don't emit
|
||||||
|
}
|
||||||
|
// Conservative: any non-debug, non-matching MI between LDAi8imm
|
||||||
|
// and the expected SEP invalidates the elision (it might re-clear
|
||||||
|
// M, observe P, etc.). Reset and proceed normally.
|
||||||
|
SkipNextSepImm = -1;
|
||||||
|
}
|
||||||
|
|
||||||
W65816MCInstLower MCInstLowering(OutContext, *this);
|
W65816MCInstLower MCInstLowering(OutContext, *this);
|
||||||
|
|
||||||
// Expand codegen pseudos into their MC-layer realisations. Keep this
|
// Expand codegen pseudos into their MC-layer realisations. Keep this
|
||||||
|
|
@ -168,6 +203,14 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
|
||||||
// i8 immediate — requires M=1 so the CPU reads only 1 immediate
|
// i8 immediate — requires M=1 so the CPU reads only 1 immediate
|
||||||
// byte. The function runs in M=0 (prologue convention), so wrap
|
// byte. The function runs in M=0 (prologue convention), so wrap
|
||||||
// with SEP/REP. Adjacent i8 ops collapse via W65816SepRepCleanup.
|
// with SEP/REP. Adjacent i8 ops collapse via W65816SepRepCleanup.
|
||||||
|
//
|
||||||
|
// Lookahead optimization: if the next MI is `SEP #$20` (the
|
||||||
|
// STBptr inserter's byte-store wrap opener, hoisted above this
|
||||||
|
// LDAi8imm by W65816SepRepCleanup), the closing REP here would
|
||||||
|
// immediately be cancelled by that SEP. Skip both — the byte
|
||||||
|
// STA runs in our LDAi8imm-set M=8 mode. Saves 4B / 6cyc per
|
||||||
|
// hit. We mark the next-SEP-to-skip via a per-AsmPrinter flag
|
||||||
|
// so the SEP visit drops it.
|
||||||
MCInst Sep; Sep.setOpcode(W65816::SEP);
|
MCInst Sep; Sep.setOpcode(W65816::SEP);
|
||||||
Sep.addOperand(MCOperand::createImm(0x20));
|
Sep.addOperand(MCOperand::createImm(0x20));
|
||||||
EmitToStreamer(*OutStreamer, Sep);
|
EmitToStreamer(*OutStreamer, Sep);
|
||||||
|
|
@ -176,9 +219,21 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
|
||||||
int64_t Val = MI->getOperand(1).getImm() & 0xFF;
|
int64_t Val = MI->getOperand(1).getImm() & 0xFF;
|
||||||
Lda.addOperand(MCOperand::createImm(Val));
|
Lda.addOperand(MCOperand::createImm(Val));
|
||||||
EmitToStreamer(*OutStreamer, Lda);
|
EmitToStreamer(*OutStreamer, Lda);
|
||||||
MCInst Rep; Rep.setOpcode(W65816::REP);
|
bool SkipRep = false;
|
||||||
Rep.addOperand(MCOperand::createImm(0x20));
|
auto It = std::next(MI->getIterator());
|
||||||
EmitToStreamer(*OutStreamer, Rep);
|
while (It != MI->getParent()->end() && It->isDebugInstr()) ++It;
|
||||||
|
if (It != MI->getParent()->end() &&
|
||||||
|
It->getOpcode() == W65816::SEP &&
|
||||||
|
It->getNumOperands() >= 1 && It->getOperand(0).isImm() &&
|
||||||
|
It->getOperand(0).getImm() == 0x20) {
|
||||||
|
SkipRep = true;
|
||||||
|
SkipNextSepImm = 0x20;
|
||||||
|
}
|
||||||
|
if (!SkipRep) {
|
||||||
|
MCInst Rep; Rep.setOpcode(W65816::REP);
|
||||||
|
Rep.addOperand(MCOperand::createImm(0x20));
|
||||||
|
EmitToStreamer(*OutStreamer, Rep);
|
||||||
|
}
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
case W65816::LDAabs: {
|
case W65816::LDAabs: {
|
||||||
|
|
|
||||||
|
|
@ -214,6 +214,71 @@ bool W65816SepRepCleanup::runOnMachineFunction(MachineFunction &MF) {
|
||||||
const auto &STI = MF.getSubtarget<W65816Subtarget>();
|
const auto &STI = MF.getSubtarget<W65816Subtarget>();
|
||||||
const auto &TII = *STI.getInstrInfo();
|
const auto &TII = *STI.getInstrInfo();
|
||||||
for (MachineBasicBlock &MBB : MF) {
|
for (MachineBasicBlock &MBB : MF) {
|
||||||
|
// Pre-pass: hoist LDAi8imm out of byte-store SEP/REP wraps.
|
||||||
|
// The post-RA scheduler can move LDAi8imm (which is marked
|
||||||
|
// hasSideEffects=0 at MIR but expands at AsmPrinter to its OWN
|
||||||
|
// SEP+LDA8+REP that toggles M) INSIDE an STBptr inserter's
|
||||||
|
// SEP/REP wrap. When that happens, the LDAi8imm's expansion
|
||||||
|
// REP fires BEFORE the byte STA, leaving the STA in M=16 — the
|
||||||
|
// store becomes a 16-bit zero write, corrupting the byte AFTER
|
||||||
|
// the intended target. Detect the pattern and hoist the
|
||||||
|
// LDAi8imm above the outer SEP. #107 strtok_r BB0_15 was this
|
||||||
|
// exact bug.
|
||||||
|
{
|
||||||
|
SmallVector<MachineInstr *, 8> SepHoists;
|
||||||
|
for (auto It = MBB.begin(); It != MBB.end(); ++It) {
|
||||||
|
if (It->getOpcode() != W65816::SEP) continue;
|
||||||
|
if (It->getNumOperands() < 1 || !It->getOperand(0).isImm()) continue;
|
||||||
|
if (It->getOperand(0).getImm() != 0x20) continue;
|
||||||
|
// Walk forward looking for LDAi8imm before any STAfi_indY
|
||||||
|
// or REP at this nesting level.
|
||||||
|
auto Walker = std::next(It);
|
||||||
|
MachineInstr *LdaToHoist = nullptr;
|
||||||
|
while (Walker != MBB.end()) {
|
||||||
|
if (Walker->isDebugInstr()) { ++Walker; continue; }
|
||||||
|
unsigned Opc = Walker->getOpcode();
|
||||||
|
// Hit a REP — wrap is closing without LDAi8imm inside.
|
||||||
|
if (Opc == W65816::REP) break;
|
||||||
|
// Hit a call / branch / asm — bail.
|
||||||
|
if (Walker->isCall() || Walker->isBranch() ||
|
||||||
|
Walker->isReturn() || Walker->isInlineAsm()) break;
|
||||||
|
// Hit an STAfi_indY — this is the byte store; an LDAi8imm
|
||||||
|
// before it would be the bug, but if we found one already
|
||||||
|
// we'd have hoisted it; nothing to do here, stop scanning.
|
||||||
|
if (Opc == W65816::STAfi_indY) break;
|
||||||
|
if (Opc == W65816::LDAi8imm) {
|
||||||
|
LdaToHoist = &*Walker;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
++Walker;
|
||||||
|
}
|
||||||
|
if (LdaToHoist)
|
||||||
|
SepHoists.push_back(LdaToHoist);
|
||||||
|
}
|
||||||
|
for (MachineInstr *Lda : SepHoists) {
|
||||||
|
// Find the SEP we entered before the LDA. Walk backward.
|
||||||
|
auto Back = Lda->getIterator();
|
||||||
|
MachineInstr *OuterSep = nullptr;
|
||||||
|
while (Back != MBB.begin()) {
|
||||||
|
--Back;
|
||||||
|
if (Back->isDebugInstr()) continue;
|
||||||
|
if (Back->getOpcode() == W65816::SEP &&
|
||||||
|
Back->getNumOperands() >= 1 &&
|
||||||
|
Back->getOperand(0).isImm() &&
|
||||||
|
Back->getOperand(0).getImm() == 0x20) {
|
||||||
|
OuterSep = &*Back;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
if (Back->isCall() || Back->isBranch() || Back->isInlineAsm())
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
if (!OuterSep) continue;
|
||||||
|
Lda->removeFromParent();
|
||||||
|
MBB.insert(OuterSep->getIterator(), Lda);
|
||||||
|
Changed = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
SmallVector<MachineInstr *, 8> Toggles;
|
SmallVector<MachineInstr *, 8> Toggles;
|
||||||
for (MachineInstr &MI : MBB) {
|
for (MachineInstr &MI : MBB) {
|
||||||
unsigned Opc = MI.getOpcode();
|
unsigned Opc = MI.getOpcode();
|
||||||
|
|
|
||||||
|
|
@ -199,6 +199,28 @@ static bool tryEliminateDeadStore(MachineBasicBlock &MBB,
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Returns true if any MachineInstr in MF (other than those in `ignore`)
|
||||||
|
// references frame index `FI`. Used by Pass -4 / Pass -4c safety
|
||||||
|
// checks before erasing an init store: a cross-MBB reload would
|
||||||
|
// otherwise read stale/uninitialized data. See #107.
|
||||||
|
static bool slotHasOtherRefs(const MachineFunction &MF, int FI,
|
||||||
|
ArrayRef<const MachineInstr *> ignore) {
|
||||||
|
for (const MachineBasicBlock &MBB : MF) {
|
||||||
|
for (const MachineInstr &MI : MBB) {
|
||||||
|
bool skip = false;
|
||||||
|
for (const MachineInstr *I : ignore) {
|
||||||
|
if (&MI == I) { skip = true; break; }
|
||||||
|
}
|
||||||
|
if (skip) continue;
|
||||||
|
for (const MachineOperand &MO : MI.operands()) {
|
||||||
|
if (MO.isFI() && MO.getIndex() == FI)
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
// Match `STAfi reg, FI, 0; ... ; LDAfi destReg, FI, 0` when reg == destReg
|
// Match `STAfi reg, FI, 0; ... ; LDAfi destReg, FI, 0` when reg == destReg
|
||||||
// and nothing in between clobbers reg or the slot. Erase the LDAfi.
|
// and nothing in between clobbers reg or the slot. Erase the LDAfi.
|
||||||
static bool tryEliminateLoadAfterStore(MachineBasicBlock &MBB,
|
static bool tryEliminateLoadAfterStore(MachineBasicBlock &MBB,
|
||||||
|
|
@ -348,26 +370,12 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
|
||||||
++It3;
|
++It3;
|
||||||
}
|
}
|
||||||
if (!IndYTarget) continue;
|
if (!IndYTarget) continue;
|
||||||
// Function-wide safety check: slotB must have NO other refs
|
// Function-wide safety check (#107): slotB must have no other
|
||||||
// besides Sta2 (which we erase) and IndYTarget (which we
|
// refs besides Sta2 (which we erase) and IndYTarget (which we
|
||||||
// rewrite). Cross-MBB reads of slotB would otherwise see
|
// rewrite). Cross-MBB reads would otherwise see stale data.
|
||||||
// stale / uninitialized values after Sta2 is gone. Same fix
|
const MachineInstr *ignoreA[] = {&Sta2, IndYTarget};
|
||||||
// class as Pass -4c (#107).
|
if (slotHasOtherRefs(MF, SlotB, ignoreA))
|
||||||
bool OtherUse = false;
|
continue;
|
||||||
for (MachineBasicBlock &MBBO : MF) {
|
|
||||||
for (MachineInstr &MIO : MBBO) {
|
|
||||||
if (&MIO == &Sta2 || &MIO == IndYTarget) continue;
|
|
||||||
for (const MachineOperand &MO : MIO.operands()) {
|
|
||||||
if (MO.isFI() && MO.getIndex() == SlotB) {
|
|
||||||
OtherUse = true;
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if (OtherUse) break;
|
|
||||||
}
|
|
||||||
if (OtherUse) break;
|
|
||||||
}
|
|
||||||
if (OtherUse) continue;
|
|
||||||
// Apply rewrite: IndYTarget's slotB → slotA.
|
// Apply rewrite: IndYTarget's slotB → slotA.
|
||||||
for (unsigned i = 0; i < IndYTarget->getNumOperands(); ++i) {
|
for (unsigned i = 0; i < IndYTarget->getNumOperands(); ++i) {
|
||||||
if (IndYTarget->getOperand(i).isFI() &&
|
if (IndYTarget->getOperand(i).isFI() &&
|
||||||
|
|
@ -565,27 +573,16 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
|
||||||
if (LaterUse) break;
|
if (LaterUse) break;
|
||||||
}
|
}
|
||||||
if (LaterUse) continue;
|
if (LaterUse) continue;
|
||||||
// Function-wide safety check: slotB must have NO references in
|
// Function-wide safety check (#107): slotB must have no other
|
||||||
// any OTHER basic block. We're erasing the only init store
|
// refs besides Sta (erased) and the IndY uses (rewritten).
|
||||||
// (Sta); a cross-MBB reload of slotB would expect that init.
|
// The IndYUses list already collects all in-MBB IndY refs;
|
||||||
// Caught by strtok_r (#107) where greedy spilled an in-place
|
// anything else (including cross-MBB) would expect the init.
|
||||||
// INA into a slot that bb.3 also initialized; killing the
|
SmallVector<const MachineInstr *, 8> ignore;
|
||||||
// bb.3 init left bb.10 reading garbage.
|
ignore.push_back(&Sta);
|
||||||
bool CrossMBBUse = false;
|
for (MachineInstr *I : IndYUses)
|
||||||
for (MachineBasicBlock &MBBO : MF) {
|
ignore.push_back(I);
|
||||||
if (&MBBO == &MBB) continue;
|
if (slotHasOtherRefs(MF, SlotB, ignore))
|
||||||
for (MachineInstr &MIO : MBBO) {
|
continue;
|
||||||
for (const MachineOperand &MO : MIO.operands()) {
|
|
||||||
if (MO.isFI() && MO.getIndex() == SlotB) {
|
|
||||||
CrossMBBUse = true;
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if (CrossMBBUse) break;
|
|
||||||
}
|
|
||||||
if (CrossMBBUse) break;
|
|
||||||
}
|
|
||||||
if (CrossMBBUse) continue;
|
|
||||||
// Apply rewrites: redirect every IndY use of slotB → slotC.
|
// Apply rewrites: redirect every IndY use of slotB → slotC.
|
||||||
for (MachineInstr *IndY : IndYUses) {
|
for (MachineInstr *IndY : IndYUses) {
|
||||||
for (unsigned i = 0; i < IndY->getNumOperands(); ++i) {
|
for (unsigned i = 0; i < IndY->getNumOperands(); ++i) {
|
||||||
|
|
|
||||||
Loading…
Add table
Reference in a new issue