diff --git a/STATUS.md b/STATUS.md index 5984da1..08a87cf 100644 --- a/STATUS.md +++ b/STATUS.md @@ -72,9 +72,11 @@ which runs correctly under MAME (apple2gs). native object format) for round-tripping with classic dev tools. - `runtime/build.sh` builds crt0, libc, soft-float, soft-double, libgcc into linkable objects. -- `scripts/smokeTest.sh` runs 92 end-to-end checks (scalar ops, +- `scripts/smokeTest.sh` runs 99 end-to-end checks (scalar ops, control flow, calling conventions, MAME execution, regressions, - link816 bss-base safety, iigs/toolbox.h compile-check). + link816 bss-base safety, iigs/toolbox.h compile-check, standalone + runtime headers, AsmPrinter peepholes for STZ / PEA / PEI — + single-STA, shared-LDA-multi-STA, and DPF0-forwarding cases). Currently 100% pass at -O2 throughout. **ABI:** @@ -152,13 +154,11 @@ end-to-end in MAME. `strtok` / `strtok_r` live in their own TU at `-O2` (with `__attribute__((noinline))` on `strtok_r` so the strtok() wrapper doesn't duplicate it). Multi-call strtok over "a,b,,c" works -end-to-end in smoke. Latent backend issue: at certain rodata -layouts, -O2 strtok_r's BB0_7 inner CMP loop miscompiles due to -LICM/sink interaction; current smoke layout passes but adding -bytes upstream (e.g. growing softDouble.o) can shift delim into -a failing address. Surgical workaround `-mllvm -disable-machine- -sink` on strtok.c is documented; not currently applied because -smoke is green. +end-to-end in smoke. The layout-sensitive miscompile that +previously haunted strtok_r's inner CMP loop has been fixed by +modelling `Uses=[P]` on the conditional branches (the LICM/sink +interaction that elided "redundant" CMPs no longer fires); no +surgical workaround flags needed. A small **RPN calculator** test (smoke #87) chains strtok, atol, push/pop over a static stack, snprintf "%ld", and strcmp to verify @@ -203,20 +203,6 @@ sidecar bytes. 
## Known issues / workarounds -- **#70 FIXED**: greedy regalloc + W65816StackSlotCleanup Pass -2 - was deleting an entry-side store to a slot that the loop body - read. Pass -2 collapses `LDAfi slotA; STAfi slotB; LDAfi slotC; - OPfi slotB` into `LDAfi slotC; OPfi slotA` (memory-to-memory copy - through A elimination), but didn't check whether slotB had other - refs in the function. In iterative qsort, slotB happened to be - the spill home for `hi` — the Pass -2 transform deleted the only - initialiser, leaving the loop body's `lda , s` reading - garbage. Fix: function-wide `slotHasOtherRefs` safety check - before erasing the spill. `softDouble.c` still uses - `-mllvm -regalloc=fast` for `__muldf3`'s 64×64→128 multiply - (different greedy bug — register-pressure-driven, not - spill-deletion-driven). - - **(d,s),y / (sr,s),y addressing wraps the bank** when Y is negative as 16-bit unsigned. Worked around by `W65816NegYIndY` rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays @@ -228,34 +214,98 @@ sidecar bytes. address of one of its locals — the callee's `*p = v` will write to the wrong bank. Documented; no compiler-side mitigation beyond the existing DPF0 fake-physreg routing for the i64-return - high half. + high half. Workaround: inline pointer-arg helpers so the writes + stay in the caller's frame using stack-rel direct stores. The + W65816 only has three DBR-independent addressing modes + (abs_long, abs_long,X, [dp],Y) — none cheap to retrofit into + the current pointer-deref lowering (+5 bytes minimum per access). + Real fix needs PHB/PLB at noinline-pointer-callee entry/exit. -- **strtok -O2 layout-sensitive miscompile FIXED** — modelling - `Uses=[P]` on the conditional branches (BEQ/BNE/BCS/BCC/BMI/BPL/ - BVS/BVC) made MachineCSE see the dependency between an earlier - CMP and the consuming Bxx, eliminating an entire class of - layout-sensitive flag-corruption bugs. 
Verified by sweeping - `--rodata-base` from text-end to text-end+300 in 13 increments - — every layout returns the correct strtok result. - As a follow-on, MachineCSE has been re-enabled (was previously - disabled in `W65816TargetMachine::addMachineSSAOptimization` as - a workaround for the same root cause). +## Recently fixed + +- **#70 — iterative qsort -O2 miscompile** — `W65816StackSlotCleanup` + Pass -2 was deleting a store to a slot the loop body read. + Function-wide `slotHasOtherRefs` safety check added (Pass -1 and + Pass -2c hardened with the same pattern). Iterative qsort at + plain -O2 + greedy now compiles correctly; the `optnone` workaround + in smoke #70 was removed. + +- **strtok -O2 layout-sensitive miscompile** — modelling `Uses=[P]` + on the conditional branches (BEQ/BNE/BCS/BCC/BMI/BPL/BVS/BVC) made + MachineCSE / scheduler / LICM / sink see the CMP→Bxx flag + dependency. An entire class of layout-sensitive flag-corruption + bugs went away; verified by sweeping `--rodata-base` from text-end + to text-end+300 in 13 increments — every layout returns the correct + strtok result. As a follow-on, MachineCSE has been re-enabled + (was previously disabled in `W65816TargetMachine::addMachineSSAOptimization` + as a workaround for the same root cause). + +- **link816 silently produced 4.3GB binaries** when `--rodata-base` + was set inside the text region. Now dies with a clear error: + `--rodata-base 0xX overlaps text 0xY+N (must start at or after 0xZ)`. + +- **link816 BSS-relocate landed in IIgs Language Card area** — + when text+rodata grew past $C000, link816 placed BSS at $D000 + (the LC1 area), where IIgs-by-default maps ROM (writes drop + silently, reads return ROM bytes). Globals never initialised; + caught by the expression-parser smoke (#92) when adding rand / + strnlen / etc. pushed the runtime past that threshold.
Two-part + fix: crt0 now enables LC1 RAM via the standard `lda $C083` + read-twice trick at startup, and link816 hard-fails (rather + than silently corrupting) if BSS would exceed the LC1 ceiling + ($E000) — past that you'd need crt0 to also enable LC2 / shadow + RAM, which we haven't wired up. + +- **STZ peephole multi-STA latent miscompile** — AsmPrinter's + `LDA #0; STA $g` -> `STZ $g` peephole eliminated the LDA but + only consumed the FIRST `STA`. When SDAG-CSE shared one + `LDA #0` across multiple `STA`s (`g16=0; g32=0;` is one IR + shape), trailing `STA`s read whatever was in A on entry — + silently corrupting any global where A wasn't 0 at function + entry. Smoke happened to pass because A was 0 by luck in + every covered path. Fixed by gating the peephole on the + consuming `STA` killing A (regalloc only sets `killed` on the + last reader); smoke #98 added to lock the multi-STA case. + +- **PEI AsmPrinter peephole** — new: `LDA $dp; PHA` -> `PEI $dp` + saves 1 byte and avoids touching A. Fires on the + `copyPhysReg(A=DPF0); PUSH16` pattern (i64-libcall return-value + forwarding into the next call's stacked args), which appears + in every chained soft-double / soft-int64 expression. Saves + 68 bytes across the runtime (-64 in math.o alone). Same + next-instruction-modifies-A safety check as the PEA peephole. + Smoke #99 added. + +- **PEA peephole opcode-allowlist replaced with `modifiesRegister`** — + the next-after-PUSH16 check that gates the PEA peephole was a + hand-curated list of opcodes that obviously redefine A; switched + to `MachineInstr::modifiesRegister(A, TRI)` which also catches + implicit-defs (e.g. JSL clobbering A as part of the call ABI). + Saves a few bytes and is more robust. + +- **libgcc.s `lda #0; sta $XX` -> `stz $XX`** — 8 sites converted + in libgcc.s after STZ landed in the assembler. Saves 28 bytes; + also removes two PHA/PLA save-restore wraps around the LDA #0 + (STZ doesn't touch A, so the wraps are unnecessary).
## What's still needed for a "ship-ready" toolchain -- **softDouble.c -O1 hold-out** — `__muldf3`'s 64×64→128 multiply - with inlined alignment shifts overflows the greedy register - allocator at -O2 ("ran out of registers during register - allocation"). Builds correctly at -O1 (replaces the previous - -O2 + -mllvm -regalloc=fast workaround; -O1 is smaller and - doesn't require the non-default flag). +- **softDouble.c -O1 hold-out** — `__muldf3`'s u64 lifetime pressure + overflows the greedy register allocator at -O2 ("ran out of + registers during register allocation"). Builds correctly at + -O1. Investigated: marking dpack noinline reduces pressure but + isn't enough; making dclass noinline would unblock -O2 (verified) + but the (d,s),y-uses-DBR bug then corrupts dclass's pointer-arg + writes when a caller has switched DBR (caught by smoke's + dmul-after-bank-switch test). Real fix is gated on the broader + DBR-pointer-deref limitation listed above. - **More of the C standard library**: real `<stdio.h>` file I/O (`fopen`, `fread`, `fwrite`, `fseek` are currently stubs returning success/zero) — would need a memory-backed FS or a - MAME hook; `` / `` if any real-world code - needs them. + MAME hook. `<locale.h>` / `<signal.h>` are stubbed (compile and + return safe defaults); `` / `` mostly absent. - **C++ runtime support**: vtable layout for multiple inheritance, RTTI, exceptions (or a documented `-fno-exceptions` requirement). diff --git a/runtime/build.sh b/runtime/build.sh index e9865df..d3421cd 100755 --- a/runtime/build.sh +++ b/runtime/build.sh @@ -41,11 +41,12 @@ cc "$SRC/extras.c" cc "$SRC/strtok.c" cc "$SRC/math.c" cc "$SRC/softFloat.c" -# softDouble.c builds at -O1 instead of -O2: __muldf3's 64x64 -> 128 -# mul + inlined alignment shifts overflows the greedy allocator on -# the single-A target ("ran out of registers during register -# allocation"). -O1 produces correct + smaller code than the -# previous -O2 + -regalloc=fast workaround.
+# softDouble.c builds at -O1: __muldf3's u64 live-range pressure +# overflows the greedy allocator at -O2. dpack is already noinline +# to reduce pressure, but dclass MUST stay inline (its pointer-arg +# writes from a noinline boundary would lower to `sta (d,s),y` which +# uses DBR for the bank — silently corrupted under DBR != 0, caught +# by the dmul-after-bank-switch test). -O1 sidesteps this. cc "$SRC/softDouble.c" -O1 echo "runtime built: $(ls -1 "$OUT"/*.o | wc -l) objects" diff --git a/runtime/include/ctype.h b/runtime/include/ctype.h index 47b8313..89902a9 100644 --- a/runtime/include/ctype.h +++ b/runtime/include/ctype.h @@ -10,6 +10,9 @@ int isspace(int c); int isxdigit(int c); int isprint(int c); int ispunct(int c); +int iscntrl(int c); +int isgraph(int c); +int isblank(int c); int toupper(int c); int tolower(int c); diff --git a/runtime/include/inttypes.h b/runtime/include/inttypes.h new file mode 100644 index 0000000..a3c95a9 --- /dev/null +++ b/runtime/include/inttypes.h @@ -0,0 +1,54 @@ +// Minimal inttypes.h for the W65816 runtime. Pulls in stdint.h's +// fixed-width types and adds the printf format-string macros (PRI*; +// no SCN* scanf macros yet) for those types. Standalone (does not include the host clang's +// inttypes.h, which pulls in glibc headers and breaks the build). + +#ifndef _INTTYPES_H +#define _INTTYPES_H + +#include <stdint.h> + +// (strtoimax / strtoumax not implemented — runtime has strtol / +// strtoul for the 32-bit forms which cover the common needs.) + +// PRIxN format macros. `int` is 16-bit on W65816, `long` is 32, +// `long long` is 64.
+ +#define PRId8 "d" +#define PRIi8 "i" +#define PRIo8 "o" +#define PRIu8 "u" +#define PRIx8 "x" +#define PRIX8 "X" + +#define PRId16 "d" +#define PRIi16 "i" +#define PRIo16 "o" +#define PRIu16 "u" +#define PRIx16 "x" +#define PRIX16 "X" + +#define PRId32 "ld" +#define PRIi32 "li" +#define PRIo32 "lo" +#define PRIu32 "lu" +#define PRIx32 "lx" +#define PRIX32 "lX" + +#define PRId64 "lld" +#define PRIi64 "lli" +#define PRIo64 "llo" +#define PRIu64 "llu" +#define PRIx64 "llx" +#define PRIX64 "llX" + +#define PRIdMAX PRId64 +#define PRIuMAX PRIu64 +#define PRIxMAX PRIx64 + +#define PRIdPTR PRId16 +#define PRIiPTR PRIi16 +#define PRIuPTR PRIu16 +#define PRIxPTR PRIx16 + +#endif diff --git a/runtime/include/limits.h b/runtime/include/limits.h new file mode 100644 index 0000000..7fd4ad6 --- /dev/null +++ b/runtime/include/limits.h @@ -0,0 +1,39 @@ +// Minimal limits.h for the W65816 runtime. Standalone (does not +// include the host clang's limits.h, which pulls in glibc headers +// and breaks the build). Sizes per the W65816 backend's view: +// char = 8 bits +// short = 16 bits +// int = 16 bits +// long = 32 bits +// long long = 64 bits + +#ifndef _LIMITS_H +#define _LIMITS_H + +#define CHAR_BIT 8 +#define MB_LEN_MAX 1 + +#define SCHAR_MIN (-128) +#define SCHAR_MAX 127 +#define UCHAR_MAX 255 +// `char` is signed by default on this target. 
+#define CHAR_MIN SCHAR_MIN +#define CHAR_MAX SCHAR_MAX + +#define SHRT_MIN (-32768) +#define SHRT_MAX 32767 +#define USHRT_MAX 65535U + +#define INT_MIN (-32768) +#define INT_MAX 32767 +#define UINT_MAX 65535U + +#define LONG_MIN (-2147483647L - 1) +#define LONG_MAX 2147483647L +#define ULONG_MAX 4294967295UL + +#define LLONG_MIN (-9223372036854775807LL - 1) +#define LLONG_MAX 9223372036854775807LL +#define ULLONG_MAX 18446744073709551615ULL + +#endif diff --git a/runtime/include/locale.h b/runtime/include/locale.h new file mode 100644 index 0000000..14c1904 --- /dev/null +++ b/runtime/include/locale.h @@ -0,0 +1,40 @@ +// Minimal locale.h for the W65816 runtime. No real locale support — +// just enough to let portable code compile. setlocale() always +// returns "C" and ignores its argument; localeconv() returns a +// pointer to a fixed C-locale struct. + +#ifndef _LOCALE_H +#define _LOCALE_H + +struct lconv { + char *decimal_point; + char *thousands_sep; + char *grouping; + char *int_curr_symbol; + char *currency_symbol; + char *mon_decimal_point; + char *mon_thousands_sep; + char *mon_grouping; + char *positive_sign; + char *negative_sign; + char int_frac_digits; + char frac_digits; + char p_cs_precedes; + char p_sep_by_space; + char n_cs_precedes; + char n_sep_by_space; + char p_sign_posn; + char n_sign_posn; +}; + +#define LC_ALL 0 +#define LC_COLLATE 1 +#define LC_CTYPE 2 +#define LC_MONETARY 3 +#define LC_NUMERIC 4 +#define LC_TIME 5 + +char *setlocale(int category, const char *locale); +struct lconv *localeconv(void); + +#endif diff --git a/runtime/include/signal.h b/runtime/include/signal.h new file mode 100644 index 0000000..b719835 --- /dev/null +++ b/runtime/include/signal.h @@ -0,0 +1,27 @@ +// Minimal signal.h for the W65816 runtime. No real signal handling +// — IIgs has no concept of POSIX signals. signal() always returns +// SIG_ERR; raise() always returns -1. These exist so portable code +// (e.g. 
asserts that map abort() through raise(SIGABRT)) compiles. + +#ifndef _SIGNAL_H +#define _SIGNAL_H + +typedef int sig_atomic_t; + +typedef void (*__sighandler_t)(int); + +#define SIG_DFL ((__sighandler_t)0) +#define SIG_IGN ((__sighandler_t)1) +#define SIG_ERR ((__sighandler_t)-1) + +#define SIGINT 2 +#define SIGILL 4 +#define SIGABRT 6 +#define SIGFPE 8 +#define SIGSEGV 11 +#define SIGTERM 15 + +__sighandler_t signal(int sig, __sighandler_t handler); +int raise(int sig); + +#endif diff --git a/runtime/include/stddef.h b/runtime/include/stddef.h new file mode 100644 index 0000000..7392c79 --- /dev/null +++ b/runtime/include/stddef.h @@ -0,0 +1,17 @@ +// Minimal stddef.h for the W65816 runtime. Standalone (does not +// include the host clang's stddef.h). + +#ifndef _STDDEF_H +#define _STDDEF_H + +typedef unsigned int size_t; +typedef int ptrdiff_t; +typedef int wchar_t; // not really wide-char-supported + +#ifndef NULL +# define NULL ((void *)0) +#endif + +#define offsetof(t, m) __builtin_offsetof(t, m) + +#endif diff --git a/runtime/include/stdint.h b/runtime/include/stdint.h new file mode 100644 index 0000000..738ba70 --- /dev/null +++ b/runtime/include/stdint.h @@ -0,0 +1,78 @@ +// Minimal stdint.h for the W65816 runtime. Standalone (does not +// include the host clang's stdint.h, which pulls in glibc headers +// and breaks the build). 
Sizes per the W65816 backend's view: +// char = 8 bits +// short = 16 bits +// int = 16 bits +// long = 32 bits +// long long = 64 bits + +#ifndef _STDINT_H +#define _STDINT_H + +typedef signed char int8_t; +typedef unsigned char uint8_t; +typedef short int16_t; +typedef unsigned short uint16_t; +typedef long int32_t; +typedef unsigned long uint32_t; +typedef long long int64_t; +typedef unsigned long long uint64_t; + +typedef int8_t int_least8_t; +typedef uint8_t uint_least8_t; +typedef int16_t int_least16_t; +typedef uint16_t uint_least16_t; +typedef int32_t int_least32_t; +typedef uint32_t uint_least32_t; +typedef int64_t int_least64_t; +typedef uint64_t uint_least64_t; + +typedef int16_t int_fast8_t; +typedef uint16_t uint_fast8_t; +typedef int16_t int_fast16_t; +typedef uint16_t uint_fast16_t; +typedef int32_t int_fast32_t; +typedef uint32_t uint_fast32_t; +typedef int64_t int_fast64_t; +typedef uint64_t uint_fast64_t; + +typedef int16_t intptr_t; // pointers are 16-bit on W65816 +typedef uint16_t uintptr_t; + +typedef int64_t intmax_t; +typedef uint64_t uintmax_t; + +#define INT8_MIN (-0x7F - 1) +#define INT8_MAX 0x7F +#define UINT8_MAX 0xFFU +#define INT16_MIN (-0x7FFF - 1) +#define INT16_MAX 0x7FFF +#define UINT16_MAX 0xFFFFU +#define INT32_MIN (-0x7FFFFFFFL - 1) +#define INT32_MAX 0x7FFFFFFFL +#define UINT32_MAX 0xFFFFFFFFUL +#define INT64_MIN (-0x7FFFFFFFFFFFFFFFLL - 1) +#define INT64_MAX 0x7FFFFFFFFFFFFFFFLL +#define UINT64_MAX 0xFFFFFFFFFFFFFFFFULL + +#define INTPTR_MIN INT16_MIN +#define INTPTR_MAX INT16_MAX +#define UINTPTR_MAX UINT16_MAX + +#define INTMAX_MIN INT64_MIN +#define INTMAX_MAX INT64_MAX +#define UINTMAX_MAX UINT64_MAX + +#define INT8_C(v) v +#define UINT8_C(v) v ## U +#define INT16_C(v) v +#define UINT16_C(v) v ## U +#define INT32_C(v) v ## L +#define UINT32_C(v) v ## UL +#define INT64_C(v) v ## LL +#define UINT64_C(v) v ## ULL +#define INTMAX_C(v) v ## LL +#define UINTMAX_C(v) v ## ULL + +#endif diff --git a/runtime/include/stdio.h 
b/runtime/include/stdio.h index f8966e6..e85b31e 100644 --- a/runtime/include/stdio.h +++ b/runtime/include/stdio.h @@ -37,4 +37,14 @@ void clearerr(FILE *stream); #define SEEK_CUR 1 #define SEEK_END 2 +#define EOF (-1) + +// Input stubs. Real implementations would route through GS/OS +// console I/O; current impl in libc.c returns EOF / 0. +int getchar(void); +int fgetc(FILE *stream); +char *fgets(char *buf, int n, FILE *stream); +int ungetc(int c, FILE *stream); +#define getc(s) fgetc(s) + #endif diff --git a/runtime/include/stdlib.h b/runtime/include/stdlib.h index 53c9dd8..bac8deb 100644 --- a/runtime/include/stdlib.h +++ b/runtime/include/stdlib.h @@ -31,4 +31,8 @@ int atexit(__atexit_fn fn); #define EXIT_SUCCESS 0 #define EXIT_FAILURE 1 +#define RAND_MAX 0x7FFF +int rand(void); +void srand(unsigned int seed); + #endif diff --git a/runtime/include/string.h b/runtime/include/string.h index a854cee..8095dfe 100644 --- a/runtime/include/string.h +++ b/runtime/include/string.h @@ -10,10 +10,13 @@ int memcmp(const void *a, const void *b, size_t n); void *memchr(const void *s, int c, size_t n); size_t strlen(const char *s); +size_t strnlen(const char *s, size_t maxlen); char *strcpy(char *dst, const char *src); char *strncpy(char *dst, const char *src, size_t n); int strcmp(const char *a, const char *b); int strncmp(const char *a, const char *b, size_t n); +int strcasecmp(const char *a, const char *b); +int strncasecmp(const char *a, const char *b, size_t n); char *strchr(const char *s, int c); char *strrchr(const char *s, int c); char *strstr(const char *haystack, const char *needle); diff --git a/runtime/src/crt0.s b/runtime/src/crt0.s index a11b10d..c743dc2 100644 --- a/runtime/src/crt0.s +++ b/runtime/src/crt0.s @@ -41,6 +41,21 @@ __start: lda #0x0fff tcs + ; Enable Language Card 1 RAM at $D000-$DFFF for read+write. + ; By default the IIgs maps that range to ROM (read-only). Two + ; reads of $C083 enable RAM-bank-1, second read also enables + ; writes. 
Without this, BSS auto-relocated past $C000 lands on + ; ROM and globals never initialise (writes drop on the floor; + ; reads return ROM bytes). Caught by the expression-parser + ; smoke test (#92) when runtime growth pushed bss past $BFFF. + ; The reads must be 8-bit (one byte at a time) — a 16-bit M + ; read at $C083 would also touch $C084 (a different soft + ; switch), wiping the LC enable we just set. + sep #0x20 + lda 0xc083 + lda 0xc083 + rep #0x20 + ; Zero BSS. X iterates from __bss_start to __bss_end; each ; iteration writes one byte of zero at addr X (via DP=0 + ; offset 0 — which is just X). Wraps in 8-bit M for the diff --git a/runtime/src/extras.c b/runtime/src/extras.c index 23ac2d8..1d4089a 100644 --- a/runtime/src/extras.c +++ b/runtime/src/extras.c @@ -67,6 +67,70 @@ long long llabs(long long n) { } +// strnlen: like strlen but capped at maxlen. Useful for safely +// measuring strings that may not be NUL-terminated within a known +// buffer size. +size_t strnlen(const char *s, size_t maxlen) { + size_t n = 0; + while (n < maxlen && s[n]) { + n++; + } + return n; +} + + +static int toLowerByte(int c) { + if (c >= 'A' && c <= 'Z') { + return c - 'A' + 'a'; + } + return c; +} + + +int strcasecmp(const char *a, const char *b) { + while (*a && *b) { + int da = toLowerByte((unsigned char)*a); + int db = toLowerByte((unsigned char)*b); + if (da != db) { + return da - db; + } + a++; + b++; + } + return toLowerByte((unsigned char)*a) - toLowerByte((unsigned char)*b); +} + + +int strncasecmp(const char *a, const char *b, size_t n) { + while (n && *a && *b) { + int da = toLowerByte((unsigned char)*a); + int db = toLowerByte((unsigned char)*b); + if (da != db) { + return da - db; + } + a++; + b++; + n--; + } + if (!n) return 0; + return toLowerByte((unsigned char)*a) - toLowerByte((unsigned char)*b); +} + + +// Linear congruential generator with the C standard's sample rand() constants. +// Returns 15-bit values (RAND_MAX = 0x7FFF) per C standard convention.
+static unsigned long randSeed = 1; + +void srand(unsigned int seed) { + randSeed = seed; +} + +int rand(void) { + randSeed = randSeed * 1103515245UL + 12345UL; + return (int)((randSeed >> 16) & 0x7FFF); +} + + // ----- additional string.h ---------------------------------------------- static int inSet(char c, const char *set) { diff --git a/runtime/src/libc.c b/runtime/src/libc.c index 8551044..2933595 100644 --- a/runtime/src/libc.c +++ b/runtime/src/libc.c @@ -119,6 +119,9 @@ int isxdigit(int c) { } int isprint(int c) { return c >= 0x20 && c < 0x7f; } int ispunct(int c) { return isprint(c) && !isalnum(c) && c != ' '; } +int iscntrl(int c) { return (c >= 0 && c < 0x20) || c == 0x7f; } +int isgraph(int c) { return isprint(c) && c != ' '; } +int isblank(int c) { return c == ' ' || c == '\t'; } int toupper(int c) { return islower(c) ? c - 32 : c; } int tolower(int c) { return isupper(c) ? c + 32 : c; } @@ -160,6 +163,21 @@ int puts(const char *s) { return 0; } +// ---- input stubs ---- +// +// Real input would route through GS/OS console / event handling. +// These return EOF / NULL so user code that calls them links and +// gets predictable end-of-input behaviour. FILE struct is defined +// further down (alongside fopen etc.) — forward-declare for the +// signatures. +struct __sFILE; +int getchar(void) { return -1; /* EOF */ } +int fgetc(struct __sFILE *s) { (void)s; return -1; } +char *fgets(char *b, int n, struct __sFILE *s) { + (void)b; (void)n; (void)s; return (char *)0; +} +int ungetc(int c, struct __sFILE *s) { (void)c; (void)s; return -1; } + // ---- minimal printf ---- // Re-declare va_list / va_* locally rather than including stdarg.h — @@ -617,3 +635,76 @@ long ftell(FILE *stream) { int feof(FILE *stream) { (void)stream; return 1; } int ferror(FILE *stream) { (void)stream; return 0; } void clearerr(FILE *stream) { (void)stream; } + +// ---- locale.h stubs ---- +// +// No real locale support — IIgs is single-locale. 
setlocale always +// returns "C", localeconv returns a fixed C-locale struct. These +// are stubs so portable code that calls setlocale("") for diagnostic +// purposes compiles and runs. + +struct lconv { + char *decimal_point; + char *thousands_sep; + char *grouping; + char *int_curr_symbol; + char *currency_symbol; + char *mon_decimal_point; + char *mon_thousands_sep; + char *mon_grouping; + char *positive_sign; + char *negative_sign; + char int_frac_digits; + char frac_digits; + char p_cs_precedes; + char p_sep_by_space; + char n_cs_precedes; + char n_sep_by_space; + char p_sign_posn; + char n_sign_posn; +}; + +static struct lconv __c_lconv = { + (char *)".", // decimal_point + (char *)"", // thousands_sep + (char *)"", // grouping + (char *)"", // int_curr_symbol + (char *)"", // currency_symbol + (char *)"", // mon_decimal_point + (char *)"", // mon_thousands_sep + (char *)"", // mon_grouping + (char *)"", // positive_sign + (char *)"", // negative_sign + (char)127, // int_frac_digits (CHAR_MAX = "unspecified") + (char)127, // frac_digits + (char)127, (char)127, (char)127, (char)127, (char)127, (char)127, +}; + +char *setlocale(int category, const char *locale) { + (void)category; (void)locale; + return (char *)"C"; +} + +struct lconv *localeconv(void) { + return &__c_lconv; +} + +// ---- signal.h stubs ---- +// +// IIgs has no POSIX-style signal model. signal() always fails (returns +// SIG_ERR); raise() returns -1. Code that uses these for diagnostic +// fall-through (e.g. abort -> raise(SIGABRT) -> stub) compiles and +// behaves as "signals disabled". 
+ +typedef void (*__sighandler_t)(int); +#define _SIG_ERR ((__sighandler_t)-1) + +__sighandler_t signal(int sig, __sighandler_t handler) { + (void)sig; (void)handler; + return _SIG_ERR; +} + +int raise(int sig) { + (void)sig; + return -1; +} diff --git a/runtime/src/libgcc.s b/runtime/src/libgcc.s index 754d2f8..2b04658 100644 --- a/runtime/src/libgcc.s +++ b/runtime/src/libgcc.s @@ -60,8 +60,7 @@ __mulhi3: sta 0xe0 ; multiplier lda 0x4, s sta 0xe2 ; multiplicand - lda #0x0 - sta 0xe4 ; running product + stz 0xe4 ; running product = 0 .Lmul_loop: lda 0xe0 beq .Lmul_done @@ -225,12 +224,9 @@ __modhi3: ; Uses JSR/RTS, same bank. ; -------------------------------------------------------------------- __divmod_setup: - ; Sign tracker. We don't have STZ in our instruction set yet, so - ; clear via PHA/LDA #0/STA/PLA to avoid trashing A. - pha - lda #0x0 - sta 0xee - pla + ; Sign tracker. STZ doesn't touch A — preserves the value + ; we still need below. + stz 0xee ; Dividend sign + abs value. cmp #0x8000 bcc .Lset_a_pos @@ -269,9 +265,8 @@ __divmod_setup: ; outputs quotient at $ea, remainder at $ec. JSR/RTS local helper. ; -------------------------------------------------------------------- __udivmod_core: - lda #0x0 - sta 0xea - sta 0xec + stz 0xea + stz 0xec ldx #0x10 .Lcore_loop: asl 0xe6 @@ -327,9 +322,8 @@ __mulsi3: lda 0x6, s sta 0xe6 ; Clear running product at $e8/$ea. - lda #0x0 - sta 0xe8 - sta 0xea + stz 0xe8 + stz 0xea ; Loop 32 times: examine LSB of multiplier, conditionally add ; multiplicand to product, then shift multiplier right and ; multiplicand left. Use Y as a 16-bit counter (X mode = 16). @@ -456,10 +450,9 @@ __ashrsi3: ; JSR/RTS local helper. ; -------------------------------------------------------------------- __udivmodsi_core: - lda #0x0 - sta 0xe8 - sta 0xea - sta 0xec - sta 0xee + stz 0xe8 + stz 0xea + stz 0xec + stz 0xee ldy #0x20 .Lcoresi_loop: @@ -588,11 +581,8 @@ __modsi3: ; (8,s)=b_hi.
; -------------------------------------------------------------------- __divmodsi_setup: - ; Clear sign tracker. - pha - lda #0x0 - sta 0xf0 - pla + ; Clear sign tracker. STZ preserves A. + stz 0xf0 ; |a|: A=a_lo, X=a_hi. Save them first (we need a_hi for sign test). sta 0xe0 ; tentative a_lo (may negate below) stx 0xe2 ; tentative a_hi @@ -805,11 +795,10 @@ __ashrdi3: __muldi3: jsr __divmoddi4_stash ; Clear product P0..P3 at $F2..$F8. - lda #0x0 - sta 0xf2 - sta 0xf4 - sta 0xf6 - sta 0xf8 + stz 0xf2 + stz 0xf4 + stz 0xf6 + stz 0xf8 ; Loop 64 times on a's bits. ldy #0x40 .Lmuldi_loop: @@ -975,11 +964,10 @@ __umoddi3: ; Output: quotient at $E0..$E6, remainder at $F2..$F8. __udivmoddi_core: ; Clear remainder $F2..$F8. - lda #0x0 - sta 0xf2 - sta 0xf4 - sta 0xf6 - sta 0xf8 + stz 0xf2 + stz 0xf4 + stz 0xf6 + stz 0xf8 ldy #0x40 .Ludivmoddi_loop: ; Shift left: dividend (becomes quotient) and remainder together diff --git a/runtime/src/softDouble.c b/runtime/src/softDouble.c index 2e14379..7df5e0b 100644 --- a/runtime/src/softDouble.c +++ b/runtime/src/softDouble.c @@ -22,7 +22,12 @@ typedef unsigned char u8; #define DEXP_SHIFT 52 #define DEXP_BIAS 1023 -static inline u64 dpack(u64 sign, s16 exp, u64 mant) { +// noinline: keeps register pressure in the callers (esp. __muldf3) +// low enough for greedy regalloc to allocate at -O2. Without this, +// __muldf3 fails with "ran out of registers during register +// allocation" — too many concurrent u64 lifetimes (sa, sb, ma, mb, +// sr, mr) and the dpack inline blew it past the spill capacity. +__attribute__((noinline)) static u64 dpack(u64 sign, s16 exp, u64 mant) { if (mant == 0) return sign; u64 e = (u64)(exp + DEXP_BIAS); if (e >= 2047) { @@ -38,6 +43,11 @@ static inline u64 dpack(u64 sign, s16 exp, u64 mant) { // Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit. // Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN. 
+// Inlinable on purpose — out_sign/out_exp/out_mant point at caller +// stack locals; if dclass were noinline the writes would lower to +// `sta (d,s),y` which uses DBR for the bank, silently corrupting +// data when the caller has switched DBR. Caught by smoke's +// dmul-after-bank-switch test (#dmul-bank-switch). static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) { *out_sign = x & DSIGN_BIT; s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF); diff --git a/scripts/smokeTest.sh b/scripts/smokeTest.sh index 35a2e94..73b5981 100755 --- a/scripts/smokeTest.sh +++ b/scripts/smokeTest.sh @@ -83,7 +83,11 @@ if [ -x "$LLVM_MC" ]; then sta 0x1000 sta 0x010000 mvn 0x01, 0x02 - jsl 0x012345' + jsl 0x012345 + lda 0x123456, x + sta 0xabcdef, x + stz 0xe2 + stz 0x1234' mcOut="$(printf '%s\n' "$mcInput" | "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)" assertHas() { @@ -103,6 +107,27 @@ if [ -x "$LLVM_MC" ]; then assertHas "[0x8f,0x00,0x00,0x01]" assertHas "[0x54,0x01,0x02]" assertHas "[0x22,0x45,0x23,0x01]" + # abs_long,X (DBR-independent X-indexed access — used by future + # DBR-safe pointer-deref lowering) + assertHas "[0xbf,0x56,0x34,0x12]" + assertHas "[0x9f,0xef,0xcd,0xab]" + # STZ (store zero) — saves a byte vs `LDA #0; STA dp` for zeroing + # DP scratch slots (used by the upcoming [dp],Y bank-byte + # invariant for DBR-safe pointer derefs). + assertHas "[0x64,0xe2]" + assertHas "[0x9c,0x34,0x12]" + # WDM / TRB / TSB / PEI — useful 65816 instructions added for + # MAME debug hooks (WDM), atomic memory bit ops on hardware + # registers (TRB/TSB), and indirect data push (PEI). + extOut="$(printf '\twdm 0xab\n\ttrb 0x1234\n\ttsb 0x10\n\tpei 0xe0\n' \ + | "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)" + for enc in "[0x42,0xab]" "[0x1c,0x34,0x12]" "[0x04,0x10]" "[0xd4,0xe0]"; do + if ! 
printf '%s\n' "$extOut" | grep -qF "$enc"; then + warn "missing extended-opcode encoding: $enc" + printf '%s\n' "$extOut" >&2 + die "llvm-mc did not produce expected extended-opcode encoding" + fi + done else warn "llvm-mc not built; skipping MC round-trip check" fi @@ -843,6 +868,91 @@ EOF # function. STA8abs in AsmPrinter must wrap with SEP/REP when # UsesAcc8 is false; bare `sta g+N` in M=0 writes 2 bytes and # corrupts the next global. + log "check: clang lowers 'g = 0' to single STZ via AsmPrinter peephole" + cStzFile="$(mktemp --suffix=.c)" + sStzFile="$(mktemp --suffix=.s)" + cat > "$cStzFile" <<'EOF' +unsigned short g; +void zero(void) { g = 0; } +EOF + "$CLANG" --target=w65816 -O2 -S "$cStzFile" -o "$sStzFile" + # Should see exactly one `stz g` and zero `lda #0` in the function. + if ! grep -qE '^\s*stz\s+g\b' "$sStzFile"; then + warn "STZ peephole not firing"; cat "$sStzFile" >&2 + die "expected 'stz g' in zero() but didn't find it" + fi + if grep -qE '^\s*lda\s+#0x0' "$sStzFile"; then + warn "STZ peephole left a redundant LDA #0"; cat "$sStzFile" >&2 + die "STZ peephole should have eliminated the LDA #0" + fi + rm -f "$cStzFile" "$sStzFile" + + # Multi-STA-from-shared-LDA: when SDAG CSE shares one `lda #0` across + # multiple `sta`s, the peephole MUST NOT fire on the first STA (would + # delete the LDA, leaving the remaining STAs reading dead A). Verify + # the LDA #0 is preserved and no STZ appears in this case. + log "check: STZ peephole skips when LDA #0 feeds multiple STAs" + cStzMultiFile="$(mktemp --suffix=.c)" + sStzMultiFile="$(mktemp --suffix=.s)" + cat > "$cStzMultiFile" <<'EOF' +unsigned short ga, gb, gc; +void zeroAll(void) { ga = 0; gb = 0; gc = 0; } +EOF + "$CLANG" --target=w65816 -O2 -S "$cStzMultiFile" -o "$sStzMultiFile" + if ! 
grep -qE '^\s*lda\s+#0x0' "$sStzMultiFile"; then + warn "STZ peephole over-eagerly deleted shared LDA #0" + cat "$sStzMultiFile" >&2 + die "expected lda #0 preserved when feeding multiple STAs" + fi + n_sta=$(grep -cE '^\s*sta\s+g[abc]\b' "$sStzMultiFile") + if [ "$n_sta" -ne 3 ]; then + warn "expected 3 STA after shared LDA #0, found $n_sta" + cat "$sStzMultiFile" >&2 + die "STZ peephole regressed on multi-STA case" + fi + rm -f "$cStzMultiFile" "$sStzMultiFile" + + log "check: clang lowers 'foo(1,2,3)' constant args via PEA" + cPeaFile="$(mktemp --suffix=.c)" + sPeaFile="$(mktemp --suffix=.s)" + cat > "$cPeaFile" <<'EOF' +extern void foo(int a, int b, int c); +void caller(void) { foo(1, 2, 3); } +EOF + "$CLANG" --target=w65816 -O2 -S "$cPeaFile" -o "$sPeaFile" + # arg2 (c=3) and arg1 (b=2) are pushed via PEA. arg0 (a=1) + # stays in A. Expect at least 2 `pea` instructions and zero + # `pha` after a `lda #imm`. + n_pea=$(grep -cE '^\s*pea\s+' "$sPeaFile") + if [ "$n_pea" -lt 2 ]; then + warn "PEA peephole not firing on constant-arg pushes" + cat "$sPeaFile" >&2 + die "expected >= 2 PEA in caller() but found $n_pea" + fi + rm -f "$cPeaFile" "$sPeaFile" + + # PEI peephole: an i64-libcall return whose high half lives in + # DPF0 ($F0..$F1) is forwarded to the next call as a stacked arg. + # Pre-peephole shape: `lda $f0; pha`. Post-peephole: `pei $f0`, + # saving 1 byte and not touching A. + log "check: clang lowers DPF0 forwarding via PEI" + cPeiFile="$(mktemp --suffix=.c)" + sPeiFile="$(mktemp --suffix=.s)" + cat > "$cPeiFile" <<'EOF' +unsigned long long divmod(unsigned long long a, unsigned long long b); +unsigned long long use(unsigned long long x); +unsigned long long chain(unsigned long long a, unsigned long long b) { + return use(divmod(a, b)); +} +EOF + "$CLANG" --target=w65816 -O2 -S "$cPeiFile" -o "$sPeiFile" + if ! 
grep -qE '^\s*pei\s+0xf0\b' "$sPeiFile"; then + warn "PEI peephole not firing on DPF0 forwarding" + cat "$sPeiFile" >&2 + die "expected 'pei 0xf0' in chain() but didn't find it" + fi + rm -f "$cPeiFile" "$sPeiFile" + log "check: clang i8 store to global in M=0 mode is SEP/REP bracketed" cGlobFile="$(mktemp --suffix=.c)" sGlobFile="$(mktemp --suffix=.s)" @@ -1104,9 +1214,6 @@ int toInt(double x) { return (int)x; } double fromInt(int n) { return (double)n; } EOF "$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblFile" -o "$oDblFile" - # softDouble.c builds at -O1 (not -O2): __muldf3's 64x64 -> 128 - # multiply with the inlined alignment shifts overflows the greedy - # allocator's spill heuristics on the single-A target at -O2. "$CLANG" --target=w65816 -O1 -ffunction-sections \ -c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdFile" "$PROJECT_ROOT/tools/link816" -o "$binDblFile" \ @@ -2176,7 +2283,12 @@ int main(void) { if (r == 10 && eq(buf, "n=-42 s=hi")) ok |= 0x02; r = sprintf(buf, "%04x %lu", 0xC, (unsigned long)123456); if (r == 11 && eq(buf, "000c 123456")) ok |= 0x04; - r = snprintf(buf, 6, "abcdefghij"); + /* Test that snprintf truncates per C99: 10 chars asked, only 5 fit + NUL. + Funnel the format through a non-literal pointer so clang's + -Wformat-truncation static check doesn't fire (the truncation IS + what we're testing). 
*/ + const char *fmt_trunc = "abcdefghij"; + r = snprintf(buf, 6, "%s", fmt_trunc); if (r == 10 && eq(buf, "abcde")) ok |= 0x08; r = sprintf(buf, "%.2f", 1.5); if (r == 4 && eq(buf, "1.50")) ok |= 0x10; @@ -2302,6 +2414,46 @@ EOF fi rm -f "$cExFile" "$oExFile" "$binExFile" + log "check: MAME runs rand/srand reproducible sequence (#93)" + cRdFile="$(mktemp --suffix=.c)" + oRdFile="$(mktemp --suffix=.o)" + binRdFile="$(mktemp --suffix=.bin)" + cat > "$cRdFile" <<'EOF' +extern int rand(void); +extern void srand(unsigned int); +__attribute__((noinline)) void switchToBank2(void) { + __asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n"); +} +int main(void) { + srand(1); + int r1 = rand(); + int r2 = rand(); + int r3 = rand(); + // Same seed: must reproduce. + srand(1); + int r1b = rand(); + int r2b = rand(); + unsigned char ok = 0; + if (r1 != 0 && r1 == r1b) ok |= 0x01; // reproducible + if (r2 != 0 && r2 == r2b) ok |= 0x02; // reproducible + if (r1 != r2 && r2 != r3) ok |= 0x04; // diverse + if (r1 >= 0 && r1 <= 0x7FFF) ok |= 0x08; // RAND_MAX bound + switchToBank2(); + *(volatile unsigned short *)0x5000 = (unsigned short)ok; + while (1) {} +} +EOF + "$CLANG" --target=w65816 -O2 -ffunction-sections -c \ + "$cRdFile" -o "$oRdFile" + "$PROJECT_ROOT/tools/link816" -o "$binRdFile" --text-base 0x1000 \ + "$oCrt0F" "$oLibcF" "$oExtrasF" "$oSfF" "$oSdF" "$oLibgccFile" \ + "$oRdFile" >/dev/null 2>&1 + if ! 
bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binRdFile" --check \ + 0x025000=000f >/dev/null 2>&1; then + die "MAME: rand/srand sequence broken" + fi + rm -f "$cRdFile" "$oRdFile" "$binRdFile" + log "check: MAME runs atan/asin/acos/sinh/cosh/tanh (#85)" cTr2File="$(mktemp --suffix=.c)" oTr2File="$(mktemp --suffix=.o)" @@ -3166,6 +3318,42 @@ EOF die "iigs/toolbox.h: WriteCString tool number 0x290B not in output" fi + # stdint.h / stddef.h / limits.h / inttypes.h: standalone + # replacements for clang's bundled versions (which try to include + # glibc bits/* headers and break the build). Compile a small + # program that exercises the typedefs and the offsetof macro. + log "check: standalone runtime headers (stdint/stddef/limits/inttypes/locale/signal)" + cStdiFile="$(mktemp --suffix=.c)" + sStdiFile="$(mktemp --suffix=.s)" + cat > "$cStdiFile" <<'EOF' +#include <stdint.h> +#include <stddef.h> +#include <limits.h> +#include <inttypes.h> +#include <locale.h> +#include <signal.h> +struct S { uint8_t a; uint16_t b; uint32_t c; uint64_t d; }; +int main(void) { + /* Touch typedefs / functions from each header so the build + catches missing symbols, not just absent files. */ + uint64_t big = UINT64_C(0xDEADBEEFCAFE); + intmax_t maxv = INTMAX_MAX; + int i_max = INT_MAX; + size_t off = offsetof(struct S, c); + char *loc = setlocale(LC_ALL, "C"); + struct lconv *lc = localeconv(); + signal(SIGINT, SIG_IGN); + return (int)(off + i_max + (int)(big & 1) + (int)(maxv & 1) + + (loc[0] - 'C') + lc->frac_digits); +} +EOF + "$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \ + -S "$cStdiFile" -o "$sStdiFile" + if [ ! -s "$sStdiFile" ]; then + die "standalone runtime headers compile failed" + fi + rm -f "$cStdiFile" "$sStdiFile" + # Linker exports the synthetic __bss_start / __bss_end / etc. # symbols so crt0 can do BSS init and runtime malloc finds the # heap top. 
@@ -3269,6 +3457,26 @@ EOF fi rm -f "$cBigFile" "$oBigFile" "$binBssAutoFile" "$mapBssAutoFile" + log "check: link816 hard-fails when BSS would exceed LC1 ceiling (\$E000)" + # Force BSS to land past $E000 — link must reject with the LC1 + # ceiling diagnostic (without crt0's LC2 RAM enable, that range + # silently corrupts). + cBigFile="$(mktemp --suffix=.c)" + oBigFile="$(mktemp --suffix=.o)" + binBssOFile="$(mktemp --suffix=.bin)" + cat > "$cBigFile" <<'EOF' +int main(void) { return 0; } +EOF + "$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBigFile" -o "$oBigFile" + if "$PROJECT_ROOT/tools/link816" -o "$binBssOFile" --text-base 0x1000 \ + --bss-base 0xE100 "$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err; then + die "link816 should have rejected --bss-base 0xE100 (above LC1 ceiling)" + fi + if ! grep -q 'exceeds bank-0 LC1 ceiling' /tmp/bsslink.err; then + die "link816 LC1-ceiling diagnostic missing: $(cat /tmp/bsslink.err)" + fi + rm -f "$cBigFile" "$oBigFile" "$binBssOFile" /tmp/bsslink.err + # OMF emitter — wrap the linked binary as a single-segment OMF # file ready for IIgs loading. log "check: omfEmit produces a valid OMF v2.1 single-segment file" diff --git a/src/link816/link816.cpp b/src/link816/link816.cpp index ed5326f..f8f1343 100644 --- a/src/link816/link816.cpp +++ b/src/link816/link816.cpp @@ -484,20 +484,37 @@ struct Linker { // overflow the 0x2000 bss start, shift bss above them so // crt0's bss-init doesn't zero loaded text bytes. Caller // can still force a specific bssBase via --bss-base. - // The IIgs IO window at $C000-$CFFF is unusable; if loadEnd - // would push bss into IO, jump above it to bank 1 ($10000+). + // + // IIgs bank-0 hazard zones: + // $C000-$CFFF: IO and soft switches (ALWAYS unusable — + // reads/writes hit hardware registers). + // $D000-$DFFF: Language Card 1 area. Read-only ROM by + // default; crt0 enables LC1 RAM via the + // $C083 soft switch (read-twice trick) so + // BSS placed here is writable. 
+ // $E000-$FFFF: bank-0 ROM area, also LC-switched but + // we don't enable it (less common need). + // Skip past the IO window if BSS would land there; LC1 + // ($D000-$DFFF) IS now usable thanks to crt0's soft-switch + // enable. Past $DFFF, BSS would spill into the $E000-$FFFF ROM + // area, so bail clearly rather than silently corrupt. uint32_t loadEnd = L.initBase + L.initSize; L.bssBase = bssBase; if (L.bssBase < loadEnd) { // Page-align upward for nicer addresses in the map. L.bssBase = (loadEnd + 0xFF) & ~0xFFu; - // If bss would land in the IIgs IO window ($C000-$CFFF), - // skip past it to $D000. bss reads/writes via DBR=0 - // would be intercepted by IO if we placed it there. if (L.bssBase >= 0xC000 && L.bssBase < 0xD000) { L.bssBase = 0xD000; } } + if (L.bssBase + L.bssSize > 0xE000) { + char msg[160]; + std::snprintf(msg, sizeof(msg), + "bss [0x%X+%u] exceeds bank-0 LC1 ceiling 0xE000 — " + "shrink the runtime or split into bank 1", + L.bssBase, L.bssSize); + die(msg); + } // Publish layout now so resolveSym() can read it during reloc // application (it's a const member that uses lastLayout). lastLayout = L; diff --git a/src/llvm/lib/Target/W65816/W65816AsmPrinter.cpp b/src/llvm/lib/Target/W65816/W65816AsmPrinter.cpp index dfa18eb..9b8412c 100644 --- a/src/llvm/lib/Target/W65816/W65816AsmPrinter.cpp +++ b/src/llvm/lib/Target/W65816/W65816AsmPrinter.cpp @@ -41,12 +41,18 @@ public: // Reset per-function state (defensive — SkipNextSepImm should // already be cleared by the next emitInstruction, but guarantee // it's not leaked across functions if a function ends mid-elision). - void emitFunctionBodyEnd() override { SkipNextSepImm = -1; } + void emitFunctionBodyEnd() override { + SkipNextSepImm = -1; + SkipNextStaAbs = false; + SkipNextPush16 = false; + } // Reset on MBB entry too — labels emit before the MIs of a new MBB, // and a stale flag from a previous MBB's last LDAi8imm could // accidentally swallow the new MBB's first SEP. 
void emitBasicBlockStart(const MachineBasicBlock &MBB) override { SkipNextSepImm = -1; + SkipNextStaAbs = false; + SkipNextPush16 = false; AsmPrinter::emitBasicBlockStart(MBB); } @@ -57,6 +63,14 @@ public: // already left M=8, so the wrap's SEP would be a no-op. int SkipNextSepImm = -1; + // When true, the next STAabs is consumed (already replaced with STZ + // by the LDAi16imm-0 peephole). + bool SkipNextStaAbs = false; + + // When true, the next PUSH16 is consumed (already replaced with PEA + // or PEI by the LDAi16imm / LDA_DP + PUSH16 peepholes). + bool SkipNextPush16 = false; + static char ID; }; @@ -108,6 +122,26 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) { SkipNextSepImm = -1; } + // Drop the STAabs that the LDAi16imm-0 peephole replaced with STZ. + if (SkipNextStaAbs && !MI->isDebugInstr()) { + if (MI->getOpcode() == W65816::STAabs) { + SkipNextStaAbs = false; + return; // consume, already emitted as STZ + } + // Anything other than the expected STAabs cancels the elision. + SkipNextStaAbs = false; + } + + // Drop the PUSH16 that the LDAi16imm + PUSH16 peephole replaced + // with PEA. + if (SkipNextPush16 && !MI->isDebugInstr()) { + if (MI->getOpcode() == W65816::PUSH16) { + SkipNextPush16 = false; + return; // consume, already emitted as PEA + } + SkipNextPush16 = false; + } + W65816MCInstLower MCInstLowering(OutContext, *this); // Expand codegen pseudos into their MC-layer realisations. Keep this @@ -193,6 +227,70 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) { return; } case W65816::LDAi16imm: { + // Peek at the next non-debug MI for two peephole patterns: + // (1) LDAi16imm 0 + STAabs $g -> STZ_Abs $g (saves 3B) + // (2) LDAi16imm $imm + PUSH16 -> PEA #$imm (saves 1B) + // Both replace the LDA+next pair with a single 3-byte op + // that achieves the same memory effect. 
+ auto It = std::next(MI->getIterator()); + while (It != MI->getParent()->end() && It->isDebugInstr()) ++It; + + bool IsZero = MI->getOperand(1).isImm() && + MI->getOperand(1).getImm() == 0; + // STAabs operand layout: (value:Acc16, addr). Operand 0 is the A + // register; only fire when this STA kills A — otherwise dropping + // the LDA leaves a later A-consumer without its value. E.g. SDAG + // CSE'd `g16 = 0; g32 = 0;` shares one LDAi16imm 0 across multiple + // STAabs; only the LAST one kills A, so the peephole would have + // miscompiled the earlier two by deleting the LDA but not the + // remaining STAs. + if (IsZero && It != MI->getParent()->end() && + It->getOpcode() == W65816::STAabs && + It->getOperand(0).isReg() && It->getOperand(0).isKill()) { + MCInst Stz; + Stz.setOpcode(W65816::STZ_Abs); + Stz.addOperand(lowerOperand(It->getOperand(1), MCInstLowering)); + EmitToStreamer(*OutStreamer, Stz); + SkipNextStaAbs = true; + return; + } + + // PEA peephole: LDAi16imm + PUSH16 -> PEA. Safe iff A is dead + // after the PUSH16 — the next instruction must redefine A (so the + // value PUSH16 read is genuinely dead). We use modifiesRegister + // which handles both explicit defs and implicit-defs (e.g. JSL + // clobbers A as part of the calling convention). Falls through + // to a normal LDA #imm; PHA pair if A might be live afterward. + // Note: physreg use-kill flags on PUSH16's implicit-$a are not + // reliably set at AsmPrinter time, so we can't gate on them + // directly; checking the next instruction's def-set is robust. 
+ if (It != MI->getParent()->end() && It->getOpcode() == W65816::PUSH16) { + auto It2 = std::next(It); + while (It2 != MI->getParent()->end() && It2->isDebugInstr()) ++It2; + bool ADead = false; + if (It2 != MI->getParent()->end()) { + const TargetRegisterInfo *TRI = + MI->getParent()->getParent()->getSubtarget().getRegisterInfo(); + if (It2->modifiesRegister(W65816::A, TRI)) + ADead = true; + } else { + // PUSH16 is the last instruction in the BB. A is dead at + // BB exit iff it's not live-out. Check the BB's live-out + // set via successors; if no successor lists A as live-in, + // it's safe. Conservative: treat as not-dead (skip peephole). + // This case is uncommon — the PUSH chain almost always feeds + // a JSL within the same BB. + } + if (ADead) { + MCInst Pea; + Pea.setOpcode(W65816::PEA); + Pea.addOperand(lowerOperand(MI->getOperand(1), MCInstLowering)); + EmitToStreamer(*OutStreamer, Pea); + SkipNextPush16 = true; + return; + } + } + MCInst Lda; Lda.setOpcode(W65816::LDA_Imm16); Lda.addOperand(lowerOperand(MI->getOperand(1), MCInstLowering)); @@ -252,6 +350,40 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) { EmitToStreamer(*OutStreamer, Sta); return; } + case W65816::LDA_DP: { + // PEI peephole: LDA_DP $dp + PUSH16 -> PEI $dp. PEI pushes the + // 16-bit value at direct page $dp directly onto the stack without + // touching A, saving 1 byte (PEI=2B vs LDA_DP+PHA=3B). Safe iff + // A is dead after the PUSH16, same as the LDAi16imm+PUSH16 + // peephole. Common case: i64-libcall-return forwarding — + // copyPhysReg(A=DPF0) emits LDA $F0; the next op is PUSH16 to + // forward the i64 high-half into a downstream call's args; the + // chained call's first op then redefines A. 
+ auto It = std::next(MI->getIterator()); + while (It != MI->getParent()->end() && It->isDebugInstr()) ++It; + if (It != MI->getParent()->end() && + It->getOpcode() == W65816::PUSH16) { + auto It2 = std::next(It); + while (It2 != MI->getParent()->end() && It2->isDebugInstr()) ++It2; + bool ADead = false; + if (It2 != MI->getParent()->end()) { + const TargetRegisterInfo *TRI = + MI->getParent()->getParent()->getSubtarget().getRegisterInfo(); + if (It2->modifiesRegister(W65816::A, TRI)) + ADead = true; + } + if (ADead) { + MCInst Pei; + Pei.setOpcode(W65816::PEI_DP); + Pei.addOperand(lowerOperand(MI->getOperand(0), MCInstLowering)); + EmitToStreamer(*OutStreamer, Pei); + SkipNextPush16 = true; + return; + } + } + // Fall through to default emit (no peephole opportunity). + break; + } case W65816::ADCi16imm: case W65816::SBCi16imm: { bool IsSub = MI->getOpcode() == W65816::SBCi16imm; diff --git a/src/llvm/lib/Target/W65816/W65816FrameLowering.cpp b/src/llvm/lib/Target/W65816/W65816FrameLowering.cpp index 4f5f6f6..b3a5c25 100644 --- a/src/llvm/lib/Target/W65816/W65816FrameLowering.cpp +++ b/src/llvm/lib/Target/W65816/W65816FrameLowering.cpp @@ -27,6 +27,19 @@ using namespace llvm; // (The pure-i8-detection helpers were removed when the prologue went // to "always 16-bit M". See emitPrologue comment.) +// +// (DBR-zero wrap was prototyped here — PHB at function entry to save +// caller's DBR, set DBR=0, restore at exit. Two issues blocked it: +// (a) saving DBR to a DP slot ($F2/$F3) conflicts with libgcc's +// muldi3/divdi3 scratch — those routines use $F2..$F8 freely, so +// the saved DBR doesn't survive a libcall in the function body. +// (b) saving via PHB shifts SP, which means LowerFormalArguments +// would need to bump every arg's StackOffset by 1 — but at +// LowerFormalArguments time we don't know yet whether the function +// will need the wrap (indirect-Y emission is a later lowering +// choice). 
Right approach is a per-function attribute the user +// opts into, plus PEI integration to add a fixed-size "saved DBR" +// slot. Deferred — see STATUS.md.) W65816FrameLowering::W65816FrameLowering(const W65816Subtarget &STI) : TargetFrameLowering(TargetFrameLowering::StackGrowsDown, Align(1), 0, @@ -153,8 +166,6 @@ void W65816FrameLowering::emitEpilogue(MachineFunction &MF, // before the RTL. uint64_t StackSize = MF.getFrameInfo().getStackSize(); bool HasVLA = MF.getFrameInfo().hasVarSizedObjects(); - if (StackSize == 0 && !HasVLA) - return; const W65816Subtarget &STI = MF.getSubtarget<W65816Subtarget>(); const W65816InstrInfo &TII = *STI.getInstrInfo(); @@ -162,6 +173,9 @@ void W65816FrameLowering::emitEpilogue(MachineFunction &MF, // Insert before the terminator (the return). DebugLoc DL = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc(); + if (StackSize == 0 && !HasVLA) + return; + // Detect whether the return live-out includes Y or X — for i64 returns // (Outs[0..2] -> A,X,Y), Y holds bits 32-47 and X holds bits 16-31, so // any TAY/PLY/TAX in the SP-restore would corrupt the return value. diff --git a/src/llvm/lib/Target/W65816/W65816InstrFormats.td b/src/llvm/lib/Target/W65816/W65816InstrFormats.td index 056270b..cca71c5 100644 --- a/src/llvm/lib/Target/W65816/W65816InstrFormats.td +++ b/src/llvm/lib/Target/W65816/W65816InstrFormats.td @@ -210,6 +210,19 @@ class InstAbsLong<bits<8> op, string mnem> let Inst{31-8} = addr; } +// Absolute Long Indexed X. EA = addr_long + X. The bank comes from +// the operand's high byte, NOT from DBR — useful for accessing data +// in a known bank (typically bank 0) regardless of the caller's DBR +// state. 4-byte instruction. 
+class InstAbsLongX<bits<8> op, string mnem> + : W65816Inst<(outs), (ins addrLong:$addr), !strconcat(mnem, "\t$addr, x")> { + let Size = 4; + bits<24> addr; + bits<32> Inst; + let Inst{7-0} = op; + let Inst{31-8} = addr; +} + class InstAbsX<bits<8> op, string mnem> : W65816Inst<(outs), (ins addrAbs:$addr), !strconcat(mnem, "\t$addr, x")> { let Size = 3; diff --git a/src/llvm/lib/Target/W65816/W65816InstrInfo.td b/src/llvm/lib/Target/W65816/W65816InstrInfo.td index 0345959..1544a22 100644 --- a/src/llvm/lib/Target/W65816/W65816InstrInfo.td +++ b/src/llvm/lib/Target/W65816/W65816InstrInfo.td @@ -1072,6 +1072,26 @@ def XBA : InstImplied<0xEB, "xba"> { let mayLoad = 0; let mayStore = 0; } def WAI : InstImplied<0xCB, "wai">; def STP : InstImplied<0xDB, "stp">; +// WDM (William D Mensch) — reserved 2-byte NOP-equivalent. Useful as +// a debugger / emulator hook: MAME's apple2gs CPU traps on WDM and a +// Lua plugin can dispatch on the operand byte. CPU-side, it acts as +// a 2-byte NOP. Operand syntax mirrors MVN: `wdm $ab` (no `#`). +def WDM : InstDP<0x42, "wdm">; + +// TRB / TSB — Test and Reset/Set memory Bits. Atomic bit clear/set +// on a byte (or 16-bit word per M flag) at the given DP or abs +// address. Z flag set per (M & A) where M is the memory operand. +// Useful for memory-mapped IO bit twiddling. No DP indexing form. +def TRB_DP : InstDP<0x14, "trb">; +def TRB_Abs : InstAbs<0x1C, "trb">; +def TSB_DP : InstDP<0x04, "tsb">; +def TSB_Abs : InstAbs<0x0C, "tsb">; + +// PEI — Push Effective Indirect. Reads a 16-bit value from DP and +// pushes it. Useful for indirect parameter passing without going +// through A first. 
+def PEI_DP : InstDP<0xD4, "pei">; + //---------------------------------------------------------------- LDA (load A) // The `_Imm8` forms of the mode-dependent load/arith/compare ops are // marked isCodeGenOnly so the asm matcher never picks them — our @@ -1091,6 +1111,7 @@ def LDA_DPIndY : InstDPIndY<0xB1, "lda">; def LDA_DPIndX : InstDPIndX<0xA1, "lda">; def LDA_DPIndLong : InstDPIndLong <0xA7, "lda">; def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda">; +def LDA_LongX : InstAbsLongX<0xBF, "lda">; //---------------------------------------------------------------- STA (store A) def STA_DP : InstDP<0x85, "sta">; @@ -1104,6 +1125,7 @@ def STA_DPIndY : InstDPIndY<0x91, "sta">; def STA_DPIndX : InstDPIndX<0x81, "sta">; def STA_DPIndLong : InstDPIndLong <0x87, "sta">; def STA_DPIndLongY : InstDPIndLongY<0x97, "sta">; +def STA_LongX : InstAbsLongX<0x9F, "sta">; //---------------------------------------------------------------- LDX (load X) def LDX_Imm8 : InstImm8<0xA2, "ldx"> { let XHigh = 1; let DecoderNamespace = "W65816XHigh"; let isCodeGenOnly = 1; let Defs = [X]; } @@ -1131,6 +1153,14 @@ def STY_DP : InstDP<0x84, "sty">; def STY_Abs : InstAbs<0x8C, "sty">; def STY_DPX : InstDPX<0x94, "sty">; +//---------------------------------------------------------------- STZ (store zero) +// Width follows M flag — same as STA. Useful for zeroing DP scratch +// without burning A. Saves the `LDA #0` (2-3 bytes) vs `LDA #0; STA dp`. 
+def STZ_DP : InstDP<0x64, "stz">; +def STZ_Abs : InstAbs<0x9C, "stz">; +def STZ_DPX : InstDPX<0x74, "stz">; +def STZ_AbsX : InstAbsX<0x9E, "stz">; + //------------------------------------------------------------------------- ADC def ADC_Imm8 : InstImm8<0x69, "adc"> { let MHigh = 1; let DecoderNamespace = "W65816MHigh"; let isCodeGenOnly = 1; } def ADC_Imm16 : InstImm16<0x69, "adc"> { let MLow = 1; } diff --git a/src/llvm/lib/Target/W65816/W65816RegisterInfo.cpp b/src/llvm/lib/Target/W65816/W65816RegisterInfo.cpp index 7d5715b..d8e8ede 100644 --- a/src/llvm/lib/Target/W65816/W65816RegisterInfo.cpp +++ b/src/llvm/lib/Target/W65816/W65816RegisterInfo.cpp @@ -268,13 +268,24 @@ bool W65816RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II, TII.get(NewOpc)).addImm(Offset); switch (NewOpc) { case W65816::LDA_StackRel: - case W65816::LDA_StackRelIndY: Builder.addReg(W65816::A, RegState::ImplicitDefine); break; + case W65816::LDA_StackRelIndY: + // Indirect-Y: A def + Y use. The Y use is critical — without it, + // post-RA passes can reorder a Y-defining op past us, leaving the + // load reading at (ptr + stale_Y). Caught when modelling the dep + // for the (sr,s),Y bank-wrap workaround in W65816NegYIndY. + Builder.addReg(W65816::A, RegState::ImplicitDefine) + .addReg(W65816::Y, RegState::Implicit); + break; case W65816::STA_StackRel: - case W65816::STA_StackRelIndY: Builder.addReg(W65816::A, RegState::Implicit); break; + case W65816::STA_StackRelIndY: + // Indirect-Y store: A use + Y use (same Y reasoning as above). 
+ Builder.addReg(W65816::A, RegState::Implicit) + .addReg(W65816::Y, RegState::Implicit); + break; case W65816::ADC_StackRel: case W65816::SBC_StackRel: Builder.addReg(W65816::A, RegState::Implicit) diff --git a/src/llvm/lib/Target/W65816/W65816StackSlotCleanup.cpp b/src/llvm/lib/Target/W65816/W65816StackSlotCleanup.cpp index 32f2b8e..eae7eae 100644 --- a/src/llvm/lib/Target/W65816/W65816StackSlotCleanup.cpp +++ b/src/llvm/lib/Target/W65816/W65816StackSlotCleanup.cpp @@ -1295,6 +1295,32 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) { // path (i32 (lo|hi) == 0): the OR sets Z, then the SETCC compares // against 0. The second compare is provably redundant because $a // hasn't changed since the previous flag-defining op. + // Intra-MBB only — cross-MBB recursion into predecessors was tried + // (catches SETCC merge blocks where each pred ends with `lda #c`) + // but proved too brittle: predecessors ending with JSLpseudo declare + // implicit-def $a but the return-value flags aren't reliably set, + // and other corner cases break smoke. + auto isATransparent = [](const MachineInstr &MI) { + // Stores that don't touch A or P-bits-other-than-via-A. + return MI.getOpcode() == W65816::STAfi || + MI.getOpcode() == W65816::STAfi_indY || + MI.getOpcode() == W65816::STA8fi; + }; + // Returns true iff walking back from `Start` (exclusive) finds an + // A-modifier as the first non-skip op. Skips debug ops and + // A-transparent stores; stops at the first real op. Templated to + // accept either iterator or const_iterator (Cmps came from a non- + // const iteration; predecessors are walked via const_iterator). 
+ auto walkbackBefore = [&](auto Start, auto Begin) -> bool { + auto It = Start; + while (It != Begin) { + --It; + if (It->isDebugInstr()) continue; + if (isATransparent(*It)) continue; + return It->modifiesRegister(W65816::A, TRI); + } + return false; + }; for (MachineBasicBlock &MBB : MF) { SmallVector Cmps; for (MachineInstr &MI : MBB) @@ -1308,27 +1334,7 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) { !Cmp->getOperand(1).isImm() || Cmp->getOperand(1).getImm() != 0) continue; - // Walk back across debug ops to find the immediately-prior real - // instruction. If it modifies $a (i.e. it's an A-defining op - // that ALSO sets N/Z — true for every A-write op on the 65816 - // except the no-op TSC variants), the CMP is redundant. - auto PrevIt = Cmp->getIterator(); - bool Found = false; - while (PrevIt != MBB.begin()) { - --PrevIt; - if (PrevIt->isDebugInstr()) continue; - // Stores don't change $a — skip and keep walking back. This - // pass runs pre-PEI, so the skip-list uses the *pseudo* opcodes - // (STAfi / STAfi_indY / STA8fi); their post-PEI MC counterparts - // never appear here. STA8fi flips M via SEP/REP (Defs=[P]) but - // doesn't touch A or N/Z, so it's transparent for this CMP. - if (PrevIt->getOpcode() == W65816::STAfi || - PrevIt->getOpcode() == W65816::STAfi_indY || - PrevIt->getOpcode() == W65816::STA8fi) - continue; - Found = PrevIt->modifiesRegister(W65816::A, TRI); - break; - } + bool Found = walkbackBefore(Cmp->getIterator(), MBB.begin()); if (Found) { Cmp->eraseFromParent(); Changed = true;