Checkpoint

Scott Duensing 2026-05-01 20:24:30 -05:00
parent 18ef7e1fa6
commit d6a34075a5
25 changed files with 1042 additions and 117 deletions

134
STATUS.md

@ -72,9 +72,11 @@ which runs correctly under MAME (apple2gs).
native object format) for round-tripping with classic dev tools. native object format) for round-tripping with classic dev tools.
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double, - `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
libgcc into linkable objects. libgcc into linkable objects.
- `scripts/smokeTest.sh` runs 92 end-to-end checks (scalar ops, - `scripts/smokeTest.sh` runs 99 end-to-end checks (scalar ops,
control flow, calling conventions, MAME execution, regressions, control flow, calling conventions, MAME execution, regressions,
link816 bss-base safety, iigs/toolbox.h compile-check). link816 bss-base safety, iigs/toolbox.h compile-check, standalone
runtime headers, AsmPrinter peepholes for STZ / PEA / PEI —
single-STA, shared-LDA-multi-STA, and DPF0-forwarding cases).
Currently 100% pass at -O2 throughout. Currently 100% pass at -O2 throughout.
**ABI:** **ABI:**
@ -152,13 +154,11 @@ end-to-end in MAME.
`strtok` / `strtok_r` live in their own TU at `-O2` (with `strtok` / `strtok_r` live in their own TU at `-O2` (with
`__attribute__((noinline))` on `strtok_r` so the strtok() wrapper `__attribute__((noinline))` on `strtok_r` so the strtok() wrapper
doesn't duplicate it). Multi-call strtok over "a,b,,c" works doesn't duplicate it). Multi-call strtok over "a,b,,c" works
end-to-end in smoke. Latent backend issue: at certain rodata end-to-end in smoke. The layout-sensitive miscompile that
layouts, -O2 strtok_r's BB0_7 inner CMP loop miscompiles due to previously haunted strtok_r's inner CMP loop has been fixed by
LICM/sink interaction; current smoke layout passes but adding modelling `Uses=[P]` on the conditional branches (the LICM/sink
bytes upstream (e.g. growing softDouble.o) can shift delim into interaction that elided "redundant" CMPs no longer fires); no
a failing address. Surgical workaround `-mllvm -disable-machine- surgical workaround flags needed.
sink` on strtok.c is documented; not currently applied because
smoke is green.
A small **RPN calculator** test (smoke #87) chains strtok, atol, A small **RPN calculator** test (smoke #87) chains strtok, atol,
push/pop over a static stack, snprintf "%ld", and strcmp to verify push/pop over a static stack, snprintf "%ld", and strcmp to verify
@ -203,20 +203,6 @@ sidecar bytes.
## Known issues / workarounds ## Known issues / workarounds
- **#70 FIXED**: greedy regalloc + W65816StackSlotCleanup Pass -2
was deleting an entry-side store to a slot that the loop body
read. Pass -2 collapses `LDAfi slotA; STAfi slotB; LDAfi slotC;
OPfi slotB` into `LDAfi slotC; OPfi slotA` (memory-to-memory copy
through A elimination), but didn't check whether slotB had other
refs in the function. In iterative qsort, slotB happened to be
the spill home for `hi` — the Pass -2 transform deleted the only
initialiser, leaving the loop body's `lda <hi-slot>, s` reading
garbage. Fix: function-wide `slotHasOtherRefs` safety check
before erasing the spill. `softDouble.c` still uses
`-mllvm -regalloc=fast` for `__muldf3`'s 64×64→128 multiply
(different greedy bug — register-pressure-driven, not
spill-deletion-driven).
- **(d,s),y / (sr,s),y addressing wraps the bank** when Y is - **(d,s),y / (sr,s),y addressing wraps the bank** when Y is
negative as 16-bit unsigned. Worked around by `W65816NegYIndY` negative as 16-bit unsigned. Worked around by `W65816NegYIndY`
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
@ -228,34 +214,98 @@ sidecar bytes.
address of one of its locals — the callee's `*p = v` will write address of one of its locals — the callee's `*p = v` will write
to the wrong bank. Documented; no compiler-side mitigation to the wrong bank. Documented; no compiler-side mitigation
beyond the existing DPF0 fake-physreg routing for the i64-return beyond the existing DPF0 fake-physreg routing for the i64-return
high half. high half. Workaround: inline pointer-arg helpers so the writes
stay in the caller's frame using stack-rel direct stores. The
W65816 only has three DBR-independent addressing modes
(abs_long, abs_long,X, [dp],Y) — none cheap to retrofit into
the current pointer-deref lowering (+5 bytes minimum per access).
Real fix needs PHB/PLB at noinline-pointer-callee entry/exit.
- **strtok -O2 layout-sensitive miscompile FIXED** — modelling ## Recently fixed
`Uses=[P]` on the conditional branches (BEQ/BNE/BCS/BCC/BMI/BPL/
BVS/BVC) made MachineCSE see the dependency between an earlier - **#70 — iterative qsort -O2 miscompile** — `W65816StackSlotCleanup`
CMP and the consuming Bxx, eliminating an entire class of Pass -2 was deleting a store to a slot the loop body read.
layout-sensitive flag-corruption bugs. Verified by sweeping Function-wide `slotHasOtherRefs` safety check added (Pass -1 and
`--rodata-base` from text-end to text-end+300 in 13 increments Pass -2c hardened with the same pattern). Iterative qsort at
— every layout returns the correct strtok result. plain -O2 + greedy now compiles correctly; the `optnone` workaround
As a follow-on, MachineCSE has been re-enabled (was previously in smoke #70 was removed.
disabled in `W65816TargetMachine::addMachineSSAOptimization` as
a workaround for the same root cause). - **strtok -O2 layout-sensitive miscompile** — modelling `Uses=[P]`
on the conditional branches (BEQ/BNE/BCS/BCC/BMI/BPL/BVS/BVC) made
MachineCSE / scheduler / LICM / sink see the CMP→Bxx flag
dependency. An entire class of layout-sensitive flag-corruption
bugs went away; verified by sweeping `--rodata-base` from text-end
to text-end+300 in 13 increments — every layout returns the correct
strtok result. As a follow-on, MachineCSE has been re-enabled
(was previously disabled in
`W65816TargetMachine::addMachineSSAOptimization` as a workaround
for the same root cause).
- **link816 silently produced 4.3GB binaries** when `--rodata-base`
was set inside the text region. Now dies with a clear error:
`--rodata-base 0xX overlaps text 0xY+N (must start at or after 0xZ)`.
- **link816 BSS-relocate landed in IIgs Language Card area**:
when text+rodata grew past $C000, link816 placed BSS at $D000
(the LC1 area), which the IIgs maps to ROM by default (writes drop
silently, reads return ROM bytes). Globals never initialised;
caught by the expression-parser smoke (#92) when adding rand /
strnlen / etc. pushed the runtime past that threshold. Two-part
fix: crt0 now enables LC1 RAM via the standard `lda $C083`
read-twice trick at startup, and link816 hard-fails (rather
than silently corrupt) if BSS would exceed the LC1 ceiling
($E000) — past that you'd need crt0 to also enable LC2 / shadow
RAM, which we haven't wired up.
- **STZ peephole multi-STA latent miscompile** — AsmPrinter's
`LDA #0; STA $g` -> `STZ $g` peephole eliminated the LDA but
only consumed the FIRST `STA`. When SDAG-CSE shared one
`LDA #0` across multiple `STA`s (`g16=0; g32=0;` is one IR
shape), trailing `STA`s read whatever was in A on entry —
silently corrupting any global where A wasn't 0 at function
entry. Smoke happened to pass because A was 0 by luck in
every covered path. Fixed by gating the peephole on the
consuming `STA` killing A (regalloc only sets `killed` on the
last reader); smoke #98 added to lock the multi-STA case.
- **PEI AsmPrinter peephole** — new: `LDA $dp; PHA` -> `PEI $dp`
saves 1 byte and avoids touching A. Fires on the
`copyPhysReg(A=DPF0); PUSH16` pattern (i64-libcall return-value
forwarding into the next call's stacked args), which appears
in every chained soft-double / soft-int64 expression. Saves
68 bytes across the runtime (-64 in math.o alone). Same
next-instruction-modifies-A safety check as the PEA peephole.
Smoke #99 added.
- **PEA peephole opcode-allowlist replaced with `modifiesRegister`**:
the next-after-PUSH16 check that gates the PEA peephole was a
hand-curated list of opcodes that obviously redefine A; switched
to `MachineInstr::modifiesRegister(A, TRI)` which also catches
implicit-defs (e.g. JSL clobbering A as part of the call ABI).
Saves a few bytes and is more robust.
- **libgcc.s `lda #0; sta $XX` -> `stz $XX`** — 7 sites converted
in libgcc.s after STZ landed in the assembler. Saves 28 bytes;
also removes two PHA/PLA save-restore wraps around the LDA #0
(STZ doesn't touch A, so the wraps are unnecessary).
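The two link816 placement checks described in the list above can be sketched in C roughly as follows. This is a minimal illustration, not link816's actual code: the `Layout` struct, `checkLayout`, and the result codes are hypothetical names, and the sketch assumes that rodata must start at or after the end of text and that BSS must end at or below the LC1 ceiling ($E000).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout record; field names are illustrative only. */
typedef struct {
    uint32_t textBase;
    uint32_t textSize;
    uint32_t rodataBase;
    uint32_t bssBase;
    uint32_t bssSize;
} Layout;

enum { LAYOUT_OK = 0, LAYOUT_RODATA_OVERLAP = 1, LAYOUT_BSS_PAST_LC1 = 2 };

/* First address past the $D000-$DFFF window that crt0 enables. */
#define LC1_CEILING UINT32_C(0xE000)

/* Returns LAYOUT_OK, or a failure code after printing a diagnostic. */
static int checkLayout(const Layout *l, FILE *err) {
    uint32_t textEnd = l->textBase + l->textSize;
    uint32_t bssEnd = l->bssBase + l->bssSize;

    /* Assumption: rodata is placed after text, so any base below
     * textEnd is rejected instead of being laid out on top of text. */
    if (l->rodataBase < textEnd) {
        fprintf(err,
                "--rodata-base 0x%" PRIX32 " overlaps text 0x%" PRIX32
                "+%" PRIu32 " (must start at or after 0x%" PRIX32 ")\n",
                l->rodataBase, l->textBase, l->textSize, textEnd);
        return LAYOUT_RODATA_OVERLAP;
    }

    /* Hard-fail instead of letting BSS spill past the LC1 window. */
    if (bssEnd > LC1_CEILING) {
        fprintf(err, "bss end 0x%" PRIX32 " exceeds LC1 ceiling 0x%" PRIX32 "\n",
                bssEnd, (uint32_t)LC1_CEILING);
        return LAYOUT_BSS_PAST_LC1;
    }
    return LAYOUT_OK;
}
```

With text at 0x1000+0x100, a `rodataBase` of 0x1080 is rejected as an overlap; BSS at 0xD000 with size 0x1000 is accepted, while size 0x1001 is rejected.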
## What's still needed for a "ship-ready" toolchain ## What's still needed for a "ship-ready" toolchain
- **softDouble.c -O1 hold-out**: `__muldf3`'s 64×64→128 multiply - **softDouble.c -O1 hold-out**: `__muldf3`'s u64 lifetime pressure
with inlined alignment shifts overflows the greedy register overflows the greedy register allocator at -O2 ("ran out of
allocator at -O2 ("ran out of registers during register registers during register allocation"). Builds correctly at
allocation"). Builds correctly at -O1 (replaces the previous -O1. Investigated: marking dpack noinline reduces pressure but
-O2 + -mllvm -regalloc=fast workaround; -O1 is smaller and isn't enough; making dclass noinline would unblock -O2 (verified)
doesn't require the non-default flag). but the (d,s),y-uses-DBR bug then corrupts dclass's pointer-arg
writes when a caller has switched DBR (caught by smoke's
dmul-after-bank-switch test). Real fix is gated on the broader
DBR-pointer-deref limitation listed above.
- **More of the C standard library**: real `<stdio.h>` file I/O - **More of the C standard library**: real `<stdio.h>` file I/O
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs (`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
returning success/zero) — would need a memory-backed FS or a returning success/zero) — would need a memory-backed FS or a
MAME hook; `<locale.h>` / `<wchar.h>` if any real-world code MAME hook. `<locale.h>` / `<signal.h>` are stubbed (compile and
needs them. return safe defaults); `<wchar.h>` / `<time.h>` mostly absent.
- **C++ runtime support**: vtable layout for multiple inheritance, - **C++ runtime support**: vtable layout for multiple inheritance,
RTTI, exceptions (or a documented `-fno-exceptions` requirement). RTTI, exceptions (or a documented `-fno-exceptions` requirement).


@ -41,11 +41,12 @@ cc "$SRC/extras.c"
cc "$SRC/strtok.c" cc "$SRC/strtok.c"
cc "$SRC/math.c" cc "$SRC/math.c"
cc "$SRC/softFloat.c" cc "$SRC/softFloat.c"
# softDouble.c builds at -O1 instead of -O2: __muldf3's 64x64 -> 128 # softDouble.c builds at -O1: __muldf3's u64 live-range pressure
# mul + inlined alignment shifts overflows the greedy allocator on # overflows the greedy allocator at -O2. dpack is already noinline
# the single-A target ("ran out of registers during register # to reduce pressure, but dclass MUST stay inline (its pointer-arg
# allocation"). -O1 produces correct + smaller code than the # writes from a noinline boundary would lower to `sta (d,s),y` which
# previous -O2 + -regalloc=fast workaround. # uses DBR for the bank — silently corrupted under DBR != 0, caught
# by the dmul-after-bank-switch test). -O1 sidesteps this.
cc "$SRC/softDouble.c" -O1 cc "$SRC/softDouble.c" -O1
echo "runtime built: $(ls -1 "$OUT"/*.o | wc -l) objects" echo "runtime built: $(ls -1 "$OUT"/*.o | wc -l) objects"


@ -10,6 +10,9 @@ int isspace(int c);
int isxdigit(int c); int isxdigit(int c);
int isprint(int c); int isprint(int c);
int ispunct(int c); int ispunct(int c);
int iscntrl(int c);
int isgraph(int c);
int isblank(int c);
int toupper(int c); int toupper(int c);
int tolower(int c); int tolower(int c);


@ -0,0 +1,54 @@
// Minimal inttypes.h for the W65816 runtime. Pulls in stdint.h's
// fixed-width types and adds the printf format-string macros (PRI*)
// for those types; the scanf (SCN*) forms are not defined. Standalone
// (does not include the host clang's inttypes.h, which pulls in glibc
// headers and breaks the build).
#ifndef _INTTYPES_H
#define _INTTYPES_H
#include <stdint.h>
// (strtoimax / strtoumax not implemented — runtime has strtol /
// strtoul for the 32-bit forms which cover the common needs.)
// PRIxN format macros. `int` is 16-bit on W65816, `long` is 32,
// `long long` is 64.
#define PRId8 "d"
#define PRIi8 "i"
#define PRIo8 "o"
#define PRIu8 "u"
#define PRIx8 "x"
#define PRIX8 "X"
#define PRId16 "d"
#define PRIi16 "i"
#define PRIo16 "o"
#define PRIu16 "u"
#define PRIx16 "x"
#define PRIX16 "X"
#define PRId32 "ld"
#define PRIi32 "li"
#define PRIo32 "lo"
#define PRIu32 "lu"
#define PRIx32 "lx"
#define PRIX32 "lX"
#define PRId64 "lld"
#define PRIi64 "lli"
#define PRIo64 "llo"
#define PRIu64 "llu"
#define PRIx64 "llx"
#define PRIX64 "llX"
#define PRIdMAX PRId64
#define PRIuMAX PRIu64
#define PRIxMAX PRIx64
#define PRIdPTR PRId16
#define PRIiPTR PRIi16
#define PRIuPTR PRIu16
#define PRIxPTR PRIx16
#endif
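A short usage sketch for the PRI macros above. On this target `int` is 16 bits and `int32_t` is `long`, so a plain `%d` would be the wrong conversion for a 32-bit value; `PRId32` expands to `"ld"` here, and to whatever the host needs when the same source is built elsewhere. The helper name `formatI32` is hypothetical, and the sketch assumes `snprintf` is available, as it is in the runtime's smoke tests.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a 32-bit signed value into buf; returns the number of
 * characters snprintf would have written (excluding the NUL). */
static int formatI32(char *buf, size_t n, int32_t v) {
    /* String-literal concatenation splices the macro into the format. */
    return snprintf(buf, n, "%" PRId32, v);
}
```

Calling `formatI32(buf, sizeof buf, INT32_C(-100000))` fills `buf` with `-100000` and returns 7.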

39
runtime/include/limits.h Normal file

@ -0,0 +1,39 @@
// Minimal limits.h for the W65816 runtime. Standalone (does not
// include the host clang's limits.h, which pulls in glibc headers
// and breaks the build). Sizes per the W65816 backend's view:
// char = 8 bits
// short = 16 bits
// int = 16 bits
// long = 32 bits
// long long = 64 bits
#ifndef _LIMITS_H
#define _LIMITS_H
#define CHAR_BIT 8
#define MB_LEN_MAX 1
#define SCHAR_MIN (-128)
#define SCHAR_MAX 127
#define UCHAR_MAX 255
// `char` is signed by default on this target.
#define CHAR_MIN SCHAR_MIN
#define CHAR_MAX SCHAR_MAX
#define SHRT_MIN (-32768)
#define SHRT_MAX 32767
#define USHRT_MAX 65535U
#define INT_MIN (-32768)
#define INT_MAX 32767
#define UINT_MAX 65535U
#define LONG_MIN (-2147483647L - 1)
#define LONG_MAX 2147483647L
#define ULONG_MAX 4294967295UL
#define LLONG_MIN (-9223372036854775807LL - 1)
#define LLONG_MAX 9223372036854775807LL
#define ULLONG_MAX 18446744073709551615ULL
#endif

40
runtime/include/locale.h Normal file

@ -0,0 +1,40 @@
// Minimal locale.h for the W65816 runtime. No real locale support —
// just enough to let portable code compile. setlocale() always
// returns "C" and ignores its argument; localeconv() returns a
// pointer to a fixed C-locale struct.
#ifndef _LOCALE_H
#define _LOCALE_H
struct lconv {
char *decimal_point;
char *thousands_sep;
char *grouping;
char *int_curr_symbol;
char *currency_symbol;
char *mon_decimal_point;
char *mon_thousands_sep;
char *mon_grouping;
char *positive_sign;
char *negative_sign;
char int_frac_digits;
char frac_digits;
char p_cs_precedes;
char p_sep_by_space;
char n_cs_precedes;
char n_sep_by_space;
char p_sign_posn;
char n_sign_posn;
};
#define LC_ALL 0
#define LC_COLLATE 1
#define LC_CTYPE 2
#define LC_MONETARY 3
#define LC_NUMERIC 4
#define LC_TIME 5
char *setlocale(int category, const char *locale);
struct lconv *localeconv(void);
#endif

27
runtime/include/signal.h Normal file

@ -0,0 +1,27 @@
// Minimal signal.h for the W65816 runtime. No real signal handling
// — IIgs has no concept of POSIX signals. signal() always returns
// SIG_ERR; raise() always returns -1. These exist so portable code
// (e.g. asserts that map abort() through raise(SIGABRT)) compiles.
#ifndef _SIGNAL_H
#define _SIGNAL_H
typedef int sig_atomic_t;
typedef void (*__sighandler_t)(int);
#define SIG_DFL ((__sighandler_t)0)
#define SIG_IGN ((__sighandler_t)1)
#define SIG_ERR ((__sighandler_t)-1)
#define SIGINT 2
#define SIGILL 4
#define SIGABRT 6
#define SIGFPE 8
#define SIGSEGV 11
#define SIGTERM 15
__sighandler_t signal(int sig, __sighandler_t handler);
int raise(int sig);
#endif

17
runtime/include/stddef.h Normal file

@ -0,0 +1,17 @@
// Minimal stddef.h for the W65816 runtime. Standalone (does not
// include the host clang's stddef.h).
#ifndef _STDDEF_H
#define _STDDEF_H
typedef unsigned int size_t;
typedef int ptrdiff_t;
typedef int wchar_t; // not really wide-char-supported
#ifndef NULL
# define NULL ((void *)0)
#endif
#define offsetof(t, m) __builtin_offsetof(t, m)
#endif

78
runtime/include/stdint.h Normal file

@ -0,0 +1,78 @@
// Minimal stdint.h for the W65816 runtime. Standalone (does not
// include the host clang's stdint.h, which pulls in glibc headers
// and breaks the build). Sizes per the W65816 backend's view:
// char = 8 bits
// short = 16 bits
// int = 16 bits
// long = 32 bits
// long long = 64 bits
#ifndef _STDINT_H
#define _STDINT_H
typedef signed char int8_t;
typedef unsigned char uint8_t;
typedef short int16_t;
typedef unsigned short uint16_t;
typedef long int32_t;
typedef unsigned long uint32_t;
typedef long long int64_t;
typedef unsigned long long uint64_t;
typedef int8_t int_least8_t;
typedef uint8_t uint_least8_t;
typedef int16_t int_least16_t;
typedef uint16_t uint_least16_t;
typedef int32_t int_least32_t;
typedef uint32_t uint_least32_t;
typedef int64_t int_least64_t;
typedef uint64_t uint_least64_t;
typedef int16_t int_fast8_t;
typedef uint16_t uint_fast8_t;
typedef int16_t int_fast16_t;
typedef uint16_t uint_fast16_t;
typedef int32_t int_fast32_t;
typedef uint32_t uint_fast32_t;
typedef int64_t int_fast64_t;
typedef uint64_t uint_fast64_t;
typedef int16_t intptr_t; // pointers are 16-bit on W65816
typedef uint16_t uintptr_t;
typedef int64_t intmax_t;
typedef uint64_t uintmax_t;
#define INT8_MIN (-0x7F - 1)
#define INT8_MAX 0x7F
#define UINT8_MAX 0xFFU
#define INT16_MIN (-0x7FFF - 1)
#define INT16_MAX 0x7FFF
#define UINT16_MAX 0xFFFFU
#define INT32_MIN (-0x7FFFFFFFL - 1)
#define INT32_MAX 0x7FFFFFFFL
#define UINT32_MAX 0xFFFFFFFFUL
#define INT64_MIN (-0x7FFFFFFFFFFFFFFFLL - 1)
#define INT64_MAX 0x7FFFFFFFFFFFFFFFLL
#define UINT64_MAX 0xFFFFFFFFFFFFFFFFULL
#define INTPTR_MIN INT16_MIN
#define INTPTR_MAX INT16_MAX
#define UINTPTR_MAX UINT16_MAX
#define INTMAX_MIN INT64_MIN
#define INTMAX_MAX INT64_MAX
#define UINTMAX_MAX UINT64_MAX
#define INT8_C(v) v
#define UINT8_C(v) v ## U
#define INT16_C(v) v
#define UINT16_C(v) v ## U
#define INT32_C(v) v ## L
#define UINT32_C(v) v ## UL
#define INT64_C(v) v ## LL
#define UINT64_C(v) v ## ULL
#define INTMAX_C(v) v ## LL
#define UINTMAX_C(v) v ## ULL
#endif


@ -37,4 +37,14 @@ void clearerr(FILE *stream);
#define SEEK_CUR 1 #define SEEK_CUR 1
#define SEEK_END 2 #define SEEK_END 2
#define EOF (-1)
// Input stubs. Real implementations would route through GS/OS
// console I/O; the current stubs in libc.c return EOF (or NULL for fgets).
int getchar(void);
int fgetc(FILE *stream);
char *fgets(char *buf, int n, FILE *stream);
int ungetc(int c, FILE *stream);
#define getc(s) fgetc(s)
#endif #endif


@ -31,4 +31,8 @@ int atexit(__atexit_fn fn);
#define EXIT_SUCCESS 0 #define EXIT_SUCCESS 0
#define EXIT_FAILURE 1 #define EXIT_FAILURE 1
#define RAND_MAX 0x7FFF
int rand(void);
void srand(unsigned int seed);
#endif #endif


@ -10,10 +10,13 @@ int memcmp(const void *a, const void *b, size_t n);
void *memchr(const void *s, int c, size_t n); void *memchr(const void *s, int c, size_t n);
size_t strlen(const char *s); size_t strlen(const char *s);
size_t strnlen(const char *s, size_t maxlen);
char *strcpy(char *dst, const char *src); char *strcpy(char *dst, const char *src);
char *strncpy(char *dst, const char *src, size_t n); char *strncpy(char *dst, const char *src, size_t n);
int strcmp(const char *a, const char *b); int strcmp(const char *a, const char *b);
int strncmp(const char *a, const char *b, size_t n); int strncmp(const char *a, const char *b, size_t n);
int strcasecmp(const char *a, const char *b);
int strncasecmp(const char *a, const char *b, size_t n);
char *strchr(const char *s, int c); char *strchr(const char *s, int c);
char *strrchr(const char *s, int c); char *strrchr(const char *s, int c);
char *strstr(const char *haystack, const char *needle); char *strstr(const char *haystack, const char *needle);


@ -41,6 +41,21 @@ __start:
lda #0x0fff lda #0x0fff
tcs tcs
; Enable Language Card 1 RAM at $D000-$DFFF for read+write.
; By default the IIgs maps that range to ROM (read-only). Two
; reads of $C083 enable RAM-bank-1, second read also enables
; writes. Without this, BSS auto-relocated past $C000 lands on
; ROM and globals never initialise (writes drop on the floor;
; reads return ROM bytes). Caught by the expression-parser
; smoke test (#92) when runtime growth pushed bss past $BFFF.
; The reads must be 8-bit (one byte at a time) — a 16-bit M
; read at $C083 would also touch $C084 (a different soft
; switch), wiping the LC enable we just set.
sep #0x20
lda 0xc083
lda 0xc083
rep #0x20
; Zero BSS. X iterates from __bss_start to __bss_end; each ; Zero BSS. X iterates from __bss_start to __bss_end; each
; iteration writes one byte of zero at addr X (via DP=0 + ; iteration writes one byte of zero at addr X (via DP=0 +
; offset 0 — which is just X). Wraps in 8-bit M for the ; offset 0 — which is just X). Wraps in 8-bit M for the


@ -67,6 +67,70 @@ long long llabs(long long n) {
} }
// strnlen: like strlen but capped at maxlen. Useful for safely
// measuring strings that may not be NUL-terminated within a known
// buffer size.
size_t strnlen(const char *s, size_t maxlen) {
size_t n = 0;
while (n < maxlen && s[n]) {
n++;
}
return n;
}
static int toLowerByte(int c) {
if (c >= 'A' && c <= 'Z') {
return c - 'A' + 'a';
}
return c;
}
int strcasecmp(const char *a, const char *b) {
while (*a && *b) {
int da = toLowerByte((unsigned char)*a);
int db = toLowerByte((unsigned char)*b);
if (da != db) {
return da - db;
}
a++;
b++;
}
return toLowerByte((unsigned char)*a) - toLowerByte((unsigned char)*b);
}
int strncasecmp(const char *a, const char *b, size_t n) {
while (n && *a && *b) {
int da = toLowerByte((unsigned char)*a);
int db = toLowerByte((unsigned char)*b);
if (da != db) {
return da - db;
}
a++;
b++;
n--;
}
if (!n) return 0;
return toLowerByte((unsigned char)*a) - toLowerByte((unsigned char)*b);
}
// Linear congruential generator using the multiplier/increment from
// the ISO C sample rand() (1103515245, 12345).
// Returns 15-bit values (RAND_MAX = 0x7FFF) per C standard convention.
static unsigned long randSeed = 1;
void srand(unsigned int seed) {
randSeed = seed;
}
int rand(void) {
randSeed = randSeed * 1103515245UL + 12345UL;
return (int)((randSeed >> 16) & 0x7FFF);
}
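The generator above relies on `unsigned long` being exactly 32 bits on this target, so the multiply wraps modulo 2^32. A host-side sketch of the same recurrence, useful for predicting what the target should return, pins the state to `uint32_t` because a host `unsigned long` may be 64 bits wide. The names `lcgSeed` and `lcgNext` are hypothetical stand-ins for `srand` and `rand`.

```c
#include <stdint.h>

/* 32-bit state, matching the target's 32-bit unsigned long. */
static uint32_t lcgState = 1;

/* Mirrors srand(): the target's unsigned int seed is 16 bits wide. */
static void lcgSeed(uint16_t seed) {
    lcgState = seed;
}

/* Mirrors rand(): advance the state, then return bits 16..30. */
static int lcgNext(void) {
    /* Unsigned arithmetic wraps modulo 2^32 by definition. */
    lcgState = lcgState * UINT32_C(1103515245) + UINT32_C(12345);
    return (int)((lcgState >> 16) & 0x7FFF);
}
```

After `lcgSeed(1)`, the first two calls to `lcgNext()` return 16838 and 5758, which gives a cheap cross-check for a MAME run of the runtime's `rand()`.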
// ----- additional string.h ---------------------------------------------- // ----- additional string.h ----------------------------------------------
static int inSet(char c, const char *set) { static int inSet(char c, const char *set) {


@ -119,6 +119,9 @@ int isxdigit(int c) {
} }
int isprint(int c) { return c >= 0x20 && c < 0x7f; } int isprint(int c) { return c >= 0x20 && c < 0x7f; }
int ispunct(int c) { return isprint(c) && !isalnum(c) && c != ' '; } int ispunct(int c) { return isprint(c) && !isalnum(c) && c != ' '; }
int iscntrl(int c) { return (c >= 0 && c < 0x20) || c == 0x7f; }
int isgraph(int c) { return isprint(c) && c != ' '; }
int isblank(int c) { return c == ' ' || c == '\t'; }
int toupper(int c) { return islower(c) ? c - 32 : c; } int toupper(int c) { return islower(c) ? c - 32 : c; }
int tolower(int c) { return isupper(c) ? c + 32 : c; } int tolower(int c) { return isupper(c) ? c + 32 : c; }
@ -160,6 +163,21 @@ int puts(const char *s) {
return 0; return 0;
} }
// ---- input stubs ----
//
// Real input would route through GS/OS console / event handling.
// These return EOF / NULL so user code that calls them links and
// gets predictable end-of-input behaviour. FILE struct is defined
// further down (alongside fopen etc.) — forward-declare for the
// signatures.
struct __sFILE;
int getchar(void) { return -1; /* EOF */ }
int fgetc(struct __sFILE *s) { (void)s; return -1; }
char *fgets(char *b, int n, struct __sFILE *s) {
(void)b; (void)n; (void)s; return (char *)0;
}
int ungetc(int c, struct __sFILE *s) { (void)c; (void)s; return -1; }
// ---- minimal printf ---- // ---- minimal printf ----
// Re-declare va_list / va_* locally rather than including stdarg.h — // Re-declare va_list / va_* locally rather than including stdarg.h —
@ -617,3 +635,76 @@ long ftell(FILE *stream) {
int feof(FILE *stream) { (void)stream; return 1; } int feof(FILE *stream) { (void)stream; return 1; }
int ferror(FILE *stream) { (void)stream; return 0; } int ferror(FILE *stream) { (void)stream; return 0; }
void clearerr(FILE *stream) { (void)stream; } void clearerr(FILE *stream) { (void)stream; }
// ---- locale.h stubs ----
//
// No real locale support — IIgs is single-locale. setlocale always
// returns "C", localeconv returns a fixed C-locale struct. These
// are stubs so portable code that calls setlocale(LC_ALL, "") for
// diagnostic purposes compiles and runs.
struct lconv {
char *decimal_point;
char *thousands_sep;
char *grouping;
char *int_curr_symbol;
char *currency_symbol;
char *mon_decimal_point;
char *mon_thousands_sep;
char *mon_grouping;
char *positive_sign;
char *negative_sign;
char int_frac_digits;
char frac_digits;
char p_cs_precedes;
char p_sep_by_space;
char n_cs_precedes;
char n_sep_by_space;
char p_sign_posn;
char n_sign_posn;
};
static struct lconv __c_lconv = {
(char *)".", // decimal_point
(char *)"", // thousands_sep
(char *)"", // grouping
(char *)"", // int_curr_symbol
(char *)"", // currency_symbol
(char *)"", // mon_decimal_point
(char *)"", // mon_thousands_sep
(char *)"", // mon_grouping
(char *)"", // positive_sign
(char *)"", // negative_sign
(char)127, // int_frac_digits (CHAR_MAX = "unspecified")
(char)127, // frac_digits
(char)127, (char)127, (char)127, (char)127, (char)127, (char)127,
};
char *setlocale(int category, const char *locale) {
(void)category; (void)locale;
return (char *)"C";
}
struct lconv *localeconv(void) {
return &__c_lconv;
}
// ---- signal.h stubs ----
//
// IIgs has no POSIX-style signal model. signal() always fails (returns
// SIG_ERR); raise() returns -1. Code that uses these for diagnostic
// fall-through (e.g. abort -> raise(SIGABRT) -> stub) compiles and
// behaves as "signals disabled".
typedef void (*__sighandler_t)(int);
#define _SIG_ERR ((__sighandler_t)-1)
__sighandler_t signal(int sig, __sighandler_t handler) {
(void)sig; (void)handler;
return _SIG_ERR;
}
int raise(int sig) {
(void)sig;
return -1;
}


@ -60,8 +60,7 @@ __mulhi3:
sta 0xe0 ; multiplier sta 0xe0 ; multiplier
lda 0x4, s lda 0x4, s
sta 0xe2 ; multiplicand sta 0xe2 ; multiplicand
lda #0x0 stz 0xe4 ; running product = 0
sta 0xe4 ; running product
.Lmul_loop: .Lmul_loop:
lda 0xe0 lda 0xe0
beq .Lmul_done beq .Lmul_done
@ -225,12 +224,9 @@ __modhi3:
; Uses JSR/RTS, same bank. ; Uses JSR/RTS, same bank.
; -------------------------------------------------------------------- ; --------------------------------------------------------------------
__divmod_setup: __divmod_setup:
; Sign tracker. We don't have STZ in our instruction set yet, so ; Sign tracker. STZ doesn't touch A — preserves the value
; clear via PHA/LDA #0/STA/PLA to avoid trashing A. ; we still need below.
pha stz 0xee
lda #0x0
sta 0xee
pla
; Dividend sign + abs value. ; Dividend sign + abs value.
cmp #0x8000 cmp #0x8000
bcc .Lset_a_pos bcc .Lset_a_pos
@ -269,9 +265,8 @@ __divmod_setup:
; outputs quotient at $ea, remainder at $ec. JSR/RTS local helper. ; outputs quotient at $ea, remainder at $ec. JSR/RTS local helper.
; -------------------------------------------------------------------- ; --------------------------------------------------------------------
__udivmod_core: __udivmod_core:
lda #0x0 stz 0xea
sta 0xea stz 0xec
sta 0xec
ldx #0x10 ldx #0x10
.Lcore_loop: .Lcore_loop:
asl 0xe6 asl 0xe6
@ -327,9 +322,8 @@ __mulsi3:
lda 0x6, s lda 0x6, s
sta 0xe6 sta 0xe6
; Clear running product at $e8/$ea. ; Clear running product at $e8/$ea.
lda #0x0 stz 0xe8
sta 0xe8 stz 0xea
sta 0xea
; Loop 32 times: examine LSB of multiplier, conditionally add ; Loop 32 times: examine LSB of multiplier, conditionally add
; multiplicand to product, then shift multiplier right and ; multiplicand to product, then shift multiplier right and
; multiplicand left. Use Y as a 16-bit counter (X mode = 16). ; multiplicand left. Use Y as a 16-bit counter (X mode = 16).
@ -456,10 +450,9 @@ __ashrsi3:
; JSR/RTS local helper. ; JSR/RTS local helper.
; -------------------------------------------------------------------- ; --------------------------------------------------------------------
__udivmodsi_core: __udivmodsi_core:
lda #0x0 stz 0xe8
sta 0xe8 stz 0xea
sta 0xea stz 0xec
sta 0xec
sta 0xee stz 0xee
ldy #0x20 ldy #0x20
.Lcoresi_loop: .Lcoresi_loop:
@ -588,11 +581,8 @@ __modsi3:
; (8,s)=b_hi. ; (8,s)=b_hi.
; -------------------------------------------------------------------- ; --------------------------------------------------------------------
__divmodsi_setup: __divmodsi_setup:
; Clear sign tracker. ; Clear sign tracker. STZ preserves A.
pha stz 0xf0
lda #0x0
sta 0xf0
pla
; |a|: A=a_lo, X=a_hi. Save them first (we need a_hi for sign test). ; |a|: A=a_lo, X=a_hi. Save them first (we need a_hi for sign test).
sta 0xe0 ; tentative a_lo (may negate below) sta 0xe0 ; tentative a_lo (may negate below)
stx 0xe2 ; tentative a_hi stx 0xe2 ; tentative a_hi
@ -805,11 +795,10 @@ __ashrdi3:
__muldi3: __muldi3:
jsr __divmoddi4_stash jsr __divmoddi4_stash
; Clear product P0..P3 at $F2..$F8. ; Clear product P0..P3 at $F2..$F8.
lda #0x0 stz 0xf2
sta 0xf2 stz 0xf4
sta 0xf4 stz 0xf6
sta 0xf6 stz 0xf8
sta 0xf8
; Loop 64 times on a's bits. ; Loop 64 times on a's bits.
ldy #0x40 ldy #0x40
.Lmuldi_loop: .Lmuldi_loop:
@ -975,11 +964,10 @@ __umoddi3:
; Output: quotient at $E0..$E6, remainder at $F2..$F8. ; Output: quotient at $E0..$E6, remainder at $F2..$F8.
__udivmoddi_core: __udivmoddi_core:
; Clear remainder $F2..$F8. ; Clear remainder $F2..$F8.
lda #0x0 stz 0xf2
sta 0xf2 stz 0xf4
sta 0xf4 stz 0xf6
sta 0xf6 stz 0xf8
sta 0xf8
ldy #0x40 ldy #0x40
.Ludivmoddi_loop: .Ludivmoddi_loop:
; Shift left: dividend (becomes quotient) and remainder together ; Shift left: dividend (becomes quotient) and remainder together


@ -22,7 +22,12 @@ typedef unsigned char u8;
#define DEXP_SHIFT 52 #define DEXP_SHIFT 52
#define DEXP_BIAS 1023 #define DEXP_BIAS 1023
static inline u64 dpack(u64 sign, s16 exp, u64 mant) { // noinline: keeps register pressure in the callers (esp. __muldf3)
// low enough for greedy regalloc to allocate at -O2. Without this,
// __muldf3 fails with "ran out of registers during register
// allocation" — too many concurrent u64 lifetimes (sa, sb, ma, mb,
// sr, mr) and the dpack inline blew it past the spill capacity.
__attribute__((noinline)) static u64 dpack(u64 sign, s16 exp, u64 mant) {
if (mant == 0) return sign; if (mant == 0) return sign;
u64 e = (u64)(exp + DEXP_BIAS); u64 e = (u64)(exp + DEXP_BIAS);
if (e >= 2047) { if (e >= 2047) {
@ -38,6 +43,11 @@ static inline u64 dpack(u64 sign, s16 exp, u64 mant) {
// Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit. // Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit.
// Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN. // Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN.
// Inlinable on purpose — out_sign/out_exp/out_mant point at caller
// stack locals; if dclass were noinline the writes would lower to
// `sta (d,s),y` which uses DBR for the bank, silently corrupting
// data when the caller has switched DBR. Caught by smoke's
// dmul-after-bank-switch test (#dmul-bank-switch).
static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) { static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) {
*out_sign = x & DSIGN_BIT; *out_sign = x & DSIGN_BIT;
s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF); s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF);


@ -83,7 +83,11 @@ if [ -x "$LLVM_MC" ]; then
sta 0x1000 sta 0x1000
sta 0x010000 sta 0x010000
mvn 0x01, 0x02 mvn 0x01, 0x02
jsl 0x012345' jsl 0x012345
lda 0x123456, x
sta 0xabcdef, x
stz 0xe2
stz 0x1234'
mcOut="$(printf '%s\n' "$mcInput" | "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)" mcOut="$(printf '%s\n' "$mcInput" | "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)"
assertHas() { assertHas() {
@ -103,6 +107,27 @@ if [ -x "$LLVM_MC" ]; then
assertHas "[0x8f,0x00,0x00,0x01]"
assertHas "[0x54,0x01,0x02]"
assertHas "[0x22,0x45,0x23,0x01]"
# abs_long,X (DBR-independent X-indexed access — used by future
# DBR-safe pointer-deref lowering)
assertHas "[0xbf,0x56,0x34,0x12]"
assertHas "[0x9f,0xef,0xcd,0xab]"
# STZ (store zero): 2 bytes for the DP form, vs 5 for a 16-bit
# `LDA #0; STA dp`, when zeroing DP scratch slots (used by the
# upcoming [dp],Y bank-byte invariant for DBR-safe pointer derefs).
assertHas "[0x64,0xe2]"
assertHas "[0x9c,0x34,0x12]"
# WDM / TRB / TSB / PEI — useful 65816 instructions added for
# MAME debug hooks (WDM), atomic memory bit ops on hardware
# registers (TRB/TSB), and indirect data push (PEI).
extOut="$(printf '\twdm 0xab\n\ttrb 0x1234\n\ttsb 0x10\n\tpei 0xe0\n' \
| "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)"
for enc in "[0x42,0xab]" "[0x1c,0x34,0x12]" "[0x04,0x10]" "[0xd4,0xe0]"; do
if ! printf '%s\n' "$extOut" | grep -qF "$enc"; then
warn "missing extended-opcode encoding: $enc"
printf '%s\n' "$extOut" >&2
die "llvm-mc did not produce expected extended-opcode encoding"
fi
done
else
warn "llvm-mc not built; skipping MC round-trip check"
fi
@ -843,6 +868,91 @@ EOF
# function. STA8abs in AsmPrinter must wrap with SEP/REP when
# UsesAcc8 is false; bare `sta g+N` in M=0 writes 2 bytes and
# corrupts the next global.
log "check: clang lowers 'g = 0' to single STZ via AsmPrinter peephole"
cStzFile="$(mktemp --suffix=.c)"
sStzFile="$(mktemp --suffix=.s)"
cat > "$cStzFile" <<'EOF'
unsigned short g;
void zero(void) { g = 0; }
EOF
"$CLANG" --target=w65816 -O2 -S "$cStzFile" -o "$sStzFile"
# Expect a `stz g` and no `lda #0` in the function.
if ! grep -qE '^\s*stz\s+g\b' "$sStzFile"; then
warn "STZ peephole not firing"; cat "$sStzFile" >&2
die "expected 'stz g' in zero() but didn't find it"
fi
if grep -qE '^\s*lda\s+#0x0' "$sStzFile"; then
warn "STZ peephole left a redundant LDA #0"; cat "$sStzFile" >&2
die "STZ peephole should have eliminated the LDA #0"
fi
rm -f "$cStzFile" "$sStzFile"
# Multi-STA-from-shared-LDA: when SDAG CSE shares one `lda #0` across
# multiple `sta`s, the peephole MUST NOT fire on the first STA (would
# delete the LDA, leaving the remaining STAs reading dead A). Verify
# the LDA #0 is preserved and no STZ appears in this case.
log "check: STZ peephole skips when LDA #0 feeds multiple STAs"
cStzMultiFile="$(mktemp --suffix=.c)"
sStzMultiFile="$(mktemp --suffix=.s)"
cat > "$cStzMultiFile" <<'EOF'
unsigned short ga, gb, gc;
void zeroAll(void) { ga = 0; gb = 0; gc = 0; }
EOF
"$CLANG" --target=w65816 -O2 -S "$cStzMultiFile" -o "$sStzMultiFile"
if ! grep -qE '^\s*lda\s+#0x0' "$sStzMultiFile"; then
warn "STZ peephole over-eagerly deleted shared LDA #0"
cat "$sStzMultiFile" >&2
die "expected lda #0 preserved when feeding multiple STAs"
fi
n_sta=$(grep -cE '^\s*sta\s+g[abc]\b' "$sStzMultiFile")
if [ "$n_sta" -ne 3 ]; then
warn "expected 3 STA after shared LDA #0, found $n_sta"
cat "$sStzMultiFile" >&2
die "STZ peephole regressed on multi-STA case"
fi
rm -f "$cStzMultiFile" "$sStzMultiFile"
log "check: clang lowers 'foo(1,2,3)' constant args via PEA"
cPeaFile="$(mktemp --suffix=.c)"
sPeaFile="$(mktemp --suffix=.s)"
cat > "$cPeaFile" <<'EOF'
extern void foo(int a, int b, int c);
void caller(void) { foo(1, 2, 3); }
EOF
"$CLANG" --target=w65816 -O2 -S "$cPeaFile" -o "$sPeaFile"
# arg2 (c=3) and arg1 (b=2) are pushed via PEA. arg0 (a=1)
# stays in A. Expect at least 2 `pea` instructions and zero
# `pha` after a `lda #imm`.
n_pea=$(grep -cE '^\s*pea\s+' "$sPeaFile")
if [ "$n_pea" -lt 2 ]; then
warn "PEA peephole not firing on constant-arg pushes"
cat "$sPeaFile" >&2
die "expected >= 2 PEA in caller() but found $n_pea"
fi
rm -f "$cPeaFile" "$sPeaFile"
# PEI peephole: an i64-libcall return whose high half lives in
# DPF0 ($F0..$F1) is forwarded to the next call as a stacked arg.
# Pre-peephole shape: `lda $f0; pha`. Post-peephole: `pei $f0`,
# saving 1 byte and not touching A.
log "check: clang lowers DPF0 forwarding via PEI"
cPeiFile="$(mktemp --suffix=.c)"
sPeiFile="$(mktemp --suffix=.s)"
cat > "$cPeiFile" <<'EOF'
unsigned long long divmod(unsigned long long a, unsigned long long b);
unsigned long long use(unsigned long long x);
unsigned long long chain(unsigned long long a, unsigned long long b) {
return use(divmod(a, b));
}
EOF
"$CLANG" --target=w65816 -O2 -S "$cPeiFile" -o "$sPeiFile"
if ! grep -qE '^\s*pei\s+0xf0\b' "$sPeiFile"; then
warn "PEI peephole not firing on DPF0 forwarding"
cat "$sPeiFile" >&2
die "expected 'pei 0xf0' in chain() but didn't find it"
fi
rm -f "$cPeiFile" "$sPeiFile"
log "check: clang i8 store to global in M=0 mode is SEP/REP bracketed"
cGlobFile="$(mktemp --suffix=.c)"
sGlobFile="$(mktemp --suffix=.s)"
@ -1104,9 +1214,6 @@ int toInt(double x) { return (int)x; }
double fromInt(int n) { return (double)n; }
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblFile" -o "$oDblFile"
# softDouble.c builds at -O1 (not -O2): __muldf3's 64x64 -> 128
# multiply with the inlined alignment shifts overflows the greedy
# allocator's spill heuristics on the single-A target at -O2.
"$CLANG" --target=w65816 -O1 -ffunction-sections \
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdFile"
"$PROJECT_ROOT/tools/link816" -o "$binDblFile" \
@ -2176,7 +2283,12 @@ int main(void) {
if (r == 10 && eq(buf, "n=-42 s=hi")) ok |= 0x02;
r = sprintf(buf, "%04x %lu", 0xC, (unsigned long)123456);
if (r == 11 && eq(buf, "000c 123456")) ok |= 0x04;
r = snprintf(buf, 6, "abcdefghij");
/* Test that snprintf truncates per C99: 10 chars asked, only 5 fit + NUL.
Funnel the format through a non-literal pointer so clang's
-Wformat-truncation static check doesn't fire (the truncation IS
what we're testing). */
const char *fmt_trunc = "abcdefghij";
r = snprintf(buf, 6, "%s", fmt_trunc);
if (r == 10 && eq(buf, "abcde")) ok |= 0x08;
r = sprintf(buf, "%.2f", 1.5);
if (r == 4 && eq(buf, "1.50")) ok |= 0x10;
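The truncation rule this test relies on can be shown in isolation. The sketch below is a host-side illustration, not part of the smoke suite; `TruncResult` and `truncateDemo` are hypothetical names. It assumes a conforming C99 `snprintf`, which returns the length the full output would have had and writes at most `size - 1` characters plus the terminating NUL.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical holder for the return value and the 6-byte destination.
struct TruncResult {
  int ret;
  char buf[6];
};

// Formats `src` through "%s" into a 6-byte buffer, as the smoke test does.
inline TruncResult truncateDemo(const char *src) {
  TruncResult out{};
  out.ret = std::snprintf(out.buf, sizeof(out.buf), "%s", src);
  return out;
}
```

With a 10-character source string, `ret` reports the untruncated length while `buf` holds only the first five characters.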
@ -2302,6 +2414,46 @@ EOF
fi
rm -f "$cExFile" "$oExFile" "$binExFile"
log "check: MAME runs rand/srand reproducible sequence (#93)"
cRdFile="$(mktemp --suffix=.c)"
oRdFile="$(mktemp --suffix=.o)"
binRdFile="$(mktemp --suffix=.bin)"
cat > "$cRdFile" <<'EOF'
extern int rand(void);
extern void srand(unsigned int);
__attribute__((noinline)) void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
int main(void) {
srand(1);
int r1 = rand();
int r2 = rand();
int r3 = rand();
// Same seed: must reproduce.
srand(1);
int r1b = rand();
int r2b = rand();
unsigned char ok = 0;
if (r1 != 0 && r1 == r1b) ok |= 0x01; // reproducible
if (r2 != 0 && r2 == r2b) ok |= 0x02; // reproducible
if (r1 != r2 && r2 != r3) ok |= 0x04; // diverse
if (r1 >= 0 && r1 <= 0x7FFF) ok |= 0x08; // RAND_MAX bound
switchToBank2();
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
while (1) {}
}
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
"$cRdFile" -o "$oRdFile"
"$PROJECT_ROOT/tools/link816" -o "$binRdFile" --text-base 0x1000 \
"$oCrt0F" "$oLibcF" "$oExtrasF" "$oSfF" "$oSdF" "$oLibgccFile" \
"$oRdFile" >/dev/null 2>&1
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binRdFile" --check \
0x025000=000f >/dev/null 2>&1; then
die "MAME: rand/srand sequence broken"
fi
rm -f "$cRdFile" "$oRdFile" "$binRdFile"
log "check: MAME runs atan/asin/acos/sinh/cosh/tanh (#85)"
cTr2File="$(mktemp --suffix=.c)"
oTr2File="$(mktemp --suffix=.o)"
@ -3166,6 +3318,42 @@ EOF
die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
fi
# stdint.h / stddef.h / limits.h / inttypes.h: standalone
# replacements for clang's bundled versions (which try to include
# glibc bits/* headers and break the build). Compile a small
# program that exercises the typedefs and the offsetof macro.
log "check: standalone runtime headers (stdint/stddef/limits/inttypes/locale/signal)"
cStdiFile="$(mktemp --suffix=.c)"
sStdiFile="$(mktemp --suffix=.s)"
cat > "$cStdiFile" <<'EOF'
#include <stdint.h>
#include <stddef.h>
#include <limits.h>
#include <inttypes.h>
#include <locale.h>
#include <signal.h>
struct S { uint8_t a; uint16_t b; uint32_t c; uint64_t d; };
int main(void) {
/* Touch typedefs / functions from each header so the build
catches missing symbols, not just absent files. */
uint64_t big = UINT64_C(0xDEADBEEFCAFE);
intmax_t maxv = INTMAX_MAX;
int i_max = INT_MAX;
size_t off = offsetof(struct S, c);
char *loc = setlocale(LC_ALL, "C");
struct lconv *lc = localeconv();
signal(SIGINT, SIG_IGN);
return (int)(off + i_max + (int)(big & 1) + (int)(maxv & 1)
+ (loc[0] - 'C') + lc->frac_digits);
}
EOF
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
-S "$cStdiFile" -o "$sStdiFile"
if [ ! -s "$sStdiFile" ]; then
die "standalone runtime headers compile failed"
fi
rm -f "$cStdiFile" "$sStdiFile"
# Linker exports the synthetic __bss_start / __bss_end / etc.
# symbols so crt0 can do BSS init and runtime malloc finds the
# heap top.
@ -3269,6 +3457,26 @@ EOF
fi
rm -f "$cBigFile" "$oBigFile" "$binBssAutoFile" "$mapBssAutoFile"
log "check: link816 hard-fails when BSS would exceed LC1 ceiling (\$E000)"
# Force BSS to land past $E000 — link must reject with the LC1
# ceiling diagnostic (without crt0's LC2 RAM enable, that range
# silently corrupts).
cBigFile="$(mktemp --suffix=.c)"
oBigFile="$(mktemp --suffix=.o)"
binBssOFile="$(mktemp --suffix=.bin)"
cat > "$cBigFile" <<'EOF'
int main(void) { return 0; }
EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBigFile" -o "$oBigFile"
if "$PROJECT_ROOT/tools/link816" -o "$binBssOFile" --text-base 0x1000 \
--bss-base 0xE100 "$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err; then
die "link816 should have rejected --bss-base 0xE100 (above LC1 ceiling)"
fi
if ! grep -q 'exceeds bank-0 LC1 ceiling' /tmp/bsslink.err; then
die "link816 LC1-ceiling diagnostic missing: $(cat /tmp/bsslink.err)"
fi
rm -f "$cBigFile" "$oBigFile" "$binBssOFile" /tmp/bsslink.err
# OMF emitter: wrap the linked binary as a single-segment OMF
# file ready for IIgs loading.
log "check: omfEmit produces a valid OMF v2.1 single-segment file"


@ -484,20 +484,37 @@ struct Linker {
// overflow the 0x2000 bss start, shift bss above them so
// crt0's bss-init doesn't zero loaded text bytes. Caller
// can still force a specific bssBase via --bss-base.
// The IIgs IO window at $C000-$CFFF is unusable; if loadEnd
// would push bss into IO, jump above it to bank 1 ($10000+).
//
// IIgs bank-0 hazard zones:
// $C000-$CFFF: IO and soft switches (ALWAYS unusable —
// reads/writes hit hardware registers).
// $D000-$DFFF: Language Card 1 area. Read-only ROM by
// default; crt0 enables LC1 RAM via the
// $C083 soft switch (read-twice trick) so
// BSS placed here is writable.
// $E000-$FFFF: bank-0 ROM area, also LC-switched but
// we don't enable it (less common need).
// Skip past the IO window if BSS would land there; LC1
// ($D000-$DFFF) IS now usable thanks to crt0's soft-switch
// enable. BSS that runs past $DFFF would spill into the
// $E000-$FFFF area, which crt0 does not enable, so bail
// clearly rather than silently corrupt.
uint32_t loadEnd = L.initBase + L.initSize;
L.bssBase = bssBase;
if (L.bssBase < loadEnd) {
// Page-align upward for nicer addresses in the map.
L.bssBase = (loadEnd + 0xFF) & ~0xFFu;
// If bss would land in the IIgs IO window ($C000-$CFFF),
// skip past it to $D000. bss reads/writes via DBR=0
// would be intercepted by IO if we placed it there.
if (L.bssBase >= 0xC000 && L.bssBase < 0xD000) {
L.bssBase = 0xD000;
}
}
if (L.bssBase + L.bssSize > 0xE000) {
char msg[160];
std::snprintf(msg, sizeof(msg),
"bss [0x%X+%u] exceeds bank-0 LC1 ceiling 0xE000 — "
"shrink the runtime or split into bank 1",
L.bssBase, L.bssSize);
die(msg);
}
// Publish layout now so resolveSym() can read it during reloc
// application (it's a const member that uses lastLayout).
lastLayout = L;
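The placement rules in this hunk can be read as one small pure function. The sketch below is a standalone model rather than link816's actual code: `placeBss` and the `k...` constants are hypothetical names. It assumes only what the hunk shows: 0x100 page alignment when the requested base overlaps loaded data, a hop over $C000-$CFFF in that same case, and a hard failure once the end of BSS passes 0xE000.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical constants mirroring the hazard-zone comment in the hunk.
constexpr uint32_t kIoStart = 0xC000;    // IO and soft switches begin here
constexpr uint32_t kLc1Start = 0xD000;   // LC1 RAM, enabled by crt0
constexpr uint32_t kLc1Ceiling = 0xE000; // BSS must end at or below this

// Returns the chosen BSS base, or std::nullopt when the link should fail.
inline std::optional<uint32_t> placeBss(uint32_t loadEnd,
                                        uint32_t requestedBase,
                                        uint32_t bssSize) {
  uint32_t base = requestedBase;
  if (base < loadEnd) {
    base = (loadEnd + 0xFF) & ~0xFFu; // page-align upward
    if (base >= kIoStart && base < kLc1Start)
      base = kLc1Start;               // skip the IO window
  }
  if (base + bssSize > kLc1Ceiling)
    return std::nullopt;              // would exceed the LC1 ceiling
  return base;
}
```

As in the hunk, a base forced with `--bss-base` is only moved when it overlaps loaded data, so the smoke test's `--bss-base 0xE100` case falls straight through to the ceiling check and is rejected.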


@ -41,12 +41,18 @@ public:
// Reset per-function state (defensive: SkipNextSepImm should
// already be cleared by the next emitInstruction, but guarantee
// it's not leaked across functions if a function ends mid-elision).
void emitFunctionBodyEnd() override { SkipNextSepImm = -1; }
void emitFunctionBodyEnd() override {
SkipNextSepImm = -1;
SkipNextStaAbs = false;
SkipNextPush16 = false;
}
// Reset on MBB entry too: labels emit before the MIs of a new MBB,
// and a stale flag from a previous MBB's last LDAi8imm could
// accidentally swallow the new MBB's first SEP.
void emitBasicBlockStart(const MachineBasicBlock &MBB) override {
SkipNextSepImm = -1;
SkipNextStaAbs = false;
SkipNextPush16 = false;
AsmPrinter::emitBasicBlockStart(MBB);
}
@ -57,6 +63,14 @@ public:
// already left M=8, so the wrap's SEP would be a no-op.
int SkipNextSepImm = -1;
// When true, the next STAabs is consumed (already replaced with STZ
// by the LDAi16imm-0 peephole).
bool SkipNextStaAbs = false;
// When true, the next PUSH16 is consumed (already replaced with PEA
// by the LDAi16imm + PUSH16 peephole).
bool SkipNextPush16 = false;
static char ID;
};
@ -108,6 +122,26 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
SkipNextSepImm = -1;
}
// Drop the STAabs that the LDAi16imm-0 peephole replaced with STZ.
if (SkipNextStaAbs && !MI->isDebugInstr()) {
if (MI->getOpcode() == W65816::STAabs) {
SkipNextStaAbs = false;
return; // consume, already emitted as STZ
}
// Anything other than the expected STAabs cancels the elision.
SkipNextStaAbs = false;
}
// Drop the PUSH16 that the LDAi16imm + PUSH16 peephole replaced
// with PEA.
if (SkipNextPush16 && !MI->isDebugInstr()) {
if (MI->getOpcode() == W65816::PUSH16) {
SkipNextPush16 = false;
return; // consume, already emitted as PEA
}
SkipNextPush16 = false;
}
W65816MCInstLower MCInstLowering(OutContext, *this);
// Expand codegen pseudos into their MC-layer realisations. Keep this
@ -193,6 +227,70 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
return;
}
case W65816::LDAi16imm: {
// Peek at the next non-debug MI for two peephole patterns:
// (1) LDAi16imm 0 + STAabs $g -> STZ_Abs $g (saves 3B)
// (2) LDAi16imm $imm + PUSH16 -> PEA #$imm (saves 1B)
// Both replace the LDA+next pair with a single 3-byte op
// that achieves the same memory effect.
auto It = std::next(MI->getIterator());
while (It != MI->getParent()->end() && It->isDebugInstr()) ++It;
bool IsZero = MI->getOperand(1).isImm() &&
MI->getOperand(1).getImm() == 0;
// STAabs operand layout: (value:Acc16, addr). Operand 0 is the A
// register; only fire when this STA kills A — otherwise dropping
// the LDA leaves a later A-consumer without its value. E.g. SDAG
// CSE'd `g16 = 0; g32 = 0;` shares one LDAi16imm 0 across multiple
// STAabs; only the LAST one kills A, so the peephole would have
// miscompiled the earlier two by deleting the LDA but not the
// remaining STAs.
if (IsZero && It != MI->getParent()->end() &&
It->getOpcode() == W65816::STAabs &&
It->getOperand(0).isReg() && It->getOperand(0).isKill()) {
MCInst Stz;
Stz.setOpcode(W65816::STZ_Abs);
Stz.addOperand(lowerOperand(It->getOperand(1), MCInstLowering));
EmitToStreamer(*OutStreamer, Stz);
SkipNextStaAbs = true;
return;
}
// PEA peephole: LDAi16imm + PUSH16 -> PEA. Safe iff A is dead
// after the PUSH16 — the next instruction must redefine A (so the
// value PUSH16 read is genuinely dead). We use modifiesRegister
// which handles both explicit defs and implicit-defs (e.g. JSL
// clobbers A as part of the calling convention). Falls through
// to a normal LDA #imm; PHA pair if A might be live afterward.
// Note: physreg use-kill flags on PUSH16's implicit-$a are not
// reliably set at AsmPrinter time, so we can't gate on them
// directly; checking the next instruction's def-set is robust.
if (It != MI->getParent()->end() && It->getOpcode() == W65816::PUSH16) {
auto It2 = std::next(It);
while (It2 != MI->getParent()->end() && It2->isDebugInstr()) ++It2;
bool ADead = false;
if (It2 != MI->getParent()->end()) {
const TargetRegisterInfo *TRI =
MI->getParent()->getParent()->getSubtarget().getRegisterInfo();
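// Require a pure redefinition: an op that also reads A (a
// read-modify-write, or a call taking an argument in A) still
// needs the value the LDA would have loaded.
if (It2->modifiesRegister(W65816::A, TRI) &&
!It2->readsRegister(W65816::A, TRI))
ADead = true;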
} else {
// PUSH16 is the last instruction in the BB. A would be dead
// here only if no successor lists it as live-in, but that is
// not checked: conservatively treat A as live and skip the
// peephole. This case is uncommon because the PUSH chain
// almost always feeds a JSL within the same BB.
}
if (ADead) {
MCInst Pea;
Pea.setOpcode(W65816::PEA);
Pea.addOperand(lowerOperand(MI->getOperand(1), MCInstLowering));
EmitToStreamer(*OutStreamer, Pea);
SkipNextPush16 = true;
return;
}
}
MCInst Lda;
Lda.setOpcode(W65816::LDA_Imm16);
Lda.addOperand(lowerOperand(MI->getOperand(1), MCInstLowering));
@ -252,6 +350,40 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
EmitToStreamer(*OutStreamer, Sta);
return;
}
case W65816::LDA_DP: {
// PEI peephole: LDA_DP $dp + PUSH16 -> PEI $dp. PEI pushes the
// 16-bit value at direct page $dp directly onto the stack without
// touching A, saving 1 byte (PEI=2B vs LDA_DP+PHA=3B). Safe iff
// A is dead after the PUSH16, same as the LDAi16imm+PUSH16
// peephole. Common case: i64-libcall-return forwarding —
// copyPhysReg(A=DPF0) emits LDA $F0; the next op is PUSH16 to
// forward the i64 high-half into a downstream call's args; the
// chained call's first op then redefines A.
auto It = std::next(MI->getIterator());
while (It != MI->getParent()->end() && It->isDebugInstr()) ++It;
if (It != MI->getParent()->end() &&
It->getOpcode() == W65816::PUSH16) {
auto It2 = std::next(It);
while (It2 != MI->getParent()->end() && It2->isDebugInstr()) ++It2;
bool ADead = false;
if (It2 != MI->getParent()->end()) {
const TargetRegisterInfo *TRI =
MI->getParent()->getParent()->getSubtarget().getRegisterInfo();
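// Require a pure redefinition: an op that also reads A still
// needs the value the LDA would have loaded.
if (It2->modifiesRegister(W65816::A, TRI) &&
!It2->readsRegister(W65816::A, TRI))
ADead = true;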
}
if (ADead) {
MCInst Pei;
Pei.setOpcode(W65816::PEI_DP);
Pei.addOperand(lowerOperand(MI->getOperand(0), MCInstLowering));
EmitToStreamer(*OutStreamer, Pei);
SkipNextPush16 = true;
return;
}
}
// Fall through to default emit (no peephole opportunity).
break;
}
case W65816::ADCi16imm:
case W65816::SBCi16imm: {
bool IsSub = MI->getOpcode() == W65816::SBCi16imm;


@ -27,6 +27,19 @@ using namespace llvm;
// (The pure-i8-detection helpers were removed when the prologue went
// to "always 16-bit M". See emitPrologue comment.)
//
// (DBR-zero wrap was prototyped here — PHB at function entry to save
// caller's DBR, set DBR=0, restore at exit. Two issues blocked it:
// (a) saving DBR to a DP slot ($F2/$F3) conflicts with libgcc's
// muldi3/divdi3 scratch — those routines use $F2..$F8 freely, so
// the saved DBR doesn't survive a libcall in the function body.
// (b) saving via PHB shifts SP, which means LowerFormalArguments
// would need to bump every arg's StackOffset by 1 — but at
// LowerFormalArguments time we don't know yet whether the function
// will need the wrap (indirect-Y emission is a later lowering
// choice). Right approach is a per-function attribute the user
// opts into, plus PrologEpilogInserter integration to add a
// fixed-size "saved DBR" slot. Deferred; see STATUS.md.)
W65816FrameLowering::W65816FrameLowering(const W65816Subtarget &STI)
: TargetFrameLowering(TargetFrameLowering::StackGrowsDown, Align(1), 0,
@ -153,8 +166,6 @@ void W65816FrameLowering::emitEpilogue(MachineFunction &MF,
// before the RTL.
uint64_t StackSize = MF.getFrameInfo().getStackSize();
bool HasVLA = MF.getFrameInfo().hasVarSizedObjects();
if (StackSize == 0 && !HasVLA)
return;
const W65816Subtarget &STI = MF.getSubtarget<W65816Subtarget>();
const W65816InstrInfo &TII = *STI.getInstrInfo();
@ -162,6 +173,9 @@ void W65816FrameLowering::emitEpilogue(MachineFunction &MF,
// Insert before the terminator (the return).
DebugLoc DL = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc();
if (StackSize == 0 && !HasVLA)
return;
// Detect whether the return live-out includes Y or X: for i64 returns
// (Outs[0..2] -> A,X,Y), Y holds bits 32-47 and X holds bits 16-31, so
// any TAY/PLY/TAX in the SP-restore would corrupt the return value.


@ -210,6 +210,19 @@ class InstAbsLong<bits<8> op, string mnem>
let Inst{31-8} = addr;
}
// Absolute Long Indexed X. EA = addr_long + X. The bank comes from
// the operand's high byte, NOT from DBR; useful for accessing data
// in a known bank (typically bank 0) regardless of the caller's DBR
// state. 4-byte instruction.
class InstAbsLongX<bits<8> op, string mnem>
: W65816Inst<(outs), (ins addrLong:$addr), !strconcat(mnem, "\t$addr, x")> {
let Size = 4;
bits<24> addr;
bits<32> Inst;
let Inst{7-0} = op;
let Inst{31-8} = addr;
}
class InstAbsX<bits<8> op, string mnem>
: W65816Inst<(outs), (ins addrAbs:$addr), !strconcat(mnem, "\t$addr, x")> {
let Size = 3;


@ -1072,6 +1072,26 @@ def XBA : InstImplied<0xEB, "xba"> { let mayLoad = 0; let mayStore = 0; }
def WAI : InstImplied<0xCB, "wai">;
def STP : InstImplied<0xDB, "stp">;
// WDM (William D. Mensch): reserved 2-byte NOP-equivalent. Useful as
// a debugger / emulator hook: MAME's apple2gs CPU traps on WDM and a
// Lua plugin can dispatch on the operand byte. CPU-side, it acts as
// a 2-byte NOP. Operand syntax mirrors MVN: `wdm $ab` (no `#`).
def WDM : InstDP<0x42, "wdm">;
// TRB / TSB: Test and Reset/Set memory Bits. Atomic bit clear/set
// on a byte (or 16-bit word per M flag) at the given DP or abs
// address. Z flag set per (M & A) where M is the memory operand.
// Useful for memory-mapped IO bit twiddling. No DP indexing form.
def TRB_DP : InstDP<0x14, "trb">;
def TRB_Abs : InstAbs<0x1C, "trb">;
def TSB_DP : InstDP<0x04, "tsb">;
def TSB_Abs : InstAbs<0x0C, "tsb">;
// PEI: Push Effective Indirect. Reads a 16-bit value from DP and
// pushes it. Useful for indirect parameter passing without going
// through A first.
def PEI_DP : InstDP<0xD4, "pei">;
//---------------------------------------------------------------- LDA (load A)
// The `_Imm8` forms of the mode-dependent load/arith/compare ops are
// marked isCodeGenOnly so the asm matcher never picks them; our
@ -1091,6 +1111,7 @@ def LDA_DPIndY : InstDPIndY<0xB1, "lda">;
def LDA_DPIndX : InstDPIndX<0xA1, "lda">;
def LDA_DPIndLong : InstDPIndLong <0xA7, "lda">;
def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda">;
def LDA_LongX : InstAbsLongX<0xBF, "lda">;
//---------------------------------------------------------------- STA (store A)
def STA_DP : InstDP<0x85, "sta">;
@ -1104,6 +1125,7 @@ def STA_DPIndY : InstDPIndY<0x91, "sta">;
def STA_DPIndX : InstDPIndX<0x81, "sta">;
def STA_DPIndLong : InstDPIndLong <0x87, "sta">;
def STA_DPIndLongY : InstDPIndLongY<0x97, "sta">;
def STA_LongX : InstAbsLongX<0x9F, "sta">;
//---------------------------------------------------------------- LDX (load X)
def LDX_Imm8 : InstImm8<0xA2, "ldx"> { let XHigh = 1; let DecoderNamespace = "W65816XHigh"; let isCodeGenOnly = 1; let Defs = [X]; }
@ -1131,6 +1153,14 @@ def STY_DP : InstDP<0x84, "sty">;
def STY_Abs : InstAbs<0x8C, "sty">;
def STY_DPX : InstDPX<0x94, "sty">;
//---------------------------------------------------------------- STZ (store zero)
// Width follows M flag, same as STA. Useful for zeroing DP scratch
// without burning A. Saves 3 bytes vs a 16-bit `LDA #0; STA dp`
// (2 bytes with M=1) per zero.
def STZ_DP : InstDP<0x64, "stz">;
def STZ_Abs : InstAbs<0x9C, "stz">;
def STZ_DPX : InstDPX<0x74, "stz">;
def STZ_AbsX : InstAbsX<0x9E, "stz">;
//------------------------------------------------------------------------- ADC
def ADC_Imm8 : InstImm8<0x69, "adc"> { let MHigh = 1; let DecoderNamespace = "W65816MHigh"; let isCodeGenOnly = 1; }
def ADC_Imm16 : InstImm16<0x69, "adc"> { let MLow = 1; }
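The TRB / TSB comment in this hunk describes a read-modify-write on memory whose Z flag reflects `A & memory`. A small model of the 16-bit (M=0) behaviour may make that concrete; `BitOpResult`, `tsb16`, and `trb16` are hypothetical names for illustration, and the model covers only the memory result and the Z flag.

```cpp
#include <cassert>
#include <cstdint>

struct BitOpResult {
  uint16_t mem; // value written back to memory
  bool z;       // Z flag: set when (A & original memory) == 0
};

// TSB: set in memory every bit that is set in A.
constexpr BitOpResult tsb16(uint16_t a, uint16_t mem) {
  return {static_cast<uint16_t>(mem | a), (mem & a) == 0};
}

// TRB: clear in memory every bit that is set in A.
constexpr BitOpResult trb16(uint16_t a, uint16_t mem) {
  return {static_cast<uint16_t>(mem & ~a), (mem & a) == 0};
}
```

In both cases Z is computed from the original memory value, before the write-back, so it tells the caller whether any of the requested bits were already set.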


@ -268,13 +268,24 @@ bool W65816RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
TII.get(NewOpc)).addImm(Offset);
switch (NewOpc) {
case W65816::LDA_StackRel:
case W65816::LDA_StackRelIndY:
Builder.addReg(W65816::A, RegState::ImplicitDefine);
break;
case W65816::LDA_StackRelIndY:
// Indirect-Y: A def + Y use. The Y use is critical — without it,
// post-RA passes can reorder a Y-defining op past us, leaving the
// load reading at (ptr + stale_Y). Caught when modelling the dep
// for the (sr,s),Y bank-wrap workaround in W65816NegYIndY.
Builder.addReg(W65816::A, RegState::ImplicitDefine)
.addReg(W65816::Y, RegState::Implicit);
break;
case W65816::STA_StackRel:
case W65816::STA_StackRelIndY:
Builder.addReg(W65816::A, RegState::Implicit);
break;
case W65816::STA_StackRelIndY:
// Indirect-Y store: A use + Y use (same Y reasoning as above).
Builder.addReg(W65816::A, RegState::Implicit)
.addReg(W65816::Y, RegState::Implicit);
break;
case W65816::ADC_StackRel:
case W65816::SBC_StackRel:
Builder.addReg(W65816::A, RegState::Implicit)


@ -1295,6 +1295,32 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
// path (i32 (lo|hi) == 0): the OR sets Z, then the SETCC compares
// against 0. The second compare is provably redundant because $a
// hasn't changed since the previous flag-defining op.
// Intra-MBB only — cross-MBB recursion into predecessors was tried
// (catches SETCC merge blocks where each pred ends with `lda #c`)
// but proved too brittle: predecessors ending with JSLpseudo declare
// implicit-def $a but the return-value flags aren't reliably set,
// and other corner cases break smoke.
auto isATransparent = [](const MachineInstr &MI) {
// Stores that leave A and the N/Z flags untouched.
return MI.getOpcode() == W65816::STAfi ||
MI.getOpcode() == W65816::STAfi_indY ||
MI.getOpcode() == W65816::STA8fi;
};
// Returns true iff walking back from `Start` (exclusive) finds an
// A-modifier as the first non-skip op. Skips debug ops and
// A-transparent stores; stops at the first real op. Templated to
// accept either iterator or const_iterator (Cmps came from a non-
// const iteration; predecessors are walked via const_iterator).
auto walkbackBefore = [&](auto Start, auto Begin) -> bool {
auto It = Start;
while (It != Begin) {
--It;
if (It->isDebugInstr()) continue;
if (isATransparent(*It)) continue;
return It->modifiesRegister(W65816::A, TRI);
}
return false;
};
for (MachineBasicBlock &MBB : MF) {
SmallVector<MachineInstr *, 8> Cmps;
for (MachineInstr &MI : MBB)
@ -1308,27 +1334,7 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
!Cmp->getOperand(1).isImm() ||
Cmp->getOperand(1).getImm() != 0)
continue;
// Walk back across debug ops to find the immediately-prior real
bool Found = walkbackBefore(Cmp->getIterator(), MBB.begin());
// instruction. If it modifies $a (i.e. it's an A-defining op
// that ALSO sets N/Z — true for every A-write op on the 65816
// except the no-op TSC variants), the CMP is redundant.
auto PrevIt = Cmp->getIterator();
bool Found = false;
while (PrevIt != MBB.begin()) {
--PrevIt;
if (PrevIt->isDebugInstr()) continue;
// Stores don't change $a — skip and keep walking back. This
// pass runs pre-PEI, so the skip-list uses the *pseudo* opcodes
// (STAfi / STAfi_indY / STA8fi); their post-PEI MC counterparts
// never appear here. STA8fi flips M via SEP/REP (Defs=[P]) but
// doesn't touch A or N/Z, so it's transparent for this CMP.
if (PrevIt->getOpcode() == W65816::STAfi ||
PrevIt->getOpcode() == W65816::STAfi_indY ||
PrevIt->getOpcode() == W65816::STA8fi)
continue;
Found = PrevIt->modifiesRegister(W65816::A, TRI);
break;
}
if (Found) {
Cmp->eraseFromParent();
Changed = true;