Checkpoint
parent
18ef7e1fa6
commit
d6a34075a5
25 changed files with 1042 additions and 117 deletions
134
STATUS.md
|
|
@ -72,9 +72,11 @@ which runs correctly under MAME (apple2gs).
|
||||||
native object format) for round-tripping with classic dev tools.
|
native object format) for round-tripping with classic dev tools.
|
||||||
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
|
- `runtime/build.sh` builds crt0, libc, soft-float, soft-double,
|
||||||
libgcc into linkable objects.
|
libgcc into linkable objects.
|
||||||
- `scripts/smokeTest.sh` runs 92 end-to-end checks (scalar ops,
|
- `scripts/smokeTest.sh` runs 99 end-to-end checks (scalar ops,
|
||||||
control flow, calling conventions, MAME execution, regressions,
|
control flow, calling conventions, MAME execution, regressions,
|
||||||
link816 bss-base safety, iigs/toolbox.h compile-check).
|
link816 bss-base safety, iigs/toolbox.h compile-check, standalone
|
||||||
|
runtime headers, AsmPrinter peepholes for STZ / PEA / PEI —
|
||||||
|
single-STA, shared-LDA-multi-STA, and DPF0-forwarding cases).
|
||||||
Currently 100% pass at -O2 throughout.
|
Currently 100% pass at -O2 throughout.
|
||||||
|
|
||||||
**ABI:**
|
**ABI:**
|
||||||
|
|
@ -152,13 +154,11 @@ end-to-end in MAME.
|
||||||
`strtok` / `strtok_r` live in their own TU at `-O2` (with
|
`strtok` / `strtok_r` live in their own TU at `-O2` (with
|
||||||
`__attribute__((noinline))` on `strtok_r` so the strtok() wrapper
|
`__attribute__((noinline))` on `strtok_r` so the strtok() wrapper
|
||||||
doesn't duplicate it). Multi-call strtok over "a,b,,c" works
|
doesn't duplicate it). Multi-call strtok over "a,b,,c" works
|
||||||
end-to-end in smoke. Latent backend issue: at certain rodata
|
end-to-end in smoke. The layout-sensitive miscompile that
|
||||||
layouts, -O2 strtok_r's BB0_7 inner CMP loop miscompiles due to
|
previously haunted strtok_r's inner CMP loop has been fixed by
|
||||||
LICM/sink interaction; current smoke layout passes but adding
|
modelling `Uses=[P]` on the conditional branches (the LICM/sink
|
||||||
bytes upstream (e.g. growing softDouble.o) can shift delim into
|
interaction that elided "redundant" CMPs no longer fires); no
|
||||||
a failing address. Surgical workaround `-mllvm -disable-machine-
|
surgical workaround flags needed.
|
||||||
sink` on strtok.c is documented; not currently applied because
|
|
||||||
smoke is green.
|
|
||||||
|
|
||||||
A small **RPN calculator** test (smoke #87) chains strtok, atol,
|
A small **RPN calculator** test (smoke #87) chains strtok, atol,
|
||||||
push/pop over a static stack, snprintf "%ld", and strcmp to verify
|
push/pop over a static stack, snprintf "%ld", and strcmp to verify
|
||||||
|
|
@ -203,20 +203,6 @@ sidecar bytes.
|
||||||
|
|
||||||
## Known issues / workarounds
|
## Known issues / workarounds
|
||||||
|
|
||||||
- **#70 FIXED**: greedy regalloc + W65816StackSlotCleanup Pass -2
|
|
||||||
was deleting an entry-side store to a slot that the loop body
|
|
||||||
read. Pass -2 collapses `LDAfi slotA; STAfi slotB; LDAfi slotC;
|
|
||||||
OPfi slotB` into `LDAfi slotC; OPfi slotA` (memory-to-memory copy
|
|
||||||
through A elimination), but didn't check whether slotB had other
|
|
||||||
refs in the function. In iterative qsort, slotB happened to be
|
|
||||||
the spill home for `hi` — the Pass -2 transform deleted the only
|
|
||||||
initialiser, leaving the loop body's `lda <hi-slot>, s` reading
|
|
||||||
garbage. Fix: function-wide `slotHasOtherRefs` safety check
|
|
||||||
before erasing the spill. `softDouble.c` still uses
|
|
||||||
`-mllvm -regalloc=fast` for `__muldf3`'s 64×64→128 multiply
|
|
||||||
(different greedy bug — register-pressure-driven, not
|
|
||||||
spill-deletion-driven).
|
|
||||||
|
|
||||||
- **(d,s),y / (sr,s),y addressing wraps the bank** when Y is
|
- **(d,s),y / (sr,s),y addressing wraps the bank** when Y is
|
||||||
negative as 16-bit unsigned. Worked around by `W65816NegYIndY`
|
negative as 16-bit unsigned. Worked around by `W65816NegYIndY`
|
||||||
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
|
rewriting the affected ops to `TAX ; LDA/STA $0000,X`. Stays
|
||||||
|
|
@ -228,34 +214,98 @@ sidecar bytes.
|
||||||
address of one of its locals — the callee's `*p = v` will write
|
address of one of its locals — the callee's `*p = v` will write
|
||||||
to the wrong bank. Documented; no compiler-side mitigation
|
to the wrong bank. Documented; no compiler-side mitigation
|
||||||
beyond the existing DPF0 fake-physreg routing for the i64-return
|
beyond the existing DPF0 fake-physreg routing for the i64-return
|
||||||
high half.
|
high half. Workaround: inline pointer-arg helpers so the writes
|
||||||
|
stay in the caller's frame using stack-rel direct stores. The
|
||||||
|
W65816 only has three DBR-independent addressing modes
|
||||||
|
(abs_long, abs_long,X, [dp],Y) — none cheap to retrofit into
|
||||||
|
the current pointer-deref lowering (+5 bytes minimum per access).
|
||||||
|
Real fix needs PHB/PLB at noinline-pointer-callee entry/exit.
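
The inlining workaround described above can be sketched as follows. This is a minimal illustration, not the runtime's actual code: `splitDouble` and `exponentOf` are hypothetical names, and `__attribute__((always_inline))` is the GCC/Clang spelling assumed here for forcing the helper into its caller, so that the pointer writes can become stores to the caller's own locals instead of dereferences across a call boundary.

```c
#include <stdint.h>

/* Hypothetical helper (not the runtime's dclass): returns results through
 * pointer arguments. Forcing it inline lets the compiler lower the
 * `*sign = ...` / `*exp = ...` stores as direct stores to the caller's
 * locals rather than pointer dereferences inside a separate callee. */
static inline __attribute__((always_inline))
void splitDouble(uint64_t bits, int *sign, int *exp) {
    *sign = (int)(bits >> 63);            /* top bit of the IEEE-754 pattern */
    *exp  = (int)((bits >> 52) & 0x7FF);  /* 11-bit biased exponent */
}

/* The caller passes the addresses of its own locals. */
int exponentOf(uint64_t bits) {
    int sign, exp;
    splitDouble(bits, &sign, &exp);
    (void)sign;                           /* unused in this sketch */
    return exp;
}
```

For the bit pattern of 1.0 (`0x3FF0000000000000`), `exponentOf` returns the biased exponent `0x3FF`. Whether a given helper is worth inlining remains a code-size trade-off.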
|
||||||
|
|
||||||
- **strtok -O2 layout-sensitive miscompile FIXED** — modelling
|
## Recently fixed
|
||||||
`Uses=[P]` on the conditional branches (BEQ/BNE/BCS/BCC/BMI/BPL/
|
|
||||||
BVS/BVC) made MachineCSE see the dependency between an earlier
|
- **#70 — iterative qsort -O2 miscompile** — `W65816StackSlotCleanup`
|
||||||
CMP and the consuming Bxx, eliminating an entire class of
|
Pass -2 was deleting a store to a slot the loop body read.
|
||||||
layout-sensitive flag-corruption bugs. Verified by sweeping
|
Function-wide `slotHasOtherRefs` safety check added (Pass -1 and
|
||||||
`--rodata-base` from text-end to text-end+300 in 13 increments
|
Pass -2c hardened with the same pattern). Iterative qsort at
|
||||||
— every layout returns the correct strtok result.
|
plain -O2 + greedy now compiles correctly; the `optnone` workaround
|
||||||
As a follow-on, MachineCSE has been re-enabled (was previously
|
in smoke #70 was removed.
|
||||||
disabled in `W65816TargetMachine::addMachineSSAOptimization` as
|
|
||||||
a workaround for the same root cause).
|
- **strtok -O2 layout-sensitive miscompile** — modelling `Uses=[P]`
|
||||||
|
on the conditional branches (BEQ/BNE/BCS/BCC/BMI/BPL/BVS/BVC) made
|
||||||
|
MachineCSE / scheduler / LICM / sink see the CMP→Bxx flag
|
||||||
|
dependency. An entire class of layout-sensitive flag-corruption
|
||||||
|
bugs went away; verified by sweeping `--rodata-base` from text-end
|
||||||
|
to text-end+300 in 13 increments — every layout returns the correct
|
||||||
|
strtok result. As a follow-on, MachineCSE has been re-enabled
|
||||||
|
(was previously disabled in
|
||||||
|
`W65816TargetMachine::addMachineSSAOptimization` as a workaround for the same root cause).
|
||||||
|
|
||||||
|
- **link816 silently produced 4.3GB binaries** when `--rodata-base`
|
||||||
|
was set inside the text region. Now dies with a clear error:
|
||||||
|
`--rodata-base 0xX overlaps text 0xY+N (must start at or after 0xZ)`.
|
||||||
|
|
||||||
|
- **link816 BSS-relocate landed in IIgs Language Card area** —
|
||||||
|
when text+rodata grew past $C000, link816 placed BSS at $D000
|
||||||
|
(the LC1 area), which the IIgs maps to ROM by default (writes drop
|
||||||
|
silently, reads return ROM bytes). Globals never initialised;
|
||||||
|
caught by the expression-parser smoke (#92) when adding rand /
|
||||||
|
strnlen / etc. pushed the runtime past that threshold. Two-part
|
||||||
|
fix: crt0 now enables LC1 RAM via the standard `lda $C083`
|
||||||
|
read-twice trick at startup, and link816 hard-fails (rather
|
||||||
|
than silently corrupt) if BSS would exceed the LC1 ceiling
|
||||||
|
($E000) — past that you'd need crt0 to also enable LC2 / shadow
|
||||||
|
RAM, which we haven't wired up.
|
||||||
|
|
||||||
|
- **STZ peephole multi-STA latent miscompile** — AsmPrinter's
|
||||||
|
`LDA #0; STA $g` -> `STZ $g` peephole eliminated the LDA but
|
||||||
|
only consumed the FIRST `STA`. When SDAG-CSE shared one
|
||||||
|
`LDA #0` across multiple `STA`s (`g16=0; g32=0;` is one IR
|
||||||
|
shape), trailing `STA`s stored whatever A held on entry,
|
||||||
|
silently corrupting any global where A wasn't 0 at function
|
||||||
|
entry. Smoke happened to pass because A was 0 by luck in
|
||||||
|
every covered path. Fixed by gating the peephole on the
|
||||||
|
consuming `STA` killing A (regalloc only sets `killed` on the
|
||||||
|
last reader); smoke #98 added to lock the multi-STA case.
|
||||||
|
|
||||||
|
- **PEI AsmPrinter peephole** — new: `LDA $dp; PHA` -> `PEI $dp`
|
||||||
|
saves 1 byte and avoids touching A. Fires on the
|
||||||
|
`copyPhysReg(A=DPF0); PUSH16` pattern (i64-libcall return-value
|
||||||
|
forwarding into the next call's stacked args), which appears
|
||||||
|
in every chained soft-double / soft-int64 expression. Saves
|
||||||
|
68 bytes across the runtime (-64 in math.o alone). Same
|
||||||
|
next-instruction-modifies-A safety check as the PEA peephole.
|
||||||
|
Smoke #99 added.
|
||||||
|
|
||||||
|
- **PEA peephole opcode-allowlist replaced with `modifiesRegister`** —
|
||||||
|
the next-after-PUSH16 check that gates the PEA peephole was a
|
||||||
|
hand-curated list of opcodes that obviously redefine A; switched
|
||||||
|
to `MachineInstr::modifiesRegister(A, TRI)` which also catches
|
||||||
|
implicit-defs (e.g. JSL clobbering A as part of the call ABI).
|
||||||
|
Saves a few bytes and is more robust.
|
||||||
|
|
||||||
|
- **libgcc.s `lda #0; sta $XX` -> `stz $XX`** — 7 sites converted
|
||||||
|
in libgcc.s after STZ landed in the assembler. Saves 28 bytes;
|
||||||
|
also removes two PHA/PLA save-restore wraps around the LDA #0
|
||||||
|
(STZ doesn't touch A, so the wraps are unnecessary).
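
The two link816 layout checks listed above can be sketched in C. This is an illustration under stated assumptions, not link816's source: `gapBytes`, `checkRodataBase`, `checkBssFits`, and `LC_CEILING` are hypothetical names. The unsigned wrap shown in `gapBytes` is only one plausible way an output of roughly 4.3 GB (about 2^32 bytes) could arise; the text does not say that this was the actual mechanism.

```c
#include <stdint.h>
#include <stdio.h>

#define LC_CEILING 0xE000UL  /* ceiling quoted in the text above */

/* Gap between the end of text and the requested rodata base, computed
 * naively in 32-bit unsigned arithmetic. */
uint32_t gapBytes(uint32_t textEnd, uint32_t rodataBase) {
    return rodataBase - textEnd;   /* wraps to ~2^32 if rodataBase < textEnd */
}

/* Returns 0 if rodataBase starts at or after the end of text; otherwise
 * writes a diagnostic into err and returns -1. Simplification: any base
 * below the end of text is rejected, even one below textBase. */
int checkRodataBase(uint32_t textBase, uint32_t textSize,
                    uint32_t rodataBase, char *err, size_t errLen) {
    uint32_t textEnd = textBase + textSize;
    if (rodataBase < textEnd) {
        snprintf(err, errLen,
                 "--rodata-base 0x%lX overlaps text 0x%lX+%lu "
                 "(must start at or after 0x%lX)",
                 (unsigned long)rodataBase, (unsigned long)textBase,
                 (unsigned long)textSize, (unsigned long)textEnd);
        return -1;
    }
    return 0;
}

/* Returns 0 if [bssBase, bssBase + bssSize) ends at or below the ceiling,
 * -1 otherwise. The sum is widened so it cannot wrap. */
int checkBssFits(uint32_t bssBase, uint32_t bssSize) {
    return ((uint64_t)bssBase + bssSize > LC_CEILING) ? -1 : 0;
}
```

Failing hard in both cases matches the behaviour described above: a rejected layout is cheaper to diagnose than an oversized binary or globals that silently land on ROM.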
|
||||||
|
|
||||||
## What's still needed for a "ship-ready" toolchain
|
## What's still needed for a "ship-ready" toolchain
|
||||||
|
|
||||||
- **softDouble.c -O1 hold-out** — `__muldf3`'s 64×64→128 multiply
|
- **softDouble.c -O1 hold-out** — `__muldf3`'s u64 lifetime pressure
|
||||||
with inlined alignment shifts overflows the greedy register
|
overflows the greedy register allocator at -O2 ("ran out of
|
||||||
allocator at -O2 ("ran out of registers during register
|
registers during register allocation"). Builds correctly at
|
||||||
allocation"). Builds correctly at -O1 (replaces the previous
|
-O1. Investigated: marking dpack noinline reduces pressure but
|
||||||
-O2 + -mllvm -regalloc=fast workaround; -O1 is smaller and
|
isn't enough; making dclass noinline would unblock -O2 (verified)
|
||||||
doesn't require the non-default flag).
|
but the (d,s),y-uses-DBR bug then corrupts dclass's pointer-arg
|
||||||
|
writes when a caller has switched DBR (caught by smoke's
|
||||||
|
dmul-after-bank-switch test). Real fix is gated on the broader
|
||||||
|
DBR-pointer-deref limitation listed above.
|
||||||
|
|
||||||
|
|
||||||
- **More of the C standard library**: real `<stdio.h>` file I/O
|
- **More of the C standard library**: real `<stdio.h>` file I/O
|
||||||
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
|
(`fopen`, `fread`, `fwrite`, `fseek` are currently stubs
|
||||||
returning success/zero) — would need a memory-backed FS or a
|
returning success/zero) — would need a memory-backed FS or a
|
||||||
MAME hook; `<locale.h>` / `<wchar.h>` if any real-world code
|
MAME hook. `<locale.h>` / `<signal.h>` are stubbed (compile and
|
||||||
needs them.
|
return safe defaults); `<wchar.h>` / `<time.h>` mostly absent.
|
||||||
|
|
||||||
- **C++ runtime support**: vtable layout for multiple inheritance,
|
- **C++ runtime support**: vtable layout for multiple inheritance,
|
||||||
RTTI, exceptions (or a documented `-fno-exceptions` requirement).
|
RTTI, exceptions (or a documented `-fno-exceptions` requirement).
|
||||||
|
|
|
||||||
|
|
@ -41,11 +41,12 @@ cc "$SRC/extras.c"
|
||||||
cc "$SRC/strtok.c"
|
cc "$SRC/strtok.c"
|
||||||
cc "$SRC/math.c"
|
cc "$SRC/math.c"
|
||||||
cc "$SRC/softFloat.c"
|
cc "$SRC/softFloat.c"
|
||||||
# softDouble.c builds at -O1 instead of -O2: __muldf3's 64x64 -> 128
|
# softDouble.c builds at -O1: __muldf3's u64 live-range pressure
|
||||||
# mul + inlined alignment shifts overflows the greedy allocator on
|
# overflows the greedy allocator at -O2. dpack is already noinline
|
||||||
# the single-A target ("ran out of registers during register
|
# to reduce pressure, but dclass MUST stay inline (its pointer-arg
|
||||||
# allocation"). -O1 produces correct + smaller code than the
|
# writes from a noinline boundary would lower to `sta (d,s),y` which
|
||||||
# previous -O2 + -regalloc=fast workaround.
|
# uses DBR for the bank — silently corrupted under DBR != 0, caught
|
||||||
|
# by the dmul-after-bank-switch test). -O1 sidesteps this.
|
||||||
cc "$SRC/softDouble.c" -O1
|
cc "$SRC/softDouble.c" -O1
|
||||||
|
|
||||||
echo "runtime built: $(ls -1 "$OUT"/*.o | wc -l) objects"
|
echo "runtime built: $(ls -1 "$OUT"/*.o | wc -l) objects"
|
||||||
|
|
|
||||||
|
|
@ -10,6 +10,9 @@ int isspace(int c);
|
||||||
int isxdigit(int c);
|
int isxdigit(int c);
|
||||||
int isprint(int c);
|
int isprint(int c);
|
||||||
int ispunct(int c);
|
int ispunct(int c);
|
||||||
|
int iscntrl(int c);
|
||||||
|
int isgraph(int c);
|
||||||
|
int isblank(int c);
|
||||||
int toupper(int c);
|
int toupper(int c);
|
||||||
int tolower(int c);
|
int tolower(int c);
|
||||||
|
|
||||||
|
|
|
||||||
54
runtime/include/inttypes.h
Normal file
|
|
@ -0,0 +1,54 @@
|
||||||
|
// Minimal inttypes.h for the W65816 runtime. Pulls in stdint.h's
|
||||||
|
// fixed-width types and adds the printf/scanf format-string macros
|
||||||
|
// for those types. Standalone (does not include the host clang's
|
||||||
|
// inttypes.h, which pulls in glibc headers and breaks the build).
|
||||||
|
|
||||||
|
#ifndef _INTTYPES_H
|
||||||
|
#define _INTTYPES_H
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
|
||||||
|
// (strtoimax / strtoumax not implemented — runtime has strtol /
|
||||||
|
// strtoul for the 32-bit forms which cover the common needs.)
|
||||||
|
|
||||||
|
// PRIxN format macros. `int` is 16-bit on W65816, `long` is 32,
|
||||||
|
// `long long` is 64.
|
||||||
|
|
||||||
|
#define PRId8 "d"
|
||||||
|
#define PRIi8 "i"
|
||||||
|
#define PRIo8 "o"
|
||||||
|
#define PRIu8 "u"
|
||||||
|
#define PRIx8 "x"
|
||||||
|
#define PRIX8 "X"
|
||||||
|
|
||||||
|
#define PRId16 "d"
|
||||||
|
#define PRIi16 "i"
|
||||||
|
#define PRIo16 "o"
|
||||||
|
#define PRIu16 "u"
|
||||||
|
#define PRIx16 "x"
|
||||||
|
#define PRIX16 "X"
|
||||||
|
|
||||||
|
#define PRId32 "ld"
|
||||||
|
#define PRIi32 "li"
|
||||||
|
#define PRIo32 "lo"
|
||||||
|
#define PRIu32 "lu"
|
||||||
|
#define PRIx32 "lx"
|
||||||
|
#define PRIX32 "lX"
|
||||||
|
|
||||||
|
#define PRId64 "lld"
|
||||||
|
#define PRIi64 "lli"
|
||||||
|
#define PRIo64 "llo"
|
||||||
|
#define PRIu64 "llu"
|
||||||
|
#define PRIx64 "llx"
|
||||||
|
#define PRIX64 "llX"
|
||||||
|
|
||||||
|
#define PRIdMAX PRId64
|
||||||
|
#define PRIuMAX PRIu64
|
||||||
|
#define PRIxMAX PRIx64
|
||||||
|
|
||||||
|
#define PRIdPTR PRId16
|
||||||
|
#define PRIiPTR PRIi16
|
||||||
|
#define PRIuPTR PRIu16
|
||||||
|
#define PRIxPTR PRIx16
|
||||||
|
|
||||||
|
#endif
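
A short usage sketch for the format macros above. `formatI32` is a hypothetical helper, not part of the runtime. The point is that `"%" PRId32` expands to the right length modifier for the platform (`"ld"` with this header, where `long` is 32 bits), so the same source line stays correct on hosts where `int32_t` is a plain `int`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a 32-bit value without hard-coding "%d" or "%ld".
 * Returns the snprintf result (characters that would be written). */
int formatI32(char *buf, size_t len, int32_t value) {
    return snprintf(buf, len, "%" PRId32, value);
}
```

Adjacent string literals are concatenated by the compiler, so `"%" PRId32` becomes a single format string at compile time with no runtime cost.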
|
||||||
39
runtime/include/limits.h
Normal file
|
|
@ -0,0 +1,39 @@
|
||||||
|
// Minimal limits.h for the W65816 runtime. Standalone (does not
|
||||||
|
// include the host clang's limits.h, which pulls in glibc headers
|
||||||
|
// and breaks the build). Sizes per the W65816 backend's view:
|
||||||
|
// char = 8 bits
|
||||||
|
// short = 16 bits
|
||||||
|
// int = 16 bits
|
||||||
|
// long = 32 bits
|
||||||
|
// long long = 64 bits
|
||||||
|
|
||||||
|
#ifndef _LIMITS_H
|
||||||
|
#define _LIMITS_H
|
||||||
|
|
||||||
|
#define CHAR_BIT 8
|
||||||
|
#define MB_LEN_MAX 1
|
||||||
|
|
||||||
|
#define SCHAR_MIN (-128)
|
||||||
|
#define SCHAR_MAX 127
|
||||||
|
#define UCHAR_MAX 255
|
||||||
|
// `char` is signed by default on this target.
|
||||||
|
#define CHAR_MIN SCHAR_MIN
|
||||||
|
#define CHAR_MAX SCHAR_MAX
|
||||||
|
|
||||||
|
#define SHRT_MIN (-32768)
|
||||||
|
#define SHRT_MAX 32767
|
||||||
|
#define USHRT_MAX 65535U
|
||||||
|
|
||||||
|
#define INT_MIN (-32768)
|
||||||
|
#define INT_MAX 32767
|
||||||
|
#define UINT_MAX 65535U
|
||||||
|
|
||||||
|
#define LONG_MIN (-2147483647L - 1)
|
||||||
|
#define LONG_MAX 2147483647L
|
||||||
|
#define ULONG_MAX 4294967295UL
|
||||||
|
|
||||||
|
#define LLONG_MIN (-9223372036854775807LL - 1)
|
||||||
|
#define LLONG_MAX 9223372036854775807LL
|
||||||
|
#define ULLONG_MAX 18446744073709551615ULL
|
||||||
|
|
||||||
|
#endif
|
||||||
40
runtime/include/locale.h
Normal file
|
|
@ -0,0 +1,40 @@
|
||||||
|
// Minimal locale.h for the W65816 runtime. No real locale support —
|
||||||
|
// just enough to let portable code compile. setlocale() always
|
||||||
|
// returns "C" and ignores its argument; localeconv() returns a
|
||||||
|
// pointer to a fixed C-locale struct.
|
||||||
|
|
||||||
|
#ifndef _LOCALE_H
|
||||||
|
#define _LOCALE_H
|
||||||
|
|
||||||
|
struct lconv {
|
||||||
|
char *decimal_point;
|
||||||
|
char *thousands_sep;
|
||||||
|
char *grouping;
|
||||||
|
char *int_curr_symbol;
|
||||||
|
char *currency_symbol;
|
||||||
|
char *mon_decimal_point;
|
||||||
|
char *mon_thousands_sep;
|
||||||
|
char *mon_grouping;
|
||||||
|
char *positive_sign;
|
||||||
|
char *negative_sign;
|
||||||
|
char int_frac_digits;
|
||||||
|
char frac_digits;
|
||||||
|
char p_cs_precedes;
|
||||||
|
char p_sep_by_space;
|
||||||
|
char n_cs_precedes;
|
||||||
|
char n_sep_by_space;
|
||||||
|
char p_sign_posn;
|
||||||
|
char n_sign_posn;
|
||||||
|
};
|
||||||
|
|
||||||
|
#define LC_ALL 0
|
||||||
|
#define LC_COLLATE 1
|
||||||
|
#define LC_CTYPE 2
|
||||||
|
#define LC_MONETARY 3
|
||||||
|
#define LC_NUMERIC 4
|
||||||
|
#define LC_TIME 5
|
||||||
|
|
||||||
|
char *setlocale(int category, const char *locale);
|
||||||
|
struct lconv *localeconv(void);
|
||||||
|
|
||||||
|
#endif
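
A sketch of the kind of portable caller these declarations are meant to keep compiling. `decimalPointIsDot` is a hypothetical function name; it uses only the standard `setlocale` / `localeconv` interface declared above and assumes the "C" locale behaviour that the header comment describes.

```c
#include <locale.h>

/* Selects the "C" locale and reports whether the decimal point is ".".
 * With the stubbed runtime, setlocale ignores its arguments, so the
 * answer is expected to be 1 there as well. */
int decimalPointIsDot(void) {
    struct lconv *lc;
    setlocale(LC_ALL, "C");
    lc = localeconv();
    return lc->decimal_point[0] == '.' && lc->decimal_point[1] == '\0';
}
```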
|
||||||
27
runtime/include/signal.h
Normal file
|
|
@ -0,0 +1,27 @@
|
||||||
|
// Minimal signal.h for the W65816 runtime. No real signal handling
|
||||||
|
// — IIgs has no concept of POSIX signals. signal() always returns
|
||||||
|
// SIG_ERR; raise() always returns -1. These exist so portable code
|
||||||
|
// (e.g. asserts that map abort() through raise(SIGABRT)) compiles.
|
||||||
|
|
||||||
|
#ifndef _SIGNAL_H
|
||||||
|
#define _SIGNAL_H
|
||||||
|
|
||||||
|
typedef int sig_atomic_t;
|
||||||
|
|
||||||
|
typedef void (*__sighandler_t)(int);
|
||||||
|
|
||||||
|
#define SIG_DFL ((__sighandler_t)0)
|
||||||
|
#define SIG_IGN ((__sighandler_t)1)
|
||||||
|
#define SIG_ERR ((__sighandler_t)-1)
|
||||||
|
|
||||||
|
#define SIGINT 2
|
||||||
|
#define SIGILL 4
|
||||||
|
#define SIGABRT 6
|
||||||
|
#define SIGFPE 8
|
||||||
|
#define SIGSEGV 11
|
||||||
|
#define SIGTERM 15
|
||||||
|
|
||||||
|
__sighandler_t signal(int sig, __sighandler_t handler);
|
||||||
|
int raise(int sig);
|
||||||
|
|
||||||
|
#endif
|
||||||
17
runtime/include/stddef.h
Normal file
|
|
@ -0,0 +1,17 @@
|
||||||
|
// Minimal stddef.h for the W65816 runtime. Standalone (does not
|
||||||
|
// include the host clang's stddef.h).
|
||||||
|
|
||||||
|
#ifndef _STDDEF_H
|
||||||
|
#define _STDDEF_H
|
||||||
|
|
||||||
|
typedef unsigned int size_t;
|
||||||
|
typedef int ptrdiff_t;
|
||||||
|
typedef int wchar_t; // placeholder only; wide characters are not supported
|
||||||
|
|
||||||
|
#ifndef NULL
|
||||||
|
# define NULL ((void *)0)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define offsetof(t, m) __builtin_offsetof(t, m)
|
||||||
|
|
||||||
|
#endif
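
A small usage sketch for `offsetof` and `size_t` as declared above. `struct Record` and `valueOffset` are hypothetical names chosen for illustration. The layout relies on two `char` members preceding a `short`, which puts `value` at offset 2 on typical ABIs.

```c
#include <stddef.h>

/* Two one-byte members followed by a 16-bit member. */
struct Record {
    char  tag;
    char  flags;
    short value;
};

/* Byte offset of `value` within struct Record. */
size_t valueOffset(void) {
    return offsetof(struct Record, value);
}
```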
|
||||||
78
runtime/include/stdint.h
Normal file
|
|
@ -0,0 +1,78 @@
|
||||||
|
// Minimal stdint.h for the W65816 runtime. Standalone (does not
|
||||||
|
// include the host clang's stdint.h, which pulls in glibc headers
|
||||||
|
// and breaks the build). Sizes per the W65816 backend's view:
|
||||||
|
// char = 8 bits
|
||||||
|
// short = 16 bits
|
||||||
|
// int = 16 bits
|
||||||
|
// long = 32 bits
|
||||||
|
// long long = 64 bits
|
||||||
|
|
||||||
|
#ifndef _STDINT_H
|
||||||
|
#define _STDINT_H
|
||||||
|
|
||||||
|
typedef signed char int8_t;
|
||||||
|
typedef unsigned char uint8_t;
|
||||||
|
typedef short int16_t;
|
||||||
|
typedef unsigned short uint16_t;
|
||||||
|
typedef long int32_t;
|
||||||
|
typedef unsigned long uint32_t;
|
||||||
|
typedef long long int64_t;
|
||||||
|
typedef unsigned long long uint64_t;
|
||||||
|
|
||||||
|
typedef int8_t int_least8_t;
|
||||||
|
typedef uint8_t uint_least8_t;
|
||||||
|
typedef int16_t int_least16_t;
|
||||||
|
typedef uint16_t uint_least16_t;
|
||||||
|
typedef int32_t int_least32_t;
|
||||||
|
typedef uint32_t uint_least32_t;
|
||||||
|
typedef int64_t int_least64_t;
|
||||||
|
typedef uint64_t uint_least64_t;
|
||||||
|
|
||||||
|
typedef int16_t int_fast8_t;
|
||||||
|
typedef uint16_t uint_fast8_t;
|
||||||
|
typedef int16_t int_fast16_t;
|
||||||
|
typedef uint16_t uint_fast16_t;
|
||||||
|
typedef int32_t int_fast32_t;
|
||||||
|
typedef uint32_t uint_fast32_t;
|
||||||
|
typedef int64_t int_fast64_t;
|
||||||
|
typedef uint64_t uint_fast64_t;
|
||||||
|
|
||||||
|
typedef int16_t intptr_t; // pointers are 16-bit on W65816
|
||||||
|
typedef uint16_t uintptr_t;
|
||||||
|
|
||||||
|
typedef int64_t intmax_t;
|
||||||
|
typedef uint64_t uintmax_t;
|
||||||
|
|
||||||
|
#define INT8_MIN (-0x7F - 1)
|
||||||
|
#define INT8_MAX 0x7F
|
||||||
|
#define UINT8_MAX 0xFFU
|
||||||
|
#define INT16_MIN (-0x7FFF - 1)
|
||||||
|
#define INT16_MAX 0x7FFF
|
||||||
|
#define UINT16_MAX 0xFFFFU
|
||||||
|
#define INT32_MIN (-0x7FFFFFFFL - 1)
|
||||||
|
#define INT32_MAX 0x7FFFFFFFL
|
||||||
|
#define UINT32_MAX 0xFFFFFFFFUL
|
||||||
|
#define INT64_MIN (-0x7FFFFFFFFFFFFFFFLL - 1)
|
||||||
|
#define INT64_MAX 0x7FFFFFFFFFFFFFFFLL
|
||||||
|
#define UINT64_MAX 0xFFFFFFFFFFFFFFFFULL
|
||||||
|
|
||||||
|
#define INTPTR_MIN INT16_MIN
|
||||||
|
#define INTPTR_MAX INT16_MAX
|
||||||
|
#define UINTPTR_MAX UINT16_MAX
|
||||||
|
|
||||||
|
#define INTMAX_MIN INT64_MIN
|
||||||
|
#define INTMAX_MAX INT64_MAX
|
||||||
|
#define UINTMAX_MAX UINT64_MAX
|
||||||
|
|
||||||
|
#define INT8_C(v) v
|
||||||
|
#define UINT8_C(v) v ## U
|
||||||
|
#define INT16_C(v) v
|
||||||
|
#define UINT16_C(v) v ## U
|
||||||
|
#define INT32_C(v) v ## L
|
||||||
|
#define UINT32_C(v) v ## UL
|
||||||
|
#define INT64_C(v) v ## LL
|
||||||
|
#define UINT64_C(v) v ## ULL
|
||||||
|
#define INTMAX_C(v) v ## LL
|
||||||
|
#define UINTMAX_C(v) v ## ULL
|
||||||
|
|
||||||
|
#endif
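
Because `int` is 16 bits on this target, arithmetic written with plain `int` operands can overflow where a 32-bit host would not. The sketch below (with the hypothetical name `toMillis`) shows one way the fixed-width types and the `INT32_C` macro above keep a multiplication in 32-bit arithmetic regardless of the width of `int`.

```c
#include <stdint.h>

/* Converts seconds to milliseconds in 32-bit arithmetic. The cast and
 * the INT32_C constant both force the multiply to be done as int32_t;
 * with 16-bit int, `seconds * 1000` alone would overflow for |seconds| > 32. */
int32_t toMillis(int16_t seconds) {
    return (int32_t)seconds * INT32_C(1000);
}
```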
|
||||||
|
|
@ -37,4 +37,14 @@ void clearerr(FILE *stream);
|
||||||
#define SEEK_CUR 1
|
#define SEEK_CUR 1
|
||||||
#define SEEK_END 2
|
#define SEEK_END 2
|
||||||
|
|
||||||
|
#define EOF (-1)
|
||||||
|
|
||||||
|
// Input stubs. Real implementations would route through GS/OS
|
||||||
|
// console I/O; current impl in libc.c returns EOF / 0.
|
||||||
|
int getchar(void);
|
||||||
|
int fgetc(FILE *stream);
|
||||||
|
char *fgets(char *buf, int n, FILE *stream);
|
||||||
|
int ungetc(int c, FILE *stream);
|
||||||
|
#define getc(s) fgetc(s)
|
||||||
|
|
||||||
#endif
|
#endif
|
||||||
|
|
|
||||||
|
|
@ -31,4 +31,8 @@ int atexit(__atexit_fn fn);
|
||||||
#define EXIT_SUCCESS 0
|
#define EXIT_SUCCESS 0
|
||||||
#define EXIT_FAILURE 1
|
#define EXIT_FAILURE 1
|
||||||
|
|
||||||
|
#define RAND_MAX 0x7FFF
|
||||||
|
int rand(void);
|
||||||
|
void srand(unsigned int seed);
|
||||||
|
|
||||||
#endif
|
#endif
|
||||||
|
|
|
||||||
|
|
@ -10,10 +10,13 @@ int memcmp(const void *a, const void *b, size_t n);
|
||||||
void *memchr(const void *s, int c, size_t n);
|
void *memchr(const void *s, int c, size_t n);
|
||||||
|
|
||||||
size_t strlen(const char *s);
|
size_t strlen(const char *s);
|
||||||
|
size_t strnlen(const char *s, size_t maxlen);
|
||||||
char *strcpy(char *dst, const char *src);
|
char *strcpy(char *dst, const char *src);
|
||||||
char *strncpy(char *dst, const char *src, size_t n);
|
char *strncpy(char *dst, const char *src, size_t n);
|
||||||
int strcmp(const char *a, const char *b);
|
int strcmp(const char *a, const char *b);
|
||||||
int strncmp(const char *a, const char *b, size_t n);
|
int strncmp(const char *a, const char *b, size_t n);
|
||||||
|
int strcasecmp(const char *a, const char *b);
|
||||||
|
int strncasecmp(const char *a, const char *b, size_t n);
|
||||||
char *strchr(const char *s, int c);
|
char *strchr(const char *s, int c);
|
||||||
char *strrchr(const char *s, int c);
|
char *strrchr(const char *s, int c);
|
||||||
char *strstr(const char *haystack, const char *needle);
|
char *strstr(const char *haystack, const char *needle);
|
||||||
|
|
|
||||||
|
|
@ -41,6 +41,21 @@ __start:
|
||||||
lda #0x0fff
|
lda #0x0fff
|
||||||
tcs
|
tcs
|
||||||
|
|
||||||
|
; Enable Language Card 1 RAM at $D000-$DFFF for read+write.
|
||||||
|
; By default the IIgs maps that range to ROM (read-only). Two
|
||||||
|
; reads of $C083 enable RAM-bank-1, second read also enables
|
||||||
|
; writes. Without this, BSS auto-relocated past $C000 lands on
|
||||||
|
; ROM and globals never initialise (writes drop on the floor;
|
||||||
|
; reads return ROM bytes). Caught by the expression-parser
|
||||||
|
; smoke test (#92) when runtime growth pushed bss past $BFFF.
|
||||||
|
; The reads must be 8-bit (one byte at a time) — a 16-bit M
|
||||||
|
; read at $C083 would also touch $C084 (a different soft
|
||||||
|
; switch), wiping the LC enable we just set.
|
||||||
|
sep #0x20
|
||||||
|
lda 0xc083
|
||||||
|
lda 0xc083
|
||||||
|
rep #0x20
|
||||||
|
|
||||||
; Zero BSS. X iterates from __bss_start to __bss_end; each
|
; Zero BSS. X iterates from __bss_start to __bss_end; each
|
||||||
; iteration writes one byte of zero at addr X (via DP=0 +
|
; iteration writes one byte of zero at addr X (via DP=0 +
|
||||||
; offset 0 — which is just X). Wraps in 8-bit M for the
|
; offset 0 — which is just X). Wraps in 8-bit M for the
|
||||||
|
|
|
||||||
|
|
@ -67,6 +67,70 @@ long long llabs(long long n) {
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// strnlen: like strlen but capped at maxlen. Useful for safely
|
||||||
|
// measuring strings that may not be NUL-terminated within a known
|
||||||
|
// buffer size.
|
||||||
|
size_t strnlen(const char *s, size_t maxlen) {
|
||||||
|
size_t n = 0;
|
||||||
|
while (n < maxlen && s[n]) {
|
||||||
|
n++;
|
||||||
|
}
|
||||||
|
return n;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static int toLowerByte(int c) {
|
||||||
|
if (c >= 'A' && c <= 'Z') {
|
||||||
|
return c - 'A' + 'a';
|
||||||
|
}
|
||||||
|
return c;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
int strcasecmp(const char *a, const char *b) {
|
||||||
|
while (*a && *b) {
|
||||||
|
int da = toLowerByte((unsigned char)*a);
|
||||||
|
int db = toLowerByte((unsigned char)*b);
|
||||||
|
if (da != db) {
|
||||||
|
return da - db;
|
||||||
|
}
|
||||||
|
a++;
|
||||||
|
b++;
|
||||||
|
}
|
||||||
|
return toLowerByte((unsigned char)*a) - toLowerByte((unsigned char)*b);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
int strncasecmp(const char *a, const char *b, size_t n) {
|
||||||
|
while (n && *a && *b) {
|
||||||
|
int da = toLowerByte((unsigned char)*a);
|
||||||
|
int db = toLowerByte((unsigned char)*b);
|
||||||
|
if (da != db) {
|
||||||
|
return da - db;
|
||||||
|
}
|
||||||
|
a++;
|
||||||
|
b++;
|
||||||
|
n--;
|
||||||
|
}
|
||||||
|
if (!n) return 0;
|
||||||
|
return toLowerByte((unsigned char)*a) - toLowerByte((unsigned char)*b);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// Linear congruential generator; constants from the ISO C sample rand().
|
||||||
|
// Returns 15-bit values (RAND_MAX = 0x7FFF) per C standard convention.
|
||||||
|
static unsigned long randSeed = 1;
|
||||||
|
|
||||||
|
void srand(unsigned int seed) {
|
||||||
|
randSeed = seed;
|
||||||
|
}
|
||||||
|
|
||||||
|
int rand(void) {
|
||||||
|
randSeed = randSeed * 1103515245UL + 12345UL;
|
||||||
|
return (int)((randSeed >> 16) & 0x7FFF);
|
||||||
|
}
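
A host-side sketch of the same recurrence, useful for checking expected values off-target. `demoSrand` and `demoRand` are hypothetical names; the state is held in a `uint32_t` so the arithmetic wraps at 32 bits, matching the 32-bit `unsigned long` of the target even on hosts where `unsigned long` is 64 bits.

```c
#include <stdint.h>

static uint32_t demoSeed = 1;   /* same default seed as the runtime code */

void demoSrand(unsigned int seed) {
    demoSeed = seed;
}

/* Same multiplier and increment as the runtime rand() above; the result
 * is bits 16..30 of the new state, i.e. a value in [0, 0x7FFF]. */
int demoRand(void) {
    demoSeed = demoSeed * UINT32_C(1103515245) + UINT32_C(12345);
    return (int)((demoSeed >> 16) & 0x7FFF);
}
```

With seed 1 the first state is 1103527590, whose upper half is 16838, so the first call returns 16838. Taking the high bits instead of the low ones avoids the short periods that the low-order bits of a power-of-two-modulus LCG exhibit.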
|
||||||
|
|
||||||
|
|
||||||
// ----- additional string.h ----------------------------------------------
|
// ----- additional string.h ----------------------------------------------
|
||||||
|
|
||||||
static int inSet(char c, const char *set) {
|
static int inSet(char c, const char *set) {
|
||||||
|
|
|
||||||
|
|
@ -119,6 +119,9 @@ int isxdigit(int c) {
|
||||||
}
|
}
|
||||||
int isprint(int c) { return c >= 0x20 && c < 0x7f; }
|
int isprint(int c) { return c >= 0x20 && c < 0x7f; }
|
||||||
int ispunct(int c) { return isprint(c) && !isalnum(c) && c != ' '; }
|
int ispunct(int c) { return isprint(c) && !isalnum(c) && c != ' '; }
|
||||||
|
int iscntrl(int c) { return (c >= 0 && c < 0x20) || c == 0x7f; }
|
||||||
|
int isgraph(int c) { return isprint(c) && c != ' '; }
|
||||||
|
int isblank(int c) { return c == ' ' || c == '\t'; }
|
||||||
|
|
||||||
int toupper(int c) { return islower(c) ? c - 32 : c; }
|
int toupper(int c) { return islower(c) ? c - 32 : c; }
|
||||||
int tolower(int c) { return isupper(c) ? c + 32 : c; }
|
int tolower(int c) { return isupper(c) ? c + 32 : c; }
|
||||||
|
|
@ -160,6 +163,21 @@ int puts(const char *s) {
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ---- input stubs ----
|
||||||
|
//
|
||||||
|
// Real input would route through GS/OS console / event handling.
|
||||||
|
// These return EOF / NULL so user code that calls them links and
|
||||||
|
// gets predictable end-of-input behaviour. FILE struct is defined
|
||||||
|
// further down (alongside fopen etc.) — forward-declare for the
|
||||||
|
// signatures.
|
||||||
|
struct __sFILE;
|
||||||
|
int getchar(void) { return -1; /* EOF */ }
|
||||||
|
int fgetc(struct __sFILE *s) { (void)s; return -1; }
|
||||||
|
char *fgets(char *b, int n, struct __sFILE *s) {
|
||||||
|
(void)b; (void)n; (void)s; return (char *)0;
|
||||||
|
}
|
||||||
|
int ungetc(int c, struct __sFILE *s) { (void)c; (void)s; return -1; }
|
||||||
|
|
||||||
// ---- minimal printf ----
|
// ---- minimal printf ----
|
||||||
|
|
||||||
// Re-declare va_list / va_* locally rather than including stdarg.h —
|
// Re-declare va_list / va_* locally rather than including stdarg.h —
|
||||||
|
|
@ -617,3 +635,76 @@ long ftell(FILE *stream) {
|
||||||
int feof(FILE *stream) { (void)stream; return 1; }
|
int feof(FILE *stream) { (void)stream; return 1; }
|
||||||
int ferror(FILE *stream) { (void)stream; return 0; }
|
int ferror(FILE *stream) { (void)stream; return 0; }
|
||||||
void clearerr(FILE *stream) { (void)stream; }
|
void clearerr(FILE *stream) { (void)stream; }
|
||||||
|
|
||||||
|
// ---- locale.h stubs ----
|
||||||
|
//
|
||||||
|
// No real locale support — IIgs is single-locale. setlocale always
|
||||||
|
// returns "C", localeconv returns a fixed C-locale struct. These
|
||||||
|
// are stubs so portable code that calls setlocale(LC_ALL, "") for diagnostic
|
||||||
|
// purposes compiles and runs.
|
||||||
|
|
||||||
|
struct lconv {
|
||||||
|
char *decimal_point;
|
||||||
|
char *thousands_sep;
|
||||||
|
char *grouping;
|
||||||
|
char *int_curr_symbol;
|
||||||
|
char *currency_symbol;
|
||||||
|
char *mon_decimal_point;
|
||||||
|
char *mon_thousands_sep;
|
||||||
|
char *mon_grouping;
|
||||||
|
char *positive_sign;
|
||||||
|
char *negative_sign;
|
||||||
|
char int_frac_digits;
|
||||||
|
char frac_digits;
|
||||||
|
char p_cs_precedes;
|
||||||
|
char p_sep_by_space;
|
||||||
|
char n_cs_precedes;
|
||||||
|
char n_sep_by_space;
|
||||||
|
char p_sign_posn;
|
||||||
|
char n_sign_posn;
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct lconv __c_lconv = {
|
||||||
|
(char *)".", // decimal_point
|
||||||
|
(char *)"", // thousands_sep
|
||||||
|
(char *)"", // grouping
|
||||||
|
(char *)"", // int_curr_symbol
|
||||||
|
(char *)"", // currency_symbol
|
||||||
|
(char *)"", // mon_decimal_point
|
||||||
|
(char *)"", // mon_thousands_sep
|
||||||
|
(char *)"", // mon_grouping
|
||||||
|
(char *)"", // positive_sign
|
||||||
|
(char *)"", // negative_sign
|
||||||
|
(char)127, // int_frac_digits (CHAR_MAX = "unspecified")
|
||||||
|
(char)127, // frac_digits
|
||||||
|
(char)127, (char)127, (char)127, (char)127, (char)127, (char)127,
|
||||||
|
};
|
||||||
|
|
||||||
|
char *setlocale(int category, const char *locale) {
|
||||||
|
(void)category; (void)locale;
|
||||||
|
return (char *)"C";
|
||||||
|
}
|
||||||
|
|
||||||
|
struct lconv *localeconv(void) {
|
||||||
|
return &__c_lconv;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---- signal.h stubs ----
|
||||||
|
//
|
||||||
|
// IIgs has no POSIX-style signal model. signal() always fails (returns
|
||||||
|
// SIG_ERR); raise() returns -1. Code that uses these for diagnostic
|
||||||
|
// fall-through (e.g. abort -> raise(SIGABRT) -> stub) compiles and
|
||||||
|
// behaves as "signals disabled".
|
||||||
|
|
||||||
|
typedef void (*__sighandler_t)(int);
|
||||||
|
#define _SIG_ERR ((__sighandler_t)-1)
|
||||||
|
|
||||||
|
__sighandler_t signal(int sig, __sighandler_t handler) {
|
||||||
|
(void)sig; (void)handler;
|
||||||
|
return _SIG_ERR;
|
||||||
|
}
|
||||||
|
|
||||||
|
int raise(int sig) {
|
||||||
|
(void)sig;
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
|
||||||
|
|
@ -60,8 +60,7 @@ __mulhi3:
|
||||||
sta 0xe0 ; multiplier
|
sta 0xe0 ; multiplier
|
||||||
lda 0x4, s
|
lda 0x4, s
|
||||||
sta 0xe2 ; multiplicand
|
sta 0xe2 ; multiplicand
|
||||||
lda #0x0
|
stz 0xe4 ; running product = 0
|
||||||
sta 0xe4 ; running product
|
|
||||||
.Lmul_loop:
|
.Lmul_loop:
|
||||||
lda 0xe0
|
lda 0xe0
|
||||||
beq .Lmul_done
|
beq .Lmul_done
|
||||||
|
|
@ -225,12 +224,9 @@ __modhi3:
|
||||||
; Uses JSR/RTS, same bank.
|
; Uses JSR/RTS, same bank.
|
||||||
; --------------------------------------------------------------------
|
; --------------------------------------------------------------------
|
||||||
__divmod_setup:
|
__divmod_setup:
|
||||||
; Sign tracker. We don't have STZ in our instruction set yet, so
|
; Sign tracker. STZ doesn't touch A — preserves the value
|
||||||
; clear via PHA/LDA #0/STA/PLA to avoid trashing A.
|
; we still need below.
|
||||||
pha
|
stz 0xee
|
||||||
lda #0x0
|
|
||||||
sta 0xee
|
|
||||||
pla
|
|
||||||
; Dividend sign + abs value.
|
; Dividend sign + abs value.
|
||||||
cmp #0x8000
|
cmp #0x8000
|
||||||
bcc .Lset_a_pos
|
bcc .Lset_a_pos
|
||||||
|
|
@ -269,9 +265,8 @@ __divmod_setup:
|
||||||
; outputs quotient at $ea, remainder at $ec. JSR/RTS local helper.
|
; outputs quotient at $ea, remainder at $ec. JSR/RTS local helper.
|
||||||
; --------------------------------------------------------------------
|
; --------------------------------------------------------------------
|
||||||
__udivmod_core:
|
__udivmod_core:
|
||||||
lda #0x0
|
stz 0xea
|
||||||
sta 0xea
|
stz 0xec
|
||||||
sta 0xec
|
|
||||||
ldx #0x10
|
ldx #0x10
|
||||||
.Lcore_loop:
|
.Lcore_loop:
|
||||||
asl 0xe6
|
asl 0xe6
|
||||||
|
|
@ -327,9 +322,8 @@ __mulsi3:
|
||||||
lda 0x6, s
|
lda 0x6, s
|
||||||
sta 0xe6
|
sta 0xe6
|
||||||
; Clear running product at $e8/$ea.
|
; Clear running product at $e8/$ea.
|
||||||
lda #0x0
|
stz 0xe8
|
||||||
sta 0xe8
|
stz 0xea
|
||||||
sta 0xea
|
|
||||||
; Loop 32 times: examine LSB of multiplier, conditionally add
|
; Loop 32 times: examine LSB of multiplier, conditionally add
|
||||||
; multiplicand to product, then shift multiplier right and
|
; multiplicand to product, then shift multiplier right and
|
||||||
; multiplicand left. Use Y as a 16-bit counter (X mode = 16).
|
; multiplicand left. Use Y as a 16-bit counter (X mode = 16).
|
||||||
|
|
@ -456,10 +450,9 @@ __ashrsi3:
|
||||||
; JSR/RTS local helper.
|
; JSR/RTS local helper.
|
||||||
; --------------------------------------------------------------------
|
; --------------------------------------------------------------------
|
||||||
__udivmodsi_core:
|
__udivmodsi_core:
|
||||||
lda #0x0
|
stz 0xe8
|
||||||
sta 0xe8
|
stz 0xea
|
||||||
sta 0xea
|
stz 0xec
|
||||||
sta 0xec
|
|
||||||
sta 0xee
|
stz 0xee ; A is no longer zeroed above, so use STZ here too
|
||||||
ldy #0x20
|
ldy #0x20
|
||||||
.Lcoresi_loop:
|
.Lcoresi_loop:
|
||||||
|
|
@ -588,11 +581,8 @@ __modsi3:
|
||||||
; (8,s)=b_hi.
|
; (8,s)=b_hi.
|
||||||
; --------------------------------------------------------------------
|
; --------------------------------------------------------------------
|
||||||
__divmodsi_setup:
|
__divmodsi_setup:
|
||||||
; Clear sign tracker.
|
; Clear sign tracker. STZ preserves A.
|
||||||
pha
|
stz 0xf0
|
||||||
lda #0x0
|
|
||||||
sta 0xf0
|
|
||||||
pla
|
|
||||||
; |a|: A=a_lo, X=a_hi. Save them first (we need a_hi for sign test).
|
; |a|: A=a_lo, X=a_hi. Save them first (we need a_hi for sign test).
|
||||||
sta 0xe0 ; tentative a_lo (may negate below)
|
sta 0xe0 ; tentative a_lo (may negate below)
|
||||||
stx 0xe2 ; tentative a_hi
|
stx 0xe2 ; tentative a_hi
|
||||||
|
|
@ -805,11 +795,10 @@ __ashrdi3:
|
||||||
__muldi3:
|
__muldi3:
|
||||||
jsr __divmoddi4_stash
|
jsr __divmoddi4_stash
|
||||||
; Clear product P0..P3 at $F2..$F8.
|
; Clear product P0..P3 at $F2..$F8.
|
||||||
lda #0x0
|
stz 0xf2
|
||||||
sta 0xf2
|
stz 0xf4
|
||||||
sta 0xf4
|
stz 0xf6
|
||||||
sta 0xf6
|
stz 0xf8
|
||||||
sta 0xf8
|
|
||||||
; Loop 64 times on a's bits.
|
; Loop 64 times on a's bits.
|
||||||
ldy #0x40
|
ldy #0x40
|
||||||
.Lmuldi_loop:
|
.Lmuldi_loop:
|
||||||
|
|
@ -975,11 +964,10 @@ __umoddi3:
|
||||||
; Output: quotient at $E0..$E6, remainder at $F2..$F8.
|
; Output: quotient at $E0..$E6, remainder at $F2..$F8.
|
||||||
__udivmoddi_core:
|
__udivmoddi_core:
|
||||||
; Clear remainder $F2..$F8.
|
; Clear remainder $F2..$F8.
|
||||||
lda #0x0
|
stz 0xf2
|
||||||
sta 0xf2
|
stz 0xf4
|
||||||
sta 0xf4
|
stz 0xf6
|
||||||
sta 0xf6
|
stz 0xf8
|
||||||
sta 0xf8
|
|
||||||
ldy #0x40
|
ldy #0x40
|
||||||
.Ludivmoddi_loop:
|
.Ludivmoddi_loop:
|
||||||
; Shift left: dividend (becomes quotient) and remainder together
|
; Shift left: dividend (becomes quotient) and remainder together
|
||||||
|
|
|
||||||
|
|
@ -22,7 +22,12 @@ typedef unsigned char u8;
|
||||||
#define DEXP_SHIFT 52
|
#define DEXP_SHIFT 52
|
||||||
#define DEXP_BIAS 1023
|
#define DEXP_BIAS 1023
|
||||||
|
|
||||||
static inline u64 dpack(u64 sign, s16 exp, u64 mant) {
|
// noinline: keeps register pressure in the callers (esp. __muldf3)
|
||||||
|
// low enough for greedy regalloc to allocate at -O2. Without this,
|
||||||
|
// __muldf3 fails with "ran out of registers during register
|
||||||
|
// allocation" — too many concurrent u64 lifetimes (sa, sb, ma, mb,
|
||||||
|
// sr, mr) and the dpack inline blew it past the spill capacity.
|
||||||
|
__attribute__((noinline)) static u64 dpack(u64 sign, s16 exp, u64 mant) {
|
||||||
if (mant == 0) return sign;
|
if (mant == 0) return sign;
|
||||||
u64 e = (u64)(exp + DEXP_BIAS);
|
u64 e = (u64)(exp + DEXP_BIAS);
|
||||||
if (e >= 2047) {
|
if (e >= 2047) {
|
||||||
|
|
@ -38,6 +43,11 @@ static inline u64 dpack(u64 sign, s16 exp, u64 mant) {
|
||||||
|
|
||||||
// Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit.
|
// Decompose `x` into sign / unbiased-exp / mantissa-with-leading-bit.
|
||||||
// Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN.
|
// Returns the class: 0=zero, 1=normal, 2=infinity, 3=NaN.
|
||||||
|
// Inlinable on purpose — out_sign/out_exp/out_mant point at caller
|
||||||
|
// stack locals; if dclass were noinline the writes would lower to
|
||||||
|
// `sta (d,s),y` which uses DBR for the bank, silently corrupting
|
||||||
|
// data when the caller has switched DBR. Caught by smoke's
|
||||||
|
// dmul-after-bank-switch test (#dmul-bank-switch).
|
||||||
static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) {
|
static u16 dclass(u64 x, u64 *out_sign, s16 *out_exp, u64 *out_mant) {
|
||||||
*out_sign = x & DSIGN_BIT;
|
*out_sign = x & DSIGN_BIT;
|
||||||
s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF);
|
s16 e = (s16)((x >> DEXP_SHIFT) & 0x7FF);
|
||||||
|
|
|
||||||
|
|
@ -83,7 +83,11 @@ if [ -x "$LLVM_MC" ]; then
|
||||||
sta 0x1000
|
sta 0x1000
|
||||||
sta 0x010000
|
sta 0x010000
|
||||||
mvn 0x01, 0x02
|
mvn 0x01, 0x02
|
||||||
jsl 0x012345'
|
jsl 0x012345
|
||||||
|
lda 0x123456, x
|
||||||
|
sta 0xabcdef, x
|
||||||
|
stz 0xe2
|
||||||
|
stz 0x1234'
|
||||||
mcOut="$(printf '%s\n' "$mcInput" | "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)"
|
mcOut="$(printf '%s\n' "$mcInput" | "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)"
|
||||||
|
|
||||||
assertHas() {
|
assertHas() {
|
||||||
|
|
@ -103,6 +107,27 @@ if [ -x "$LLVM_MC" ]; then
|
||||||
assertHas "[0x8f,0x00,0x00,0x01]"
|
assertHas "[0x8f,0x00,0x00,0x01]"
|
||||||
assertHas "[0x54,0x01,0x02]"
|
assertHas "[0x54,0x01,0x02]"
|
||||||
assertHas "[0x22,0x45,0x23,0x01]"
|
assertHas "[0x22,0x45,0x23,0x01]"
|
||||||
|
# abs_long,X (DBR-independent X-indexed access — used by future
|
||||||
|
# DBR-safe pointer-deref lowering)
|
||||||
|
assertHas "[0xbf,0x56,0x34,0x12]"
|
||||||
|
assertHas "[0x9f,0xef,0xcd,0xab]"
|
||||||
|
# STZ (store zero): 3 bytes shorter than `LDA #0; STA dp` (M=0) for zeroing
|
||||||
|
# DP scratch slots (used by the upcoming [dp],Y bank-byte
|
||||||
|
# invariant for DBR-safe pointer derefs).
|
||||||
|
assertHas "[0x64,0xe2]"
|
||||||
|
assertHas "[0x9c,0x34,0x12]"
|
||||||
|
# WDM / TRB / TSB / PEI — useful 65816 instructions added for
|
||||||
|
# MAME debug hooks (WDM), atomic memory bit ops on hardware
|
||||||
|
# registers (TRB/TSB), and indirect data push (PEI).
|
||||||
|
extOut="$(printf '\twdm 0xab\n\ttrb 0x1234\n\ttsb 0x10\n\tpei 0xe0\n' \
|
||||||
|
| "$LLVM_MC" -arch=w65816 -show-encoding 2>&1)"
|
||||||
|
for enc in "[0x42,0xab]" "[0x1c,0x34,0x12]" "[0x04,0x10]" "[0xd4,0xe0]"; do
|
||||||
|
if ! printf '%s\n' "$extOut" | grep -qF "$enc"; then
|
||||||
|
warn "missing extended-opcode encoding: $enc"
|
||||||
|
printf '%s\n' "$extOut" >&2
|
||||||
|
die "llvm-mc did not produce expected extended-opcode encoding"
|
||||||
|
fi
|
||||||
|
done
|
||||||
else
|
else
|
||||||
warn "llvm-mc not built; skipping MC round-trip check"
|
warn "llvm-mc not built; skipping MC round-trip check"
|
||||||
fi
|
fi
|
||||||
|
|
@ -843,6 +868,91 @@ EOF
|
||||||
# function. STA8abs in AsmPrinter must wrap with SEP/REP when
|
# function. STA8abs in AsmPrinter must wrap with SEP/REP when
|
||||||
# UsesAcc8 is false; bare `sta g+N` in M=0 writes 2 bytes and
|
# UsesAcc8 is false; bare `sta g+N` in M=0 writes 2 bytes and
|
||||||
# corrupts the next global.
|
# corrupts the next global.
|
||||||
|
log "check: clang lowers 'g = 0' to single STZ via AsmPrinter peephole"
|
||||||
|
cStzFile="$(mktemp --suffix=.c)"
|
||||||
|
sStzFile="$(mktemp --suffix=.s)"
|
||||||
|
cat > "$cStzFile" <<'EOF'
|
||||||
|
unsigned short g;
|
||||||
|
void zero(void) { g = 0; }
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -S "$cStzFile" -o "$sStzFile"
|
||||||
|
# Should see exactly one `stz g` and zero `lda #0` in the function.
|
||||||
|
if ! grep -qE '^\s*stz\s+g\b' "$sStzFile"; then
|
||||||
|
warn "STZ peephole not firing"; cat "$sStzFile" >&2
|
||||||
|
die "expected 'stz g' in zero() but didn't find it"
|
||||||
|
fi
|
||||||
|
if grep -qE '^\s*lda\s+#0x0' "$sStzFile"; then
|
||||||
|
warn "STZ peephole left a redundant LDA #0"; cat "$sStzFile" >&2
|
||||||
|
die "STZ peephole should have eliminated the LDA #0"
|
||||||
|
fi
|
||||||
|
rm -f "$cStzFile" "$sStzFile"
|
||||||
|
|
||||||
|
# Multi-STA-from-shared-LDA: when SDAG CSE shares one `lda #0` across
|
||||||
|
# multiple `sta`s, the peephole MUST NOT fire on the first STA (would
|
||||||
|
# delete the LDA, leaving the remaining STAs reading dead A). Verify
|
||||||
|
# the LDA #0 is preserved and no STZ appears in this case.
|
||||||
|
log "check: STZ peephole skips when LDA #0 feeds multiple STAs"
|
||||||
|
cStzMultiFile="$(mktemp --suffix=.c)"
|
||||||
|
sStzMultiFile="$(mktemp --suffix=.s)"
|
||||||
|
cat > "$cStzMultiFile" <<'EOF'
|
||||||
|
unsigned short ga, gb, gc;
|
||||||
|
void zeroAll(void) { ga = 0; gb = 0; gc = 0; }
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -S "$cStzMultiFile" -o "$sStzMultiFile"
|
||||||
|
if ! grep -qE '^\s*lda\s+#0x0' "$sStzMultiFile"; then
|
||||||
|
warn "STZ peephole over-eagerly deleted shared LDA #0"
|
||||||
|
cat "$sStzMultiFile" >&2
|
||||||
|
die "expected lda #0 preserved when feeding multiple STAs"
|
||||||
|
fi
|
||||||
|
n_sta=$(grep -cE '^\s*sta\s+g[abc]\b' "$sStzMultiFile")
|
||||||
|
if [ "$n_sta" -ne 3 ]; then
|
||||||
|
warn "expected 3 STA after shared LDA #0, found $n_sta"
|
||||||
|
cat "$sStzMultiFile" >&2
|
||||||
|
die "STZ peephole regressed on multi-STA case"
|
||||||
|
fi
|
||||||
|
rm -f "$cStzMultiFile" "$sStzMultiFile"
|
||||||
|
|
||||||
|
log "check: clang lowers 'foo(1,2,3)' constant args via PEA"
|
||||||
|
cPeaFile="$(mktemp --suffix=.c)"
|
||||||
|
sPeaFile="$(mktemp --suffix=.s)"
|
||||||
|
cat > "$cPeaFile" <<'EOF'
|
||||||
|
extern void foo(int a, int b, int c);
|
||||||
|
void caller(void) { foo(1, 2, 3); }
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -S "$cPeaFile" -o "$sPeaFile"
|
||||||
|
# arg2 (c=3) and arg1 (b=2) are pushed via PEA. arg0 (a=1)
|
||||||
|
# stays in A. Expect at least 2 `pea` instructions and zero
|
||||||
|
# `pha` after a `lda #imm`.
|
||||||
|
n_pea=$(grep -cE '^\s*pea\s+' "$sPeaFile")
|
||||||
|
if [ "$n_pea" -lt 2 ]; then
|
||||||
|
warn "PEA peephole not firing on constant-arg pushes"
|
||||||
|
cat "$sPeaFile" >&2
|
||||||
|
die "expected >= 2 PEA in caller() but found $n_pea"
|
||||||
|
fi
|
||||||
|
rm -f "$cPeaFile" "$sPeaFile"
|
||||||
|
|
||||||
|
# PEI peephole: an i64-libcall return whose high half lives in
|
||||||
|
# DPF0 ($F0..$F1) is forwarded to the next call as a stacked arg.
|
||||||
|
# Pre-peephole shape: `lda $f0; pha`. Post-peephole: `pei $f0`,
|
||||||
|
# saving 1 byte and not touching A.
|
||||||
|
log "check: clang lowers DPF0 forwarding via PEI"
|
||||||
|
cPeiFile="$(mktemp --suffix=.c)"
|
||||||
|
sPeiFile="$(mktemp --suffix=.s)"
|
||||||
|
cat > "$cPeiFile" <<'EOF'
|
||||||
|
unsigned long long divmod(unsigned long long a, unsigned long long b);
|
||||||
|
unsigned long long use(unsigned long long x);
|
||||||
|
unsigned long long chain(unsigned long long a, unsigned long long b) {
|
||||||
|
return use(divmod(a, b));
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -S "$cPeiFile" -o "$sPeiFile"
|
||||||
|
if ! grep -qE '^\s*pei\s+0xf0\b' "$sPeiFile"; then
|
||||||
|
warn "PEI peephole not firing on DPF0 forwarding"
|
||||||
|
cat "$sPeiFile" >&2
|
||||||
|
die "expected 'pei 0xf0' in chain() but didn't find it"
|
||||||
|
fi
|
||||||
|
rm -f "$cPeiFile" "$sPeiFile"
|
||||||
|
|
||||||
log "check: clang i8 store to global in M=0 mode is SEP/REP bracketed"
|
log "check: clang i8 store to global in M=0 mode is SEP/REP bracketed"
|
||||||
cGlobFile="$(mktemp --suffix=.c)"
|
cGlobFile="$(mktemp --suffix=.c)"
|
||||||
sGlobFile="$(mktemp --suffix=.s)"
|
sGlobFile="$(mktemp --suffix=.s)"
|
||||||
|
|
@ -1104,9 +1214,6 @@ int toInt(double x) { return (int)x; }
|
||||||
double fromInt(int n) { return (double)n; }
|
double fromInt(int n) { return (double)n; }
|
||||||
EOF
|
EOF
|
||||||
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblFile" -o "$oDblFile"
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cDblFile" -o "$oDblFile"
|
||||||
# softDouble.c builds at -O1 (not -O2): __muldf3's 64x64 -> 128
|
|
||||||
# multiply with the inlined alignment shifts overflows the greedy
|
|
||||||
# allocator's spill heuristics on the single-A target at -O2.
|
|
||||||
"$CLANG" --target=w65816 -O1 -ffunction-sections \
|
"$CLANG" --target=w65816 -O1 -ffunction-sections \
|
||||||
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdFile"
|
-c "$PROJECT_ROOT/runtime/src/softDouble.c" -o "$oSdFile"
|
||||||
"$PROJECT_ROOT/tools/link816" -o "$binDblFile" \
|
"$PROJECT_ROOT/tools/link816" -o "$binDblFile" \
|
||||||
|
|
@ -2176,7 +2283,12 @@ int main(void) {
|
||||||
if (r == 10 && eq(buf, "n=-42 s=hi")) ok |= 0x02;
|
if (r == 10 && eq(buf, "n=-42 s=hi")) ok |= 0x02;
|
||||||
r = sprintf(buf, "%04x %lu", 0xC, (unsigned long)123456);
|
r = sprintf(buf, "%04x %lu", 0xC, (unsigned long)123456);
|
||||||
if (r == 11 && eq(buf, "000c 123456")) ok |= 0x04;
|
if (r == 11 && eq(buf, "000c 123456")) ok |= 0x04;
|
||||||
r = snprintf(buf, 6, "abcdefghij");
|
/* Test that snprintf truncates per C99: 10 chars asked, only 5 fit + NUL.
|
||||||
|
Pass the text as a "%s" argument through a pointer variable so clang's
|
||||||
|
-Wformat-truncation static check doesn't fire (the truncation IS
|
||||||
|
what we're testing). */
|
||||||
|
const char *fmt_trunc = "abcdefghij";
|
||||||
|
r = snprintf(buf, 6, "%s", fmt_trunc);
|
||||||
if (r == 10 && eq(buf, "abcde")) ok |= 0x08;
|
if (r == 10 && eq(buf, "abcde")) ok |= 0x08;
|
||||||
r = sprintf(buf, "%.2f", 1.5);
|
r = sprintf(buf, "%.2f", 1.5);
|
||||||
if (r == 4 && eq(buf, "1.50")) ok |= 0x10;
|
if (r == 4 && eq(buf, "1.50")) ok |= 0x10;
|
||||||
|
|
@ -2302,6 +2414,46 @@ EOF
|
||||||
fi
|
fi
|
||||||
rm -f "$cExFile" "$oExFile" "$binExFile"
|
rm -f "$cExFile" "$oExFile" "$binExFile"
|
||||||
|
|
||||||
|
log "check: MAME runs rand/srand reproducible sequence (#93)"
|
||||||
|
cRdFile="$(mktemp --suffix=.c)"
|
||||||
|
oRdFile="$(mktemp --suffix=.o)"
|
||||||
|
binRdFile="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cRdFile" <<'EOF'
|
||||||
|
extern int rand(void);
|
||||||
|
extern void srand(unsigned int);
|
||||||
|
__attribute__((noinline)) void switchToBank2(void) {
|
||||||
|
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
|
||||||
|
}
|
||||||
|
int main(void) {
|
||||||
|
srand(1);
|
||||||
|
int r1 = rand();
|
||||||
|
int r2 = rand();
|
||||||
|
int r3 = rand();
|
||||||
|
// Same seed: must reproduce.
|
||||||
|
srand(1);
|
||||||
|
int r1b = rand();
|
||||||
|
int r2b = rand();
|
||||||
|
unsigned char ok = 0;
|
||||||
|
if (r1 != 0 && r1 == r1b) ok |= 0x01; // reproducible
|
||||||
|
if (r2 != 0 && r2 == r2b) ok |= 0x02; // reproducible
|
||||||
|
if (r1 != r2 && r2 != r3) ok |= 0x04; // diverse
|
||||||
|
if (r1 >= 0 && r1 <= 0x7FFF) ok |= 0x08; // RAND_MAX bound
|
||||||
|
switchToBank2();
|
||||||
|
*(volatile unsigned short *)0x5000 = (unsigned short)ok;
|
||||||
|
while (1) {}
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c \
|
||||||
|
"$cRdFile" -o "$oRdFile"
|
||||||
|
"$PROJECT_ROOT/tools/link816" -o "$binRdFile" --text-base 0x1000 \
|
||||||
|
"$oCrt0F" "$oLibcF" "$oExtrasF" "$oSfF" "$oSdF" "$oLibgccFile" \
|
||||||
|
"$oRdFile" >/dev/null 2>&1
|
||||||
|
if ! bash "$PROJECT_ROOT/scripts/runInMame.sh" "$binRdFile" --check \
|
||||||
|
0x025000=000f >/dev/null 2>&1; then
|
||||||
|
die "MAME: rand/srand sequence broken"
|
||||||
|
fi
|
||||||
|
rm -f "$cRdFile" "$oRdFile" "$binRdFile"
|
||||||
|
|
||||||
log "check: MAME runs atan/asin/acos/sinh/cosh/tanh (#85)"
|
log "check: MAME runs atan/asin/acos/sinh/cosh/tanh (#85)"
|
||||||
cTr2File="$(mktemp --suffix=.c)"
|
cTr2File="$(mktemp --suffix=.c)"
|
||||||
oTr2File="$(mktemp --suffix=.o)"
|
oTr2File="$(mktemp --suffix=.o)"
|
||||||
|
|
@ -3166,6 +3318,42 @@ EOF
|
||||||
die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
|
die "iigs/toolbox.h: WriteCString tool number 0x290B not in output"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
# stdint.h / stddef.h / limits.h / inttypes.h: standalone
|
||||||
|
# replacements for clang's bundled versions (which try to include
|
||||||
|
# glibc bits/* headers and break the build). Compile a small
|
||||||
|
# program that exercises the typedefs and the offsetof macro.
|
||||||
|
log "check: standalone runtime headers (stdint/stddef/limits/inttypes/locale/signal)"
|
||||||
|
cStdiFile="$(mktemp --suffix=.c)"
|
||||||
|
sStdiFile="$(mktemp --suffix=.s)"
|
||||||
|
cat > "$cStdiFile" <<'EOF'
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stddef.h>
|
||||||
|
#include <limits.h>
|
||||||
|
#include <inttypes.h>
|
||||||
|
#include <locale.h>
|
||||||
|
#include <signal.h>
|
||||||
|
struct S { uint8_t a; uint16_t b; uint32_t c; uint64_t d; };
|
||||||
|
int main(void) {
|
||||||
|
/* Touch typedefs / functions from each header so the build
|
||||||
|
catches missing symbols, not just absent files. */
|
||||||
|
uint64_t big = UINT64_C(0xDEADBEEFCAFE);
|
||||||
|
intmax_t maxv = INTMAX_MAX;
|
||||||
|
int i_max = INT_MAX;
|
||||||
|
size_t off = offsetof(struct S, c);
|
||||||
|
char *loc = setlocale(LC_ALL, "C");
|
||||||
|
struct lconv *lc = localeconv();
|
||||||
|
signal(SIGINT, SIG_IGN);
|
||||||
|
return (int)(off + i_max + (int)(big & 1) + (int)(maxv & 1)
|
||||||
|
+ (loc[0] - 'C') + lc->frac_digits);
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -I"$PROJECT_ROOT/runtime/include" \
|
||||||
|
-S "$cStdiFile" -o "$sStdiFile"
|
||||||
|
if [ ! -s "$sStdiFile" ]; then
|
||||||
|
die "standalone runtime headers compile failed"
|
||||||
|
fi
|
||||||
|
rm -f "$cStdiFile" "$sStdiFile"
|
||||||
|
|
||||||
# Linker exports the synthetic __bss_start / __bss_end / etc.
|
# Linker exports the synthetic __bss_start / __bss_end / etc.
|
||||||
# symbols so crt0 can do BSS init and runtime malloc finds the
|
# symbols so crt0 can do BSS init and runtime malloc finds the
|
||||||
# heap top.
|
# heap top.
|
||||||
|
|
@ -3269,6 +3457,26 @@ EOF
|
||||||
fi
|
fi
|
||||||
rm -f "$cBigFile" "$oBigFile" "$binBssAutoFile" "$mapBssAutoFile"
|
rm -f "$cBigFile" "$oBigFile" "$binBssAutoFile" "$mapBssAutoFile"
|
||||||
|
|
||||||
|
log "check: link816 hard-fails when BSS would exceed LC1 ceiling (\$E000)"
|
||||||
|
# Force BSS to land past $E000 — link must reject with the LC1
|
||||||
|
# ceiling diagnostic (without crt0's LC2 RAM enable, that range
|
||||||
|
# silently corrupts).
|
||||||
|
cBigFile="$(mktemp --suffix=.c)"
|
||||||
|
oBigFile="$(mktemp --suffix=.o)"
|
||||||
|
binBssOFile="$(mktemp --suffix=.bin)"
|
||||||
|
cat > "$cBigFile" <<'EOF'
|
||||||
|
int main(void) { return 0; }
|
||||||
|
EOF
|
||||||
|
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cBigFile" -o "$oBigFile"
|
||||||
|
if "$PROJECT_ROOT/tools/link816" -o "$binBssOFile" --text-base 0x1000 \
|
||||||
|
--bss-base 0xE100 "$oBigFile" "$oLibgccFile" 2>/tmp/bsslink.err; then
|
||||||
|
die "link816 should have rejected --bss-base 0xE100 (above LC1 ceiling)"
|
||||||
|
fi
|
||||||
|
if ! grep -q 'exceeds bank-0 LC1 ceiling' /tmp/bsslink.err; then
|
||||||
|
die "link816 LC1-ceiling diagnostic missing: $(cat /tmp/bsslink.err)"
|
||||||
|
fi
|
||||||
|
rm -f "$cBigFile" "$oBigFile" "$binBssOFile" /tmp/bsslink.err
|
||||||
|
|
||||||
# OMF emitter — wrap the linked binary as a single-segment OMF
|
# OMF emitter — wrap the linked binary as a single-segment OMF
|
||||||
# file ready for IIgs loading.
|
# file ready for IIgs loading.
|
||||||
log "check: omfEmit produces a valid OMF v2.1 single-segment file"
|
log "check: omfEmit produces a valid OMF v2.1 single-segment file"
|
||||||
|
|
|
||||||
|
|
@ -484,20 +484,37 @@ struct Linker {
|
||||||
// overflow the 0x2000 bss start, shift bss above them so
|
// overflow the 0x2000 bss start, shift bss above them so
|
||||||
// crt0's bss-init doesn't zero loaded text bytes. Caller
|
// crt0's bss-init doesn't zero loaded text bytes. Caller
|
||||||
// can still force a specific bssBase via --bss-base.
|
// can still force a specific bssBase via --bss-base.
|
||||||
// The IIgs IO window at $C000-$CFFF is unusable; if loadEnd
|
//
|
||||||
// would push bss into IO, jump above it to bank 1 ($10000+).
|
// IIgs bank-0 hazard zones:
|
||||||
|
// $C000-$CFFF: IO and soft switches (ALWAYS unusable —
|
||||||
|
// reads/writes hit hardware registers).
|
||||||
|
// $D000-$DFFF: Language Card 1 area. Read-only ROM by
|
||||||
|
// default; crt0 enables LC1 RAM via the
|
||||||
|
// $C083 soft switch (read-twice trick) so
|
||||||
|
// BSS placed here is writable.
|
||||||
|
// $E000-$FFFF: bank-0 ROM area, also LC-switched but
|
||||||
|
// we don't enable it (less common need).
|
||||||
|
// Skip past the IO window if BSS would land there; LC1
|
||||||
|
// ($D000-$DFFF) IS now usable thanks to crt0's soft-switch
|
||||||
|
// enable. BSS that reaches $E000 or above would sit in un-enabled ROM;
|
||||||
|
// bail clearly rather than silently corrupt.
|
||||||
uint32_t loadEnd = L.initBase + L.initSize;
|
uint32_t loadEnd = L.initBase + L.initSize;
|
||||||
L.bssBase = bssBase;
|
L.bssBase = bssBase;
|
||||||
if (L.bssBase < loadEnd) {
|
if (L.bssBase < loadEnd) {
|
||||||
// Page-align upward for nicer addresses in the map.
|
// Page-align upward for nicer addresses in the map.
|
||||||
L.bssBase = (loadEnd + 0xFF) & ~0xFFu;
|
L.bssBase = (loadEnd + 0xFF) & ~0xFFu;
|
||||||
// If bss would land in the IIgs IO window ($C000-$CFFF),
|
|
||||||
// skip past it to $D000. bss reads/writes via DBR=0
|
|
||||||
// would be intercepted by IO if we placed it there.
|
|
||||||
if (L.bssBase >= 0xC000 && L.bssBase < 0xD000) {
|
if (L.bssBase >= 0xC000 && L.bssBase < 0xD000) {
|
||||||
L.bssBase = 0xD000;
|
L.bssBase = 0xD000;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
if (L.bssBase + L.bssSize > 0xE000) {
|
||||||
|
char msg[160];
|
||||||
|
std::snprintf(msg, sizeof(msg),
|
||||||
|
"bss [0x%X+%u] exceeds bank-0 LC1 ceiling 0xE000 — "
|
||||||
|
"shrink the runtime or split into bank 1",
|
||||||
|
L.bssBase, L.bssSize);
|
||||||
|
die(msg);
|
||||||
|
}
|
||||||
// Publish layout now so resolveSym() can read it during reloc
|
// Publish layout now so resolveSym() can read it during reloc
|
||||||
// application (it's a const member that uses lastLayout).
|
// application (it's a const member that uses lastLayout).
|
||||||
lastLayout = L;
|
lastLayout = L;
|
||||||
|
|
|
||||||
|
|
@ -41,12 +41,18 @@ public:
|
||||||
// Reset per-function state (defensive — SkipNextSepImm should
|
// Reset per-function state (defensive — SkipNextSepImm should
|
||||||
// already be cleared by the next emitInstruction, but guarantee
|
// already be cleared by the next emitInstruction, but guarantee
|
||||||
// it's not leaked across functions if a function ends mid-elision).
|
// it's not leaked across functions if a function ends mid-elision).
|
||||||
void emitFunctionBodyEnd() override { SkipNextSepImm = -1; }
|
void emitFunctionBodyEnd() override {
|
||||||
|
SkipNextSepImm = -1;
|
||||||
|
SkipNextStaAbs = false;
|
||||||
|
SkipNextPush16 = false;
|
||||||
|
}
|
||||||
// Reset on MBB entry too — labels emit before the MIs of a new MBB,
|
// Reset on MBB entry too — labels emit before the MIs of a new MBB,
|
||||||
// and a stale flag from a previous MBB's last LDAi8imm could
|
// and a stale flag from a previous MBB's last LDAi8imm could
|
||||||
// accidentally swallow the new MBB's first SEP.
|
// accidentally swallow the new MBB's first SEP.
|
||||||
void emitBasicBlockStart(const MachineBasicBlock &MBB) override {
|
void emitBasicBlockStart(const MachineBasicBlock &MBB) override {
|
||||||
SkipNextSepImm = -1;
|
SkipNextSepImm = -1;
|
||||||
|
SkipNextStaAbs = false;
|
||||||
|
SkipNextPush16 = false;
|
||||||
AsmPrinter::emitBasicBlockStart(MBB);
|
AsmPrinter::emitBasicBlockStart(MBB);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -57,6 +63,14 @@ public:
|
||||||
// already left M=8, so the wrap's SEP would be a no-op.
|
// already left M=8, so the wrap's SEP would be a no-op.
|
||||||
int SkipNextSepImm = -1;
|
int SkipNextSepImm = -1;
|
||||||
|
|
||||||
|
// When true, the next STAabs is consumed (already replaced with STZ
|
||||||
|
// by the LDAi16imm-0 peephole).
|
||||||
|
bool SkipNextStaAbs = false;
|
||||||
|
|
||||||
|
// When true, the next PUSH16 is consumed (already replaced with PEA
|
||||||
|
// by the LDAi16imm + PUSH16 peephole).
|
||||||
|
bool SkipNextPush16 = false;
|
||||||
|
|
||||||
static char ID;
|
static char ID;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
@ -108,6 +122,26 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
|
||||||
SkipNextSepImm = -1;
|
SkipNextSepImm = -1;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Drop the STAabs that the LDAi16imm-0 peephole replaced with STZ.
|
||||||
|
if (SkipNextStaAbs && !MI->isDebugInstr()) {
|
||||||
|
if (MI->getOpcode() == W65816::STAabs) {
|
||||||
|
SkipNextStaAbs = false;
|
||||||
|
return; // consume, already emitted as STZ
|
||||||
|
}
|
||||||
|
// Anything other than the expected STAabs cancels the elision.
|
||||||
|
SkipNextStaAbs = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Drop the PUSH16 that the LDAi16imm + PUSH16 peephole replaced
|
||||||
|
// with PEA.
|
||||||
|
if (SkipNextPush16 && !MI->isDebugInstr()) {
|
||||||
|
if (MI->getOpcode() == W65816::PUSH16) {
|
||||||
|
SkipNextPush16 = false;
|
||||||
|
return; // consume, already emitted as PEA
|
||||||
|
}
|
||||||
|
SkipNextPush16 = false;
|
||||||
|
}
|
||||||
|
|
||||||
W65816MCInstLower MCInstLowering(OutContext, *this);
|
W65816MCInstLower MCInstLowering(OutContext, *this);
|
||||||
|
|
||||||
// Expand codegen pseudos into their MC-layer realisations. Keep this
|
// Expand codegen pseudos into their MC-layer realisations. Keep this
|
||||||
|
|
@ -193,6 +227,70 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
case W65816::LDAi16imm: {
|
case W65816::LDAi16imm: {
|
||||||
|
// Peek at the next non-debug MI for two peephole patterns:
|
||||||
|
// (1) LDAi16imm 0 + STAabs $g -> STZ_Abs $g (saves 3B)
|
||||||
|
// (2) LDAi16imm $imm + PUSH16 -> PEA #$imm (saves 1B)
|
||||||
|
// Both replace the LDA+next pair with a single 3-byte op
|
||||||
|
// that achieves the same memory effect.
|
||||||
|
auto It = std::next(MI->getIterator());
|
||||||
|
while (It != MI->getParent()->end() && It->isDebugInstr()) ++It;
|
||||||
|
|
||||||
|
bool IsZero = MI->getOperand(1).isImm() &&
|
||||||
|
MI->getOperand(1).getImm() == 0;
|
||||||
|
// STAabs operand layout: (value:Acc16, addr). Operand 0 is the A
|
||||||
|
// register; only fire when this STA kills A — otherwise dropping
|
||||||
|
// the LDA leaves a later A-consumer without its value. E.g. SDAG
|
||||||
|
// CSE'd `g16 = 0; g32 = 0;` shares one LDAi16imm 0 across multiple
|
||||||
|
// STAabs; only the LAST one kills A, so the peephole would have
|
||||||
|
// miscompiled the earlier two by deleting the LDA but not the
|
||||||
|
// remaining STAs.
|
||||||
|
if (IsZero && It != MI->getParent()->end() &&
|
||||||
|
It->getOpcode() == W65816::STAabs &&
|
||||||
|
It->getOperand(0).isReg() && It->getOperand(0).isKill()) {
|
||||||
|
MCInst Stz;
|
||||||
|
Stz.setOpcode(W65816::STZ_Abs);
|
||||||
|
Stz.addOperand(lowerOperand(It->getOperand(1), MCInstLowering));
|
||||||
|
EmitToStreamer(*OutStreamer, Stz);
|
||||||
|
SkipNextStaAbs = true;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// PEA peephole: LDAi16imm + PUSH16 -> PEA. Safe iff A is dead
|
||||||
|
// after the PUSH16 — the next instruction must redefine A (so the
|
||||||
|
// value PUSH16 read is genuinely dead). We use modifiesRegister
|
||||||
|
// which handles both explicit defs and implicit-defs (e.g. JSL
|
||||||
|
// clobbers A as part of the calling convention). Falls through
|
||||||
|
// to a normal LDA #imm; PHA pair if A might be live afterward.
|
||||||
|
// Note: physreg use-kill flags on PUSH16's implicit-$a are not
|
||||||
|
// reliably set at AsmPrinter time, so we can't gate on them
|
||||||
|
// directly; checking the next instruction's def-set is robust.
|
||||||
|
if (It != MI->getParent()->end() && It->getOpcode() == W65816::PUSH16) {
|
||||||
|
auto It2 = std::next(It);
|
||||||
|
while (It2 != MI->getParent()->end() && It2->isDebugInstr()) ++It2;
|
||||||
|
bool ADead = false;
|
||||||
|
if (It2 != MI->getParent()->end()) {
|
||||||
|
const TargetRegisterInfo *TRI =
|
||||||
|
MI->getParent()->getParent()->getSubtarget().getRegisterInfo();
|
||||||
|
if (It2->modifiesRegister(W65816::A, TRI) && !It2->readsRegister(W65816::A, TRI))
|
||||||
|
ADead = true;
|
||||||
|
} else {
|
||||||
|
// PUSH16 is the last instruction in the BB. A is dead at
|
||||||
|
// BB exit iff it's not live-out. That could be verified via the
|
||||||
|
// successors (safe if none lists A as live-in), but we don't check:
|
||||||
|
// conservatively treat A as not-dead and skip the peephole.
|
||||||
|
// This case is uncommon — the PUSH chain almost always feeds
|
||||||
|
// a JSL within the same BB.
|
||||||
|
}
|
||||||
|
if (ADead) {
|
||||||
|
MCInst Pea;
|
||||||
|
Pea.setOpcode(W65816::PEA);
|
||||||
|
Pea.addOperand(lowerOperand(MI->getOperand(1), MCInstLowering));
|
||||||
|
EmitToStreamer(*OutStreamer, Pea);
|
||||||
|
SkipNextPush16 = true;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
MCInst Lda;
|
MCInst Lda;
|
||||||
Lda.setOpcode(W65816::LDA_Imm16);
|
Lda.setOpcode(W65816::LDA_Imm16);
|
||||||
Lda.addOperand(lowerOperand(MI->getOperand(1), MCInstLowering));
|
Lda.addOperand(lowerOperand(MI->getOperand(1), MCInstLowering));
|
||||||
|
|
@ -252,6 +350,40 @@ void W65816AsmPrinter::emitInstruction(const MachineInstr *MI) {
|
||||||
EmitToStreamer(*OutStreamer, Sta);
|
EmitToStreamer(*OutStreamer, Sta);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
case W65816::LDA_DP: {
|
||||||
|
// PEI peephole: LDA_DP $dp + PUSH16 -> PEI $dp. PEI pushes the
|
||||||
|
// 16-bit value at direct page $dp directly onto the stack without
|
||||||
|
// touching A, saving 1 byte (PEI=2B vs LDA_DP+PHA=3B). Safe iff
|
||||||
|
// A is dead after the PUSH16, same as the LDAi16imm+PUSH16
|
||||||
|
// peephole. Common case: i64-libcall-return forwarding —
|
||||||
|
// copyPhysReg(A=DPF0) emits LDA $F0; the next op is PUSH16 to
|
||||||
|
// forward the i64 high-half into a downstream call's args; the
|
||||||
|
// chained call's first op then redefines A.
|
||||||
|
auto It = std::next(MI->getIterator());
|
||||||
|
while (It != MI->getParent()->end() && It->isDebugInstr()) ++It;
|
||||||
|
if (It != MI->getParent()->end() &&
|
||||||
|
It->getOpcode() == W65816::PUSH16) {
|
||||||
|
auto It2 = std::next(It);
|
||||||
|
while (It2 != MI->getParent()->end() && It2->isDebugInstr()) ++It2;
|
||||||
|
bool ADead = false;
|
||||||
|
if (It2 != MI->getParent()->end()) {
|
||||||
|
const TargetRegisterInfo *TRI =
|
||||||
|
MI->getParent()->getParent()->getSubtarget().getRegisterInfo();
|
||||||
|
if (It2->modifiesRegister(W65816::A, TRI) && !It2->readsRegister(W65816::A, TRI))
|
||||||
|
ADead = true;
|
||||||
|
}
|
||||||
|
if (ADead) {
|
||||||
|
MCInst Pei;
|
||||||
|
Pei.setOpcode(W65816::PEI_DP);
|
||||||
|
Pei.addOperand(lowerOperand(MI->getOperand(0), MCInstLowering));
|
||||||
|
EmitToStreamer(*OutStreamer, Pei);
|
||||||
|
SkipNextPush16 = true;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Fall through to default emit (no peephole opportunity).
|
||||||
|
break;
|
||||||
|
}
|
||||||
case W65816::ADCi16imm:
|
case W65816::ADCi16imm:
|
||||||
case W65816::SBCi16imm: {
|
case W65816::SBCi16imm: {
|
||||||
bool IsSub = MI->getOpcode() == W65816::SBCi16imm;
|
bool IsSub = MI->getOpcode() == W65816::SBCi16imm;
|
||||||
|
|
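The A-liveness gate shared by the PEA and PEI peepholes above can be modelled on a toy instruction list. This is a minimal sketch, not the backend code: `ToyInstr` and `isADeadAfterPush` are hypothetical names, and the rule that a next instruction which also reads A keeps the value alive is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a MachineInstr: only the facts the gate inspects.
struct ToyInstr {
  bool IsDebug;    // debug pseudo, skipped like isDebugInstr()
  bool ReadsA;     // uses A, explicitly or implicitly
  bool ModifiesA;  // defines or clobbers A
};

// Returns true when A can be treated as dead after the push at PushIdx.
// Looks at the first non-debug instruction after the push; a push that ends
// the block is conservatively treated as "A may be live".
bool isADeadAfterPush(const std::vector<ToyInstr> &Block, std::size_t PushIdx) {
  std::size_t I = PushIdx + 1;
  while (I < Block.size() && Block[I].IsDebug)
    ++I;
  if (I >= Block.size())
    return false;  // nothing follows in this block: skip the peephole
  // Assumption of this sketch: a read-modify-write of A (for example an
  // add into A) still needs the loaded value, so it does not count as dead.
  return Block[I].ModifiesA && !Block[I].ReadsA;
}
```

Calling `isADeadAfterPush(Block, 1)` on a three-entry block of load, push, reload returns true, while a block that ends at the push returns false.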
|
||||||
|
|
@ -27,6 +27,19 @@ using namespace llvm;
|
||||||
|
|
||||||
// (The pure-i8-detection helpers were removed when the prologue went
|
// (The pure-i8-detection helpers were removed when the prologue went
|
||||||
// to "always 16-bit M". See emitPrologue comment.)
|
// to "always 16-bit M". See emitPrologue comment.)
|
||||||
|
//
|
||||||
|
// (DBR-zero wrap was prototyped here — PHB at function entry to save
|
||||||
|
// caller's DBR, set DBR=0, restore at exit. Two issues blocked it:
|
||||||
|
// (a) saving DBR to a DP slot ($F2/$F3) conflicts with libgcc's
|
||||||
|
// muldi3/divdi3 scratch — those routines use $F2..$F8 freely, so
|
||||||
|
// the saved DBR doesn't survive a libcall in the function body.
|
||||||
|
// (b) saving via PHB shifts SP, which means LowerFormalArguments
|
||||||
|
// would need to bump every arg's StackOffset by 1 — but at
|
||||||
|
// LowerFormalArguments time we don't know yet whether the function
|
||||||
|
// will need the wrap (indirect-Y emission is a later lowering
|
||||||
|
// choice). Right approach is a per-function attribute the user
|
||||||
|
// opts into, plus PrologEpilogInserter integration to add a fixed-size "saved DBR"
|
||||||
|
// slot. Deferred — see STATUS.md.)
|
||||||
|
|
||||||
W65816FrameLowering::W65816FrameLowering(const W65816Subtarget &STI)
|
W65816FrameLowering::W65816FrameLowering(const W65816Subtarget &STI)
|
||||||
: TargetFrameLowering(TargetFrameLowering::StackGrowsDown, Align(1), 0,
|
: TargetFrameLowering(TargetFrameLowering::StackGrowsDown, Align(1), 0,
|
||||||
|
|
@ -153,8 +166,6 @@ void W65816FrameLowering::emitEpilogue(MachineFunction &MF,
|
||||||
// before the RTL.
|
// before the RTL.
|
||||||
uint64_t StackSize = MF.getFrameInfo().getStackSize();
|
uint64_t StackSize = MF.getFrameInfo().getStackSize();
|
||||||
bool HasVLA = MF.getFrameInfo().hasVarSizedObjects();
|
bool HasVLA = MF.getFrameInfo().hasVarSizedObjects();
|
||||||
if (StackSize == 0 && !HasVLA)
|
|
||||||
return;
|
|
||||||
|
|
||||||
const W65816Subtarget &STI = MF.getSubtarget<W65816Subtarget>();
|
const W65816Subtarget &STI = MF.getSubtarget<W65816Subtarget>();
|
||||||
const W65816InstrInfo &TII = *STI.getInstrInfo();
|
const W65816InstrInfo &TII = *STI.getInstrInfo();
|
||||||
|
|
@ -162,6 +173,9 @@ void W65816FrameLowering::emitEpilogue(MachineFunction &MF,
|
||||||
// Insert before the terminator (the return).
|
// Insert before the terminator (the return).
|
||||||
DebugLoc DL = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc();
|
DebugLoc DL = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc();
|
||||||
|
|
||||||
|
if (StackSize == 0 && !HasVLA)
|
||||||
|
return;
|
||||||
|
|
||||||
// Detect whether the return live-out includes Y or X — for i64 returns
|
// Detect whether the return live-out includes Y or X — for i64 returns
|
||||||
// (Outs[0..2] -> A,X,Y), Y holds bits 32-47 and X holds bits 16-31, so
|
// (Outs[0..2] -> A,X,Y), Y holds bits 32-47 and X holds bits 16-31, so
|
||||||
// any TAY/PLY/TAX in the SP-restore would corrupt the return value.
|
// any TAY/PLY/TAX in the SP-restore would corrupt the return value.
|
||||||
|
|
|
||||||
|
|
@ -210,6 +210,19 @@ class InstAbsLong<bits<8> op, string mnem>
|
||||||
let Inst{31-8} = addr;
|
let Inst{31-8} = addr;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Absolute Long Indexed X. EA = addr_long + X. The bank comes from
|
||||||
|
// the operand's high byte, NOT from DBR — useful for accessing data
|
||||||
|
// in a known bank (typically bank 0) regardless of the caller's DBR
|
||||||
|
// state. 4-byte instruction.
|
||||||
|
class InstAbsLongX<bits<8> op, string mnem>
|
||||||
|
: W65816Inst<(outs), (ins addrLong:$addr), !strconcat(mnem, "\t$addr, x")> {
|
||||||
|
let Size = 4;
|
||||||
|
bits<24> addr;
|
||||||
|
bits<32> Inst;
|
||||||
|
let Inst{7-0} = op;
|
||||||
|
let Inst{31-8} = addr;
|
||||||
|
}
|
||||||
|
|
||||||
class InstAbsX<bits<8> op, string mnem>
|
class InstAbsX<bits<8> op, string mnem>
|
||||||
: W65816Inst<(outs), (ins addrAbs:$addr), !strconcat(mnem, "\t$addr, x")> {
|
: W65816Inst<(outs), (ins addrAbs:$addr), !strconcat(mnem, "\t$addr, x")> {
|
||||||
let Size = 3;
|
let Size = 3;
|
||||||
|
|
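The bit layout declared by `InstAbsLongX` (`Inst{7-0} = op`, `Inst{31-8} = addr`) can be checked with a small stand-alone sketch. `packAbsLongX` and `toBytes` are hypothetical helpers, and the byte order assumes the encoder emits the 32-bit word little-endian, opcode first and bank byte last.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs an opcode and a 24-bit address the way the class lays them out:
// bits 7..0 hold the opcode, bits 31..8 hold the long address.
std::uint32_t packAbsLongX(std::uint8_t Op, std::uint32_t Addr24) {
  return static_cast<std::uint32_t>(Op) | ((Addr24 & 0xFFFFFFu) << 8);
}

// Splits the packed word into 4 bytes in memory order (assumed little-endian):
// opcode, address low byte, address high byte, bank byte.
std::array<std::uint8_t, 4> toBytes(std::uint32_t Inst) {
  return {{static_cast<std::uint8_t>(Inst),
           static_cast<std::uint8_t>(Inst >> 8),
           static_cast<std::uint8_t>(Inst >> 16),
           static_cast<std::uint8_t>(Inst >> 24)}};
}
```

For instance, `toBytes(packAbsLongX(0xBF, 0x012345))` yields the bytes `BF 45 23 01`, so the bank byte travels in the instruction itself rather than coming from DBR.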
|
||||||
|
|
@ -1072,6 +1072,26 @@ def XBA : InstImplied<0xEB, "xba"> { let mayLoad = 0; let mayStore = 0; }
|
||||||
def WAI : InstImplied<0xCB, "wai">;
|
def WAI : InstImplied<0xCB, "wai">;
|
||||||
def STP : InstImplied<0xDB, "stp">;
|
def STP : InstImplied<0xDB, "stp">;
|
||||||
|
|
||||||
|
// WDM (William D Mensch) — reserved 2-byte NOP-equivalent. Useful as
|
||||||
|
// a debugger / emulator hook: MAME's apple2gs CPU traps on WDM and a
|
||||||
|
// Lua plugin can dispatch on the operand byte. CPU-side, it acts as
|
||||||
|
// a 2-byte NOP. Operand syntax mirrors MVN: `wdm $ab` (no `#`).
|
||||||
|
def WDM : InstDP<0x42, "wdm">;
|
||||||
|
|
||||||
|
// TRB / TSB — Test and Reset/Set memory Bits. Atomic bit clear/set
|
||||||
|
// on a byte (or 16-bit word per M flag) at the given DP or abs
|
||||||
|
// address. Z flag set per (M & A) where M is the memory operand.
|
||||||
|
// Useful for memory-mapped IO bit twiddling. No DP indexing form.
|
||||||
|
def TRB_DP : InstDP<0x14, "trb">;
|
||||||
|
def TRB_Abs : InstAbs<0x1C, "trb">;
|
||||||
|
def TSB_DP : InstDP<0x04, "tsb">;
|
||||||
|
def TSB_Abs : InstAbs<0x0C, "tsb">;
|
||||||
|
|
||||||
|
// PEI — Push Effective Indirect. Reads a 16-bit value from DP and
|
||||||
|
// pushes it. Useful for indirect parameter passing without going
|
||||||
|
// through A first.
|
||||||
|
def PEI_DP : InstDP<0xD4, "pei">;
|
||||||
|
|
||||||
//---------------------------------------------------------------- LDA (load A)
|
//---------------------------------------------------------------- LDA (load A)
|
||||||
// The `_Imm8` forms of the mode-dependent load/arith/compare ops are
|
// The `_Imm8` forms of the mode-dependent load/arith/compare ops are
|
||||||
// marked isCodeGenOnly so the asm matcher never picks them — our
|
// marked isCodeGenOnly so the asm matcher never picks them — our
|
||||||
|
|
@ -1091,6 +1111,7 @@ def LDA_DPIndY : InstDPIndY<0xB1, "lda">;
|
||||||
def LDA_DPIndX : InstDPIndX<0xA1, "lda">;
|
def LDA_DPIndX : InstDPIndX<0xA1, "lda">;
|
||||||
def LDA_DPIndLong : InstDPIndLong <0xA7, "lda">;
|
def LDA_DPIndLong : InstDPIndLong <0xA7, "lda">;
|
||||||
def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda">;
|
def LDA_DPIndLongY : InstDPIndLongY<0xB7, "lda">;
|
||||||
|
def LDA_LongX : InstAbsLongX<0xBF, "lda">;
|
||||||
|
|
||||||
//---------------------------------------------------------------- STA (store A)
|
//---------------------------------------------------------------- STA (store A)
|
||||||
def STA_DP : InstDP<0x85, "sta">;
|
def STA_DP : InstDP<0x85, "sta">;
|
||||||
|
|
@ -1104,6 +1125,7 @@ def STA_DPIndY : InstDPIndY<0x91, "sta">;
|
||||||
def STA_DPIndX : InstDPIndX<0x81, "sta">;
|
def STA_DPIndX : InstDPIndX<0x81, "sta">;
|
||||||
def STA_DPIndLong : InstDPIndLong <0x87, "sta">;
|
def STA_DPIndLong : InstDPIndLong <0x87, "sta">;
|
||||||
def STA_DPIndLongY : InstDPIndLongY<0x97, "sta">;
|
def STA_DPIndLongY : InstDPIndLongY<0x97, "sta">;
|
||||||
|
def STA_LongX : InstAbsLongX<0x9F, "sta">;
|
||||||
|
|
||||||
//---------------------------------------------------------------- LDX (load X)
|
//---------------------------------------------------------------- LDX (load X)
|
||||||
def LDX_Imm8 : InstImm8<0xA2, "ldx"> { let XHigh = 1; let DecoderNamespace = "W65816XHigh"; let isCodeGenOnly = 1; let Defs = [X]; }
|
def LDX_Imm8 : InstImm8<0xA2, "ldx"> { let XHigh = 1; let DecoderNamespace = "W65816XHigh"; let isCodeGenOnly = 1; let Defs = [X]; }
|
||||||
|
|
@ -1131,6 +1153,14 @@ def STY_DP : InstDP<0x84, "sty">;
|
||||||
def STY_Abs : InstAbs<0x8C, "sty">;
|
def STY_Abs : InstAbs<0x8C, "sty">;
|
||||||
def STY_DPX : InstDPX<0x94, "sty">;
|
def STY_DPX : InstDPX<0x94, "sty">;
|
||||||
|
|
||||||
|
//---------------------------------------------------------------- STZ (store zero)
|
||||||
|
// Width follows M flag — same as STA. Useful for zeroing DP scratch
|
||||||
|
// without burning A. Drops the `LDA #0` (3 bytes with 16-bit M) whenever A isn't needed afterward.
|
||||||
|
def STZ_DP : InstDP<0x64, "stz">;
|
||||||
|
def STZ_Abs : InstAbs<0x9C, "stz">;
|
||||||
|
def STZ_DPX : InstDPX<0x74, "stz">;
|
||||||
|
def STZ_AbsX : InstAbsX<0x9E, "stz">;
|
||||||
|
|
||||||
//------------------------------------------------------------------------- ADC
|
//------------------------------------------------------------------------- ADC
|
||||||
def ADC_Imm8 : InstImm8<0x69, "adc"> { let MHigh = 1; let DecoderNamespace = "W65816MHigh"; let isCodeGenOnly = 1; }
|
def ADC_Imm8 : InstImm8<0x69, "adc"> { let MHigh = 1; let DecoderNamespace = "W65816MHigh"; let isCodeGenOnly = 1; }
|
||||||
def ADC_Imm16 : InstImm16<0x69, "adc"> { let MLow = 1; }
|
def ADC_Imm16 : InstImm16<0x69, "adc"> { let MLow = 1; }
|
||||||
|
|
|
||||||
|
|
@ -268,13 +268,24 @@ bool W65816RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
|
||||||
TII.get(NewOpc)).addImm(Offset);
|
TII.get(NewOpc)).addImm(Offset);
|
||||||
switch (NewOpc) {
|
switch (NewOpc) {
|
||||||
case W65816::LDA_StackRel:
|
case W65816::LDA_StackRel:
|
||||||
case W65816::LDA_StackRelIndY:
|
|
||||||
Builder.addReg(W65816::A, RegState::ImplicitDefine);
|
Builder.addReg(W65816::A, RegState::ImplicitDefine);
|
||||||
break;
|
break;
|
||||||
|
case W65816::LDA_StackRelIndY:
|
||||||
|
// Indirect-Y: A def + Y use. The Y use is critical — without it,
|
||||||
|
// post-RA passes can reorder a Y-defining op past us, leaving the
|
||||||
|
// load reading at (ptr + stale_Y). Caught when modelling the dep
|
||||||
|
// for the (sr,s),Y bank-wrap workaround in W65816NegYIndY.
|
||||||
|
Builder.addReg(W65816::A, RegState::ImplicitDefine)
|
||||||
|
.addReg(W65816::Y, RegState::Implicit);
|
||||||
|
break;
|
||||||
case W65816::STA_StackRel:
|
case W65816::STA_StackRel:
|
||||||
case W65816::STA_StackRelIndY:
|
|
||||||
Builder.addReg(W65816::A, RegState::Implicit);
|
Builder.addReg(W65816::A, RegState::Implicit);
|
||||||
break;
|
break;
|
||||||
|
case W65816::STA_StackRelIndY:
|
||||||
|
// Indirect-Y store: A use + Y use (same Y reasoning as above).
|
||||||
|
Builder.addReg(W65816::A, RegState::Implicit)
|
||||||
|
.addReg(W65816::Y, RegState::Implicit);
|
||||||
|
break;
|
||||||
case W65816::ADC_StackRel:
|
case W65816::ADC_StackRel:
|
||||||
case W65816::SBC_StackRel:
|
case W65816::SBC_StackRel:
|
||||||
Builder.addReg(W65816::A, RegState::Implicit)
|
Builder.addReg(W65816::A, RegState::Implicit)
|
||||||
|
|
|
||||||
|
|
@ -1295,6 +1295,32 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
|
||||||
// path (i32 (lo|hi) == 0): the OR sets Z, then the SETCC compares
|
// path (i32 (lo|hi) == 0): the OR sets Z, then the SETCC compares
|
||||||
// against 0. The second compare is provably redundant because $a
|
// against 0. The second compare is provably redundant because $a
|
||||||
// hasn't changed since the previous flag-defining op.
|
// hasn't changed since the previous flag-defining op.
|
||||||
|
// Intra-MBB only — cross-MBB recursion into predecessors was tried
|
||||||
|
// (catches SETCC merge blocks where each pred ends with `lda #c`)
|
||||||
|
// but proved too brittle: predecessors ending with JSLpseudo declare
|
||||||
|
// implicit-def $a but the return-value flags aren't reliably set,
|
||||||
|
// and other corner cases break smoke.
|
||||||
|
auto isATransparent = [](const MachineInstr &MI) {
|
||||||
|
// Stores that leave both A and the N/Z flags untouched (STA8fi toggles M but not N/Z).
|
||||||
|
return MI.getOpcode() == W65816::STAfi ||
|
||||||
|
MI.getOpcode() == W65816::STAfi_indY ||
|
||||||
|
MI.getOpcode() == W65816::STA8fi;
|
||||||
|
};
|
||||||
|
// Returns true iff walking back from `Start` (exclusive) finds an
|
||||||
|
// A-modifier as the first non-skip op. Skips debug ops and
|
||||||
|
// A-transparent stores; stops at the first real op. Templated to
|
||||||
|
// accept either iterator or const_iterator (Cmps came from a non-
|
||||||
|
// const iteration; predecessors are walked via const_iterator).
|
||||||
|
auto walkbackBefore = [&](auto Start, auto Begin) -> bool {
|
||||||
|
auto It = Start;
|
||||||
|
while (It != Begin) {
|
||||||
|
--It;
|
||||||
|
if (It->isDebugInstr()) continue;
|
||||||
|
if (isATransparent(*It)) continue;
|
||||||
|
return It->modifiesRegister(W65816::A, TRI);
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
};
|
||||||
for (MachineBasicBlock &MBB : MF) {
|
for (MachineBasicBlock &MBB : MF) {
|
||||||
SmallVector<MachineInstr *, 8> Cmps;
|
SmallVector<MachineInstr *, 8> Cmps;
|
||||||
for (MachineInstr &MI : MBB)
|
for (MachineInstr &MI : MBB)
|
||||||
|
|
@ -1308,27 +1334,7 @@ bool W65816StackSlotCleanup::runOnMachineFunction(MachineFunction &MF) {
|
||||||
!Cmp->getOperand(1).isImm() ||
|
!Cmp->getOperand(1).isImm() ||
|
||||||
Cmp->getOperand(1).getImm() != 0)
|
Cmp->getOperand(1).getImm() != 0)
|
||||||
continue;
|
continue;
|
||||||
// Walk back across debug ops to find the immediately-prior real
|
bool Found = walkbackBefore(Cmp->getIterator(), MBB.begin());
|
||||||
// instruction. If it modifies $a (i.e. it's an A-defining op
|
|
||||||
// that ALSO sets N/Z — true for every A-write op on the 65816
|
|
||||||
// except the no-op TSC variants), the CMP is redundant.
|
|
||||||
auto PrevIt = Cmp->getIterator();
|
|
||||||
bool Found = false;
|
|
||||||
while (PrevIt != MBB.begin()) {
|
|
||||||
--PrevIt;
|
|
||||||
if (PrevIt->isDebugInstr()) continue;
|
|
||||||
// Stores don't change $a — skip and keep walking back. This
|
|
||||||
// pass runs pre-PEI, so the skip-list uses the *pseudo* opcodes
|
|
||||||
// (STAfi / STAfi_indY / STA8fi); their post-PEI MC counterparts
|
|
||||||
// never appear here. STA8fi flips M via SEP/REP (Defs=[P]) but
|
|
||||||
// doesn't touch A or N/Z, so it's transparent for this CMP.
|
|
||||||
if (PrevIt->getOpcode() == W65816::STAfi ||
|
|
||||||
PrevIt->getOpcode() == W65816::STAfi_indY ||
|
|
||||||
PrevIt->getOpcode() == W65816::STA8fi)
|
|
||||||
continue;
|
|
||||||
Found = PrevIt->modifiesRegister(W65816::A, TRI);
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
if (Found) {
|
if (Found) {
|
||||||
Cmp->eraseFromParent();
|
Cmp->eraseFromParent();
|
||||||
Changed = true;
|
Changed = true;
|
||||||
|
|
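The walk-back used for redundant `CMP #0` elimination can be illustrated on a toy block. This is a hedged sketch only: `Kind` and `walkbackFindsADef` are hypothetical names standing in for the real opcode checks and the `walkbackBefore` lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of the instructions the walk-back cares about.
enum class Kind { Debug, TransparentStore, DefinesA, Other };

// Walks back from index Start (exclusive), skipping debug ops and
// A-transparent stores. Returns true iff the first real instruction found
// defines A; returns false when the start of the block is reached first.
bool walkbackFindsADef(const std::vector<Kind> &Block, std::size_t Start) {
  std::size_t I = Start;
  while (I != 0) {
    --I;
    if (Block[I] == Kind::Debug || Block[I] == Kind::TransparentStore)
      continue;  // does not change A or N/Z, keep walking
    return Block[I] == Kind::DefinesA;
  }
  return false;  // intra-block only: predecessors are never consulted
}
```

With a block of A-defining op, store, debug op, compare, `walkbackFindsADef(Block, 3)` returns true, so the compare at index 3 would be a removal candidate.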
|
||||||