Optimizations

Scott Duensing 2026-05-25 21:00:32 -05:00
parent d95c30e819
commit 2deaba9c29
329 changed files with 42624 additions and 2073 deletions

4
.gitignore vendored

@ -11,6 +11,10 @@ runtime/*.o
runtime/*.o.bak runtime/*.o.bak
runtime/*.o.tmp runtime/*.o.tmp
# Per-test build outputs.
tests/coremark/build/
tests/coremark/coreMark.bin
# Editor / OS # Editor / OS
*.swp *.swp
*.swo *.swo


@ -82,9 +82,10 @@ ratio against commercial Calypsi 5.16 (lower is better):
Per-iteration cycle measurements (via MAME's HBL counter, 2026-05-20): Per-iteration cycle measurements (via MAME's HBL counter, 2026-05-20):
bsearch 127, dotProduct 144, fib 97, memcmp 113, popcount 93, bsearch 127, dotProduct 144, fib 97, memcmp 113, popcount 93,
strcpy 91, sumOfSquares 126 (cyc/iter at 100 iters); strcpy 91, sumOfSquares 126 cyc/iter (100 iters);
dadd 1157, ddiv 1261, dmul 1033 (cyc/iter at 10 iters — FP calls dadd 1157, ddiv 1261, dmul 1033 cyc/iter (10 iters);
are ~1000+ cyc each). particles 2253 (3 iters — 32-particle physics tick);
mandelbrot 11570 (1 iter — 4×4 fixed-point tile).
See [STATUS.md](STATUS.md) for full language and runtime feature See [STATUS.md](STATUS.md) for full language and runtime feature
coverage, and [LLVM_65816_DESIGN.md](LLVM_65816_DESIGN.md) for coverage, and [LLVM_65816_DESIGN.md](LLVM_65816_DESIGN.md) for


@ -1,4 +1,4 @@
# Session Recovery — last updated 2026-05-20 # Session Recovery — last updated 2026-05-25
Living recovery doc. Update on every meaningful change. If session is lost, Living recovery doc. Update on every meaningful change. If session is lost,
read this top-to-bottom + the memory notes referenced inside, then reread read this top-to-bottom + the memory notes referenced inside, then reread
@ -9,16 +9,33 @@ the actual diffs in tree to ground assumptions.
- **Smoke**: 148/148 green. Demos 9/9 (helloBeep/helloText/helloWindow/ - **Smoke**: 148/148 green. Demos 9/9 (helloBeep/helloText/helloWindow/
orcaFrame/qdProbe/heavyRelocs/frame/reversi/minicad). orcaFrame/qdProbe/heavyRelocs/frame/reversi/minicad).
- **Active config**: ptr32 (`p:32:16`), full IMG0..IMG15 caller-clobber - **Active config**: ptr32 (`p:32:16`), full IMG0..IMG15 caller-clobber
on JSL, greedy regalloc at -O1+. on JSL, greedy regalloc at -O1+. **Inline-threshold lowered to 50
target-wide** (was LLVM default 225; was 75 earlier this session).
- **Branch**: `main`. - **Branch**: `main`.
- **vs Calypsi static-inst ratio (2026-05-20)**: - **vs Calypsi (2026-05-25)**:
- **Lua 5.1.5**: default config 1.13× Calypsi; with Layer 2 0.93× (we
beat by 7%).
- **CoreMark 1.0**: with Layer 2 **0.79× Calypsi (we beat by 21%)**.
- **vs Calypsi static-inst ratio (synthetic bench)**:
sumSquares **0.84×** (26 vs 31 — we beat), sumSquares **0.84×** (26 vs 31 — we beat),
mul16to32 **0.25×** (1 vs 4 — we beat), mul16to32 **0.25×** (1 vs 4 — we beat),
evalAt 1.86× (472 vs 254 — structural floor; ABI overhaul rejected). evalAt 1.86× (472 vs 254 — structural floor; ABI overhaul rejected).
- **New code-gen options (2026-05-25)** — see docs/USAGE.md "Advanced:
pointer-deref code generation":
- Layer 1 ptr32 deref-fold (always on): LDY offset instead of
CLC/ADC carry chain. ~3 instr saved per struct-field access.
- `-mllvm -w65816-dbr-safe-ptrs` (Layer 2, opt-in): uses
`lda (d,S),Y` for ptr32 derefs assuming bank-byte == DBR.
5 instr → 1 instr per deref. Lua -20.6%. **MISCOMPILES
cross-bank pointers — opt in per-TU only when safe.**
- Inline-threshold lowered to 50 (was 225). Lua -23% total, CoreMark
matrix.o 1.37× → 0.97× Calypsi. Override with
`-mllvm -inline-threshold=N`.
- **Cycle benches (2026-05-20)**: - **Cycle benches (2026-05-20)**:
popcount 93, strcpy 91, bsearch 127, memcmp 113, fib 97, popcount 93, strcpy 91, bsearch 127, memcmp 113, fib 97,
dotProduct 144, sumOfSquares 126 cyc/iter (100 iters); dotProduct 144, sumOfSquares 126 cyc/iter (100 iters);
dadd 1157, ddiv 1261, dmul 1033 cyc/iter (10 iters). dadd 1157, ddiv 1261, dmul 1033 cyc/iter (10 iters);
particles 2253 cyc/iter (3 iters), mandelbrot 11570 cyc/iter (1 iter).
- **Recent session wins (2026-05-20)**: - **Recent session wins (2026-05-20)**:
- 8 always-on peepholes + extended phase 4 in W65816StackRelToImg - 8 always-on peepholes + extended phase 4 in W65816StackRelToImg
(evalAt 498→472, fib -35%, 35 libc fns shrunk) (evalAt 498→472, fib -35%, 35 libc fns shrunk)
@ -27,6 +44,11 @@ the actual diffs in tree to ground assumptions.
TAY/TYA round-trip in synergy TAY/TYA round-trip in synergy
- FP cycle benches added (dadd/dmul/ddiv) with per-bench iter count - FP cycle benches added (dadd/dmul/ddiv) with per-bench iter count
- Documented LSR-dp cycle mystery as HBL-counter wrap artifact - Documented LSR-dp cycle mystery as HBL-counter wrap artifact
- Game-like benches added: particles (i16 physics), mandelbrot (i32 fp)
- **elideStoreForwarding now reached via early-return bail paths**:
particles 5005→2253 cyc/iter (-55%). Was being skipped for any
function where main IMG promotion bailed (SpAdj invalid, no
accesses, or > 16 hot slots).
## Uncommitted, must keep ## Uncommitted, must keep


@ -245,14 +245,15 @@ which runs correctly under MAME (apple2gs).
scripts/bench.sh size-vs-Calypsi harness. 100% pass. scripts/bench.sh size-vs-Calypsi harness. 100% pass.
- `scripts/benchCycles.sh` measures per-iteration cycle counts via - `scripts/benchCycles.sh` measures per-iteration cycle counts via
MAME's emulated HBL counter. Eleven benchmarks under MAME's emulated HBL counter. 13 benchmarks under `benchmarks/`
`benchmarks/` (eight int + three FP). Current numbers (8 int micro + 3 soft-FP + 2 "game-like": particles, mandelbrot).
(2026-05-20): Current numbers (2026-05-20):
bsearch 127, crc32 <65, dotProduct 144, fib 97, memcmp 113, bsearch 127, crc32 <65, dotProduct 144, fib 97, memcmp 113,
popcount 93, strcpy 91, sumOfSquares 126 cyc/iter (100 iters); popcount 93, strcpy 91, sumOfSquares 126 cyc/iter (100 iters);
dadd 1157, ddiv 1261, dmul 1033 cyc/iter (10 iters; FP benches dadd 1157, ddiv 1261, dmul 1033 cyc/iter (10 iters);
use fewer iters since each call is ~1000+ cyc). Speed is the particles 2253 cyc/iter (3 iters — 32-particle physics tick);
optimization priority, not size. mandelbrot 11570 cyc/iter (1 iter — 4×4 fixed-point tile, max 8
Mandelbrot iters). Speed is the optimization priority, not size.
- `compare/` holds three side-by-side C tests with our asm and - `compare/` holds three side-by-side C tests with our asm and
Calypsi's listing for static-size comparison: Calypsi's listing for static-size comparison:
@ -328,6 +329,46 @@ Work is now optimization-focused; the toolchain is feature-complete
for the common-case C / minimal-C++ workload. Priority is speed for the common-case C / minimal-C++ workload. Priority is speed
(cycle counts), not size. (cycle counts), not size.
**Recently landed (2026-05-25):**
- **Layer 1 ptr32 deref-fold (always on)** — Constant offset on a
ptr32 deref folds into the `[dp],Y` Y register instead of a CLC/ADC
carry-chain pre-add. Plus consecutive-deref CSE that shares the
`$E0/$E2` staging across `s->a`, `s->b`, ... accesses with the same
base. Always on; saves ~3 instructions per struct-field access.
See `feedback_ptr32_deref_fold_layer1_landed.md`.
- **Layer 2 ptr32 deref via `(d,S),Y` (opt-in)**
`-mllvm -w65816-dbr-safe-ptrs` switches ptr32 derefs to the
one-instruction `lda (d,S),Y` (opcode 0xB3) at the cost of reading
only 16 bits of pointer. Bank byte is implicit DBR. Correct only
for code that touches memory inside DBR's bank — typical for
malloc/globals/BSS-only programs (Lua, Picol). Lua 5.1.5 shrinks
20.6%, dropping our total from 1.45× to 1.15× Calypsi. Default
off; per-TU opt-in. See `feedback_ptr32_layer2_landed.md` and
`docs/USAGE.md` for the safety rules.
- **Inline-threshold lowered target-wide to 50** (was LLVM default
225). LLVM's default is tuned for desktop ISAs where call overhead
is high relative to inlined-body byte cost. On W65816, `jsl` is
cheap (4 bytes / ~8 cycles) but inlined ptr32 derefs are expensive
even with Layer 2 — the tradeoff inverts. At 225, Lua's
`index2adr` (41 callers in lapi.c) and CoreMark's `matrix_test`
helpers got copied everywhere. At 50, neither does, and the cycle
benchmark suite is unchanged. With Layer 2 + threshold=50, total
Lua is **0.93× Calypsi** and total CoreMark is **0.79× Calypsi
(we beat by 21%)**. Override per-TU with
`-mllvm -inline-threshold=N`. See
`feedback_lapi_inline_threshold.md` and
`feedback_coremark_matrix_test_regression.md`.
- **CoreMark 1.0 ported** (`tests/coremark/`). EEMBC's standard
embedded benchmark, ~2K LOC. Exercises linked-list traversal,
matrix multiply, formal state machine, CRC — patterns Lua doesn't
hit. Build requires `--layer2` to fit a single bank
(otherwise crosses the IO window at 0xC000). See
`tests/coremark/README.md` and `feedback_coremark_landed.md`.
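
The Layer 1 and Layer 2 notes above can be illustrated with a small host-side C sketch. It is not taken from the repository: `struct Vec3`, `sumFields`, `bankOf`, `isDbrSafe`, and `layer2EffectiveAddr` are hypothetical names, and the model assumes a ptr32 value carries a 24-bit address with the bank byte in bits 16..23. `sumFields` shows the access shape Layer 1 targets (several constant-offset derefs off one base), and the remaining helpers model why Layer 2 is only correct when the pointer's bank byte equals DBR.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical struct: fields at constant byte offsets 0, 2, 4
// (assuming a 16-bit short and no padding, as on the W65816 target).
struct Vec3 {
    short x;
    short y;
    short z;
};

// Three derefs off the same base pointer. Per the notes above, this is
// the shape where a constant offset can ride in Y and the pointer
// staging can be shared across accesses.
static short sumFields(const struct Vec3 *v) {
    return (short)(v->x + v->y + v->z);
}

// Model of a ptr32 value: bits 16..23 hold the bank byte.
static uint8_t bankOf(uint32_t ptr32) {
    return (uint8_t)((ptr32 >> 16) & 0xFFu);
}

// Layer 2 reads only the low 16 bits of the pointer and takes the bank
// from DBR, so a pointer is safe only when its bank byte equals DBR.
static int isDbrSafe(uint32_t ptr32, uint8_t dbr) {
    return bankOf(ptr32) == dbr;
}

// Address a Layer 2 load would effectively use in this model
// (the Y index is ignored here for simplicity).
static uint32_t layer2EffectiveAddr(uint32_t ptr32, uint8_t dbr) {
    return ((uint32_t)dbr << 16) | (ptr32 & 0xFFFFu);
}
```

In this model a cross-bank pointer such as `0x02A000` with DBR = `0x01` silently resolves to `0x01A000`, which is the miscompile the opt-in warning describes.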
**Speed wins queued, ranked by expected impact:** **Speed wins queued, ranked by expected impact:**
- **ptr32 pointer-increment overhead** (partially addressed). The - **ptr32 pointer-increment overhead** (partially addressed). The
@ -467,6 +508,15 @@ for the common-case C / minimal-C++ workload. Priority is speed
high half being non-zero doesn't affect correctness — iters 32-63 high half being non-zero doesn't affect correctness — iters 32-63
would just shift b without adding. would just shift b without adding.
- **Lua 5.1.5 compiles cleanly** (2026-05-20). Reference C
implementation (17K lines, 24 source files) builds + links into a
multi-segment binary. Loads in MAME. Lives under `tests/lua/`.
Three large functions (luaV_execute, symbexec, auxsort) hit
greedy regalloc's complexity budget and need `-mllvm
-regalloc=basic` (still at -O2 — basic-regalloc -O2 is ~3.5×
smaller than fast-regalloc -O0). Largest "real-world C" test
in the project.
**Open limitations:** **Open limitations:**
- **Multi-bank BSS** — full support up to 4 banks (256KB). link816 - **Multi-bank BSS** — full support up to 4 banks (256KB). link816

58
benchmarks/mandelbrot.c Normal file

@ -0,0 +1,58 @@
// Mandelbrot tile in 16.16 fixed-point: exercises i32 multiply
// (__mulsi3 / __umulhisi3) and conditional control flow. Pure
// integer math: doesn't pull in soft-double.
//
// Rasterizes a tiny GRID x GRID (4x4) grid over the complex plane and
// sums per-pixel iteration counts. Returns the sum so dead-code-elim
// doesn't strip the loop.

typedef long fp_t; // 16.16 fixed-point

#define FP_SHIFT 16
#define FP_ONE (1L << FP_SHIFT)
#define FP_FOUR (4L << FP_SHIFT)
#define GRID 4
#define MAX_ITER 8

static fp_t fpMul(fp_t a, fp_t b) {
    // Signed 16.16 multiply: (a * b) >> 16.
    // This uses the widening `(long long)a * (long long)b`, which
    // defeats __muldi3's 32-bit short-circuit when args are negative
    // (sign-extension fills the high half with 1s). Partial products on
    // 16-bit halves would avoid that, since __umulhisi3 (16x16→32) is
    // much cheaper than __muldi3 (32+ iters), but that form is not
    // implemented here.
    long long p = (long long)a * (long long)b;
    return (fp_t)(p >> FP_SHIFT);
}

unsigned long mandTile(void) {
    unsigned long sum = 0;
    // c-plane window: [-2, 1] x [-1, 1]. With GRID=4, step = 3/4 in x,
    // 2/4 in y. Express as 16.16 increments.
    fp_t stepX = (fp_t)((3L * FP_ONE) / GRID);
    fp_t stepY = (fp_t)((2L * FP_ONE) / GRID);
    fp_t baseX = -(2L * FP_ONE);
    fp_t baseY = -FP_ONE;
    for (short j = 0; j < GRID; j++) {
        fp_t cy = baseY + (fp_t)j * stepY;
        for (short i = 0; i < GRID; i++) {
            fp_t cx = baseX + (fp_t)i * stepX;
            fp_t x = 0;
            fp_t y = 0;
            short iter;
            for (iter = 0; iter < MAX_ITER; iter++) {
                fp_t xx = fpMul(x, x);
                fp_t yy = fpMul(y, y);
                if (xx + yy > FP_FOUR) {
                    break;
                }
                fp_t xy = fpMul(x, y);
                y = (fp_t)(xy + xy + cy); // 2*x*y + cy
                x = (fp_t)(xx - yy + cx);
            }
            sum += (unsigned long)(unsigned short)iter;
        }
    }
    return sum;
}
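
The `fpMul` comment in mandelbrot.c mentions partial products on 16-bit halves as a cheaper route than a full 64-bit multiply. A minimal host-side sketch of that idea follows. It is an illustration, not code from the repository: `fpMulWide` and `fpMulPartial` are hypothetical names, `int32_t` stands in for the target's 32-bit `long`, and nothing here shows how the W65816 backend would lower the four 16x16 products (whether each would become a `__umulhisi3` call is an open assumption).

```c
#include <assert.h>
#include <stdint.h>

#define FP_SHIFT 16

// Reference: widening multiply, same shape as the benchmark's fpMul.
// Assumes >> on a negative value is an arithmetic shift, which holds
// on mainstream compilers but is implementation-defined in ISO C.
static int32_t fpMulWide(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(p >> FP_SHIFT);
}

// Hypothetical alternative: split each operand into a signed high half
// and an unsigned low half, so that
//   a*b = ah*bh*2^32 + (ah*bl + al*bh)*2^16 + al*bl
// and (a*b) >> 16, truncated to 32 bits, needs only four 16x16 products.
static int32_t fpMulPartial(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t al = ua & 0xFFFFu;
    uint32_t bl = ub & 0xFFFFu;
    // Sign-extend the high halves without relying on signed shifts.
    int32_t ah = (int32_t)((ua >> 16) ^ 0x8000u) - 0x8000;
    int32_t bh = (int32_t)((ub >> 16) ^ 0x8000u) - 0x8000;

    // Each product fits in 32 bits; sums wrap modulo 2^32 on purpose.
    uint32_t hi   = (uint32_t)(ah * bh) << 16;
    uint32_t mid1 = (uint32_t)(ah * (int32_t)bl);
    uint32_t mid2 = (uint32_t)(bh * (int32_t)al);
    uint32_t lo   = (al * bl) >> 16;

    // Conversion back to int32_t assumes two's-complement wrapping.
    return (int32_t)(hi + mid1 + mid2 + lo);
}
```

Because the low-half product `al * bl` is non-negative and the other terms are multiples of 2^16, this matches the floor behaviour of the arithmetic shift in `fpMulWide`, including for negative operands such as -1.5 * 2.0.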

68
benchmarks/particles.c Normal file

@ -0,0 +1,68 @@
// Game-like particle system: 32 particles, i16 position + velocity in a
// 320x200 box, bounce on walls. Mimics the inner loop of an action
// game's sprite update (position += velocity + wall collision).
//
// Each particle is (px, py, vx, vy). Initial positions are deterministic
// pseudo-random (so the bench is reproducible).
//
// particleStep() runs one tick. Returns a checksum of all px values
// so the result isn't dead-code-eliminated.

#define N_PARTICLES 32
#define W 320
#define H 200

static short px[N_PARTICLES];
static short py[N_PARTICLES];
static short vx[N_PARTICLES];
static short vy[N_PARTICLES];

// volatile to defeat GlobalOpt's narrowing to i1 (causes a backend
// i32-load-from-i1 isel gap; see memory: feedback_i1_load_custom.md).
static volatile short initialized = 0;

static void particleInit(void) {
    // Deterministic pseudo-random init: linear congruential.
    unsigned short seed = 12345;
    for (short i = 0; i < N_PARTICLES; i++) {
        seed = (unsigned short)(seed * 25173 + 13849);
        px[i] = (short)(seed % W);
        seed = (unsigned short)(seed * 25173 + 13849);
        py[i] = (short)(seed % H);
        seed = (unsigned short)(seed * 25173 + 13849);
        vx[i] = (short)((seed % 7) - 3); // -3..+3
        seed = (unsigned short)(seed * 25173 + 13849);
        vy[i] = (short)((seed % 7) - 3);
        if (vx[i] == 0) {
            vx[i] = 1;
        }
        if (vy[i] == 0) {
            vy[i] = 1;
        }
    }
    initialized = 1;
}

unsigned long particleStep(void) {
    if (!initialized) {
        particleInit();
    }
    unsigned long sum = 0;
    for (short i = 0; i < N_PARTICLES; i++) {
        short nx = (short)(px[i] + vx[i]);
        short ny = (short)(py[i] + vy[i]);
        if (nx < 0 || nx >= W) {
            vx[i] = (short)(-vx[i]);
            nx = px[i];
        }
        if (ny < 0 || ny >= H) {
            vy[i] = (short)(-vy[i]);
            ny = py[i];
        }
        px[i] = nx;
        py[i] = ny;
        sum += (unsigned long)(unsigned short)nx;
    }
    return sum;
}


@ -1,7 +1,7 @@
############################################################################### ###############################################################################
# # # #
# Calypsi ISO C compiler for 65816 version 5.16 # # Calypsi ISO C compiler for 65816 version 5.16 #
# 20/May/2026 17:33:54 # # 25/May/2026 19:33:49 #
# Command line: --speed -O 2 --64bit-doubles evalAt.c -o # # Command line: --speed -O 2 --64bit-doubles evalAt.c -o #
# /tmp/evalAt.calypsi.elf --list-file evalAt.calypsi.lst # # /tmp/evalAt.calypsi.elf --list-file evalAt.calypsi.lst #
# # # #


@ -8,7 +8,7 @@ evalAt: ; @evalAt
tay tay
tsc tsc
sec sec
sbc #0x32 sbc #0x34
tcs tcs
tya tya
pha pha
@ -25,17 +25,14 @@ evalAt: ; @evalAt
pla pla
stx 0xc0 stx 0xc0
sta 0x19, s sta 0x19, s
clc pha
adc #0x2
sta 0x1f, s
lda 0xc0 lda 0xc0
adc #0x0 sta 0x35, s
sta 0x21, s pla
lda 0x1f, s
sta 0xe0 sta 0xe0
lda 0x21, s lda 0x33, s
sta 0xe2 sta 0xe2
ldy #0x0 ldy #0x2
lda [0xe0], y lda [0xe0], y
sta 0x1d, s sta 0x1d, s
lda 0xc0 lda 0xc0
@ -44,9 +41,10 @@ evalAt: ; @evalAt
sta 0xe0 sta 0xe0
lda 0x31, s lda 0x31, s
sta 0xe2 sta 0xe2
ldy #0x0
lda [0xe0], y lda [0xe0], y
sta 0x21, s sta 0x21, s
lda 0x36, s lda 0x38, s
sta 0xb, s sta 0xb, s
lda #0x0 lda #0x0
sta 0xc4 sta 0xc4
@ -508,7 +506,7 @@ evalAt: ; @evalAt
sta 0xe0 sta 0xe0
tsc tsc
clc clc
adc #0x32 adc #0x34
tcs tcs
lda 0xe0 lda 0xe0
rtl rtl


@ -1,7 +1,7 @@
############################################################################### ###############################################################################
# # # #
# Calypsi ISO C compiler for 65816 version 5.16 # # Calypsi ISO C compiler for 65816 version 5.16 #
# 20/May/2026 17:33:54 # # 25/May/2026 19:33:49 #
# Command line: --speed -O 2 --64bit-doubles mul16to32.c -o # # Command line: --speed -O 2 --64bit-doubles mul16to32.c -o #
# /tmp/mul16to32.calypsi.elf --list-file # # /tmp/mul16to32.calypsi.elf --list-file #
# mul16to32.calypsi.lst # # mul16to32.calypsi.lst #


@ -1,7 +1,7 @@
############################################################################### ###############################################################################
# # # #
# Calypsi ISO C compiler for 65816 version 5.16 # # Calypsi ISO C compiler for 65816 version 5.16 #
# 20/May/2026 17:33:54 # # 25/May/2026 19:33:49 #
# Command line: --speed -O 2 --64bit-doubles sumSquares.c -o # # Command line: --speed -O 2 --64bit-doubles sumSquares.c -o #
# /tmp/sumSquares.calypsi.elf --list-file # # /tmp/sumSquares.calypsi.elf --list-file #
# sumSquares.calypsi.lst # # sumSquares.calypsi.lst #

Binary file not shown.


@ -1,20 +1,20 @@
# section layout # section layout
.text : 0x001000 .. 0x002286 ( 4742 bytes) .text : 0x001000 .. 0x002141 ( 4417 bytes)
.rodata : 0x002286 .. 0x0023f2 ( 364 bytes) .rodata : 0x002141 .. 0x0022ad ( 364 bytes)
.bss : 0x00a000 .. 0x00a038 ( 56 bytes) .bss : 0x00a000 .. 0x00a038 ( 56 bytes)
# per-input-file .text contributions # per-input-file .text contributions
186 /home/scott/claude/llvm816/runtime/crt0Gsos.o 186 /home/scott/claude/llvm816/runtime/crt0Gsos.o
615 /home/scott/claude/llvm816/demos/frame.o 546 /home/scott/claude/llvm816/demos/frame.o
45465 /home/scott/claude/llvm816/runtime/libc.o 30853 /home/scott/claude/llvm816/runtime/libc.o
15382 /home/scott/claude/llvm816/runtime/snprintf.o 9098 /home/scott/claude/llvm816/runtime/snprintf.o
13322 /home/scott/claude/llvm816/runtime/extras.o 10865 /home/scott/claude/llvm816/runtime/extras.o
8398 /home/scott/claude/llvm816/runtime/softFloat.o 4374 /home/scott/claude/llvm816/runtime/softFloat.o
16151 /home/scott/claude/llvm816/runtime/softDouble.o 13388 /home/scott/claude/llvm816/runtime/softDouble.o
176 /home/scott/claude/llvm816/runtime/iigsGsos.o 176 /home/scott/claude/llvm816/runtime/iigsGsos.o
20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o 20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o
1565 /home/scott/claude/llvm816/runtime/desktop.o 1139 /home/scott/claude/llvm816/runtime/desktop.o
2540 /home/scott/claude/llvm816/runtime/libgcc.o 2552 /home/scott/claude/llvm816/runtime/libgcc.o
# global symbols (sorted by address) # global symbols (sorted by address)
0x000000 __bss_bank 0x000000 __bss_bank
@ -33,77 +33,77 @@
0x001000 __start 0x001000 __start
0x001000 __text_start 0x001000 __text_start
0x0010ba main 0x0010ba main
0x001321 CtlStartUp 0x0012dc CtlStartUp
0x001331 NoteAlert 0x0012ec NoteAlert
0x00134d EMStartUp 0x001308 EMStartUp
0x00136c FMStartUp 0x001327 FMStartUp
0x00137c LEStartUp 0x001337 LEStartUp
0x00138c LoadOneTool 0x001347 LoadOneTool
0x00139c NewHandle 0x001357 NewHandle
0x0013c2 MenuStartUp 0x00137d MenuStartUp
0x0013d2 HiliteMenu 0x00138d HiliteMenu
0x0013e2 InsertMenu 0x00139d InsertMenu
0x0013f7 NewMenu 0x0013b2 NewMenu
0x001411 QDStartUp 0x0013cc QDStartUp
0x001427 TaskMaster 0x0013e2 TaskMaster
0x00143e startdesk 0x0013f9 startdesk
0x001868 paintDesktopBackdrop 0x001717 paintDesktopBackdrop
0x00189a __jsl_indir 0x001749 __jsl_indir
0x00189d __mulhi3 0x00174c __mulhi3
0x0018bc __umulhisi3 0x00176b __umulhisi3
0x001913 __ashlhi3 0x0017c2 __ashlhi3
0x001922 __lshrhi3 0x0017d1 __lshrhi3
0x001932 __ashrhi3 0x0017e1 __ashrhi3
0x001945 __udivhi3 0x0017f4 __udivhi3
0x001951 __umodhi3 0x001800 __umodhi3
0x00195d __divhi3 0x00180c __divhi3
0x001977 __modhi3 0x001826 __modhi3
0x001991 __divmod_setup 0x001840 __divmod_setup
0x0019c4 __udivmod_core 0x001873 __udivmod_core
0x0019e2 __mulsi3 0x001891 __mulsi3
0x001a9b __ashlsi3 0x00194a __ashlsi3
0x001ab0 __lshrsi3 0x00195f __lshrsi3
0x001ac5 __ashrsi3 0x001974 __ashrsi3
0x001adf __udivmodsi_core 0x00198e __udivmodsi_core
0x001b17 __udivsi3 0x0019c6 __udivsi3
0x001b2b __umodsi3 0x0019da __umodsi3
0x001b3f __divsi3 0x0019ee __divsi3
0x001b66 __modsi3 0x001a15 __modsi3
0x001b8d __divmodsi_setup 0x001a3c __divmodsi_setup
0x001bde __divmoddi4_stash 0x001a8d __divmoddi4_stash
0x001bfb __retdi 0x001aaa __retdi
0x001c08 __ashldi3 0x001ab7 __ashldi3
0x001c2b __lshrdi3 0x001ada __lshrdi3
0x001c4e __ashrdi3 0x001afd __ashrdi3
0x001c74 __muldi3 0x001b23 __muldi3
0x001ccf __ucmpdi2 0x001b8a __ucmpdi2
0x001cf8 __cmpdi2 0x001bb3 __cmpdi2
0x001d2f __udivdi3 0x001bea __udivdi3
0x001d38 __umoddi3 0x001bf3 __umoddi3
0x001d51 __udivmoddi_core 0x001c0c __udivmoddi_core
0x001d9e __divdi3 0x001c59 __divdi3
0x001dbd __moddi3 0x001c78 __moddi3
0x001dea __absdi_a 0x001ca5 __absdi_a
0x001df2 __absdi_b 0x001cad __absdi_b
0x001dfa __negdi_a 0x001cb5 __negdi_a
0x001e18 __negdi_b 0x001cd3 __negdi_b
0x001e36 setjmp 0x001cf1 setjmp
0x001e5e longjmp 0x001d19 longjmp
0x001e88 __umulhisi3_qsq 0x001d43 __umulhisi3_qsq
0x002286 __rodata_start 0x002141 __rodata_start
0x002286 __text_end 0x002141 __text_end
0x002286 gChainPath 0x002141 gChainPath
0x00229a editMenuStr 0x002155 editMenuStr
0x0022f3 fileMenuStr 0x0021ae fileMenuStr
0x002320 appleMenuStr 0x0021db appleMenuStr
0x00233f gAboutMsg 0x0021fa gAboutMsg
0x00237f doAlert.okStr 0x00223a doAlert.okStr
0x002384 doAlert.button 0x00223f doAlert.button
0x00239c doAlert.message 0x002257 doAlert.message
0x0023b4 doAlert.alertRec 0x00226f doAlert.alertRec
0x0023f2 __init_array_end 0x0022ad __init_array_end
0x0023f2 __init_array_start 0x0022ad __init_array_start
0x0023f2 __rodata_end 0x0022ad __rodata_end
0x00a000 __bss_lo16 0x00a000 __bss_lo16
0x00a000 __bss_seg0_lo16 0x00a000 __bss_seg0_lo16
0x00a000 __bss_start 0x00a000 __bss_start
@ -116,27 +116,27 @@
0x00a038 __bss_end 0x00a038 __bss_end
0x00a038 __heap_start 0x00a038 __heap_start
0x00bf00 __heap_end 0x00bf00 __heap_end
CtlStartUp = 0x001321 CtlStartUp = 0x0012dc
EMStartUp = 0x00134d EMStartUp = 0x001308
FMStartUp = 0x00136c FMStartUp = 0x001327
HiliteMenu = 0x0013d2 HiliteMenu = 0x00138d
InsertMenu = 0x0013e2 InsertMenu = 0x00139d
LEStartUp = 0x00137c LEStartUp = 0x001337
LoadOneTool = 0x00138c LoadOneTool = 0x001347
MenuStartUp = 0x0013c2 MenuStartUp = 0x00137d
NewHandle = 0x00139c NewHandle = 0x001357
NewMenu = 0x0013f7 NewMenu = 0x0013b2
NoteAlert = 0x001331 NoteAlert = 0x0012ec
QDStartUp = 0x001411 QDStartUp = 0x0013cc
TaskMaster = 0x001427 TaskMaster = 0x0013e2
__absdi_a = 0x001dea __absdi_a = 0x001ca5
__absdi_b = 0x001df2 __absdi_b = 0x001cad
__ashldi3 = 0x001c08 __ashldi3 = 0x001ab7
__ashlhi3 = 0x001913 __ashlhi3 = 0x0017c2
__ashlsi3 = 0x001a9b __ashlsi3 = 0x00194a
__ashrdi3 = 0x001c4e __ashrdi3 = 0x001afd
__ashrhi3 = 0x001932 __ashrhi3 = 0x0017e1
__ashrsi3 = 0x001ac5 __ashrsi3 = 0x001974
__bss_bank = 0x000000 __bss_bank = 0x000000
__bss_end = 0x00a038 __bss_end = 0x00a038
__bss_lo16 = 0x00a000 __bss_lo16 = 0x00a000
@ -154,64 +154,64 @@ __bss_seg3_lo16 = 0x000000
__bss_seg3_size = 0x000000 __bss_seg3_size = 0x000000
__bss_size = 0x000038 __bss_size = 0x000038
__bss_start = 0x00a000 __bss_start = 0x00a000
__cmpdi2 = 0x001cf8 __cmpdi2 = 0x001bb3
__divdi3 = 0x001d9e __divdi3 = 0x001c59
__divhi3 = 0x00195d __divhi3 = 0x00180c
__divmod_setup = 0x001991 __divmod_setup = 0x001840
__divmoddi4_stash = 0x001bde __divmoddi4_stash = 0x001a8d
__divmodsi_setup = 0x001b8d __divmodsi_setup = 0x001a3c
__divsi3 = 0x001b3f __divsi3 = 0x0019ee
__heap_end = 0x00bf00 __heap_end = 0x00bf00
__heap_start = 0x00a038 __heap_start = 0x00a038
__indirTarget = 0x00a036 __indirTarget = 0x00a036
__init_array_end = 0x0023f2 __init_array_end = 0x0022ad
__init_array_start = 0x0023f2 __init_array_start = 0x0022ad
__jsl_indir = 0x00189a __jsl_indir = 0x001749
__lshrdi3 = 0x001c2b __lshrdi3 = 0x001ada
__lshrhi3 = 0x001922 __lshrhi3 = 0x0017d1
__lshrsi3 = 0x001ab0 __lshrsi3 = 0x00195f
__moddi3 = 0x001dbd __moddi3 = 0x001c78
__modhi3 = 0x001977 __modhi3 = 0x001826
__modsi3 = 0x001b66 __modsi3 = 0x001a15
__muldi3 = 0x001c74 __muldi3 = 0x001b23
__mulhi3 = 0x00189d __mulhi3 = 0x00174c
__mulsi3 = 0x0019e2 __mulsi3 = 0x001891
__negdi_a = 0x001dfa __negdi_a = 0x001cb5
__negdi_b = 0x001e18 __negdi_b = 0x001cd3
__retdi = 0x001bfb __retdi = 0x001aaa
__rodata_end = 0x0023f2 __rodata_end = 0x0022ad
__rodata_start = 0x002286 __rodata_start = 0x002141
__start = 0x001000 __start = 0x001000
__text_end = 0x002286 __text_end = 0x002141
__text_start = 0x001000 __text_start = 0x001000
__ucmpdi2 = 0x001ccf __ucmpdi2 = 0x001b8a
__udivdi3 = 0x001d2f __udivdi3 = 0x001bea
__udivhi3 = 0x001945 __udivhi3 = 0x0017f4
__udivmod_core = 0x0019c4 __udivmod_core = 0x001873
__udivmoddi_core = 0x001d51 __udivmoddi_core = 0x001c0c
__udivmodsi_core = 0x001adf __udivmodsi_core = 0x00198e
__udivsi3 = 0x001b17 __udivsi3 = 0x0019c6
__umoddi3 = 0x001d38 __umoddi3 = 0x001bf3
__umodhi3 = 0x001951 __umodhi3 = 0x001800
__umodsi3 = 0x001b2b __umodsi3 = 0x0019da
__umulhisi3 = 0x0018bc __umulhisi3 = 0x00176b
__umulhisi3_qsq = 0x001e88 __umulhisi3_qsq = 0x001d43
appleMenuStr = 0x002320 appleMenuStr = 0x0021db
doAlert.alertRec = 0x0023b4 doAlert.alertRec = 0x00226f
doAlert.button = 0x002384 doAlert.button = 0x00223f
doAlert.message = 0x00239c doAlert.message = 0x002257
doAlert.okStr = 0x00237f doAlert.okStr = 0x00223a
editMenuStr = 0x00229a editMenuStr = 0x002155
fileMenuStr = 0x0022f3 fileMenuStr = 0x0021ae
gAboutMsg = 0x00233f gAboutMsg = 0x0021fa
gChainPath = 0x002286 gChainPath = 0x002141
gDone = 0x00a02c gDone = 0x00a02c
gDpBase = 0x00a034 gDpBase = 0x00a034
gDpHandle = 0x00a030 gDpHandle = 0x00a030
gEvent = 0x00a000 gEvent = 0x00a000
gUserId = 0x00a02e gUserId = 0x00a02e
longjmp = 0x001e5e longjmp = 0x001d19
main = 0x0010ba main = 0x0010ba
paintDesktopBackdrop = 0x001868 paintDesktopBackdrop = 0x001717
setjmp = 0x001e36 setjmp = 0x001cf1
startdesk = 0x00143e startdesk = 0x0013f9

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@ -1,20 +1,20 @@
# section layout # section layout
.text : 0x001000 .. 0x001caa ( 3242 bytes) .text : 0x001000 .. 0x001cae ( 3246 bytes)
.rodata : 0x001caa .. 0x006c6e ( 20420 bytes) .rodata : 0x001cae .. 0x006c72 ( 20420 bytes)
.bss : 0x00a000 .. 0x00a1a2 ( 418 bytes) .bss : 0x00a000 .. 0x00a1a2 ( 418 bytes)
# per-input-file .text contributions # per-input-file .text contributions
186 /home/scott/claude/llvm816/runtime/crt0Gsos.o 186 /home/scott/claude/llvm816/runtime/crt0Gsos.o
516 /home/scott/claude/llvm816/demos/heavyRelocs.o 508 /home/scott/claude/llvm816/demos/heavyRelocs.o
43513 /home/scott/claude/llvm816/runtime/libc.o 30853 /home/scott/claude/llvm816/runtime/libc.o
5935 /home/scott/claude/llvm816/runtime/snprintf.o 9098 /home/scott/claude/llvm816/runtime/snprintf.o
11953 /home/scott/claude/llvm816/runtime/extras.o 10865 /home/scott/claude/llvm816/runtime/extras.o
7077 /home/scott/claude/llvm816/runtime/softFloat.o 4374 /home/scott/claude/llvm816/runtime/softFloat.o
15379 /home/scott/claude/llvm816/runtime/softDouble.o 13388 /home/scott/claude/llvm816/runtime/softDouble.o
176 /home/scott/claude/llvm816/runtime/iigsGsos.o 176 /home/scott/claude/llvm816/runtime/iigsGsos.o
20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o 20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o
1349 /home/scott/claude/llvm816/runtime/desktop.o 1139 /home/scott/claude/llvm816/runtime/desktop.o
2540 /home/scott/claude/llvm816/runtime/libgcc.o 2552 /home/scott/claude/llvm816/runtime/libgcc.o
# global symbols (sorted by address) # global symbols (sorted by address)
0x000000 __bss_bank 0x000000 __bss_bank
@ -33,56 +33,56 @@
0x001000 __start 0x001000 __start
0x001000 __text_start 0x001000 __text_start
0x0010ba main 0x0010ba main
0x0012be __jsl_indir 0x0012b6 __jsl_indir
0x0012c1 __mulhi3 0x0012b9 __mulhi3
0x0012e0 __umulhisi3 0x0012d8 __umulhisi3
0x001337 __ashlhi3 0x00132f __ashlhi3
0x001346 __lshrhi3 0x00133e __lshrhi3
0x001356 __ashrhi3 0x00134e __ashrhi3
0x001369 __udivhi3 0x001361 __udivhi3
0x001375 __umodhi3 0x00136d __umodhi3
0x001381 __divhi3 0x001379 __divhi3
0x00139b __modhi3 0x001393 __modhi3
0x0013b5 __divmod_setup 0x0013ad __divmod_setup
0x0013e8 __udivmod_core 0x0013e0 __udivmod_core
0x001406 __mulsi3 0x0013fe __mulsi3
0x0014bf __ashlsi3 0x0014b7 __ashlsi3
0x0014d4 __lshrsi3 0x0014cc __lshrsi3
0x0014e9 __ashrsi3 0x0014e1 __ashrsi3
0x001503 __udivmodsi_core 0x0014fb __udivmodsi_core
0x00153b __udivsi3 0x001533 __udivsi3
0x00154f __umodsi3 0x001547 __umodsi3
0x001563 __divsi3 0x00155b __divsi3
0x00158a __modsi3 0x001582 __modsi3
0x0015b1 __divmodsi_setup 0x0015a9 __divmodsi_setup
0x001602 __divmoddi4_stash 0x0015fa __divmoddi4_stash
0x00161f __retdi 0x001617 __retdi
0x00162c __ashldi3 0x001624 __ashldi3
0x00164f __lshrdi3 0x001647 __lshrdi3
0x001672 __ashrdi3 0x00166a __ashrdi3
0x001698 __muldi3 0x001690 __muldi3
0x0016f3 __ucmpdi2 0x0016f7 __ucmpdi2
0x00171c __cmpdi2 0x001720 __cmpdi2
0x001753 __udivdi3 0x001757 __udivdi3
0x00175c __umoddi3 0x001760 __umoddi3
0x001775 __udivmoddi_core 0x001779 __udivmoddi_core
0x0017c2 __divdi3 0x0017c6 __divdi3
0x0017e1 __moddi3 0x0017e5 __moddi3
0x00180e __absdi_a 0x001812 __absdi_a
0x001816 __absdi_b 0x00181a __absdi_b
0x00181e __negdi_a 0x001822 __negdi_a
0x00183c __negdi_b 0x001840 __negdi_b
0x00185a setjmp 0x00185e setjmp
0x001882 longjmp 0x001886 longjmp
0x0018ac __umulhisi3_qsq 0x0018b0 __umulhisi3_qsq
0x001caa __rodata_start 0x001cae __rodata_start
0x001caa __text_end 0x001cae __text_end
0x001caa gChainPath 0x001cae gChainPath
0x001cbe gBigData 0x001cc2 gBigData
0x006ade gPtrs 0x006ae2 gPtrs
0x006c6e __init_array_end 0x006c72 __init_array_end
0x006c6e __init_array_start 0x006c72 __init_array_start
0x006c6e __rodata_end 0x006c72 __rodata_end
0x00a000 __bss_lo16 0x00a000 __bss_lo16
0x00a000 __bss_seg0_lo16 0x00a000 __bss_seg0_lo16
0x00a000 __bss_start 0x00a000 __bss_start
@ -116,14 +116,14 @@
0x00a1a2 __bss_end 0x00a1a2 __bss_end
0x00a1a2 __heap_start 0x00a1a2 __heap_start
0x00bf00 __heap_end 0x00bf00 __heap_end
__absdi_a = 0x00180e __absdi_a = 0x001812
__absdi_b = 0x001816 __absdi_b = 0x00181a
__ashldi3 = 0x00162c __ashldi3 = 0x001624
__ashlhi3 = 0x001337 __ashlhi3 = 0x00132f
__ashlsi3 = 0x0014bf __ashlsi3 = 0x0014b7
__ashrdi3 = 0x001672 __ashrdi3 = 0x00166a
__ashrhi3 = 0x001356 __ashrhi3 = 0x00134e
__ashrsi3 = 0x0014e9 __ashrsi3 = 0x0014e1
__bss_bank = 0x000000 __bss_bank = 0x000000
__bss_end = 0x00a1a2 __bss_end = 0x00a1a2
__bss_lo16 = 0x00a000 __bss_lo16 = 0x00a000
@ -141,53 +141,53 @@ __bss_seg3_lo16 = 0x000000
__bss_seg3_size = 0x000000 __bss_seg3_size = 0x000000
__bss_size = 0x0001a2 __bss_size = 0x0001a2
__bss_start = 0x00a000 __bss_start = 0x00a000
__cmpdi2 = 0x00171c __cmpdi2 = 0x001720
__divdi3 = 0x0017c2 __divdi3 = 0x0017c6
__divhi3 = 0x001381 __divhi3 = 0x001379
__divmod_setup = 0x0013b5 __divmod_setup = 0x0013ad
__divmoddi4_stash = 0x001602 __divmoddi4_stash = 0x0015fa
__divmodsi_setup = 0x0015b1 __divmodsi_setup = 0x0015a9
__divsi3 = 0x001563 __divsi3 = 0x00155b
__heap_end = 0x00bf00 __heap_end = 0x00bf00
__heap_start = 0x00a1a2 __heap_start = 0x00a1a2
__indirTarget = 0x00a1a0 __indirTarget = 0x00a1a0
__init_array_end = 0x006c6e __init_array_end = 0x006c72
__init_array_start = 0x006c6e __init_array_start = 0x006c72
__jsl_indir = 0x0012be __jsl_indir = 0x0012b6
__lshrdi3 = 0x00164f __lshrdi3 = 0x001647
__lshrhi3 = 0x001346 __lshrhi3 = 0x00133e
__lshrsi3 = 0x0014d4 __lshrsi3 = 0x0014cc
__moddi3 = 0x0017e1 __moddi3 = 0x0017e5
__modhi3 = 0x00139b __modhi3 = 0x001393
__modsi3 = 0x00158a __modsi3 = 0x001582
__muldi3 = 0x001698 __muldi3 = 0x001690
__mulhi3 = 0x0012c1 __mulhi3 = 0x0012b9
__mulsi3 = 0x001406 __mulsi3 = 0x0013fe
__negdi_a = 0x00181e __negdi_a = 0x001822
__negdi_b = 0x00183c __negdi_b = 0x001840
__retdi = 0x00161f __retdi = 0x001617
__rodata_end = 0x006c6e __rodata_end = 0x006c72
__rodata_start = 0x001caa __rodata_start = 0x001cae
__start = 0x001000 __start = 0x001000
__text_end = 0x001caa __text_end = 0x001cae
__text_start = 0x001000 __text_start = 0x001000
__ucmpdi2 = 0x0016f3 __ucmpdi2 = 0x0016f7
__udivdi3 = 0x001753 __udivdi3 = 0x001757
__udivhi3 = 0x001369 __udivhi3 = 0x001361
__udivmod_core = 0x0013e8 __udivmod_core = 0x0013e0
__udivmoddi_core = 0x001775 __udivmoddi_core = 0x001779
__udivmodsi_core = 0x001503 __udivmodsi_core = 0x0014fb
__udivsi3 = 0x00153b __udivsi3 = 0x001533
__umoddi3 = 0x00175c __umoddi3 = 0x001760
__umodhi3 = 0x001375 __umodhi3 = 0x00136d
__umodsi3 = 0x00154f __umodsi3 = 0x001547
__umulhisi3 = 0x0012e0 __umulhisi3 = 0x0012d8
__umulhisi3_qsq = 0x0018ac __umulhisi3_qsq = 0x0018b0
gA = 0x00a000 gA = 0x00a000
gB = 0x00a010 gB = 0x00a010
gBigData = 0x001cbe gBigData = 0x001cc2
gC = 0x00a020 gC = 0x00a020
gChainPath = 0x001caa gChainPath = 0x001cae
gD = 0x00a030 gD = 0x00a030
gE = 0x00a040 gE = 0x00a040
gF = 0x00a050 gF = 0x00a050
@ -201,7 +201,7 @@ gM = 0x00a0c0
gN = 0x00a0d0 gN = 0x00a0d0
gO = 0x00a0e0 gO = 0x00a0e0
gP = 0x00a0f0 gP = 0x00a0f0
gPtrs = 0x006ade gPtrs = 0x006ae2
gQ = 0x00a100 gQ = 0x00a100
gR = 0x00a110 gR = 0x00a110
gS = 0x00a120 gS = 0x00a120
@ -212,6 +212,6 @@ gW = 0x00a160
gX = 0x00a170 gX = 0x00a170
gY = 0x00a180 gY = 0x00a180
gZ = 0x00a190 gZ = 0x00a190
longjmp = 0x001882 longjmp = 0x001886
main = 0x0010ba main = 0x0010ba
setjmp = 0x00185a setjmp = 0x00185e

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@ -1,20 +1,20 @@
# section layout # section layout
.text : 0x001000 .. 0x001c37 ( 3127 bytes) .text : 0x001000 .. 0x001c3d ( 3133 bytes)
.rodata : 0x001c37 .. 0x001c4b ( 20 bytes) .rodata : 0x001c3d .. 0x001c51 ( 20 bytes)
.bss : 0x00a000 .. 0x00a002 ( 2 bytes) .bss : 0x00a000 .. 0x00a002 ( 2 bytes)
# per-input-file .text contributions # per-input-file .text contributions
186 /home/scott/claude/llvm816/runtime/crt0Gsos.o 186 /home/scott/claude/llvm816/runtime/crt0Gsos.o
401 /home/scott/claude/llvm816/demos/helloBeep.o 395 /home/scott/claude/llvm816/demos/helloBeep.o
43513 /home/scott/claude/llvm816/runtime/libc.o 30853 /home/scott/claude/llvm816/runtime/libc.o
5935 /home/scott/claude/llvm816/runtime/snprintf.o 9098 /home/scott/claude/llvm816/runtime/snprintf.o
11953 /home/scott/claude/llvm816/runtime/extras.o 10865 /home/scott/claude/llvm816/runtime/extras.o
7077 /home/scott/claude/llvm816/runtime/softFloat.o 4374 /home/scott/claude/llvm816/runtime/softFloat.o
15379 /home/scott/claude/llvm816/runtime/softDouble.o 13388 /home/scott/claude/llvm816/runtime/softDouble.o
176 /home/scott/claude/llvm816/runtime/iigsGsos.o 176 /home/scott/claude/llvm816/runtime/iigsGsos.o
20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o 20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o
1349 /home/scott/claude/llvm816/runtime/desktop.o 1139 /home/scott/claude/llvm816/runtime/desktop.o
2540 /home/scott/claude/llvm816/runtime/libgcc.o 2552 /home/scott/claude/llvm816/runtime/libgcc.o
# global symbols (sorted by address) # global symbols (sorted by address)
0x000000 __bss_bank 0x000000 __bss_bank
@ -33,54 +33,54 @@
0x001000 __start 0x001000 __start
0x001000 __text_start 0x001000 __text_start
0x0010ba main 0x0010ba main
0x00124b __jsl_indir 0x001245 __jsl_indir
0x00124e __mulhi3 0x001248 __mulhi3
0x00126d __umulhisi3 0x001267 __umulhisi3
0x0012c4 __ashlhi3 0x0012be __ashlhi3
0x0012d3 __lshrhi3 0x0012cd __lshrhi3
0x0012e3 __ashrhi3 0x0012dd __ashrhi3
0x0012f6 __udivhi3 0x0012f0 __udivhi3
0x001302 __umodhi3 0x0012fc __umodhi3
0x00130e __divhi3 0x001308 __divhi3
0x001328 __modhi3 0x001322 __modhi3
0x001342 __divmod_setup 0x00133c __divmod_setup
0x001375 __udivmod_core 0x00136f __udivmod_core
0x001393 __mulsi3 0x00138d __mulsi3
0x00144c __ashlsi3 0x001446 __ashlsi3
0x001461 __lshrsi3 0x00145b __lshrsi3
0x001476 __ashrsi3 0x001470 __ashrsi3
0x001490 __udivmodsi_core 0x00148a __udivmodsi_core
0x0014c8 __udivsi3 0x0014c2 __udivsi3
0x0014dc __umodsi3 0x0014d6 __umodsi3
0x0014f0 __divsi3 0x0014ea __divsi3
0x001517 __modsi3 0x001511 __modsi3
0x00153e __divmodsi_setup 0x001538 __divmodsi_setup
0x00158f __divmoddi4_stash 0x001589 __divmoddi4_stash
0x0015ac __retdi 0x0015a6 __retdi
0x0015b9 __ashldi3 0x0015b3 __ashldi3
0x0015dc __lshrdi3 0x0015d6 __lshrdi3
0x0015ff __ashrdi3 0x0015f9 __ashrdi3
0x001625 __muldi3 0x00161f __muldi3
0x001680 __ucmpdi2 0x001686 __ucmpdi2
0x0016a9 __cmpdi2 0x0016af __cmpdi2
0x0016e0 __udivdi3 0x0016e6 __udivdi3
0x0016e9 __umoddi3 0x0016ef __umoddi3
0x001702 __udivmoddi_core 0x001708 __udivmoddi_core
0x00174f __divdi3 0x001755 __divdi3
0x00176e __moddi3 0x001774 __moddi3
0x00179b __absdi_a 0x0017a1 __absdi_a
0x0017a3 __absdi_b 0x0017a9 __absdi_b
0x0017ab __negdi_a 0x0017b1 __negdi_a
0x0017c9 __negdi_b 0x0017cf __negdi_b
0x0017e7 setjmp 0x0017ed setjmp
0x00180f longjmp 0x001815 longjmp
0x001839 __umulhisi3_qsq 0x00183f __umulhisi3_qsq
0x001c37 __rodata_start 0x001c3d __rodata_start
0x001c37 __text_end 0x001c3d __text_end
0x001c37 gChainPath 0x001c3d gChainPath
0x001c4b __init_array_end 0x001c51 __init_array_end
0x001c4b __init_array_start 0x001c51 __init_array_start
0x001c4b __rodata_end 0x001c51 __rodata_end
0x00a000 __bss_lo16 0x00a000 __bss_lo16
0x00a000 __bss_seg0_lo16 0x00a000 __bss_seg0_lo16
0x00a000 __bss_start 0x00a000 __bss_start
@ -88,14 +88,14 @@
0x00a002 __bss_end 0x00a002 __bss_end
0x00a002 __heap_start 0x00a002 __heap_start
0x00bf00 __heap_end 0x00bf00 __heap_end
__absdi_a = 0x00179b __absdi_a = 0x0017a1
__absdi_b = 0x0017a3 __absdi_b = 0x0017a9
__ashldi3 = 0x0015b9 __ashldi3 = 0x0015b3
__ashlhi3 = 0x0012c4 __ashlhi3 = 0x0012be
__ashlsi3 = 0x00144c __ashlsi3 = 0x001446
__ashrdi3 = 0x0015ff __ashrdi3 = 0x0015f9
__ashrhi3 = 0x0012e3 __ashrhi3 = 0x0012dd
__ashrsi3 = 0x001476 __ashrsi3 = 0x001470
__bss_bank = 0x000000 __bss_bank = 0x000000
__bss_end = 0x00a002 __bss_end = 0x00a002
__bss_lo16 = 0x00a000 __bss_lo16 = 0x00a000
@ -113,49 +113,49 @@ __bss_seg3_lo16 = 0x000000
__bss_seg3_size = 0x000000 __bss_seg3_size = 0x000000
__bss_size = 0x000002 __bss_size = 0x000002
__bss_start = 0x00a000 __bss_start = 0x00a000
__cmpdi2 = 0x0016a9 __cmpdi2 = 0x0016af
__divdi3 = 0x00174f __divdi3 = 0x001755
__divhi3 = 0x00130e __divhi3 = 0x001308
__divmod_setup = 0x001342 __divmod_setup = 0x00133c
__divmoddi4_stash = 0x00158f __divmoddi4_stash = 0x001589
__divmodsi_setup = 0x00153e __divmodsi_setup = 0x001538
__divsi3 = 0x0014f0 __divsi3 = 0x0014ea
__heap_end = 0x00bf00 __heap_end = 0x00bf00
__heap_start = 0x00a002 __heap_start = 0x00a002
__indirTarget = 0x00a000 __indirTarget = 0x00a000
__init_array_end = 0x001c4b __init_array_end = 0x001c51
__init_array_start = 0x001c4b __init_array_start = 0x001c51
__jsl_indir = 0x00124b __jsl_indir = 0x001245
__lshrdi3 = 0x0015dc __lshrdi3 = 0x0015d6
__lshrhi3 = 0x0012d3 __lshrhi3 = 0x0012cd
__lshrsi3 = 0x001461 __lshrsi3 = 0x00145b
__moddi3 = 0x00176e __moddi3 = 0x001774
__modhi3 = 0x001328 __modhi3 = 0x001322
__modsi3 = 0x001517 __modsi3 = 0x001511
__muldi3 = 0x001625 __muldi3 = 0x00161f
__mulhi3 = 0x00124e __mulhi3 = 0x001248
__mulsi3 = 0x001393 __mulsi3 = 0x00138d
__negdi_a = 0x0017ab __negdi_a = 0x0017b1
__negdi_b = 0x0017c9 __negdi_b = 0x0017cf
__retdi = 0x0015ac __retdi = 0x0015a6
__rodata_end = 0x001c4b __rodata_end = 0x001c51
__rodata_start = 0x001c37 __rodata_start = 0x001c3d
__start = 0x001000 __start = 0x001000
__text_end = 0x001c37 __text_end = 0x001c3d
__text_start = 0x001000 __text_start = 0x001000
__ucmpdi2 = 0x001680 __ucmpdi2 = 0x001686
__udivdi3 = 0x0016e0 __udivdi3 = 0x0016e6
__udivhi3 = 0x0012f6 __udivhi3 = 0x0012f0
__udivmod_core = 0x001375 __udivmod_core = 0x00136f
__udivmoddi_core = 0x001702 __udivmoddi_core = 0x001708
__udivmodsi_core = 0x001490 __udivmodsi_core = 0x00148a
__udivsi3 = 0x0014c8 __udivsi3 = 0x0014c2
__umoddi3 = 0x0016e9 __umoddi3 = 0x0016ef
__umodhi3 = 0x001302 __umodhi3 = 0x0012fc
__umodsi3 = 0x0014dc __umodsi3 = 0x0014d6
__umulhisi3 = 0x00126d __umulhisi3 = 0x001267
__umulhisi3_qsq = 0x001839 __umulhisi3_qsq = 0x00183f
gChainPath = 0x001c37 gChainPath = 0x001c3d
longjmp = 0x00180f longjmp = 0x001815
main = 0x0010ba main = 0x0010ba
setjmp = 0x0017e7 setjmp = 0x0017ed

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@ -1,20 +1,20 @@
# section layout # section layout
.text : 0x001000 .. 0x0021ca ( 4554 bytes) .text : 0x001000 .. 0x002108 ( 4360 bytes)
.rodata : 0x0021ca .. 0x002238 ( 110 bytes) .rodata : 0x002108 .. 0x002176 ( 110 bytes)
.bss : 0x00a000 .. 0x00a00a ( 10 bytes) .bss : 0x00a000 .. 0x00a00a ( 10 bytes)
# per-input-file .text contributions # per-input-file .text contributions
186 /home/scott/claude/llvm816/runtime/crt0Gsos.o 186 /home/scott/claude/llvm816/runtime/crt0Gsos.o
552 /home/scott/claude/llvm816/demos/helloText.o 546 /home/scott/claude/llvm816/demos/helloText.o
43513 /home/scott/claude/llvm816/runtime/libc.o 30853 /home/scott/claude/llvm816/runtime/libc.o
5935 /home/scott/claude/llvm816/runtime/snprintf.o 9098 /home/scott/claude/llvm816/runtime/snprintf.o
11953 /home/scott/claude/llvm816/runtime/extras.o 10865 /home/scott/claude/llvm816/runtime/extras.o
7077 /home/scott/claude/llvm816/runtime/softFloat.o 4374 /home/scott/claude/llvm816/runtime/softFloat.o
15379 /home/scott/claude/llvm816/runtime/softDouble.o 13388 /home/scott/claude/llvm816/runtime/softDouble.o
176 /home/scott/claude/llvm816/runtime/iigsGsos.o 176 /home/scott/claude/llvm816/runtime/iigsGsos.o
20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o 20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o
1349 /home/scott/claude/llvm816/runtime/desktop.o 1139 /home/scott/claude/llvm816/runtime/desktop.o
2540 /home/scott/claude/llvm816/runtime/libgcc.o 2552 /home/scott/claude/llvm816/runtime/libgcc.o
# global symbols (sorted by address) # global symbols (sorted by address)
0x000000 __bss_bank 0x000000 __bss_bank
@ -33,70 +33,70 @@
0x001000 __start 0x001000 __start
0x001000 __text_start 0x001000 __text_start
0x0010ba main 0x0010ba main
0x0012e2 CtlStartUp 0x0012dc CtlStartUp
0x0012f2 EMStartUp 0x0012ec EMStartUp
0x001311 GetNextEvent 0x00130b GetNextEvent
0x001328 FMStartUp 0x001322 FMStartUp
0x001338 LEStartUp 0x001332 LEStartUp
0x001348 LoadOneTool 0x001342 LoadOneTool
0x001358 NewHandle 0x001352 NewHandle
0x00137e MenuStartUp 0x001378 MenuStartUp
0x00138e QDStartUp 0x001388 QDStartUp
0x0013a4 DrawString 0x00139e DrawString
0x0013b6 MoveTo 0x0013b0 MoveTo
0x0013c6 startdesk 0x0013c0 startdesk
0x0017ac paintDesktopBackdrop 0x0016de paintDesktopBackdrop
0x0017de __jsl_indir 0x001710 __jsl_indir
0x0017e1 __mulhi3 0x001713 __mulhi3
0x001800 __umulhisi3 0x001732 __umulhisi3
0x001857 __ashlhi3 0x001789 __ashlhi3
0x001866 __lshrhi3 0x001798 __lshrhi3
0x001876 __ashrhi3 0x0017a8 __ashrhi3
0x001889 __udivhi3 0x0017bb __udivhi3
0x001895 __umodhi3 0x0017c7 __umodhi3
0x0018a1 __divhi3 0x0017d3 __divhi3
0x0018bb __modhi3 0x0017ed __modhi3
0x0018d5 __divmod_setup 0x001807 __divmod_setup
0x001908 __udivmod_core 0x00183a __udivmod_core
0x001926 __mulsi3 0x001858 __mulsi3
0x0019df __ashlsi3 0x001911 __ashlsi3
0x0019f4 __lshrsi3 0x001926 __lshrsi3
0x001a09 __ashrsi3 0x00193b __ashrsi3
0x001a23 __udivmodsi_core 0x001955 __udivmodsi_core
0x001a5b __udivsi3 0x00198d __udivsi3
0x001a6f __umodsi3 0x0019a1 __umodsi3
0x001a83 __divsi3 0x0019b5 __divsi3
0x001aaa __modsi3 0x0019dc __modsi3
0x001ad1 __divmodsi_setup 0x001a03 __divmodsi_setup
0x001b22 __divmoddi4_stash 0x001a54 __divmoddi4_stash
0x001b3f __retdi 0x001a71 __retdi
0x001b4c __ashldi3 0x001a7e __ashldi3
0x001b6f __lshrdi3 0x001aa1 __lshrdi3
0x001b92 __ashrdi3 0x001ac4 __ashrdi3
0x001bb8 __muldi3 0x001aea __muldi3
0x001c13 __ucmpdi2 0x001b51 __ucmpdi2
0x001c3c __cmpdi2 0x001b7a __cmpdi2
0x001c73 __udivdi3 0x001bb1 __udivdi3
0x001c7c __umoddi3 0x001bba __umoddi3
0x001c95 __udivmoddi_core 0x001bd3 __udivmoddi_core
0x001ce2 __divdi3 0x001c20 __divdi3
0x001d01 __moddi3 0x001c3f __moddi3
0x001d2e __absdi_a 0x001c6c __absdi_a
0x001d36 __absdi_b 0x001c74 __absdi_b
0x001d3e __negdi_a 0x001c7c __negdi_a
0x001d5c __negdi_b 0x001c9a __negdi_b
0x001d7a setjmp 0x001cb8 setjmp
0x001da2 longjmp 0x001ce0 longjmp
0x001dcc __umulhisi3_qsq 0x001d0a __umulhisi3_qsq
0x0021ca __rodata_start 0x002108 __rodata_start
0x0021ca __text_end 0x002108 __text_end
0x0021ca gChainPath 0x002108 gChainPath
0x0021de line1 0x00211c line1
0x0021f3 line2 0x002131 line2
0x002220 line3 0x00215e line3
0x002238 __init_array_end 0x002176 __init_array_end
0x002238 __init_array_start 0x002176 __init_array_start
0x002238 __rodata_end 0x002176 __rodata_end
0x00a000 __bss_lo16 0x00a000 __bss_lo16
0x00a000 __bss_seg0_lo16 0x00a000 __bss_seg0_lo16
0x00a000 __bss_start 0x00a000 __bss_start
@ -107,25 +107,25 @@
0x00a00a __bss_end 0x00a00a __bss_end
0x00a00a __heap_start 0x00a00a __heap_start
0x00bf00 __heap_end 0x00bf00 __heap_end
CtlStartUp = 0x0012e2 CtlStartUp = 0x0012dc
DrawString = 0x0013a4 DrawString = 0x00139e
EMStartUp = 0x0012f2 EMStartUp = 0x0012ec
FMStartUp = 0x001328 FMStartUp = 0x001322
GetNextEvent = 0x001311 GetNextEvent = 0x00130b
LEStartUp = 0x001338 LEStartUp = 0x001332
LoadOneTool = 0x001348 LoadOneTool = 0x001342
MenuStartUp = 0x00137e MenuStartUp = 0x001378
MoveTo = 0x0013b6 MoveTo = 0x0013b0
NewHandle = 0x001358 NewHandle = 0x001352
QDStartUp = 0x00138e QDStartUp = 0x001388
__absdi_a = 0x001d2e __absdi_a = 0x001c6c
__absdi_b = 0x001d36 __absdi_b = 0x001c74
__ashldi3 = 0x001b4c __ashldi3 = 0x001a7e
__ashlhi3 = 0x001857 __ashlhi3 = 0x001789
__ashlsi3 = 0x0019df __ashlsi3 = 0x001911
__ashrdi3 = 0x001b92 __ashrdi3 = 0x001ac4
__ashrhi3 = 0x001876 __ashrhi3 = 0x0017a8
__ashrsi3 = 0x001a09 __ashrsi3 = 0x00193b
__bss_bank = 0x000000 __bss_bank = 0x000000
__bss_end = 0x00a00a __bss_end = 0x00a00a
__bss_lo16 = 0x00a000 __bss_lo16 = 0x00a000
@ -143,57 +143,57 @@ __bss_seg3_lo16 = 0x000000
__bss_seg3_size = 0x000000 __bss_seg3_size = 0x000000
__bss_size = 0x00000a __bss_size = 0x00000a
__bss_start = 0x00a000 __bss_start = 0x00a000
__cmpdi2 = 0x001c3c __cmpdi2 = 0x001b7a
__divdi3 = 0x001ce2 __divdi3 = 0x001c20
__divhi3 = 0x0018a1 __divhi3 = 0x0017d3
__divmod_setup = 0x0018d5 __divmod_setup = 0x001807
__divmoddi4_stash = 0x001b22 __divmoddi4_stash = 0x001a54
__divmodsi_setup = 0x001ad1 __divmodsi_setup = 0x001a03
__divsi3 = 0x001a83 __divsi3 = 0x0019b5
__heap_end = 0x00bf00 __heap_end = 0x00bf00
__heap_start = 0x00a00a __heap_start = 0x00a00a
__indirTarget = 0x00a008 __indirTarget = 0x00a008
__init_array_end = 0x002238 __init_array_end = 0x002176
__init_array_start = 0x002238 __init_array_start = 0x002176
__jsl_indir = 0x0017de __jsl_indir = 0x001710
__lshrdi3 = 0x001b6f __lshrdi3 = 0x001aa1
__lshrhi3 = 0x001866 __lshrhi3 = 0x001798
__lshrsi3 = 0x0019f4 __lshrsi3 = 0x001926
__moddi3 = 0x001d01 __moddi3 = 0x001c3f
__modhi3 = 0x0018bb __modhi3 = 0x0017ed
__modsi3 = 0x001aaa __modsi3 = 0x0019dc
__muldi3 = 0x001bb8 __muldi3 = 0x001aea
__mulhi3 = 0x0017e1 __mulhi3 = 0x001713
__mulsi3 = 0x001926 __mulsi3 = 0x001858
__negdi_a = 0x001d3e __negdi_a = 0x001c7c
__negdi_b = 0x001d5c __negdi_b = 0x001c9a
__retdi = 0x001b3f __retdi = 0x001a71
__rodata_end = 0x002238 __rodata_end = 0x002176
__rodata_start = 0x0021ca __rodata_start = 0x002108
__start = 0x001000 __start = 0x001000
__text_end = 0x0021ca __text_end = 0x002108
__text_start = 0x001000 __text_start = 0x001000
__ucmpdi2 = 0x001c13 __ucmpdi2 = 0x001b51
__udivdi3 = 0x001c73 __udivdi3 = 0x001bb1
__udivhi3 = 0x001889 __udivhi3 = 0x0017bb
__udivmod_core = 0x001908 __udivmod_core = 0x00183a
__udivmoddi_core = 0x001c95 __udivmoddi_core = 0x001bd3
__udivmodsi_core = 0x001a23 __udivmodsi_core = 0x001955
__udivsi3 = 0x001a5b __udivsi3 = 0x00198d
__umoddi3 = 0x001c7c __umoddi3 = 0x001bba
__umodhi3 = 0x001895 __umodhi3 = 0x0017c7
__umodsi3 = 0x001a6f __umodsi3 = 0x0019a1
__umulhisi3 = 0x001800 __umulhisi3 = 0x001732
__umulhisi3_qsq = 0x001dcc __umulhisi3_qsq = 0x001d0a
gChainPath = 0x0021ca gChainPath = 0x002108
gDpBase = 0x00a006 gDpBase = 0x00a006
gDpHandle = 0x00a002 gDpHandle = 0x00a002
gUserId = 0x00a000 gUserId = 0x00a000
line1 = 0x0021de line1 = 0x00211c
line2 = 0x0021f3 line2 = 0x002131
line3 = 0x002220 line3 = 0x00215e
longjmp = 0x001da2 longjmp = 0x001ce0
main = 0x0010ba main = 0x0010ba
paintDesktopBackdrop = 0x0017ac paintDesktopBackdrop = 0x0016de
setjmp = 0x001d7a setjmp = 0x001cb8
startdesk = 0x0013c6 startdesk = 0x0013c0

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@ -1,20 +1,20 @@
# section layout # section layout
.text : 0x001000 .. 0x001ecd ( 3789 bytes) .text : 0x001000 .. 0x001ed1 ( 3793 bytes)
.rodata : 0x001ecd .. 0x001f01 ( 52 bytes) .rodata : 0x001ed1 .. 0x001f05 ( 52 bytes)
.bss : 0x00a000 .. 0x00a050 ( 80 bytes) .bss : 0x00a000 .. 0x00a050 ( 80 bytes)
# per-input-file .text contributions # per-input-file .text contributions
186 /home/scott/claude/llvm816/runtime/crt0Gsos.o 186 /home/scott/claude/llvm816/runtime/crt0Gsos.o
757 /home/scott/claude/llvm816/demos/helloWindow.o 751 /home/scott/claude/llvm816/demos/helloWindow.o
43513 /home/scott/claude/llvm816/runtime/libc.o 30853 /home/scott/claude/llvm816/runtime/libc.o
5935 /home/scott/claude/llvm816/runtime/snprintf.o 9098 /home/scott/claude/llvm816/runtime/snprintf.o
11953 /home/scott/claude/llvm816/runtime/extras.o 10865 /home/scott/claude/llvm816/runtime/extras.o
7077 /home/scott/claude/llvm816/runtime/softFloat.o 4374 /home/scott/claude/llvm816/runtime/softFloat.o
15379 /home/scott/claude/llvm816/runtime/softDouble.o 13388 /home/scott/claude/llvm816/runtime/softDouble.o
176 /home/scott/claude/llvm816/runtime/iigsGsos.o 176 /home/scott/claude/llvm816/runtime/iigsGsos.o
20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o 20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o
1349 /home/scott/claude/llvm816/runtime/desktop.o 1139 /home/scott/claude/llvm816/runtime/desktop.o
2540 /home/scott/claude/llvm816/runtime/libgcc.o 2552 /home/scott/claude/llvm816/runtime/libgcc.o
# global symbols (sorted by address) # global symbols (sorted by address)
0x000000 __bss_bank 0x000000 __bss_bank
@ -33,66 +33,66 @@
0x001000 __start 0x001000 __start
0x001000 __text_start 0x001000 __text_start
0x0010ba main 0x0010ba main
0x0013af memset 0x0013a9 memset
0x00140f EMStartUp 0x001407 EMStartUp
0x00142e GetNextEvent 0x001426 GetNextEvent
0x001445 NewHandle 0x00143d NewHandle
0x00146b QDStartUp 0x001463 QDStartUp
0x001481 DrawString 0x001479 DrawString
0x001493 MoveTo 0x00148b MoveTo
0x0014a3 SetPort 0x00149b SetPort
0x0014b5 NewWindow 0x0014ad NewWindow
0x0014cf ShowWindow 0x0014c7 ShowWindow
0x0014e1 __jsl_indir 0x0014d9 __jsl_indir
0x0014e4 __mulhi3 0x0014dc __mulhi3
0x001503 __umulhisi3 0x0014fb __umulhisi3
0x00155a __ashlhi3 0x001552 __ashlhi3
0x001569 __lshrhi3 0x001561 __lshrhi3
0x001579 __ashrhi3 0x001571 __ashrhi3
0x00158c __udivhi3 0x001584 __udivhi3
0x001598 __umodhi3 0x001590 __umodhi3
0x0015a4 __divhi3 0x00159c __divhi3
0x0015be __modhi3 0x0015b6 __modhi3
0x0015d8 __divmod_setup 0x0015d0 __divmod_setup
0x00160b __udivmod_core 0x001603 __udivmod_core
0x001629 __mulsi3 0x001621 __mulsi3
0x0016e2 __ashlsi3 0x0016da __ashlsi3
0x0016f7 __lshrsi3 0x0016ef __lshrsi3
0x00170c __ashrsi3 0x001704 __ashrsi3
0x001726 __udivmodsi_core 0x00171e __udivmodsi_core
0x00175e __udivsi3 0x001756 __udivsi3
0x001772 __umodsi3 0x00176a __umodsi3
0x001786 __divsi3 0x00177e __divsi3
0x0017ad __modsi3 0x0017a5 __modsi3
0x0017d4 __divmodsi_setup 0x0017cc __divmodsi_setup
0x001825 __divmoddi4_stash 0x00181d __divmoddi4_stash
0x001842 __retdi 0x00183a __retdi
0x00184f __ashldi3 0x001847 __ashldi3
0x001872 __lshrdi3 0x00186a __lshrdi3
0x001895 __ashrdi3 0x00188d __ashrdi3
0x0018bb __muldi3 0x0018b3 __muldi3
0x001916 __ucmpdi2 0x00191a __ucmpdi2
0x00193f __cmpdi2 0x001943 __cmpdi2
0x001976 __udivdi3 0x00197a __udivdi3
0x00197f __umoddi3 0x001983 __umoddi3
0x001998 __udivmoddi_core 0x00199c __udivmoddi_core
0x0019e5 __divdi3 0x0019e9 __divdi3
0x001a04 __moddi3 0x001a08 __moddi3
0x001a31 __absdi_a 0x001a35 __absdi_a
0x001a39 __absdi_b 0x001a3d __absdi_b
0x001a41 __negdi_a 0x001a45 __negdi_a
0x001a5f __negdi_b 0x001a63 __negdi_b
0x001a7d setjmp 0x001a81 setjmp
0x001aa5 longjmp 0x001aa9 longjmp
0x001acf __umulhisi3_qsq 0x001ad3 __umulhisi3_qsq
0x001ecd __rodata_start 0x001ed1 __rodata_start
0x001ecd __text_end 0x001ed1 __text_end
0x001ecd gChainPath 0x001ed1 gChainPath
0x001ee1 gTitle 0x001ee5 gTitle
0x001eec gMsg 0x001ef0 gMsg
0x001f01 __init_array_end 0x001f05 __init_array_end
0x001f01 __init_array_start 0x001f05 __init_array_start
0x001f01 __rodata_end 0x001f05 __rodata_end
0x00a000 __bss_lo16 0x00a000 __bss_lo16
0x00a000 __bss_seg0_lo16 0x00a000 __bss_seg0_lo16
0x00a000 __bss_start 0x00a000 __bss_start
@ -101,23 +101,23 @@
0x00a050 __bss_end 0x00a050 __bss_end
0x00a050 __heap_start 0x00a050 __heap_start
0x00bf00 __heap_end 0x00bf00 __heap_end
DrawString = 0x001481 DrawString = 0x001479
EMStartUp = 0x00140f EMStartUp = 0x001407
GetNextEvent = 0x00142e GetNextEvent = 0x001426
MoveTo = 0x001493 MoveTo = 0x00148b
NewHandle = 0x001445 NewHandle = 0x00143d
NewWindow = 0x0014b5 NewWindow = 0x0014ad
QDStartUp = 0x00146b QDStartUp = 0x001463
SetPort = 0x0014a3 SetPort = 0x00149b
ShowWindow = 0x0014cf ShowWindow = 0x0014c7
__absdi_a = 0x001a31 __absdi_a = 0x001a35
__absdi_b = 0x001a39 __absdi_b = 0x001a3d
__ashldi3 = 0x00184f __ashldi3 = 0x001847
__ashlhi3 = 0x00155a __ashlhi3 = 0x001552
__ashlsi3 = 0x0016e2 __ashlsi3 = 0x0016da
__ashrdi3 = 0x001895 __ashrdi3 = 0x00188d
__ashrhi3 = 0x001579 __ashrhi3 = 0x001571
__ashrsi3 = 0x00170c __ashrsi3 = 0x001704
__bss_bank = 0x000000 __bss_bank = 0x000000
__bss_end = 0x00a050 __bss_end = 0x00a050
__bss_lo16 = 0x00a000 __bss_lo16 = 0x00a000
@ -135,53 +135,53 @@ __bss_seg3_lo16 = 0x000000
__bss_seg3_size = 0x000000 __bss_seg3_size = 0x000000
__bss_size = 0x000050 __bss_size = 0x000050
__bss_start = 0x00a000 __bss_start = 0x00a000
__cmpdi2 = 0x00193f __cmpdi2 = 0x001943
__divdi3 = 0x0019e5 __divdi3 = 0x0019e9
__divhi3 = 0x0015a4 __divhi3 = 0x00159c
__divmod_setup = 0x0015d8 __divmod_setup = 0x0015d0
__divmoddi4_stash = 0x001825 __divmoddi4_stash = 0x00181d
__divmodsi_setup = 0x0017d4 __divmodsi_setup = 0x0017cc
__divsi3 = 0x001786 __divsi3 = 0x00177e
__heap_end = 0x00bf00 __heap_end = 0x00bf00
__heap_start = 0x00a050 __heap_start = 0x00a050
__indirTarget = 0x00a04e __indirTarget = 0x00a04e
__init_array_end = 0x001f01 __init_array_end = 0x001f05
__init_array_start = 0x001f01 __init_array_start = 0x001f05
__jsl_indir = 0x0014e1 __jsl_indir = 0x0014d9
__lshrdi3 = 0x001872 __lshrdi3 = 0x00186a
__lshrhi3 = 0x001569 __lshrhi3 = 0x001561
__lshrsi3 = 0x0016f7 __lshrsi3 = 0x0016ef
__moddi3 = 0x001a04 __moddi3 = 0x001a08
__modhi3 = 0x0015be __modhi3 = 0x0015b6
__modsi3 = 0x0017ad __modsi3 = 0x0017a5
__muldi3 = 0x0018bb __muldi3 = 0x0018b3
__mulhi3 = 0x0014e4 __mulhi3 = 0x0014dc
__mulsi3 = 0x001629 __mulsi3 = 0x001621
__negdi_a = 0x001a41 __negdi_a = 0x001a45
__negdi_b = 0x001a5f __negdi_b = 0x001a63
__retdi = 0x001842 __retdi = 0x00183a
__rodata_end = 0x001f01 __rodata_end = 0x001f05
__rodata_start = 0x001ecd __rodata_start = 0x001ed1
__start = 0x001000 __start = 0x001000
__text_end = 0x001ecd __text_end = 0x001ed1
__text_start = 0x001000 __text_start = 0x001000
__ucmpdi2 = 0x001916 __ucmpdi2 = 0x00191a
__udivdi3 = 0x001976 __udivdi3 = 0x00197a
__udivhi3 = 0x00158c __udivhi3 = 0x001584
__udivmod_core = 0x00160b __udivmod_core = 0x001603
__udivmoddi_core = 0x001998 __udivmoddi_core = 0x00199c
__udivmodsi_core = 0x001726 __udivmodsi_core = 0x00171e
__udivsi3 = 0x00175e __udivsi3 = 0x001756
__umoddi3 = 0x00197f __umoddi3 = 0x001983
__umodhi3 = 0x001598 __umodhi3 = 0x001590
__umodsi3 = 0x001772 __umodsi3 = 0x00176a
__umulhisi3 = 0x001503 __umulhisi3 = 0x0014fb
__umulhisi3_qsq = 0x001acf __umulhisi3_qsq = 0x001ad3
gChainPath = 0x001ecd gChainPath = 0x001ed1
gMsg = 0x001eec gMsg = 0x001ef0
gTitle = 0x001ee1 gTitle = 0x001ee5
gWp = 0x00a000 gWp = 0x00a000
longjmp = 0x001aa5 longjmp = 0x001aa9
main = 0x0010ba main = 0x0010ba
memset = 0x0013af memset = 0x0013a9
setjmp = 0x001a7d setjmp = 0x001a81

Binary file not shown.

Binary file not shown.

Binary file not shown.

demos/layer2Stress.c (new file, 48 lines)

@ -0,0 +1,48 @@
// layer2Stress.c - Layer 2 ptr32 deref miscompile reproducer.
//
// Verifies that *_StackRelIndY uses (the Layer 2 deref pseudo) survive
// W65816StackRelToImg's hot-slot promotion. Each helper writes its
// result to a known address; runInMame.sh --check verifies all four.

#include <stdint.h>

__attribute__((noinline)) uint16_t indexedRead(const uint16_t *arr, uint16_t i) {
    return arr[i];
}

__attribute__((noinline)) uint16_t strLen(const char *p) {
    uint16_t n = 0;
    while (*p) {
        p++;
        n++;
    }
    return n;
}

__attribute__((noinline)) uint16_t sumByteToZero(const uint8_t *p) {
    uint16_t s = 0;
    while (*p) {
        s += *p;
        p++;
    }
    return s;
}

static const uint16_t gArr[] = { 100, 200, 300, 400, 500 };
static const uint8_t gBytes[] = { 10, 20, 30, 40, 50, 0 };
static const char gString[] = "Hello, world!";

int main(void) {
    *(volatile uint16_t *)0x70 = strLen(gString);
    *(volatile uint16_t *)0x72 = sumByteToZero(gBytes);
    *(volatile uint16_t *)0x74 = indexedRead(gArr, 3);
    *(volatile uint16_t *)0x76 = 0xBEEF;
    for (volatile uint32_t s = 0; s < 200000UL; s++) {
    }
    return 0;
}
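
The four words that layer2Stress.c leaves at 0x70 through 0x76 can be cross-checked on a development machine by running the same loops natively. The sketch below is not part of this commit: the names hostStrLen, hostSumByteToZero, layer2Expected and layer2SelfCheck are hypothetical, and it assumes that runInMame.sh --check compares exactly those four 16-bit words, which the file comment suggests but the diff does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side mirror of the test data in layer2Stress.c. */
const uint16_t kArr[] = { 100, 200, 300, 400, 500 };
const uint8_t kBytes[] = { 10, 20, 30, 40, 50, 0 };
const char kString[] = "Hello, world!";

/* Same loop as strLen(): count bytes up to the terminating NUL. */
uint16_t hostStrLen(const char *p) {
    uint16_t n = 0;
    while (*p) {
        p++;
        n++;
    }
    return n;
}

/* Same loop as sumByteToZero(): sum bytes up to the first zero byte. */
uint16_t hostSumByteToZero(const uint8_t *p) {
    uint16_t s = 0;
    while (*p) {
        s += *p;
        p++;
    }
    return s;
}

/* Fills out[0..3] with the words expected at 0x70, 0x72, 0x74 and 0x76. */
void layer2Expected(uint16_t out[4]) {
    out[0] = hostStrLen(kString);       /* 13 characters */
    out[1] = hostSumByteToZero(kBytes); /* 10+20+30+40+50 = 150 */
    out[2] = kArr[3];                   /* 400 */
    out[3] = 0xBEEF;                    /* sentinel written last by main() */
}

/* Asserts the expected values; call once from a host test harness. */
void layer2SelfCheck(void) {
    uint16_t e[4];
    layer2Expected(e);
    assert(e[0] == 13);
    assert(e[1] == 150);
    assert(e[2] == 400);
    assert(e[3] == 0xBEEF);
}
```

If the target build ever reports different values at those addresses while this host version passes, the difference points at code generation rather than at the test data.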

Binary file not shown.


@ -1,20 +1,20 @@
# section layout # section layout
.text : 0x001000 .. 0x003102 ( 8450 bytes) .text : 0x001000 .. 0x002d74 ( 7540 bytes)
.rodata : 0x003102 .. 0x00393a ( 2104 bytes) .rodata : 0x002d74 .. 0x0035ac ( 2104 bytes)
.bss : 0x00a000 .. 0x00a086 ( 134 bytes) .bss : 0x00a000 .. 0x00a086 ( 134 bytes)
# per-input-file .text contributions # per-input-file .text contributions
186 /home/scott/claude/llvm816/runtime/crt0Gsos.o 186 /home/scott/claude/llvm816/runtime/crt0Gsos.o
4058 /home/scott/claude/llvm816/demos/minicad.o 3338 /home/scott/claude/llvm816/demos/minicad.o
43132 /home/scott/claude/llvm816/runtime/libc.o 30853 /home/scott/claude/llvm816/runtime/libc.o
14895 /home/scott/claude/llvm816/runtime/snprintf.o 9098 /home/scott/claude/llvm816/runtime/snprintf.o
11953 /home/scott/claude/llvm816/runtime/extras.o 10865 /home/scott/claude/llvm816/runtime/extras.o
7077 /home/scott/claude/llvm816/runtime/softFloat.o 4374 /home/scott/claude/llvm816/runtime/softFloat.o
15379 /home/scott/claude/llvm816/runtime/softDouble.o 13388 /home/scott/claude/llvm816/runtime/softDouble.o
176 /home/scott/claude/llvm816/runtime/iigsGsos.o 176 /home/scott/claude/llvm816/runtime/iigsGsos.o
20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o 20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o
1349 /home/scott/claude/llvm816/runtime/desktop.o 1139 /home/scott/claude/llvm816/runtime/desktop.o
2540 /home/scott/claude/llvm816/runtime/libgcc.o 2552 /home/scott/claude/llvm816/runtime/libgcc.o
# global symbols (sorted by address) # global symbols (sorted by address)
0x000000 __bss_bank 0x000000 __bss_bank
@ -33,97 +33,99 @@
0x001000 __start 0x001000 __start
0x001000 __text_start 0x001000 __text_start
0x0010ba main 0x0010ba main
0x001eee drawWindow 0x001910 doNew
0x002094 memset 0x001b60 doClose
0x0020f4 CtlStartUp 0x001c26 drawWindow
0x002104 NoteAlert 0x001dc4 memset
0x002120 StopAlert 0x001e22 CtlStartUp
0x00213c EMStartUp 0x001e32 NoteAlert
0x00215b GetNextEvent 0x001e4e StopAlert
0x002172 FMStartUp 0x001e6a EMStartUp
0x002182 LEStartUp 0x001e89 GetNextEvent
0x002192 LoadOneTool 0x001ea0 FMStartUp
0x0021a2 NewHandle 0x001eb0 LEStartUp
0x0021c8 MenuStartUp 0x001ec0 LoadOneTool
0x0021d8 HiliteMenu 0x001ed0 NewHandle
0x0021e8 InsertMenu 0x001ef6 MenuStartUp
0x0021fd NewMenu 0x001f06 HiliteMenu
0x002217 QDStartUp 0x001f16 InsertMenu
0x00222d GetPort 0x001f2b NewMenu
0x00223d GlobalToLocal 0x001f45 QDStartUp
0x00224f LineTo 0x001f5b GetPort
0x00225f MoveTo 0x001f6b GlobalToLocal
0x00226f SetPenSize 0x001f7d LineTo
0x00227f CloseWindow 0x001f8d MoveTo
0x002291 FrontWindow 0x001f9d SetPenSize
0x0022a1 GetWRefCon 0x001fad CloseWindow
0x0022bb NewWindow 0x001fbf FrontWindow
0x0022d5 StartDrawing 0x001fcf GetWRefCon
0x0022e7 TaskMaster 0x001fe9 NewWindow
0x0022fe startdesk 0x002003 StartDrawing
0x0026e4 paintDesktopBackdrop 0x002015 TaskMaster
0x002716 __jsl_indir 0x00202c startdesk
0x002719 __mulhi3 0x00234a paintDesktopBackdrop
0x002738 __umulhisi3 0x00237c __jsl_indir
0x00278f __ashlhi3 0x00237f __mulhi3
0x00279e __lshrhi3 0x00239e __umulhisi3
0x0027ae __ashrhi3 0x0023f5 __ashlhi3
0x0027c1 __udivhi3 0x002404 __lshrhi3
0x0027cd __umodhi3 0x002414 __ashrhi3
0x0027d9 __divhi3 0x002427 __udivhi3
0x0027f3 __modhi3 0x002433 __umodhi3
0x00280d __divmod_setup 0x00243f __divhi3
0x002840 __udivmod_core 0x002459 __modhi3
0x00285e __mulsi3 0x002473 __divmod_setup
0x002917 __ashlsi3 0x0024a6 __udivmod_core
0x00292c __lshrsi3 0x0024c4 __mulsi3
0x002941 __ashrsi3 0x00257d __ashlsi3
0x00295b __udivmodsi_core 0x002592 __lshrsi3
0x002993 __udivsi3 0x0025a7 __ashrsi3
0x0029a7 __umodsi3 0x0025c1 __udivmodsi_core
0x0029bb __divsi3 0x0025f9 __udivsi3
0x0029e2 __modsi3 0x00260d __umodsi3
0x002a09 __divmodsi_setup 0x002621 __divsi3
0x002a5a __divmoddi4_stash 0x002648 __modsi3
0x002a77 __retdi 0x00266f __divmodsi_setup
0x002a84 __ashldi3 0x0026c0 __divmoddi4_stash
0x002aa7 __lshrdi3 0x0026dd __retdi
0x002aca __ashrdi3 0x0026ea __ashldi3
0x002af0 __muldi3 0x00270d __lshrdi3
0x002b4b __ucmpdi2 0x002730 __ashrdi3
0x002b74 __cmpdi2 0x002756 __muldi3
0x002bab __udivdi3 0x0027bd __ucmpdi2
0x002bb4 __umoddi3 0x0027e6 __cmpdi2
0x002bcd __udivmoddi_core 0x00281d __udivdi3
0x002c1a __divdi3 0x002826 __umoddi3
0x002c39 __moddi3 0x00283f __udivmoddi_core
0x002c66 __absdi_a 0x00288c __divdi3
0x002c6e __absdi_b 0x0028ab __moddi3
0x002c76 __negdi_a 0x0028d8 __absdi_a
0x002c94 __negdi_b 0x0028e0 __absdi_b
0x002cb2 setjmp 0x0028e8 __negdi_a
0x002cda longjmp 0x002906 __negdi_b
0x002d04 __umulhisi3_qsq 0x002924 setjmp
0x003102 __rodata_start 0x00294c longjmp
0x003102 __text_end 0x002976 __umulhisi3_qsq
0x003102 gChainPath 0x002d74 __rodata_start
0x003116 editMenuStr 0x002d74 __text_end
0x00316f fileMenuStr 0x002d74 gChainPath
0x0031aa appleMenuStr 0x002d88 editMenuStr
0x0031c6 gWindows 0x002de1 fileMenuStr
0x00382e gTitle0 0x002e1c appleMenuStr
0x003837 gTitle1 0x002e38 gWindows
0x003840 gTitle2 0x0034a0 gTitle0
0x003849 gTitle3 0x0034a9 gTitle1
0x003852 gAboutMsg 0x0034b2 gTitle2
0x003895 doAlert.okStr 0x0034bb gTitle3
0x00389a doAlert.button 0x0034c4 gAboutMsg
0x0038b2 doAlert.message 0x003507 doAlert.okStr
0x0038ca doAlert.alertRec 0x00350c doAlert.button
0x003908 sketch.fullMsg 0x003524 doAlert.message
0x00393a __init_array_end 0x00353c doAlert.alertRec
0x00393a __init_array_start 0x00357a sketch.fullMsg
0x00393a __rodata_end 0x0035ac __init_array_end
0x0035ac __init_array_start
0x0035ac __rodata_end
0x00a000 __bss_lo16 0x00a000 __bss_lo16
0x00a000 __bss_seg0_lo16 0x00a000 __bss_seg0_lo16
0x00a000 __bss_start 0x00a000 __bss_start
@ -137,39 +139,39 @@
0x00a086 __bss_end 0x00a086 __bss_end
0x00a086 __heap_start 0x00a086 __heap_start
0x00bf00 __heap_end 0x00bf00 __heap_end
CloseWindow = 0x00227f CloseWindow = 0x001fad
CtlStartUp = 0x0020f4 CtlStartUp = 0x001e22
EMStartUp = 0x00213c EMStartUp = 0x001e6a
FMStartUp = 0x002172 FMStartUp = 0x001ea0
FrontWindow = 0x002291 FrontWindow = 0x001fbf
GetNextEvent = 0x00215b GetNextEvent = 0x001e89
GetPort = 0x00222d GetPort = 0x001f5b
GetWRefCon = 0x0022a1 GetWRefCon = 0x001fcf
GlobalToLocal = 0x00223d GlobalToLocal = 0x001f6b
HiliteMenu = 0x0021d8 HiliteMenu = 0x001f06
InsertMenu = 0x0021e8 InsertMenu = 0x001f16
LEStartUp = 0x002182 LEStartUp = 0x001eb0
LineTo = 0x00224f LineTo = 0x001f7d
LoadOneTool = 0x002192 LoadOneTool = 0x001ec0
MenuStartUp = 0x0021c8 MenuStartUp = 0x001ef6
MoveTo = 0x00225f MoveTo = 0x001f8d
NewHandle = 0x0021a2 NewHandle = 0x001ed0
NewMenu = 0x0021fd NewMenu = 0x001f2b
NewWindow = 0x0022bb NewWindow = 0x001fe9
NoteAlert = 0x002104 NoteAlert = 0x001e32
QDStartUp = 0x002217 QDStartUp = 0x001f45
SetPenSize = 0x00226f SetPenSize = 0x001f9d
StartDrawing = 0x0022d5 StartDrawing = 0x002003
StopAlert = 0x002120 StopAlert = 0x001e4e
TaskMaster = 0x0022e7 TaskMaster = 0x002015
__absdi_a = 0x002c66 __absdi_a = 0x0028d8
__absdi_b = 0x002c6e __absdi_b = 0x0028e0
__ashldi3 = 0x002a84 __ashldi3 = 0x0026ea
__ashlhi3 = 0x00278f __ashlhi3 = 0x0023f5
__ashlsi3 = 0x002917 __ashlsi3 = 0x00257d
__ashrdi3 = 0x002aca __ashrdi3 = 0x002730
__ashrhi3 = 0x0027ae __ashrhi3 = 0x002414
__ashrsi3 = 0x002941 __ashrsi3 = 0x0025a7
__bss_bank = 0x000000 __bss_bank = 0x000000
__bss_end = 0x00a086 __bss_end = 0x00a086
__bss_lo16 = 0x00a000 __bss_lo16 = 0x00a000
@ -187,73 +189,75 @@ __bss_seg3_lo16 = 0x000000
__bss_seg3_size = 0x000000 __bss_seg3_size = 0x000000
__bss_size = 0x000086 __bss_size = 0x000086
__bss_start = 0x00a000 __bss_start = 0x00a000
__cmpdi2 = 0x002b74 __cmpdi2 = 0x0027e6
__divdi3 = 0x002c1a __divdi3 = 0x00288c
__divhi3 = 0x0027d9 __divhi3 = 0x00243f
__divmod_setup = 0x00280d __divmod_setup = 0x002473
__divmoddi4_stash = 0x002a5a __divmoddi4_stash = 0x0026c0
__divmodsi_setup = 0x002a09 __divmodsi_setup = 0x00266f
__divsi3 = 0x0029bb __divsi3 = 0x002621
__heap_end = 0x00bf00 __heap_end = 0x00bf00
__heap_start = 0x00a086 __heap_start = 0x00a086
__indirTarget = 0x00a084 __indirTarget = 0x00a084
__init_array_end = 0x00393a __init_array_end = 0x0035ac
__init_array_start = 0x00393a __init_array_start = 0x0035ac
__jsl_indir = 0x002716 __jsl_indir = 0x00237c
__lshrdi3 = 0x002aa7 __lshrdi3 = 0x00270d
__lshrhi3 = 0x00279e __lshrhi3 = 0x002404
__lshrsi3 = 0x00292c __lshrsi3 = 0x002592
__moddi3 = 0x002c39 __moddi3 = 0x0028ab
__modhi3 = 0x0027f3 __modhi3 = 0x002459
__modsi3 = 0x0029e2 __modsi3 = 0x002648
__muldi3 = 0x002af0 __muldi3 = 0x002756
__mulhi3 = 0x002719 __mulhi3 = 0x00237f
__mulsi3 = 0x00285e __mulsi3 = 0x0024c4
__negdi_a = 0x002c76 __negdi_a = 0x0028e8
__negdi_b = 0x002c94 __negdi_b = 0x002906
__retdi = 0x002a77 __retdi = 0x0026dd
__rodata_end = 0x00393a __rodata_end = 0x0035ac
__rodata_start = 0x003102 __rodata_start = 0x002d74
__start = 0x001000 __start = 0x001000
__text_end = 0x003102 __text_end = 0x002d74
__text_start = 0x001000 __text_start = 0x001000
__ucmpdi2 = 0x002b4b __ucmpdi2 = 0x0027bd
__udivdi3 = 0x002bab __udivdi3 = 0x00281d
__udivhi3 = 0x0027c1 __udivhi3 = 0x002427
__udivmod_core = 0x002840 __udivmod_core = 0x0024a6
__udivmoddi_core = 0x002bcd __udivmoddi_core = 0x00283f
__udivmodsi_core = 0x00295b __udivmodsi_core = 0x0025c1
__udivsi3 = 0x002993 __udivsi3 = 0x0025f9
__umoddi3 = 0x002bb4 __umoddi3 = 0x002826
__umodhi3 = 0x0027cd __umodhi3 = 0x002433
__umodsi3 = 0x0029a7 __umodsi3 = 0x00260d
__umulhisi3 = 0x002738 __umulhisi3 = 0x00239e
__umulhisi3_qsq = 0x002d04 __umulhisi3_qsq = 0x002976
appleMenuStr = 0x0031aa appleMenuStr = 0x002e1c
doAlert.alertRec = 0x0038ca doAlert.alertRec = 0x00353c
doAlert.button = 0x00389a doAlert.button = 0x00350c
doAlert.message = 0x0038b2 doAlert.message = 0x003524
doAlert.okStr = 0x003895 doAlert.okStr = 0x003507
doClose = 0x001b60
doNew = 0x001910
doNew.wp = 0x00a02e doNew.wp = 0x00a02e
drawWindow = 0x001eee drawWindow = 0x001c26
editMenuStr = 0x003116 editMenuStr = 0x002d88
fileMenuStr = 0x00316f fileMenuStr = 0x002de1
gAboutMsg = 0x003852 gAboutMsg = 0x0034c4
gChainPath = 0x003102 gChainPath = 0x002d74
gDone = 0x00a02c gDone = 0x00a02c
gDpBase = 0x00a082 gDpBase = 0x00a082
gDpHandle = 0x00a07e gDpHandle = 0x00a07e
gEvent = 0x00a000 gEvent = 0x00a000
gTitle0 = 0x00382e gTitle0 = 0x0034a0
gTitle1 = 0x003837 gTitle1 = 0x0034a9
gTitle2 = 0x003840 gTitle2 = 0x0034b2
gTitle3 = 0x003849 gTitle3 = 0x0034bb
gUserId = 0x00a07c gUserId = 0x00a07c
gWindows = 0x0031c6 gWindows = 0x002e38
longjmp = 0x002cda longjmp = 0x00294c
main = 0x0010ba main = 0x0010ba
memset = 0x002094 memset = 0x001dc4
paintDesktopBackdrop = 0x0026e4 paintDesktopBackdrop = 0x00234a
setjmp = 0x002cb2 setjmp = 0x002924
sketch.fullMsg = 0x003908 sketch.fullMsg = 0x00357a
startdesk = 0x0022fe startdesk = 0x00202c

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@ -1,20 +1,20 @@
# section layout # section layout
.text : 0x001000 .. 0x0057d5 ( 18389 bytes) .text : 0x001000 .. 0x004457 ( 13399 bytes)
.rodata : 0x0057d5 .. 0x005c31 ( 1116 bytes) .rodata : 0x004457 .. 0x0048b3 ( 1116 bytes)
.bss : 0x00a000 .. 0x00a197 ( 407 bytes) .bss : 0x00a000 .. 0x00a197 ( 407 bytes)
# per-input-file .text contributions # per-input-file .text contributions
186 /home/scott/claude/llvm816/runtime/crt0Gsos.o 186 /home/scott/claude/llvm816/runtime/crt0Gsos.o
13790 /home/scott/claude/llvm816/demos/reversi.o 8992 /home/scott/claude/llvm816/demos/reversi.o
43132 /home/scott/claude/llvm816/runtime/libc.o 30853 /home/scott/claude/llvm816/runtime/libc.o
14895 /home/scott/claude/llvm816/runtime/snprintf.o 9098 /home/scott/claude/llvm816/runtime/snprintf.o
11953 /home/scott/claude/llvm816/runtime/extras.o 10865 /home/scott/claude/llvm816/runtime/extras.o
7077 /home/scott/claude/llvm816/runtime/softFloat.o 4374 /home/scott/claude/llvm816/runtime/softFloat.o
15379 /home/scott/claude/llvm816/runtime/softDouble.o 13388 /home/scott/claude/llvm816/runtime/softDouble.o
176 /home/scott/claude/llvm816/runtime/iigsGsos.o 176 /home/scott/claude/llvm816/runtime/iigsGsos.o
20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o 20670 /home/scott/claude/llvm816/runtime/iigsToolbox.o
1349 /home/scott/claude/llvm816/runtime/desktop.o 1139 /home/scott/claude/llvm816/runtime/desktop.o
2540 /home/scott/claude/llvm816/runtime/libgcc.o 2552 /home/scott/claude/llvm816/runtime/libgcc.o
# global symbols (sorted by address) # global symbols (sorted by address)
0x000000 __bss_bank 0x000000 __bss_bank
@ -33,120 +33,123 @@
0x001000 __start 0x001000 __start
0x001000 __text_start 0x001000 __text_start
0x0010ba main 0x0010ba main
0x002056 newGame 0x001c6f newGame
0x00221d findMove 0x001dad findMove
0x00264d drawScore 0x00201e drawScore
0x0028ff drawMovesList 0x0022d0 drawMovesList
0x002b01 drawSquare 0x0024d2 drawBoard
0x002f25 makeAMove 0x00256f drawSquare
0x0032c9 checkForDone 0x0029aa getMoves
0x003ec1 scoreMove 0x002b61 makeAMove
0x004698 memcpy 0x002d1d checkForDone
0x00471a memset 0x002f11 applyMove
0x00477a CtlStartUp 0x0030de scoreMove
0x00478a NoteAlert 0x0033da memcpy
0x0047a6 StopAlert 0x00345a memset
0x0047c2 EMStartUp 0x0034b8 CtlStartUp
0x0047e1 FMStartUp 0x0034c8 NoteAlert
0x0047f1 LEStartUp 0x0034e4 StopAlert
0x004801 LoadOneTool 0x003500 EMStartUp
0x004811 NewHandle 0x00351f FMStartUp
0x004837 MenuStartUp 0x00352f LEStartUp
0x004847 CheckMItem 0x00353f LoadOneTool
0x004857 HiliteMenu 0x00354f NewHandle
0x004867 InsertMenu 0x003575 MenuStartUp
0x00487c NewMenu 0x003585 CheckMItem
0x004896 QDStartUp 0x003595 HiliteMenu
0x0048ac DrawString 0x0035a5 InsertMenu
0x0048be FrameOval 0x0035ba NewMenu
0x0048d0 GetPort 0x0035d4 QDStartUp
0x0048e0 GetPortRect 0x0035ea DrawString
0x0048f2 GlobalToLocal 0x0035fc FrameOval
0x004904 LineTo 0x00360e GetPort
0x004914 MoveTo 0x00361e GetPortRect
0x004924 PaintOval 0x003630 GlobalToLocal
0x004936 PaintRect 0x003642 LineTo
0x004948 SetPort 0x003652 MoveTo
0x00495a BeginUpdate 0x003662 PaintOval
0x00496c EndUpdate 0x003674 PaintRect
0x00497e FrontWindow 0x003686 SetPort
0x00498e NewWindow 0x003698 BeginUpdate
0x0049a8 SelectWindow 0x0036aa EndUpdate
0x0049ba TaskMaster 0x0036bc FrontWindow
0x0049d1 startdesk 0x0036cc NewWindow
0x004db7 paintDesktopBackdrop 0x0036e6 SelectWindow
0x004de9 __jsl_indir 0x0036f8 TaskMaster
0x004dec __mulhi3 0x00370f startdesk
0x004e0b __umulhisi3 0x003a2d paintDesktopBackdrop
0x004e62 __ashlhi3 0x003a5f __jsl_indir
0x004e71 __lshrhi3 0x003a62 __mulhi3
0x004e81 __ashrhi3 0x003a81 __umulhisi3
0x004e94 __udivhi3 0x003ad8 __ashlhi3
0x004ea0 __umodhi3 0x003ae7 __lshrhi3
0x004eac __divhi3 0x003af7 __ashrhi3
0x004ec6 __modhi3 0x003b0a __udivhi3
0x004ee0 __divmod_setup 0x003b16 __umodhi3
0x004f13 __udivmod_core 0x003b22 __divhi3
0x004f31 __mulsi3 0x003b3c __modhi3
0x004fea __ashlsi3 0x003b56 __divmod_setup
0x004fff __lshrsi3 0x003b89 __udivmod_core
0x005014 __ashrsi3 0x003ba7 __mulsi3
0x00502e __udivmodsi_core 0x003c60 __ashlsi3
0x005066 __udivsi3 0x003c75 __lshrsi3
0x00507a __umodsi3 0x003c8a __ashrsi3
0x00508e __divsi3 0x003ca4 __udivmodsi_core
0x0050b5 __modsi3 0x003cdc __udivsi3
0x0050dc __divmodsi_setup 0x003cf0 __umodsi3
0x00512d __divmoddi4_stash 0x003d04 __divsi3
0x00514a __retdi 0x003d2b __modsi3
0x005157 __ashldi3 0x003d52 __divmodsi_setup
0x00517a __lshrdi3 0x003da3 __divmoddi4_stash
0x00519d __ashrdi3 0x003dc0 __retdi
0x0051c3 __muldi3 0x003dcd __ashldi3
0x00521e __ucmpdi2 0x003df0 __lshrdi3
0x005247 __cmpdi2 0x003e13 __ashrdi3
0x00527e __udivdi3 0x003e39 __muldi3
0x005287 __umoddi3 0x003ea0 __ucmpdi2
0x0052a0 __udivmoddi_core 0x003ec9 __cmpdi2
0x0052ed __divdi3 0x003f00 __udivdi3
0x00530c __moddi3 0x003f09 __umoddi3
0x005339 __absdi_a 0x003f22 __udivmoddi_core
0x005341 __absdi_b 0x003f6f __divdi3
0x005349 __negdi_a 0x003f8e __moddi3
0x005367 __negdi_b 0x003fbb __absdi_a
0x005385 setjmp 0x003fc3 __absdi_b
0x0053ad longjmp 0x003fcb __negdi_a
0x0053d7 __umulhisi3_qsq 0x003fe9 __negdi_b
0x0057d5 __rodata_start 0x004007 setjmp
0x0057d5 __text_end 0x00402f longjmp
0x0057d5 gChainPath 0x004059 __umulhisi3_qsq
0x0057e9 gColor 0x004457 __rodata_start
0x0057eb optionsMenuStr 0x004457 __text_end
0x005874 levelMenuStr 0x004457 gChainPath
0x0058ee editMenuStr 0x00446b gColor
0x005961 fileMenuStr 0x00446d optionsMenuStr
0x0059a0 appleMenuStr 0x0044f6 levelMenuStr
0x0059c0 gBoardName 0x004570 editMenuStr
0x0059c9 gScoreName 0x0045e3 fileMenuStr
0x0059d1 gMovesName 0x004622 appleMenuStr
0x0059d8 gAboutMsg 0x004642 gBoardName
0x005a1a doAlert.okStr 0x00464b gScoreName
0x005a1f doAlert.button 0x004653 gMovesName
0x005a37 doAlert.message 0x00465a gAboutMsg
0x005a4f doAlert.alertRec 0x00469c doAlert.okStr
0x005a8d gPly 0x0046a1 doAlert.button
0x005a8f gCantPassMsg 0x0046b9 doAlert.message
0x005aba gIllegalMsg 0x0046d1 doAlert.alertRec
0x005ad5 gDrawMsg 0x00470f gPly
0x005af7 gWhiteWinsMsg 0x004711 gCantPassMsg
0x005b0d gBlackWinsMsg 0x00473c gIllegalMsg
0x005b23 gPassMsg 0x004757 gDrawMsg
0x005b44 gDisp 0x004779 gWhiteWinsMsg
0x005b54 gSqScore 0x00478f gBlackWinsMsg
0x005c1c scoreString.tpl 0x0047a5 gPassMsg
0x005c31 __init_array_end 0x0047c6 gDisp
0x005c31 __init_array_start 0x0047d6 gSqScore
0x005c31 __rodata_end 0x00489e scoreString.tpl
0x0048b3 __init_array_end
0x0048b3 __init_array_start
0x0048b3 __rodata_end
0x00a000 __bss_lo16 0x00a000 __bss_lo16
0x00a000 __bss_seg0_lo16 0x00a000 __bss_seg0_lo16
0x00a000 __bss_start 0x00a000 __bss_start
@ -171,44 +174,44 @@
0x00a197 __bss_end 0x00a197 __bss_end
0x00a197 __heap_start 0x00a197 __heap_start
0x00bf00 __heap_end 0x00bf00 __heap_end
BeginUpdate = 0x00495a BeginUpdate = 0x003698
CheckMItem = 0x004847 CheckMItem = 0x003585
CtlStartUp = 0x00477a CtlStartUp = 0x0034b8
DrawString = 0x0048ac DrawString = 0x0035ea
EMStartUp = 0x0047c2 EMStartUp = 0x003500
EndUpdate = 0x00496c EndUpdate = 0x0036aa
FMStartUp = 0x0047e1 FMStartUp = 0x00351f
FrameOval = 0x0048be FrameOval = 0x0035fc
FrontWindow = 0x00497e FrontWindow = 0x0036bc
GetPort = 0x0048d0 GetPort = 0x00360e
GetPortRect = 0x0048e0 GetPortRect = 0x00361e
GlobalToLocal = 0x0048f2 GlobalToLocal = 0x003630
HiliteMenu = 0x004857 HiliteMenu = 0x003595
InsertMenu = 0x004867 InsertMenu = 0x0035a5
LEStartUp = 0x0047f1 LEStartUp = 0x00352f
LineTo = 0x004904 LineTo = 0x003642
LoadOneTool = 0x004801 LoadOneTool = 0x00353f
MenuStartUp = 0x004837 MenuStartUp = 0x003575
MoveTo = 0x004914 MoveTo = 0x003652
NewHandle = 0x004811 NewHandle = 0x00354f
NewMenu = 0x00487c NewMenu = 0x0035ba
NewWindow = 0x00498e NewWindow = 0x0036cc
NoteAlert = 0x00478a NoteAlert = 0x0034c8
PaintOval = 0x004924 PaintOval = 0x003662
PaintRect = 0x004936 PaintRect = 0x003674
QDStartUp = 0x004896 QDStartUp = 0x0035d4
SelectWindow = 0x0049a8 SelectWindow = 0x0036e6
SetPort = 0x004948 SetPort = 0x003686
StopAlert = 0x0047a6 StopAlert = 0x0034e4
TaskMaster = 0x0049ba TaskMaster = 0x0036f8
__absdi_a = 0x005339 __absdi_a = 0x003fbb
__absdi_b = 0x005341 __absdi_b = 0x003fc3
__ashldi3 = 0x005157 __ashldi3 = 0x003dcd
__ashlhi3 = 0x004e62 __ashlhi3 = 0x003ad8
__ashlsi3 = 0x004fea __ashlsi3 = 0x003c60
__ashrdi3 = 0x00519d __ashrdi3 = 0x003e13
__ashrhi3 = 0x004e81 __ashrhi3 = 0x003af7
__ashrsi3 = 0x005014 __ashrsi3 = 0x003c8a
__bss_bank = 0x000000 __bss_bank = 0x000000
__bss_end = 0x00a197 __bss_end = 0x00a197
__bss_lo16 = 0x00a000 __bss_lo16 = 0x00a000
@ -226,102 +229,105 @@ __bss_seg3_lo16 = 0x000000
__bss_seg3_size = 0x000000 __bss_seg3_size = 0x000000
__bss_size = 0x000197 __bss_size = 0x000197
__bss_start = 0x00a000 __bss_start = 0x00a000
__cmpdi2 = 0x005247 __cmpdi2 = 0x003ec9
__divdi3 = 0x0052ed __divdi3 = 0x003f6f
__divhi3 = 0x004eac __divhi3 = 0x003b22
__divmod_setup = 0x004ee0 __divmod_setup = 0x003b56
__divmoddi4_stash = 0x00512d __divmoddi4_stash = 0x003da3
__divmodsi_setup = 0x0050dc __divmodsi_setup = 0x003d52
__divsi3 = 0x00508e __divsi3 = 0x003d04
__heap_end = 0x00bf00 __heap_end = 0x00bf00
__heap_start = 0x00a197 __heap_start = 0x00a197
__indirTarget = 0x00a195 __indirTarget = 0x00a195
__init_array_end = 0x005c31 __init_array_end = 0x0048b3
__init_array_start = 0x005c31 __init_array_start = 0x0048b3
__jsl_indir = 0x004de9 __jsl_indir = 0x003a5f
__lshrdi3 = 0x00517a __lshrdi3 = 0x003df0
__lshrhi3 = 0x004e71 __lshrhi3 = 0x003ae7
__lshrsi3 = 0x004fff __lshrsi3 = 0x003c75
__moddi3 = 0x00530c __moddi3 = 0x003f8e
__modhi3 = 0x004ec6 __modhi3 = 0x003b3c
__modsi3 = 0x0050b5 __modsi3 = 0x003d2b
__muldi3 = 0x0051c3 __muldi3 = 0x003e39
__mulhi3 = 0x004dec __mulhi3 = 0x003a62
__mulsi3 = 0x004f31 __mulsi3 = 0x003ba7
__negdi_a = 0x005349 __negdi_a = 0x003fcb
__negdi_b = 0x005367 __negdi_b = 0x003fe9
__retdi = 0x00514a __retdi = 0x003dc0
__rodata_end = 0x005c31 __rodata_end = 0x0048b3
__rodata_start = 0x0057d5 __rodata_start = 0x004457
__start = 0x001000 __start = 0x001000
__text_end = 0x0057d5 __text_end = 0x004457
__text_start = 0x001000 __text_start = 0x001000
__ucmpdi2 = 0x00521e __ucmpdi2 = 0x003ea0
__udivdi3 = 0x00527e __udivdi3 = 0x003f00
__udivhi3 = 0x004e94 __udivhi3 = 0x003b0a
__udivmod_core = 0x004f13 __udivmod_core = 0x003b89
__udivmoddi_core = 0x0052a0 __udivmoddi_core = 0x003f22
__udivmodsi_core = 0x00502e __udivmodsi_core = 0x003ca4
__udivsi3 = 0x005066 __udivsi3 = 0x003cdc
__umoddi3 = 0x005287 __umoddi3 = 0x003f09
__umodhi3 = 0x004ea0 __umodhi3 = 0x003b16
__umodsi3 = 0x00507a __umodsi3 = 0x003cf0
__umulhisi3 = 0x004e0b __umulhisi3 = 0x003a81
__umulhisi3_qsq = 0x0053d7 __umulhisi3_qsq = 0x004059
appleMenuStr = 0x0059a0 appleMenuStr = 0x004622
checkForDone = 0x0032c9 applyMove = 0x002f11
doAlert.alertRec = 0x005a4f checkForDone = 0x002d1d
doAlert.button = 0x005a1f doAlert.alertRec = 0x0046d1
doAlert.message = 0x005a37 doAlert.button = 0x0046a1
doAlert.okStr = 0x005a1a doAlert.message = 0x0046b9
drawMovesList = 0x0028ff doAlert.okStr = 0x00469c
drawScore = 0x00264d drawBoard = 0x0024d2
drawSquare = 0x002b01 drawMovesList = 0x0022d0
editMenuStr = 0x0058ee drawScore = 0x00201e
fileMenuStr = 0x005961 drawSquare = 0x00256f
findMove = 0x00221d editMenuStr = 0x004570
gAboutMsg = 0x0059d8 fileMenuStr = 0x0045e3
gBlackWinsMsg = 0x005b0d findMove = 0x001dad
gAboutMsg = 0x00465a
gBlackWinsMsg = 0x00478f
gBoard = 0x00a08e gBoard = 0x00a08e
gBoardName = 0x0059c0 gBoardName = 0x004642
gBoardWin = 0x00a082 gBoardWin = 0x00a082
gCantPassMsg = 0x005a8f gCantPassMsg = 0x004711
gChainPath = 0x0057d5 gChainPath = 0x004457
gColor = 0x0057e9 gColor = 0x00446b
gCurrentColor = 0x00a032 gCurrentColor = 0x00a032
gDisp = 0x005b44 gDisp = 0x0047c6
gDone = 0x00a02c gDone = 0x00a02c
gDpBase = 0x00a193 gDpBase = 0x00a193
gDpHandle = 0x00a18f gDpHandle = 0x00a18f
gDrawMsg = 0x005ad5 gDrawMsg = 0x004757
gEvent = 0x00a000 gEvent = 0x00a000
gIllegalMsg = 0x005aba gIllegalMsg = 0x00473c
gMoveNotation = 0x00a189 gMoveNotation = 0x00a189
gMoves = 0x00a0f4 gMoves = 0x00a0f4
gMovesLeft = 0x00a02e gMovesLeft = 0x00a02e
gMovesMade = 0x00a0f2 gMovesMade = 0x00a0f2
gMovesName = 0x0059d1 gMovesName = 0x004653
gMovesWin = 0x00a08a gMovesWin = 0x00a08a
gPassMsg = 0x005b23 gPassMsg = 0x0047a5
gPly = 0x005a8d gPly = 0x00470f
gScoreBuf = 0x00a174 gScoreBuf = 0x00a174
gScoreName = 0x0059c9 gScoreName = 0x00464b
gScoreWin = 0x00a086 gScoreWin = 0x00a086
gSelfPlay = 0x00a030 gSelfPlay = 0x00a030
gSqScore = 0x005b54 gSqScore = 0x0047d6
gUserId = 0x00a18d gUserId = 0x00a18d
gWhiteWinsMsg = 0x005af7 gWhiteWinsMsg = 0x004779
getMoves = 0x0029aa
initWindows.wp = 0x00a034 initWindows.wp = 0x00a034
levelMenuStr = 0x005874 levelMenuStr = 0x0044f6
longjmp = 0x0053ad longjmp = 0x00402f
main = 0x0010ba main = 0x0010ba
makeAMove = 0x002f25 makeAMove = 0x002b61
memcpy = 0x004698 memcpy = 0x0033da
memset = 0x00471a memset = 0x00345a
newGame = 0x002056 newGame = 0x001c6f
optionsMenuStr = 0x0057eb optionsMenuStr = 0x00446d
paintDesktopBackdrop = 0x004db7 paintDesktopBackdrop = 0x003a2d
scoreMove = 0x003ec1 scoreMove = 0x0030de
scoreString.tpl = 0x005c1c scoreString.tpl = 0x00489e
setjmp = 0x005385 setjmp = 0x004007
startdesk = 0x0049d1 startdesk = 0x00370f
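
The byte counts in the section layout of this map follow directly from the address ranges, and the same arithmetic gives the size of the saving. A small sketch of that calculation, using the old and new `.text` bounds from the layout lines above (`section_size` and `percent_saved` are illustrative names only):

```bash
# Size in bytes of a section given its start and end addresses (end exclusive).
section_size() {
    echo $(( $2 - $1 ))
}

# Percentage reduction from an old size to a new size, rounded down.
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

old_text=$(section_size 0x001000 0x0057d5)   # 18389
new_text=$(section_size 0x001000 0x004457)   # 13399
echo "old .text: $old_text bytes"
echo "new .text: $new_text bytes"
echo "saved: $(( old_text - new_text )) bytes ($(percent_saved "$old_text" "$new_text")%)"
# prints "saved: 4990 bytes (27%)"
```

`.rodata` is 1116 bytes on both sides, so the whole reduction in this image comes from code.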

Binary file not shown.

Binary file not shown.

Binary file not shown.


@ -1,19 +1,90 @@
# Installing llvm816 # Installing llvm816
The project installs everything into `tools/` under the repo root, so This document covers everything you need to get from a fresh Ubuntu /
the tree is self-contained and deletable without affecting your system. Debian install to a working W65816 Clang compiler + Apple IIgs MAME
emulator + matching runtime libraries. The entire toolchain installs
*locally* under your repo checkout — nothing goes into `/usr/local`,
`/opt`, or your home directory beyond a few standard apt packages.
If you've never built LLVM or used a cross-compiler before, follow this
document top to bottom. If you're comfortable, the
[One-command install](#one-command-install) section gets you running
in 5-10 minutes.
---
## What you'll have when it's done
After install, the `llvm816/` directory tree contains everything:
| Component | Disk usage (approx) | Purpose |
|---|---:|---|
| `tools/llvm-mos/` | 5.0 GB | LLVM source tree (clone of llvm-mos). Our backend source is *symlinked* into here at build time. |
| `tools/llvm-mos-build/` | 1.4 GB | Compiled clang/llc/llvm-mc binaries. This is where you actually run the compiler from. |
| `tools/llvm-mos-sdk/` | 400 MB | Prebuilt llvm-mos SDK (the original 6502 distribution). Mostly unused by us; kept as a reference baseline. |
| `tools/calypsi/` | 580 MB | Commercial Calypsi 5.16 65816 C compiler — installed for output-quality comparisons in `compare/`. |
| `tools/orca-c/` | 10 MB | Byte Works ORCA/C compiler source, used as a header reference for the IIgs Toolbox bindings. |
| `tools/gsos/` | 13 MB | Apple GS/OS 6.0.2 / 6.0.4 disk images for booting under MAME. |
| `tools/mame/roms/` | 1.5 MB | Apple IIgs ROM 01 + ROM 03 (downloaded from archive.org). |
| **Total** | **~7-8 GB** | |
Plus a few system-wide apt packages (cmake, ninja, MAME, ...). These
are listed up front in [System requirements](#system-requirements) so
you can audit them before installing.
The compiler binary itself is at
**`tools/llvm-mos-build/bin/clang`** — you'll see this path referenced
everywhere.
---
## System requirements ## System requirements
- **Ubuntu 22.04 or 24.04** (or any Debian-based distro with apt). - **OS:** Ubuntu 22.04 LTS or 24.04 LTS (other Debian-based distros work
Other Linuxes work if you can install the packages listed below if you can find equivalents for the apt packages). Pure Arch / Fedora
by hand. installs need package-name translation but the project itself is
- **Disk:** ~10 GB free (LLVM build artifacts dominate). distro-neutral.
- **RAM:** 8 GB minimum, 16 GB recommended for the `--build-llvm` - **CPU:** any 64-bit x86 or ARM Linux machine. We're cross-compiling,
flag. The setup script's default skips the LLVM build and so the host CPU only matters for build speed.
downloads a prebuilt toolchain instead — much faster, ~500 MB. - **Disk:** ~10 GB free total (~5 GB during build, ~7 GB after install
- **Build time:** ~5 minutes for the default (prebuilt) path; 30-60 with all reference compilers). If you skip Calypsi (`--skip-calypsi`),
minutes for `--build-llvm` (full LLVM source build). knock 580 MB off.
- **RAM:** 8 GB minimum for the default install (downloads a prebuilt
llvm-mos SDK). 16 GB recommended if you use `--build-llvm` (compiles
LLVM from source).
- **Time:** ~5 minutes for the default (prebuilt) path; 30-60 minutes
for `--build-llvm` on a modern laptop (depends on core count).
- **Network:** the install pulls ~500 MB of binaries from GitHub,
archive.org, and the Calypsi releases page. The scripts have no proxy
options of their own; set `http_proxy` / `https_proxy` in the environment if you need one.
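
The disk and RAM figures above can be checked before starting. This is a rough sketch, not something `setup.sh` is known to do; the function names are made up for illustration, and it assumes a Linux host with `/proc/meminfo`.

```bash
# Thresholds taken from the requirements list, expressed in KiB.
MIN_DISK_KIB=$((10 * 1024 * 1024))   # ~10 GB free disk
MIN_RAM_KIB=$((8 * 1024 * 1024))     # 8 GB RAM for the default install

# Succeed if the first number is at least the second.
meets_minimum() { [ "$1" -ge "$2" ]; }

# Free space, in KiB, on the filesystem that holds the given directory.
free_disk_kib() { df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'; }

# Total RAM in KiB as reported by the Linux kernel.
total_ram_kib() { awk '/^MemTotal:/ { print $2 }' /proc/meminfo; }

# Print advisory warnings; a machine sold as "8 GB" usually reports slightly
# less than 8 GiB, so the RAM check errs on the cautious side.
preflight() {
    meets_minimum "$(free_disk_kib .)" "$MIN_DISK_KIB" || echo "warning: less than ~10 GB free here"
    meets_minimum "$(total_ram_kib)" "$MIN_RAM_KIB" || echo "warning: less than 8 GB RAM"
}
```

Running `preflight` from the repo root prints nothing when both checks pass.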
### apt packages installed
`scripts/installDeps.sh` runs `sudo apt-get install` for these
packages:
| Package | Why it's needed |
|---|---|
| `build-essential` | gcc, make, libc-dev — needed to build LLVM and our linker |
| `cmake`, `ninja-build` | LLVM's build system |
| `clang`, `lld` | Bootstrap a host clang (faster LLVM build than gcc) |
| `python3`, `python3-pip` | Build-time scripting + LLVM's lit test runner |
| `git` | Cloning the llvm-mos source tree |
| `zlib1g-dev`, `libedit-dev`, `libxml2-dev`, `libncurses-dev` | LLVM link-time deps |
| `zstd`, `xz-utils`, `unzip`, `tar` | Unpacking downloaded archives |
| `lua5.4`, `liblua5.4-dev` | MAME's autoboot scripting (used by the smoke harness) |
| `curl`, `ca-certificates` | Downloading installer payloads |
| `mame`, `mame-tools` | Apple IIgs emulator |
All packages are installed with `--no-install-recommends`, so apt pulls
in only the listed packages and their hard dependencies. If you want to
inspect or audit before running, see `scripts/installDeps.sh` for the exact list.
> **Requires sudo:** the apt install step needs root. `setup.sh`
> prompts via `sudo` once and then continues without root for everything
> else.
---
## One-command install ## One-command install
@ -23,78 +94,199 @@ cd llvm816
./setup.sh ./setup.sh
``` ```
`setup.sh` installs: That's it. `setup.sh` runs five stages in order:
1. **System apt packages** — build-essential, cmake, ninja, clang, lld, | Stage | Script | What it does | Time |
python3, MAME, etc. See [`scripts/installDeps.sh`](../scripts/installDeps.sh) |---|---|---|---|
for the full list. *Requires sudo.* | 1/5 | `installDeps.sh` | `sudo apt-get install` the packages listed above. | ~1 min |
2. **llvm-mos** — source tree clone at `tools/llvm-mos/` and a prebuilt | 2/5 | `installLlvmMos.sh` | Clone `llvm-mos` source (5 GB), download prebuilt llvm-mos SDK (400 MB), build our W65816 clang under `tools/llvm-mos-build/`. Without `--build-llvm`, downloads the prebuilt SDK only; clang for our target is then built incrementally. | ~5 min (no source build) or 30-60 min (with `--build-llvm`) |
SDK at `tools/llvm-mos-sdk/`. With `--build-llvm` it also runs | 3/5 | `installMame.sh` | Install MAME via apt, download `apple2gs.zip` (ROM 03) and `apple2gsr1.zip` (ROM 01) into `tools/mame/roms/`. | ~30 s |
cmake/ninja to build a usable W65816-aware clang at | 4/5 | `installCalypsi.sh` | Download Calypsi 5.16 .deb, extract its payload into `tools/calypsi/` (no system-wide install). | ~30 s |
`tools/llvm-mos-build/bin/clang`. | 5/5 | `installOrcaC.sh` | Shallow clone of byteworksinc's ORCA/C repo into `tools/orca-c/` for toolbox header reference. | ~15 s |
3. **Apple IIgs MAME** — installs MAME via apt and downloads the
apple2gs ROMs to `tools/mame/roms/`.
4. **Calypsi 5.16** — reference 65816 C compiler, installed to
`tools/calypsi/`. Used by the `compare/` benchmarks to measure
our codegen quality against a commercial baseline.
5. **ORCA/C** — Apple's official 65816 C compiler (header reference
for the IIgs Toolbox bindings).
After `setup.sh` finishes: After each stage, the script prints `=== N/5 stage-name ===` so you
can follow progress. At the end it runs `verify.sh`, which sanity-checks
that every tool was installed.
A successful install ends with:
```
[llvm816] setup complete
```
### `setup.sh` flags
| Flag | Effect |
|---|---|
| `--build-llvm` | Build clang from source (30-60 min) instead of using the prebuilt SDK. Required if you plan to modify the W65816 backend. |
| `--skip-deps` | Don't run apt (use if you've already installed the system packages). |
| `--skip-llvm` | Skip the LLVM clone + build. Useful for iterating on other parts. |
| `--skip-mame` | Skip MAME + ROM download. |
| `--skip-calypsi` | Skip Calypsi (saves 580 MB if you don't need the comparison benchmarks). |
| `--skip-orca` | Skip ORCA/C (saves ~10 MB; only needed if you regenerate `iigs/toolbox.h`). |
| `--skip-verify` | Don't run the post-install verification check. |
| `--verify-only` | Just run the verification check, don't install anything. |
---
## What gets installed where
After a complete `setup.sh` run, your `llvm816/` directory looks like
this:
```
llvm816/ ← your repo checkout
├── setup.sh ← the installer you just ran
├── README.md
├── docs/ ← documentation (you're reading INSTALL.md)
│ ├── INSTALL.md
│ ├── USAGE.md
│ └── multiSegmentPlan.md
├── src/ ← OUR backend source (W65816 target)
│ ├── llvm/lib/Target/W65816/ ← ~41 files; symlinked into tools/llvm-mos
│ ├── clang/ ← clang frontend hooks
│ └── link816/ ← linker source
├── patches/ ← upstream-llvm-mos patches
├── runtime/ ← C standard library + crt0
│ ├── include/ ← <stdio.h>, <stdlib.h>, <iigs/toolbox.h> ...
│ ├── src/ ← .c and .s sources
│ └── *.o ← built object files (after runtime/build.sh)
├── scripts/ ← install + run + bench scripts
├── benchmarks/ ← cycle-count benchmarks
├── compare/ ← side-by-side ours-vs-Calypsi assembly
├── demos/ ← example IIgs programs (helloBeep, reversi, ...)
├── tests/ ← larger compile-only tests (e.g. tests/lua/)
└── tools/ ← everything installed by setup.sh
├── llvm-mos/ (5.0 GB) — LLVM source tree clone
│ └── llvm/lib/Target/W65816/ ← symlinks point back to ../../src/llvm/lib/Target/W65816/
├── llvm-mos-build/ (1.4 GB) — cmake build directory
│ └── bin/
│ ├── clang → clang-23 ← THE COMPILER ⭐
│ ├── clang++ ← C++ driver
│ ├── clang-23 ← actual binary
│ ├── llc ← standalone codegen (.ll → .s)
│ ├── llvm-mc ← standalone assembler
│ ├── llvm-objdump ← disassembler
│ └── ... (FileCheck, llvm-readobj, opt, etc.)
├── llvm-mos-sdk/ (400 MB) — prebuilt llvm-mos SDK
├── link816 (~120 KB) — OUR LINKER ⭐
├── omfEmit (~70 KB) — OMF v2.1 emitter for GS/OS Loader
├── cadius/ — Apple ProDOS / GS/OS disk image tool
├── calypsi/ (580 MB) — Calypsi 5.16 reference compiler
├── orca-c/ (10 MB) — ORCA/C compiler (header reference)
├── gsos/ (13 MB) — GS/OS 6.0.2 / 6.0.4 disk images
├── mame/
│ └── roms/ (1.5 MB) — apple2gs ROM 01 + ROM 03
└── venv/ — Python venv (used by genToolbox.py)
```
The starred items (`clang` and `link816`) are the two binaries you
interact with daily. See [USAGE.md](USAGE.md) for how to use them.
### Why so much LLVM?
`tools/llvm-mos/` is a full LLVM source tree (about 5 GB). We need it
because our W65816 backend is part of LLVM — we build it as a
target inside LLVM's normal codegen pipeline. Our backend lives in
`src/llvm/lib/Target/W65816/`; `scripts/applyBackend.sh` symlinks it
into the LLVM source tree under `tools/llvm-mos/llvm/lib/Target/W65816/`
so cmake picks it up.
The size is unavoidable if you want to rebuild the backend. If you
don't plan to modify it, you could `rm -rf tools/llvm-mos` after a
successful build — but you'd have to re-clone to fix bugs.
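
The symlink step described above can be pictured with a few lines of shell. This is a sketch of the idea only, not the contents of `scripts/applyBackend.sh`; `link_tree` is a hypothetical helper, and it creates absolute links where the real script may well use relative ones.

```bash
# Symlink every entry of directory $1 into directory $2.
link_tree() {
    src=$(cd "$1" && pwd) || return 1      # absolute path, so links resolve from anywhere
    mkdir -p "$2" || return 1
    for f in "$src"/*; do
        [ -e "$f" ] || continue            # skip the literal pattern when $1 is empty
        ln -sfn "$f" "$2/$(basename "$f")" # -f replaces a stale link, so re-running is safe
    done
}
```

Under those assumptions, `link_tree src/llvm/lib/Target/W65816 tools/llvm-mos/llvm/lib/Target/W65816` would make the backend sources visible to cmake while the files themselves stay under `src/`, which is why edits there show up in the next build without copying anything.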
### What's outside `tools/`?
Only the apt packages from [System requirements](#system-requirements).
`sudo apt-get remove` on those packages (for example `mame mame-tools`)
cleans them up if you want to fully uninstall.
Importantly, the installer does **NOT** touch:
- `/usr/local/` (no `make install`)
- `/opt/` (no FHS-style component install)
- `~/.mame/` (we use `-rompath` to point at `tools/mame/roms/` instead)
- `~/.cache/` (downloads go to the repo-local `.cache/`)
To uninstall completely:
```bash ```bash
ls tools/llvm-mos-build/bin/clang # our compiler rm -rf llvm816/
ls tools/link816 # our linker sudo apt-get remove mame mame-tools build-essential cmake ninja-build \
mame -version # MAME (installed via apt) clang lld lua5.4 liblua5.4-dev # if you want apt deps gone too
``` ```
---
## Step-by-step (if `setup.sh` fails) ## Step-by-step (if `setup.sh` fails)
You can run each install script in isolation: You can run each install script in isolation if a stage breaks. They're
idempotent — running them twice is a no-op (or a "fetch updates" for the
LLVM clone).
```bash ```bash
scripts/installDeps.sh # apt packages bash scripts/installDeps.sh # apt packages
scripts/installLlvmMos.sh # llvm-mos clone + prebuilt SDK bash scripts/installLlvmMos.sh # llvm-mos clone + prebuilt SDK
scripts/installLlvmMos.sh --build # also build the source (slow) bash scripts/installLlvmMos.sh --build # also build clang/llc (slow)
scripts/installMame.sh # MAME + apple2gs ROMs bash scripts/installMame.sh # MAME + apple2gs ROMs
scripts/installCalypsi.sh # reference compiler (optional) bash scripts/installCalypsi.sh # reference compiler (optional)
scripts/installOrcaC.sh # reference compiler (optional) bash scripts/installOrcaC.sh # reference compiler (optional)
bash scripts/verify.sh # sanity-check everything
``` ```
If you only want to build C programs (no benchmarks, no comparison If you only want to *build* C programs (no benchmarks, no comparisons),
to Calypsi), `installCalypsi.sh` and `installOrcaC.sh` are `installCalypsi.sh` and `installOrcaC.sh` are skippable.
optional.
## Building the W65816 backend from source ### Building W65816 clang from source
The default install pulls a prebuilt LLVM SDK. To build our The default install pulls a *prebuilt* llvm-mos SDK but builds our
W65816-aware clang from source: W65816 backend incrementally on top. If you want to build everything
from source (recommended for backend development):
```bash ```bash
./setup.sh --build-llvm ./setup.sh --build-llvm
``` ```
Or, after a non-`--build-llvm` install: This adds about 30-60 minutes to install time but means you can edit
files under `src/llvm/lib/Target/W65816/` and rebuild quickly.
After the initial source build, incremental rebuilds after editing
backend code take ~30 seconds:
```bash ```bash
scripts/applyBackend.sh # symlink our W65816 sources into llvm-mos clone ninja -C tools/llvm-mos-build llc clang
cmake --build tools/llvm-mos-build --target llc clang
``` ```
The build takes 30-60 minutes on a modern laptop. Subsequent `scripts/applyBackend.sh` re-runs the symlink-into-LLVM step if you've
incremental builds after editing W65816 backend code are ~30 added new files under `src/llvm/lib/Target/W65816/`.
seconds.
---
## Verifying the install ## Verifying the install
```bash `setup.sh` automatically runs `scripts/verify.sh` at the end — it walks
# Compile + disassemble a small C function every installed tool and checks each runs. If anything fails it shows
scripts/cDemo.sh which step failed (e.g. `[FAIL] llvm-mos source tree` means the clone
didn't land where expected).
# Build the runtime library (libc, libgcc, etc.) To re-verify later:
```bash
./setup.sh --verify-only
```
For a real end-to-end test (compiles and runs a tiny C program through
the entire pipeline):
```bash
# Build the runtime libraries (libc, libgcc, etc.) — ~30 s
bash runtime/build.sh bash runtime/build.sh
# Run the smoke test suite (~150 checks, takes ~3 minutes) # Compile + disassemble a small C demo
bash scripts/cDemo.sh
# Run the full smoke test suite (~150 checks, takes ~3 min)
bash scripts/smokeTest.sh bash scripts/smokeTest.sh
``` ```
@ -104,65 +296,182 @@ A successful smoke test ends with:
[llvm816] all smoke checks passed [llvm816] all smoke checks passed
``` ```
If smoke passes, your install is good.
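
For a quick look at what a per-tool check might involve, here is a minimal sketch. It only tests that paths exist, whereas `verify.sh` is described as checking that each tool runs; the `[ OK ]` label and both function names are assumptions, and only the `[FAIL]` prefix comes from this document.

```bash
# Report whether a path exists; returns non-zero on failure.
check_path() {
    if [ -e "$2" ]; then
        echo "[ OK ] $1"
    else
        echo "[FAIL] $1"
        return 1
    fi
}

# Check a few paths from the install layout and count the failures.
verify_sketch() {
    failures=0
    check_path "W65816 clang" tools/llvm-mos-build/bin/clang || failures=$((failures + 1))
    check_path "link816 linker" tools/link816 || failures=$((failures + 1))
    check_path "llvm-mos source tree" tools/llvm-mos || failures=$((failures + 1))
    check_path "apple2gs ROMs" tools/mame/roms || failures=$((failures + 1))
    echo "$failures check(s) failed"
    [ "$failures" -eq 0 ]
}
```

Run `verify_sketch` from the repo root; a non-zero exit status means at least one path is missing.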
---
## Updating ## Updating
```bash ```bash
git pull cd llvm816
scripts/applyBackend.sh # re-symlink our sources into the LLVM tree git pull # update our backend source
cmake --build tools/llvm-mos-build --target llc clang bash scripts/applyBackend.sh # re-symlink into tools/llvm-mos
ninja -C tools/llvm-mos-build llc clang
bash runtime/build.sh bash runtime/build.sh
``` ```
If you want a fully clean rebuild: If you want a fully clean rebuild (e.g., to chase a "stale .o" bug):
```bash ```bash
rm -rf tools/llvm-mos-build rm -rf tools/llvm-mos-build
./setup.sh --build-llvm ./setup.sh --build-llvm
``` ```
---
## Uninstalling ## Uninstalling
The toolchain is fully contained under `tools/`. To uninstall: The toolchain is fully contained under `llvm816/`:
```bash ```bash
rm -rf llvm816/ rm -rf llvm816/
sudo apt-get remove mame mame-tools # if you want MAME gone too sudo apt-get remove mame mame-tools # if you want MAME gone too
# (also remove build-essential, cmake, etc., if they're not used elsewhere)
``` ```
The setup script doesn't touch `/usr/local` or `~/.mame` — nothing Nothing remains outside the repo.
to clean up outside the repo.
---
## Troubleshooting ## Troubleshooting
**`cmake: command not found`** — run `scripts/installDeps.sh`. The ### `cmake: command not found` / `ninja: command not found`
apt packages aren't installed yet.
**`ROMs not found`** — the apple2gs ROM download from archive.org The apt packages aren't installed yet:
occasionally fails. Re-run `scripts/installMame.sh`. The script
is idempotent; it skips ROMs already downloaded.
**`clang: error: unable to find target 'w65816'`** — the prebuilt
SDK's clang doesn't know about our W65816 target. You need the
source-built clang:
```bash ```bash
scripts/installLlvmMos.sh --build bash scripts/installDeps.sh
# Or, more granular:
scripts/applyBackend.sh
cmake --build tools/llvm-mos-build --target clang
``` ```
The W65816 target lives in *our* fork at `tools/llvm-mos-build/bin/clang`, ### `clang: error: unable to find target 'w65816'`
not in the prebuilt SDK.
**MAME can't find ROMs at runtime** — make sure `mame` is launched You're invoking a clang that doesn't include our backend. Likely
with `-rompath tools/mame/roms`. The provided causes:
[`scripts/runInMame.sh`](../scripts/runInMame.sh) does this
automatically.
**`linkage error: missing __umulhisi3`** — link `runtime/libgcc.o` 1. You're using the **system** clang (`/usr/bin/clang`). The system
into your binary. See [USAGE.md](USAGE.md#linking). clang doesn't know W65816. Use the full path:
**MAME pops up a window I don't want** — the `runInMame.sh` ```bash
wrapper now runs headless (`-video none` + `SDL_VIDEODRIVER=dummy`). ./tools/llvm-mos-build/bin/clang --target=w65816 ...
If you're invoking MAME directly, add those flags. ```
2. The build didn't complete. Check:
```bash
ls -la tools/llvm-mos-build/bin/clang # should be a symlink to clang-23
./tools/llvm-mos-build/bin/clang --print-targets | grep -i 65816
```
If the binary's missing or no W65816 target appears, rebuild:
```bash
bash scripts/applyBackend.sh
ninja -C tools/llvm-mos-build llc clang
```
3. You're using the prebuilt SDK (`tools/llvm-mos-sdk/bin/...`). That's
the original llvm-mos SDK; our W65816 target lives only in our
*source build* at `tools/llvm-mos-build/bin/`.
### `linkage error: missing __umulhisi3` / `missing __mulsi3` / similar
You forgot to link `runtime/libgcc.o`. See
[USAGE.md § Linking](USAGE.md#linking). Typical link line:
```bash
./tools/link816 -o myprog.bin --text-base 0x1000 \
runtime/crt0.o runtime/libc.o runtime/libgcc.o myprog.o
```
### `ROMs not found at runtime`
The apple2gs ROM download from archive.org occasionally fails. Re-run
the installer (it's idempotent and skips already-present ROMs):
```bash
bash scripts/installMame.sh
ls tools/mame/roms/ # should contain apple2gs.zip and apple2gsr1.zip
```
If you invoke `mame` directly without using our `runInMame.sh` wrapper,
pass `-rompath tools/mame/roms` so it finds them.
### MAME pops up a window I don't want
The supplied `scripts/runInMame.sh` runs headless (`-video none` plus
`SDL_VIDEODRIVER=dummy`). If you're calling `mame` yourself, add those
flags.
### `git: refusing to fetch into ...` for llvm-mos
`installLlvmMos.sh` refuses to refresh the LLVM clone if it has local
modifications or is on a non-`main` branch — because our backend
symlinks are stitched into the clone and a destructive refresh would
stomp them.
If you really want to refresh from upstream:
```bash
bash scripts/updateLlvmMos.sh
```
That script handles the symlinks safely.
### Disk fills up during `--build-llvm`
A full LLVM build needs ~12 GB of temporary build artifacts (cmake's
intermediate `.o` files, .a archives, etc.) on top of the 5 GB source
tree. Free ~15 GB before running `--build-llvm`.
Once the build completes, the *intermediate* artifacts under
`tools/llvm-mos-build/CMakeFiles/` can be deleted — the binaries
under `tools/llvm-mos-build/bin/` are self-contained:
```bash
rm -rf tools/llvm-mos-build/CMakeFiles tools/llvm-mos-build/lib
```
But this disables incremental rebuilds. Re-running `--build-llvm`
recreates everything.
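
A pre-flight check can catch a full disk before the build starts. The sketch below is a hypothetical helper (`hasFreeGb` is not one of the repo's scripts); it assumes a POSIX `df` and reuses the ~15 GB figure quoted above.

```bash
# hasFreeGb <dir> <required GB>
# Hypothetical helper, not part of the repo. Succeeds (exit 0) when the
# filesystem holding <dir> has at least <required GB> available.
hasFreeGb() {
    local dir="$1" needGb="$2" availKb
    # -P forces one line per filesystem; column 4 is available 1K blocks.
    availKb=$(df -Pk "$dir" | awk 'NR==2 { print $4 }')
    [ "$availKb" -ge $(( needGb * 1024 * 1024 )) ]
}

# Example: warn before kicking off a full LLVM build.
if ! hasFreeGb . 15; then
    echo "warning: less than 15 GB free; --build-llvm may run out of space" >&2
fi
```

The threshold is only as good as the estimate it is based on; adjust it if your build configuration differs.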
### Calypsi install fails / I don't want it
Calypsi is only used by the `compare/` benchmarks for output-quality
comparison. Everything else works without it. Skip it:
```bash
./setup.sh --skip-calypsi
```
### MAME version mismatch warnings
We test with the MAME 0.240 ROM set running on whatever MAME version apt
ships (typically 0.260+). Cross-version ROMs usually work, but you may
see "BAD CHECKSUM" warnings. These are warnings, not errors; boot
proceeds normally.
If you want a clean match, the ROM 03 set from MAME 0.240 is
`apple2gs.zip` mirrored on archive.org; the installer pulls that exact
version.
### Smoke test fails on a single check
If `scripts/smokeTest.sh` fails one check but otherwise looks fine, it
might be MAME timing. Try:
```bash
MAME_CHECK_FRAME=600 MAME_SECS=12 bash scripts/smokeTest.sh
```
Some demos (especially the larger toolbox ones) need more wall-clock
time to fully draw. Persistent failures are real bugs — file an issue
with the failing check's output.
---
## Where to go next
- **Compile your first program:** [USAGE.md](USAGE.md).
- **Backend internals (if you're hacking on the compiler):**
[LLVM_65816_DESIGN.md](../LLVM_65816_DESIGN.md).
- **Feature matrix (what's implemented):** [STATUS.md](../STATUS.md).

File diff suppressed because it is too large


@ -31,6 +31,7 @@ int fflush(FILE *stream);
int fclose(FILE *stream); int fclose(FILE *stream);
FILE *fopen(const char *path, const char *mode); FILE *fopen(const char *path, const char *mode);
FILE *freopen(const char *path, const char *mode, FILE *stream);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream); size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream); size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
int fseek(FILE *stream, long offset, int whence); int fseek(FILE *stream, long offset, int whence);


@ -38,5 +38,7 @@ char *strtok (char *str, const char *delim);
char *strtok_r(char *str, const char *delim, char **saveptr); char *strtok_r(char *str, const char *delim, char **saveptr);
char *strerror(int err); char *strerror(int err);
int strcoll(const char *a, const char *b);
size_t strxfrm(char *dst, const char *src, size_t n);
#endif #endif

BIN
screenshots/frame.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/heavyRelocs.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/helloBeep.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/helloText.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/helloWindow.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/minicad.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/orcaFrame.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/qdProbe.png (Stored with Git LFS)

Binary file not shown.

BIN
screenshots/reversi.png (Stored with Git LFS)

Binary file not shown.


@ -81,7 +81,7 @@ for src in "$BENCH_DIR"/*.c; do
name=$(basename "$src" .c) name=$(basename "$src" .c)
cObj=$(mktemp --suffix=.clang.o) cObj=$(mktemp --suffix=.clang.o)
"$CLANG" --target=w65816 -O2 -ffunction-sections \ "$CLANG" --target=w65816 -O2 ${W65816_CC_EXTRA:-} -ffunction-sections \
-c "$src" -o "$cObj" 2>/dev/null || { echo "clang failed on $name" >&2; rm -f "$cObj"; continue; } -c "$src" -o "$cObj" 2>/dev/null || { echo "clang failed on $name" >&2; rm -f "$cObj"; continue; }
read clangText _ _ < <(clangSize "$cObj") read clangText _ _ < <(clangSize "$cObj")


@ -44,6 +44,8 @@ benchInputs() {
dmul) echo 'dmul(da, db)';; dmul) echo 'dmul(da, db)';;
dadd) echo 'dadd(da, db)';; dadd) echo 'dadd(da, db)';;
ddiv) echo 'ddiv(da, db)';; ddiv) echo 'ddiv(da, db)';;
particles) echo 'particleStep()';;
mandelbrot) echo 'mandTile()';;
*) echo "/* unknown */";; *) echo "/* unknown */";;
esac esac
} }
@ -61,6 +63,8 @@ benchExtern() {
dmul) echo 'extern double dmul(double a, double b); static volatile double da = 3.14, db = 2.71;';; dmul) echo 'extern double dmul(double a, double b); static volatile double da = 3.14, db = 2.71;';;
dadd) echo 'extern double dadd(double a, double b); static volatile double da = 3.14, db = 2.71;';; dadd) echo 'extern double dadd(double a, double b); static volatile double da = 3.14, db = 2.71;';;
ddiv) echo 'extern double ddiv(double a, double b); static volatile double da = 3.14, db = 2.71;';; ddiv) echo 'extern double ddiv(double a, double b); static volatile double da = 3.14, db = 2.71;';;
particles) echo 'extern unsigned long particleStep(void);';;
mandelbrot) echo 'extern unsigned long mandTile(void);';;
*) echo '';; *) echo '';;
esac esac
} }
@ -76,13 +80,15 @@ runOneBench() {
echo "(no input config)" echo "(no input config)"
return return
fi fi
# FP benches assign result to sinkD (double); rest assign to sink as ulong # FP benches assign result to sinkD (double); rest assign to sink as ulong.
# FP benches also use fewer iters (each call is ~1000+ cycles, so 100 # Per-bench iter count chosen so total time fits inside one 256-tick HBL
# iters wraps the 8-bit HBL counter many times). # wrap window (~16K cyc). FP and heavy game-like benches use fewer iters.
local sink_lhs sink_cast iters local sink_lhs sink_cast iters
case "$name" in case "$name" in
dmul|dadd|ddiv) sink_lhs='sinkD'; sink_cast=''; iters=10 ;; dmul|dadd|ddiv) sink_lhs='sinkD'; sink_cast=''; iters=10 ;;
*) sink_lhs='sink'; sink_cast='(unsigned long)'; iters=100 ;; particles) sink_lhs='sink'; sink_cast='(unsigned long)'; iters=3 ;;
mandelbrot) sink_lhs='sink'; sink_cast='(unsigned long)'; iters=1 ;;
*) sink_lhs="sink"; sink_cast="(unsigned long)"; iters=100 ;;
esac esac
local cwrap=$(mktemp --suffix=.c) local cwrap=$(mktemp --suffix=.c)
@ -92,9 +98,6 @@ runOneBench() {
cat > "$cwrap" <<EOF cat > "$cwrap" <<EOF
$extern_decl $extern_decl
__attribute__((noinline)) static void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
// Read VBL bit + scan-line position from the IIgs Mega II registers. // Read VBL bit + scan-line position from the IIgs Mega II registers.
// \$C02E (VertCnt low) increments at HBL rate (~15.7 kHz), wrapping at // \$C02E (VertCnt low) increments at HBL rate (~15.7 kHz), wrapping at
// 256. Higher resolution than the soft-VBL counter at \$E1006B; works // 256. Higher resolution than the soft-VBL counter at \$E1006B; works
@ -120,9 +123,9 @@ int main(void) {
unsigned char t1 = readVbl(); unsigned char t1 = readVbl();
__asm__ volatile ("sei\n" ::: "memory"); __asm__ volatile ("sei\n" ::: "memory");
unsigned char dt = t1 - t0; // VBL ticks; wraps at 256 unsigned char dt = t1 - t0; // VBL ticks; wraps at 256
switchToBank2(); // Direct 24-bit writes to bank 2 (DBR-independent, no bank-switch).
*(volatile unsigned short *)0x5000 = (unsigned short)dt; *(volatile unsigned short *)0x025000 = (unsigned short)dt;
*(volatile unsigned short *)0x5002 = (unsigned short)(sink & 0xFFFF); *(volatile unsigned short *)0x025002 = (unsigned short)(sink & 0xFFFF);
while (1) {} while (1) {}
} }
EOF EOF
@ -134,9 +137,17 @@ EOF
"$LINK" -o "$bin" --text-base 0x1000 "$oCrt0" "$oLibgcc" "$oSoftDouble" "$owrap" "$obench" 2>/dev/null \ "$LINK" -o "$bin" --text-base 0x1000 "$oCrt0" "$oLibgcc" "$oSoftDouble" "$owrap" "$obench" 2>/dev/null \
|| { echo "link-fail"; rm -f "$cwrap" "$owrap" "$obench" "$bin"; return; } || { echo "link-fail"; rm -f "$cwrap" "$owrap" "$obench" "$bin"; return; }
# Slow benches need a larger MAME check-frame budget — default 300
# frames (5 sec @ 60Hz) is enough for the fast int benches but not
# for soft-FP-heavy or large-loop ones.
local mame_env=""
case "$name" in
mandelbrot) mame_env="MAME_CHECK_FRAME=600 MAME_SECS=12" ;;
esac
# Read VBL delta at $025000. # Read VBL delta at $025000.
local val local val
val=$(bash "$PROJECT_ROOT/scripts/runInMame.sh" "$bin" 0x025000 0000 2>&1 \ val=$(env $mame_env bash "$PROJECT_ROOT/scripts/runInMame.sh" "$bin" 0x025000 0000 2>&1 \
| grep -oE 'val=0x[0-9a-f]+' | head -1 | sed 's/val=0x//') | grep -oE 'val=0x[0-9a-f]+' | head -1 | sed 's/val=0x//')
rm -f "$cwrap" "$owrap" "$obench" "$bin" rm -f "$cwrap" "$owrap" "$obench" "$bin"
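
The iteration counts in the hunk above follow from the wrap-window arithmetic in its comment. As a rough sketch, assuming about 65 CPU cycles per HBL tick (1.023 MHz divided by the ~15.7 kHz HBL rate; this figure is an assumption, not read from the script), one 256-tick window holds about 16.6K cycles. The helper names below are hypothetical and not part of the repo.

```bash
# Assumption: ~65 CPU cycles per HBL tick (1.023 MHz / 15.7 kHz).
CYC_PER_TICK=65
WRAP_TICKS=256

# fitsHblWindow <estimated cycles per iteration> <iterations>
# Succeeds (exit 0) when the whole run stays inside one wrap window.
fitsHblWindow() {
    local cycPerIter="$1" iters="$2"
    local budget=$(( CYC_PER_TICK * WRAP_TICKS ))
    [ $(( cycPerIter * iters )) -lt "$budget" ]
}

# cyclesPerIter <8-bit tick delta> <iterations>
# Converts a measured tick delta into whole cycles per iteration.
cyclesPerIter() {
    local dt="$1" iters="$2"
    echo $(( dt * CYC_PER_TICK / iters ))
}
```

Under that assumption, a ~1157-cycle FP call fits at 10 iterations but not at 100, which is consistent with the lower counts chosen for the FP and game-like benches.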

147
scripts/benchCyclesCalypsi.sh Executable file

@ -0,0 +1,147 @@
#!/usr/bin/env bash
# benchCyclesCalypsi.sh — measure per-call cycles for each benchmark in
# benchmarks/ compiled with Calypsi cc65816 5.16 (--speed -O 2).
# Mirrors benchCyclesPrecise.sh but for Calypsi. Output: markdown
# table; per-call cycles via MAME emu.time().
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
BENCH_DIR="$PROJECT_ROOT/benchmarks"
CC="$PROJECT_ROOT/tools/calypsi/usr/local/lib/calypsi-65816-5.16/bin/cc65816"
LN="$PROJECT_ROOT/tools/calypsi/usr/local/lib/calypsi-65816-5.16/bin/ln65816"
RUNNER="$PROJECT_ROOT/scripts/runInMameCyclesCalypsi.sh"
LINKER_SCM=$(mktemp --suffix=.scm)
trap 'rm -f "$LINKER_SCM"' EXIT
cat > "$LINKER_SCM" <<EOF
(define memories
'((memory IIgsCode (address (#x1000 . #x9FFF))
(section code farcode cdata idata switch data_init_table))
(memory IIgsBSS (address (#xA000 . #xBFFF))
(section stack data zdata heap))
(memory IIgsNear (address (#x020000 . #x02FFFF))
(section znear))
(memory IIgsDP (address (#x0000 . #x00FF))
(section (registers ztiny)))
(memory IIgsVec (address (#xFF00 . #xFFFF))
(section (reset #xFFFC)))
(block stack (size #x200))
(block heap (size #x100))
(base-address _DirectPageStart IIgsDP 0)
(base-address _NearBaseAddress IIgsNear 0)))
EOF
# Mirror bench definitions from benchCyclesPrecise.sh
benchInputs() {
case "$1" in
sumOfSquares) echo 'sumOfSquares(50)';;
fib) echo 'fib(10)';;
strcpy) echo 'mystrcpy(dst, "hello world!")';;
memcmp) echo 'mymemcmp("hello", "hello", 5)';;
bsearch) echo 'bsearch(arr, 8, 5)';;
dotProduct) echo 'dotProduct(va, vb, 4)';;
popcount) echo 'popcount(0x12345678UL)';;
crc32) echo 'crc32((const unsigned char *)"hello", 5)';;
bubbleSort) echo '(bubbleSort(bsBuf, 16), 0)';;
strLen) echo 'strLen("The quick brown fox jumps over the lazy dog!")';;
djb2Hash) echo 'djb2Hash("hello world")';;
*) echo "/* unknown */";;
esac
}
benchExtern() {
case "$1" in
sumOfSquares) echo 'extern unsigned long sumOfSquares(unsigned short n);';;
fib) echo 'extern unsigned short fib(unsigned short n);';;
strcpy) echo 'extern char *mystrcpy(char *d, const char *s); static char dst[16];';;
memcmp) echo 'extern int mymemcmp(const void *a, const void *b, unsigned int n);';;
bsearch) echo 'extern int bsearch(const int *arr, int n, int key); static const int arr[] = {1,2,3,4,5,6,7,8};';;
dotProduct) echo 'extern long dotProduct(const short *a, const short *b, unsigned int n); static const short va[] = {1,2,3,4}; static const short vb[] = {5,6,7,8};';;
popcount) echo 'extern int popcount(unsigned long x);';;
crc32) echo 'extern unsigned long crc32(const unsigned char *p, unsigned int n);';;
bubbleSort) echo 'extern void bubbleSort(short *a, unsigned short n); static short bsBuf[16] = {7,3,1,9,4,5,8,2,6,0,15,11,13,10,14,12};';;
strLen) echo 'extern unsigned short strLen(const char *s);';;
djb2Hash) echo 'extern unsigned long djb2Hash(const char *s);';;
*) echo '';;
esac
}
benchIters() {
case "$1" in
sumOfSquares) echo 50;;
fib) echo 100;;
strcpy) echo 200;;
memcmp) echo 500;;
bsearch) echo 200;;
dotProduct) echo 200;;
popcount) echo 100;;
crc32) echo 100;;
bubbleSort) echo 10;;
strLen) echo 200;;
djb2Hash) echo 200;;
*) echo 50;;
esac
}
runOneCalypsiBench() {
local name="$1"
local extern_decl call_expr iters
extern_decl=$(benchExtern "$name")
call_expr=$(benchInputs "$name")
iters=$(benchIters "$name")
[ -z "$extern_decl" ] && { echo "skip"; return; }
local cwrap owrap obench bin bin_raw lst
cwrap=$(mktemp --suffix=.c)
owrap=$(mktemp --suffix=.o)
obench=$(mktemp --suffix=.o)
bin=$(mktemp --suffix=.bin)
bin_raw="${bin%.bin}.raw"
lst=$(mktemp --suffix=.lst)
trap 'rm -f "$cwrap" "$owrap" "$obench" "$bin" "$bin_raw" "$lst"' RETURN
cat > "$cwrap" <<EOF
$extern_decl
volatile unsigned long sink;
__task int main(void) {
for (int w = 0; w < 5; w++) sink = (unsigned long)($call_expr);
*((volatile unsigned long __far *)0x025000) = 0xa1a1UL;
for (int i = 0; i < $iters; i++) sink = (unsigned long)($call_expr);
*((volatile unsigned long __far *)0x025002) = 0xa2a2UL;
while (1) {}
}
EOF
"$CC" -O 2 --speed --code-model=small -c "$cwrap" -o "$owrap" 2>/dev/null \
|| { echo "compile-fail-wrap"; return; }
"$CC" -O 2 --speed --code-model=small -c "$BENCH_DIR/$name.c" -o "$obench" 2>/dev/null \
|| { echo "compile-fail-bench"; return; }
# ln65816 writes a .raw file next to the .bin, one per memory region
"$LN" "$LINKER_SCM" "$owrap" "$obench" -o "$bin" \
--output-format raw --raw-multiple-memories --rtattr exit=simplified \
--list-file "$lst" clib-sc-sd.a 2>/dev/null \
|| { echo "link-fail"; return; }
local entry
entry=$(grep "__program_start =" "$lst" | head -1 | awk '{print substr($NF,3)}')
[ -z "$entry" ] && { echo "no-entry"; return; }
local val
val=$(bash "$RUNNER" "$bin_raw" "$entry" "$iters" 2>&1 | grep -oE 'cyc_per_call=[0-9.]+' | head -1 | sed 's/cyc_per_call=//')
if [ -z "$val" ]; then
echo "(no read)"
else
printf '%.0f cyc/call' "$val"
fi
}
printf '| Benchmark | Calypsi (cyc) |\n'
printf '|-----------|--------------:|\n'
for src in "$BENCH_DIR"/*.c; do
name=$(basename "$src" .c)
extern_decl=$(benchExtern "$name")
[ -z "$extern_decl" ] && continue
result=$(runOneCalypsiBench "$name")
printf '| %s | %s |\n' "$name" "$result"
done


@ -109,25 +109,22 @@ runOneBench() {
cat > "$cwrap" <<EOF cat > "$cwrap" <<EOF
$extern_decl $extern_decl
__attribute__((noinline)) static void switchToBank2(void) {
__asm__ volatile ("sep #0x20\n.byte 0xa9,0x02\npha\nplb\nrep #0x20\n");
}
volatile unsigned long sink; volatile unsigned long sink;
#define ITERS $iters #define ITERS $iters
int main(void) { int main(void) {
switchToBank2();
/* warm-up */ /* warm-up */
for (int w = 0; w < 5; w++) sink = (unsigned long)($call_expr); for (int w = 0; w < 5; w++) sink = (unsigned long)($call_expr);
*(volatile unsigned short *)0x5000 = 0xa1a1; /* START */ /* START / DONE markers go to bank 2 via direct 24-bit writes. */
*(volatile unsigned short *)0x025000 = 0xa1a1;
for (int i = 0; i < ITERS; i++) sink = (unsigned long)($call_expr); for (int i = 0; i < ITERS; i++) sink = (unsigned long)($call_expr);
*(volatile unsigned short *)0x5002 = 0xa2a2; /* DONE */ *(volatile unsigned short *)0x025002 = 0xa2a2;
while (1) {} while (1) {}
} }
EOF EOF
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$cwrap" -o "$owrap" 2>/dev/null \ "$CLANG" --target=w65816 -O2 ${W65816_CC_EXTRA:-} -ffunction-sections -c "$cwrap" -o "$owrap" 2>/dev/null \
|| { echo "compile-fail"; rm -f "$cwrap" "$owrap"; return; } || { echo "compile-fail"; rm -f "$cwrap" "$owrap"; return; }
"$CLANG" --target=w65816 -O2 -ffunction-sections -c "$BENCH_DIR/$name.c" -o "$obench" 2>/dev/null \ "$CLANG" --target=w65816 -O2 ${W65816_CC_EXTRA:-} -ffunction-sections -c "$BENCH_DIR/$name.c" -o "$obench" 2>/dev/null \
|| { echo "compile-fail"; rm -f "$cwrap" "$owrap" "$obench"; return; } || { echo "compile-fail"; rm -f "$cwrap" "$owrap" "$obench"; return; }
"$LINK" -o "$bin" --text-base 0x1000 "$oCrt0" "$oLibgcc" "$owrap" "$obench" 2>/dev/null \ "$LINK" -o "$bin" --text-base 0x1000 "$oCrt0" "$oLibgcc" "$owrap" "$obench" 2>/dev/null \
|| { echo "link-fail"; rm -f "$cwrap" "$owrap" "$obench" "$bin"; return; } || { echo "link-fail"; rm -f "$cwrap" "$owrap" "$obench" "$bin"; return; }


@ -96,10 +96,10 @@ emu.register_frame_done(function()
end) end)
EOF EOF
OUT=$(timeout 60 mame apple2gs \ OUT=$(SDL_VIDEODRIVER=dummy SDL_AUDIODRIVER=dummy timeout 60 mame apple2gs \
-rompath "$PROJECT_ROOT/tools/mame/roms" \ -rompath "$PROJECT_ROOT/tools/mame/roms" \
-plugins -autoboot_script "$LUA_PATH" \ -plugins -autoboot_script "$LUA_PATH" \
-window -sound none -nothrottle -seconds_to_run "$SECS" 2>&1 | grep "^MAME-") -video none -sound none -nothrottle -seconds_to_run "$SECS" 2>&1 | grep "^MAME-")
echo "$OUT" echo "$OUT"
if echo "$OUT" | grep -q "MAME-CYCLES"; then if echo "$OUT" | grep -q "MAME-CYCLES"; then


@ -0,0 +1,76 @@
#!/usr/bin/env bash
# runInMameCyclesCalypsi.sh — cycle-bench a Calypsi raw binary.
# Loads <code.raw> at $1000 and jumps to <entry> (hex addr from
# Calypsi's --list-file, look for `__program_start = HHHHHH`).
# Markers and per-iter math identical to runInMameCycles.sh.
set -euo pipefail
PROJECT_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
BIN="$1"
ENTRY="$2"
ITERS="${3:-100}"
CLOCK_HZ=1023000
SECS=8
LUA_PATH=$(mktemp --suffix=.lua)
trap 'rm -f "$LUA_PATH"' EXIT
LUA_BODY=$(cat <<LUA
local frame = 0
local loaded = false
local start_t = nil
local done_t = nil
emu.register_frame_done(function()
frame = frame + 1
local cpu = manager.machine.devices[":maincpu"]
local mem = cpu.spaces["program"]
if frame == 30 and not loaded then
local f = io.open("__BIN__", "rb")
if not f then print("BIN-MISSING"); manager.machine:exit(); return end
local data = f:read("*all"); f:close()
for i = 1, #data do
local addr = 0x001000 + i - 1
if not (addr >= 0x00C000 and addr < 0x00D000) then
mem:write_u8(addr, data:byte(i))
end
end
loaded = true
cpu.state["PC"].value = 0x__ENTRY__
cpu.state["PB"].value = 0x00
cpu.state["DB"].value = 0x00
cpu.state["D"].value = 0x00
cpu.state["P"].value = 0x34
cpu.state["E"].value = 1
cpu.state["S"].value = 0x01FF
print(string.format("MAME-LOADED bytes=%d entry=\$%04x", #data, 0x__ENTRY__))
return
end
if not loaded then return end
if not start_t and mem:read_u16(0x025000) == 0xa1a1 then
start_t = emu.time()
end
if start_t and not done_t and mem:read_u16(0x025002) == 0xa2a2 then
done_t = emu.time()
local delta = done_t - start_t
local cyc = delta * __CLOCK__
local per_call = cyc / __ITERS__
print(string.format("MAME-CYCLES iters=__ITERS__ delta_us=%.3f total_cyc=%.0f cyc_per_call=%.2f",
delta * 1e6, cyc, per_call))
manager.machine:exit()
end
end)
LUA
)
LUA_BODY="${LUA_BODY//__BIN__/$BIN}"
LUA_BODY="${LUA_BODY//__ENTRY__/$ENTRY}"
LUA_BODY="${LUA_BODY//__CLOCK__/$CLOCK_HZ}"
LUA_BODY="${LUA_BODY//__ITERS__/$ITERS}"
printf '%s\n' "$LUA_BODY" > "$LUA_PATH"
OUT=$(SDL_VIDEODRIVER=dummy SDL_AUDIODRIVER=dummy timeout 60 mame apple2gs \
-rompath "$PROJECT_ROOT/tools/mame/roms" \
-plugins -autoboot_script "$LUA_PATH" \
-video none -sound none -nothrottle -seconds_to_run "$SECS" 2>&1 | grep "^MAME-")
echo "$OUT"
if echo "$OUT" | grep -q "MAME-CYCLES"; then exit 0; fi
exit 1
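
The per-call figure printed by the Lua callback above is the elapsed emulated time multiplied by the CPU clock, divided by the iteration count. The same arithmetic can be sketched in shell for checking a reported value by hand; `cycPerCall` is a hypothetical name, and the 1,023,000 Hz default mirrors `CLOCK_HZ` in the script.

```bash
# cycPerCall <delta seconds> <iterations> [clock Hz]
# Hypothetical helper, not part of the repo: reproduces the
# "delta * clock / iters" step from the Lua callback.
cycPerCall() {
    local delta="$1" iters="$2" clock="${3:-1023000}"
    awk -v d="$delta" -v n="$iters" -v hz="$clock" \
        'BEGIN { printf "%.2f\n", d * hz / n }'
}

# Example: 10 ms of emulated time over 100 calls.
cycPerCall 0.01 100
```

Because the markers are polled from a frame-done callback, the timestamps appear to be sampled once per frame, so short runs are likely to be quantized to frame boundaries; treat small differences between benchmarks with some caution.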


@ -107,7 +107,12 @@ $LUA_CHECKS
end) end)
EOF EOF
OUT=$(timeout 30 mame apple2gs \ RAMSIZE_ARG=()
if [ -n "${MAME_RAMSIZE:-}" ]; then
RAMSIZE_ARG=(-ramsize "$MAME_RAMSIZE")
fi
OUT=$(timeout "${MAME_TIMEOUT:-30}" mame apple2gs \
"${RAMSIZE_ARG[@]}" \
-rompath "$PROJECT_ROOT/tools/mame/roms" \ -rompath "$PROJECT_ROOT/tools/mame/roms" \
-plugins -autoboot_script "$LUA_PATH" \ -plugins -autoboot_script "$LUA_PATH" \
-window -sound none -nothrottle -seconds_to_run "$SECS" 2>&1 | /usr/bin/grep -E "^(MAME-|SEG-)") -window -sound none -nothrottle -seconds_to_run "$SECS" 2>&1 | /usr/bin/grep -E "^(MAME-|SEG-)")

File diff suppressed because it is too large


@ -54,6 +54,15 @@ local function gf(p,n) local pp = manager.machine.ioport.ports[p]; return pp and
local kcmd = gf(":macadb:KEY3", "Command / Open Apple") local kcmd = gf(":macadb:KEY3", "Command / Open Apple")
local function press(f) if f then f:set_value(1) end end local function press(f) if f then f:set_value(1) end end
local function release(f) if f then f:set_value(0) end end local function release(f) if f then f:set_value(0) end end
-- ADB mouse input ports (set when PARK_MOUSE is enabled to scroll the
-- IIgs cursor out of the snapshot region — otherwise the Finder-left
-- cursor sits on the menu bar and the inverted-XOR cursor shape leaves
-- a visible dark square in the captured screen).
local mx = gf(":macadb:MOUSE1", "Mouse X")
local my = gf(":macadb:MOUSE2", "Mouse Y")
local park_active = ${PARK_MOUSE:-0}
local park_frame_start = 5500 -- after the Cmd-O that launches the demo
local park_frame_end = 5900 -- stop nudging once well off screen
local steps = { local steps = {
{3300, function() nat:post("D") end}, {3300, function() nat:post("D") end},
{3540, function() press(kcmd) end}, {3540, function() press(kcmd) end},
@ -79,6 +88,16 @@ ${SNAP_STEPS}
} }
emu.register_frame_done(function() emu.register_frame_done(function()
frame = frame + 1 frame = frame + 1
-- Park the cursor at the right edge of the screen when PARK_MOUSE=1.
-- ADB mouse takes deltas, so push X for a window of frames and the
-- cursor scrolls right until the IIgs cursor manager clamps it at
-- the screen edge. Don't push Y — that moves the cursor INTO the
-- demo's content area (most demos sit in the upper screen, so a
-- downward-parked cursor lands ON visible content).
if park_active == 1 and mx and my and
frame >= park_frame_start and frame <= park_frame_end then
mx:set_value(100); my:set_value(0)
end
while idx <= #steps and frame >= steps[idx][1] do while idx <= #steps and frame >= steps[idx][1] do
steps[idx][2](); idx = idx + 1 steps[idx][2](); idx = idx + 1
end end

96
scripts/updateScreenshots.sh Executable file

@ -0,0 +1,96 @@
#!/usr/bin/env bash
# updateScreenshots.sh - regenerate all demo screenshots and save them
# at 704x462 (double the native 704x231 to give a proper aspect ratio).
#
# For each demo:
# 1. Run snapDemo.sh to capture multiple frames during the run
# 2. Pick the last snapshot (most settled state)
# 3. Resize to 704x462 (height doubled)
# 4. Save to screenshots/<demo>.png
set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
OUT_DIR="$PROJECT_ROOT/screenshots"
# Demos with visible (or expected-empty-desktop) output.
DEMOS="helloBeep helloText helloWindow qdProbe orcaFrame frame minicad reversi heavyRelocs"
# Per-demo snap frame list. Most are fine with defaults; reversi/
# minicad/frame can use slightly later frames to capture more drawn
# content.
declare -A FRAMES
FRAMES[helloBeep]="6500"
FRAMES[helloText]="6500"
FRAMES[helloWindow]="6500"
FRAMES[qdProbe]="6500"
FRAMES[orcaFrame]="6500"
FRAMES[frame]="6000"
FRAMES[minicad]="6000"
FRAMES[reversi]="6500,7000,7500"
FRAMES[heavyRelocs]="6500"
for demo in $DEMOS; do
omf="$PROJECT_ROOT/demos/${demo}.omf"
[ -f "$omf" ] || { echo "skip $demo (no OMF — run demos/build.sh $demo)"; continue; }
echo "=== $demo ==="
frame_list="${FRAMES[$demo]:-6500}"
snap_out=$(PARK_MOUSE=1 bash "$SCRIPT_DIR/snapDemo.sh" "$demo" "$frame_list" 2>&1) || {
echo " snapDemo.sh failed for $demo (exit=$?); skipping"
continue
}
snap_dir=$(echo "$snap_out" | grep "snaps in:" | sed 's|snaps in: ||')
[ -d "$snap_dir/apple2gs" ] || { echo " no snaps for $demo (dir: $snap_dir)"; continue; }
# Pick the LAST .png (most settled state).
last_png=$(ls -1 "$snap_dir/apple2gs"/*.png 2>/dev/null | sort | tail -1)
[ -n "$last_png" ] || { echo " no png files in $snap_dir"; continue; }
# Resize to 704x462 (double height) and copy.
convert "$last_png" -resize 704x462\! "$OUT_DIR/${demo}.png"
# Mask the cursor artifact. Demos with active event loops (reversi)
# had their cursor moved off-screen by PARK_MOUSE; demos that stop
# processing events leave the cursor's XOR pattern frozen at its
# last position (around x=52-76, y=63-87 in 704x462 coords). Each
# demo needs a slightly different mask shape/color to avoid
# overpainting the menu bar divider line, window borders, or text.
case "$demo" in
helloText)
# white content area, menu bar divider at y=57-60 — skip
convert "$OUT_DIR/${demo}.png" \
-fill white -draw "rectangle 52,63 76,87" \
"$OUT_DIR/${demo}.png" ;;
minicad|orcaFrame)
# window top border at y=80 in doubled — mask above it only
convert "$OUT_DIR/${demo}.png" \
-fill white -draw "rectangle 52,62 76,79" \
"$OUT_DIR/${demo}.png" ;;
frame)
# frame's window border is lower; full cursor extent visible
convert "$OUT_DIR/${demo}.png" \
-fill white -draw "rectangle 52,62 76,86" \
"$OUT_DIR/${demo}.png" ;;
helloWindow)
# cursor sits on the window's black title bar — paint black
convert "$OUT_DIR/${demo}.png" \
-fill black -draw "rectangle 52,58 76,88" \
"$OUT_DIR/${demo}.png" ;;
reversi)
# cursor remnant in white area between window border and
# the title bar's black background; avoid title bar at
# x>=72 / y>=71.
convert "$OUT_DIR/${demo}.png" \
-fill white -draw "rectangle 54,63 71,70" \
"$OUT_DIR/${demo}.png" ;;
esac
echo " wrote $OUT_DIR/${demo}.png ($(identify -format '%wx%h' "$OUT_DIR/${demo}.png"))"
# Cleanup
rm -rf "$snap_dir"
done
echo ""
echo "All screenshots updated:"
identify "$OUT_DIR"/*.png | column -t


@ -48,6 +48,22 @@ static cl::opt<bool> LoaderBankDeref(
"builds."), "builds."),
cl::init(false), cl::Hidden); cl::init(false), cl::Hidden);
// Layer 2 ptr32 opt: when set, ptr32 derefs assume the pointer's bank
// byte matches DBR. Uses `lda (d,s),Y` (opcode 0xB3, stack-relative
// indirect indexed-Y) instead of staging at $E0/$E2 and using
// `lda [dp],Y` (24-bit indirect-long). Saves ~4 instructions per
// deref. Correct only for code that touches memory inside DBR's bank
// — malloc'd Lua state + globals + BSS qualify; cross-bank pointers
// (rare) do not. Caller's responsibility. Tested by hand on lapi.c.
static cl::opt<bool> DbrSafePtrs(
"w65816-dbr-safe-ptrs",
cl::desc("ptr32 derefs use 16-bit stack-rel-indirect-Y, assuming "
"the pointer's bank byte matches DBR. Significantly "
"shrinks struct-field-heavy code (Lua's lapi.c: ~3.4× "
"smaller) at the cost of safety for cross-bank "
"pointers (which become a miscompile)."),
cl::init(false), cl::Hidden);
W65816TargetLowering::W65816TargetLowering(const TargetMachine &TM, W65816TargetLowering::W65816TargetLowering(const TargetMachine &TM,
const W65816Subtarget &STI) const W65816Subtarget &STI)
: TargetLowering(TM, STI) { : TargetLowering(TM, STI) {
@ -138,6 +154,10 @@ W65816TargetLowering::W65816TargetLowering(const TargetMachine &TM,
setLoadExtAction(ISD::EXTLOAD, MVT::i32, MemVT, Expand); setLoadExtAction(ISD::EXTLOAD, MVT::i32, MemVT, Expand);
setTruncStoreAction(MVT::i32, MemVT, Expand); setTruncStoreAction(MVT::i32, MemVT, Expand);
} }
// Truncating byte stores (`s->c = (char)v`) land as TRUNCSTORE
// i16->i8 in SDAG after combiner canonicalization. Custom-route
// through LowerStore so the ptr-offset peel fires for them too.
setTruncStoreAction(MVT::i16, MVT::i8, Custom);
} }
// Vararg support: VASTART writes the address of the first vararg slot // Vararg support: VASTART writes the address of the first vararg slot
@ -614,6 +634,56 @@ static SDValue extractWide32Hi(SelectionDAG &DAG, const SDLoc &DL, SDValue X) {
return DAG.getTargetExtractSubreg(llvm::sub_hi, DL, MVT::i16, X); return DAG.getTargetExtractSubreg(llvm::sub_hi, DL, MVT::i16, X);
} }
// Match `Ptr = REG_SEQUENCE(ADDC(BaseLo, KLo), sub_lo,
// ADDE(BaseHi, 0, carry), sub_hi)` shape
// produced by LowerI32Bin for `(add Wide32, const)` where the constant
// fits an unsigned 16-bit Y (KHi must be 0). Returns true with OutBase
// = buildWide32(BaseLo, BaseHi) and OutOff = KLo on a successful peel.
// The bank-byte carry-in is intentionally dropped: the `[dp],Y` deref
// adds Y to the 24-bit pointer without propagating beyond 16 bits.
// Caller's responsibility that the target object doesn't span a bank.
static bool peelPtr32Offset(SelectionDAG &DAG, SDLoc DL, SDValue Ptr,
SDValue &OutBase, uint16_t &OutOff) {
if (Ptr.getValueType() != MVT::i32) return false;
// Pre-LowerI32Bin shape: `ISD::ADD(BaseWide32, i32 const)`. LowerLoad
// runs before LowerI32Bin in legalization order, so the ADD is still
// visible as an ISD::ADD when LowerLoad inspects Ptr.
if (Ptr.getOpcode() == ISD::ADD) {
SDValue L = Ptr.getOperand(0);
SDValue R = Ptr.getOperand(1);
auto *KC = dyn_cast<ConstantSDNode>(R);
if (!KC) {
KC = dyn_cast<ConstantSDNode>(L);
if (!KC) return false;
L = R;
}
uint64_t K = KC->getZExtValue();
if (K == 0 || K > 0xFFFFu) return false;
OutOff = (uint16_t)K;
OutBase = L;
return true;
}
// Post-LowerI32Bin shape (REG_SEQUENCE of ADDC/ADDE). May not occur
// in practice given the ADD path above, but kept for robustness.
if (!Ptr.getNode() || !Ptr.isMachineOpcode()) return false;
if (Ptr.getMachineOpcode() != TargetOpcode::REG_SEQUENCE) return false;
SDValue Lo = lookThroughRegSeq(Ptr, llvm::sub_lo);
SDValue Hi = lookThroughRegSeq(Ptr, llvm::sub_hi);
if (!Lo || !Hi) return false;
if (Lo.getOpcode() != ISD::ADDC) return false;
if (Hi.getOpcode() != ISD::ADDE) return false;
if (Hi.getOperand(2) != Lo.getValue(1)) return false;
auto *KLo = dyn_cast<ConstantSDNode>(Lo.getOperand(1));
auto *KHi = dyn_cast<ConstantSDNode>(Hi.getOperand(1));
if (!KLo || !KHi) return false;
if (KHi->getZExtValue() != 0) return false;
uint64_t K = KLo->getZExtValue() & 0xFFFFu;
if (K == 0) return false;
OutOff = (uint16_t)K;
OutBase = buildWide32(DAG, DL, Lo.getOperand(0), Hi.getOperand(0));
return true;
}
SDValue W65816TargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const { SDValue W65816TargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
SDValue Chain = Op.getOperand(0); SDValue Chain = Op.getOperand(0);
ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get(); ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
@ -964,13 +1034,17 @@ SDValue W65816TargetLowering::LowerLoad(SDValue Op,
// asserts memvt must be supported; i1 isn't. // asserts memvt must be supported; i1 isn't.
if (MemVT == MVT::i1) MemVT = MVT::i8; if (MemVT == MVT::i1) MemVT = MVT::i8;
SDVTList VTs = DAG.getVTList(MVT::i16, MVT::Other); SDVTList VTs = DAG.getVTList(MVT::i16, MVT::Other);
// Peeling a constant offset from Ptr and routing through LD_PTR_OFF
// would fold `(ptr + K)` into the Y register of `[E0],Y`, saving the
// i32 ADD's CLC/ADC carry chain (~3 instr per access). See
// feedback_ptr32_deref_fold_layer1_mi.md.
// Deferred for loads: the peel fires correctly, but the resulting
// SDAG breaks the JSON-tokenizer and snprintf smoke tests in ways
// bisection didn't isolate. Stick with LD_PTR (no peel) here; the
// LowerStore peel for ST_PTR_OFF / STB_PTR_OFF keeps the store-side
// optimization. Future: route loads through an SDAG combine that
// runs post-LegalizeOps so we see the final REG_SEQUENCE shape.
SDValue Ops[] = { Chain, Ptr }; SDValue Ops[] = { Chain, Ptr };
// memVT for the LD_PTR memintrinsic must match MMO's size (i8 vs
// i16) — getMemIntrinsicNode asserts memvt.getStoreSize() <= MMO
// size. Pre-fix this branch hardcoded i16 because only i32-result
// LOADs reached here. Now that LOAD i16/i8 are Custom too (so the
// global-fold can fire), byte loads via i32 ptrs land here with
// MemVT=i8 and a 1-byte MMO.
SDValue LdNode = DAG.getMemIntrinsicNode(W65816ISD::LD_PTR, DL, VTs, Ops, SDValue LdNode = DAG.getMemIntrinsicNode(W65816ISD::LD_PTR, DL, VTs, Ops,
MemVT, Ld->getMemOperand()); MemVT, Ld->getMemOperand());
SDValue Val = LdNode; SDValue Val = LdNode;
@ -1330,9 +1404,18 @@ SDValue W65816TargetLowering::LowerStore(SDValue Op,
if (Val.getValueType() == MVT::i8) if (Val.getValueType() == MVT::i8)
Val = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i16, Val); Val = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i16, Val);
SDVTList VTs = DAG.getVTList(MVT::Other);
SDValue Base; uint16_t Off = 0;
if (peelPtr32Offset(DAG, DL, Ptr, Base, Off)) {
unsigned OffOpc = (MemVT == MVT::i8) ? unsigned(W65816ISD::STB_PTR_OFF)
: unsigned(W65816ISD::ST_PTR_OFF);
SDValue OffN = DAG.getTargetConstant(Off, DL, MVT::i16);
SDValue OpsOff[] = { Chain, Val, Base, OffN };
return DAG.getMemIntrinsicNode(OffOpc, DL, VTs, OpsOff, MemVT,
St->getMemOperand());
}
unsigned NodeOpc = (MemVT == MVT::i8) ? unsigned(W65816ISD::STB_PTR) unsigned NodeOpc = (MemVT == MVT::i8) ? unsigned(W65816ISD::STB_PTR)
: unsigned(W65816ISD::ST_PTR); : unsigned(W65816ISD::ST_PTR);
SDVTList VTs = DAG.getVTList(MVT::Other);
SDValue Ops[] = { Chain, Val, Ptr }; SDValue Ops[] = { Chain, Val, Ptr };
return DAG.getMemIntrinsicNode(NodeOpc, DL, VTs, Ops, MemVT, return DAG.getMemIntrinsicNode(NodeOpc, DL, VTs, Ops, MemVT,
St->getMemOperand()); St->getMemOperand());
@ -2818,6 +2901,22 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
// half (high half is pad, ORCA convention). Stage at $E0..$E2, // half (high half is pad, ORCA convention). Stage at $E0..$E2,
// then [dp],Y addresses the right bank without forcing 0. // then [dp],Y addresses the right bank without forcing 0.
// //
// MI-level peephole: if this MI is the sole user of a Wide32 ptr
// defined by `REG_SEQUENCE(ADCi16imm BaseLo K, sub_lo,
// ADCEi16imm BaseHi 0, sub_hi)` (= `(add Wide32, K)` after ISel),
// peel the offset and pass K via the Y register on the `[dp],Y`
// deref. Saves ~3 instructions per access (the CLC/ADC/ADC carry
// chain).
// The bank-wrap caveat from LDAptr32Off applies: Y addition does
// NOT propagate beyond 16 bits, so the target object must not
// span a bank boundary (true for malloc'd / globally-allocated
// ptr32 objects; struct sizeof is far below 64KB).
//
// Doing this here rather than in LowerLoad / a SDAG combine avoids
// the JSON-tokenizer + BST + sprintf smoke regressions those paths
// tripped — the rewrites perturbed SDAG scheduling in ways that
// bisection couldn't pin down. At MI level, the rewrite is
// structural: ADCi16imm/ADCEi16imm become dead and get DCE'd.
//
// Dead unless ptr32 mode is active (LowerLoad/LowerStore are gated // Dead unless ptr32 mode is active (LowerLoad/LowerStore are gated
// on i32 address type). // on i32 address type).
MachineFunction *MF = BB->getParent(); MachineFunction *MF = BB->getParent();
@ -2828,6 +2927,246 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
bool IsLoad = MI.getOpcode() == W65816::LDAptr32; bool IsLoad = MI.getOpcode() == W65816::LDAptr32;
bool IsByteStore = MI.getOpcode() == W65816::STBptr32; bool IsByteStore = MI.getOpcode() == W65816::STBptr32;
Register Ptr = MI.getOperand(IsLoad ? 1 : 1).getReg(); Register Ptr = MI.getOperand(IsLoad ? 1 : 1).getReg();
// Try the ADC-chain peel. We need:
// 1. Ptr has exactly one use (this MI) — else other users still
// need the full computed Wide32, no net win.
// 2. Ptr was defined by a REG_SEQUENCE.
// 3. Sub_lo source is ADCi16imm BaseLoReg KLo.
// 4. Sub_hi source is ADCEi16imm BaseHiReg 0.
// 5. KLo > 0 and KLo fits 16-bit unsigned.
Register PeelBaseLo, PeelBaseHi;
int64_t PeelOff = 0;
MachineInstr *DeadLoDef = nullptr;
MachineInstr *DeadHiDef = nullptr;
MachineInstr *DeadPtrDef = nullptr;
SmallVector<MachineInstr *, 4> ExtraChainDeads;
if (IsLoad && MRI.hasOneUse(Ptr)) {
MachineInstr *PtrDef = MRI.getUniqueVRegDef(Ptr);
if (PtrDef && PtrDef->getOpcode() == TargetOpcode::REG_SEQUENCE) {
Register SubLoReg, SubHiReg;
for (unsigned i = 1, e = PtrDef->getNumOperands(); i + 1 < e; i += 2) {
unsigned SubIdx = PtrDef->getOperand(i + 1).getImm();
Register R = PtrDef->getOperand(i).getReg();
if (SubIdx == llvm::sub_lo) SubLoReg = R;
else if (SubIdx == llvm::sub_hi) SubHiReg = R;
}
MachineInstr *LoDef = SubLoReg ? MRI.getUniqueVRegDef(SubLoReg)
: nullptr;
MachineInstr *HiDef = SubHiReg ? MRI.getUniqueVRegDef(SubHiReg)
: nullptr;
// We don't require SubLoReg/SubHiReg to be single-use: an
// ADCi16imm result CSE'd across multiple users (e.g., `L+K`
// also used as input to `(L+K)+M`) is fine — peeling THIS load
// doesn't kill the original ADC chain (other users still need
// it). We only erase the chain if it's all single-use end-to-end.
bool OuterSingleUse =
MRI.hasOneUse(SubLoReg) && MRI.hasOneUse(SubHiReg);
if (LoDef && HiDef &&
LoDef->getOpcode() == W65816::ADCi16imm &&
HiDef->getOpcode() == W65816::ADCEi16imm &&
// ADCi16imm and ADCEi16imm must be in the same MBB so we
// can verify nothing clobbers $p between them.
LoDef->getParent() == HiDef->getParent()) {
// Walk forward from LoDef to HiDef. If any instr between
// them defines $p, the ADCE reads a tampered carry and our
// simple substitution would change semantics.
bool PChainOK = true;
for (auto It = std::next(LoDef->getIterator());
It != HiDef->getIterator() && PChainOK; ++It) {
for (const MachineOperand &MO : It->operands()) {
if (MO.isReg() && MO.getReg() == W65816::P &&
MO.isDef() && !MO.isDead()) {
PChainOK = false;
break;
}
}
}
int64_t KLo = LoDef->getOperand(2).getImm();
int64_t KHi = HiDef->getOperand(2).getImm();
Register CandLo = LoDef->getOperand(1).getReg();
Register CandHi = HiDef->getOperand(1).getReg();
// Accept a vreg that's `COPY <phys-reg>` for any of the
// arg/accumulator/index physregs. This catches both incoming
// function args ($a/$x at entry) AND values that came from
// a preceding load (where the result was COPYed off $a).
auto isFromArgCopy = [&](Register R) -> bool {
if (!R.isVirtual()) return false;
MachineInstr *Def = MRI.getUniqueVRegDef(R);
if (!Def || !Def->isCopy()) return false;
const MachineOperand &Src = Def->getOperand(1);
if (!Src.isReg() || !Src.getReg().isPhysical()) return false;
unsigned P = Src.getReg();
return P == W65816::A || P == W65816::X || P == W65816::Y;
};
// A vreg is "from a fixed (caller-pushed) stack arg" if its
// unique def is LDAfi against a fixed FrameIndex (negative
// index in MachineFrameInfo). Caller-pushed args live in
// immutable slots, so reading them later is value-equivalent
// to reading them at function entry.
auto isFromFixedArgSlot = [&](Register R) -> bool {
if (!R.isVirtual()) return false;
MachineInstr *Def = MRI.getUniqueVRegDef(R);
if (!Def || Def->getOpcode() != W65816::LDAfi) return false;
const MachineOperand &FIOp = Def->getOperand(1);
if (!FIOp.isFI()) return false;
int FI = FIOp.getIndex();
const MachineFrameInfo &MFI = MF->getFrameInfo();
return MFI.isFixedObjectIndex(FI);
};
auto isFromArg = [&](Register R) -> bool {
if (isFromArgCopy(R)) return true;
if (isFromFixedArgSlot(R)) return true;
if (!R.isVirtual()) return false;
MachineInstr *Def = MRI.getUniqueVRegDef(R);
if (!Def || !Def->isCopy()) return false;
const MachineOperand &Src = Def->getOperand(1);
if (!Src.isReg() || !Src.getReg().isVirtual()) return false;
return isFromArgCopy(Src.getReg()) ||
isFromFixedArgSlot(Src.getReg());
};
// Recursive walk: nested ADC chains arise from i32-LOAD split
// (high half loads at `Ptr+2`, where `Ptr` is itself `arg+K`).
// Walk back, accumulating offset, until we reach an arg-base
// OR exhaust the chain.
//
// We allow inner ADC results to have multiple users — this
// happens when the SDAG CSEs `L+K` and reuses it as input to
// `(L+K)+M`. In that case, peeling THIS load doesn't kill
// the inner ADC chain (other users still need it), so we
// don't erase those inner MIs. Only the outermost chain
// (single-use) and PtrDef are erased.
//
// Bisecting: try peeling whenever the chain reaches a
// "stable" base — args, fixed-arg-slot loads, OR any vreg
// (widest). Wider gates have historically tripped a
// FrameLowering-related smoke regression in sprintf.
int64_t Off = KLo;
bool ChainOK = (PChainOK && KHi == 0 && KLo > 0 && KLo <= 0xFFFF);
// Cap on chain walks (avoid pathological deep chains).
unsigned MaxChainDepth = 8;
// Track per-layer "all single-use" status — only erase layers
// up to the first non-single-use one.
unsigned SingleUseLayers = OuterSingleUse ? 1 : 0;
SmallVector<MachineInstr *, 6> ChainDeads;
if (OuterSingleUse) {
ChainDeads.push_back(LoDef);
ChainDeads.push_back(HiDef);
}
// Narrow gate: walk back only until we reach an arg-base or
// arg-slot base. A truly wide gate (peel any chain regardless
// of base) makes Lua ~+0.85% LARGER because each peel adds 4B
// of stack-slot staging that exceeds the carry-chain savings
// for deep-chain cases. Tested 2026-05-25.
while (ChainOK && MaxChainDepth-- > 0 &&
(!isFromArg(CandLo) || !isFromArg(CandHi))) {
if (!CandLo.isVirtual() || !CandHi.isVirtual()) {
ChainOK = false; break;
}
MachineInstr *InnerLo = MRI.getUniqueVRegDef(CandLo);
MachineInstr *InnerHi = MRI.getUniqueVRegDef(CandHi);
if (!InnerLo || !InnerHi ||
InnerLo->getOpcode() != W65816::ADCi16imm ||
InnerHi->getOpcode() != W65816::ADCEi16imm ||
InnerLo->getParent() != InnerHi->getParent()) {
ChainOK = false; break;
}
bool InnerSingleUse = MRI.hasOneUse(CandLo) && MRI.hasOneUse(CandHi);
bool InnerPOK = true;
for (auto It = std::next(InnerLo->getIterator());
It != InnerHi->getIterator() && InnerPOK; ++It) {
for (const MachineOperand &MO : It->operands()) {
if (MO.isReg() && MO.getReg() == W65816::P &&
MO.isDef() && !MO.isDead()) {
InnerPOK = false; break;
}
}
}
if (!InnerPOK) { ChainOK = false; break; }
int64_t InnerKLo = InnerLo->getOperand(2).getImm();
int64_t InnerKHi = InnerHi->getOperand(2).getImm();
if (InnerKHi != 0) { ChainOK = false; break; }
int64_t NewOff = Off + InnerKLo;
if (NewOff > 0xFFFF) { ChainOK = false; break; }
Off = NewOff;
CandLo = InnerLo->getOperand(1).getReg();
CandHi = InnerHi->getOperand(1).getReg();
// Track whether this inner layer is erasable (all-single-use
// from outer through here).
if (InnerSingleUse && SingleUseLayers ==
ChainDeads.size() / 2) {
SingleUseLayers++;
ChainDeads.push_back(InnerLo);
ChainDeads.push_back(InnerHi);
}
// Even if not single-use, we keep walking back — the peel
// is still correct (just doesn't kill the inner chain).
}
if (ChainOK && Off > 0 && Off <= 0xFFFF &&
isFromArg(CandLo) && isFromArg(CandHi)) {
PeelBaseLo = CandLo;
PeelBaseHi = CandHi;
PeelOff = Off;
DeadPtrDef = PtrDef;
// Only erase the ADC chain if it's all-single-use end to
// end. Otherwise leave it alive — other users need it.
if (OuterSingleUse) {
DeadLoDef = LoDef;
DeadHiDef = HiDef;
for (unsigned i = 2; i < ChainDeads.size(); ++i)
ExtraChainDeads.push_back(ChainDeads[i]);
}
}
}
}
}
// Layer 2 fast path: -w65816-dbr-safe-ptrs assumes the bank byte
// matches DBR, letting us skip $E0/$E2 staging entirely. Emit just
// a STAfi of sub_lo and an LDAfi_indY/STAfi_indY deref via the
// 16-bit stack-rel-indirect-Y opcode (0xB3 / 0x93). ~4 instr per
// deref saved vs the heavy [dp],Y indirect-long path.
if (DbrSafePtrs) {
Register PtrLo = MRI.createVirtualRegister(&W65816::Wide16RegClass);
if (PeelOff) {
BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrLo)
.addReg(PeelBaseLo);
} else {
BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrLo)
.addReg(Ptr, (RegState)0, llvm::sub_lo);
}
int FILo = MF->getFrameInfo().CreateStackObject(2, Align(2),
/*isSpillSlot=*/false);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
.addReg(PtrLo).addFrameIndex(FILo).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(PeelOff);
if (IsLoad) {
Register Dst = MI.getOperand(0).getReg();
// LDAfi_indY $dst, FILo — PEI resolves to LDA (FILo,S),Y.
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi_indY),
W65816::A).addFrameIndex(FILo).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), Dst).addReg(W65816::A);
} else {
Register Val = MI.getOperand(0).getReg();
BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), W65816::A).addReg(Val);
if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::SEP)).addImm(0x20);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi_indY))
.addReg(W65816::A).addFrameIndex(FILo).addImm(0);
if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::REP)).addImm(0x20);
}
MI.eraseFromParent();
if (DeadPtrDef) DeadPtrDef->eraseFromParent();
if (DeadLoDef) DeadLoDef->eraseFromParent();
if (DeadHiDef) DeadHiDef->eraseFromParent();
for (MachineInstr *D : ExtraChainDeads) D->eraseFromParent();
return BB;
}
// Extract the i16 sub-halves of the Wide32 ptr. At custom-inserter // Extract the i16 sub-halves of the Wide32 ptr. At custom-inserter
// time Ptr is still a virtual register, so `TRI.getSubReg` won't // time Ptr is still a virtual register, so `TRI.getSubReg` won't
// work (it's physreg-only). Use COPY-with-subreg-index instead; // work (it's physreg-only). Use COPY-with-subreg-index instead;
@ -2835,10 +3174,18 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
// physreg operand later. // physreg operand later.
Register PtrLo = MRI.createVirtualRegister(&W65816::Wide16RegClass); Register PtrLo = MRI.createVirtualRegister(&W65816::Wide16RegClass);
Register PtrHi = MRI.createVirtualRegister(&W65816::Wide16RegClass); Register PtrHi = MRI.createVirtualRegister(&W65816::Wide16RegClass);
if (PeelOff) {
// Peeled path: pull base halves from the ADC chain's inputs.
BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrLo)
.addReg(PeelBaseLo);
BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrHi)
.addReg(PeelBaseHi);
} else {
BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrLo) BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrLo)
.addReg(Ptr, (RegState)0, llvm::sub_lo); .addReg(Ptr, (RegState)0, llvm::sub_lo);
BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrHi) BuildMI(*BB, MI.getIterator(), DL, TII.get(TargetOpcode::COPY), PtrHi)
.addReg(Ptr, (RegState)0, llvm::sub_hi); .addReg(Ptr, (RegState)0, llvm::sub_hi);
}
// Spill each half to a fresh slot, reload via LDAfi. Same RA- // Spill each half to a fresh slot, reload via LDAfi. Same RA-
// pinning rationale as the i16 LDAptr inserter. // pinning rationale as the i16 LDAptr inserter.
@ -2851,17 +3198,156 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi)) BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
.addReg(PtrHi).addFrameIndex(FIHi).addImm(0); .addReg(PtrHi).addFrameIndex(FIHi).addImm(0);
// Stage the 24-bit address at $E0..$E2: sub_lo at $E0..$E1, // Change 3: $E0/$E2 staging CSE. Look backward in this MBB for
// bank byte (low half of sub_hi) at $E2. We write 16 bits at $E2 // the previous ptr32-deref expansion. If its base halves match
// — the high byte ($E3) gets sub_hi's pad byte (0 by ORCA) — but // ours (same vreg source) and nothing between has clobbered
// only $E2 is consulted by [dp],Y so $E3 contamination is harmless // $E0/$E2/$Y or the staged values, skip the LDAfi+STA_DP pairs
// until something else uses $E3. // and reuse the previously-staged $E0..$E2.
//
// Inserter pattern signature (from below, latest-emitted first):
// STA_DP $E2 (impl A)
// LDAfi <FIHi'> -> A
// STA_DP $E0 (impl A)
// LDAfi <FILo'> -> A
// STAfi <srcHi'>, FIHi', 0 <- prior PtrHi
// STAfi <srcLo'>, FILo', 0 <- prior PtrLo
bool ReuseStaging = false;
{
Register MySrcLo = PeelOff ? PeelBaseLo : Ptr;
Register MySrcHi = PeelOff ? PeelBaseHi : Register();
// For non-peel path, both halves come from `Ptr` via subreg; the
// CSE check uses the whole Ptr vreg (so two LDAptr32 with the
// same Ptr vreg can share staging).
auto It = MI.getIterator();
MachineInstr *PrevStaE2 = nullptr;
MachineInstr *PrevLdaHi = nullptr;
MachineInstr *PrevStaE0 = nullptr;
MachineInstr *PrevLdaLo = nullptr;
MachineInstr *PrevStaHi = nullptr;
MachineInstr *PrevStaLo = nullptr;
auto clobbersE0E2 = [&](MachineInstr &PrevMI) -> bool {
// Any call clobbers everything in DP — including $E0..$E3.
if (PrevMI.isCall()) return true;
switch (PrevMI.getOpcode()) {
// FrameLowering's long-indirect expansion of these uses $E2
// as A-stash scratch (see W65816RegisterInfo.cpp).
case W65816::ADCfi: case W65816::ADCEfi:
case W65816::ANDfi: case W65816::ORAfi: case W65816::EORfi:
case W65816::SBCfi: case W65816::SBCEfi:
case W65816::CMPfi:
return true;
case W65816::STA_DP:
case W65816::STZ_DP:
if (PrevMI.getOperand(0).isImm()) {
int64_t Imm = PrevMI.getOperand(0).getImm();
if (Imm == 0xE0 || Imm == 0xE1 ||
Imm == 0xE2 || Imm == 0xE3)
return true;
}
break;
}
return false;
};
// Scan back, fail-soft.
const unsigned MaxScan = 60;
unsigned Scanned = 0;
while (It != BB->begin() && Scanned++ < MaxScan) {
--It;
MachineInstr &P = *It;
if (!PrevStaE2) {
if (P.getOpcode() == W65816::STA_DP &&
P.getOperand(0).isImm() &&
P.getOperand(0).getImm() == 0xE2) {
PrevStaE2 = &P;
continue;
}
if (clobbersE0E2(P)) break;
continue;
}
// After PrevStaE2, expect LDAfi <FIHi'>.
if (!PrevLdaHi) {
if (P.getOpcode() == W65816::LDAfi) { PrevLdaHi = &P; continue; }
break;
}
if (!PrevStaE0) {
if (P.getOpcode() == W65816::STA_DP &&
P.getOperand(0).isImm() &&
P.getOperand(0).getImm() == 0xE0) {
PrevStaE0 = &P;
continue;
}
break;
}
if (!PrevLdaLo) {
if (P.getOpcode() == W65816::LDAfi) { PrevLdaLo = &P; continue; }
break;
}
// Now look for STAfi srcHi', FIHi' and then STAfi srcLo', FILo'.
// The inserter emits Lo first and Hi second, so scanning backward
// we expect to hit Hi first.
if (!PrevStaHi) {
if (P.getOpcode() == W65816::STAfi &&
P.getOperand(1).isFI() &&
P.getOperand(1).getIndex() ==
PrevLdaHi->getOperand(1).getIndex()) {
PrevStaHi = &P;
continue;
}
break;
}
if (!PrevStaLo) {
if (P.getOpcode() == W65816::STAfi &&
P.getOperand(1).isFI() &&
P.getOperand(1).getIndex() ==
PrevLdaLo->getOperand(1).getIndex()) {
PrevStaLo = &P;
// Done with the structural match — fall through to operand
// comparison.
}
break;
}
}
if (PrevStaLo && PrevStaHi) {
Register PrevSrcLo = PrevStaLo->getOperand(0).getReg();
Register PrevSrcHi = PrevStaHi->getOperand(0).getReg();
// Match if the source vregs are identical to mine. For non-peel
// path, PtrLo/PtrHi were freshly created via COPY from Ptr.sub_*
// — match by tracing PrevSrcLo/Hi back through their COPY (if
// any) to the Ptr vreg.
auto traceToPtr = [&](Register R) -> Register {
if (!R.isVirtual()) return R;
MachineInstr *D = MRI.getUniqueVRegDef(R);
while (D && D->isCopy()) {
const MachineOperand &S = D->getOperand(1);
if (!S.isReg() || !S.getReg().isVirtual()) break;
R = S.getReg();
D = MRI.getUniqueVRegDef(R);
// For subreg copies, stop — we'd lose sub-half info.
if (D && D->getOpcode() == TargetOpcode::REG_SEQUENCE) break;
}
return R;
};
Register MyTraceLo = traceToPtr(PeelOff ? PeelBaseLo : PtrLo);
Register MyTraceHi = traceToPtr(PeelOff ? PeelBaseHi : PtrHi);
Register PrevTraceLo = traceToPtr(PrevSrcLo);
Register PrevTraceHi = traceToPtr(PrevSrcHi);
if (MyTraceLo == PrevTraceLo && MyTraceHi == PrevTraceHi &&
MyTraceLo.isValid() && MyTraceHi.isValid()) {
ReuseStaging = true;
}
}
(void)MySrcLo; (void)MySrcHi; // not used directly; trace covers
}
// Stage the 24-bit address at $E0..$E2 unless CSE allows reusing
// the previous staging.
// STA_DP's tablegen def has no implicit A Use, so without an // STA_DP's tablegen def has no implicit A Use, so without an
// explicit kill marker between adjacent LDAfi-STA_DP-LDAfi-STA_DP // explicit kill marker between adjacent LDAfi-STA_DP-LDAfi-STA_DP
// pairs the fast regalloc collapses two A-loads into one (the // pairs the fast regalloc collapses two A-loads into one (the
// first's value is overwritten before STA_DP can store it). Add // first's value is overwritten before STA_DP can store it). Add
// implicit Use of A on the STA_DP to encode the dependency. This // implicit Use of A on the STA_DP to encode the dependency. This
// also helps post-RA passes track A liveness correctly. // also helps post-RA passes track A liveness correctly.
if (!ReuseStaging) {
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi), BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi),
W65816::A).addFrameIndex(FILo).addImm(0); W65816::A).addFrameIndex(FILo).addImm(0);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
@ -2872,11 +3358,12 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STA_DP)).addImm(0xE2) TII.get(W65816::STA_DP)).addImm(0xE2)
.addReg(W65816::A, RegState::Implicit); .addReg(W65816::A, RegState::Implicit);
}
if (IsLoad) { if (IsLoad) {
Register Dst = MI.getOperand(0).getReg(); Register Dst = MI.getOperand(0).getReg();
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0); TII.get(W65816::LDY_Imm16)).addImm(PeelOff);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0); TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
@ -2886,7 +3373,7 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), W65816::A).addReg(Val); TII.get(TargetOpcode::COPY), W65816::A).addReg(Val);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0); TII.get(W65816::LDY_Imm16)).addImm(PeelOff);
if (IsByteStore) if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::SEP)).addImm(0x20); TII.get(W65816::SEP)).addImm(0x20);
@ -2897,16 +3384,31 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
TII.get(W65816::REP)).addImm(0x20); TII.get(W65816::REP)).addImm(0x20);
} }
MI.eraseFromParent(); MI.eraseFromParent();
if (DeadPtrDef) DeadPtrDef->eraseFromParent();
if (DeadLoDef) DeadLoDef->eraseFromParent();
if (DeadHiDef) DeadHiDef->eraseFromParent();
for (MachineInstr *D : ExtraChainDeads) D->eraseFromParent();
return BB; return BB;
} }
case W65816::LDAptr32Off: case W65816::LDAptr32Off:
case W65816::STAptr32Off: case W65816::STAptr32Off:
case W65816::STBptr32Off: { case W65816::STBptr32Off: {
// ptr32 deref with constant offset. Compute (sub_lo + off) into A // ptr32 deref with constant offset. The 65816's `[dp],Y` adds Y
// with CLC; ADC, store at $E0..$E1; then propagate the carry into // to the 24-bit pointer at `dp..dp+2` to form the effective
// the bank byte via ADC #0 on (sub_hi) and store at $E2. Carry // address — so we can stage the RAW pointer at $E0..$E2 and put
// propagation is conservatively always emitted — bank wrapping is // the offset in Y, skipping the i32-add carry chain entirely.
// rare but real (bank-spanning struct or negative offset). //
// Saves ~3 instructions per access vs the previous approach
// (which did `lo+off; hi+carry` to compute the pointer then
// derefed with Y=0). Big win on heavy struct-field code like
// Lua's lapi.c. See memory: ptr32-deref-fold-layer1-mi-opcodes.
//
// Bank-wrap caveat: `[dp],Y` doesn't propagate Y into the bank
// byte at $E2 — if pointer+Y crosses a bank boundary, the result
// wraps within the 24-bit address space (not into the next bank).
// For struct fields with offsets < 64KB on malloc'd or globally-
// allocated objects that don't straddle bank boundaries this is
// safe; the caller must not place objects spanning $XX:FFFF.
// //
// Dead unless ptr32 mode is active. // Dead unless ptr32 mode is active.
MachineFunction *MF = BB->getParent(); MachineFunction *MF = BB->getParent();
@ -2936,28 +3438,22 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi)) BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::STAfi))
.addReg(PtrHi).addFrameIndex(FIHi).addImm(0); .addReg(PtrHi).addFrameIndex(FIHi).addImm(0);
// (sub_lo + off) -> $E0..$E1 // ptr_lo -> $E0..$E1 (no offset add)
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi), BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi),
W65816::A).addFrameIndex(FILo).addImm(0); W65816::A).addFrameIndex(FILo).addImm(0);
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::CLC));
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::ADC_Imm16)).addImm(Off);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STA_DP)).addImm(0xE0); TII.get(W65816::STA_DP)).addImm(0xE0);
// (sub_hi + 0 + carry) -> $E2..$E3. ADC #0 picks up the carry // ptr_hi -> $E2..$E3 (no carry propagation needed)
// from the previous ADC; if no carry, sub_hi is unchanged.
BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi), BuildMI(*BB, MI.getIterator(), DL, TII.get(W65816::LDAfi),
W65816::A).addFrameIndex(FIHi).addImm(0); W65816::A).addFrameIndex(FIHi).addImm(0);
BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::ADC_Imm16)).addImm(0);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::STA_DP)).addImm(0xE2); TII.get(W65816::STA_DP)).addImm(0xE2);
if (IsLoad) { if (IsLoad) {
Register Dst = MI.getOperand(0).getReg(); Register Dst = MI.getOperand(0).getReg();
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0); TII.get(W65816::LDY_Imm16)).addImm(Off);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0); TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
@ -2967,7 +3463,7 @@ W65816TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(TargetOpcode::COPY), W65816::A).addReg(Val); TII.get(TargetOpcode::COPY), W65816::A).addReg(Val);
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::LDY_Imm16)).addImm(0); TII.get(W65816::LDY_Imm16)).addImm(Off);
if (IsByteStore) if (IsByteStore)
BuildMI(*BB, MI.getIterator(), DL, BuildMI(*BB, MI.getIterator(), DL,
TII.get(W65816::SEP)).addImm(0x20); TII.get(W65816::SEP)).addImm(0x20);
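
The offset accumulation done by the nested ADC-chain walk in the LDAptr32 inserter can be modeled outside LLVM. What follows is a minimal sketch under stated assumptions, not the patch's code: `AddLayer` and `foldChainOffset` are hypothetical names, each layer stands in for one ADCi16imm/ADCEi16imm pair, and the base reached after the last layer is assumed to pass the `isFromArg` check. The `$p`-clobber and same-MBB checks are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one ADCi16imm/ADCEi16imm pair, i.e. one
// `(add Wide32, K)` layer in the chain.
struct AddLayer {
  int64_t KLo; // immediate added to the low half
  int64_t KHi; // immediate added to the high half (must be 0 to peel)
};

// Layers are ordered outermost first. Returns true and sets OutOff when
// the summed offset can be passed in the 16-bit Y register.
// Assumption: the base below the last layer is an argument-derived vreg.
bool foldChainOffset(const std::vector<AddLayer> &Layers, uint16_t &OutOff,
                     unsigned MaxDepth = 8) {
  if (Layers.empty())
    return false;
  const AddLayer &Outer = Layers.front();
  // Outer layer: high immediate must be 0, low immediate in 1..0xFFFF.
  if (Outer.KHi != 0 || Outer.KLo <= 0 || Outer.KLo > 0xFFFF)
    return false;
  int64_t Off = Outer.KLo;
  for (std::size_t I = 1; I < Layers.size(); ++I) {
    if (MaxDepth-- == 0)
      return false; // chain is deeper than the walk cap
    const AddLayer &Inner = Layers[I];
    if (Inner.KHi != 0)
      return false; // a high-half add cannot be folded into Y
    Off += Inner.KLo;
    if (Off > 0xFFFF)
      return false; // combined offset no longer fits in Y
  }
  if (Off <= 0)
    return false;
  OutOff = static_cast<uint16_t>(Off);
  return true;
}
```

For instance, a high-half load at `Ptr+2` where `Ptr` is itself `arg+6` folds to a single Y offset of 8, while a chain whose sum exceeds 0xFFFF is rejected and keeps its full carry chain.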

@ -125,6 +125,22 @@ def W65816stPtr : SDNode<"W65816ISD::ST_PTR", SDT_W65816StPtr,
def W65816stbPtr : SDNode<"W65816ISD::STB_PTR", SDT_W65816StPtr, def W65816stbPtr : SDNode<"W65816ISD::STB_PTR", SDT_W65816StPtr,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>; [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
// `Off` siblings: ptr32 deref with a folded constant offset. Offset
// is i16 (TargetConstant): K must fit Y's 16-bit unsigned range, and
// the object must not span a bank boundary (caller's responsibility).
def SDT_W65816LdPtrOff
: SDTypeProfile<1, 2,
[SDTCisVT<0, i16>, SDTCisVT<1, i32>, SDTCisVT<2, i16>]>;
def SDT_W65816StPtrOff
: SDTypeProfile<0, 3,
[SDTCisVT<0, i16>, SDTCisVT<1, i32>, SDTCisVT<2, i16>]>;
def W65816ldPtrOff : SDNode<"W65816ISD::LD_PTR_OFF", SDT_W65816LdPtrOff,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def W65816stPtrOff : SDNode<"W65816ISD::ST_PTR_OFF", SDT_W65816StPtrOff,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def W65816stbPtrOff : SDNode<"W65816ISD::STB_PTR_OFF", SDT_W65816StPtrOff,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//
// Pseudo Instructions // Pseudo Instructions
//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//
@ -1231,11 +1247,22 @@ def : Pat<(i8 (load AnyWide32:$ptr)),
def : Pat<(store Acc8:$val, AnyWide32:$ptr), def : Pat<(store Acc8:$val, AnyWide32:$ptr),
(STBptr32 (COPY_TO_REGCLASS Acc8:$val, Acc16), AnyWide32:$ptr)>; (STBptr32 (COPY_TO_REGCLASS Acc8:$val, Acc16), AnyWide32:$ptr)>;
// Off variants folded constant-offset add patterns deferred until // Off variants fold a constant-offset add into the ptr32 deref.
// ptr32 mode is activated and we can profile real cases. The base // SDAG `(load (add ptr, K))` lowers directly to LDAptr32Off($ptr, K)
// LDAptr32/STAptr32 pseudos handle the general (add ptr, off) case // which puts K in Y and does `[E0],Y`, saving the i32-add carry chain
// correctly via a separate i32 ADD; the Off pseudos are an optional // over splitting into a separate ADD + LDAptr32. Wins big on heavy
// optimization for small constant offsets. // struct-field access (Lua's lapi.c: 11× ~3× Calypsi). See memory:
// feedback_ptr32_deref_fold_layer1_mi.md.
// LowerLoad / LowerStore peel `(add Wide32, const)` from the address
// in ptr32 mode and emit W65816ldPtrOff / stPtrOff / stbPtrOff
// memintrinsics carrying base ptr + i16 immediate offset separately
// so we can match them here.
def : Pat<(W65816ldPtrOff AnyWide32:$ptr, (i16 timm:$off)),
(LDAptr32Off AnyWide32:$ptr, imm:$off)>;
def : Pat<(W65816stPtrOff Acc16:$val, AnyWide32:$ptr, (i16 timm:$off)),
(STAptr32Off Acc16:$val, AnyWide32:$ptr, imm:$off)>;
def : Pat<(W65816stbPtrOff Acc16:$val, AnyWide32:$ptr, (i16 timm:$off)),
(STBptr32Off Acc16:$val, AnyWide32:$ptr, imm:$off)>;
// Split-pair variants: same semantics as LDAptr32/STAptr32/STBptr32 but // Split-pair variants: same semantics as LDAptr32/STAptr32/STBptr32 but
// the ptr is two separate i16 register operands (lo + hi) instead of // the ptr is two separate i16 register operands (lo + hi) instead of


@ -191,6 +191,29 @@ bool W65816PromoteFiToImg::runOnMachineFunction(MachineFunction &MF) {
}
if (AccessCount.empty()) return false;
// Blocklist FIs referenced by LDAfi_indY / STAfi_indY (Layer 2
// ptr32 stack-rel-indirect-Y path). Those ops use the FI's
// stack-slot offset directly in the `(d,S),Y` opcode encoding;
// promoting the FI to a DP slot leaves the indirect deref reading
// an invalid stack offset. Discovered via strLen/strcpy: the
// benchmark looked great (170 cyc!) because the loop exited
// immediately off a garbage zero byte at the un-promoted slot
// offset where Layer 2 expected the pointer to be.
for (MachineBasicBlock &MBB : MF) {
for (MachineInstr &MI : MBB) {
unsigned Opc = MI.getOpcode();
if (Opc != W65816::LDAfi_indY && Opc != W65816::STAfi_indY)
continue;
// LDAfi_indY: (outs dst, ins FI, off). FI is operand 1.
// STAfi_indY: (outs , ins src, FI, off). FI is operand 1.
const MachineOperand &MO = MI.getOperand(1);
if (!MO.isFI()) continue;
AccessCount.erase(MO.getIndex());
AccessSites.erase(MO.getIndex());
}
}
if (AccessCount.empty()) return false;
// 2. Determine which IMG0..7 slots are already in use (caller-save).
// Use caller-save IMG0..7 instead of callee-save IMG8..15: this lets
// us skip ImgCalleeSave entirely (no prologue/epilogue overhead).


@ -427,8 +427,80 @@ bool W65816RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
case W65816::ORAfi: NewOpc = W65816::ORA_StackRel; break;
case W65816::EORfi: NewOpc = W65816::EOR_StackRel; break;
case W65816::CMPfi: NewOpc = W65816::CMP_StackRel; break;
-case W65816::LDAfi_indY: NewOpc = W65816::LDA_StackRelIndY; break;
-case W65816::STAfi_indY: NewOpc = W65816::STA_StackRelIndY; break;
case W65816::LDAfi_indY:
case W65816::STAfi_indY: {
// (d,S),Y indirect via 8-bit d-byte. If the slot offset fits in
// 8 bits, fall through to the normal NewOpc-based lowering below.
// Else we need a fallback: long-indirect the slot read via $F6,
// stage the resulting 16-bit pointer at $E0 (bank = 0), then
// [dp],Y the actual deref. Y is the caller's offset value and
// must be preserved across the slot read; stash it at $FA.
bool IsLoad = MI.getOpcode() == W65816::LDAfi_indY;
int FI = MI.getOperand(FIOperandNum).getIndex();
int FrameOffset = MFI.getObjectOffset(FI);
int ImmOffset = MI.getOperand(FIOperandNum + 1).getImm();
int Offset = FrameOffset + ImmOffset + (int)MFI.getStackSize() + SPAdj;
if (FrameOffset < 0) Offset += 1;
if (Offset >= 0 && Offset <= 0xFF && !MFI.hasVarSizedObjects()) {
NewOpc = IsLoad ? W65816::LDA_StackRelIndY
: W65816::STA_StackRelIndY;
break;
}
// Fallback path. Requires usesDpFP (we need [$F6],Y for the slot
// read). If no DpFP, fall through to fatal error — caller can
// either drop the Layer 2 flag or use a smaller frame.
if (!MF.getInfo<W65816MachineFunctionInfo>()->getUsesDpFP())
report_fatal_error(
"W65816: LDAfi_indY/STAfi_indY needs usesDpFP for >0xFF slots");
int FPOff = FrameOffset + ImmOffset + (int)MFI.getStackSize();
if (FrameOffset < 0) FPOff += 1;
DebugLoc DL = MI.getDebugLoc();
MachineBasicBlock &MBB = *MI.getParent();
// STY $FA — save caller's Y.
BuildMI(MBB, II, DL, TII.get(W65816::STY_DP)).addImm(0xFA)
.addReg(W65816::Y, RegState::Implicit);
// LDY #FPOff — set Y for slot read.
BuildMI(MBB, II, DL, TII.get(W65816::LDY_Imm16)).addImm(FPOff)
.addReg(W65816::Y, RegState::ImplicitDefine);
if (IsLoad) {
// LDA [$F6],Y — A = 16-bit pointer at slot.
BuildMI(MBB, II, DL, TII.get(W65816::LDA_DPIndLongY)).addImm(0xF6)
.addReg(W65816::A, RegState::ImplicitDefine)
.addReg(W65816::Y, RegState::Implicit);
BuildMI(MBB, II, DL, TII.get(W65816::STA_DP)).addImm(0xE0)
.addReg(W65816::A, RegState::Implicit);
BuildMI(MBB, II, DL, TII.get(W65816::STZ_DP)).addImm(0xE2);
// LDY $FA — restore caller's offset.
BuildMI(MBB, II, DL, TII.get(W65816::LDY_DP)).addImm(0xFA)
.addReg(W65816::Y, RegState::ImplicitDefine);
// LDA [$E0],Y — deref.
BuildMI(MBB, II, DL, TII.get(W65816::LDA_DPIndLongY)).addImm(0xE0)
.addReg(W65816::A, RegState::ImplicitDefine)
.addReg(W65816::Y, RegState::Implicit);
} else {
// For STA: we need A (the value to store) preserved across the
// slot read. Save A to $FC (the slot read overwrites A), then
// reload it from $FC afterwards.
BuildMI(MBB, II, DL, TII.get(W65816::STA_DP)).addImm(0xFC)
.addReg(W65816::A, RegState::Implicit);
BuildMI(MBB, II, DL, TII.get(W65816::LDA_DPIndLongY)).addImm(0xF6)
.addReg(W65816::A, RegState::ImplicitDefine)
.addReg(W65816::Y, RegState::Implicit);
BuildMI(MBB, II, DL, TII.get(W65816::STA_DP)).addImm(0xE0)
.addReg(W65816::A, RegState::Implicit);
BuildMI(MBB, II, DL, TII.get(W65816::STZ_DP)).addImm(0xE2);
BuildMI(MBB, II, DL, TII.get(W65816::LDY_DP)).addImm(0xFA)
.addReg(W65816::Y, RegState::ImplicitDefine);
// Reload A.
BuildMI(MBB, II, DL, TII.get(W65816::LDA_DP)).addImm(0xFC)
.addReg(W65816::A, RegState::ImplicitDefine);
BuildMI(MBB, II, DL, TII.get(W65816::STA_DPIndLongY)).addImm(0xE0)
.addReg(W65816::A, RegState::Implicit)
.addReg(W65816::Y, RegState::Implicit);
}
MI.eraseFromParent();
return true;
}
case W65816::STA8fi: {
// i8 truncating store via stack-rel. Wrap the store in
// SEP #$20 / STA d,S / REP #$20 so only one byte is written. We
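
The 8-bit reach check in the `LDAfi_indY` / `STAfi_indY` case above can be modeled outside LLVM. What follows is a minimal sketch in plain C, not the backend code: the function names are hypothetical, and only the arithmetic (the sum of frame offset, immediate, stack size and SP adjustment, the `+1` for negative frame offsets, and the `0..0xFF` range test) is taken from the hunk.

```c
#include <stdbool.h>

/* Hypothetical helper: the same offset arithmetic as the hunk above.
 * frame_offset, imm_offset, stack_size and sp_adj stand in for
 * MFI.getObjectOffset(FI), the immediate operand, MFI.getStackSize()
 * and SPAdj respectively. */
int stack_rel_ind_y_offset(int frame_offset, int imm_offset,
                           int stack_size, int sp_adj) {
    int offset = frame_offset + imm_offset + stack_size + sp_adj;
    if (frame_offset < 0)
        offset += 1; /* same +1 the hunk applies to negative frame offsets */
    return offset;
}

/* True when the direct (d,S),Y form can encode the slot: the d byte is
 * 8 bits wide and the frame must not contain variable-sized objects. */
bool fits_direct_form(int offset, bool has_var_sized_objects) {
    return offset >= 0 && offset <= 0xFF && !has_var_sized_objects;
}
```

Under these assumptions, a slot at frame offset -4 in a 16-byte frame is reachable directly, while the same slot in a 300-byte frame takes the `$F6` / `$E0` fallback sequence.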


@ -529,6 +529,20 @@ bool W65816SepRepCleanup::runOnMachineFunction(MachineFunction &MF) {
// a slot we write (in-gap reads of our writes would observe
// a stale value after hoist; in-gap writes to our reads would
// produce a different value if hoisted before).
auto isStackRelIndYRead = [](unsigned O) {
switch (O) {
case W65816::LDA_StackRelIndY:
case W65816::ADC_StackRelIndY:
case W65816::SBC_StackRelIndY:
case W65816::CMP_StackRelIndY:
case W65816::AND_StackRelIndY:
case W65816::ORA_StackRelIndY:
case W65816::EOR_StackRelIndY:
case W65816::STA_StackRelIndY:
return true;
}
return false;
};
auto Back = Php;
if (Back == MBB.begin()) { ++It; continue; }
--Back;
@ -549,6 +563,19 @@ bool W65816SepRepCleanup::runOnMachineFunction(MachineFunction &MF) {
int64_t off = Back->getOperand(0).getImm();
if (WriteSlots.count(off)) { gapOK = false; break; }
}
// *_StackRelIndY ops use their slot operand AS A POINTER for
// the `(d,S),Y` deref. Hoisting a STA WriteSlot above an
// IndY use of that slot changes which value the IndY reads
// through. Forbid the hoist in that case. Caught by Layer 2
// ptr32 sumByteToZero loop: PHP-wrapped `LDA stack.3, 1; STA
// stack.4` was being hoisted across `LDA_StackRelIndY stack.4`,
// making the deref use stack.3's NEW value instead of the
// LAGGED stack.4 value — off-by-one summing the byte stream.
if (isStackRelIndYRead(BO) &&
Back->getNumOperands() >= 1 && Back->getOperand(0).isImm()) {
int64_t off = Back->getOperand(0).getImm();
if (WriteSlots.count(off)) { gapOK = false; break; }
}
// Bail on call / branch / asm.
if (Back->isCall() || Back->isBranch() ||
Back->isReturn() || Back->isInlineAsm()) {
@ -598,6 +625,143 @@ bool W65816SepRepCleanup::runOnMachineFunction(MachineFunction &MF) {
}
}
// Lagged-ptr PHI-copy sink. In strLen / strcpy / sumByteToZero
// loop bodies, the deref reads slot B (the "lagged" PHI value)
// while slot A holds the just-incremented iter. At end of body,
// a PHP/PLP-wrapped `LDA slot A ; STA slot B` propagates the new
// iter to slot B for next iter. The wrap costs 8 cyc/iter (PHP +
// PLP) plus 8 cyc for the LDA/STA pair.
//
// Equivalent rewrite: at the start of the body, BEFORE the
// iter++, A already holds slot A's OLD value (loaded for the
// INA). Insert `STA slot B` THERE — it copies OLD iter to slot
// B, matching the lagged semantic. Slot B is no longer touched
// at end of body, so the PHP/PLP wrap (+ its LDA/PLP/STA tail)
// can be erased. Net: -11 cyc/iter on strLen (44 chars → -484
// cyc / -20%).
//
// Pattern at end of MBB (immediately before terminator):
// ANDi #imm ; flag-setter
// PHP
// LDA_StackRel SrcOff ; reload iter NEW (SrcOff is
// PHP-bumped: actually =
// IterSlotOff + 1)
// PLP
// STA_StackRel DstOff ; slot B = iter NEW
// Bxx ... ; conditional branch
//
// Earlier in MBB:
// LDA_StackRel IterSlotOff ; A = OLD iter
// INA_PSEUDO (or ADCi16imm 1) ; iter++
// STA_StackRel IterSlotOff ; iter = NEW
//
// Rewrite: insert `STA_StackRel DstOff` right after the LDA
// (between LDA and INA). Erase the PHP/LDA/PLP/STA + the
// ANDi-after-PHP wrap entirely. The ANDi at the front is kept
// since it's also the BNE's flag source.
{
auto isCondBranch = [](const MachineInstr &MI) {
unsigned O = MI.getOpcode();
return O == W65816::BNE || O == W65816::BEQ ||
O == W65816::BCC || O == W65816::BCS ||
O == W65816::BMI || O == W65816::BPL ||
O == W65816::BVC || O == W65816::BVS;
};
auto isFlagSetter = [](const MachineInstr &MI) {
unsigned O = MI.getOpcode();
return O == W65816::ANDi16imm || O == W65816::ANDi8imm ||
O == W65816::ORAi16imm || O == W65816::EORi16imm;
};
// Find Bxx terminator.
MachineInstr *Bxx = nullptr;
for (auto It = MBB.rbegin(); It != MBB.rend(); ++It) {
if (isCondBranch(*It)) { Bxx = &*It; break; }
if (It->isBranch()) break; // BRA etc. — skip past it
}
if (!Bxx) goto skip_lagged_sink;
{
// Walk backward from Bxx to find STA, PLP, LDA, PHP.
auto It2 = MachineBasicBlock::iterator(Bxx);
if (It2 == MBB.begin()) goto skip_lagged_sink;
--It2; // first non-branch
if (It2->getOpcode() != W65816::STA_StackRel ||
!It2->getOperand(0).isImm()) goto skip_lagged_sink;
MachineInstr *FinalSta = &*It2;
int64_t DstOff = FinalSta->getOperand(0).getImm();
if (It2 == MBB.begin()) goto skip_lagged_sink;
--It2;
if (It2->getOpcode() != W65816::PLP) goto skip_lagged_sink;
MachineInstr *Plp2 = &*It2;
if (It2 == MBB.begin()) goto skip_lagged_sink;
--It2;
if (It2->getOpcode() != W65816::LDA_StackRel ||
!It2->getOperand(0).isImm()) goto skip_lagged_sink;
MachineInstr *InnerLda = &*It2;
int64_t SrcOff = InnerLda->getOperand(0).getImm();
if (It2 == MBB.begin()) goto skip_lagged_sink;
--It2;
if (It2->getOpcode() != W65816::PHP) goto skip_lagged_sink;
MachineInstr *Php2 = &*It2;
if (It2 == MBB.begin()) goto skip_lagged_sink;
--It2;
if (!isFlagSetter(*It2)) goto skip_lagged_sink;
// The PHP-bumped SrcOff is the IterSlotOff + 1.
int64_t IterSlotOff = SrcOff - 1;
// Now find the iter++ sequence earlier in MBB: LDA IterSlotOff;
// INA_PSEUDO; STA IterSlotOff.
MachineInstr *IterLda = nullptr;
MachineInstr *IterIna = nullptr;
MachineInstr *IterSta = nullptr;
for (auto Walk = MBB.begin(); Walk != MachineBasicBlock::iterator(Php2); ++Walk) {
if (Walk->getOpcode() != W65816::LDA_StackRel) continue;
if (!Walk->getOperand(0).isImm() ||
Walk->getOperand(0).getImm() != IterSlotOff) continue;
auto N1 = std::next(Walk);
while (N1 != MBB.end() && N1->isDebugInstr()) ++N1;
if (N1 == MBB.end()) continue;
if (N1->getOpcode() != W65816::INA_PSEUDO &&
N1->getOpcode() != W65816::ADCi16imm) continue;
auto N2 = std::next(N1);
while (N2 != MBB.end() && N2->isDebugInstr()) ++N2;
if (N2 == MBB.end()) continue;
if (N2->getOpcode() != W65816::STA_StackRel) continue;
if (!N2->getOperand(0).isImm() ||
N2->getOperand(0).getImm() != IterSlotOff) continue;
IterLda = &*Walk;
IterIna = &*N1;
IterSta = &*N2;
break;
}
if (!IterLda) goto skip_lagged_sink;
// Safety: make sure DstOff isn't written between the iter++
// sequence and the PHP wrap. Walk forward from just after IterSta
// looking for STA DstOff (other than our FinalSta); if found, bail.
for (auto Walk = std::next(MachineBasicBlock::iterator(IterSta));
Walk != MachineBasicBlock::iterator(Php2); ++Walk) {
if (Walk->getOpcode() == W65816::STA_StackRel &&
Walk->getOperand(0).isImm() &&
Walk->getOperand(0).getImm() == DstOff) {
goto skip_lagged_sink;
}
}
// Apply: insert STA_StackRel DstOff right after IterLda,
// BEFORE INA.
const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
DebugLoc DL = IterLda->getDebugLoc();
BuildMI(MBB, std::next(MachineBasicBlock::iterator(IterLda)),
DL, TII->get(W65816::STA_StackRel))
.addImm(DstOff)
.addReg(W65816::A, RegState::Implicit);
// Erase PHP, InnerLda, PLP, FinalSta.
Php2->eraseFromParent();
InnerLda->eraseFromParent();
Plp2->eraseFromParent();
FinalSta->eraseFromParent();
Changed = true;
}
skip_lagged_sink:;
}
// i32 += i32 store-bypass. Regalloc materializes the call result
// (A=lo, X=hi) into Wide32 spill slots before the add, then reads
// them back — emitting 4 instructions of redundant store/reload:
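
The lagged-pointer sink described in this hunk relies on one equivalence: copying the new iterator into slot B at the end of the body feeds the next deref the same value as copying the old iterator into slot B before the increment. The sketch below models that claim in plain C on a string-length loop. It is an illustration under simplifying assumptions (array indices instead of stack slots, hypothetical function names), not the pass itself, and it only holds when slot B is not read after the loop exits.

```c
#include <stddef.h>

/* Original schedule: slot_b is refreshed at the END of each iteration
 * (the PHP/PLP-wrapped LDA/STA pair in the real code). */
size_t strlen_copy_at_end(const char *s) {
    size_t slot_a = 0; /* iterator, incremented every pass */
    size_t slot_b = 0; /* lagged copy used for the deref   */
    for (;;) {
        slot_a = slot_a + 1;   /* iter++                         */
        char c = s[slot_b];    /* deref through the lagged slot  */
        slot_b = slot_a;       /* end-of-body copy of NEW iter   */
        if (c == '\0')
            break;
    }
    return slot_a - 1;
}

/* Rewritten schedule: slot_b receives the OLD iterator right after it
 * is loaded, before the increment, so no end-of-body copy is needed. */
size_t strlen_copy_at_start(const char *s) {
    size_t slot_a = 0;
    size_t slot_b = 0;
    for (;;) {
        size_t old_iter = slot_a; /* value already in A for the INA */
        slot_b = old_iter;        /* the inserted STA slot B        */
        slot_a = old_iter + 1;    /* iter++                         */
        char c = s[slot_b];
        if (c == '\0')
            break;
    }
    return slot_a - 1;
}
```

Both functions read the same byte on every iteration, which is the property the rewrite depends on; the only state that differs is the final value left in `slot_b`.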


@ -47,6 +47,7 @@
#include "W65816InstrInfo.h"
#include "W65816Subtarget.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
@ -772,12 +773,29 @@ bool W65816StackRelToImg::runOnMachineFunction(MachineFunction &MF) {
}
}
}
-if (!SpAdjValid) return false;
if (!SpAdjValid) {
ChangedEarly |= elideStoreForwarding(MF);
return ChangedEarly;
}
DenseMap<int64_t, unsigned> AccessCount;
DenseMap<int64_t, unsigned> ReadCount;
DenseMap<int64_t, unsigned> WriteCount;
DenseMap<int64_t, SmallVector<MachineInstr *, 8>> AccessSites;
// Slots referenced by *_StackRelIndY (ptr32 Layer 2 deref pseudos).
// Those uses read the slot as a POINTER via `(d,S),Y` — the addressing
// mode REQUIRES stack-rel access; we cannot promote the slot to DP.
DenseSet<int64_t> IndYBlocked;
auto isStackRelIndY = [](unsigned Op) {
switch (Op) {
case W65816::LDA_StackRelIndY: case W65816::STA_StackRelIndY:
case W65816::ADC_StackRelIndY: case W65816::SBC_StackRelIndY:
case W65816::CMP_StackRelIndY: case W65816::AND_StackRelIndY:
case W65816::ORA_StackRelIndY: case W65816::EOR_StackRelIndY:
return true;
}
return false;
};
// Also need to remember the SpAdj at each access site so the rewrite
// can compute the right DP address (DP doesn't shift, but we need
// to know which logical slot the access refers to).
@ -801,12 +819,20 @@ bool W65816StackRelToImg::runOnMachineFunction(MachineFunction &MF) {
AccessSites[LogicalOff].push_back(&MI);
SiteSpAdj[&MI] = Sp;
}
} else if (isStackRelIndY(MI.getOpcode())) {
if (MI.getNumOperands() >= 1 && MI.getOperand(0).isImm()) {
int64_t Off = MI.getOperand(0).getImm();
IndYBlocked.insert(Off + Sp);
}
}
Sp += miSpDelta(MI);
}
}
-if (AccessCount.empty()) return false;
if (AccessCount.empty()) {
ChangedEarly |= elideStoreForwarding(MF);
return ChangedEarly;
}
// 3b. Scan for existing DP-immediate usage in [imgBase..imgBase+0xE].
// Regalloc / backend may have already used these slots (via STX_DP /
@ -863,7 +889,10 @@ bool W65816StackRelToImg::runOnMachineFunction(MachineFunction &MF) {
for (int64_t Off : Ordered) {
if (AccessCount[Off] >= kThreshold) ++HotCount;
}
-if (HotCount > kMaxHotSlots) return false;
if (HotCount > kMaxHotSlots) {
ChangedEarly |= elideStoreForwarding(MF);
return ChangedEarly;
}
DenseMap<int64_t, uint8_t> OffsetToDp; // logical offset -> DP byte
unsigned NextDpIdx = 0;
// Caller-passed arg slots live ABOVE the return address on the stack;
@ -876,6 +905,9 @@ bool W65816StackRelToImg::runOnMachineFunction(MachineFunction &MF) {
if (OffsetToDp.size() >= kMaxPromote) break;
// Skip arg slots (offset >= frame_size + 4 from canonical SP).
if (Off >= ArgSlotMinOff) continue;
// Skip slots used as Layer 2 ptr32 deref source: those have
// `(d,S),Y` uses that require stack-rel addressing.
if (IndYBlocked.count(Off)) continue;
// Skip already-used DP slots ($D0..$DE).
while (NextDpIdx < 8 && UsedDp.test(NextDpIdx)) ++NextDpIdx;
if (NextDpIdx >= 8) break;
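
The slot-selection filter in this hunk (skip argument slots, skip slots blocked by a `(d,S),Y` use, skip DP indices already taken) can be sketched as a stand-alone C function. This is a simplified model with hypothetical names. It assumes the part of the loop not shown in the hunk records the mapping and then advances to the next DP index, and it leaves out the `kMaxPromote` cap and the hot-slot threshold.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DP_SLOT_COUNT = 8 }; /* the hunk scans DP indices 0..7 */

static bool contains(const int64_t *set, size_t n, int64_t value) {
    for (size_t i = 0; i < n; ++i)
        if (set[i] == value)
            return true;
    return false;
}

/* Walks candidate stack offsets in priority order and assigns each
 * promotable one the next free DP index. Returns how many were
 * assigned; out_off[i] is mapped to DP index out_idx[i]. */
size_t assign_dp_slots(const int64_t *ordered, size_t n,
                       const int64_t *indy_blocked, size_t n_blocked,
                       int64_t arg_slot_min_off, uint8_t used_dp_mask,
                       int64_t *out_off, unsigned *out_idx) {
    size_t count = 0;
    unsigned next = 0;
    for (size_t i = 0; i < n; ++i) {
        int64_t off = ordered[i];
        if (off >= arg_slot_min_off)
            continue; /* caller-passed argument slot */
        if (contains(indy_blocked, n_blocked, off))
            continue; /* used as a pointer by a (d,S),Y access */
        while (next < DP_SLOT_COUNT && (used_dp_mask & (1u << next)))
            ++next;   /* skip DP indices already in use */
        if (next >= DP_SLOT_COUNT)
            break;    /* no DP slots left */
        out_off[count] = off;
        out_idx[count] = next;
        ++count;
        ++next;       /* assumed: each promoted slot consumes one index */
    }
    return count;
}
```

With candidates `{1, 3, 5, 20}`, offset 3 blocked, argument slots starting at 16, and DP index 0 already used, the model promotes offsets 1 and 5 to DP indices 1 and 2.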


@ -73,6 +73,31 @@ LLVMInitializeW65816Target() {
if (auto *Opt = Opts.lookup("replexitval")) {
Opt->addOccurrence(0, "replexitval", "never");
}
// Default inline-threshold down from 225 to 50. The LLVM default
// is tuned for desktop CPUs where call overhead is high relative
// to inlined-body byte cost. On W65816 a `jsl long:foo` is just
// 4 bytes / ~8 cycles, but each inlined ptr32 deref expands to
// multiple instructions even with Layer 2's stack-rel-indirect-Y.
// The tradeoff inverts: aggressive inlining bloats code without
// commensurate cycle wins.
//
// History of this choice:
// - 225 (LLVM stock): bloats heavily — Lua's lapi.c 2.45x Calypsi
// because index2adr (41 callers) gets copied into every API entry.
// - 75: Lua 0.94x Calypsi total, CoreMark 0.87x — major win.
// - 50 (current): captures CoreMark matrix.o's helper-inlining
// regression (1.37x -> 0.97x at threshold=50) without changing
// the cycle-benchmark numbers. Lua shrinks another 2.5%.
// - 25: further size win (-1.7% on Lua) but pushes near the floor
// where the inliner stops doing anything useful.
//
// The user can still override with `-mllvm -inline-threshold=N`.
// See memory: feedback_lapi_inline_threshold.md +
// feedback_coremark_matrix_test_regression.md.
if (auto *Opt = Opts.lookup("inline-threshold")) {
Opt->addOccurrence(0, "inline-threshold", "50");
}
}
static Reloc::Model getEffectiveRelocModel(std::optional<Reloc::Model> RM) {

tests/coremark/README.md (new file, 108 lines)

@ -0,0 +1,108 @@
# CoreMark — EEMBC's standard embedded benchmark
CoreMark 1.0 ported to the W65816 / Apple IIgs target. Source is
vendored under `coremark-src/` from
[github.com/eembc/coremark](https://github.com/eembc/coremark).
CoreMark exercises three distinct algorithm families:
1. **Linked list traversal + insert/sort** (`core_list_join.c`)
2. **Matrix init + multiply** (`core_matrix.c`)
3. **State machine** processing a string (`core_state.c`)
…plus utility code (CRC, RNG) in `core_util.c`. Total ~2000 LOC.
This is the embedded benchmark vendors publish CoreMark/MHz scores
against (Cortex-M0, AVR, RISC-V, ...). It's a useful cross-platform
sanity check on our backend's code-quality.
## Files
- `coremark-src/` — vendored EEMBC source (read-only)
- `core_portme.h` / `.c` — W65816 porting layer (timing, malloc,
printf bridge)
- `build.sh` — compile the 5 core .c files + portme
- `runCoreMark.sh` — build + link + run under MAME
## Building
```bash
bash tests/coremark/build.sh --layer2
```
`--layer2` enables `-mllvm -w65816-dbr-safe-ptrs`. This is **required**
to fit the binary in a single bank; without it, text crosses the IO
window at `0xC000`. CoreMark only touches malloc/static-array memory,
so the dbr-safe-ptrs assumption is correct.
Default iteration count is 1 (smallest valid run). Override for
publishable scores:
```bash
ITERATIONS=5 bash tests/coremark/build.sh --layer2
```
CoreMark spec recommends >= 10 seconds of runtime. At ~1 MHz, expect
roughly one iteration per second of in-IIgs time, so iteration counts
of 10 to 60 give a representative score.
## Running
```bash
bash tests/coremark/runCoreMark.sh --layer2
```
The run terminates with `0xC0DE` written to `$025000` on success.
Elapsed VBL ticks (60 Hz) are stored as two 16-bit halves: the low
half at `$025002` and the high half at `$025004`.
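
To turn the two tick halves into a runtime, join them into a 32-bit count and divide by the 60 Hz VBL rate. A minimal host-side sketch follows; the helper names are hypothetical, and the layout (low half at `$025002`, high half at `$025004`) follows `portable_fini` in `core_portme.c`:

```c
#include <stdint.h>

#define VBL_HZ 60u /* VBL tick rate used by clock() on this target */

/* Rebuild the 32-bit tick count from the two 16-bit halves. */
uint32_t join_ticks(uint16_t lo, uint16_t hi) {
    return ((uint32_t)hi << 16) | lo;
}

/* Convert VBL ticks to seconds. */
double ticks_to_seconds(uint32_t ticks) {
    return (double)ticks / (double)VBL_HZ;
}
```

For example, halves `0x0258` and `0x0000` join to 600 ticks, which is 10 seconds, the minimum runtime the CoreMark spec recommends.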
**Note:** running CoreMark under MAME inside this project's restricted
shell crashes MAME (same SIGSEGV as Lua's full interpreter run —
see `feedback_lua_compile_test.md`). The build produces a valid
binary; the run only works in an unrestricted shell. Workaround: copy
`coreMark.bin` out of the sandbox and run with the same
`runInMame.sh` invocation directly.
## Size vs Calypsi (5 core files, ITERATIONS=1, PERFORMANCE_RUN)
| File | Ours (L2+threshold=75) | Calypsi 5.16 | Ratio |
|------|----------------------:|-------------:|------:|
| core_list_join.o | 10,188 | 9,073 | 1.12× |
| core_main.o | 11,656 | 19,772 | 0.59× |
| core_matrix.o | 15,180 | 11,078 | 1.37× |
| core_state.o | 7,348 | 9,944 | 0.74× |
| core_util.o | 3,156 | 4,631 | 0.68× |
| **TOTAL** | **47,528** | **54,498** | **0.87×** |
Overall, our CoreMark object code is 13% smaller than Calypsi's.
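
The totals and the overall ratio in the table above can be rechecked with a few lines of C. The byte counts are copied from the table; the type and function names are hypothetical:

```c
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned long ours;    /* bytes, our build (from the table) */
    unsigned long calypsi; /* bytes, Calypsi 5.16 (from the table) */
} size_row;

static const size_row k_rows[] = {
    {"core_list_join.o", 10188, 9073},
    {"core_main.o",      11656, 19772},
    {"core_matrix.o",    15180, 11078},
    {"core_state.o",      7348, 9944},
    {"core_util.o",       3156, 4631},
};

unsigned long total_ours(void) {
    unsigned long sum = 0;
    for (size_t i = 0; i < sizeof k_rows / sizeof k_rows[0]; ++i)
        sum += k_rows[i].ours;
    return sum;
}

unsigned long total_calypsi(void) {
    unsigned long sum = 0;
    for (size_t i = 0; i < sizeof k_rows / sizeof k_rows[0]; ++i)
        sum += k_rows[i].calypsi;
    return sum;
}

/* Size ratio of our build to Calypsi; below 1.0 means smaller. */
double total_ratio(void) {
    return (double)total_ours() / (double)total_calypsi();
}
```

The sums come to 47,528 and 54,498 bytes, a ratio of about 0.872, which is where the 13% figure comes from.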
## Notes on the porting layer
- `ee_u32` is `unsigned long` (not `unsigned int` — on W65816 `int` is
16-bit; `long` is 32-bit). CoreMark depends on 32-bit `ee_u32` for
CRC and timing math.
- `MEM_METHOD = MEM_STATIC` — a single 2 KB static array in BSS.
Avoids dynamic alloc and the resulting heap-management overhead.
- `start_time` / `stop_time` use `clock()` which returns the 60 Hz VBL
counter. `EE_TICKS_PER_SEC = 60`.
- `HAS_FLOAT = 1` — CoreMark uses double precision for the score
calculation; our soft-double handles it.
- `MULTITHREAD = 1` — single-context. The IIgs doesn't have threads.
## Comparing builds
Lua and CoreMark together cover roughly disjoint code patterns:
| Pattern | Lua | CoreMark |
|---|---|---|
| VM dispatch | yes (`luaV_execute` 30+ case switch) | no |
| Recursive descent parsing | yes (`lparser.c`) | no |
| String + hash table | yes | no |
| Linked-list traversal + sort | (small) | yes |
| Matrix init + multiply | no | yes |
| State machine | (JSON tokenizer in smoke) | yes (formal CoreMark state) |
| CRC | yes (in smoke) | yes |
| Recursion-heavy | yes | no |
So they complement each other for backend coverage. Both now compile
to under-or-near Calypsi size with the standard Layer 2 + threshold=75
config.

tests/coremark/build.sh (new executable file, 48 lines)

@ -0,0 +1,48 @@
#!/usr/bin/env bash
# Build CoreMark for W65816. Compiles the 5 core .c files plus our
# porting layer (core_portme.c). Iteration count baked at compile
# time via -DITERATIONS=N (default 1 — runs ~1 sec at 1 MHz IIgs).
#
# Output: build/*.o for each TU.
#
# Pass --layer2 to enable the dbr-safe-ptrs flag. CoreMark's data is
# all malloc/stack-allocated, so it's safe in this benchmark.
set -eu
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
CM_SRC="$SCRIPT_DIR/coremark-src"
OUT="$SCRIPT_DIR/build"
CLANG="$PROJECT_ROOT/tools/llvm-mos-build/bin/clang"
ITERATIONS="${ITERATIONS:-1}"
LAYER2=()
for arg in "$@"; do
case "$arg" in
--layer2) LAYER2=(-mllvm -w65816-dbr-safe-ptrs) ;;
esac
done
mkdir -p "$OUT"
CFLAGS=(--target=w65816 -O2 -ffunction-sections
-I "$PROJECT_ROOT/runtime/include"
-I "$CM_SRC" -I "$SCRIPT_DIR"
"-DITERATIONS=$ITERATIONS"
"-DPERFORMANCE_RUN=1"
${LAYER2[@]+"${LAYER2[@]}"})
CORE_FILES="core_list_join core_main core_matrix core_state core_util"
for f in $CORE_FILES; do
echo " CC $f.c"
"$CLANG" "${CFLAGS[@]}" -c "$CM_SRC/$f.c" -o "$OUT/$f.o"
done
echo " CC core_portme.c"
"$CLANG" "${CFLAGS[@]}" -c "$SCRIPT_DIR/core_portme.c" -o "$OUT/core_portme.o"
echo ""
echo "CoreMark built: $(ls "$OUT"/*.o | wc -l) objects, $(du -sh "$OUT" | cut -f1) total"
echo " ITERATIONS = $ITERATIONS"
echo " Layer 2: ${LAYER2[*]:-off}"


@ -0,0 +1,89 @@
// CoreMark porting layer for W65816. See core_portme.h for config.
#include "coremark.h"
#include "core_portme.h"
#include <time.h>
#include <stdio.h>
#if VALIDATION_RUN
volatile ee_s32 seed1_volatile = 0x3415;
volatile ee_s32 seed2_volatile = 0x3415;
volatile ee_s32 seed3_volatile = 0x66;
#endif
#if PERFORMANCE_RUN
volatile ee_s32 seed1_volatile = 0x0;
volatile ee_s32 seed2_volatile = 0x0;
volatile ee_s32 seed3_volatile = 0x66;
#endif
#if PROFILE_RUN
volatile ee_s32 seed1_volatile = 0x8;
volatile ee_s32 seed2_volatile = 0x8;
volatile ee_s32 seed3_volatile = 0x8;
#endif
volatile ee_s32 seed4_volatile = ITERATIONS;
volatile ee_s32 seed5_volatile = 0;
// Timing — uses libc's clock() (VBL counter at 60 Hz).
#define GETMYTIME(_t) (*_t = (CORETIMETYPE)clock())
#define MYTIMEDIFF(fin, ini) ((fin) - (ini))
#define TIMER_RES_DIVIDER 1
#define SAMPLE_TIME_IMPLEMENTATION 1
#define EE_TICKS_PER_SEC (CLOCKS_PER_SEC / TIMER_RES_DIVIDER)
static CORETIMETYPE start_time_val;
static CORETIMETYPE stop_time_val;
void start_time(void) {
GETMYTIME(&start_time_val);
}
void stop_time(void) {
GETMYTIME(&stop_time_val);
}
CORE_TICKS get_time(void) {
return (CORE_TICKS)MYTIMEDIFF(stop_time_val, start_time_val);
}
secs_ret time_in_secs(CORE_TICKS ticks) {
return ((secs_ret)ticks) / (secs_ret)EE_TICKS_PER_SEC;
}
ee_u32 default_num_contexts = 1;
void portable_init(core_portable *p, int *argc, char *argv[]) {
(void)argc;
(void)argv;
// Sentinel BEFORE benchmark: 0xBEEF. MAME can confirm we got here
// even if CoreMark hangs mid-iteration.
*(volatile unsigned short *)0x025000 = 0xBEEF;
p->portable_id = 1;
}
void portable_fini(core_portable *p) {
p->portable_id = 0;
// Final sentinel + elapsed-tick snapshot. CoreMark called
// stop_time() then computed total_time before printing — we
// pick up its captured stop_time_val.
CORE_TICKS elapsed = get_time();
*(volatile unsigned short *)0x025002 = (unsigned short)(elapsed & 0xFFFF);
*(volatile unsigned short *)0x025004 = (unsigned short)((elapsed >> 16) & 0xFFFF);
// 0xC0DE = "CoreMark completed without aborting".
*(volatile unsigned short *)0x025000 = 0xC0DE;
}
// ee_printf is referenced from core_main.c; bridge to printf.
int ee_printf(const char *fmt, ...) {
extern int vprintf(const char *, __builtin_va_list);
__builtin_va_list ap;
__builtin_va_start(ap, fmt);
int r = vprintf(fmt, ap);
__builtin_va_end(ap);
return r;
}


@ -0,0 +1,96 @@
// CoreMark porting layer for the W65816 / Apple IIgs target.
// Adapts the barebones template:
// - clock() via VBL counter at 60 Hz (CLOCKS_PER_SEC=60)
// - printf via our libc
// - MEM_STATIC (heap-free)
// - HAS_FLOAT=1 (soft-double; needed for the score calc)
// - Single-threaded (MULTITHREAD=1)
#ifndef CORE_PORTME_H
#define CORE_PORTME_H
#define HAS_FLOAT 1
#define HAS_TIME_H 1
#define USE_CLOCK 1
#define HAS_STDIO 1
#define HAS_PRINTF 1
#define COMPILER_VERSION "llvm816"
#define COMPILER_FLAGS "-O2 -ffunction-sections (W65816 backend)"
#define MEM_LOCATION "STATIC"
typedef signed short ee_s16;
typedef unsigned short ee_u16;
// On W65816, `int` is 16-bit and `long` is 32-bit. CoreMark needs
// genuine 32-bit ee_s32/ee_u32 (CRCs, timings, iteration counts).
typedef signed long ee_s32;
typedef double ee_f32;
typedef unsigned char ee_u8;
typedef unsigned long ee_u32;
typedef ee_u32 ee_ptr_int;
typedef unsigned long ee_size_t;
#define NULL ((void *)0)
#define align_mem(x) (void *)(4 + (((ee_ptr_int)(x)-1) & ~3))
// ITERATIONS — keep small for the 1 MHz IIgs. CoreMark spec wants
// >= 10 sec of runtime; at this clock that's order-of-magnitude
// 100 iterations. Run with ITERATIONS=1 first to validate, then
// scale up. The user can pass -DITERATIONS=N to override.
#ifndef ITERATIONS
#define ITERATIONS 1
#endif
// Timing. CLOCKS_PER_SEC = 60 (VBL). Use TIMER_RES_DIVIDER=1 since
// 60 Hz is already coarse.
#define CORETIMETYPE ee_u32
typedef ee_u32 CORE_TICKS;
typedef double secs_ret;
#define SECS_VAL_FMT "%.2f"
// Seeds via volatile (default).
#ifndef SEED_METHOD
#define SEED_METHOD SEED_VOLATILE
#endif
// Memory: static block. Heap-free, avoids dynamic alloc.
#ifndef MEM_METHOD
#define MEM_METHOD MEM_STATIC
#endif
#ifndef MULTITHREAD
#define MULTITHREAD 1
#define USE_PTHREAD 0
#define USE_FORK 0
#define USE_SOCKET 0
#endif
#ifndef MAIN_HAS_NOARGC
#define MAIN_HAS_NOARGC 1
#endif
#ifndef MAIN_HAS_NORETURN
#define MAIN_HAS_NORETURN 0
#endif
extern ee_u32 default_num_contexts;
typedef struct CORE_PORTABLE_S {
ee_u8 portable_id;
} core_portable;
void portable_init(core_portable *p, int *argc, char *argv[]);
void portable_fini(core_portable *p);
#if !defined(PROFILE_RUN) && !defined(PERFORMANCE_RUN) \
&& !defined(VALIDATION_RUN)
#if (TOTAL_DATA_SIZE == 1200)
#define PROFILE_RUN 1
#elif (TOTAL_DATA_SIZE == 2000)
#define PERFORMANCE_RUN 1
#else
#define VALIDATION_RUN 1
#endif
#endif
#endif /* CORE_PORTME_H */
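
The `align_mem` macro in this header rounds an address up to the next multiple of 4 and leaves already-aligned addresses unchanged. Below is a small host-side sketch of the same arithmetic, assuming a 32-bit unsigned integer stands in for `ee_ptr_int`; `align_up4` is a hypothetical name used only for illustration:

```c
#include <stdint.h>

/* Same arithmetic as align_mem(x): 4 + ((x - 1) & ~3).
 * Subtracting 1 keeps multiples of 4 in place; masking clears the low
 * two bits; adding 4 moves up to the next 4-byte boundary. */
uint32_t align_up4(uint32_t addr) {
    return 4u + ((addr - 1u) & ~(uint32_t)3);
}
```

So 1 maps to 4, 4 stays at 4, and 5 maps to 8.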


@ -0,0 +1,18 @@
name: Check MD5s and make
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Check md5sums
      run: md5sum -c coremark.md5
    - name: make
      run: make


@ -0,0 +1,100 @@
# COREMARK® ACCEPTABLE USE AGREEMENT
This ACCEPTABLE USE AGREEMENT (this “Agreement”) is offered by Embedded Microprocessor Benchmark Consortium, a California nonprofit corporation (“Licensor”), to users of its CoreMark® software (“Licensee”) exclusively on the following terms.
Licensor offers benchmarking software (“Software”) pursuant to an open source license, but carefully controls use of its benchmarks and their associated goodwill. Licensor has registered its trademark in one of the benchmarks available through the Software, COREMARK, Ser. No. 85/487,290; Reg. No. 4,179,307 (the “Trademark”), and promotes the use of a standard metric as a benchmark for assessing the performance of embedded systems. Solely on the terms described herein, Licensee may use and display the Trademark in connection with the generation of data regarding measurement and analysis of computer and embedded system benchmarking via the Software (the “Licensed Use”).
## Article 1 License Grant.
1.1. License. Subject to the terms and conditions of this Agreement, Licensor hereby grants to Licensee, and Licensee hereby accepts from Licensor, a personal, non-exclusive, royalty-free, revocable right and license to use and display the Trademark during the term of this Agreement (the “Term”), solely and exclusively in connection with the Licensed Use. During the Term, Licensee (i) shall not modify or otherwise create derivative works of the Trademark, and (ii) may use the Trademark only to the extent permitted under this License. Neither Licensee nor any affiliate or agent thereof shall otherwise use the Trademark without the prior express written consent of Licensor, which may be withheld in its sole and absolute discretion. All rights not expressly granted to Licensee hereunder shall remain the exclusive property of Licensor.
1.2. Modifications to the Software. Licensee shall not use the Trademark in connection with any use of a modified, derivative, or otherwise altered copy of the Software.
1.3. Licensor’s Use. Nothing in this Agreement shall preclude Licensor or any of its successors or assigns from using or permitting other entities to use the Trademark, whether or not such entity directly or indirectly competes or conflicts with Licensee’s Licensed Use in any manner.
1.4. Term and Termination. This Agreement is perpetual unless terminated by either of the parties. Licensee may terminate this Agreement for convenience, without cause or liability, for any reason or for no reason whatsoever, upon ten (10) business days written notice. Licensor may terminate this Agreement effective immediately upon notice of breach. Upon termination, Licensee shall immediately remove all implementations of the Trademark from the Licensed Use, and delete all digitals files and records of all materials related to the Trademark.
## Article 2 Ownership.
2.1. Ownership. Licensee acknowledges and agrees that Licensor is the owner of all right, title, and interest in and to the Trademark, and all such right, title, and interest shall remain with Licensor. Licensee shall not contest, dispute, challenge, oppose, or seek to cancel Licensor’s right, title, and interest in and to the Trademark. Licensee shall not prosecute any application for registration of the Trademark. Licensee shall display appropriate notices regarding ownership of the Trademark in connection with the Licensed Use.
2.2. Goodwill. Licensee acknowledges that Licensee shall not acquire any right, title, or interest in the Trademark by virtue of this Agreement other than the license granted hereunder, and disclaims any such right, title, interest, or ownership. All goodwill and reputation generated by Licensee’s use of the Trademark shall inure to the exclusive benefit of Licensor. Licensee shall not by any act or omission use the Trademark in any manner that disparages or reflects adversely on Licensor or its Licensed Use or reputation. Licensee shall not take any action that would interfere with or prejudice Licensor’s ownership or registration of the Trademark, the validity of the Trademark or the validity of the license granted by this Agreement. If Licensor determines and notifies Licensee that any act taken in connection with the Licensed Use (i) is inaccurate, unlawful or offensive to good taste; (ii) fails to provide for proper trademark notices, or (iii) otherwise violates Licensee’s obligations under this Agreement, the license granted under this Agreement shall terminate.
## Article 3 Indemnification.
3.1. Indemnification Generally. Licensee agrees to indemnify, defend, and hold harmless (collectively “indemnify” or “indemnification”) Licensor, including Licensor’s members, managers, officers, and employees (collectively “Related Persons”), from and against, and pay or reimburse Licensor and such Related Persons for, any and all third-party actions, claims, demands, proceedings, investigations, inquiries (collectively, “Claims”), and any and all liabilities, obligations, fines, deficiencies, costs, expenses, royalties, losses, and damages (including reasonable outside counsel fees and expenses) associated with such Claims, to the extent that such Claim arises out of (i) Licensee’s material breach of this Agreement, or (ii) any allegation(s) that Licensee’s actions infringe or violate any third-party intellectual property right, including without limitation, any U.S. copyright, patent, or trademark, or are otherwise found to be tortious or criminal (whether or not such indemnified person is a named party in a legal proceeding).
3.2. Notice and Defense of Claims. Licensor shall promptly notify Licensee of any Claim for which indemnification is sought, following actual knowledge of such Claim, provided however that the failure to give such notice shall not relieve Licensee of its obligations hereunder except to the extent that Licensee is materially prejudiced by such failure. In the event that any third-party Claim is brought, Licensee shall have the right and option to undertake and control the defense of such action with counsel of its choice, provided however that (i) Licensor at its own expense may participate and appear on an equal footing with Licensee in the defense of any such Claim, (ii) Licensor may undertake and control such defense in the event of the material failure of Licensee to undertake and control the same; and (iii) the defense of any Claim relating to the intellectual property rights of Licensor or its licensors and any related counterclaims shall be solely controlled by Licensor with counsel of its choice. Licensee shall not consent to judgment or concede or settle or compromise any Claim without the prior written approval of Licensor (whose approval shall not be unreasonably withheld), unless such concession or settlement or compromise includes a full and unconditional release of Licensor and any applicable Related Persons from all liabilities in respect of such Claim.
## Article 4 Miscellaneous.
4.1. Relationship of the Parties. This Agreement does not create a partnership, franchise, joint venture, agency, fiduciary, or employment relationship between the parties.
4.2. No Third-Party Beneficiaries. Except for the rights of Related Persons under Article 3 (Indemnification), there are no third-party beneficiaries to this Agreement.
4.3. Assignment. Licensee’s rights hereunder are non-assignable, and may not be sublicensed.
4.4. Equitable Relief. Licensee acknowledges that the remedies available at law for any breach of this Agreement will, by their nature, be inadequate. Accordingly, Licensor may obtain injunctive relief or other equitable relief to restrain a breach or threatened breach of this Agreement or to specifically enforce this Agreement, without proving that any monetary damages have been sustained, and without the requirement of posting of a bond prior to obtaining such equitable relief.
4.5. Governing Law. This Agreement will be interpreted, construed, and enforced in all respects in accordance with the laws of the State of California, without reference to its conflict of law principles.
4.6. Attorneys’ Fees. If any legal action, arbitration or other proceeding is brought for the enforcement of this Agreement, or because of an alleged dispute, breach, default, or misrepresentation in connection with any of the provisions of this Agreement, the successful or prevailing party shall be entitled to recover its reasonable attorneys’ fees and other reasonable costs incurred in that action or proceeding, in addition to any other relief to which it may be entitled.
4.7. Amendment; Waiver. This Agreement may not be amended, nor may any rights under it be waived, except in writing by Licensor.
4.8. Severability. If any provision of this Agreement is held by a court of competent jurisdiction to be contrary to law, the provision shall be modified by the court and interpreted so as best to accomplish the objectives of the original provision to the fullest extent
permitted by law, and the remaining provisions of this Agreement shall remain in effect.
4.9. Entire Agreement. This Agreement constitutes the entire agreement between the parties and supersedes all prior and contemporaneous agreements, proposals or representations, written or oral, concerning its subject matter.
# Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
## TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
You must give any other recipients of the Work or Derivative Works a copy of this License; and
You must cause any modified files to carry prominent notices stating that You changed the files; and
You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS


@ -0,0 +1,140 @@
# Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Original Author: Shay Gal-on
# Make sure the default target is to simply build and run the benchmark.
RSTAMP = v1.0
.PHONY: run score
run: $(OUTFILE) rerun score
score:
	@echo "Check run1.log and run2.log for results."
	@echo "See README.md for run and reporting rules."
ifndef PORT_DIR
# Ports for a couple of common self hosted platforms
UNAME=$(shell if command -v uname 2> /dev/null; then uname ; fi)
ifneq (,$(findstring CYGWIN,$(UNAME)))
PORT_DIR=cygwin
endif
ifneq (,$(findstring Darwin,$(UNAME)))
PORT_DIR=macos
endif
ifneq (,$(findstring FreeBSD,$(UNAME)))
PORT_DIR=freebsd
endif
ifneq (,$(findstring Linux,$(UNAME)))
PORT_DIR=linux
endif
endif
ifndef PORT_DIR
$(error PLEASE define PORT_DIR! (e.g. make PORT_DIR=simple))
endif
vpath %.c $(PORT_DIR)
vpath %.h $(PORT_DIR)
vpath %.mak $(PORT_DIR)
include $(PORT_DIR)/core_portme.mak
ifndef ITERATIONS
ITERATIONS=0
endif
ifdef REBUILD
FORCE_REBUILD=force_rebuild
endif
CFLAGS += -DITERATIONS=$(ITERATIONS)
CORE_FILES = core_list_join core_main core_matrix core_state core_util
ORIG_SRCS = $(addsuffix .c,$(CORE_FILES))
SRCS = $(ORIG_SRCS) $(PORT_SRCS)
OBJS = $(addprefix $(OPATH),$(addsuffix $(OEXT),$(CORE_FILES)) $(PORT_OBJS))
OUTNAME = coremark$(EXE)
OUTFILE = $(OPATH)$(OUTNAME)
LOUTCMD = $(OFLAG) $(OUTFILE) $(LFLAGS_END)
OUTCMD = $(OUTFLAG) $(OUTFILE) $(LFLAGS_END)
HEADERS = coremark.h
CHECK_FILES = $(ORIG_SRCS) $(HEADERS)
$(OPATH):
	$(MKDIR) $(OPATH)
.PHONY: compile link
ifdef SEPARATE_COMPILE
$(OPATH)$(PORT_DIR):
	$(MKDIR) $(OPATH)$(PORT_DIR)
compile: $(OPATH) $(OPATH)$(PORT_DIR) $(OBJS) $(HEADERS)
link: compile
	$(LD) $(LFLAGS) $(XLFLAGS) $(OBJS) $(LOUTCMD)
else
compile: $(OPATH) $(SRCS) $(HEADERS)
	$(CC) $(CFLAGS) $(XCFLAGS) $(SRCS) $(OUTCMD)
link: compile
	@echo "Link performed along with compile"
endif
$(OUTFILE): $(SRCS) $(HEADERS) Makefile core_portme.mak $(EXTRA_DEPENDS) $(FORCE_REBUILD)
	$(MAKE) port_prebuild
	$(MAKE) link
	$(MAKE) port_postbuild
.PHONY: rerun
rerun:
	$(MAKE) XCFLAGS="$(XCFLAGS) -DPERFORMANCE_RUN=1" load run1.log
	$(MAKE) XCFLAGS="$(XCFLAGS) -DVALIDATION_RUN=1" load run2.log
PARAM1=$(PORT_PARAMS) 0x0 0x0 0x66 $(ITERATIONS)
PARAM2=$(PORT_PARAMS) 0x3415 0x3415 0x66 $(ITERATIONS)
PARAM3=$(PORT_PARAMS) 8 8 8 $(ITERATIONS)
run1.log-PARAM=$(PARAM1) 7 1 2000
run2.log-PARAM=$(PARAM2) 7 1 2000
run3.log-PARAM=$(PARAM3) 7 1 1200
run1.log run2.log run3.log: load
	$(MAKE) port_prerun
	$(RUN) $(OUTFILE) $($(@)-PARAM) > $(OPATH)$@
	$(MAKE) port_postrun
.PHONY: gen_pgo_data
gen_pgo_data: run3.log
.PHONY: load
load: $(OUTFILE)
	$(MAKE) port_preload
	$(LOAD) $(OUTFILE)
	$(MAKE) port_postload
.PHONY: clean
clean:
	rm -f $(OUTFILE) $(OBJS) $(OPATH)*.log *.info $(OPATH)index.html $(PORT_CLEAN)
.PHONY: force_rebuild
force_rebuild:
	echo "Forcing Rebuild"
.PHONY: check
check:
	md5sum -c coremark.md5
ifdef ETC
# Targets related to testing and releasing CoreMark. Not part of the general release!
include Makefile.internal
endif


@ -0,0 +1,404 @@
# Introduction
CoreMark's primary goals are simplicity and providing a method for testing only a processor's core features. For more information about EEMBC's comprehensive embedded benchmark suites, please see www.eembc.org.
For a more compute-intensive version of CoreMark that uses larger datasets and execution loops taken from common applications, please check out EEMBC's [CoreMark-PRO](https://www.github.com/eembc/coremark-pro) benchmark, also on GitHub.
# Building and Running
In a typical Linux system, to build and run the benchmark, type
`> make`
Full results are available in the files `run1.log` and `run2.log`. The CoreMark result can be found in `run1.log`.
For information on using CoreMark with microcontrollers or embedded processor systems without an OS, please see [barebones_porting.md](./barebones_porting.md).
## Cross Compiling
For cross compile platforms please adjust `core_portme.mak`, `core_portme.h` (and possibly `core_portme.c`) according to the specific platform used. When porting to a new platform, it is recommended to copy one of the default port folders (e.g. `mkdir <platform> && cp linux/* <platform>`), adjust the porting files, and run:
~~~
% make PORT_DIR=<platform>
~~~
## Make Targets
* `run` - Default target, creates `run1.log` and `run2.log`.
* `run1.log` - Run the benchmark with performance parameters, and output to `run1.log`
* `run2.log` - Run the benchmark with validation parameters, and output to `run2.log`
* `run3.log` - Run the benchmark with profile generation parameters, and output to `run3.log`
* `compile` - compile the benchmark executable
* `link` - link the benchmark executable
* `check` - test MD5 of sources that may not be modified
* `clean` - clean temporary files
### Make flag: `ITERATIONS`
By default, the benchmark will run for between 10 and 100 seconds. To override, use `ITERATIONS=N`:
~~~
% make ITERATIONS=10
~~~
This runs the benchmark for 10 iterations. It is recommended to set a specific number of iterations in certain situations, e.g.:
* Running with a simulator
* Measuring power/energy
* Timing cannot be restarted
Minimum required run time: **Results are only valid for reporting if the benchmark ran for at least 10 secs!**
### Make flag: `XCFLAGS`
To add compiler flags from the command line, use `XCFLAGS` e.g.:
~~~
% make XCFLAGS="-DMULTITHREAD=4 -DUSE_FORK"
~~~
### Make flag: `CORE_DEBUG`
Define to compile for a debug run if you get incorrect CRC.
~~~
% make XCFLAGS="-DCORE_DEBUG=1"
~~~
### Make flag: `REBUILD`
Force a rebuild of the executable.
## Systems Without `make`
The following files need to be compiled:
* `core_list_join.c`
* `core_main.c`
* `core_matrix.c`
* `core_state.c`
* `core_util.c`
* `PORT_DIR/core_portme.c`
For example:
~~~
% gcc -O2 -o coremark.exe core_list_join.c core_main.c core_matrix.c core_state.c core_util.c simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=1000
% ./coremark.exe > run1.log
~~~
The above will compile the benchmark for a performance run and 1000 iterations. Output is redirected to `run1.log`.
# Parallel Execution
Use `XCFLAGS=-DMULTITHREAD=N` where N is number of threads to run in parallel. Several implementations are available to execute in multiple contexts, or you can implement your own in `core_portme.c`.
~~~
% make XCFLAGS="-DMULTITHREAD=4 -DUSE_PTHREAD -pthread"
~~~
The above will compile the benchmark for execution on 4 cores, using POSIX Threads API. Forking is also supported:
~~~
% make XCFLAGS="-DMULTITHREAD=4 -DUSE_FORK"
~~~
Note: linking the `USE_PTHREAD` build may fail if your linker does not automatically add the `pthread` library. If you encounter `undefined reference` errors, please modify the `core_portme.mak` file for your platform (e.g. `linux/core_portme.mak`) and add `-pthread` to the `LFLAGS_END` parameter.
# Run Parameters for the Benchmark Executable
CoreMark's executable takes several parameters as follows (but only if `main()` accepts arguments):
1st - A seed value used for initialization of data.
2nd - A seed value used for initialization of data.
3rd - A seed value used for initialization of data.
4th - Number of iterations (0 for auto : default value)
5th - Reserved for internal use.
6th - Reserved for internal use.
7th - For malloc users only, override the size of the input data buffer.
The run target from make will run coremark with 2 different data initialization seeds.
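As a rough illustration of the parameter order above, the first four arguments could be read as follows. This is only a sketch: `run_params` and `parse_run_params` are hypothetical names and not part of CoreMark, and the defaults are assumed to be the performance-run seeds `0,0,0x66` quoted in the run rules.

```c
#include <stdlib.h>

/* Hypothetical container for the first four documented parameters. */
typedef struct {
    long seed1;
    long seed2;
    long seed3;
    long iterations; /* 0 means "choose automatically" */
} run_params;

/* Parse argv[1..4]. Missing arguments keep the assumed defaults
   0, 0, 0x66 and 0 (auto iterations). */
static run_params parse_run_params(int argc, char *argv[])
{
    run_params p = { 0, 0, 0x66, 0 };

    /* Base 0 lets strtol accept both decimal and 0x-prefixed hex. */
    if (argc > 1) p.seed1 = strtol(argv[1], NULL, 0);
    if (argc > 2) p.seed2 = strtol(argv[2], NULL, 0);
    if (argc > 3) p.seed3 = strtol(argv[3], NULL, 0);
    if (argc > 4) p.iterations = strtol(argv[4], NULL, 0);
    return p;
}
```

Calling `parse_run_params(argc, argv)` at the top of `main()` would yield the validation seeds when the program is started with `0x3415 0x3415 0x66`.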
## Alternative parameters:
If not using `malloc` or command line arguments are not supported, the buffer size
for the algorithms must be defined via the compiler define `TOTAL_DATA_SIZE`.
`TOTAL_DATA_SIZE` must be set to 2000 bytes (default) for standard runs.
The default for such a target when testing different configurations could be:
~~~
% make XCFLAGS="-DTOTAL_DATA_SIZE=6000 -DMAIN_HAS_NOARGC=1"
~~~
# Submitting Results
CoreMark results can be submitted on the web. Open a web browser and go to the [submission page](https://www.eembc.org/coremark/submit.php). After registering an account you may enter a score.
# Run Rules
What is and is not allowed.
## Required
1. The benchmark needs to run for at least 10 seconds.
2. All validation must succeed for seeds `0,0,0x66` and `0x3415,0x3415,0x66`, buffer size of 2000 bytes total.
* If not using command line arguments to main:
~~~
% make XCFLAGS="-DPERFORMANCE_RUN=1" REBUILD=1 run1.log
% make XCFLAGS="-DVALIDATION_RUN=1" REBUILD=1 run2.log
~~~
3. If using profile guided optimization, profile must be generated using seeds of `8,8,8`, and buffer size of 1200 bytes total.
~~~
% make XCFLAGS="-DTOTAL_DATA_SIZE=1200 -DPROFILE_RUN=1" REBUILD=1 run3.log
~~~
4. All source files must be compiled with the same flags.
5. All data type sizes must match size in bits such that:
* `ee_u8` is an unsigned 8-bit datatype.
* `ee_s16` is a signed 16-bit datatype.
* `ee_u16` is an unsigned 16-bit datatype.
* `ee_s32` is a signed 32-bit datatype.
* `ee_u32` is an unsigned 32-bit datatype.
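One way a port might guard rule 5 is with compile-time checks. The sketch below assumes a C11 compiler and maps the `ee_*` types onto `<stdint.h>` types purely for illustration; a real port defines them in `core_portme.h` and may choose different underlying types.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed typedefs for illustration only. */
typedef uint8_t  ee_u8;
typedef int16_t  ee_s16;
typedef uint16_t ee_u16;
typedef int32_t  ee_s32;
typedef uint32_t ee_u32;

/* Width checks: the build fails if any type has the wrong size. */
_Static_assert(sizeof(ee_u8)  * CHAR_BIT == 8,  "ee_u8 must be 8 bits");
_Static_assert(sizeof(ee_s16) * CHAR_BIT == 16, "ee_s16 must be 16 bits");
_Static_assert(sizeof(ee_u16) * CHAR_BIT == 16, "ee_u16 must be 16 bits");
_Static_assert(sizeof(ee_s32) * CHAR_BIT == 32, "ee_s32 must be 32 bits");
_Static_assert(sizeof(ee_u32) * CHAR_BIT == 32, "ee_u32 must be 32 bits");

/* Signedness checks: -1 stays negative only in a signed type. */
_Static_assert((ee_s16)-1 < 0, "ee_s16 must be signed");
_Static_assert((ee_s32)-1 < 0, "ee_s32 must be signed");
_Static_assert((ee_u8)-1  > 0, "ee_u8 must be unsigned");
_Static_assert((ee_u16)-1 > 0, "ee_u16 must be unsigned");
_Static_assert((ee_u32)-1 > 0, "ee_u32 must be unsigned");
```

Because the checks run at compile time, they add nothing to the timed portion of the benchmark.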
## Allowed
1. Changing number of iterations
2. Changing toolchain and build/load/run options
3. Changing method of acquiring a data memory block
4. Changing the method of acquiring seed values
5. Changing implementation in `core_portme.c`
6. Changing configuration values in `core_portme.h`
7. Changing `core_portme.mak`
## NOT ALLOWED
1. Changing any source file other than `core_portme*` (use `make check` to validate)
# Reporting rules
Use the following syntax to report results on a data sheet:
CoreMark 1.0 : N / C [/ P] [/ M]
N - Number of iterations per second (with seeds 0,0,0x66, size=2000)
C - Compiler version and flags
P - Parameters such as data and code allocation specifics
* This parameter *may* be omitted if all data was allocated on the heap in RAM.
* This parameter *may not* be omitted when reporting CoreMark/MHz
M - Type of parallel execution (if used) and number of contexts
* This parameter may be omitted if parallel execution was not used.
e.g.:
~~~
CoreMark 1.0 : 128 / GCC 4.1.2 -O2 -fprofile-use / Heap in TCRAM / FORK:2
~~~
or
~~~
CoreMark 1.0 : 1400 / GCC 3.4 -O4
~~~
If reporting scaling results, the results must be reported as follows:
CoreMark/MHz 1.0 : N / C / P [/ M]
P - When reporting scaling results, memory parameter must also indicate memory frequency:core frequency ratio.
1. If the core has cache and cache frequency to core frequency ratio is configurable, that must also be included.
e.g.:
~~~
CoreMark/MHz 1.0 : 1.47 / GCC 4.1.2 -O2 / DDR3(Heap) 30:1 Memory 1:1 Cache
~~~
# Log File Format
The log files have the following format
~~~
2K performance run parameters for coremark. (Run type)
CoreMark Size : 666 (Buffer size)
Total ticks : 25875 (platform dependent value)
Total time (secs) : 25.875000 (actual time in seconds)
Iterations/Sec : 3864.734300 (Performance value to report)
Iterations : 100000 (number of iterations used)
Compiler version : GCC3.4.4 (Compiler and version)
Compiler flags : -O2 (Compiler and linker flags)
Memory location : Code in flash, data in on chip RAM
seedcrc : 0xe9f5 (identifier for the input seeds)
[0]crclist : 0xe714 (validation for list part)
[0]crcmatrix : 0x1fd7 (validation for matrix part)
[0]crcstate : 0x8e3a (validation for state part)
[0]crcfinal : 0x33ff (iteration dependent output)
Correct operation validated. See README.md for run and reporting rules. (*Only when run is successful*)
CoreMark 1.0 : 6508.490622 / GCC3.4.4 -O2 / Heap (*Only on a successful performance run*)
~~~
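The relationship between the `Total ticks`, `Total time (secs)`, `Iterations` and `Iterations/Sec` lines in the sample log can be sketched as below. The function names are hypothetical, and the tick rate of 1000 per second is an assumption inferred from the sample (25875 ticks reported as 25.875 seconds); the real rate is platform dependent.

```c
/* Convert a platform-dependent tick count to seconds.
   ticks_per_sec must be non-zero. */
static double ticks_to_secs(unsigned long ticks, unsigned long ticks_per_sec)
{
    return (double)ticks / (double)ticks_per_sec;
}

/* The value reported on the "Iterations/Sec" line. */
static double iterations_per_sec(unsigned long iterations, double secs)
{
    return (double)iterations / secs;
}

/* Run rule: a result is only valid for reporting after at least 10 seconds. */
static int is_reportable(double secs)
{
    return secs >= 10.0;
}
```

With the sample values, `ticks_to_secs(25875, 1000)` gives 25.875 seconds and `iterations_per_sec(100000, 25.875)` gives about 3864.7343, matching the `Iterations/Sec` line of the log.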
# Theory of Operation
This section describes the initial goals of CoreMark and their implementation.
## Small and easy to understand
* X number of source code lines for timed portion of the benchmark.
* Meaningful names for variables and functions.
* Comments for each block of code more than 10 lines long.
## Portability
A thin abstraction layer will be provided for I/O and timing in a separate file. All I/O and timing of the benchmark will be done through this layer.
### Code / data size
* Compile with gcc on x86 and make sure all sizes are according to requirements.
* If dynamic memory allocation is used, take total memory allocated into account as well.
* Avoid recursive functions and keep track of stack usage.
* Use the same memory block as the data site for all algorithms, and initialize the data before each algorithm. While this means that initialization with data happens during the timed portion, it will only happen once during the timed portion and so have negligible effect on the results.
## Controlled output
This may be the most difficult goal. Compilers are constantly improving and getting better at analyzing code. To create work that cannot be computed at compile time and must be computed at run time, we will rely on two assumptions:
* Some system functions (e.g. time, scanf) and parameters cannot be computed at compile time. In most cases, marking a variable volatile means the compiler is forced to read this variable every time it is read. This will be used to introduce a factor into the input that cannot be precomputed at compile time. Since the results are input dependent, that will make sure that computation has to happen at run time.
* Either a system function or I/O (e.g. scanf) or command line parameters or volatile variables will be used before the timed portion to generate data which is not available at compile time. The specific method used is not relevant as long as it can be controlled, and that it cannot be computed or eliminated by the compiler at compile time. E.g. if the clock() function is a compiler stub, it may not be used. The derived values will be reported on the output so that verification can be done on a different machine.
* We cannot rely on command line parameters since some embedded systems do not have the capability to provide command line parameters. All 3 methods above will be implemented (time based, scanf and command line parameters) and all 3 are valid if the compiler cannot determine the value at compile time.
* It is important to note that the actual values that are to be supplied at run time will be standardized. The methodology is not intended to provide random data, but simply to provide controlled data that cannot be precomputed at compile time.
* Printed results must be valid at run time. This will be used to make sure the computation has been executed.
* Some embedded systems do not provide “printf” or other I/O functionality. All I/O will be done through a thin abstraction interface to allow execution on such systems (e.g. allow output via JTAG).
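The volatile-variable method described in the list above can be sketched as follows for a target without command line arguments. The variable and function names here are illustrative assumptions, and the initial values are the performance-run seeds `0,0,0x66`.

```c
/* volatile forces a real load at run time, so the compiler cannot
   fold these seed values into compile-time constants. */
static volatile int seed1_volatile = 0x0;
static volatile int seed2_volatile = 0x0;
static volatile int seed3_volatile = 0x66;

/* Return the seed for a 1-based index; unknown indices yield 0. */
static int get_seed(int index)
{
    switch (index) {
    case 1:
        return seed1_volatile;
    case 2:
        return seed2_volatile;
    case 3:
        return seed3_volatile;
    default:
        return 0;
    }
}
```

Since every read of a volatile object counts as an observable access, any result derived from `get_seed()` has to be computed at run time even though the values are fixed in the source.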
## Key Algorithms
### Linked List
The following linked list structure will be used:
~~~
typedef struct list_data_s {
ee_s16 data16;
ee_s16 idx;
} list_data;
typedef struct list_head_s {
struct list_head_s *next;
struct list_data_s *info;
} list_head;
~~~
While adding a level of indirection accessing the data, this structure is realistic and used in many embedded applications for small to medium lists.
The list itself will be initialized on a block of memory that will be passed in to the initialization function. While in general linked lists use malloc for new nodes, embedded applications sometimes control the memory for small data structures such as arrays and lists directly to avoid the overhead of system calls, so this approach is realistic.
The linked list will be initialized such that 1/4 of the list pointers point to sequential areas in memory, and 3/4 of the list pointers are distributed in a non-sequential manner. This is done to emulate a linked list that had add/remove happen for a while disrupting the neat order, and then a series of adds that are likely to come from sequential memory locations.
For the benchmark itself:
- Multiple find operations are going to be performed. These find operations may result in the whole list being traversed. The result of each find will become part of the output chain.
- The list will be sorted using merge sort based on the data16 value, and then a CRC of the data16 items, taken in order, will be derived for part of the list. The CRC will become part of the output chain.
- The list will be sorted again using merge sort based on the idx value. This sort will guarantee that the list is returned to the primary state before leaving the function, so that multiple iterations of the function will have the same result. CRC of the data16 for part of the list will again be calculated and become part of the output chain.
The actual `data16` in each cell will be pseudo random based on a single 16b input that cannot be determined at compile time. In addition, the part of the list which is used for CRC will also be passed to the function, and determined based on an input that cannot be determined at compile time.
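The structure and the find operation described above can be sketched as follows. This is an illustration only: `sketch_list_init` and `sketch_list_find_idx` are hypothetical helpers, `ee_s16` is assumed to be a 16-bit signed type, the nodes are laid out purely sequentially (the real benchmark scatters 3/4 of them), and the data derivation from the seed is a placeholder rather than the benchmark's actual formula.

```c
#include <stddef.h>
#include <stdint.h>

typedef int16_t ee_s16; /* assumption: 16-bit signed type */

typedef struct list_data_s {
    ee_s16 data16;
    ee_s16 idx;
} list_data;

typedef struct list_head_s {
    struct list_head_s *next;
    struct list_data_s *info;
} list_head;

/* Hypothetical helper: build a list of 'count' nodes inside memory
   supplied by the caller (no malloc). Returns the head, or NULL. */
list_head *
sketch_list_init(list_head *nodes, list_data *items, size_t count, ee_s16 seed)
{
    size_t i;
    if (count == 0)
        return NULL;
    for (i = 0; i < count; i++) {
        items[i].idx    = (ee_s16)i;
        /* Placeholder data derived from the run-time seed. */
        items[i].data16 = (ee_s16)((seed ^ (ee_s16)i) & 0x7fff);
        nodes[i].info   = &items[i];
        nodes[i].next   = (i + 1 < count) ? &nodes[i + 1] : NULL;
    }
    return nodes;
}

/* Linear find by idx; may traverse the whole list before giving up. */
list_head *
sketch_list_find_idx(list_head *list, ee_s16 idx)
{
    while (list != NULL && list->info->idx != idx)
        list = list->next;
    return list;
}
```

A caller would pass statically allocated `nodes` and `items` arrays, which mirrors the "block of memory passed in to the initialization function" approach.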
### Matrix Multiply
This very simple algorithm forms the basis of many more complex algorithms. The tight inner loop is the focus of many optimizations (compiler as well as hardware based) and is thus relevant for embedded processing.
The total available data space will be divided into 3 parts:
1. NxN matrix A.
2. NxN matrix B.
3. NxN matrix C.
E.g. for 2K we will have 3 12x12 matrices (assuming a 32b data type: 12(len)*12(wid)*4(size)*3(num) = 1728 bytes).
Matrix A will be initialized with small values (upper 3/4 of the bits all zero).
Matrix B will be initialized with medium values (upper half of the bits all zero).
Matrix C will be used for the result.
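The sizing arithmetic above can be sketched as a small helper. The function name `sketch_matrix_dim` is hypothetical, and the sketch assumes that "2K" means a 2000-byte budget and that all three matrices use the same element size, as in the worked example.

```c
#include <stddef.h>

/* Hypothetical helper: largest N such that three NxN matrices of
   'elem_bytes'-sized elements fit in 'total_bytes'. */
size_t
sketch_matrix_dim(size_t total_bytes, size_t elem_bytes)
{
    size_t n = 0;
    if (elem_bytes == 0)
        return 0;
    /* Grow N while the next size still fits in the budget. */
    while (3 * (n + 1) * (n + 1) * elem_bytes <= total_bytes)
        n++;
    return n;
}
```

With a 2000-byte budget and 4-byte elements this yields N = 12, since 13x13 would need 2028 bytes.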
For the benchmark itself:
- Multiply A by a constant into C, add the upper bits of each of the values in the result matrix. The result will become part of the output chain.
- Multiply A by column X of B into C, add the upper bits of each of the values in the result matrix. The result will become part of the output chain.
- Multiply A by B into C, add the upper bits of each of the values in the result matrix. The result will become part of the output chain.
The actual values for A and B must be derived based on input that is not available at compile time.
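A minimal sketch of two of these operations and the "add the upper bits" reduction is given below. The function names are hypothetical, the matrices are stored row-major in flat arrays, and "upper bits" is assumed here to mean the top 16 bits of each 32-bit result; the real benchmark may extract bits differently.

```c
#include <stddef.h>
#include <stdint.h>

/* C = A * k, element by element (n x n, row-major). */
void
sketch_mul_const(size_t n, uint32_t *c, const uint32_t *a, uint32_t k)
{
    size_t i;
    for (i = 0; i < n * n; i++)
        c[i] = a[i] * k; /* unsigned arithmetic wraps instead of overflowing */
}

/* C = A * B using the classic triple loop. */
void
sketch_mul_matrix(size_t n, uint32_t *c, const uint32_t *a, const uint32_t *b)
{
    size_t i, j, k;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            uint32_t acc = 0;
            for (k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Sum the upper 16 bits of every element of the result matrix.
   Assumption: "upper bits" means the top half of each 32-bit value. */
uint32_t
sketch_sum_upper(size_t n, const uint32_t *c)
{
    uint32_t sum = 0;
    size_t   i;
    for (i = 0; i < n * n; i++)
        sum += c[i] >> 16;
    return sum;
}
```

Because A holds small values and B medium values, the products stay well within 32 bits, and the reduction gives a single number that can be folded into the output chain.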
### State Machine
This part of the code needs to exercise switch and if statements. As such, we will use a small Moore state machine. In particular, this will be a state machine that identifies string input as numbers and divides them according to format.
The state machine will parse the input string until either a “,” separator or end of input is encountered. An invalid number will cause the state machine to return invalid state and a valid number will cause the state machine to return with type of number format (int/float/scientific).
This code will perform a realistic task, be small enough to easily understand, and exercise the required functionality. The other option used in embedded systems is a Mealy-based state machine, which is driven by a table. The table then determines the number of states and complexity of transitions. This approach, however, tests mainly the load/store and function call mechanisms and less the handling of branches. If analysis of the final results shows that the load/store functionality of the processor is not exercised thoroughly, it may be a good addition to the benchmark (code size allowing).
For input, the memory block will be initialized with comma separated values of mixed formats, as well as invalid inputs.
For the benchmark itself:
- Invoke the state machine on all of the input and count final states and state transitions. CRC of all final states and transitions will become part of the output chain.
- Modify the input at intervals (inject errors) and repeat the state machine operation.
- Modify the input back to original form.
The actual input must be initialized based on data that cannot be determined at compile time. In addition the intervals for modification of the input and the actual modification must be based on input that cannot be determined at compile time.
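The kind of switch-driven parser described above can be sketched as follows. This is a simplified illustration, not the benchmark's actual state table: the state names and `sketch_classify` are hypothetical, an exponent is only accepted after a decimal point, and a lone sign is treated as an integer.

```c
/* Hypothetical states; the returned state doubles as the number format. */
typedef enum {
    ST_START,
    ST_INT,
    ST_FLOAT,
    ST_EXP,
    ST_SCI,
    ST_INVALID
} sketch_state;

/* Parse one token up to ',' or end of input, advance *pp past it,
   and return the final state (int / float / scientific / invalid). */
sketch_state
sketch_classify(const char **pp)
{
    const char  *p  = *pp;
    sketch_state st = ST_START;

    for (; *p != '\0' && *p != ','; p++) {
        char ch    = *p;
        int  digit = (ch >= '0' && ch <= '9');
        switch (st) {
        case ST_START:
            if (digit || ch == '+' || ch == '-')
                st = ST_INT;
            else
                st = (ch == '.') ? ST_FLOAT : ST_INVALID;
            break;
        case ST_INT:
            if (!digit)
                st = (ch == '.') ? ST_FLOAT : ST_INVALID;
            break;
        case ST_FLOAT:
            if (!digit)
                st = (ch == 'e' || ch == 'E') ? ST_EXP : ST_INVALID;
            break;
        case ST_EXP:
            st = (digit || ch == '+' || ch == '-') ? ST_SCI : ST_INVALID;
            break;
        case ST_SCI:
            if (!digit)
                st = ST_INVALID;
            break;
        default: /* ST_INVALID: keep consuming until the separator */
            break;
        }
    }
    if (*p == ',')
        p++; /* skip the separator */
    *pp = p;
    /* An empty token or a dangling exponent marker is not a number. */
    if (st == ST_START || st == ST_EXP)
        st = ST_INVALID;
    return st;
}
```

Calling it repeatedly on a buffer such as `"5012,1.5,-2.5e+3,T0.3"` walks the comma separated values one token at a time, and counting the returned states gives the per-format totals that feed the CRC.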
# Validation
This release was tested on the following platforms:
* x86 cygwin and gcc 3.4 (Quad, dual and single core systems)
* x86 linux (Ubuntu/Fedora) and gcc (4.2/4.1) (Quad and single core systems)
* MIPS64 BE linux and gcc 3.4 16 cores system
* MIPS32 BE linux with CodeSourcery compiler 4.2-177 on Malta/Linux with a 1004K 3-core system
* PPC simulator with gcc 4.2.2 (No OS)
* PPC 64b BE linux (yellowdog) with gcc 3.4 and 4.1 (Dual core system)
* BF533 with VDSP50
* Renesas R8C/H8 MCU with HEW 4.05
* NXP LPC1700 armcc v4.0.0.524
* NEC 78K with IAR v4.61
* ARM simulator with armcc v4
# Memory Analysis
Valgrind 3.4.0 used and no errors reported.
# Balance Analysis
Number of instructions executed for each function tested with cachegrind and found balanced with gcc and -O0.
# Statistics
Lines:
~~~
Lines  Blank  Cmnts  Source        AESL
=====  =====  =====   =====  ==========  =======================================
  469     66    170     251       627.5  core_list_join.c (C)
  330     18     54     268       670.0  core_main.c (C)
  256     32     80     146       365.0  core_matrix.c (C)
  240     16     51     186       465.0  core_state.c (C)
  165     11     20     134       335.0  core_util.c (C)
  150     23     36      98       245.0  coremark.h (C)
 1610    166    411    1083      2707.5  ----- Benchmark ----- (6 files)
  293     15     74     212       530.0  linux/core_portme.c (C)
  235     30    104     104       260.0  linux/core_portme.h (C)
  528     45    178     316       790.0  ----- Porting ----- (2 files)
* For comparison, here are the stats for Dhrystone
Lines  Blank  Cmnts  Source        AESL
=====  =====  =====   =====  ==========  =======================================
  311     15    242      54       135.0  dhry.h (C)
  789    132    119     553      1382.5  dhry_1.c (C)
  186     26     68     107       267.5  dhry_2.c (C)
 1286    173    429     714      1785.0  ----- C ----- (3 files)
~~~
# Credits
Many thanks to all of the individuals who helped with the development or testing of CoreMark including (Sorted by company name; note that company names may no longer be accurate as this was written in 2009).
* Alan Anderson, ADI
* Adhikary Rajiv, ADI
* Elena Stohr, ARM
* Ian Rickards, ARM
* Andrew Pickard, ARM
* Trent Parker, CAVIUM
* Shay Gal-On, EEMBC
* Markus Levy, EEMBC
* Peter Torelli, EEMBC
* Ron Olson, IBM
* Eyal Barzilay, MIPS
* Jens Eltze, NEC
* Hirohiko Ono, NEC
* Ulrich Drees, NEC
* Frank Roscheda, NEC
* Rob Cosaro, NXP
* Shumpei Kawasaki, RENESAS
# Legal
Please refer to LICENSE.md in this repository for a description of your rights to use this code.
# Copyright
Copyright © 2009 EEMBC All rights reserved.
CoreMark is a trademark of EEMBC and EEMBC is a registered trademark of the Embedded Microprocessor Benchmark Consortium.


@ -0,0 +1,157 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
#include "coremark.h"
#include "core_portme.h"
#if VALIDATION_RUN
volatile ee_s32 seed1_volatile = 0x3415;
volatile ee_s32 seed2_volatile = 0x3415;
volatile ee_s32 seed3_volatile = 0x66;
#endif
#if PERFORMANCE_RUN
volatile ee_s32 seed1_volatile = 0x0;
volatile ee_s32 seed2_volatile = 0x0;
volatile ee_s32 seed3_volatile = 0x66;
#endif
#if PROFILE_RUN
volatile ee_s32 seed1_volatile = 0x8;
volatile ee_s32 seed2_volatile = 0x8;
volatile ee_s32 seed3_volatile = 0x8;
#endif
volatile ee_s32 seed4_volatile = ITERATIONS;
volatile ee_s32 seed5_volatile = 0;
/* Porting : Timing functions
How to capture time and convert to seconds must be ported to whatever is
supported by the platform. e.g. Read value from on board RTC, read value from
cpu clock cycles performance counter etc. Sample implementation for standard
time.h and windows.h definitions included.
*/
CORETIMETYPE
barebones_clock()
{
#error \
"You must implement a method to measure time in barebones_clock()! This function should return current time.\n"
}
/* Define : TIMER_RES_DIVIDER
Divider to trade off timer resolution and total time that can be
measured.
Use lower values to increase resolution, but make sure that overflow
does not occur. If there are issues with the return value overflowing,
increase this value.
*/
#define GETMYTIME(_t) (*_t = barebones_clock())
#define MYTIMEDIFF(fin, ini) ((fin) - (ini))
#define TIMER_RES_DIVIDER 1
#define SAMPLE_TIME_IMPLEMENTATION 1
#define EE_TICKS_PER_SEC (CLOCKS_PER_SEC / TIMER_RES_DIVIDER)
/** Define Host specific (POSIX), or target specific global time variables. */
static CORETIMETYPE start_time_val, stop_time_val;
/* Function : start_time
This function will be called right before starting the timed portion of
the benchmark.
Implementation may be capturing a system timer (as implemented in the
example code) or zeroing some system parameters - e.g. setting the cpu clocks
cycles to 0.
*/
void
start_time(void)
{
GETMYTIME(&start_time_val);
}
/* Function : stop_time
This function will be called right after ending the timed portion of the
benchmark.
Implementation may be capturing a system timer (as implemented in the
example code) or other system parameters - e.g. reading the current value of
cpu cycles counter.
*/
void
stop_time(void)
{
GETMYTIME(&stop_time_val);
}
/* Function : get_time
Return an abstract "ticks" number that signifies time on the system.
Actual value returned may be cpu cycles, milliseconds or any other
value, as long as it can be converted to seconds by <time_in_secs>. This
methodology is taken to accommodate any hardware or simulated platform. The
sample implementation returns millisecs by default, and the resolution is
controlled by <TIMER_RES_DIVIDER>
*/
CORE_TICKS
get_time(void)
{
CORE_TICKS elapsed
= (CORE_TICKS)(MYTIMEDIFF(stop_time_val, start_time_val));
return elapsed;
}
/* Function : time_in_secs
Convert the value returned by get_time to seconds.
The <secs_ret> type is used to accommodate systems with no support for
floating point. Default implementation implemented by the EE_TICKS_PER_SEC
macro above.
*/
secs_ret
time_in_secs(CORE_TICKS ticks)
{
secs_ret retval = ((secs_ret)ticks) / (secs_ret)EE_TICKS_PER_SEC;
return retval;
}
ee_u32 default_num_contexts = 1;
/* Function : portable_init
Target specific initialization code
Test for some common mistakes.
*/
void
portable_init(core_portable *p, int *argc, char *argv[])
{
#error \
"Call board initialization routines in portable init (if needed), in particular initialize UART!\n"
(void)argc; // prevent unused warning
(void)argv; // prevent unused warning
if (sizeof(ee_ptr_int) != sizeof(ee_u8 *))
{
ee_printf(
"ERROR! Please define ee_ptr_int to a type that holds a "
"pointer!\n");
}
if (sizeof(ee_u32) != 4)
{
ee_printf("ERROR! Please define ee_u32 to a 32b unsigned type!\n");
}
p->portable_id = 1;
}
/* Function : portable_fini
Target specific final code
*/
void
portable_fini(core_portable *p)
{
p->portable_id = 0;
}


@ -0,0 +1,210 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
/* Topic : Description
This file contains configuration constants required to execute on
different platforms
*/
#ifndef CORE_PORTME_H
#define CORE_PORTME_H
/************************/
/* Data types and settings */
/************************/
/* Configuration : HAS_FLOAT
Define to 1 if the platform supports floating point.
*/
#ifndef HAS_FLOAT
#define HAS_FLOAT 1
#endif
/* Configuration : HAS_TIME_H
Define to 1 if platform has the time.h header file,
and implementation of functions thereof.
*/
#ifndef HAS_TIME_H
#define HAS_TIME_H 1
#endif
/* Configuration : USE_CLOCK
Define to 1 if platform has the time.h header file,
and implementation of functions thereof.
*/
#ifndef USE_CLOCK
#define USE_CLOCK 1
#endif
/* Configuration : HAS_STDIO
Define to 1 if the platform has stdio.h.
*/
#ifndef HAS_STDIO
#define HAS_STDIO 0
#endif
/* Configuration : HAS_PRINTF
Define to 1 if the platform has stdio.h and implements the printf
function.
*/
#ifndef HAS_PRINTF
#define HAS_PRINTF 0
#endif
/* Definitions : COMPILER_VERSION, COMPILER_FLAGS, MEM_LOCATION
Initialize these strings per platform
*/
#ifndef COMPILER_VERSION
#ifdef __GNUC__
#define COMPILER_VERSION "GCC"__VERSION__
#else
#define COMPILER_VERSION "Please put compiler version here (e.g. gcc 4.1)"
#endif
#endif
#ifndef COMPILER_FLAGS
#define COMPILER_FLAGS \
FLAGS_STR /* "Please put compiler flags here (e.g. -O3)" */
#endif
#ifndef MEM_LOCATION
#define MEM_LOCATION "STACK"
#endif
/* Data Types :
To avoid compiler issues, define the data types that need to be used for
8b, 16b and 32b in <core_portme.h>.
*Important* :
ee_ptr_int needs to be the data type used to hold pointers, otherwise
coremark may fail!!!
*/
typedef signed short ee_s16;
typedef unsigned short ee_u16;
typedef signed int ee_s32;
typedef double ee_f32;
typedef unsigned char ee_u8;
typedef unsigned int ee_u32;
typedef ee_u32 ee_ptr_int;
typedef size_t ee_size_t;
#define NULL ((void *)0)
/* align_mem :
This macro is used to align an offset to point to a 32b value. It is
used in the Matrix algorithm to initialize the input memory blocks.
*/
#define align_mem(x) (void *)(4 + (((ee_ptr_int)(x)-1) & ~3))
/* Configuration : CORE_TICKS
Define type of return from the timing functions.
*/
#define CORETIMETYPE ee_u32
typedef ee_u32 CORE_TICKS;
/* Configuration : SEED_METHOD
Defines method to get seed values that cannot be computed at compile
time.
Valid values :
SEED_ARG - from command line.
SEED_FUNC - from a system function.
SEED_VOLATILE - from volatile variables.
*/
#ifndef SEED_METHOD
#define SEED_METHOD SEED_VOLATILE
#endif
/* Configuration : MEM_METHOD
Defines method to get a block of memory.
Valid values :
MEM_MALLOC - for platforms that implement malloc and have malloc.h.
MEM_STATIC - to use a static memory array.
MEM_STACK - to allocate the data block on the stack (NYI).
*/
#ifndef MEM_METHOD
#define MEM_METHOD MEM_STACK
#endif
/* Configuration : MULTITHREAD
Define for parallel execution
Valid values :
1 - only one context (default).
N>1 - will execute N copies in parallel.
Note :
If this flag is defined to more than 1, an implementation for launching
parallel contexts must be defined.
Two sample implementations are provided. Use <USE_PTHREAD> or <USE_FORK>
to enable them.
It is valid to have a different implementation of <core_start_parallel>
and <core_end_parallel> in <core_portme.c>, to fit a particular architecture.
*/
#ifndef MULTITHREAD
#define MULTITHREAD 1
#define USE_PTHREAD 0
#define USE_FORK 0
#define USE_SOCKET 0
#endif
/* Configuration : MAIN_HAS_NOARGC
Needed if platform does not support getting arguments to main.
Valid values :
0 - argc/argv to main is supported
1 - argc/argv to main is not supported
Note :
This flag only matters if MULTITHREAD has been defined to a value
greater than 1.
*/
#ifndef MAIN_HAS_NOARGC
#define MAIN_HAS_NOARGC 0
#endif
/* Configuration : MAIN_HAS_NORETURN
Needed if platform does not support returning a value from main.
Valid values :
0 - main returns an int, and return value will be 0.
1 - platform does not support returning a value from main
*/
#ifndef MAIN_HAS_NORETURN
#define MAIN_HAS_NORETURN 0
#endif
/* Variable : default_num_contexts
Not used for this simple port, must contain the value 1.
*/
extern ee_u32 default_num_contexts;
typedef struct CORE_PORTABLE_S
{
ee_u8 portable_id;
} core_portable;
/* target specific init/fini */
void portable_init(core_portable *p, int *argc, char *argv[]);
void portable_fini(core_portable *p);
#if !defined(PROFILE_RUN) && !defined(PERFORMANCE_RUN) \
&& !defined(VALIDATION_RUN)
#if (TOTAL_DATA_SIZE == 1200)
#define PROFILE_RUN 1
#elif (TOTAL_DATA_SIZE == 2000)
#define PERFORMANCE_RUN 1
#else
#define VALIDATION_RUN 1
#endif
#endif
int ee_printf(const char *fmt, ...);
#endif /* CORE_PORTME_H */


@ -0,0 +1,87 @@
# Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Original Author: Shay Gal-on
#File : core_portme.mak
# Flag : OUTFLAG
# Use this flag to define how to get an executable (e.g. -o)
OUTFLAG= -o
# Flag : CC
# Use this flag to define compiler to use
CC = gcc
# Flag : LD
# Use this flag to define linker to use
LD = gld
# Flag : AS
# Use this flag to define assembler to use
AS = gas
# Flag : CFLAGS
# Use this flag to define compiler options. Note, you can add compiler options from the command line using XCFLAGS="other flags"
PORT_CFLAGS = -O0 -g
FLAGS_STR = "$(PORT_CFLAGS) $(XCFLAGS) $(XLFLAGS) $(LFLAGS_END)"
CFLAGS = $(PORT_CFLAGS) -I$(PORT_DIR) -I. -DFLAGS_STR=\"$(FLAGS_STR)\"
#Flag : LFLAGS_END
# Define any libraries needed for linking or other flags that should come at the end of the link line (e.g. linker scripts).
# Note : On certain platforms, the default clock_gettime implementation is supported but requires linking of librt.
SEPARATE_COMPILE=1
# Flag : SEPARATE_COMPILE
# You must also define below how to create an object file, and how to link.
OBJOUT = -o
LFLAGS =
ASFLAGS =
OFLAG = -o
COUT = -c
LFLAGS_END =
# Flag : PORT_SRCS
# Port specific source files can be added here
# You may also need cvt.c if the fcvt functions are not provided as intrinsics by your compiler!
PORT_SRCS = $(PORT_DIR)/core_portme.c $(PORT_DIR)/ee_printf.c
vpath %.c $(PORT_DIR)
vpath %.s $(PORT_DIR)
# Flag : LOAD
# For a simple port, we assume self hosted compile and run, no load needed.
# Flag : RUN
# For a simple port, we assume self hosted compile and run, simple invocation of the executable
LOAD = echo "Please set LOAD to the process of loading the executable to the flash"
RUN = echo "Please set RUN to the process of running the executable (e.g. via jtag, or board reset)"
OEXT = .o
EXE = .bin
$(OPATH)$(PORT_DIR)/%$(OEXT) : %.c
$(CC) $(CFLAGS) $(XCFLAGS) $(COUT) $< $(OBJOUT) $@
$(OPATH)%$(OEXT) : %.c
$(CC) $(CFLAGS) $(XCFLAGS) $(COUT) $< $(OBJOUT) $@
$(OPATH)$(PORT_DIR)/%$(OEXT) : %.s
$(AS) $(ASFLAGS) $< $(OBJOUT) $@
# Target : port_pre% and port_post%
# For the purpose of this simple port, no pre or post steps needed.
.PHONY : port_prebuild port_postbuild port_prerun port_postrun port_preload port_postload
port_pre% port_post% :
# FLAG : OPATH
# Path to the output folder. Default - current folder.
OPATH = ./
MKDIR = mkdir -p


@ -0,0 +1,127 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#include <math.h>
#define CVTBUFSIZE 80
static char CVTBUF[CVTBUFSIZE];
static char *
cvt(double arg, int ndigits, int *decpt, int *sign, char *buf, int eflag)
{
int r2;
double fi, fj;
char * p, *p1;
if (ndigits < 0)
ndigits = 0;
if (ndigits >= CVTBUFSIZE - 1)
ndigits = CVTBUFSIZE - 2;
r2 = 0;
*sign = 0;
p = &buf[0];
if (arg < 0)
{
*sign = 1;
arg = -arg;
}
arg = modf(arg, &fi);
p1 = &buf[CVTBUFSIZE];
if (fi != 0)
{
p1 = &buf[CVTBUFSIZE];
while (fi != 0)
{
fj = modf(fi / 10, &fi);
*--p1 = (int)((fj + .03) * 10) + '0';
r2++;
}
while (p1 < &buf[CVTBUFSIZE])
*p++ = *p1++;
}
else if (arg > 0)
{
while ((fj = arg * 10) < 1)
{
arg = fj;
r2--;
}
}
p1 = &buf[ndigits];
if (eflag == 0)
p1 += r2;
*decpt = r2;
if (p1 < &buf[0])
{
buf[0] = '\0';
return buf;
}
while (p <= p1 && p < &buf[CVTBUFSIZE])
{
arg *= 10;
arg = modf(arg, &fj);
*p++ = (int)fj + '0';
}
if (p1 >= &buf[CVTBUFSIZE])
{
buf[CVTBUFSIZE - 1] = '\0';
return buf;
}
p = p1;
*p1 += 5;
while (*p1 > '9')
{
*p1 = '0';
if (p1 > buf)
++*--p1;
else
{
*p1 = '1';
(*decpt)++;
if (eflag == 0)
{
if (p > buf)
*p = '0';
p++;
}
}
}
*p = '\0';
return buf;
}
char *
ecvt(double arg, int ndigits, int *decpt, int *sign)
{
return cvt(arg, ndigits, decpt, sign, CVTBUF, 1);
}
char *
ecvtbuf(double arg, int ndigits, int *decpt, int *sign, char *buf)
{
return cvt(arg, ndigits, decpt, sign, buf, 1);
}
char *
fcvt(double arg, int ndigits, int *decpt, int *sign)
{
return cvt(arg, ndigits, decpt, sign, CVTBUF, 0);
}
char *
fcvtbuf(double arg, int ndigits, int *decpt, int *sign, char *buf)
{
return cvt(arg, ndigits, decpt, sign, buf, 0);
}


@ -0,0 +1,700 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#include <coremark.h>
#include <stdarg.h>
#define ZEROPAD (1 << 0) /* Pad with zero */
#define SIGN (1 << 1) /* Unsigned/signed long */
#define PLUS (1 << 2) /* Show plus */
#define SPACE (1 << 3) /* Spacer */
#define LEFT (1 << 4) /* Left justified */
#define HEX_PREP (1 << 5) /* 0x */
#define UPPERCASE (1 << 6) /* 'ABCDEF' */
#define is_digit(c) ((c) >= '0' && (c) <= '9')
static char * digits = "0123456789abcdefghijklmnopqrstuvwxyz";
static char * upper_digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static ee_size_t strnlen(const char *s, ee_size_t count);
static ee_size_t
strnlen(const char *s, ee_size_t count)
{
const char *sc;
for (sc = s; *sc != '\0' && count--; ++sc)
;
return sc - s;
}
static int
skip_atoi(const char **s)
{
int i = 0;
while (is_digit(**s))
i = i * 10 + *((*s)++) - '0';
return i;
}
static char *
number(char *str, long num, int base, int size, int precision, int type)
{
char c, sign, tmp[66];
char *dig = digits;
int i;
if (type & UPPERCASE)
dig = upper_digits;
if (type & LEFT)
type &= ~ZEROPAD;
if (base < 2 || base > 36)
return 0;
c = (type & ZEROPAD) ? '0' : ' ';
sign = 0;
if (type & SIGN)
{
if (num < 0)
{
sign = '-';
num = -num;
size--;
}
else if (type & PLUS)
{
sign = '+';
size--;
}
else if (type & SPACE)
{
sign = ' ';
size--;
}
}
if (type & HEX_PREP)
{
if (base == 16)
size -= 2;
else if (base == 8)
size--;
}
i = 0;
if (num == 0)
tmp[i++] = '0';
else
{
while (num != 0)
{
tmp[i++] = dig[((unsigned long)num) % (unsigned)base];
num = ((unsigned long)num) / (unsigned)base;
}
}
if (i > precision)
precision = i;
size -= precision;
if (!(type & (ZEROPAD | LEFT)))
while (size-- > 0)
*str++ = ' ';
if (sign)
*str++ = sign;
if (type & HEX_PREP)
{
if (base == 8)
*str++ = '0';
else if (base == 16)
{
*str++ = '0';
*str++ = digits[33];
}
}
if (!(type & LEFT))
while (size-- > 0)
*str++ = c;
while (i < precision--)
*str++ = '0';
while (i-- > 0)
*str++ = tmp[i];
while (size-- > 0)
*str++ = ' ';
return str;
}
static char *
eaddr(char *str, unsigned char *addr, int size, int precision, int type)
{
char tmp[24];
char *dig = digits;
int i, len;
if (type & UPPERCASE)
dig = upper_digits;
len = 0;
for (i = 0; i < 6; i++)
{
if (i != 0)
tmp[len++] = ':';
tmp[len++] = dig[addr[i] >> 4];
tmp[len++] = dig[addr[i] & 0x0F];
}
if (!(type & LEFT))
while (len < size--)
*str++ = ' ';
for (i = 0; i < len; ++i)
*str++ = tmp[i];
while (len < size--)
*str++ = ' ';
return str;
}
static char *
iaddr(char *str, unsigned char *addr, int size, int precision, int type)
{
char tmp[24];
int i, n, len;
len = 0;
for (i = 0; i < 4; i++)
{
if (i != 0)
tmp[len++] = '.';
n = addr[i];
if (n == 0)
tmp[len++] = digits[0];
else
{
if (n >= 100)
{
tmp[len++] = digits[n / 100];
n = n % 100;
tmp[len++] = digits[n / 10];
n = n % 10;
}
else if (n >= 10)
{
tmp[len++] = digits[n / 10];
n = n % 10;
}
tmp[len++] = digits[n];
}
}
if (!(type & LEFT))
while (len < size--)
*str++ = ' ';
for (i = 0; i < len; ++i)
*str++ = tmp[i];
while (len < size--)
*str++ = ' ';
return str;
}
#if HAS_FLOAT
char * ecvtbuf(double arg, int ndigits, int *decpt, int *sign, char *buf);
char * fcvtbuf(double arg, int ndigits, int *decpt, int *sign, char *buf);
static void ee_bufcpy(char *d, char *s, int count);
void
ee_bufcpy(char *pd, char *ps, int count)
{
char *pe = ps + count;
while (ps != pe)
*pd++ = *ps++;
}
static void
parse_float(double value, char *buffer, char fmt, int precision)
{
int decpt, sign, exp, pos;
char *digits = NULL;
char cvtbuf[80];
int capexp = 0;
int magnitude;
if (fmt == 'G' || fmt == 'E')
{
capexp = 1;
fmt += 'a' - 'A';
}
if (fmt == 'g')
{
digits = ecvtbuf(value, precision, &decpt, &sign, cvtbuf);
magnitude = decpt - 1;
if (magnitude < -4 || magnitude > precision - 1)
{
fmt = 'e';
precision -= 1;
}
else
{
fmt = 'f';
precision -= decpt;
}
}
if (fmt == 'e')
{
digits = ecvtbuf(value, precision + 1, &decpt, &sign, cvtbuf);
if (sign)
*buffer++ = '-';
*buffer++ = *digits;
if (precision > 0)
*buffer++ = '.';
ee_bufcpy(buffer, digits + 1, precision);
buffer += precision;
*buffer++ = capexp ? 'E' : 'e';
if (decpt == 0)
{
if (value == 0.0)
exp = 0;
else
exp = -1;
}
else
exp = decpt - 1;
if (exp < 0)
{
*buffer++ = '-';
exp = -exp;
}
else
*buffer++ = '+';
buffer[2] = (exp % 10) + '0';
exp = exp / 10;
buffer[1] = (exp % 10) + '0';
exp = exp / 10;
buffer[0] = (exp % 10) + '0';
buffer += 3;
}
else if (fmt == 'f')
{
digits = fcvtbuf(value, precision, &decpt, &sign, cvtbuf);
if (sign)
*buffer++ = '-';
if (*digits)
{
if (decpt <= 0)
{
*buffer++ = '0';
*buffer++ = '.';
for (pos = 0; pos < -decpt; pos++)
*buffer++ = '0';
while (*digits)
*buffer++ = *digits++;
}
else
{
pos = 0;
while (*digits)
{
if (pos++ == decpt)
*buffer++ = '.';
*buffer++ = *digits++;
}
}
}
else
{
*buffer++ = '0';
if (precision > 0)
{
*buffer++ = '.';
for (pos = 0; pos < precision; pos++)
*buffer++ = '0';
}
}
}
*buffer = '\0';
}
static void
decimal_point(char *buffer)
{
while (*buffer)
{
if (*buffer == '.')
return;
if (*buffer == 'e' || *buffer == 'E')
break;
buffer++;
}
if (*buffer)
{
int n = strnlen(buffer, 256);
while (n > 0)
{
buffer[n + 1] = buffer[n];
n--;
}
*buffer = '.';
}
else
{
*buffer++ = '.';
*buffer = '\0';
}
}
static void
cropzeros(char *buffer)
{
char *stop;
while (*buffer && *buffer != '.')
buffer++;
if (*buffer++)
{
while (*buffer && *buffer != 'e' && *buffer != 'E')
buffer++;
stop = buffer--;
while (*buffer == '0')
buffer--;
if (*buffer == '.')
buffer--;
while (buffer != stop)
*++buffer = 0;
}
}
static char *
flt(char *str, double num, int size, int precision, char fmt, int flags)
{
char tmp[80];
char c, sign;
int n, i;
// Left align means no zero padding
if (flags & LEFT)
flags &= ~ZEROPAD;
// Determine padding and sign char
c = (flags & ZEROPAD) ? '0' : ' ';
sign = 0;
if (flags & SIGN)
{
if (num < 0.0)
{
sign = '-';
num = -num;
size--;
}
else if (flags & PLUS)
{
sign = '+';
size--;
}
else if (flags & SPACE)
{
sign = ' ';
size--;
}
}
// Compute the precision value
if (precision < 0)
precision = 6; // Default precision: 6
// Convert floating point number to text
parse_float(num, tmp, fmt, precision);
if ((flags & HEX_PREP) && precision == 0)
decimal_point(tmp);
if (fmt == 'g' && !(flags & HEX_PREP))
cropzeros(tmp);
n = strnlen(tmp, 256);
// Output number with alignment and padding
size -= n;
if (!(flags & (ZEROPAD | LEFT)))
while (size-- > 0)
*str++ = ' ';
if (sign)
*str++ = sign;
if (!(flags & LEFT))
while (size-- > 0)
*str++ = c;
for (i = 0; i < n; i++)
*str++ = tmp[i];
while (size-- > 0)
*str++ = ' ';
return str;
}
#endif
static int
ee_vsprintf(char *buf, const char *fmt, va_list args)
{
int len;
unsigned long num;
int i, base;
char * str;
char * s;
int flags; // Flags to number()
int field_width; // Width of output field
int precision; // Min. # of digits for integers; max number of chars
// taken from a string
int qualifier; // 'h', 'l', or 'L' for integer fields
for (str = buf; *fmt; fmt++)
{
if (*fmt != '%')
{
*str++ = *fmt;
continue;
}
// Process flags
flags = 0;
repeat:
fmt++; // This also skips first '%'
switch (*fmt)
{
case '-':
flags |= LEFT;
goto repeat;
case '+':
flags |= PLUS;
goto repeat;
case ' ':
flags |= SPACE;
goto repeat;
case '#':
flags |= HEX_PREP;
goto repeat;
case '0':
flags |= ZEROPAD;
goto repeat;
}
// Get field width
field_width = -1;
if (is_digit(*fmt))
field_width = skip_atoi(&fmt);
else if (*fmt == '*')
{
fmt++;
field_width = va_arg(args, int);
if (field_width < 0)
{
field_width = -field_width;
flags |= LEFT;
}
}
// Get the precision
precision = -1;
if (*fmt == '.')
{
++fmt;
if (is_digit(*fmt))
precision = skip_atoi(&fmt);
else if (*fmt == '*')
{
++fmt;
precision = va_arg(args, int);
}
if (precision < 0)
precision = 0;
}
// Get the conversion qualifier
qualifier = -1;
if (*fmt == 'l' || *fmt == 'L')
{
qualifier = *fmt;
fmt++;
}
// Default base
base = 10;
switch (*fmt)
{
case 'c':
if (!(flags & LEFT))
while (--field_width > 0)
*str++ = ' ';
*str++ = (unsigned char)va_arg(args, int);
while (--field_width > 0)
*str++ = ' ';
continue;
case 's':
s = va_arg(args, char *);
if (!s)
s = "<NULL>";
len = strnlen(s, precision);
if (!(flags & LEFT))
while (len < field_width--)
*str++ = ' ';
for (i = 0; i < len; ++i)
*str++ = *s++;
while (len < field_width--)
*str++ = ' ';
continue;
case 'p':
if (field_width == -1)
{
field_width = 2 * sizeof(void *);
flags |= ZEROPAD;
}
str = number(str,
(unsigned long)va_arg(args, void *),
16,
field_width,
precision,
flags);
continue;
case 'A':
flags |= UPPERCASE;
case 'a':
if (qualifier == 'l')
str = eaddr(str,
va_arg(args, unsigned char *),
field_width,
precision,
flags);
else
str = iaddr(str,
va_arg(args, unsigned char *),
field_width,
precision,
flags);
continue;
// Integer number formats - set up the flags and "break"
case 'o':
base = 8;
break;
case 'X':
flags |= UPPERCASE;
case 'x':
base = 16;
break;
case 'd':
case 'i':
flags |= SIGN;
case 'u':
break;
#if HAS_FLOAT
case 'f':
str = flt(str,
va_arg(args, double),
field_width,
precision,
*fmt,
flags | SIGN);
continue;
#endif
default:
if (*fmt != '%')
*str++ = '%';
if (*fmt)
*str++ = *fmt;
else
--fmt;
continue;
}
if (qualifier == 'l')
num = va_arg(args, unsigned long);
else if (flags & SIGN)
num = va_arg(args, int);
else
num = va_arg(args, unsigned int);
str = number(str, num, base, field_width, precision, flags);
}
*str = '\0';
return str - buf;
}
void
uart_send_char(char c)
{
#error "You must implement the method uart_send_char to use this file!\n";
/* Output of a char to a UART usually follows the following model:
Wait until UART is ready
Write char to UART
Wait until UART is done
Or in code:
while (*UART_CONTROL_ADDRESS != UART_READY);
*UART_DATA_ADDRESS = c;
while (*UART_CONTROL_ADDRESS != UART_READY);
Check the UART sample code on your platform or the board
documentation.
*/
}
int
ee_printf(const char *fmt, ...)
{
char buf[1024], *p;
va_list args;
int n = 0;
va_start(args, fmt);
ee_vsprintf(buf, fmt, args);
va_end(args);
p = buf;
while (*p)
{
uart_send_char(*p);
n++;
p++;
}
return n;
}


@ -0,0 +1,183 @@
# Using CoreMark with bare-bone systems
This file only contains information for porting CoreMark to bare-bones systems. For other information (e.g. run rules), please see [README.md](README.md).
## Definition of bare-bone systems
The term bare-bones here means systems that provide only the bare minimum, i.e. the essential parts. A bare-bones processor system might not have a rich OS (e.g. Linux, Windows), in which case the system can also be referred to as bare-metal.
The bare-bones folder in the CoreMark repository provides the bare minimum to allow CoreMark to be ported to a processor system. As the code inside does not depend on an OS, it is the best starting point for porting CoreMark to a bare-metal system where no OS is used, such as a microcontroller or an embedded processor.
## Overview
CoreMark can be used with microcontrollers and embedded processor devices. Before you start porting CoreMark, please set up a project environment that provides:
- printf support
- timer support (e.g. based on a reference that has a constant clock frequency)
CoreMark needs to execute for at least 10 seconds. Therefore, when selecting a timer peripheral for timing measurement, you need to ensure that the timer can measure the whole duration of the CoreMark execution. For example, suppose you are using a microcontroller that has a 24-bit timer, and you use that timer as the timing reference. If the device is running at 100MHz and the timer is set up to run from the processor's clock, the longest time the timer can count is 0.16777 seconds before it overflows or reaches zero. To measure the execution time, you could set up that timer to interrupt at a rate of 1kHz and increment a counter variable inside the interrupt service routine. With this arrangement there is some software overhead, but the result should still be quite accurate.
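The interrupt-driven counter described above could look roughly like the sketch below. All names (`timer_isr`, `g_ms_ticks`, `get_ms_ticks`, `elapsed_ms`) are hypothetical and not part of CoreMark; how the routine is attached to a 1kHz timer interrupt depends entirely on your platform and is assumed here.

```C
#include <stdint.h>

/* Hypothetical names throughout: timer_isr() is assumed to be hooked to a
   1 kHz timer interrupt by your platform's startup or vector code. */
static volatile uint32_t g_ms_ticks = 0;

/* Interrupt service routine: one call per millisecond. */
void timer_isr(void)
{
    g_ms_ticks++;
}

/* Read the counter. On CPUs that cannot load 32 bits atomically, the ISR
   could fire in the middle of a read, so read twice until both agree. */
uint32_t get_ms_ticks(void)
{
    uint32_t first, second;
    do
    {
        first  = g_ms_ticks;
        second = g_ms_ticks;
    } while (first != second);
    return first;
}

/* Unsigned subtraction stays correct across a single counter wraparound. */
uint32_t elapsed_ms(uint32_t start, uint32_t end)
{
    return end - start;
}
```

A 32-bit millisecond counter wraps only after roughly 49.7 days, so a single wraparound-safe subtraction is enough for any realistic CoreMark run.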
If the timer is 32-bit, and provided that:
- the number of iterations is not too high, and
- the timer's frequency is not too high,
then it is possible to measure the entire execution period without timer overflow/underflow. For example, if a 32-bit timer increments/decrements at 100MHz, it takes 42.95 seconds to overflow/underflow. So it is possible to use the timer's value directly if the execution time is between 10 and 42.95 seconds.
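The overflow figures quoted for the 24-bit and 32-bit timers follow from dividing the counter range by the timer clock. A minimal sketch of that arithmetic, using a hypothetical helper name `timer_wrap_seconds` that is not part of CoreMark:

```C
/* Seconds until a free-running counter of the given width wraps when it is
   clocked at freq_hz. Hypothetical helper for planning purposes only. */
double timer_wrap_seconds(unsigned bits, double freq_hz)
{
    double   counts = 1.0;
    unsigned i;

    /* counts = 2^bits, computed without needing the math library */
    for (i = 0; i < bits; i++)
        counts *= 2.0;
    return counts / freq_hz;
}
```

With a 100MHz clock this gives about 0.168 seconds for 24 bits and about 42.95 seconds for 32 bits, which matches the numbers in the text.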
You also need to estimate a minimum number of iterations before you start. The number of iterations can be set using the C preprocessor macro "ITERATIONS". For a processor with around 4 CoreMark/MHz running at 100MHz, you need an iteration count of at least 4 (CoreMark/MHz) x 100 (MHz) x 10 (seconds) = 4000 iterations.
An incorrect timing reference is a common error, so please test your timing measurement code. For example, create a small program that waits for 10 seconds and compare that to an external timing measurement tool (e.g. a stopwatch).
Once those are ready, you can then port the CoreMark project. The following files are required:
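The iteration estimate is a plain product of three numbers. As a sketch, with a hypothetical helper name `min_iterations` and an expected score that you have to guess for your own device:

```C
/* Lower bound for ITERATIONS so that the run lasts at least min_seconds.
   coremark_per_mhz is an estimate of the expected score, not a measurement. */
unsigned long min_iterations(unsigned long coremark_per_mhz,
                             unsigned long clock_mhz,
                             unsigned long min_seconds)
{
    return coremark_per_mhz * clock_mhz * min_seconds;
}
```

For the example in the text, `min_iterations(4, 100, 10)` evaluates to 4000. If the real score turns out higher than the estimate, the run finishes early and ITERATIONS has to be raised.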
- Source files that are used without modifications
- [coremark/core_main.c](https://github.com/eembc/coremark/blob/main/core_main.c)
- [coremark/core_list_join.c](https://github.com/eembc/coremark/blob/main/core_list_join.c)
- [coremark/core_matrix.c](https://github.com/eembc/coremark/blob/main/core_matrix.c)
- [coremark/core_state.c](https://github.com/eembc/coremark/blob/main/core_state.c)
- [coremark/core_util.c](https://github.com/eembc/coremark/blob/main/core_util.c)
- [coremark/coremark.h](https://github.com/eembc/coremark/blob/main/coremark.h)
- Source files that need modifications
- [coremark/barebones/core_portme.c](https://github.com/eembc/coremark/blob/main/barebones/core_portme.c)
- [coremark/barebones/core_portme.h](https://github.com/eembc/coremark/blob/main/barebones/core_portme.h)
And of course you need to include the support files for timer and printf support.
In your project setup you also need to define pre-processing macros:
| Preprocessing macro | Description / value |
|---|---|
|ITERATIONS| Set to a number of iterations large enough that the CoreMark workload executes for at least 10 seconds. |
|STANDALONE| Set to indicate Standalone environment |
|PERFORMANCE_RUN / VALIDATION_RUN | Set to 1 |
## Modifications of core_portme.h
Several C macros require updates:
| Preprocessing macro | Value |
|---|---|
|HAS_FLOAT| 0 or 1 based on the processor/device|
|HAS_TIME_H| 0 |
|USE_CLOCK| 0 |
|HAS_STDIO| 1 |
|HAS_PRINTF| 1 |
The next section in the file depends on the C compiler that you are using. The original code below uses the GCC predefined macros **\_\_GNUC\_\_** and **\_\_VERSION\_\_** to print a message with the compiler's version. (See the [GCC documentation page](https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html) for additional information.)
```C
#ifndef COMPILER_VERSION
#ifdef __GNUC__
#define COMPILER_VERSION "GCC"__VERSION__
#else
#define COMPILER_VERSION "Please put compiler version here (e.g. gcc 4.1)"
#endif
#endif
#ifndef COMPILER_FLAGS
#define COMPILER_FLAGS \
FLAGS_STR /* "Please put compiler flags here (e.g. -o3)" */
#endif
```
You can insert additional compiler-specific information using the predefined macros of the toolchain that you use. For example, information about LLVM predefined macros is available [here](https://clang.llvm.org/docs/LanguageExtensions.html#builtin-macros).
And finally, set MAIN_HAS_NOARGC:
```C
#ifndef MAIN_HAS_NOARGC
#define MAIN_HAS_NOARGC 1
#endif
```
## Modifications of core_portme.c
Note: The following code is an example. You can change the code in other ways.
In this file, you might first need to declare external functions for printf, timer and cache support. For example, in my project I declared the following external functions:
```C
extern void timer_config(void); /* Initialize a timer peripheral */
extern void stdio_init(void); /* Initialize printf support (e.g. UART) */
extern void cache_init(void); /* Initialize processor's cache if available */
extern unsigned long get_100Hz_value(void); /* Read a timer value with 0.01 sec resolution */
```
Then I modified the barebones_clock() function as:
```C
CORETIMETYPE
barebones_clock()
{
/*#error \
"You must implement a method to measure time in barebones_clock()! This function should return current time.\n"
*/
return get_100Hz_value();
}
```
In this example, the timer value increments at 100Hz, so the score calculation code needs to be told this with the following settings:
```C
/* Define : TIMER_RES_DIVIDER
Divider to trade off timer resolution and total time that can be
measured.
Use lower values to increase resolution, but make sure that overflow
does not occur. If there are issues with the return value overflowing,
increase this value.
*/
#define CLOCKS_PER_SEC 100
#define GETMYTIME(_t) (*_t = barebones_clock())
#define MYTIMEDIFF(fin, ini) ((fin) - (ini))
#define TIMER_RES_DIVIDER 1
#define SAMPLE_TIME_IMPLEMENTATION 1
#define EE_TICKS_PER_SEC (CLOCKS_PER_SEC / TIMER_RES_DIVIDER)
```
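To see how these settings feed the score, the sketch below converts a tick difference into whole seconds and into iterations per second using integer arithmetic only. The names `ticks_t`, `EXAMPLE_TICKS_PER_SEC`, `ticks_to_whole_secs` and `iterations_per_sec` are stand-ins invented for this illustration; the real port uses `CORE_TICKS`, `EE_TICKS_PER_SEC` and `time_in_secs()`.

```C
/* Stand-ins for CORE_TICKS and EE_TICKS_PER_SEC, assuming the 100Hz tick
   and a TIMER_RES_DIVIDER of 1 from the settings above. */
typedef unsigned long ticks_t;
#define EXAMPLE_TICKS_PER_SEC 100UL

/* Whole seconds between two tick readings (fraction is truncated). */
unsigned long ticks_to_whole_secs(ticks_t start, ticks_t stop)
{
    return (stop - start) / EXAMPLE_TICKS_PER_SEC;
}

/* Iterations per second without floating point. Multiplying before
   dividing keeps precision, but can overflow for very large counts. */
unsigned long iterations_per_sec(unsigned long iterations, ticks_t elapsed)
{
    if (elapsed == 0)
        return 0; /* no measurable time has passed */
    return (iterations * EXAMPLE_TICKS_PER_SEC) / elapsed;
}
```

With a 100Hz tick, a 10 second run spans 1000 ticks, so a single tick of error changes the reported score by about 0.1%.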
Finally, the platform initialization code is updated as follows:
```C
/* Function : portable_init
Target specific initialization code
Test for some common mistakes.
*/
void
portable_init(core_portable *p, int *argc, char *argv[])
{
/* #error \
"Call board initialization routines in portable init (if needed), in particular initialize UART!\n"
*/
(void)argc; // prevent unused warning
(void)argv; // prevent unused warning
/* Hardware initialization */
stdio_init();
cache_init();
timer_config();
if (sizeof(ee_ptr_int) != sizeof(ee_u8 *))
{
ee_printf(
"ERROR! Please define ee_ptr_int to a type that holds a "
"pointer!\n");
}
if (sizeof(ee_u32) != 4)
{
ee_printf("ERROR! Please define ee_u32 to a 32b unsigned type!\n");
}
p->portable_id = 1;
}
```
## Additional considerations
CoreMark demonstrates some performance aspects of a processor, but it might not reflect the performance of your applications in the real world. For example, the critical workloads in CoreMark do not contain floating-point operations and are not data intensive. Also, because the memory footprint is quite small (this is necessary to allow CoreMark to be used on small, low-cost microcontrollers with limited memory), when CoreMark runs on a high-end processor system the benchmark can easily fit into the level 1 caches, so the performance of the memory system outside the L1 cache is not tested. [SPEC](https://spec.org) has other benchmarks that are suitable for testing the performance of high-end processor systems.
If you are using a microcontroller with flash memory for program storage, and if the processor inside comes with Instruction and Data caches, in most cases you need to enable both I and D caches to get the best performance. This is because the program image contains both program instructions and constant data.
Typically the CoreMark project fits within 32KB of ROM/flash and uses less than 32KB of RAM. The stack and heap sizes inside the RAM depend on the processor architecture as well as the toolchain being used. For example, some toolchains could use more RAM for printf and the floating-point library (note: floating-point operations could be used for benchmark result calculation). In toolchains for typical 32-bit microcontrollers, CoreMark usually uses less than 4KB of stack and 4KB of heap space.
Many microcontroller vendors and some toolchain vendors provide application notes that explain to their customers how to set up a CoreMark project to get the best performance.


@ -0,0 +1,595 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
#include "coremark.h"
/*
Topic: Description
Benchmark using a linked list.
Linked list is a common data structure used in many applications.
For our purposes, this will exercise the memory units of the processor.
In particular, usage of the list pointers to find and alter data.
We are not using Malloc since some platforms do not support this
library.
Instead, the memory block being passed in is used to create a list,
and the benchmark takes care not to add more items than can be
accommodated by the memory block. The porting layer will make sure
that we have a valid memory block.
All operations are done in place, without using any extra memory.
The list itself contains list pointers and pointers to data items.
Data items contain the following:
idx - An index that captures the initial order of the list.
data - Variable data initialized based on the input parameters. The 16b
are divided as follows:
o Upper 8b are backup of original data.
o Bit 7 indicates if the lower 7 bits are to be used as is or calculated.
o Bits 0-2 indicate type of operation to perform to get a 7b value.
o Bits 3-6 provide input for the operation.
*/
/* local functions */
list_head *core_list_find(list_head *list, list_data *info);
list_head *core_list_reverse(list_head *list);
list_head *core_list_remove(list_head *item);
list_head *core_list_undo_remove(list_head *item_removed,
list_head *item_modified);
list_head *core_list_insert_new(list_head * insert_point,
list_data * info,
list_head **memblock,
list_data **datablock,
list_head * memblock_end,
list_data * datablock_end);
typedef ee_s32 (*list_cmp)(list_data *a, list_data *b, core_results *res);
list_head *core_list_mergesort(list_head * list,
list_cmp cmp,
core_results *res);
ee_s16
calc_func(ee_s16 *pdata, core_results *res)
{
ee_s16 data = *pdata;
ee_s16 retval;
ee_u8 optype
= (data >> 7)
& 1; /* bit 7 indicates if the function result has been cached */
if (optype) /* if cached, use cache */
return (data & 0x007f);
else
{ /* otherwise calculate and cache the result */
ee_s16 flag = data & 0x7; /* bits 0-2 is type of function to perform */
ee_s16 dtype
= ((data >> 3)
& 0xf); /* bits 3-6 is specific data for the operation */
dtype |= dtype << 4; /* replicate the lower 4 bits to get an 8b value */
switch (flag)
{
case 0:
if (dtype < 0x22) /* set min period for bit corruption */
dtype = 0x22;
retval = core_bench_state(res->size,
res->memblock[3],
res->seed1,
res->seed2,
dtype,
res->crc);
if (res->crcstate == 0)
res->crcstate = retval;
break;
case 1:
retval = core_bench_matrix(&(res->mat), dtype, res->crc);
if (res->crcmatrix == 0)
res->crcmatrix = retval;
break;
default:
retval = data;
break;
}
res->crc = crcu16(retval, res->crc);
retval &= 0x007f;
*pdata = (data & 0xff00) | 0x0080 | retval; /* cache the result */
return retval;
}
}
/* Function: cmp_complex
Compare the data item in a list cell.
Can be used by mergesort.
*/
ee_s32
cmp_complex(list_data *a, list_data *b, core_results *res)
{
ee_s16 val1 = calc_func(&(a->data16), res);
ee_s16 val2 = calc_func(&(b->data16), res);
return val1 - val2;
}
/* Function: cmp_idx
Compare the idx item in a list cell, and regen the data.
Can be used by mergesort.
*/
ee_s32
cmp_idx(list_data *a, list_data *b, core_results *res)
{
if (res == NULL)
{
a->data16 = (a->data16 & 0xff00) | (0x00ff & (a->data16 >> 8));
b->data16 = (b->data16 & 0xff00) | (0x00ff & (b->data16 >> 8));
}
return a->idx - b->idx;
}
void
copy_info(list_data *to, list_data *from)
{
to->data16 = from->data16;
to->idx = from->idx;
}
/* Benchmark for linked list:
- Try to find multiple data items.
- List sort
- Operate on data from list (crc)
- Single remove/reinsert
* At the end of this function, the list is back to original state
*/
ee_u16
core_bench_list(core_results *res, ee_s16 finder_idx)
{
ee_u16 retval = 0;
ee_u16 found = 0, missed = 0;
list_head *list = res->list;
ee_s16 find_num = res->seed3;
list_head *this_find;
list_head *finder, *remover;
list_data info = {0};
ee_s16 i;
info.idx = finder_idx;
/* find <find_num> values in the list, and change the list each time
* (reverse and cache if value found) */
for (i = 0; i < find_num; i++)
{
info.data16 = (i & 0xff);
this_find = core_list_find(list, &info);
list = core_list_reverse(list);
if (this_find == NULL)
{
missed++;
retval += (list->next->info->data16 >> 8) & 1;
}
else
{
found++;
if (this_find->info->data16 & 0x1) /* use found value */
retval += (this_find->info->data16 >> 9) & 1;
/* and cache next item at the head of the list (if any) */
if (this_find->next != NULL)
{
finder = this_find->next;
this_find->next = finder->next;
finder->next = list->next;
list->next = finder;
}
}
if (info.idx >= 0)
info.idx++;
#if CORE_DEBUG
ee_printf("List find %d: [%d,%d,%d]\n", i, retval, missed, found);
#endif
}
retval += found * 4 - missed;
/* sort the list by data content and remove one item*/
if (finder_idx > 0)
list = core_list_mergesort(list, cmp_complex, res);
remover = core_list_remove(list->next);
/* CRC data content of list from location of index N forward, and then undo
* remove */
finder = core_list_find(list, &info);
if (!finder)
finder = list->next;
while (finder)
{
retval = crc16(list->info->data16, retval);
finder = finder->next;
}
#if CORE_DEBUG
ee_printf("List sort 1: %04x\n", retval);
#endif
remover = core_list_undo_remove(remover, list->next);
/* sort the list by index, in effect returning the list to original state */
list = core_list_mergesort(list, cmp_idx, NULL);
/* CRC data content of list */
finder = list->next;
while (finder)
{
retval = crc16(list->info->data16, retval);
finder = finder->next;
}
#if CORE_DEBUG
ee_printf("List sort 2: %04x\n", retval);
#endif
return retval;
}
/* Function: core_list_init
Initialize list with data.
Parameters:
blksize - Size of memory to be initialized.
memblock - Pointer to memory block.
seed - Actual values chosen depend on the seed parameter.
The seed parameter MUST be supplied from a source that cannot be
determined at compile time
Returns:
Pointer to the head of the list.
*/
list_head *
core_list_init(ee_u32 blksize, list_head *memblock, ee_s16 seed)
{
/* calculated pointers for the list */
ee_u32 per_item = 16 + sizeof(struct list_data_s);
ee_u32 size = (blksize / per_item)
- 2; /* to accommodate systems with 64b pointers, and make sure
same code is executed, set max list elements */
list_head *memblock_end = memblock + size;
list_data *datablock = (list_data *)(memblock_end);
list_data *datablock_end = datablock + size;
/* some useful variables */
ee_u32 i;
list_head *finder, *list = memblock;
list_data info;
/* create fake items for the list head and tail */
list->next = NULL;
list->info = datablock;
list->info->idx = 0x0000;
list->info->data16 = (ee_s16)0x8080;
memblock++;
datablock++;
info.idx = 0x7fff;
info.data16 = (ee_s16)0xffff;
core_list_insert_new(
list, &info, &memblock, &datablock, memblock_end, datablock_end);
/* then insert size items */
for (i = 0; i < size; i++)
{
ee_u16 datpat = ((ee_u16)(seed ^ i) & 0xf);
ee_u16 dat
= (datpat << 3) | (i & 0x7); /* alternate between algorithms */
info.data16 = (dat << 8) | dat; /* fill the data with actual data and
upper bits with rebuild value */
core_list_insert_new(
list, &info, &memblock, &datablock, memblock_end, datablock_end);
}
/* and now index the list so we know initial seed order of the list */
finder = list->next;
i = 1;
while (finder->next != NULL)
{
if (i < size / 5) /* first 20% of the list in order */
finder->info->idx = i++;
else
{
ee_u16 pat = (ee_u16)(i++ ^ seed); /* get a pseudo random number */
finder->info->idx = 0x3fff
& (((i & 0x07) << 8)
| pat); /* make sure the mixed items end up
after the ones in sequence */
}
finder = finder->next;
}
list = core_list_mergesort(list, cmp_idx, NULL);
#if CORE_DEBUG
ee_printf("Initialized list:\n");
finder = list;
while (finder)
{
ee_printf(
"[%04x,%04x]", finder->info->idx, (ee_u16)finder->info->data16);
finder = finder->next;
}
ee_printf("\n");
#endif
return list;
}
/* Function: core_list_insert_new
Insert an item to the list
Parameters:
insert_point - where to insert the item.
info - data for the cell.
memblock - pointer for the list header
datablock - pointer for the list data
memblock_end - end of region for list headers
datablock_end - end of region for list data
Returns:
Pointer to new item.
*/
list_head *
core_list_insert_new(list_head * insert_point,
list_data * info,
list_head **memblock,
list_data **datablock,
list_head * memblock_end,
list_data * datablock_end)
{
list_head *newitem;
if ((*memblock + 1) >= memblock_end)
return NULL;
if ((*datablock + 1) >= datablock_end)
return NULL;
newitem = *memblock;
(*memblock)++;
newitem->next = insert_point->next;
insert_point->next = newitem;
newitem->info = *datablock;
(*datablock)++;
copy_info(newitem->info, info);
return newitem;
}
/* Function: core_list_remove
Remove an item from the list.
Operation:
For a singly linked list, remove by copying the data from the next item
over to the current cell, and unlinking the next item.
Note:
since there is always a fake item at the end of the list, no need to
check for NULL.
Returns:
Removed item.
*/
list_head *
core_list_remove(list_head *item)
{
list_data *tmp;
list_head *ret = item->next;
/* swap data pointers */
tmp = item->info;
item->info = ret->info;
ret->info = tmp;
/* and eliminate item */
item->next = item->next->next;
ret->next = NULL;
return ret;
}
/* Function: core_list_undo_remove
Undo a remove operation.
Operation:
Since we want each iteration of the benchmark to be exactly the same,
we need to be able to undo a remove.
Link the removed item back into the list, and switch the info items.
Parameters:
item_removed - Return value from the <core_list_remove>
item_modified - List item that was modified during <core_list_remove>
Returns:
The item that was linked back to the list.
*/
list_head *
core_list_undo_remove(list_head *item_removed, list_head *item_modified)
{
list_data *tmp;
/* swap data pointers */
tmp = item_removed->info;
item_removed->info = item_modified->info;
item_modified->info = tmp;
/* and insert item */
item_removed->next = item_modified->next;
item_modified->next = item_removed;
return item_removed;
}
/* Function: core_list_find
Find an item in the list
Operation:
Find an item by idx (if idx is not negative) or by specific data value
(if idx is negative)
Parameters:
list - list head
info - idx or data to find
Returns:
Found item, or NULL if not found.
*/
list_head *
core_list_find(list_head *list, list_data *info)
{
if (info->idx >= 0)
{
while (list && (list->info->idx != info->idx))
list = list->next;
return list;
}
else
{
while (list && ((list->info->data16 & 0xff) != info->data16))
list = list->next;
return list;
}
}
/* Function: core_list_reverse
Reverse a list
Operation:
Rearrange the pointers so the list is reversed.
Parameters:
list - list head
Returns:
New head of the list (the former tail), or NULL if the list was empty.
*/
list_head *
core_list_reverse(list_head *list)
{
list_head *next = NULL, *tmp;
while (list)
{
tmp = list->next;
list->next = next;
next = list;
list = tmp;
}
return next;
}
/* Function: core_list_mergesort
Sort the list in place without recursion.
Description:
Use mergesort, as for linked list this is a realistic solution.
Also, since this is aimed at embedded, care was taken to use iterative
rather than recursive algorithm. The sort can either return the list to
original order (by idx), or use the data item to invoke other
algorithms and change the order of the list.
Parameters:
list - list to be sorted.
cmp - cmp function to use
Returns:
New head of the list.
Note:
We have a special header for the list that will always be first,
but the algorithm could theoretically modify where the list starts.
*/
list_head *
core_list_mergesort(list_head *list, list_cmp cmp, core_results *res)
{
list_head *p, *q, *e, *tail;
ee_s32 insize, nmerges, psize, qsize, i;
insize = 1;
while (1)
{
p = list;
list = NULL;
tail = NULL;
nmerges = 0; /* count number of merges we do in this pass */
while (p)
{
nmerges++; /* there exists a merge to be done */
/* step `insize' places along from p */
q = p;
psize = 0;
for (i = 0; i < insize; i++)
{
psize++;
q = q->next;
if (!q)
break;
}
/* if q hasn't fallen off end, we have two lists to merge */
qsize = insize;
/* now we have two lists; merge them */
while (psize > 0 || (qsize > 0 && q))
{
/* decide whether next element of merge comes from p or q */
if (psize == 0)
{
/* p is empty; e must come from q. */
e = q;
q = q->next;
qsize--;
}
else if (qsize == 0 || !q)
{
/* q is empty; e must come from p. */
e = p;
p = p->next;
psize--;
}
else if (cmp(p->info, q->info, res) <= 0)
{
/* First element of p is lower (or same); e must come from
* p. */
e = p;
p = p->next;
psize--;
}
else
{
/* First element of q is lower; e must come from q. */
e = q;
q = q->next;
qsize--;
}
/* add the next element to the merged list */
if (tail)
{
tail->next = e;
}
else
{
list = e;
}
tail = e;
}
/* now p has stepped `insize' places along, and q has too */
p = q;
}
tail->next = NULL;
/* If we have done only one merge, we're finished. */
if (nmerges <= 1) /* allow for nmerges==0, the empty list case */
return list;
/* Otherwise repeat, merging lists twice the size */
insize *= 2;
}
#if COMPILER_REQUIRES_SORT_RETURN
return list;
#endif
}


@ -0,0 +1,442 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
/* File: core_main.c
This file contains the framework to acquire a block of memory, seed
initial parameters, run the benchmark and report the results.
*/
#include "coremark.h"
/* Function: iterate
Run the benchmark for a specified number of iterations.
Operation:
For each type of benchmarked algorithm:
a - Initialize the data block for the algorithm.
b - Execute the algorithm N times.
Returns:
NULL.
*/
static ee_u16 list_known_crc[] = { (ee_u16)0xd4b0,
(ee_u16)0x3340,
(ee_u16)0x6a79,
(ee_u16)0xe714,
(ee_u16)0xe3c1 };
static ee_u16 matrix_known_crc[] = { (ee_u16)0xbe52,
(ee_u16)0x1199,
(ee_u16)0x5608,
(ee_u16)0x1fd7,
(ee_u16)0x0747 };
static ee_u16 state_known_crc[] = { (ee_u16)0x5e47,
(ee_u16)0x39bf,
(ee_u16)0xe5a4,
(ee_u16)0x8e3a,
(ee_u16)0x8d84 };
void *
iterate(void *pres)
{
ee_u32 i;
ee_u16 crc;
core_results *res = (core_results *)pres;
ee_u32 iterations = res->iterations;
res->crc = 0;
res->crclist = 0;
res->crcmatrix = 0;
res->crcstate = 0;
for (i = 0; i < iterations; i++)
{
crc = core_bench_list(res, 1);
res->crc = crcu16(crc, res->crc);
crc = core_bench_list(res, -1);
res->crc = crcu16(crc, res->crc);
if (i == 0)
res->crclist = res->crc;
}
return NULL;
}
#if (SEED_METHOD == SEED_ARG)
ee_s32 get_seed_args(int i, int argc, char *argv[]);
#define get_seed(x) (ee_s16) get_seed_args(x, argc, argv)
#define get_seed_32(x) get_seed_args(x, argc, argv)
#else /* via function or volatile */
ee_s32 get_seed_32(int i);
#define get_seed(x) (ee_s16) get_seed_32(x)
#endif
#if (MEM_METHOD == MEM_STATIC)
ee_u8 static_memblk[TOTAL_DATA_SIZE];
#endif
char *mem_name[3] = { "Static", "Heap", "Stack" };
/* Function: main
Main entry routine for the benchmark.
This function is responsible for the following steps:
1 - Initialize input seeds from a source that cannot be determined at
compile time.
2 - Initialize memory block for use.
3 - Run and time the benchmark.
4 - Report results, testing the validity of the output if the seeds are
known.
Arguments:
1 - first seed : Any value
2 - second seed : Must be identical to first for iterations to be
identical
3 - third seed : Any value, should be at least an order of magnitude
less than the input size, but bigger than 32.
4 - Iterations : Special, if set to 0, iterations will be automatically
determined such that the benchmark will run between 10 and 100 secs
*/
#if MAIN_HAS_NOARGC
MAIN_RETURN_TYPE
main(void)
{
int argc = 0;
char *argv[1];
#else
MAIN_RETURN_TYPE
main(int argc, char *argv[])
{
#endif
ee_u16 i, j = 0, num_algorithms = 0;
ee_s16 known_id = -1, total_errors = 0;
ee_u16 seedcrc = 0;
CORE_TICKS total_time;
core_results results[MULTITHREAD];
#if (MEM_METHOD == MEM_STACK)
ee_u8 stack_memblock[TOTAL_DATA_SIZE * MULTITHREAD];
#endif
/* first call any initializations needed */
portable_init(&(results[0].port), &argc, argv);
/* First some checks to make sure benchmark will run ok */
if (sizeof(struct list_head_s) > 128)
{
ee_printf("list_head structure too big for comparable data!\n");
return MAIN_RETURN_VAL;
}
results[0].seed1 = get_seed(1);
results[0].seed2 = get_seed(2);
results[0].seed3 = get_seed(3);
results[0].iterations = get_seed_32(4);
#if CORE_DEBUG
results[0].iterations = 1;
#endif
results[0].execs = get_seed_32(5);
if (results[0].execs == 0)
{ /* if not supplied, execute all algorithms */
results[0].execs = ALL_ALGORITHMS_MASK;
}
/* put in some default values based on one seed only for easy testing */
if ((results[0].seed1 == 0) && (results[0].seed2 == 0)
&& (results[0].seed3 == 0))
{ /* performance run */
results[0].seed1 = 0;
results[0].seed2 = 0;
results[0].seed3 = 0x66;
}
if ((results[0].seed1 == 1) && (results[0].seed2 == 0)
&& (results[0].seed3 == 0))
{ /* validation run */
results[0].seed1 = 0x3415;
results[0].seed2 = 0x3415;
results[0].seed3 = 0x66;
}
#if (MEM_METHOD == MEM_STATIC)
results[0].memblock[0] = (void *)static_memblk;
results[0].size = TOTAL_DATA_SIZE;
results[0].err = 0;
#if (MULTITHREAD > 1)
#error "Cannot use a static data area with multiple contexts!"
#endif
#elif (MEM_METHOD == MEM_MALLOC)
for (i = 0; i < MULTITHREAD; i++)
{
ee_s32 malloc_override = get_seed(7);
if (malloc_override != 0)
results[i].size = malloc_override;
else
results[i].size = TOTAL_DATA_SIZE;
results[i].memblock[0] = portable_malloc(results[i].size);
results[i].seed1 = results[0].seed1;
results[i].seed2 = results[0].seed2;
results[i].seed3 = results[0].seed3;
results[i].err = 0;
results[i].execs = results[0].execs;
}
#elif (MEM_METHOD == MEM_STACK)
for (i = 0; i < MULTITHREAD; i++)
{
results[i].memblock[0] = stack_memblock + i * TOTAL_DATA_SIZE;
results[i].size = TOTAL_DATA_SIZE;
results[i].seed1 = results[0].seed1;
results[i].seed2 = results[0].seed2;
results[i].seed3 = results[0].seed3;
results[i].err = 0;
results[i].execs = results[0].execs;
}
#else
#error "Please define a way to initialize a memory block."
#endif
/* Data init */
/* Find out how much space we have based on number of algorithms */
for (i = 0; i < NUM_ALGORITHMS; i++)
{
if ((1 << (ee_u32)i) & results[0].execs)
num_algorithms++;
}
for (i = 0; i < MULTITHREAD; i++)
results[i].size = results[i].size / num_algorithms;
/* Assign pointers */
for (i = 0; i < NUM_ALGORITHMS; i++)
{
ee_u32 ctx;
if ((1 << (ee_u32)i) & results[0].execs)
{
for (ctx = 0; ctx < MULTITHREAD; ctx++)
results[ctx].memblock[i + 1]
= (char *)(results[ctx].memblock[0]) + results[0].size * j;
j++;
}
}
/* call inits */
for (i = 0; i < MULTITHREAD; i++)
{
if (results[i].execs & ID_LIST)
{
results[i].list = core_list_init(
results[0].size, results[i].memblock[1], results[i].seed1);
}
if (results[i].execs & ID_MATRIX)
{
core_init_matrix(results[0].size,
results[i].memblock[2],
(ee_s32)results[i].seed1
| (((ee_s32)results[i].seed2) << 16),
&(results[i].mat));
}
if (results[i].execs & ID_STATE)
{
core_init_state(
results[0].size, results[i].seed1, results[i].memblock[3]);
}
}
/* automatically determine number of iterations if not set */
if (results[0].iterations == 0)
{
secs_ret secs_passed = 0;
ee_u32 divisor;
results[0].iterations = 1;
while (secs_passed < (secs_ret)1)
{
results[0].iterations *= 10;
start_time();
iterate(&results[0]);
stop_time();
secs_passed = time_in_secs(get_time());
}
/* now we know it executes for at least 1 sec, set actual run time at
* about 10 secs */
divisor = (ee_u32)secs_passed;
if (divisor == 0) /* some machines cast float to int as 0 since this
conversion is not defined by ANSI, but we know at
least one second passed */
divisor = 1;
results[0].iterations *= 1 + 10 / divisor;
}
/* perform actual benchmark */
start_time();
#if (MULTITHREAD > 1)
if (default_num_contexts > MULTITHREAD)
{
default_num_contexts = MULTITHREAD;
}
for (i = 0; i < default_num_contexts; i++)
{
results[i].iterations = results[0].iterations;
results[i].execs = results[0].execs;
core_start_parallel(&results[i]);
}
for (i = 0; i < default_num_contexts; i++)
{
core_stop_parallel(&results[i]);
}
#else
iterate(&results[0]);
#endif
stop_time();
total_time = get_time();
/* get a function of the input to report */
seedcrc = crc16(results[0].seed1, seedcrc);
seedcrc = crc16(results[0].seed2, seedcrc);
seedcrc = crc16(results[0].seed3, seedcrc);
seedcrc = crc16(results[0].size, seedcrc);
switch (seedcrc)
{ /* test known output for common seeds */
case 0x8a02: /* seed1=0, seed2=0, seed3=0x66, size 2000 per algorithm */
known_id = 0;
ee_printf("6k performance run parameters for coremark.\n");
break;
case 0x7b05: /* seed1=0x3415, seed2=0x3415, seed3=0x66, size 2000 per
algorithm */
known_id = 1;
ee_printf("6k validation run parameters for coremark.\n");
break;
case 0x4eaf: /* seed1=0x8, seed2=0x8, seed3=0x8, size 400 per algorithm
*/
known_id = 2;
ee_printf("Profile generation run parameters for coremark.\n");
break;
case 0xe9f5: /* seed1=0, seed2=0, seed3=0x66, size 666 per algorithm */
known_id = 3;
ee_printf("2K performance run parameters for coremark.\n");
break;
case 0x18f2: /* seed1=0x3415, seed2=0x3415, seed3=0x66, size 666 per
algorithm */
known_id = 4;
ee_printf("2K validation run parameters for coremark.\n");
break;
default:
total_errors = -1;
break;
}
if (known_id >= 0)
{
for (i = 0; i < default_num_contexts; i++)
{
results[i].err = 0;
if ((results[i].execs & ID_LIST)
&& (results[i].crclist != list_known_crc[known_id]))
{
ee_printf("[%u]ERROR! list crc 0x%04x - should be 0x%04x\n",
i,
results[i].crclist,
list_known_crc[known_id]);
results[i].err++;
}
if ((results[i].execs & ID_MATRIX)
&& (results[i].crcmatrix != matrix_known_crc[known_id]))
{
ee_printf("[%u]ERROR! matrix crc 0x%04x - should be 0x%04x\n",
i,
results[i].crcmatrix,
matrix_known_crc[known_id]);
results[i].err++;
}
if ((results[i].execs & ID_STATE)
&& (results[i].crcstate != state_known_crc[known_id]))
{
ee_printf("[%u]ERROR! state crc 0x%04x - should be 0x%04x\n",
i,
results[i].crcstate,
state_known_crc[known_id]);
results[i].err++;
}
total_errors += results[i].err;
}
}
total_errors += check_data_types();
/* and report results */
ee_printf("CoreMark Size : %lu\n", (long unsigned)results[0].size);
ee_printf("Total ticks : %lu\n", (long unsigned)total_time);
#if HAS_FLOAT
ee_printf("Total time (secs): %f\n", time_in_secs(total_time));
if (time_in_secs(total_time) > 0)
ee_printf("Iterations/Sec : %f\n",
default_num_contexts * results[0].iterations
/ time_in_secs(total_time));
#else
ee_printf("Total time (secs): %d\n", time_in_secs(total_time));
if (time_in_secs(total_time) > 0)
ee_printf("Iterations/Sec : %d\n",
default_num_contexts * results[0].iterations
/ time_in_secs(total_time));
#endif
if (time_in_secs(total_time) < 10)
{
ee_printf(
"ERROR! Must execute for at least 10 secs for a valid result!\n");
total_errors++;
}
ee_printf("Iterations : %lu\n",
(long unsigned)default_num_contexts * results[0].iterations);
ee_printf("Compiler version : %s\n", COMPILER_VERSION);
ee_printf("Compiler flags : %s\n", COMPILER_FLAGS);
#if (MULTITHREAD > 1)
ee_printf("Parallel %s : %d\n", PARALLEL_METHOD, default_num_contexts);
#endif
ee_printf("Memory location : %s\n", MEM_LOCATION);
/* output for verification */
ee_printf("seedcrc : 0x%04x\n", seedcrc);
if (results[0].execs & ID_LIST)
for (i = 0; i < default_num_contexts; i++)
ee_printf("[%d]crclist : 0x%04x\n", i, results[i].crclist);
if (results[0].execs & ID_MATRIX)
for (i = 0; i < default_num_contexts; i++)
ee_printf("[%d]crcmatrix : 0x%04x\n", i, results[i].crcmatrix);
if (results[0].execs & ID_STATE)
for (i = 0; i < default_num_contexts; i++)
ee_printf("[%d]crcstate : 0x%04x\n", i, results[i].crcstate);
for (i = 0; i < default_num_contexts; i++)
ee_printf("[%d]crcfinal : 0x%04x\n", i, results[i].crc);
if (total_errors == 0)
{
ee_printf(
"Correct operation validated. See README.md for run and reporting "
"rules.\n");
#if HAS_FLOAT
if (known_id == 3)
{
ee_printf("CoreMark 1.0 : %f / %s %s",
default_num_contexts * results[0].iterations
/ time_in_secs(total_time),
COMPILER_VERSION,
COMPILER_FLAGS);
#if defined(MEM_LOCATION) && !defined(MEM_LOCATION_UNSPEC)
ee_printf(" / %s", MEM_LOCATION);
#else
ee_printf(" / %s", mem_name[MEM_METHOD]);
#endif
#if (MULTITHREAD > 1)
ee_printf(" / %d:%s", default_num_contexts, PARALLEL_METHOD);
#endif
ee_printf("\n");
}
#endif
}
if (total_errors > 0)
ee_printf("Errors detected\n");
if (total_errors < 0)
ee_printf(
"Cannot validate operation for these seed values, please compare "
"with results on a known platform.\n");
#if (MEM_METHOD == MEM_MALLOC)
for (i = 0; i < MULTITHREAD; i++)
portable_free(results[i].memblock[0]);
#endif
/* And last call any target specific code for finalizing */
portable_fini(&(results[0].port));
return MAIN_RETURN_VAL;
}
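
The automatic iteration search in the file above can be looked at in isolation. The sketch below is a simplified restatement, not the CoreMark code itself: it multiplies the count by 10 until one run takes at least a second, then scales the count toward roughly 10 seconds. The names `calibrate_iterations`, `run_fn`, and `fake_run` are hypothetical, and the timer is replaced by a callback that reports whole elapsed seconds.

```c
#include <stdint.h>

/* Hypothetical callback: runs `iterations` passes of the workload and
 * returns the elapsed time in whole seconds. */
typedef uint32_t (*run_fn)(uint32_t iterations, void *ctx);

/* Grow the count by 10x until a run lasts at least 1 second, then
 * scale it by (1 + 10 / secs), mirroring the source's arithmetic. */
uint32_t calibrate_iterations(run_fn run, void *ctx)
{
    uint32_t iterations = 1;
    uint32_t secs = 0;
    uint32_t divisor;

    while (secs < 1) {
        iterations *= 10;
        secs = run(iterations, ctx);
    }
    /* Same guard as the source: never divide by zero. */
    divisor = (secs == 0) ? 1 : secs;
    return iterations * (1 + 10 / divisor);
}

/* Simulated workload (assumption for illustration only): ctx points at
 * an iterations-per-second rate, so elapsed time is a plain division. */
uint32_t fake_run(uint32_t iterations, void *ctx)
{
    uint32_t rate = *(const uint32_t *)ctx;
    return iterations / rate;
}
```

With a simulated rate of 300 iterations per second, the search stops at 1000 iterations (3 whole seconds) and the final count is 1000 * (1 + 10 / 3) = 4000. Because of the integer division, the result only approximates a 10 second run.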


@ -0,0 +1,359 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
#include "coremark.h"
/*
Topic: Description
Matrix manipulation benchmark
This very simple algorithm forms the basis of many more complex
algorithms.
The tight inner loop is the focus of many optimizations (compiler as
well as hardware based) and is thus relevant for embedded processing.
The total available data space will be divided into 3 parts:
NxN Matrix A - initialized with small values (upper 3/4 of the bits all zero).
NxN Matrix B - initialized with medium values (upper half of the bits all zero).
NxN Matrix C - used for the result.
The actual values for A and B must be derived based on input that is not
available at compile time.
*/
ee_s16 matrix_test(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B, MATDAT val);
ee_s16 matrix_sum(ee_u32 N, MATRES *C, MATDAT clipval);
void matrix_mul_const(ee_u32 N, MATRES *C, MATDAT *A, MATDAT val);
void matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B);
void matrix_mul_matrix(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B);
void matrix_mul_matrix_bitextract(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B);
void matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val);
#define matrix_test_next(x) (x + 1)
#define matrix_clip(x, y) ((y) ? (x)&0x0ff : (x)&0x0ffff)
#define matrix_big(x) (0xf000 | (x))
#define bit_extract(x, from, to) (((x) >> (from)) & (~(0xffffffff << (to))))
#if CORE_DEBUG
void
printmat(MATDAT *A, ee_u32 N, char *name)
{
ee_u32 i, j;
ee_printf("Matrix %s [%dx%d]:\n", name, N, N);
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
if (j != 0)
ee_printf(",");
ee_printf("%d", A[i * N + j]);
}
ee_printf("\n");
}
}
void
printmatC(MATRES *C, ee_u32 N, char *name)
{
ee_u32 i, j;
ee_printf("Matrix %s [%dx%d]:\n", name, N, N);
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
if (j != 0)
ee_printf(",");
ee_printf("%d", C[i * N + j]);
}
ee_printf("\n");
}
}
#endif
/* Function: core_bench_matrix
Benchmark function
Iterate <matrix_test> N times,
changing the matrix values slightly by a constant amount each time.
*/
ee_u16
core_bench_matrix(mat_params *p, ee_s16 seed, ee_u16 crc)
{
ee_u32 N = p->N;
MATRES *C = p->C;
MATDAT *A = p->A;
MATDAT *B = p->B;
MATDAT val = (MATDAT)seed;
crc = crc16(matrix_test(N, C, A, B, val), crc);
return crc;
}
/* Function: matrix_test
Perform matrix manipulation.
Parameters:
N - Dimensions of the matrix.
C - memory for result matrix.
A - input matrix
B - operator matrix (not changed during operations)
Returns:
A CRC value that captures all results calculated in the function.
In particular, crc of the value calculated on the result matrix
after each step by <matrix_sum>.
Operation:
1 - Add a constant value to all elements of a matrix.
2 - Multiply a matrix by a constant.
3 - Multiply a matrix by a vector.
4 - Multiply a matrix by a matrix.
5 - Add a constant value to all elements of a matrix.
After the last step, matrix A is back to original contents.
*/
ee_s16
matrix_test(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B, MATDAT val)
{
ee_u16 crc = 0;
MATDAT clipval = matrix_big(val);
matrix_add_const(N, A, val); /* make sure data changes */
#if CORE_DEBUG
printmat(A, N, "matrix_add_const");
#endif
matrix_mul_const(N, C, A, val);
crc = crc16(matrix_sum(N, C, clipval), crc);
#if CORE_DEBUG
printmatC(C, N, "matrix_mul_const");
#endif
matrix_mul_vect(N, C, A, B);
crc = crc16(matrix_sum(N, C, clipval), crc);
#if CORE_DEBUG
printmatC(C, N, "matrix_mul_vect");
#endif
matrix_mul_matrix(N, C, A, B);
crc = crc16(matrix_sum(N, C, clipval), crc);
#if CORE_DEBUG
printmatC(C, N, "matrix_mul_matrix");
#endif
matrix_mul_matrix_bitextract(N, C, A, B);
crc = crc16(matrix_sum(N, C, clipval), crc);
#if CORE_DEBUG
printmatC(C, N, "matrix_mul_matrix_bitextract");
#endif
matrix_add_const(N, A, -val); /* return matrix to initial value */
return crc;
}
/* Function: core_init_matrix
Initialize the memory block for matrix benchmarking.
Parameters:
blksize - Size of memory to be initialized.
memblk - Pointer to memory block.
seed - Actual values chosen depend on the seed parameter.
p - pointer to <mat_params> containing initialized matrices.
Returns:
Matrix dimensions.
Note:
The seed parameter MUST be supplied from a source that cannot be
determined at compile time
*/
ee_u32
core_init_matrix(ee_u32 blksize, void *memblk, ee_s32 seed, mat_params *p)
{
ee_u32 N = 0;
MATDAT *A;
MATDAT *B;
ee_s32 order = 1;
MATDAT val;
ee_u32 i = 0, j = 0;
if (seed == 0)
seed = 1;
while (j < blksize)
{
i++;
j = i * i * 2 * 4;
}
N = i - 1;
A = (MATDAT *)align_mem(memblk);
B = A + N * N;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
seed = ((order * seed) % 65536);
val = (seed + order);
val = matrix_clip(val, 0);
B[i * N + j] = val;
val = (val + order);
val = matrix_clip(val, 1);
A[i * N + j] = val;
order++;
}
}
p->A = A;
p->B = B;
p->C = (MATRES *)align_mem(B + N * N);
p->N = N;
#if CORE_DEBUG
printmat(A, N, "A");
printmat(B, N, "B");
#endif
return N;
}
/* Function: matrix_sum
Calculate a function that depends on the values of elements in the
matrix.
For each element, accumulate into a temporary variable.
As long as this value is under the parameter clipval,
add 1 to the result if the element is bigger than the previous.
Otherwise, reset the accumulator and add 10 to the result.
*/
ee_s16
matrix_sum(ee_u32 N, MATRES *C, MATDAT clipval)
{
MATRES tmp = 0, prev = 0, cur = 0;
ee_s16 ret = 0;
ee_u32 i, j;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
cur = C[i * N + j];
tmp += cur;
if (tmp > clipval)
{
ret += 10;
tmp = 0;
}
else
{
ret += (cur > prev) ? 1 : 0;
}
prev = cur;
}
}
return ret;
}
/* Function: matrix_mul_const
Multiply a matrix by a constant.
This could be used as a scaler for instance.
*/
void
matrix_mul_const(ee_u32 N, MATRES *C, MATDAT *A, MATDAT val)
{
ee_u32 i, j;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
C[i * N + j] = (MATRES)A[i * N + j] * (MATRES)val;
}
}
}
/* Function: matrix_add_const
Add a constant value to all elements of a matrix.
*/
void
matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val)
{
ee_u32 i, j;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
A[i * N + j] += val;
}
}
}
/* Function: matrix_mul_vect
Multiply a matrix by a vector.
This is common in many simple filters (e.g. fir where a vector of
coefficients is applied to the matrix.)
*/
void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
ee_u32 i, j;
for (i = 0; i < N; i++)
{
C[i] = 0;
for (j = 0; j < N; j++)
{
C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
}
}
}
/* Function: matrix_mul_matrix
Multiply a matrix by a matrix.
Basic code is used in many algorithms, mostly with minor changes such as
scaling.
*/
void
matrix_mul_matrix(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
ee_u32 i, j, k;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
C[i * N + j] = 0;
for (k = 0; k < N; k++)
{
C[i * N + j] += (MATRES)A[i * N + k] * (MATRES)B[k * N + j];
}
}
}
}
/* Function: matrix_mul_matrix_bitextract
Multiply a matrix by a matrix, and extract some bits from the result.
Basic code is used in many algorithms, mostly with minor changes such as
scaling.
*/
void
matrix_mul_matrix_bitextract(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
ee_u32 i, j, k;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
C[i * N + j] = 0;
for (k = 0; k < N; k++)
{
MATRES tmp = (MATRES)A[i * N + k] * (MATRES)B[k * N + j];
C[i * N + j] += bit_extract(tmp, 2, 4) * bit_extract(tmp, 5, 7);
}
}
}
}
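
Two details of the matrix code above are easy to misread, so here is a small standalone sketch of both. It uses `<stdint.h>` types instead of the `ee_*` typedefs, and the names `matrix_dim_for_block` and `extract_bits` are hypothetical. The first function repeats the sizing loop from `core_init_matrix`, which picks the largest N with 8 * N * N strictly below the block size; the factor 8 appears to correspond to two 16-bit inputs plus one 32-bit result per element position (2 + 2 + 4 bytes under the `MATDAT`/`MATRES` typedefs). The second restates `bit_extract`, whose third argument behaves as a bit count rather than an end position.

```c
#include <stdint.h>

/* Largest N such that N * N * 2 * 4 < blksize, found the same way as
 * the loop in core_init_matrix. Unlike the source, a block size of 0
 * returns 0 here instead of wrapping around (added guard). */
uint32_t matrix_dim_for_block(uint32_t blksize)
{
    uint32_t i = 0, j = 0;

    while (j < blksize) {
        i++;
        j = i * i * 2 * 4;
    }
    return i ? i - 1 : 0;
}

/* Same expression as the bit_extract macro: shift right by `from`,
 * then keep the low `width` bits. */
uint32_t extract_bits(uint32_t x, unsigned from, unsigned width)
{
    return (x >> from) & ~(0xffffffffu << width);
}
```

For the per-algorithm sizes named in the run-parameter comments, this gives N = 9 for 666 bytes and N = 15 for 2000 bytes. `extract_bits(0xABCD, 2, 4)` yields 3 (bits 2 to 5) and `extract_bits(0xABCD, 5, 7)` yields 0x5E (bits 5 to 11).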


@ -0,0 +1,330 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
#include "coremark.h"
/* local functions */
enum CORE_STATE core_state_transition(ee_u8 **instr, ee_u32 *transition_count);
/*
Topic: Description
Simple state machines like this one are used in many embedded products.
For more complex state machines, sometimes a state transition table
implementation is used instead, trading speed of direct coding for ease of
maintenance.
Since the main goal of using a state machine in CoreMark is to exercise
the switch/if behaviour, we are using a small Moore machine.
In particular, this machine tests type of string input,
trying to determine whether the input is a number or something else.
(see core_state.png).
*/
/* Function: core_bench_state
Benchmark function
Go over the input twice, once direct, and once after introducing some
corruption.
*/
ee_u16
core_bench_state(ee_u32 blksize,
ee_u8 *memblock,
ee_s16 seed1,
ee_s16 seed2,
ee_s16 step,
ee_u16 crc)
{
ee_u32 final_counts[NUM_CORE_STATES];
ee_u32 track_counts[NUM_CORE_STATES];
ee_u8 *p = memblock;
ee_u32 i;
#if CORE_DEBUG
ee_printf("State Bench: %d,%d,%d,%04x\n", seed1, seed2, step, crc);
#endif
for (i = 0; i < NUM_CORE_STATES; i++)
{
final_counts[i] = track_counts[i] = 0;
}
/* run the state machine over the input */
while (*p != 0)
{
enum CORE_STATE fstate = core_state_transition(&p, track_counts);
final_counts[fstate]++;
#if CORE_DEBUG
ee_printf("%d,", fstate);
}
ee_printf("\n");
#else
}
#endif
p = memblock;
while (p < (memblock + blksize))
{ /* insert some corruption */
if (*p != ',')
*p ^= (ee_u8)seed1;
p += step;
}
p = memblock;
/* run the state machine over the input again */
while (*p != 0)
{
enum CORE_STATE fstate = core_state_transition(&p, track_counts);
final_counts[fstate]++;
#if CORE_DEBUG
ee_printf("%d,", fstate);
}
ee_printf("\n");
#else
}
#endif
p = memblock;
while (p < (memblock + blksize))
{ /* undo corruption if seed1 and seed2 are equal */
if (*p != ',')
*p ^= (ee_u8)seed2;
p += step;
}
/* end timing */
for (i = 0; i < NUM_CORE_STATES; i++)
{
crc = crcu32(final_counts[i], crc);
crc = crcu32(track_counts[i], crc);
}
return crc;
}
/* Default initialization patterns */
static ee_u8 *intpat[4]
= { (ee_u8 *)"5012", (ee_u8 *)"1234", (ee_u8 *)"-874", (ee_u8 *)"+122" };
static ee_u8 *floatpat[4] = { (ee_u8 *)"35.54400",
(ee_u8 *)".1234500",
(ee_u8 *)"-110.700",
(ee_u8 *)"+0.64400" };
static ee_u8 *scipat[4] = { (ee_u8 *)"5.500e+3",
(ee_u8 *)"-.123e-2",
(ee_u8 *)"-87e+832",
(ee_u8 *)"+0.6e-12" };
static ee_u8 *errpat[4] = { (ee_u8 *)"T0.3e-1F",
(ee_u8 *)"-T.T++Tq",
(ee_u8 *)"1T3.4e4z",
(ee_u8 *)"34.0e-T^" };
/* Function: core_init_state
Initialize the input data for the state machine.
Populate the input with several predetermined strings, interspersed.
Actual patterns chosen depend on the seed parameter.
Note:
The seed parameter MUST be supplied from a source that cannot be
determined at compile time
*/
void
core_init_state(ee_u32 size, ee_s16 seed, ee_u8 *p)
{
ee_u32 total = 0, next = 0, i;
ee_u8 *buf = 0;
#if CORE_DEBUG
ee_u8 *start = p;
ee_printf("State: %d,%d\n", size, seed);
#endif
size--;
next = 0;
while ((total + next + 1) < size)
{
if (next > 0)
{
for (i = 0; i < next; i++)
*(p + total + i) = buf[i];
*(p + total + i) = ',';
total += next + 1;
}
seed++;
switch (seed & 0x7)
{
case 0: /* int */
case 1: /* int */
case 2: /* int */
buf = intpat[(seed >> 3) & 0x3];
next = 4;
break;
case 3: /* float */
case 4: /* float */
buf = floatpat[(seed >> 3) & 0x3];
next = 8;
break;
case 5: /* scientific */
case 6: /* scientific */
buf = scipat[(seed >> 3) & 0x3];
next = 8;
break;
case 7: /* invalid */
buf = errpat[(seed >> 3) & 0x3];
next = 8;
break;
default: /* Never happen, just to make some compilers happy */
break;
}
}
size++;
while (total < size)
{ /* fill the rest with 0 */
*(p + total) = 0;
total++;
}
#if CORE_DEBUG
ee_printf("State Input: %s\n", start);
#endif
}
static ee_u8
ee_isdigit(ee_u8 c)
{
ee_u8 retval;
retval = ((c >= '0') & (c <= '9')) ? 1 : 0;
return retval;
}
/* Function: core_state_transition
Actual state machine.
The state machine will continue scanning until either:
1 - an invalid input is detected.
2 - a valid number has been detected.
The input pointer is updated to point to the end of the token, and the
end state is returned (either specific format determined or invalid).
*/
enum CORE_STATE
core_state_transition(ee_u8 **instr, ee_u32 *transition_count)
{
ee_u8 * str = *instr;
ee_u8 NEXT_SYMBOL;
enum CORE_STATE state = CORE_START;
for (; *str && state != CORE_INVALID; str++)
{
NEXT_SYMBOL = *str;
if (NEXT_SYMBOL == ',') /* end of this input */
{
str++;
break;
}
switch (state)
{
case CORE_START:
if (ee_isdigit(NEXT_SYMBOL))
{
state = CORE_INT;
}
else if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-')
{
state = CORE_S1;
}
else if (NEXT_SYMBOL == '.')
{
state = CORE_FLOAT;
}
else
{
state = CORE_INVALID;
transition_count[CORE_INVALID]++;
}
transition_count[CORE_START]++;
break;
case CORE_S1:
if (ee_isdigit(NEXT_SYMBOL))
{
state = CORE_INT;
transition_count[CORE_S1]++;
}
else if (NEXT_SYMBOL == '.')
{
state = CORE_FLOAT;
transition_count[CORE_S1]++;
}
else
{
state = CORE_INVALID;
transition_count[CORE_S1]++;
}
break;
case CORE_INT:
if (NEXT_SYMBOL == '.')
{
state = CORE_FLOAT;
transition_count[CORE_INT]++;
}
else if (!ee_isdigit(NEXT_SYMBOL))
{
state = CORE_INVALID;
transition_count[CORE_INT]++;
}
break;
case CORE_FLOAT:
if (NEXT_SYMBOL == 'E' || NEXT_SYMBOL == 'e')
{
state = CORE_S2;
transition_count[CORE_FLOAT]++;
}
else if (!ee_isdigit(NEXT_SYMBOL))
{
state = CORE_INVALID;
transition_count[CORE_FLOAT]++;
}
break;
case CORE_S2:
if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-')
{
state = CORE_EXPONENT;
transition_count[CORE_S2]++;
}
else
{
state = CORE_INVALID;
transition_count[CORE_S2]++;
}
break;
case CORE_EXPONENT:
if (ee_isdigit(NEXT_SYMBOL))
{
state = CORE_SCIENTIFIC;
transition_count[CORE_EXPONENT]++;
}
else
{
state = CORE_INVALID;
transition_count[CORE_EXPONENT]++;
}
break;
case CORE_SCIENTIFIC:
if (!ee_isdigit(NEXT_SYMBOL))
{
state = CORE_INVALID;
transition_count[CORE_INVALID]++;
}
break;
default:
break;
}
}
*instr = str;
return state;
}
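
The corruption and restore passes in `core_bench_state` above rely on XOR being its own inverse. The sketch below isolates that step; `xor_stride` is a hypothetical name and the function is an illustration, not part of CoreMark. It XORs every `step`-th byte with a key and leaves the ',' separators alone, as the source loops do.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR every step-th byte of buf with key, skipping ',' separators.
 * step must be greater than 0. Calling this twice with the same key
 * restores the buffer, provided no byte turns into ',' after the
 * first pass (such a byte would be skipped by the second pass). */
void xor_stride(uint8_t *buf, size_t len, uint8_t key, size_t step)
{
    size_t i;

    for (i = 0; i < len; i += step) {
        if (buf[i] != ',')
            buf[i] ^= key;
    }
}
```

Applied to "5012,1234" with key 0x15 and step 3, the first call changes the bytes at offsets 0, 3 and 6, and a second call with the same key brings the original text back. This matches the source comment that the corruption is undone only when `seed1` and `seed2` are equal.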


@ -0,0 +1,249 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
#include "coremark.h"
/* Function: get_seed
Get a value that cannot be determined at compile time.
Since different embedded systems and compilers are used, 3 different
methods are provided:
1 - Using a volatile variable. This method is only valid if the compiler
is forced to generate code that reads the value of a volatile variable
from memory at run time. Please note, if using this method, you would need
to modify core_portme.c to generate training profile.
2 - Command line arguments. This is the preferred method if command line
arguments are supported.
3 - System function. If none of the first 2 methods is available on the
platform, a system function which is not a stub can be used,
e.g. read the value on GPIO pins connected to switches, or invoke
special simulator functions.
*/
#if (SEED_METHOD == SEED_VOLATILE)
extern volatile ee_s32 seed1_volatile;
extern volatile ee_s32 seed2_volatile;
extern volatile ee_s32 seed3_volatile;
extern volatile ee_s32 seed4_volatile;
extern volatile ee_s32 seed5_volatile;
ee_s32
get_seed_32(int i)
{
ee_s32 retval;
switch (i)
{
case 1:
retval = seed1_volatile;
break;
case 2:
retval = seed2_volatile;
break;
case 3:
retval = seed3_volatile;
break;
case 4:
retval = seed4_volatile;
break;
case 5:
retval = seed5_volatile;
break;
default:
retval = 0;
break;
}
return retval;
}
#elif (SEED_METHOD == SEED_ARG)
ee_s32
parseval(char *valstring)
{
ee_s32 retval = 0;
ee_s32 neg = 1;
int hexmode = 0;
if (*valstring == '-')
{
neg = -1;
valstring++;
}
if ((valstring[0] == '0') && (valstring[1] == 'x'))
{
hexmode = 1;
valstring += 2;
}
/* first look for digits */
if (hexmode)
{
while (((*valstring >= '0') && (*valstring <= '9'))
|| ((*valstring >= 'a') && (*valstring <= 'f')))
{
ee_s32 digit = *valstring - '0';
if (digit > 9)
digit = 10 + *valstring - 'a';
retval *= 16;
retval += digit;
valstring++;
}
}
else
{
while ((*valstring >= '0') && (*valstring <= '9'))
{
ee_s32 digit = *valstring - '0';
retval *= 10;
retval += digit;
valstring++;
}
}
/* now add qualifiers */
if (*valstring == 'K')
retval *= 1024;
if (*valstring == 'M')
retval *= 1024 * 1024;
retval *= neg;
return retval;
}
ee_s32
get_seed_args(int i, int argc, char *argv[])
{
if (argc > i)
return parseval(argv[i]);
return 0;
}
#elif (SEED_METHOD == SEED_FUNC)
/* If using OS based function, you must define and implement the functions below
* in core_portme.h and core_portme.c ! */
ee_s32
get_seed_32(int i)
{
ee_s32 retval;
switch (i)
{
case 1:
retval = portme_sys1();
break;
case 2:
retval = portme_sys2();
break;
case 3:
retval = portme_sys3();
break;
case 4:
retval = portme_sys4();
break;
case 5:
retval = portme_sys5();
break;
default:
retval = 0;
break;
}
return retval;
}
#endif
/* Function: crc*
Service functions to calculate 16b CRC code.
*/
ee_u16
crcu8(ee_u8 data, ee_u16 crc)
{
ee_u8 i = 0, x16 = 0, carry = 0;
for (i = 0; i < 8; i++)
{
x16 = (ee_u8)((data & 1) ^ ((ee_u8)crc & 1));
data >>= 1;
if (x16 == 1)
{
crc ^= 0x4002;
carry = 1;
}
else
carry = 0;
crc >>= 1;
if (carry)
crc |= 0x8000;
else
crc &= 0x7fff;
}
return crc;
}
ee_u16
crcu16(ee_u16 newval, ee_u16 crc)
{
crc = crcu8((ee_u8)(newval), crc);
crc = crcu8((ee_u8)((newval) >> 8), crc);
return crc;
}
ee_u16
crcu32(ee_u32 newval, ee_u16 crc)
{
crc = crc16((ee_s16)newval, crc);
crc = crc16((ee_s16)(newval >> 16), crc);
return crc;
}
ee_u16
crc16(ee_s16 newval, ee_u16 crc)
{
return crcu16((ee_u16)newval, crc);
}
ee_u8
check_data_types()
{
ee_u8 retval = 0;
if (sizeof(ee_u8) != 1)
{
ee_printf("ERROR: ee_u8 is not an 8b datatype!\n");
retval++;
}
if (sizeof(ee_u16) != 2)
{
ee_printf("ERROR: ee_u16 is not a 16b datatype!\n");
retval++;
}
if (sizeof(ee_s16) != 2)
{
ee_printf("ERROR: ee_s16 is not a 16b datatype!\n");
retval++;
}
if (sizeof(ee_s32) != 4)
{
ee_printf("ERROR: ee_s32 is not a 32b datatype!\n");
retval++;
}
if (sizeof(ee_u32) != 4)
{
ee_printf("ERROR: ee_u32 is not a 32b datatype!\n");
retval++;
}
if (sizeof(ee_ptr_int) != sizeof(int *))
{
ee_printf(
"ERROR: ee_ptr_int is not a datatype that holds an int pointer!\n");
retval++;
}
if (retval > 0)
{
ee_printf("ERROR: Please modify the datatypes in core_portme.h!\n");
}
return retval;
}
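
The bit loop in `crcu8` above can be compared with the more common way of writing a reflected CRC-16. The sketch below is an illustration using `<stdint.h>` types; the function names are hypothetical. Working through the shifts, XORing with 0x4002 before the shift and then setting bit 15 has the same effect as shifting first and XORing with 0xA001, so the update appears to match the reflected 0x8005 polynomial (CRC-16/ARC) with an initial value of 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Same update rule as crcu8, restated with fixed-width types. */
uint16_t crc16_update_coremark(uint8_t data, uint16_t crc)
{
    int i;

    for (i = 0; i < 8; i++) {
        uint8_t x16 = (uint8_t)((data & 1) ^ (crc & 1));
        data >>= 1;
        if (x16) {
            crc ^= 0x4002;
            crc >>= 1;
            crc |= 0x8000;
        } else {
            crc >>= 1;
        }
    }
    return crc;
}

/* Conventional reflected form: fold the byte in, then shift 8 times. */
uint16_t crc16_update_reflected(uint8_t data, uint16_t crc)
{
    int i;

    crc ^= data;
    for (i = 0; i < 8; i++) {
        if (crc & 1)
            crc = (uint16_t)((crc >> 1) ^ 0xA001);
        else
            crc = (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Run the CoreMark-style update over a buffer, starting from 0. */
uint16_t crc16_buffer(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;
    size_t i;

    for (i = 0; i < len; i++)
        crc = crc16_update_coremark(buf[i], crc);
    return crc;
}
```

Feeding the ASCII string "123456789" through `crc16_buffer` gives 0xBB3D, the published check value for CRC-16/ARC, which supports the equivalence claimed above.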


@ -0,0 +1,183 @@
/*
Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Original Author: Shay Gal-on
*/
/* Topic: Description
This file contains declarations of the various benchmark functions.
*/
/* Configuration: TOTAL_DATA_SIZE
Define total size for data algorithms will operate on
*/
#ifndef TOTAL_DATA_SIZE
#define TOTAL_DATA_SIZE 2 * 1000
#endif
#define SEED_ARG 0
#define SEED_FUNC 1
#define SEED_VOLATILE 2
#define MEM_STATIC 0
#define MEM_MALLOC 1
#define MEM_STACK 2
#include "core_portme.h"
#if HAS_STDIO
#include <stdio.h>
#endif
#if HAS_PRINTF
#define ee_printf printf
#endif
/* Actual benchmark execution in iterate */
void *iterate(void *pres);
/* Typedef: secs_ret
For machines that have floating point support, get number of seconds as
a double. Otherwise an unsigned int.
*/
#if HAS_FLOAT
typedef double secs_ret;
#else
typedef ee_u32 secs_ret;
#endif
#if MAIN_HAS_NORETURN
#define MAIN_RETURN_VAL
#define MAIN_RETURN_TYPE void
#else
#define MAIN_RETURN_VAL 0
#define MAIN_RETURN_TYPE int
#endif
void start_time(void);
void stop_time(void);
CORE_TICKS get_time(void);
secs_ret time_in_secs(CORE_TICKS ticks);
/* Misc useful functions */
ee_u16 crcu8(ee_u8 data, ee_u16 crc);
ee_u16 crc16(ee_s16 newval, ee_u16 crc);
ee_u16 crcu16(ee_u16 newval, ee_u16 crc);
ee_u16 crcu32(ee_u32 newval, ee_u16 crc);
ee_u8 check_data_types(void);
void * portable_malloc(ee_size_t size);
void portable_free(void *p);
ee_s32 parseval(char *valstring);
/* Algorithm IDS */
#define ID_LIST (1 << 0)
#define ID_MATRIX (1 << 1)
#define ID_STATE (1 << 2)
#define ALL_ALGORITHMS_MASK (ID_LIST | ID_MATRIX | ID_STATE)
#define NUM_ALGORITHMS 3
/* list data structures */
typedef struct list_data_s
{
ee_s16 data16;
ee_s16 idx;
} list_data;
typedef struct list_head_s
{
struct list_head_s *next;
struct list_data_s *info;
} list_head;
/*matrix benchmark related stuff */
#define MATDAT_INT 1
#if MATDAT_INT
typedef ee_s16 MATDAT;
typedef ee_s32 MATRES;
#else
typedef ee_f16 MATDAT;
typedef ee_f32 MATRES;
#endif
typedef struct MAT_PARAMS_S
{
int N;
MATDAT *A;
MATDAT *B;
MATRES *C;
} mat_params;
/* state machine related stuff */
/* List of all the possible states for the FSM */
typedef enum CORE_STATE
{
CORE_START = 0,
CORE_INVALID,
CORE_S1,
CORE_S2,
CORE_INT,
CORE_FLOAT,
CORE_EXPONENT,
CORE_SCIENTIFIC,
NUM_CORE_STATES
} core_state_e;
/* Helper structure to hold results */
typedef struct RESULTS_S
{
/* inputs */
ee_s16 seed1; /* Initializing seed */
ee_s16 seed2; /* Initializing seed */
ee_s16 seed3; /* Initializing seed */
void * memblock[4]; /* Pointer to safe memory location */
ee_u32 size; /* Size of the data */
ee_u32 iterations; /* Number of iterations to execute */
ee_u32 execs; /* Bitmask of operations to execute */
struct list_head_s *list;
mat_params mat;
/* outputs */
ee_u16 crc;
ee_u16 crclist;
ee_u16 crcmatrix;
ee_u16 crcstate;
ee_s16 err;
/* multithread specific */
core_portable port;
} core_results;
/* Multicore execution handling */
#if (MULTITHREAD > 1)
ee_u8 core_start_parallel(core_results *res);
ee_u8 core_stop_parallel(core_results *res);
#endif
/* list benchmark functions */
list_head *core_list_init(ee_u32 blksize, list_head *memblock, ee_s16 seed);
ee_u16 core_bench_list(core_results *res, ee_s16 finder_idx);
/* state benchmark functions */
void core_init_state(ee_u32 size, ee_s16 seed, ee_u8 *p);
ee_u16 core_bench_state(ee_u32 blksize,
ee_u8 *memblock,
ee_s16 seed1,
ee_s16 seed2,
ee_s16 step,
ee_u16 crc);
/* matrix benchmark functions */
ee_u32 core_init_matrix(ee_u32 blksize,
void * memblk,
ee_s32 seed,
mat_params *p);
ee_u16 core_bench_matrix(mat_params *p, ee_s16 seed, ee_u16 crc);


@ -0,0 +1,6 @@
9007fe7861b60ee6f210d156b62974c8 core_list_join.c
4a9e6dadce1ac3866381021fbe843fc9 core_main.c
5fa21a0f7c3964167c9691db531ca652 core_matrix.c
fb49e7605c125306575a83f14f5798ac core_state.c
45540ba2145adea1ec7ea2c72a1fbbcb core_util.c
8ca974c013b380dc7f0d6d1afb76eb2d coremark.h


@ -0,0 +1,17 @@
# Copyright 2018 Embedded Microprocessor Benchmark Consortium (EEMBC)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Original Author: Shay Gal-on
include posix/core_portme.mak


@ -0,0 +1 @@
This folder contains the original, unaltered documents from the CoreMark V1.0 release.

BIN
tests/coremark/coremark-src/docs/balance_O0_joined.png (Stored with Git LFS) Normal file

Binary file not shown.

Some files were not shown because too many files have changed in this diff.